ID |
Date |
Author |
Type |
Category |
Subject |
16270
|
Thu Aug 5 14:59:31 2021 |
Anchal | Update | General | Added temperature sensors at Yend and Vertex too |
I've added the other two temperature sensor modules, one at the Y end (on 1Y4, IP: 192.168.113.241) and one in the vertex (on 1X2, IP: 192.168.113.242). I've updated the martian host table accordingly. From inside the martian network, one can point a browser at the IP address to see the temperature sensor status. These sensors can be set to trigger an alarm and send emails/SMS etc. if the temperature goes out of a defined range.
I feel something is off though. The vertex sensor shows temperature of ~28 degrees C, Xend says 20 degrees C and Yend says 26 degrees C. I believe these sensors might need calibration.
Remaining tasks are as follows:
- Modbus TCP solution:
- If we get it right, this will be the easiest solution.
- We just need to add these sensors as streaming devices in some slow EPICS machine in their .cmd file and add the temperature sensing channels in a corresponding database file.
- Python workaround:
- Might be faster but dirty.
- We run a python script on megatron which requests temperature values every second or so from the IP addresses and writes them to a soft EPICS channel.
- We would still need to create a soft EPICS channel for this and add it to the framebuilder data acquisition list.
- An even shorter workaround for the near future could be to just write the temperature every 30 min to a log file in some location.
[anchal, paco]
We made a script under scripts/PEM/temp_logger.py and ran it on megatron. The script uses the requests package to query the latest sensor data from the three sensors every 10 minutes as a json file and outputs accordingly. This is not a permanent solution. |
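The polling approach described above could look roughly like the sketch below. This is not the contents of temp_logger.py: the endpoint path `/data.json` and the JSON field `temperature` are hypothetical placeholders, the standard-library `urllib` is swapped in for the `requests` package so the sketch has no dependencies, and only the two IP addresses quoted in this entry are listed (the X end address is not given here).

```python
import json
import time
from urllib.request import urlopen

# IPs quoted in this entry; the X end sensor address is not given here.
SENSORS = {"Yend": "192.168.113.241", "Vertex": "192.168.113.242"}
DATA_PATH = "/data.json"   # hypothetical endpoint, check the sensor's web interface
TEMP_KEY = "temperature"   # hypothetical JSON field name
POLL_SECONDS = 600         # every 10 minutes, as in the entry


def fetch_json(ip, timeout=5.0):
    """Request the latest reading from one sensor and decode it as JSON."""
    with urlopen(f"http://{ip}{DATA_PATH}", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def format_record(name, payload, now=None):
    """Turn one decoded payload into a single timestamped log line."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"{stamp} {name} {float(payload[TEMP_KEY]):.2f}"


def log_once(fetch=fetch_json, sensors=SENSORS):
    """Poll every sensor once; a failed read becomes a comment line, not a crash."""
    lines = []
    for name, ip in sensors.items():
        try:
            lines.append(format_record(name, fetch(ip)))
        except Exception as exc:  # network errors, bad JSON, missing field
            lines.append(f"# {name} ({ip}) read failed: {exc}")
    return lines


def run_forever(logfile):
    """Append one block of readings to logfile every POLL_SECONDS."""
    while True:
        with open(logfile, "a") as handle:
            handle.write("\n".join(log_once()) + "\n")
        time.sleep(POLL_SECONDS)
```

Passing the fetch function as an argument keeps the formatting and error handling testable without any sensor on the network.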
16274
|
Tue Aug 10 17:24:26 2021 |
paco | Update | General | Five day trend |
Attachment 1 shows a five and a half day minute-trend of the three temperature sensors. Logging started last Thursday ~ 2 pm when all sensors were finally deployed. While it appears that there is a 7 degree gradient along the XARM it seems like the "vertex" (more like ITMX) sensor was just placed on top of a network switch (which feels lukewarm to the touch) so this needs to be fixed. A similar situation is observed in the ETMY sensor. I shall do this later today.
Done. The temperature reading should now be more independent from nearby instruments.
Wed Aug 11 09:34:10 2021 I updated the plot with the full trend before and after rearranging the sensors. |
16277
|
Thu Aug 12 11:04:27 2021 |
Paco | Update | General | PSL shutter was closed this morning |
Thu Aug 12 11:04:42 2021 Arrived to find the PSL shutter closed. Why? Who? When? How? No elog, no fun. I opened it, IMC is now locked, and the arms were restored and aligned. |
16278
|
Thu Aug 12 14:59:25 2021 |
Koji | Update | General | PSL shutter was closed this morning |
What I was afraid of was the vacuum interlock. And indeed there was a pressure surge this morning. Is this real? Why didn't we receive the alert? |
16279
|
Thu Aug 12 20:52:04 2021 |
Koji | Update | General | PSL shutter was closed this morning |
I did a bit more investigation on this.
- I checked P1~P4, PTP2/3, N2, TP2, TP3. But found only P1a and P2 were affected.
- Looking at the min/mean/max of P1a and P2 (Attachment 1), the signal had a large fluctuation. It is impossible for P1a to go from 0.004 to 0 instantaneously.
- Looking at the raw data of P1a and P2 (Attachment 2), the value was not steadily large. Instead it looks like fluctuating noise.
So my conclusion is that because of an unknown reason, an unknown noise coupled only into P1a and P2 and tripped the PSL shutter. I still don't know the status of the mail alert. |
16288
|
Mon Aug 23 11:51:26 2021 |
Koji | Update | General | Campus Wide Power Glitch Reported: Monday, 8/23/21 at 9:30am |
Campus Wide Power Glitch Reported: Monday, 8/23/21 at 9:30am (more like 9:34am according to nodus log)
nodus: rebooted. ELOG/apache/svn is running. (looks like Anchal worked on it)
chiara: survived the glitch thanks to UPS
fb1: not responding -> @1pm open to login / seemed rebooted only at 9:34am (network path recovered???)
megatron: not responding
optimus: no route to host
c1aux: ping ok, ssh not responding -> needed to use telnet (vme / vxworks)
c1auxex: ssh ok
c1auxey: ping ok, ssh not responding -> needed to use telnet (vme / vxworks)
c1psl: ping NG, power cycled the switch on 1X2 -> ssh OK now
c1iscaux: ping NG -> rebooted the machine -> ssh recovered
c1iscaux2: does not exist any more
c1susaux: ping NG -> responds after 1X2 switch reboot
c1pem1: telnet ok (vme / vxworks)
c1iool0: does not exist any more
c1vac1: ethernet service restarted locally -> responding
ottavia: does not exist?
c1teststand: ping ok, ssh not responding
3:20PM we started restarting the RTS |
16290
|
Mon Aug 23 19:00:05 2021 |
Koji | Update | General | Campus Wide Power Glitch Reported: Monday, 8/23/21 at 9:30am |
Restarting the RTS was unsuccessful because of the timing discrepancy error between the RT machines and the FB. This time no matter how we try to set the time, the IOPs do not run with "DC status" green. (We kept having 0x4000)
We then decided to work on the recovery without the data recorded. After some burtrestores, the IMC was locked and the spot appeared on the AS port. However, IPC seemed down and no WFS could run. |
16291
|
Mon Aug 23 22:51:44 2021 |
Anchal | Update | General | Time synchronization efforts |
Related elog thread: 16286
I didn't really achieve anything but I'm listing what I've tried.
- I know now that the timesyncd isn't working because systemd-timesyncd is known to have issues when running on a read-only file system. In particular, the service does not have privileges to change the clock or drift settings at /run/systemd/clock or /etc/adjtime.
- The workarounds to these problems are poorly rated/reviewed on Stack Exchange and require me to change the /etc/systemd/timesyncd.conf file, but I'm unable to edit this file.
- I know that Paco was able to change these files earlier as the files are now changed and configured to follow a debian ntp pool server which won't work as the FEs do not have internet access. So the conf file needs to be restored to using ntpserver as the ntp server.
- From system messages, the ntpserver is recognized by the service as shown in the second part of 16285. I really think the issue is in file permissions. The file /etc/adjtime has never been updated since 2017.
- I got help from Paco on how to edit files for FE machines. The FE machines directories are exported from fb1:/diskless/root.jessie/
- I restored the /etc/systemd/timesyncd.conf file to how it was before, with just the servers=ntpserver line. I restarted the timesyncd service on all FEs, but the synchronization did not happen.
- I tried a few suggestions from stackexchange but none of them worked. The only rated solution creates a tmpfs directory outside of read-only filesystem and uses that to run timesyncd. So, in my opinion, timesyncd would never work in our diskless read-only file system FE machines.
- One issue in an archlinux discussion ended with the questioner resorting to openntpd from the OpenBSD distribution. The user claimed that openntpd is simple enough that it can run ntp synchronization on a read-only file system.
- Somewhat painfully, I 'kind of' installed the openntpd tool in the fb1:/diskless/root.jessie directory following directions from here. I had to manually add a user and group for the FEs (which I might not have done correctly). I was not able to get the openntpd daemon to start properly after some tries.
- I restored everything back to how it was and restarted timesyncd in c1sus even though it would not do anything really.
Quote: |
This time no matter how we try to set the time, the IOPs do not run with "DC status" green. (We kept having 0x4000)
|
|
16292
|
Tue Aug 24 09:22:48 2021 |
Anchal | Update | General | Time synchronization working now |
Jamie told me to use chroot to log into the chroot jail of the Debian OS that is exported to the FEs and install ntp there. I took the following steps, at the end of which all FEs have NTP synchronized.
- I logged into fb1 through nodus.
- chroot /diskless/root.jessie /bin/bash took me to the bash terminal for debian os that is exported to all FEs.
- Here, I ran sudo apt-get install ntp which ran without any errors.
- I then edited the file /etc/ntp.conf: I removed the default servers and added the following lines for servers (fb1 and nodus IP addresses):
server 192.113.168.201
server 192.113.168.201
- I logged into each FE machine and ran following commands:
sudo systemctl stop systemd-timesyncd.service; sudo systemctl status systemd-timesyncd.service;
timedatectl; sleep 2;sudo systemctl daemon-reload; sudo systemctl start ntp; sleep 2; sudo systemctl status ntp; timedatectl
sudo hwclock -s
- The first line ensures that systemd-timesyncd.service is not running anymore. I did not uninstall timesyncd and left its configuration file as it is.
- The second line first shows the times of the local and RTC clocks. Then it reloads the daemon services to get ntp registered. Then it starts ntp.service and shows its status. Finally, the timedatectl command shows the synchronized clocks and that NTP synchronization has occurred.
- The last line sets the local clock same as RTC clock. Even though this wasn't required as I saw that the clocks were already same to seconds, I just wanted a point where all the local clocks are synchronized to the ntp server.
- Hopefully, this will resolve our issue with restarting the models any time some glitch happens or when we need to update something in one of them.
Edit Tue Aug 24 10:19:11 2021:
I also disabled timesyncd on all FEs using sudo systemctl disable systemd-timesyncd.service
I've added this wiki page for summarizing the NTP synchronization knowledge. |
16293
|
Tue Aug 24 18:11:27 2021 |
Paco | Update | General | Time synchronization not really working |
tl;dr: NTP servers and clients were never synchronized, are not synchronizing even with ntp... nodus is synchronized but uses chronyd; should we use chronyd everywhere?
Spent some time investigating the ntp synchronization. In the morning, after Anchal set up all the ntp servers / FE clients I tried restarting the rts IOPs with no success. Later, with Tega we tried the usual manual matching of the date between c1iscex and fb1 machines but we iterated over different n-second offsets from -10 to +10, also without success.
This afternoon, I tried debugging the FE and fb1 timing differences. For this I inspected the ntp configuration files, /etc/ntp.conf on fb1 and /diskless/root.jessie/etc/ntp.conf (for the FE machines), and tried different combinations with and without nodus, with and without restrict lines, all while looking at the output of sudo journalctl -f on c1iscey. Every time I changed the ntp config file, I restarted the service using sudo systemctl restart ntp.service.
Looking through some online forums, people suggested basic pinging to see if the ntp servers were up (and broadcasting their times over the local network), but ping failed to run (read-only filesystem), so I went into fb1 and ran sudo chroot /diskless/root.jessie/ /bin/bash to allow me to change file permissions. The test was first done with /bin/ping, which couldn't even open a socket (root access needed): I ran chmod 4755 /bin/ping, then ssh-ed into c1iscey and pinged the fb1 machine successfully. After this, I ran chmod 4755 /usr/sbin/ntpd so that the ntp daemon would have no problem reaching the server, in case this was blocking the synchronization. I exited the chroot shell and the ntp daemon in c1iscey, but ntpstat still showed unsynchronised status.
I also learned that when running an ntp query with ntpq -p, if a client has succeeded in synchronizing its time to the server time, an asterisk should mark the selected server (ntpq prints it at the start of that peer's line). This was not the case on any FE machine... and looking at fb1, this was also not true. Although the fb1 peers are correctly listed as nodus, the caltech ntp server, and a broadcast (.BCST.) server from local time (meant to serve the FE machines), none appears to have synchronized... Going one level further, on nodus I checked the time synchronization servers by running chronyc sources; the output shows
controls@nodus|~> chronyc sources
210 Number of sources = 4
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* testntp1.superonline.net 1 10 377 280 +1511us[+1403us] +/- 92ms
^+ 38.229.59.9 2 10 377 206 +8219us[+8219us] +/- 117ms
^+ tms04.deltatelesystems.ru 2 10 377 23m -17ms[ -17ms] +/- 183ms
^+ ntp.gnc.am 3 10 377 914 -8294us[-8401us] +/- 168ms
I then ran chronyc clients to find if fb1 was listed (as I would have expected) but the output shows this --
Hostname Client Peer CmdAuth CmdNorm CmdBad LstN LstC
========================= ====== ====== ====== ====== ====== ==== ====
501 Not authorised
So clearly chronyd succeeded in synchronizing nodus' time to whatever server it was pointed at, but downstream from there, neither fb1 nor any FE machine seems to be synchronizing properly. It may be as simple as figuring out the correct ntp configuration file, or switching to chronyd for all machines (for the sake of homogeneity?) |
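The "is anything actually selected?" check above can be scripted. The sketch below is an illustration only (the function names are made up here): it looks for the `*` state character that `chronyc sources` prints after the mode character (`^*`), and for the `*` tally character that `ntpq -p` prints in the first column of the selected peer's line.

```python
def selected_chrony_source(chronyc_sources_output):
    """Return the name of the source marked '*' by `chronyc sources`, or None."""
    for line in chronyc_sources_output.splitlines():
        fields = line.split()
        # Source lines start with a 2-character "MS" field, e.g. "^*" or "^+".
        if len(fields) >= 2 and len(fields[0]) == 2 and fields[0][1] == "*":
            return fields[1]
    return None


def selected_ntpq_peer(ntpq_output):
    """Return the peer whose line starts with the '*' tally in `ntpq -p`, or None."""
    for line in ntpq_output.splitlines():
        if line.startswith("*"):
            return line[1:].split()[0]
    return None


# The nodus output quoted in this entry.
NODUS_SOURCES = """210 Number of sources = 4
MS Name/IP address Stratum Poll Reach LastRx Last sample
^* testntp1.superonline.net 1 10 377 280 +1511us[+1403us] +/- 92ms
^+ 38.229.59.9 2 10 377 206 +8219us[+8219us] +/- 117ms
"""

print(selected_chrony_source(NODUS_SOURCES))
```

A return value of None from either function corresponds to the situation described for fb1 and the FEs: peers are listed, but none has been selected for synchronization.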
16295
|
Tue Aug 24 22:37:40 2021 |
Anchal | Update | General | Time synchronization not really working |
I attempted to install chrony and run it on one of the FE machines. It didn't work and in doing so, I lost the working NTP client service on the FE computers as well. Following are some details:
- I added the following two mirrors in the apt source list of root.jessie at /etc/apt/sources.list
deb http://ftp.us.debian.org/debian/ jessie main contrib non-free
deb-src http://ftp.us.debian.org/debian/ jessie main contrib non-free
- Then I installed chrony in the root.jessie using
sudo apt-get install chrony
- I was getting an error E: Can not write log (Is /dev/pts mounted?) - posix_openpt (2: No such file or directory) . To fix this, I had to run:
sudo mount -t devpts none "$rootpath/dev/pts" -o ptmxmode=0666,newinstance
sudo ln -fs "pts/ptmx" "$rootpath/dev/ptmx"
- Then, I had another error to resolve.
Failed to read /proc/cmdline. Ignoring: No such file or directory
start-stop-daemon: nothing in /proc - not mounted?
To fix this, I had to exit to fb1 and run:
sudo mount --bind /proc /diskless/root.jessie/proc
- With these steps, chrony was finally installed, but I immediately saw an error message saying:
Starting /usr/sbin/chronyd...
Could not open NTP sockets
- I figured this must be due to ntp running in the FE machines. I logged into c1iscex and stopped and disabled the ntp service:
sudo systemctl stop ntp
sudo systemctl disable ntp
- I saw some error messages from the above command as the FEs have read-only file systems:
Synchronizing state for ntp.service with sysvinit using update-rc.d...
Executing /usr/sbin/update-rc.d ntp defaults
insserv: fopen(.depend.stop): Read-only file system
Executing /usr/sbin/update-rc.d ntp disable
update-rc.d: error: Read-only file system
- So I went back to the chroot in fb1 and ran the two commands above that failed:
/usr/sbin/update-rc.d ntp defaults
/usr/sbin/update-rc.d ntp disable
- The last line gave the output:
insserv: warning: current start runlevel(s) (empty) of script `ntp' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (2 3 4 5) of script `ntp' overrides LSB defaults (empty).
- I ignored this and moved forward.
- I copied the chronyd.service from nodus to the chroot in fb1 and configured it to use nodus as the server. Then I started the chronyd.service
sudo systemctl status chronyd.service
but got the same NTP sockets issue.
● chronyd.service - NTP client/server
Loaded: loaded (/usr/lib/systemd/system/chronyd.service; disabled)
Active: failed (Result: exit-code) since Tue 2021-08-24 21:52:30 PDT; 5s ago
Process: 790 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=1/FAILURE)
Aug 24 21:52:29 c1iscex systemd[1]: Starting NTP client/server...
Aug 24 21:52:30 c1iscex chronyd[790]: Could not open NTP sockets
Aug 24 21:52:30 c1iscex systemd[1]: chronyd.service: control process exited, code=exited status=1
Aug 24 21:52:30 c1iscex systemd[1]: Failed to start NTP client/server.
Aug 24 21:52:30 c1iscex systemd[1]: Unit chronyd.service entered failed state.
- I tried a few things to resolve this, but couldn't get it to work. So I gave up on using chrony and decided to at least go back to the ntp service.
- I stopped, disabled and checked the status of chrony:
sudo systemctl stop chronyd
sudo systemctl disable chronyd
sudo systemctl status chronyd
This gave the output:
● chronyd.service - NTP client/server
Loaded: loaded (/usr/lib/systemd/system/chronyd.service; disabled)
Active: failed (Result: exit-code) since Tue 2021-08-24 22:09:07 PDT; 25s ago
Aug 24 22:09:07 c1iscex systemd[1]: Starting NTP client/server...
Aug 24 22:09:07 c1iscex chronyd[2490]: Could not open NTP sockets
Aug 24 22:09:07 c1iscex systemd[1]: chronyd.service: control process exited, code=exited status=1
Aug 24 22:09:07 c1iscex systemd[1]: Failed to start NTP client/server.
Aug 24 22:09:07 c1iscex systemd[1]: Unit chronyd.service entered failed state.
Aug 24 22:09:15 c1iscex systemd[1]: Stopped NTP client/server.
- I went back to the fb1 chroot, removed the chrony package, and deleted the configuration files and systemd service files:
sudo apt-get remove chrony
- But when I started the ntp daemon service back up on c1iscex, it gave an error:
sudo systemctl restart ntp
Job for ntp.service failed. See 'systemctl status ntp.service' and 'journalctl -xn' for details.
- Status shows:
sudo systemctl status ntp
● ntp.service - LSB: Start NTP daemon
Loaded: loaded (/etc/init.d/ntp)
Active: failed (Result: exit-code) since Tue 2021-08-24 22:09:56 PDT; 9s ago
Process: 2597 ExecStart=/etc/init.d/ntp start (code=exited, status=5)
Aug 24 22:09:55 c1iscex systemd[1]: Starting LSB: Start NTP daemon...
Aug 24 22:09:56 c1iscex systemd[1]: ntp.service: control process exited, code=exited status=5
Aug 24 22:09:56 c1iscex systemd[1]: Failed to start LSB: Start NTP daemon.
Aug 24 22:09:56 c1iscex systemd[1]: Unit ntp.service entered failed state.
- I tried to re-enable the ntp service with sudo systemctl enable ntp. I got read-only filesystem error messages similar to the earlier ones:
Synchronizing state for ntp.service with sysvinit using update-rc.d...
Executing /usr/sbin/update-rc.d ntp defaults
insserv: warning: current start runlevel(s) (empty) of script `ntp' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (2 3 4 5) of script `ntp' overrides LSB defaults (empty).
insserv: fopen(.depend.stop): Read-only file system
Executing /usr/sbin/update-rc.d ntp enable
update-rc.d: error: Read-only file system
- I came back to c1iscex and tried restarting the ntp service, but got the same error messages as above with exit code 5.
- I checked c1sus; ntp was running there. I tested the configuration by restarting the ntp service, and it then failed with the same error message. So the remaining three FEs, c1lsc, c1ioo and c1iscey, have a running ntp service, but they won't be able to restart it.
- As a last try, I rebooted c1iscex to see if ntp comes back online nicely, but it doesn't.
Bottom line, I went to try chrony in the FEs, and I ended up breaking the ntp client services on the computers as well. We have no NTP synchronization in any of the FEs.
Even though Paco and I are learning about the ntp and cds stuff, I think it's time we get help from someone with real experience. The lab has not been in a good state for far too long.
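One way to test the guess that another daemon was already holding the NTP port when chronyd reported "Could not open NTP sockets" is to try binding the UDP port directly. The helper below is a hypothetical diagnostic sketch, not something that was run in this entry. Note that port 123 is privileged, so checking it requires root; without root the bind fails with a permission error, which the function deliberately re-raises rather than misreporting.

```python
import errno
import socket

NTP_PORT = 123  # standard NTP UDP port, used by both ntpd and chronyd


def udp_port_in_use(port, host="0.0.0.0"):
    """Return True if another process already holds this UDP port on host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # e.g. EACCES when checking port 123 without root
    finally:
        sock.close()
    return False
```

Running `udp_port_in_use(NTP_PORT)` as root on an FE before starting chronyd would distinguish "ntpd still owns the port" from other causes of the socket error, such as the read-only file system.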
Quote: |
tl;dr: NTP servers and clients were never synchronized, are not synchronizing even with ntp... nodus is synchronized but uses chronyd; should we use chronyd everywhere?
|
|
16306
|
Wed Sep 1 21:55:14 2021 |
Koji | Summary | General | Towards the end upgrade |
- Sat amp mod and test: on going (Tega)
- Coil driver mod and test: on going (Tega)
- Acromag: almost ready (Yehonathan)
- IDC10-DB9 cable / D2100641 / IDC10F for ribbon in hand / Dsub9M ribbon brought from Downs / QTY 2 for two ends -> Made 2 (stored in the DSUB connector plastic box)
- IDC40-DB9 cable / D2100640 / IDC40F for ribbon in hand / DB9F solder brought from Downs / QTY 4 for two ends -> Made 4 0.5m cables (stored in the DSUB connector plastic box)
- DB15-DB9 reducer cable / ETMX2+ETMY2+VERTEX16+NewSOS14 = 34 / to be ordered
- End DAC signal adapter with Dewhitening (with DIFF/SE converter) / to be designed & built
- End ADC adapter (with SE/DIFF converter) / to be designed & built
MISC Ordering
- 3.5 x Sat Amp Adapter made (order more DSUB25 conns)
- -> Gave 2 to Tega, 1.5 in the DSUB box
- 5747842-4 A32100-ND -> 5747842-3 A32099-ND Qty40
- 5747846-3 A32125-ND -> 747846-3 A23311-ND Qty40
- Tega's sat amp components
- 499Ω P499BCCT-ND 78 -> Backorder -> RG32P499BCT-ND Qty 100
- 4.99KΩ TNPW12064K99BEEA 56 -> Qty 100
- 75Ω YAG5096CT-ND 180 -> Qty 200
- 1.82KΩ P18391CT-ND 103 -> Qty 120
- 68 nF P10965-ND 209
- Order more DB9s for Tega's sat amp adapter 4 units (look at the AA IO BOM)
- 4x 8x 5747840-4 DB9M PCB A32092-ND -> 6-747840-9 A123182-ND Qty 35
- 4x 5x 5747844-4 A32117-ND -> Qty 25
- 4x 5x DB9M ribbon MMR09K-ND -> 8209-8000 8209-8000-ND Qty 25
- 4x 5x 5746861-4 DB9F ribbon 5746861-4-ND -> 400F0-09-1-00 LFR09H-ND Qty 35
- Order 18bit DAC AI -> 16bit DAC AI components 4 units
- 4x 4x 5747150-8 DSUB9F PCB A34072-ND -> D09S24A4PX00LF609-6357-ND Qty 20
- 4x 1x 787082-7 CONN D-TYPE RCPT 68POS R/A SLDR (SCSI Female) A3321-ND -> 5787082-7 A31814-ND Qty 5
- 4x 1x 22-23-2021 Connector Header Through Hole 2 position 0.100" (2.54mm) WM4200-ND -> Qty5
|
16317
|
Wed Sep 8 19:06:14 2021 |
Koji | Update | General | Backup situation |
Tega mentioned in the meeting that it could be safer to separate some of nodus's functions from the martian file system.
That's an interesting thought. The summary pages and other web services are linked to the user dir. This has high traffic and can cause issues on the internal network if the disk crashes.
Or if the internal system is crashed, we still want to use elogs as the source of the recovery info. Also currently we have no backup of the elog. This is dangerous.
We can reduce some of the risks by adding two identical 2TB disks to nodus to accommodate svn/elog/web and their daily backup.
host | file system or contents | condition | note |
nodus | root | none or unknown | |
nodus | home (svn, elog) | none | |
nodus | web (incl summary pages) | backed up | linked to /cvs/cds |
chiara | root | maybe | need to check with Jon/Anchal |
chiara | /home/cds | local copy | The backup disk is smaller than the main disk. |
chiara | /home/cds | remote copy - stalled | we used to have, but stalled since 2017/11/17 |
fb1 | root | maybe | need to check with Jon/Anchal |
fb1 | frame | rsync | pulled from LDAS according to Tega |
|
16319
|
Mon Sep 13 04:12:01 2021 |
Tega | Update | General | Added temperature sensors at Yend and Vertex too |
I finally got the modbus part working on chiara, so we can now view the temperature data on any machine on the martian network, see Attachment 1.
I also updated the entries on /opt/rtcds/caltech/c1/chans/daq/C0EDCU.ini, as suggested by Koji, to include the SensorGatway temperature channels, but I still don't see their EPICS channels on https://ldvw.ligo.caltech.edu/ldvw/view. This means the channels are not available via nds, so I think the temperature data is not being written to frame files on framebuilder, but I am not sure what this entails, since I assumed C0EDCU.ini is the framebuilder daq channel list.
When the EPICs channels are available via nds, we should be able to display the temperature data on the summary pages.
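For reference, a Modbus TCP register read is a small fixed-layout frame: a 7-byte MBAP header (transaction id, protocol id 0, remaining length, unit id) followed by the function code, start register and register count. The sketch below shows that framing only. It is not the EPICS modbus configuration used on chiara, and the function code, unit id and register addresses are assumptions that would have to come from the sensor's manual.

```python
import struct

# Assumption: the sensor may expose temperatures as input registers (0x04) instead.
READ_HOLDING_REGISTERS = 0x03


def build_read_request(transaction_id, unit_id, start_register, count):
    """Build a Modbus TCP 'read registers' request frame (big-endian fields)."""
    pdu = struct.pack(">BHH", READ_HOLDING_REGISTERS, start_register, count)
    # MBAP length counts the unit id byte plus the PDU.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def parse_read_response(frame):
    """Return the list of 16-bit register values carried by a read response."""
    function, second_byte = struct.unpack(">BB", frame[7:9])
    if function & 0x80:
        # Exception responses set the top bit; the next byte is the exception code.
        raise ValueError(f"Modbus exception code {second_byte}")
    byte_count = second_byte
    return list(struct.unpack(f">{byte_count // 2}H", frame[9:9 + byte_count]))
```

How a raw register value maps to degrees C (scale factor, offset, signedness) is device specific and is not covered here.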
Quote: |
I've added the other two temperature sensor modules, one at the Y end (on 1Y4, IP: 192.168.113.241) and one in the vertex (on 1X2, IP: 192.168.113.242). I've updated the martian host table accordingly. From inside the martian network, one can point a browser at the IP address to see the temperature sensor status. These sensors can be set to trigger an alarm and send emails/SMS etc. if the temperature goes out of a defined range.
I feel something is off though. The vertex sensor shows temperature of ~28 degrees C, Xend says 20 degrees C and Yend says 26 degrees C. I believe these sensors might need calibration.
Remaining tasks are as follows:
- Modbus TCP solution:
- If we get it right, this will be the easiest solution.
- We just need to add these sensors as streaming devices in some slow EPICS machine in their .cmd file and add the temperature sensing channels in a corresponding database file.
- Python workaround:
- Might be faster but dirty.
- We run a python script on megatron which requests temperature values every second or so from the IP addresses and writes them to a soft EPICS channel.
- We would still need to create a soft EPICS channel for this and add it to the framebuilder data acquisition list.
- An even shorter workaround for the near future could be to just write the temperature every 30 min to a log file in some location.
[anchal, paco]
We made a script under scripts/PEM/temp_logger.py and ran it on megatron. The script uses the requests package to query the latest sensor data from the three sensors every 10 minutes as a json file and outputs accordingly. This is not a permanent solution.
|
|
16334
|
Wed Sep 15 23:53:54 2021 |
Koji | Summary | General | Towards the end upgrade |
Ordered components are in.
- Made 36 more Sat Amp internal boards (Attachment 1). Now we can install the adapters to all the 19 sat amp units.
- Gave Tega the components for the sat amp adapter units. (Attachment 2)
- Gave Tega the components for the sat amp / coil driver modifications.
- Made 5 PCBs for the 16bit DAC AI rear panel interface (Attachment 3) |
16335
|
Thu Sep 16 00:00:20 2021 |
Koji | Update | General | RIO Planex 1064 Lasers in the south cabinet |
RIO Planex 1064 Lasers in the south cabinet
Property Number C30684/C30685/C30686/C30687 |
16336
|
Thu Sep 16 01:16:48 2021 |
Koji | Update | General | Frozen 2 |
It happened again. Defrosting required. |
16337
|
Thu Sep 16 10:07:25 2021 |
Anchal | Update | General | Melting 2 |
Put outside.
Quote: |
It happened again. Defrosting required.
|
|
16340
|
Thu Sep 16 20:18:13 2021 |
Anchal | Update | General | Reset |
Fridge brought back inside.
Quote: |
Put outside.
Quote: |
It happened again. Defrosting required.
|
|
|
16341
|
Fri Sep 17 00:56:49 2021 |
Koji | Update | General | Awesome |
The Incredible Melting Man!
|
16403
|
Thu Oct 14 16:38:26 2021 |
Ian MacMillan | Update | General | Kicking optics in freeSwing measurement |
[Ian, Anchal]
We are going to kick the optics tonight at 2am.
The optics we will kick are the PRM BS ITMX ITMY ETMX ETMY
We will kick each one once and record for 2000 seconds and the log files will be placed in users/ian/20211015_FreeSwingTest/logs. |
16405
|
Thu Oct 14 20:16:22 2021 |
Yehonathan | Update | General | PRMI free swinging |
{Yehonathan, Raj}
We aligned the IFO in the PRMI state and let it swing freely. |
16406
|
Fri Oct 15 12:14:27 2021 |
Ian MacMillan | Update | General | Kicking optics in freeSwing measurement |
[Ian, Anchal]
We ran the free swinging test last night and the results match to within 1/10th of a Hz. We used the getPeakFreqs2 script to find the peaks, and they are close to the previous values from 2016.
In attachment 1 you will see the results of the test for each optic.
The peak values are as follows:
Optic | POS (Hz) | PIT (Hz) | YAW (Hz) | SIDE (Hz) |
PRM | 0.94 | 0.96 | 0.99 | 0.99 |
MC2 | 0.97 | 0.75 | 0.82 | 0.99 |
ETMY | 0.98 | 0.98 | 0.95 | 0.95 |
MC1 | 0.97 | 0.68 | 0.80 | 1.00 |
ITMX | 0.95 | 0.68 | 0.68 | 0.98 |
ETMX | 0.96 | 0.73 | 0.85 | 1.00 |
BS | 0.99 | 0.74 | 0.80 | 0.96 |
ITMY | 0.98 | 0.72 | 0.72 | 0.98 |
MC3 | 0.98 | 0.77 | 0.84 | 0.97 |
The results from 2016 can be found at: /cvs/cds/rtcdt/caltech/c1/scripts/SUS/PeakFit/parameters2.m |
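As an illustration of how a resonance peak can be pulled out of a free-swing time series, the sketch below evaluates a plain discrete Fourier transform over a chosen frequency band and returns the strongest bin. This is not the getPeakFreqs2 script; the function name, the sample rate and the synthetic 0.95 Hz test signal in the usage note are assumptions made for the example.

```python
import cmath
import math


def peak_frequency(samples, fs, f_lo, f_hi):
    """Return the DFT bin frequency (Hz) with the most power in [f_lo, f_hi]."""
    n = len(samples)
    mean = sum(samples) / n          # remove the DC offset before transforming
    df = fs / n                      # frequency resolution = 1 / record length
    k_lo = max(1, math.ceil(f_lo / df))
    k_hi = min(n // 2, math.floor(f_hi / df))
    best_f, best_power = None, -1.0
    for k in range(k_lo, k_hi + 1):
        step = -2j * math.pi * k / n
        acc = sum((x - mean) * cmath.exp(step * i) for i, x in enumerate(samples))
        power = abs(acc) ** 2
        if power > best_power:
            best_f, best_power = k * df, power
    return best_f
```

For example, a synthetic 100 s record of a 0.95 Hz sinusoid sampled at 16 Hz, `[math.sin(2 * math.pi * 0.95 * i / 16) for i in range(1600)]`, searched between 0.5 and 1.5 Hz, peaks in the bin nearest 0.95 Hz. The resolution is set by the record length, so a 2000 s record resolves 0.0005 Hz, far finer than the 0.1 Hz agreement quoted above.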
16408
|
Fri Oct 15 17:17:51 2021 |
Koji | Summary | General | Vent Prep |
I took over the vent prep: I'm going through the list in [ELOG 15649] and [ELOG 15651]. I will also look at [ELOG 15652] at the day of venting.
- IFO alignment: Two arms are already locking. The dark port beam is well overlapped. We will move PRM/SRM etc. So we don't need to worry about them. [Attachment 1]
scripts>z read C1:SUS-BS_PIT_BIAS C1:SUS-BS_YAW_BIAS
-304.7661529521767
-109.23924626857811
scripts>z read C1:SUS-ITMX_PIT_BIAS C1:SUS-ITMX_YAW_BIAS
15.534616817500943
-503.4536332290159
scripts>z read C1:SUS-ITMY_PIT_BIAS C1:SUS-ITMY_YAW_BIAS
653.0100945988496
-478.16260735781225
scripts>z read C1:SUS-ETMX_PIT_BIAS C1:SUS-ETMX_YAW_BIAS
-136.17863332517527
181.09285307121306
scripts>z read C1:SUS-ETMY_PIT_BIAS C1:SUS-ETMY_YAW_BIAS
-196.6200333695437
-85.40819256078339
- IMC alignment: Locking nicely. I ran WFS relief to move the WFS output on to the alignment sliders. All the WFS feedback values are now <10. Here are the slider snapshots. [Attachment 2]
- PMC alignment: The PMC looked like it was quite misaligned -> aligned. IMC/PMC locking snapshot [Attachment 3]
Arm transmissions:
scripts>z avg 10 C1:LSC-TRX_OUT C1:LSC-TRY_OUT
C1:LSC-TRX_OUT 0.9825591325759888
C1:LSC-TRY_OUT 0.9488834202289581
- Suspension Status Snapshot [Attachment 4]
- Anchal aligned the OPLEV beams [ELOG 16407]
I also checked the 100-day trend of the OPLEV sum power. The trend of the max values looks flat and fine. [Attachment 5] For this purpose, the PRM and SRM were aligned and the SRM oplev was also aligned. The SRM sum was 23580 when aligned and it was just fine (this is not so visible in the trend plot).
- The X and Y green beams were aligned for the cavity TEM00s. Y end green PZT values were nulled. The transmission I could reach was as follows.
>z read C1:ALS-TRX_OUTPUT C1:ALS-TRY_OUTPUT
0.42343354488901286
0.24739624058377277
It seems that GTRX and GTRY have crosstalk. With each green shutter closed in turn, the transmission and the dark offset were measured to be
>z read C1:ALS-TRX_OUTPUT C1:ALS-TRY_OUTPUT
0.41822833190834546
0.025039383697636856
>z read C1:ALS-TRX_OUTPUT C1:ALS-TRY_OUTPUT
0.00021112720155274818
0.2249448773499293
Note that the Y green seemed to have a significant amount (~0.1) of 1st order HOM. I don't know why I could not transfer this power into TEM00. I could not find any significant clipping of the TR beams on the PSL table PDs.
- IMC Power reduction
Now we have nice motorized HWP. sitemap -> PSL -> Power control
== Initial condition == [Attachment 6]
C1:IOO-HWP_POS 38.83
Measured input power = 0.99W
C1:IOO-MC_RFPD_DCMON = 5.38
== Power reduction == [Attachment 7]
- The motor was enabled upon rotation on the screen
C1:IOO-HWP_POS 74.23
Measured input power = 98mW
C1:IOO-MC_RFPD_DCMON = 0.537
- Then, the motor was disabled
- Went to the detection table and swapped the 10% reflector with the 98% reflector stored on the same table. [Attachments 8/9]
After the beam alignment the MC REFL PD received about the same amount of the light as before.
C1:IOO-MC_RFPD_DCMON = 5.6
There is no beam delivered to the WFS paths.
CAUTION: IF THE POWER IS INCREASED TO THE NOMINAL WITH THIS CONFIGURATION, MC REFL PD WILL BE DESTROYED.
- The IMC can already be locked with this configuration. But for the MC Autolocker, the MCTRANS threshold for the autolocker needs to be reduced as well.
This was done by swapping a line in /opt/rtcds/caltech/c1/scripts/MC/AutoLockMC.init
# BEFORE
/bin/csh ./AutoLockMC.csh >> $LOGFILE
#/bin/csh ./AutoLockMC_LowPower.csh >> $LOGFILE
--->
# AFTER
#/bin/csh ./AutoLockMC.csh >> $LOGFILE
/bin/csh ./AutoLockMC_LowPower.csh >> $LOGFILE
Confirmed that the autolocker works a few times by toggling the PSL shutter. The PSL shutter was closed upon the completion of the test
- Walked around the lab and checked all the bellows - the jam nuts are all tight, and I couldn't move them with my hands. So this is okay according to the ancient tale by Steve.
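The IMC power-reduction numbers in the list above can be cross-checked with a few lines of arithmetic. The sketch below assumes the MC REFL PD DC monitor is linear in incident power and that the two reflectors are exactly 10% and 98%; both are idealizations, and the variable names are made up for this example.

```python
# Values quoted in the IMC power reduction item.
p_nominal_w = 0.99        # measured input power before reduction
p_low_w = 0.098           # measured input power after rotating the HWP
dcmon_nominal = 5.38      # C1:IOO-MC_RFPD_DCMON before reduction
dcmon_low = 0.537         # C1:IOO-MC_RFPD_DCMON after reduction
dcmon_after_swap = 5.6    # C1:IOO-MC_RFPD_DCMON with the 98% reflector

r_old, r_new = 0.10, 0.98  # assumed exact reflectivities

# 1) The PD reading should drop by the same factor as the input power.
power_ratio = p_low_w / p_nominal_w
dcmon_ratio = dcmon_low / dcmon_nominal

# 2) Swapping 10% -> 98% should raise the PD reading by r_new / r_old.
expected_after_swap = dcmon_low * r_new / r_old

# 3) Restoring nominal power with the 98% reflector in place would scale
#    the present reading back up by 1 / power_ratio.
dcmon_if_full_power = dcmon_after_swap / power_ratio

print(f"power ratio          {power_ratio:.4f}")
print(f"DCMON ratio          {dcmon_ratio:.4f}")
print(f"expected after swap  {expected_after_swap:.2f}")
print(f"DCMON at full power  {dcmon_if_full_power:.1f}")
```

Under these assumptions the two ratios agree to better than 1%, the reading after the reflector swap is within roughly 10% of the simple prediction, and returning to nominal power would put about ten times the usual light on the MC REFL PD, which is the reason for the caution above.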
|
16409
|
Fri Oct 15 20:53:49 2021 |
Koji | Summary | General | Vent Prep |
From the IFO point of view, everything looks good and we are ready for venting from Mon Oct 18 9AM |
16442
|
Mon Nov 1 14:51:34 2021 |
Koji | Update | General | Checking the vent plan |
The vent team described a detailed vent plan (and reports where the actions have been performed)
https://wiki-40m.ligo.caltech.edu/vent/Fall2021
- [Sec.4] We should decide the final PR2 mirror through table-top measurements.
- [Sec 6] BS alignment is probably "unknown" now. So it'd be better to use the ITMY spot as the reference, then align BS for ITMX. For temporary alignment, it's OK though.
- [Sec 9-11] Right now there are no mounts to place LO3/LO4/AS2/AS3/BHDBS. But we probably want to test something before the installation of the BHD? Just place the BHDBS on an optics mount so that we get an interfered beam on ITMY?
At this point we are supposed to have all the electronics and all the CDS necessary for the new SOS control. Otherwise, they are just swinging and the alignment work will just be impossible.
- [Sec 15] The OPLEV mirrors can be freely moved as long as they do not block the main IR beams. Moving ITMXOL1 makes the reflection blocked by ITMXOL2. And moving ITMXOL2 would make the IR beams clipped. Consider replacing the mounts with fixed mounts. (The OPLEV mirrors are 1.5" in dia. Vacuum-compatible 1.5-inch mounts are not common. If a 1" Al mirror is sufficient, we can use it.)
https://wiki-40m.ligo.caltech.edu/vent/Fall2021/FinalAlignment
- The arms are the most strict alignment requirement. Everything else will follow the arm alignment. So start from the arms and propagate the alignment to Michelson / PRMI / SRMI.
- We reestablish arm alignment using the end green beams.
- Then recover IR arm alignment. Consider using ASS if possible |
16472
|
Wed Nov 17 07:32:48 2021 |
Chub | Update | General | wire clamp plate mod |
This will be difficult to modify with the magnets and dumbbells in place. Even if someone CAN clamp this piece into an endmill machine with the magnets/dumbbells in place, the vibration of the cutting operation may be enough to break them off. |
16473
|
Wed Nov 17 11:53:27 2021 |
Koji | Update | General | wire clamp plate mod |
Of course, we remove the magnet-dumbbell for machining. After that the part will be cleaned/baked again. And Yehonathan is going to glue the magnet-dumbbell again. |
16474
|
Wed Nov 17 17:37:53 2021 |
Anchal | Update | General | Placed Nodus and fb1 on UPS power |
Today I placed nodus and fb1 on UPS battery backed supply. Now power glitches should not hurt our cds system. |
16476
|
Thu Nov 18 15:16:10 2021 |
Anchal | Update | General | Moved Chiara to 1X7 above nodus powered with same UPS |
[Anchal, Paco]
We moved chiara to 1X7 above nodus and powered it with the same UPS from a battery backed port. The UPS is at 40% load capacity. The nameserver and nfs came back online automatically on boot up.
|
16479
|
Mon Nov 22 17:42:19 2021 |
Anchal | Update | General | Connected Megatron to battery backed ports of another UPS |
[Anchal, Paco]
I used the UPS that was providing battery backup for chiara earlier (an APC Back-UPS Pro 1000) to provide battery backup to Megatron. This completes UPS backup for all important computers in the lab. Note that Megatron nominally draws 36% of this UPS's capacity, but at start-up Megatron's many fans use up to 90% of the capacity. So we should not use this UPS for any other computer or equipment.
While doing so, we found that PS3 on Megatron was malfunctioning. Its green LED was not lighting up on connecting to power, so we replaced it with the PS3 of the old FB computer from the same rack. This solved the issue.
Another thing we found was that Megatron, on restart, does not get configured with the correct nameserver resolution settings and loses the ability to resolve the names chiara and fb1. This causes the nfs mounts to fail, which in turn causes the script services to fail. We traced this to Ubuntu's NetworkManager, which was not disabled and would mess up the nameserver settings that we want systemd-resolved to manage instead. We corrected the symbolic link: /etc/resolv.conf -> /run/systemd/resolve/resolv.conf. Then we stopped and disabled the NetworkManager service to keep this persistent on reboot. Following are the steps that did this:
> sudo rm /etc/resolv.conf
> sudo ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf
> sudo systemctl stop NetworkManager.service
> sudo systemctl disable NetworkManager.service
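After a reboot, the state of the symbolic link described above can be verified without touching anything. The following is a minimal sketch; the function names are hypothetical and not part of any lab script, and the comparison assumes the link was created with the absolute target path given in this entry.

```python
import os


def resolv_conf_target(path="/etc/resolv.conf"):
    """Return the raw symlink target of `path`, or None if it is not a symlink."""
    if not os.path.islink(path):
        return None
    return os.readlink(path)


def uses_systemd_resolved(path="/etc/resolv.conf",
                          expected="/run/systemd/resolve/resolv.conf"):
    """True if `path` is a symlink whose target is exactly `expected`.

    A relative link to the same file would not match; that case is
    deliberately left out of this sketch.
    """
    return resolv_conf_target(path) == expected


if __name__ == "__main__":
    print("resolv.conf ->", resolv_conf_target())
    print("managed by systemd-resolved:", uses_systemd_resolved())
```

If NetworkManager ever replaces the link with a regular file again, `resolv_conf_target()` returns None and the check fails.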
|
16485
|
Wed Nov 24 17:13:31 2021 |
Yehonathan | Metaphysics | General | Toilet tank broken |
The toilet tank in the big bathroom stopped refilling. I contacted PPService@caltech.edu and put up an "Out of Order" sign. |
16486
|
Mon Nov 29 15:24:53 2021 |
Hang | HowTo | General | Fisher matrix vs length of each FFT segment |
We have been discussing how the parameter estimation depends on the length of each FFT segment. In other words, after we have collected a series of data, would it be better to divide it into many segments so that we have many averages, or should we use long FFT segments so that we have more frequency bins?
My conclusions are that:
1). We need to make sure that the segment length is long enough with T_seg > min[ Q_i / f_i ], where f_i is the resonant frequency of the i'th resonant peak and Q_i its quality factor.
2). Once 1) is satisfied, the result depends weakly on the FFT length. There might be a weak hint preferring a longer segment length (i.e., want more freq bins than more averages) though.
=================================================================
To reach the conclusion, I performed the following numerical experiment.
I considered a simple pendulum with resonant frequency f_1 = 0.993 Hz and Q_1 = 6.23. The value of f_1 is chosen such that it is not too special to fall into a single freq bin. Additionally, I set an overall gain of k=20. I generated T_tot = 512 s of data in the time domain and then did the standard frequency domain TF estimation. I.e., I computed the CSD between excitation and response (with noise) over the PSD of the excitation. The spectra of excitation and noise in the readout channel are shown in the first plot.
In the second plot, I showed the 1-sigma errors from the Fisher matrix calculation of the three parameters in this problem, as well as the determinant of the error matrix \Sigma = inv(Fisher matrix). All quantities are plotted as functions of the duration per FFT segment T_seg. The red dotted line is [Q_1/f_1], i.e., the time required to resolve the resonant peak. As one would expect, if T_seg <~ (Q_1/f_1), we cannot resolve the dynamics of the system and therefore we get nonsense PE results. However, once T_seg > (Q_1/f_1), the PE results seem to just fluctuate (as f_1 does not fall exactly into a single bin). Maybe there is a small hint that longer T_seg is better. Potentially, this might be because we lose less information to windowing? To be investigated further...
I also showed the Fisher estimation vs. MCMC results in the last two plots. Here each dot is an MCMC posterior. The red crosses are the true values, and the purple contours are the results of the Fisher calculations (3-sigma contours). The MCMC results showed similar trends as the Fisher predictions and the results for T_seg = (32, 64, 128) s all have similar amounts of scattering << the scattering of the T_seg=8 s results. Though somehow it showed a biased result. In the third plot, I manually corrected the mean so that we could just compare the scattering. The fourth plot showed the original posterior distribution.
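The trade-off discussed in this entry can be tabulated for the numbers used in the experiment (T_tot = 512 s, f_1 = 0.993 Hz, Q_1 = 6.23). This is only a bookkeeping sketch, assuming non-overlapping segments and using the T_seg > Q_1/f_1 criterion from conclusion 1); it does not reproduce the Fisher or MCMC calculation, and the function name is hypothetical.

```python
def segment_budget(t_total, t_seg, f_res, q):
    """Return (number of averages, frequency resolution in Hz, resolves_peak).

    Assumes non-overlapping segments; `resolves_peak` applies the
    T_seg > Q / f criterion quoted in this entry.
    """
    n_avg = int(t_total // t_seg)      # how many segments fit in the record
    df = 1.0 / t_seg                   # FFT bin spacing for this segment length
    resolves_peak = t_seg > q / f_res  # long enough to resolve the resonance?
    return n_avg, df, resolves_peak


if __name__ == "__main__":
    for t_seg in (4, 8, 32, 64, 128):
        n_avg, df, ok = segment_budget(512, t_seg, 0.993, 6.23)
        print(f"T_seg={t_seg:4d} s  averages={n_avg:4d}  df={df:.4f} Hz  resolves={ok}")
```

With these values Q_1/f_1 is about 6.3 s, so T_seg = 8 s passes the criterion only marginally, while 32 s and longer pass it comfortably at the cost of fewer averages.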
|
16487
|
Tue Nov 30 11:03:44 2021 |
Yehonathan | Metaphysics | General | Toilet tank broken |
A plumber came in yesterday and fixed the issue.
Quote: |
The toilet tank in the big bathroom stopped refilling. I contacted PPService@caltech.edu and put up an "Out of Order sign".
|
|
16488
|
Tue Nov 30 17:11:06 2021 |
Paco | Update | General | Moved white rack to 1X3.5 |
[Paco, Ian, Tega]
We moved the white rack (formerly unused along the YARM) to a position between 1X3 and 1X4. For this task we temporarily removed the HEPAs near the enclosures, but have since restored them. |
16532
|
Wed Dec 22 14:57:05 2021 |
Koji | Update | General | chiara local backup |
chiara local backup of /cvs/cds has not been running since the move of chiara on Nov 19. The remote backup has not been taken since 2017.
The lack of the local backup was because of the misconfiguration of /etc/fstab.
It was fixed and now the backup disk is mounted. We'll see the backup script running tomorrow morning.
The backup disk is smaller than the main disk. So sooner or later, we will face the backup problem again.
localbackup script was crying because there was no backup disk.
backup>pwd
/opt/rtcds/caltech/c1/scripts/backup
backup>tail localbackup.log
2021-12-18 07:00:02,002 INFO Updating backup image of /cvs/cds
2021-12-18 07:00:02,002 ERROR External drive not mounted!!!
2021-12-19 07:00:01,146 INFO Updating backup image of /cvs/cds
2021-12-19 07:00:01,146 ERROR External drive not mounted!!!
2021-12-20 07:00:01,255 INFO Updating backup image of /cvs/cds
2021-12-20 07:00:01,255 ERROR External drive not mounted!!!
2021-12-21 07:00:01,361 INFO Updating backup image of /cvs/cds
2021-12-21 07:00:01,361 ERROR External drive not mounted!!!
2021-12-22 07:00:01,469 INFO Updating backup image of /cvs/cds
2021-12-22 07:00:01,470 ERROR External drive not mounted!!!
fstab had no entry for the backup disk.
backup>cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid -o value -s UUID' to print the universally unique identifier
# for a device; this may be used with UUID= as a more robust way to name
# devices that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc nodev,noexec,nosuid 0 0
# / was on /dev/sda1 during installation
UUID=972db769-4020-4b74-b943-9b868c26043a / ext4 errors=remount-ro 0 1
# swap was on /dev/sda5 during installation
UUID=a3f5d977-72d7-47c9-a059-38633d16413e none swap sw 0 0
# OLD BACKUP DISK
#UUID="90a5c98a-22fb-4685-9c17-77ed07a5e000" /media/40mBackup ext4 defaults,relatime,commit=60 0 0
# CURRENT BACKUP DISK as of 2021/09/02
#UUID="1843f813-872b-44ff-9a4e-38b77976e8dc" /media/40mBackup ext4 defaults,relatime,commit=60 0 0
#fb:/frames /frames nfs ro,bg
# CURRENT MAIN DISK as of 2021/09/02
# UUID=92dc7073-bf4d-4c58-8052-63129ff5755b /home/cds ext4 defaults,relatime,commit=60 0 0
UUID="1843f813-872b-44ff-9a4e-38b77976e8dc" /home/cds ext4 defaults,relatime,commit=60 0 0
Checked the dev name of the disks and the UUIDs
backup>sudo lsblk
[sudo] password for controls:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 465.8G 0 disk
├─sda1 8:1 0 446.9G 0 part /
├─sda2 8:2 0 1K 0 part
└─sda5 8:5 0 18.9G 0 part [SWAP]
sdb 8:16 0 5.5T 0 disk
└─sdb1 8:17 0 5.5T 0 part /home/cds
sdc 8:32 0 3.7T 0 disk
└─sdc1 8:33 0 3.7T 0 part
sr0 11:0 1 1024M 0 rom
backup> sudo blkid
/dev/sda1: UUID="972db769-4020-4b74-b943-9b868c26043a" TYPE="ext4"
/dev/sda5: UUID="a3f5d977-72d7-47c9-a059-38633d16413e" TYPE="swap"
/dev/sdb1: UUID="1843f813-872b-44ff-9a4e-38b77976e8dc" TYPE="ext4"
/dev/sdc1: UUID="92dc7073-bf4d-4c58-8052-63129ff5755b" TYPE="ext4"
Added the fstab entry for the backup disk
media>cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid -o value -s UUID' to print the universally unique identifier
# for a device; this may be used with UUID= as a more robust way to name
# devices that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc nodev,noexec,nosuid 0 0
# / was on /dev/sda1 during installation
UUID=972db769-4020-4b74-b943-9b868c26043a / ext4 errors=remount-ro 0 1
# swap was on /dev/sda5 during installation
UUID=a3f5d977-72d7-47c9-a059-38633d16413e none swap sw 0 0
# OLD BACKUP DISK
#UUID="90a5c98a-22fb-4685-9c17-77ed07a5e000" /media/40mBackup ext4 defaults,relatime,commit=60 0 0
# OLD BACKUP DISK as of 2021/09/02
#UUID="1843f813-872b-44ff-9a4e-38b77976e8dc" /media/40mBackup ext4 defaults,relatime,commit=60 0 0
# Current backup disk as of 2021/12/22
UUID="92dc7073-bf4d-4c58-8052-63129ff5755b" /media/40mBackup ext4 defaults,relatime,commit=60 0 0
#fb:/frames /frames nfs ro,bg
# CURRENT MAIN DISK as of 2021/09/02
# UUID=92dc7073-bf4d-4c58-8052-63129ff5755b /home/cds ext4 defaults,relatime,commit=60 0 0
UUID="1843f813-872b-44ff-9a4e-38b77976e8dc" /home/cds ext4 defaults,relatime,commit=60 0 0
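The missing-entry problem above is easy to catch automatically, because the backup mount point only appeared in commented-out lines. The following is a minimal sketch, assuming the whitespace-separated fstab layout shown in this entry; the function name is hypothetical and the two sample strings are trimmed copies of the listings above.

```python
def active_fstab_mounts(fstab_text):
    """Map mount point -> device spec for every uncommented fstab line."""
    mounts = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments do not mount anything
        fields = line.split()
        if len(fields) < 2:
            continue
        device, mount_point = fields[0], fields[1]
        mounts[mount_point] = device.replace('"', "")  # drop optional quotes
    return mounts


BEFORE = '''
# CURRENT BACKUP DISK as of 2021/09/02
#UUID="1843f813-872b-44ff-9a4e-38b77976e8dc" /media/40mBackup ext4 defaults,relatime,commit=60 0 0
UUID="1843f813-872b-44ff-9a4e-38b77976e8dc" /home/cds ext4 defaults,relatime,commit=60 0 0
'''
AFTER = BEFORE + (
    'UUID="92dc7073-bf4d-4c58-8052-63129ff5755b" /media/40mBackup '
    'ext4 defaults,relatime,commit=60 0 0\n'
)

if __name__ == "__main__":
    for label, text in (("before", BEFORE), ("after", AFTER)):
        print(label, active_fstab_mounts(text).get("/media/40mBackup"))
```

The returned device spec can then be compared by eye against the `blkid` output to confirm that the backup mount point refers to the smaller disk (sdc1) and not to the main one.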
|
16535
|
Thu Dec 23 16:38:21 2021 |
Koji | Update | General | Is megatron down? (Re: chiara local backup) |
The local backup seems working fine again. But I found that megatron is down and this is a real issue. This should be fixed at the earliest chance.
It seems that the local backup has been successfully taken this morning.
controls@nodus|backup> tail /opt/rtcds/caltech/c1/scripts/backup/localbackup.log
2021-12-19 07:00:01,146 INFO Updating backup image of /cvs/cds
2021-12-19 07:00:01,146 ERROR External drive not mounted!!!
2021-12-20 07:00:01,255 INFO Updating backup image of /cvs/cds
2021-12-20 07:00:01,255 ERROR External drive not mounted!!!
2021-12-21 07:00:01,361 INFO Updating backup image of /cvs/cds
2021-12-21 07:00:01,361 ERROR External drive not mounted!!!
2021-12-22 07:00:01,469 INFO Updating backup image of /cvs/cds
2021-12-22 07:00:01,470 ERROR External drive not mounted!!!
2021-12-23 07:00:01,594 INFO Updating backup image of /cvs/cds
2021-12-23 07:19:55,560 INFO Backup rsync job ran successfully, transferred 338425 files.
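Rather than reading the tail of the log by eye, the date of the last good backup can be pulled out programmatically. This is a minimal sketch that assumes the log line layout shown above (date, time with milliseconds, level, message) and that a successful run always contains the phrase "ran successfully"; both function names are hypothetical.

```python
from datetime import datetime


def parse_backup_log(lines):
    """Yield (timestamp, level, message) for each well-formed log line."""
    for line in lines:
        parts = line.strip().split(None, 3)
        if len(parts) < 4:
            continue
        date, clock, level, message = parts
        try:
            stamp = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            continue  # skip anything that does not start with a timestamp
        yield stamp, level, message


def last_success(lines):
    """Return the timestamp of the most recent successful rsync, or None."""
    latest = None
    for stamp, level, message in parse_backup_log(lines):
        if level == "INFO" and "ran successfully" in message:
            if latest is None or stamp > latest:
                latest = stamp
    return latest


SAMPLE = [
    "2021-12-22 07:00:01,469 INFO Updating backup image of /cvs/cds",
    "2021-12-22 07:00:01,470 ERROR External drive not mounted!!!",
    "2021-12-23 07:00:01,594 INFO Updating backup image of /cvs/cds",
    "2021-12-23 07:19:55,560 INFO Backup rsync job ran successfully, transferred 338425 files.",
]

if __name__ == "__main__":
    print("last successful backup:", last_success(SAMPLE))
```

A cron job could compare this timestamp with the current time and raise an alert when the gap exceeds a day or two, which would have flagged the month-long gap much earlier.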
However, I noticed that the autoburt has been stalled since Dec 6 (I used to check how the backup is up-to-date using the autoburt snapshots)
Dec>pwd
/opt/rtcds/caltech/c1/burt/autoburt/snapshots/2021/Dec
Dec>ls -l
total 24
drwxr-xr-x 26 controls controls 4096 Dec 1 23:07 1
drwxr-xr-x 26 controls controls 4096 Dec 2 23:07 2
drwxr-xr-x 26 controls controls 4096 Dec 3 23:07 3
drwxr-xr-x 26 controls controls 4096 Dec 4 23:07 4
drwxr-xr-x 26 controls controls 4096 Dec 5 23:07 5
drwxr-xr-x 19 controls controls 4096 Dec 6 16:07 6
There are a bunch of errors in the log file as follows, but maybe this is not an issue
controls@nodus|burt> pwd
/opt/rtcds/caltech/c1/burt
controls@nodus|burt> tail burtcron.log
!!! ERROR !!! Target c1supepics Snapshot file inconsistent with Request file
!!! ERROR !!! Target c1tstepics Snapshot file inconsistent with Request file
!!! ERROR !!! Target c1x10epics Snapshot file inconsistent with Request file
!!! ERROR !!! Target c1aux Snapshot file inconsistent with Request file
!!! ERROR !!! Target c1dcuepics Snapshot file inconsistent with Request file
!!! ERROR !!! Target c1iscaux Snapshot file inconsistent with Request file
!!! ERROR !!! Target c1iscepics Snapshot file inconsistent with Request file
!!! ERROR !!! Target c1losepics Snapshot file inconsistent with Request file
!!! ERROR !!! Target c1psl Snapshot file inconsistent with Request file
!!! ERROR !!! Target c1susaux Snapshot file inconsistent with Request file
The real issue seems to be that megatron is down. It has a lot of housekeeping jobs on cron including the N2 pressure alert.
https://wiki-40m.ligo.caltech.edu/Computers_and_Scripts/CRON
This needs to be fixed at the earliest chance. |
16536
|
Fri Dec 24 16:49:41 2021 |
Koji | Update | General | Is megatron down? (Re: chiara local backup) |
It turned out that the UPS installed on Nov 22 failed (cf https://nodus.ligo.caltech.edu:8081/40m/16479 ). In fact, it was alive for just 2 weeks!
The APC UPS unit indicated F06. According to the manual (https://www.apc.com/shop/us/en/products/APC-Power-Saving-Back-UPS-Pro-1000VA/P-BR1000G), F06 means "Relay Welding" and cannot be fixed by a user. Resetting the UPS cleared the error, but since I didn't want to have the same issue while no one is in the lab, I moved the megatron power source from the UPS to the power strip on 1Y7. So, megatron is currently vulnerable to a power glitch.
After the power cords were restored, megatron eventually recovered ssh terminals. I manually ran autoburt.cron at 16:50 so that the latest snapshot is taken. |
16646
|
Fri Feb 4 10:04:47 2022 |
Chub | Update | General | dish soap and clean scrub sponges! |
Bought dish soap and scrub sponges today and placed them under the sink with the other dish supplies. |
16647
|
Fri Feb 4 10:21:39 2022 |
Anchal | Summary | General | Complete lab shutdown |
Please edit this same entry throughout the day for the shutdown elogging.
I took a screenshot of C0VAC_MONITOR.adl to ensure that all pneumatic valves are in closed positions:

The status message says "All pnematic valves closed" and the latest error message is about "V7 closed, N2 < 6.50e+01".
I found out that there was no autoburt happening for c1vac channels. I created an autoBurt.req file for the vac.db file and saved one snapshot. I also added the path of this file in autoburt/.requestfilelist . Let's see if autoburting starts by that for this file as well.
With this, I think we can safely shut down the acromag chassis. Hopefully, the relays are configured such that the valves are nominally closed in the absence of a control signal. After the chassis is shut down, we can shut down C1VAC by:
sudo shutdown
[Chub, Jordan]
At the 1x8 rack, the following were switched off on their respective front panels:
PTP2 & PTP3 Controller
MKS Gauge controller
PRP Gauge Controller
G2P316a & b Controllers
Sorenson
Serial Device Server
Both UPS's
Powered off from back of unit:
TP1 Controller
Acromag chassis
TP2 and 3 controllers were unplugged from respective power strips (labeled C2 and C3)
C1vac and the laptop at the workstation were shut down
Manual Gate valve was closed |
16648
|
Mon Feb 7 09:00:26 2022 |
Paco | Update | General | Scheduled power outage recovery |
[Paco]
Started recovering from scheduled (Feb 05) power outage. Basically, time-reversing through this list.
== Office area ==
- Power martian network switches, WiFi routers on the north-rack.
- Power windows (CAD) machine on.
== Main network stations ==
- Power on nodus, try ping (fail).
- Power on network switches, try ping (success), try ssh controls@nodus.ligo.caltech.edu (success).
- Power on chiara to serve names for other stations, try ssh chiara (success).
- Power on fb1, try ping (success), try ssh fb1 (success).
- Power on paola (xend laptop), viviana (yend laptop), optimus, megatron.
== Control workstations ==
- Power on zita (success)
- Power on giada (success), run system upgrade.
- Power on donatella (success)
- Power on allegra (fail) **
- Power on pianosa (success)
- Power on rossa (success)
- From nodus, started elog (success).
== PSL + Vertex instruments ==
- Turn on newport PD power supplies on PSL table.
- Turn on TC200 temp controller (setpoint --> 36.9 C)
- Turn on two oscilloscopes in PSL table.
- Turn on PSL (current setpoint --> 2.1 A, other settings seem nominal)
- Turn on Thorlabs HV pzt supply.
- Turn on ITMX OpLev / laser instrument AC strip.
== YEND and XEND instruments ==
- Turn on XEND AUX pump (current setpoint --> 1.984 A)
- Turn on XEND AUX SHG oven (setpoint --> 37.1 C) (see green beam)
- Turn on XEND AUX shutter controller.
- Turn on DCPD supply and OpLev supply AC strip.
- Turn on YEND AUX pump (fail) *
- With the controller on STDBY, I tried setting up the current but got HD FAULT (or according to the manual this is what the head reports when the diode temperature is too high...)
- Upon power cycling the controller, even the controller display stopped working... YAUX controller + head died? maybe just the diode? maybe just the controller?
- I borrowed a spare LW125 controller from the PSL table (Yehonathan pointed me to it) and swapped it in.
- Got YEND AUX to lase with this controller, so the old controller is busted but at least the laser head is fine.
- Even saw SHG light. We switched the laser head off to "STDBY" (so it remains warm) and took the faulty controller out of there.
- Turn on YEND AUX SHG oven (setpoint --> 35.7 C)
- Turn on YEND AUX shutter controller.
== YARM Electronic racks ==
== XARM Electronic racks ==
* Top priority, this needs to be fixed.
** Non-priority, but to be debugged |
16649
|
Mon Feb 7 15:32:48 2022 |
Yehonathan | Update | General | Y End laser controller |
I went to the Y end. The AUX laser was on Standby. I pushed the Standby button. The laser turned on and there was some green light. However, the controller displayed the message "CABLE?" which according to the manual means that the laser head is powered but there is no control over the laser (e.g. the control cable is disconnected). I turned off the controller and disconnected both the power and control cables. I put them back and turned the controller back on.
I pushed the Standby button, the laser turned on and this time the controller displayed the laserhead's state. I was able to change the current/temperature. The problem seems to be resolved. |
16651
|
Mon Feb 7 16:53:02 2022 |
Koji | Update | General | Scheduled power outage recovery |
I went to the X end and found it was warm. Turned out that not all the A/Cs were on. They were turned on now. |
16652
|
Wed Feb 9 11:56:24 2022 |
Anchal | Update | General | Bringing back CDS |
[Anchal, Paco]
Bringing back CDS took a lot of work yesterday. I'm gonna try to summarize the main points here.
mx_start_stop
For some reason, fb1 was not able to mount mx devices automatically on system boot. This was an issue I earlier faced in fb1(clone) too. The fix to this problem is to run the script:
controls@fb1:/opt/mx/sbin/mx_start_stop start
To make this persistent, I've configured a daemon (/etc/systemd/system/mx_start_stop.service) in fb1 to run once on system boot and mount the mx devices as mentioned above. We did not see this issue on later reboots yesterday.
gpstime
Next was the issue of gpstime module out of date on fb1. This issue is also known in the past and requires us to do the following:
controls@fb1:~ 0$ sudo modprobe -r gpstime
controls@fb1:~ 1$ sudo modprobe gpstime
Again, to make this persistent, I've configured a daemon (/etc/systemd/system/re-add-gpstime.service) in fb1 to run the above commands once on system boot. This corrected gpstime automatically and we did not face these problems again.
time synchronization
Later we found that ntp time synchronization between fb1 and the FE computers was not working, and the main reason was that fb1 was unable to access the internet. As a rule of thumb, it is always a good idea to try pinging www.google.com on fb1 to ensure that it is connected to the internet. The issue had to do with fb1 not being able to find any nameserver. We fixed this issue by restarting the bind9 service on chiara a couple of times. We're not really sure why it wasn't working.
~>sudo service bind9 stop
~>sudo service bind9 start
~>sudo service bind9 status
* bind9 is running
After the above, we saw that fb1 ntp server is working fine. You see following output on fb1 when that is the case:
controls@fb1:~ 0$ ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
-table-moral.bnr 110.142.180.39 2 u 399 512 377 195.034 -14.618 0.122
*server1.quickdr .GPS. 1 u 67 64 377 130.483 -1.621 1.077
+ntp2.tecnico.ul 56.99.239.27 2 u 473 512 377 184.648 -0.775 2.231
+schattenbahnhof 129.69.1.153 2 u 365 512 377 144.848 3.841 1.092
192.168.123.255 .BCST. 16 u - 64 0 0.000 0.000 0.000
On the FE models, timedatectl should show that the NTP synchronized field is yes. That wasn't happening even after we restarted the systemd-timesyncd service. After this, I just tried restarting all FE computers and it started working.
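Both checks described in this section (a selected peer in `ntpq -p` on fb1 and a synchronized clock in `timedatectl` on the FEs) can be scripted by parsing the command output. The following is a minimal sketch with hypothetical function names; it assumes the captured text is passed in as a string, so how the commands are run on each host is left open.

```python
def selected_ntp_peer(ntpq_output):
    """Return the remote name of the peer marked '*' by `ntpq -p`, or None.

    The '*' tally code marks the peer currently selected for synchronization.
    """
    for line in ntpq_output.splitlines():
        if line.startswith("*"):
            fields = line[1:].split()
            if fields:
                return fields[0]
    return None


def clock_synchronized(timedatectl_output):
    """True if timedatectl reports the clock as synchronized.

    Older systemd prints 'NTP synchronized: yes'; newer versions print
    'System clock synchronized: yes'. Both spellings are accepted.
    """
    for line in timedatectl_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() in ("NTP synchronized", "System clock synchronized"):
            return value.strip() == "yes"
    return False


NTPQ_SAMPLE = """\
 remote refid st t when poll reach delay offset jitter
==============================================================================
-table-moral.bnr 110.142.180.39 2 u 399 512 377 195.034 -14.618 0.122
*server1.quickdr .GPS. 1 u 67 64 377 130.483 -1.621 1.077
+ntp2.tecnico.ul 56.99.239.27 2 u 473 512 377 184.648 -0.775 2.231
"""

if __name__ == "__main__":
    print("selected peer:", selected_ntp_peer(NTPQ_SAMPLE))
```

If `selected_ntp_peer` returns None on fb1, the nameserver and internet checks described above are the first things to revisit.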
CDS
We had removed all db9 enabling plugs on the new SOSs beforehand to keep coils off just in case CDS does not come back online properly.
Everything in CDS loaded properly except the c1oaf model, which kept showing 0x2bad status. This meant that some IPC flags are red on c1sus, c1mcs and c1lsc as well. But everything else is green. See attachment 1. I then burtrestored everything in the /opt/rtcds/caltech/c1/burt/autoburt/snapshots/2022/Feb/4/12:19 directory. This includes the snapshot of c1vac as well that I added on autoburt that day. All burt restore statuses were green OK. I think we are in good state now to start watchdogs on the new SOSs and put back the db9 enabling plugs.
Future work:
When somebody gets time, we should make custom service files in fb1:/etc/systemd/system/ symbolic links to a repo directory and version control these important services. We should also make sure that their dependencies and startup order are correctly configured. I might have done a half-assed job there since I recently learned how to make unit files. We should do the same on nodus and chiara too. Our hope is that on one glorious day, the lab can be restarted without spending more than 20 min on booting up the computers and network.
|
16653
|
Wed Feb 9 13:55:05 2022 |
Koji | Update | General | Bringing back CDS |
Great recovery work and cleaning of the rebooting process.
I'm just curious: Did you observe that the c1sus2 cards have different numbering order than the previous along with the power outage/cycling? |
16655
|
Wed Feb 9 16:43:35 2022 |
Paco | Update | General | Scheduled power outage recovery - Locking mode cleaner(s) |
[Paco, Anchal]
- We went in and measured the power after the power splitting HWP at the PSL table. Almost right before the PSL shutter (which was closed), when the PMC was locked we saw ~ 598 mW (!!)
- Checking back on ESP300, it seems the channel was not enabled even though the right angle was punched in, so it got enabled.
- The power adjustment MEDM screen is not really working...
- Going back to the controller, press HOME on the Axis 1 (our HWP) and see it go to zero...
- Now the power measured is ~ 78 mW.
- Not sure why the MEDM screen didn't really work (this needs to be fixed later)
We proceeded to align the MC optics because all offsets in the MC_ALIGN screen were zeroed. After opening the PSL shutter, we used values from last year as a reference and tried to steadily recover the alignment. The IMC lock remains at large. |
16657
|
Thu Feb 10 15:41:00 2022 |
Anchal | Update | General | Scheduled power outage recovery - Locking mode cleaner(s) |
I found out that the ESP300 service needs to be run in root mode for it to be able to connect to the USB port of HWP motor controller. While doing this change, I noticed that the channels hosted by c1psl might have a duplication conflict with some other channel hosting computer, because a lot of them show the Warning: "Identical process variable names on multiple servers" which is not good. Someone should look into this conflict.
I added instructions on the power control MEDM screen as it was very non-trivial to use. I have set the power such that the C1:IOO-MC_RFPD_DCMON is 5.6 and this happened at C1:IOO-HWP_POS_SET 2.29. |
16658
|
Thu Feb 10 17:57:48 2022 |
Anchal | Update | General | Scheduled power outage recovery - Locking mode cleaner(s) |
Something is wrong with the Video MUX. The system did not turn back on with full functionality. Even though we see the screens as they were before the power shutdown, we have lost control on switching any of the videos. I went to check the wiki page about Video MUX which told me we should be able to see the configuration screen on this link, but the page wasn't opening. I went and removed the power cable and put it back in. That brought back the configuration page. Still, I could not change any of the video feeds; however, this time I could see the EPICS channel values (like C1:VID-QUAD1_4) change. I tried to go to the configuration page and change the matrix values from the control tab there. I found out that the matrix was mislabeled and while making the changes, I started seeing a blue screen on QUAD1_3 (where MC2T was set before). I set the QUAD1_3 (output 23) to MC2T (input 16), but no change. The EPICS values are also set properly, so I don't understand the reason behind the blue screen. The same happened when I tried to use:
~>/opt/rtcds/caltech/c1/scripts/general/videoscripts videoswitch3 QUAD1_3 MC2T
Weirdly, this caused the QUAD1_4 screen to go blue. Running following had no effect:
~>/opt/rtcds/caltech/c1/scripts/general/videoscripts videoswitch3 QUAD1_4 MCR
So, I'm not sure what to do. This really needs to be fixed! I wanted to see the MC2F camera so that I can align IMC; that was the whole reason for this rabbit hole. Help needed. |
16659
|
Thu Feb 10 19:03:23 2022 |
Koji | Update | General | Scheduled power outage recovery - Locking mode cleaner(s) |
I came back to the 40m and started the investigation.
If I ping 192.168.113.92, it responds. But telnet (port 23) was rejected. I somehow tried ssh and it responds! I even could login to the host using usual password. Here is the prompt.
controls@nodus|~> ssh 192.168.113.92
controls@192.168.113.92's password:
...
controls@c1sus2:~ 0$
Oh no...
Looks like c1sus2 and the videomux have an IP address conflict.
Here are the useful ELOG links:
https://nodus.ligo.caltech.edu:8081/40m/4498
https://nodus.ligo.caltech.edu:8081/40m/4529 |
16660
|
Thu Feb 10 19:46:37 2022 |
Koji | Update | General | Scheduled power outage recovery - Locking mode cleaner(s) |
== Assign new IP address to c1sus2 ==
cf: [40m ELOG 16398] [40m ELOG 16396]
- Shutdown c1sus2 (Oh, no. This killed c1lsc/c1sus/c1ioo... This should be taken care of later)
- Confirmed 192.168.113.87 is not alive
- Go to chiara
- Modify /diskless/root/etc/hosts
192.168.113.87 c1sus2 c1sus2.martian
- Modify /etc/dhcp/dhcpd.conf
host c1sus2 {
hardware ethernet 00:25:90:06:69:C2;
fixed-address 192.168.113.87;
}
- Modify /var/lib/bind/martian.hosts
c1sus2 A 192.168.113.87
videomux A 192.168.113.92
- Modify /var/lib/bind/martian.hosts/rev.113.168.192.in-addr.arpa
87 PTR c1sus2.martian
92 PTR videomux.martian
- Reload/restart bind9 / dhcpd. Run the following command
sudo service bind9 reload
sudo service isc-dhcp-server restart
- Restart c1sus2 and confirm if the IP address was actually changed
controls@c1sus2:~ 0$ /sbin/ifconfig
eth0 Link encap:Ethernet HWaddr 00:25:90:06:69:c2
inet addr:192.168.113.87 Bcast:192.168.113.255 Mask:255.255.255.0
...
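A conflict like the one between c1sus2 and videomux can be caught before it causes trouble by scanning the forward zone for addresses claimed by more than one name. This is a minimal sketch, assuming A records in the "name A address" layout shown above; the function name is hypothetical, and the BEFORE sample is an illustrative reconstruction of the conflicting state, not a copy of the actual file.

```python
from collections import defaultdict


def find_ip_conflicts(zone_text):
    """Return {ip: [names]} for every address claimed by more than one A record."""
    names_by_ip = defaultdict(list)
    for line in zone_text.splitlines():
        fields = line.split()
        # Accept both "name A ip" and "name IN A ip" style records.
        if len(fields) >= 3 and fields[-2] == "A":
            names_by_ip[fields[-1]].append(fields[0])
    return {ip: names for ip, names in names_by_ip.items() if len(names) > 1}


# Illustrative reconstruction of the conflict found in this thread.
BEFORE = """\
c1sus2 A 192.168.113.92
videomux A 192.168.113.92
"""
# Records as modified in this entry.
AFTER = """\
c1sus2 A 192.168.113.87
videomux A 192.168.113.92
"""

if __name__ == "__main__":
    print("before:", find_ip_conflicts(BEFORE))
    print("after :", find_ip_conflicts(AFTER))
```

Running the same scan over the full martian.hosts file after any manual edit would flag duplicated addresses immediately.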
== Restart c1lsc / c1sus /c1ioo ==
- Reboot c1lsc/c1sus/c1ioo
- Go to scripts/cds
- Run startC1LSC.sh and follow the instruction
|