Larry W came by the 40m, and reported that there was a campus-wide power glitch (he was here to check if our networking infrastructure was affected). I thought I'd check the status of the vacuum.
- Attachment #1 is a screenshot of the Vac overview MEDM screen. Clearly something has gone wrong with the modbus process(es). Only the PTP2 and PTP3 gauges seem to be communicative.
- Attachment #2 shows the minute trend of the pressure gauges for a 12 day period - it looks like there is some issue with the frame builder clock, perhaps this issue resurfaced? But checking the system time on FB doesn't suggest anything is wrong.. I double checked with dataviewer as well that the trends don't exist... But checking the status of the individual daqd processes indeed showed that the dates were off by 1 year, so I just restarted all of them and now the time seems correct. How can we fix this problem more permanently? Also, the P1b readout looks suspicious - why are there periods where it seems like we are reading values better than the LSB of the device?
I decided to check the systemctl process status on c1vac:
controls@c1vac:~$ sudo systemctl status modbusIOC.service
● modbusIOC.service - ModbusIOC Service via procServ
Loaded: loaded (/etc/systemd/system/modbusIOC.service; enabled)
Active: active (running) since Thu 2019-01-03 14:53:49 PST; 11min ago
Main PID: 16533 (procServ)
CGroup: /system.slice/modbusIOC.service
├─16533 /usr/bin/procServ -f -L /opt/target/modbusIOC.log -p /run/...
├─16534 /opt/epics/modules/modbus/bin/linux-x86_64/modbusApp /opt/...
└─16582 caRepeater
Jan 03 14:53:49 c1vac systemd[1]: Started ModbusIOC Service via procServ.
Warning: Unit file changed on disk, 'systemctl daemon-reload' recommended.
So something did happen today that required restart of the modbus processes. But clearly not everything has come back up gracefully. A few lines of dmesg (there are many more segfaults):
[1706033.718061] python[23971]: segfault at 8 ip 000000000049b37d sp 00007fbae2b5fa10 error 4 in python2.7[400000+31d000]
[1706252.225984] python[24183]: segfault at 8 ip 000000000049b37d sp 00007fd3fa365a10 error 4 in python2.7[400000+31d000]
[1720961.451787] systemd-udevd[4076]: starting version 215
[1782064.269844] audit: type=1702 audit(1546540443.159:38): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1782064.269866] audit: type=1302 audit(1546540443.159:39): item=0 name="/cvs/cds/caltech/target/c1vac/.git/objects/85/tmp_obj_uAXhPg" inode=173019272 dev=00:21 mode=0100444 ouid=1001 ogid=1001 rdev=00:00 nametype=NORMAL
[1782064.365240] audit: type=1702 audit(1546540443.255:40): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1782064.365271] audit: type=1302 audit(1546540443.255:41): item=0 name="/cvs/cds/caltech/target/c1vac/.git/objects/58/tmp_obj_KekHsn" inode=173019274 dev=00:21 mode=0100444 ouid=1001 ogid=1001 rdev=00:00 nametype=NORMAL
[1782064.460620] audit: type=1702 audit(1546540443.347:42): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1782064.460652] audit: type=1302 audit(1546540443.347:43): item=0 name="/cvs/cds/caltech/target/c1vac/.git/objects/cb/tmp_obj_q62Pdr" inode=173019276 dev=00:21 mode=0100444 ouid=1001 ogid=1001 rdev=00:00 nametype=NORMAL
[1782064.545449] audit: type=1702 audit(1546540443.435:44): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1782064.545480] audit: type=1302 audit(1546540443.435:45): item=0 name="/cvs/cds/caltech/target/c1vac/.git/objects/e3/tmp_obj_gPI4qy" inode=173019277 dev=00:21 mode=0100444 ouid=1001 ogid=1001 rdev=00:00 nametype=NORMAL
[1782064.640756] audit: type=1702 audit(1546540443.527:46): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1783440.878997] systemd[1]: Unit serial_TP3.service entered failed state.
[1784682.147280] systemd[1]: Unit serial_TP2.service entered failed state.
[1786407.752386] systemd[1]: Unit serial_MKS937b.service entered failed state.
[1792371.508317] systemd[1]: serial_GP316a.service failed to run 'start' task: No such file or directory
[1795550.281623] systemd[1]: Unit serial_GP316b.service entered failed state.
[1796216.213269] systemd[1]: Unit serial_TP3.service entered failed state.
[1796518.976841] systemd[1]: Unit serial_GP307.service entered failed state.
[1796670.328649] systemd[1]: serial_Hornet.service failed to run 'start' task: No such file or directory
[1797723.446084] systemd[1]: Unit serial_MKS937b.service entered failed state.
I don't know enough about the new system so I'm leaving this for Jon to debug. Attachment #3 shows that the analog readout of the P1 pressure gauge suggests that the IFO is still under vacuum, so no random valve openings were effected (as expected, since we valved off the N2 line for this very purpose). |