Message ID: 14380     Entry time: Thu Jan 3 15:08:37 2019     In reply to: 14379     Reply to this: 14386
Author: gautam 
Type: Omnistructure 
Category: VAC 
Subject: Vac status unknown 

Larry W came by the 40m, and reported that there was a campus-wide power glitch (he was here to check if our networking infrastructure was affected). I thought I'd check the status of the vacuum.

  • Attachment #1 is a screenshot of the Vac overview MEDM screen. Clearly something has gone wrong with the modbus process(es). Only the PTP2 and PTP3 gauges seem to be communicative.
  • Attachment #2 shows the minute trend of the pressure gauges over a 12-day period. There appears to be a problem with the frame builder clock; perhaps this issue has resurfaced? Checking the system time on FB did not suggest anything was wrong, and I confirmed with dataviewer that the trends do not exist. However, the status of the individual daqd processes showed dates that were off by 1 year, so I restarted all of them, and the time now seems correct. How can we fix this problem more permanently? Also, the P1b readout looks suspicious: why are there periods where we seem to be reading values finer than the LSB of the device?
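
A one-year slip like the one seen in the daqd processes could in principle be caught automatically by comparing a reported GPS time against a trusted UTC clock. The following is a minimal sketch under stated assumptions, not a description of how daqd or the frame builder actually handle time: the function names are hypothetical, the GPS-UTC offset of 18 s is the value that has applied since 2017, and how the GPS time would be read out of a daqd process is not shown.

```python
from datetime import datetime, timedelta, timezone

# GPS time counts seconds from 1980-01-06 00:00:00 UTC and ignores leap seconds.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
# Assumption: GPS is ahead of UTC by 18 s (true since 2017-01-01).
GPS_UTC_LEAP_SECONDS = 18


def gps_to_utc(gps_seconds, leap_seconds=GPS_UTC_LEAP_SECONDS):
    """Convert a GPS timestamp in seconds to a timezone-aware UTC datetime."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - leap_seconds)


def clock_offset_days(gps_seconds, reference_utc):
    """Signed offset in days between the GPS-derived time and a reference clock."""
    delta = gps_to_utc(gps_seconds) - reference_utc
    return delta.total_seconds() / 86400.0


def looks_like_year_slip(gps_seconds, reference_utc, tolerance_days=2.0):
    """Return True if the offset is within tolerance of 365 or 366 days."""
    offset = abs(clock_offset_days(gps_seconds, reference_utc))
    return any(abs(offset - year) <= tolerance_days for year in (365.0, 366.0))
```

Calling `looks_like_year_slip(reported_gps, datetime.now(timezone.utc))` periodically, with `reported_gps` taken from whatever time the process reports, would flag this failure mode without anyone having to notice missing trends first.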

I decided to check the status of the modbus systemd service on c1vac:

controls@c1vac:~$ sudo systemctl status modbusIOC.service
● modbusIOC.service - ModbusIOC Service via procServ
   Loaded: loaded (/etc/systemd/system/modbusIOC.service; enabled)
   Active: active (running) since Thu 2019-01-03 14:53:49 PST; 11min ago
 Main PID: 16533 (procServ)
   CGroup: /system.slice/modbusIOC.service
           ├─16533 /usr/bin/procServ -f -L /opt/target/modbusIOC.log -p /run/...
           ├─16534 /opt/epics/modules/modbus/bin/linux-x86_64/modbusApp /opt/...
           └─16582 caRepeater

Jan 03 14:53:49 c1vac systemd[1]: Started ModbusIOC Service via procServ.

Warning: Unit file changed on disk, 'systemctl daemon-reload' recommended.
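
For repeated checks, the relevant fields of this output could be pulled out programmatically. The sketch below is illustrative only: `parse_systemctl_status` is a hypothetical helper, and it assumes the `Active:` line keeps the layout shown above, which may differ between systemd versions.

```python
import re

# Matches e.g. "Active: active (running) since Thu 2019-01-03 14:53:49 PST; 11min ago"
ACTIVE_RE = re.compile(
    r"^\s*Active:\s+(\S+)\s+\(([^)]+)\)(?:\s+since\s+(.+?);)?",
    re.MULTILINE,
)


def parse_systemctl_status(text):
    """Extract the unit state from the text printed by `systemctl status`.

    Returns None if no "Active:" line is found.
    """
    match = ACTIVE_RE.search(text)
    if match is None:
        return None
    state, sub_state, since = match.groups()
    return {
        "state": state,          # e.g. "active", "failed", "inactive"
        "sub_state": sub_state,  # e.g. "running", "dead"
        "since": since,          # timestamp string, or None if absent
        # systemd prints this hint when the unit file changed on disk
        "needs_daemon_reload": "daemon-reload" in text,
    }
```

The text can be captured with `subprocess.run(["systemctl", "status", unit], stdout=subprocess.PIPE, universal_newlines=True).stdout`. Note that `systemctl status` exits non-zero for units that are not running, so the return code should not be treated as an error by itself.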

So something did happen today that required a restart of the modbus processes, but clearly not everything has come back up gracefully. A few lines of dmesg (there are many more segfaults):

[1706033.718061] python[23971]: segfault at 8 ip 000000000049b37d sp 00007fbae2b5fa10 error 4 in python2.7[400000+31d000]
[1706252.225984] python[24183]: segfault at 8 ip 000000000049b37d sp 00007fd3fa365a10 error 4 in python2.7[400000+31d000]
[1720961.451787] systemd-udevd[4076]: starting version 215
[1782064.269844] audit: type=1702 audit(1546540443.159:38): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1782064.269866] audit: type=1302 audit(1546540443.159:39): item=0 name="/cvs/cds/caltech/target/c1vac/.git/objects/85/tmp_obj_uAXhPg" inode=173019272 dev=00:21 mode=0100444 ouid=1001 ogid=1001 rdev=00:00 nametype=NORMAL
[1782064.365240] audit: type=1702 audit(1546540443.255:40): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1782064.365271] audit: type=1302 audit(1546540443.255:41): item=0 name="/cvs/cds/caltech/target/c1vac/.git/objects/58/tmp_obj_KekHsn" inode=173019274 dev=00:21 mode=0100444 ouid=1001 ogid=1001 rdev=00:00 nametype=NORMAL
[1782064.460620] audit: type=1702 audit(1546540443.347:42): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1782064.460652] audit: type=1302 audit(1546540443.347:43): item=0 name="/cvs/cds/caltech/target/c1vac/.git/objects/cb/tmp_obj_q62Pdr" inode=173019276 dev=00:21 mode=0100444 ouid=1001 ogid=1001 rdev=00:00 nametype=NORMAL
[1782064.545449] audit: type=1702 audit(1546540443.435:44): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1782064.545480] audit: type=1302 audit(1546540443.435:45): item=0 name="/cvs/cds/caltech/target/c1vac/.git/objects/e3/tmp_obj_gPI4qy" inode=173019277 dev=00:21 mode=0100444 ouid=1001 ogid=1001 rdev=00:00 nametype=NORMAL
[1782064.640756] audit: type=1702 audit(1546540443.527:46): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1783440.878997] systemd[1]: Unit serial_TP3.service entered failed state.
[1784682.147280] systemd[1]: Unit serial_TP2.service entered failed state.
[1786407.752386] systemd[1]: Unit serial_MKS937b.service entered failed state.
[1792371.508317] systemd[1]: serial_GP316a.service failed to run 'start' task: No such file or directory
[1795550.281623] systemd[1]: Unit serial_GP316b.service entered failed state.
[1796216.213269] systemd[1]: Unit serial_TP3.service entered failed state.
[1796518.976841] systemd[1]: Unit serial_GP307.service entered failed state.
[1796670.328649] systemd[1]: serial_Hornet.service failed to run 'start' task: No such file or directory
[1797723.446084] systemd[1]: Unit serial_MKS937b.service entered failed state.
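
Since there are many more lines like these, a short script could tally which units failed and which processes segfaulted. This is a rough sketch, assuming the kernel log keeps the exact message wording shown above; `summarize_dmesg` and the regular expressions are my own illustration, not part of the c1vac tooling.

```python
import re
from collections import Counter

# Patterns follow the message wording seen in the excerpt above.
FAILED_UNIT_RE = re.compile(r"systemd\[1\]: Unit (\S+) entered failed state\.")
START_ERROR_RE = re.compile(r"systemd\[1\]: (\S+) failed to run 'start' task: (.+)$")
SEGFAULT_RE = re.compile(r"\] (\S+?)\[\d+\]: segfault at ")


def summarize_dmesg(lines):
    """Count failed systemd units, start-task errors, and segfaults by name."""
    failed_units = Counter()
    start_errors = {}
    segfaults = Counter()
    for raw in lines:
        line = raw.rstrip()
        match = FAILED_UNIT_RE.search(line)
        if match:
            failed_units[match.group(1)] += 1
            continue
        match = START_ERROR_RE.search(line)
        if match:
            # Keep the most recent reason reported for each unit.
            start_errors[match.group(1)] = match.group(2)
            continue
        match = SEGFAULT_RE.search(line)
        if match:
            segfaults[match.group(1)] += 1
    return {
        "failed_units": failed_units,
        "start_errors": start_errors,
        "segfaults": segfaults,
    }
```

Feeding it the saved output of `dmesg`, one line at a time, gives a per-unit count that makes it easier to see which serial services fail repeatedly and which never start at all.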

 

I don't know enough about the new system so I'm leaving this for Jon to debug. Attachment #3 shows that the analog readout of the P1 pressure gauge suggests that the IFO is still under vacuum, so no random valve openings were effected (as expected, since we valved off the N2 line for this very purpose).

Attachment 1: Screenshot_from_2019-01-03_15-19-51.png (26 kB)
Attachment 2: Screenshot_from_2019-01-03_15-14-14.png (83 kB)
Attachment 3: 997B13A9-CAAF-409C-A6C2-00414D30A141.jpeg (989 kB)