Larry W came by the 40m, and reported that there was a campus-wide power glitch (he was here to check if our networking infrastructure was affected). I thought I'd check the status of the vacuum.
I decided to check the systemctl process status on c1vac:
controls@c1vac:~$ sudo systemctl status modbusIOC.service
● modbusIOC.service - ModbusIOC Service via procServ
Loaded: loaded (/etc/systemd/system/modbusIOC.service; enabled)
Active: active (running) since Thu 2019-01-03 14:53:49 PST; 11min ago
Main PID: 16533 (procServ)
├─16533 /usr/bin/procServ -f -L /opt/target/modbusIOC.log -p /run/...
├─16534 /opt/epics/modules/modbus/bin/linux-x86_64/modbusApp /opt/...
Jan 03 14:53:49 c1vac systemd: Started ModbusIOC Service via procServ.
Warning: Unit file changed on disk, 'systemctl daemon-reload' recommended.
So something did happen today that required a restart of the modbus processes. But clearly not everything has come back up gracefully. A few lines from dmesg (there are many more segfaults):
[1706033.718061] python: segfault at 8 ip 000000000049b37d sp 00007fbae2b5fa10 error 4 in python2.7[400000+31d000]
[1706252.225984] python: segfault at 8 ip 000000000049b37d sp 00007fd3fa365a10 error 4 in python2.7[400000+31d000]
[1720961.451787] systemd-udevd: starting version 215
[1782064.269844] audit: type=1702 audit(1546540443.159:38): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1782064.269866] audit: type=1302 audit(1546540443.159:39): item=0 name="/cvs/cds/caltech/target/c1vac/.git/objects/85/tmp_obj_uAXhPg" inode=173019272 dev=00:21 mode=0100444 ouid=1001 ogid=1001 rdev=00:00 nametype=NORMAL
[1782064.365240] audit: type=1702 audit(1546540443.255:40): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1782064.365271] audit: type=1302 audit(1546540443.255:41): item=0 name="/cvs/cds/caltech/target/c1vac/.git/objects/58/tmp_obj_KekHsn" inode=173019274 dev=00:21 mode=0100444 ouid=1001 ogid=1001 rdev=00:00 nametype=NORMAL
[1782064.460620] audit: type=1702 audit(1546540443.347:42): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1782064.460652] audit: type=1302 audit(1546540443.347:43): item=0 name="/cvs/cds/caltech/target/c1vac/.git/objects/cb/tmp_obj_q62Pdr" inode=173019276 dev=00:21 mode=0100444 ouid=1001 ogid=1001 rdev=00:00 nametype=NORMAL
[1782064.545449] audit: type=1702 audit(1546540443.435:44): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1782064.545480] audit: type=1302 audit(1546540443.435:45): item=0 name="/cvs/cds/caltech/target/c1vac/.git/objects/e3/tmp_obj_gPI4qy" inode=173019277 dev=00:21 mode=0100444 ouid=1001 ogid=1001 rdev=00:00 nametype=NORMAL
[1782064.640756] audit: type=1702 audit(1546540443.527:46): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1783440.878997] systemd: Unit serial_TP3.service entered failed state.
[1784682.147280] systemd: Unit serial_TP2.service entered failed state.
[1786407.752386] systemd: Unit serial_MKS937b.service entered failed state.
[1792371.508317] systemd: serial_GP316a.service failed to run 'start' task: No such file or directory
[1795550.281623] systemd: Unit serial_GP316b.service entered failed state.
[1796216.213269] systemd: Unit serial_TP3.service entered failed state.
[1796518.976841] systemd: Unit serial_GP307.service entered failed state.
[1796670.328649] systemd: serial_Hornet.service failed to run 'start' task: No such file or directory
[1797723.446084] systemd: Unit serial_MKS937b.service entered failed state.
I don't know enough about the new system, so I'm leaving this for Jon to debug. Attachment #3 shows the analog readout of the P1 pressure gauge, which suggests that the IFO is still under vacuum, so no valves were opened unexpectedly (as expected, since we valved off the N2 line for this very purpose).
Yes, for TP2 and TP3. They both have a small vent valve that opens automatically on shutdown.
Independent question: Are all the turbo forelines vented automatically? We manually did it for the main roughing line.
Looks like I didn't restart all the daqd processes last night, so the data was not in fact being recorded to frames. I just restarted everything, and it looks like the data for the last ~3 minutes is being recorded. Is it reasonable that the TP1 current channel is reporting 0.75 A of current draw now, when the pump is off? Also, the temperature readback of TP3 seems a lot jumpier than that of TP2; this probably has to do with the old controller having fewer ADC bits or something, but perhaps the SMOO needs to be adjusted.
Gautam and I updated the framebuilder config file, adding the newly-added channels to the list of those to be logged.
[Jon, Koji, Chub, Gautam]
The second pumpdown with the new vacuum system was completed successfully today. A time history is attached below.
We started with the main volume still at 12 torr from the Dec. pumpdown. Roughing from 12 to 0.5 torr took approximately two hours, at which point we valved out RP1 and RP3 and valved in TP1 backed by TP2 and TP3. We additionally used the AUX dry pump connected to the backing lines of TP2 and TP3, which we found to boost the overall pump rate by a factor of ~3. The manual hand-crank valve directly in front of TP1 was used to throttle the pump rate, to avoid tripping software interlocks. If the crank valve is opened too quickly, the pressure differential between the main volume (TP1 intake) and TP1 exhaust becomes >1 torr, tripping the V1 valve-close interlock. Once the main volume pressure reached 1e-2 torr, the crank valve could be opened fully.
We allowed the pumpdown to continue until reaching 9e-4 torr in the main volume. At this point we valved off the main volume, valved off TP2 and TP3, and then shut down all turbo pumps/dry pumps. We will continue pumping tomorrow under the supervision of an operator. If the system continues to perform problem-free, we will likely leave the turbos pumping on the main volume and annuli after tomorrow.
We installed a local controls terminal for the vacuum system on the desk in front of the vacuum rack (pictured below). This console is connected directly to c1vac and can be used to monitor/control the system even during a network outage or power failure. The entire pumpdown was run from this station today.
To open a controls MEDM screen, open a terminal and execute the alias
Similarly, to open a monitor-only MEDM screen, execute the alias
Overnight, the pressure increased from 247 uTorr to 264 uTorr over a period of 30000 seconds. Assuming an IFO volume of 33,000 liters, this corresponds to an average leak rate of ~20 uTorr L / s. It'd be interesting to see how this compares with the spec'd leak rates of the Viton O-ring seals and valves, and with the expected outgassing rates. The two channels in the screenshot are monitoring the same pressure from the same sensor: the top pane is a digital readout, while the bottom is a calibrated analog readout that is subsequently digitized into the CDS system.
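For reference, a minimal back-of-the-envelope version of the leak-rate number quoted above (using only the volume, pressures, and duration from this entry):

# Rough leak-rate estimate from the overnight pressure rise quoted above.
V_ifo = 33e3                 # IFO volume [liters]
dP = (264 - 247) * 1e-6      # pressure rise [torr]
dt = 30e3                    # elapsed time [s]
Q = dP * V_ifo / dt          # throughput [torr*L/s]
print("Leak rate ~ %.0f uTorr*L/s" % (Q * 1e6))   # prints ~19 uTorr*L/s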
Connected the manual gate valve status indicator to the Acromag box this morning. Labeled the temporary cable (a 50' 9-pin DSUB; will order a properly sized cable shortly) and the panel RV2.
[Jon, Gautam, Chub]
We continued the pumpdown of the IFO today. The main volume pressure has reached 1.9e-5 torr and is continuing to fall. The system has performed without issue all day, so we'll leave the turbos continuously running from here on in the normal pumping configuration. Both TP2 and TP3 are currently backing for TP1. Once the main volume reaches operating pressure, we can transition TP3 to pump the annuli. They have already been roughed to ~0.1 torr. At that point the speed of all three turbo pumps can also be reduced. I've finished final edits/cleanup of the interlock code and MEDM screens.
All the python code running on c1vac is archived to the git repo:
This includes both the interlock code and the serial device clients for interfacing with gauges and pumps.
We're still using the same base MEDM monitor/control screens, but they have been much improved. Improvements:
Note: The apparent glitches in the pressure and TP diagnostic channels are due to the interlock system being taken down to implement some of these changes.
As of 8pm local time, the IFO seems to have equilibrated to atmospheric pressure (I don't hear the hiss of in-rushing air near 1X8, and P1a reports 760 torr). The pumpspool looks healthy and there are no signs in the TP diagnostics channels that anything bad happened to the pumps. Chub is working on making the N2 setup more robust; we plan to take the EY door off at 9am tomorrow morning with Bob's help.
* I took this opportunity to follow the instructions on pg 29 of the manual and set the calibration for the SuperBee Pirani gauge to 760 torr so that it is in better agreement with our existing P1a Pirani gauge. The correction was ~8% (820-->760).
Steve came by the lab today, and looked at the status of the upgraded vacuum system. He recommended pumping on the RGA volume, since it has not been pumped on for ~3 months on account of the vacuum upgrade. The procedure (so we may script this operation in the future) was:
CC4 pressure has been steadily falling. Steve recommends leaving things in this state over the weekend. He recommends also turning the RGA unit on so that the temperature rises and there is a bakeout of the RGA. The temperature may be read off manually using a probe attached to it.
I've attached my handwritten notes covering all the serial communications in the vac system, and the relevant wiring for all the adapters, etc. I'll work with Chub to produce a final documentation, but in the meantime this may be a useful reference.
The N2 ran out this weekend (again no reminder email, but I haven't found the time to set up the Python mailer yet). So all the valves Steve and I had opened were closed (rightly so; that's what the interlocks are supposed to do). Chub will post an elog about the new N2 valve setup in the Drill-press room, but we now have sufficient line pressure in the N2 line again. So Chub and I re-opened the valves to keep pumping on the RGA.
I reset the remote of this git repo to the 40m version instead of Jon's personal one, to ensure consistency between what's on the vacuum machine and in the git repo. There is now an N2 checker Python mailer that will email the 40m list if all the tank pressures are below 600 PSI (leaving >12 hours for someone to react before the main N2 line pressure drops and the interlocks kick in). For now, the script just runs as a cron job every 3 hours, but perhaps we should integrate it with the interlock process?
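For the record, the checker amounts to only a few lines; a minimal sketch of the kind of cron job described above (the tank channel names and mail addresses are placeholders for illustration, not necessarily what is in the repo):

#!/usr/bin/env python
# Sketch of the N2 tank checker/mailer idea (run from cron every few hours).
# Channel names and email addresses below are placeholders, not the real values.
import smtplib
from email.mime.text import MIMEText
import epics  # pyepics

TANK_CHANNELS = ['C1:Vac-N2T1_pressure', 'C1:Vac-N2T2_pressure']  # assumed names
THRESHOLD_PSI = 600

pressures = [epics.caget(ch) for ch in TANK_CHANNELS]
if all(p is not None and p < THRESHOLD_PSI for p in pressures):
    msg = MIMEText('N2 tank pressures are all below %d PSI: %s' % (THRESHOLD_PSI, pressures))
    msg['Subject'] = 'N2 check: tanks running low'
    msg['From'] = 'c1vac@example.org'      # placeholder
    msg['To'] = '40m-list@example.org'     # placeholder
    with smtplib.SMTP('localhost') as server:
        server.send_message(msg)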
The pressure of the main volume increased from ~1 mtorr to 50 mtorr over the past 24 hours (86 ksec). This rate is about 1000x the number reported on Jan 10. Do we suspect a vacuum leak?
Overnight, the pressure increased from 247 uTorr to 264 uTorr over a period of 30000 seconds. Assuming an IFO volume of 33,000 liters, this corresponds to an average leak rate of ~20 uTorr L / s.
I looked into this a bit today. Did a walkthrough of the lab, didn't hear any obvious hissing (makes sense, that presumably would signal a much larger leak rate).
Attachment #1: Data from the 30 ksec we had the main vol valved off on Jan 10, but from the gauges we have running right now (the CC gauges have not had their HV enabled yet so we don't have that readback).
Attachment #2: Data from ~150 ksec from Friday night till now.
Interpretation: The number quoted from Jan 10 is from the cold-cathode gauge (~20 utorr increase). In the same period, the Pirani gauge reports an increase of ~5 mtorr (=250x the number reported by the cold-cathode gauge). So which gauge do we trust more in this regime? Additionally, the rate at which the annuli pressures are increasing seems consistent between Jan 10 and now, at ~100 mtorr every 30 ksec.
I don't think this is conclusive, but at least the leak rates between Jan 10 and now don't seem that different for the annuli pressures. Moreover, for the Jan 10 pumpdown, we had the IFO at low pressure for several days over the Christmas break, which presumably gave time for some outgassing that was then cleaned up by the TPs on Jan 10, whereas for this current pumpdown, we don't have that luxury.
Do we want to do a systematic leak check before resuming the pumpdown on Monday? The main differences in vacuum I can think of are
This entry by Steve says that the "expected" outgassing rate is 3-5 mtorr per day, which doesn't match either the current observation or that from Jan 10.
We can pump down (or vent) the annuli. If the leak is between the main volume and the annuli, we will be able to see the effect on the leak rate. If the leak is through an outer O-ring, pumping down (or venting) the annuli should likewise temporarily decrease (or increase) the leak rate, I guess. If the leak rate does not depend on the annuli pressure, we can conclude that it is internal outgassing.
As planned, we valved off the main volume and the annuli from the turbo-pumps at ~730 PM PST. At this time, the main volume pressure was 30 uTorr. It started rising at a rate of ~200 uTorr/hr, which translates to ~5 mtorr/day, which is in the ballpark of what Steve said is "normal". However, the calibration of the Hornet gauge seems to be piecewise-linear (see Attachment #1), so we will have to observe overnight to get a better handle on this number.
We decided to vent the IY and EY chamber annular volumes, and check if this made any dramatic change in the rate of the main volume pressure increase, which would presumably signal a leak from the outside. However, we saw no such increase - so right now, the working hypothesis is still that the main volume pressure increase is being driven by outgassing of something inside the vacuum.
Let's leave things in this state overnight - V1 and V5 closed so that neither the main volume nor the annuli are being pumped, and get some baseline numbers for what the outgassing rate is.
I guess we forgot to close V5, so we were indeed pumping on the ITMY and ETMY annuli; the other three were isolated, and they suggest a leak rate of ~200-300 mtorr/day, see Attachment #1 (consistent with my earlier post).
As for the main volume - according to CC1, the pressure saturates at ~250 uTorr and is stable, while the Pirani P1a reports ~100x that pressure. I guess the cold-cathode gauge is supposed to be more accurate at low pressures, but how well do we believe the calibration on either gauge? Either way, based on last night's test (see Attachment #2), we can set an upper limit of 12 mtorr/day. This is 2-3x the number Steve said is normal, but perhaps this is down to the fact that the outgassing from the main volume is higher immediately after a vent and in-chamber work. It is also a 5x lower rate of pressure increase than what was observed on Feb 2.
I am resuming the pumpdown with the turbo pumps; let's see how long we take to get down to the nominal operating pressure of 8e-6 torr, it usually takes ~1 week. V1, VASV, VASE and VABS were opened at 1030am PST. Per Chub's request (see #14435), I ran RP1 and RP3 for ~30 seconds; he will check if the oil level has changed.
Pumpdown looks healthy, so I'm leaving the TPs on overnight. At some point, we should probably get the RGA going again. I don't know that we have a "reference" RGA trace that we can compare the scan to, should check with Steve. The high power (1 W) beam has not yet been sent into the vacuum, we should probably add the interlock condition that shuts off the PSL shutter before that.
[chub, steve, gautam]
Steve came by the lab today. He advised us to turn the RGA on again, now that the main volume pressure is < 20 uTorr. I did this by running the RGAset.py script on c0rga - the temperature of the unit was 22 C in the morning; after ~3 hours with the filament on, it has already risen to 34 C. Steve says this is normal. We also opened VM1 (I had to edit interlocks.yaml to allow VM1 to open when CC1 < 20 uTorr instead of 10 uTorr), so that the RGA volume is exposed to the main volume. So the nightly scans should run now; Steve suggests ignoring the first few while the pumpdown is still approaching nominal pressure. Note that we probably want to migrate all the RGA stuff to the new c1vac machine.
Other notes from Steve:
The full 1 W is again being sent into the IMC. We have left the PBS+HWP combo installed, as Rana pointed out that it is good to have polarization control after the PMC but before the EOM. The G&H mirror setup used to route a pickoff of the post-EOM beam along the east edge of the PSL table to the AUX laser beat setup was deemed too flaky and has been bypassed. Centering on the steering mirror and subsequently the IMC REFL photodiode was done using an IR viewer - this technique allows one to geometrically center the beam on the steering mirror and PD to the resolution of the eye, whereas the voltage-maximization technique using the monitor port and an o'scope does not. Nominal IMC transmission of ~15,000 counts has been recovered, and the IMC REFL level is also around 0.12, consistent with pre-vent levels.
One of the XT1111 units (XT1111a) in the new vacuum system has malfunctioned. So all valves are closed, PSL shutter is also closed, until this is resolved.
Pressure of the main volume seems to have stabilized - see Attachment #3, so it should be fine to leave the IFO in this state overnight.
The whole point of the upgrade was to move to a more reliable system - but seems quite flaky already.
I sent Gautam instructions to first try stopping the modbus service, power cycling the Acromag chassis, then restarting the service. I've seen the Acromags go into an unresponsive state after a strong electrical transient or shorted signal wires, and the unit has to be power cycled to be reset.
If this doesn't resolve it, I'll come in tomorrow to help with the Acromag replacement. We have plenty of spares.
The problem encountered with the vac controls was indeed resolved via the recommendation I posted yesterday. The Acromags had gone into a protective state (likely caused by an electrical transient in one of the signals) that could only be cleared by power cycling the units. After resetting the system, the main volume pressure dropped quickly and is now < 2e-5 torr, so normal operations can resume. For future reference, below is the procedure to safely reset these units from a trouble state.
If the acromags lock up whenever there is an electrical spike, shouldn't we have them on UPS to smooth out these ripples? And wasn't the idea to have some handshake/watchdog system to avoid silently dying computers?
The problem encountered with the vac controls was indeed resolved via the recommendation I posted yesterday. The Acromags had gone into a protective state (likely caused by an electrical transient in one of the signals)
The acromags are on the UPS. I suspect the transient came in on one of the signal lines. Chub tells me he unplugged one of the signal cables from the chassis around the time things died on Monday, although we couldn't reproduce the problem doing that again today.
In this situation it wasn't the software that died, but the Acromag units themselves. I have an idea to detect future occurrences using a "blinker" signal: one Acromag outputs a periodic signal which is directly sensed by another Acromag. This can be implemented as another polling condition enforced by the interlock code.
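A minimal sketch of how such a blinker check might look as a polling condition (the channel name and the surrounding interlock API are assumptions for illustration, not the actual implementation):

import time
import epics  # pyepics

# One Acromag drives a periodic signal on an output channel; a second Acromag
# reads it back. If the readback stops toggling, the units are presumed frozen.
# The channel name below is a placeholder.
BLINKER_READBACK = 'C1:Vac-ACROMAG_BLINK_MON'

def acromags_alive(timeout=5.0, poll=0.5):
    """Return True if the sensed blinker value changes within `timeout` seconds."""
    first = epics.caget(BLINKER_READBACK)
    t0 = time.time()
    while time.time() - t0 < timeout:
        if epics.caget(BLINKER_READBACK) != first:
            return True
        time.sleep(poll)
    return False   # no transition seen: trip the interlock / raise an alarm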
While working on the vac controls today, I also took care of some of the remaining to-do items. Below is a summary of what was done, and what still remains.
sudo mkfs -t ext4 /dev/sdb
sudo dd if=/dev/sda of=/dev/sdb bs=64K conv=noerror,sync
controls@c1vac:~$ sudo dd if=/dev/sda of=/dev/sdb bs=64K conv=noerror,sync
[sudo] password for controls:
^C283422+0 records in
283422+0 records out
18574344192 bytes (19 GB) copied, 719.699 s, 25.8 MB/s
I'm rebooting the IOLAN server to load new serial ports. The interlocks might trip when the pressure gauge readbacks cut out.
Today I implemented protection of the vac system against extended power losses. Previously, the vac controls system (both old and new) could not communicate with the APC Smart-UPS 2200 providing backup power. This was not an issue for short glitches, but for extended outages the system had no way of knowing it was running on dwindling reserve power. An intelligent system should sense the outage and put the IFO into a controlled shutdown, before the batteries are fully drained.
What enabled this was a workaround Gautam and I found for communicating with the UPS serially. Although the UPS has a serial port, neither the connector pinout nor the low-level command protocol are released by APC. The only official way to communicate with the UPS is through their high-level PowerChute software. However, we did find "unofficial" documentation of APC's protocol. Using this information, I was able to interface the UPS to the IOLAN serial device server. This allowed the UPS status to be queried using the same Python/TCP sockets model as all the other serial devices (gauges, pumps, etc.). I created a new service called "serial_UPS.service" to persistently run this Python process like the others. I added a new EPICS channel "C1:Vac-UPS_status" which is updated by this process.
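Schematically, the UPS is now queried the same way as the gauges, as just another TCP client of the IOLAN; a sketch under stated assumptions (the hostname, TCP port, and the command byte of APC's unofficial protocol are placeholders, not verified values):

import socket

IOLAN_HOST = 'c1vac-iolan'   # IOLAN serial device server (hostname assumed)
UPS_TCP_PORT = 4010          # TCP port mapped to the UPS serial line (assumed)

def query_ups(command=b'Q\r'):
    """Send one command to the UPS via the IOLAN and return the raw reply.

    The command byte here is a placeholder standing in for APC's unofficial
    'smart' protocol; see the unofficial documentation mentioned above for
    the real command set.
    """
    with socket.create_connection((IOLAN_HOST, UPS_TCP_PORT), timeout=2.0) as s:
        s.sendall(command)
        return s.recv(128).decode(errors='replace').strip()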
With all this in place, I added new logic to the interlock.py code which closes all valves and stops all pumps in the event of a power failure. To be conservative, this interlock is also tripped when the communications link with the UPS is disconnected (i.e., when the power state becomes unknown). I tested the new conditions against both communication failure (by disconnecting the serial cable) and power failure (by pressing the "Test" button on the UPS front panel). This protects TP2 and TP3. However, I discovered that TP1---the pump that might be most damaged by a sudden power failure---is not on the UPS. It's plugged directly into a 240V outlet along the wall. This is because the current UPS doesn't have any 240V sockets. I'd recommend we get one that can handle all the turbo pumps.
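Conceptually, the new interlock condition is just another polled check of the C1:Vac-UPS_status channel, something like the sketch below (the status string and the shutdown helpers are placeholders; the real interlock.py logic differs in detail):

import epics  # pyepics

def ups_power_ok():
    """Return True only if the UPS reports that it is running on line power.

    A None/empty readback (comms loss) is treated the same as a power failure,
    per the conservative behavior described above. The 'On Line' string is an
    assumption based on the status the UPS was seen to report.
    """
    status = epics.caget('C1:Vac-UPS_status', as_string=True)
    return status is not None and status.strip() == 'On Line'

# In the interlock polling loop (close_all_valves/stop_all_pumps are placeholders):
# if not ups_power_ok():
#     close_all_valves()
#     stop_all_pumps()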
Pin 1: RxD
Pin 2: TxD
Pin 5: GND
Baud rate: 2400
Data bits: 8
Stop bits: 1
Work is completed and the vac system is back in its nominal state.
The vac controls system is going down for migration from Python 2.7 to 3.4. Will advise when it is back up.
I've converted all the vac control system code to run on Python 3.4, the latest version available through the Debian package manager. Note that these codes now REQUIRE Python 3.x. We decided there was no need to preserve Python 2.x compatibility. I'm leaving the vac system returned to its nominal state ("vacuum normal + RGA").
agreed - we need all pumps on UPS for their safety and also so that we can spin them down safely. Can you and Chub please find a suitable UPS?
However, I discovered that TP1---the pump that might be most damaged by a sudden power failure---is not on the UPS. It's plugged directly into a 240V outlet along the wall. This is because the current UPS doesn't have any 240V sockets. I'd recommend we get one that can handle all the turbo pumps.
While glancing at my Vacuum striptool, I noticed that the IFO pressure is 2e-4 torr. There was an "AC power loss" reported by c1vac about 4 hours ago (14:07 local time). We are investigating. I closed the PSL shutter.
Jon and I investigated at the vacuum rack. The UPS was reporting a normal status ("On Line"). Everything looked normal, so we attempted to bring the system back to the nominal state. But the TP2 drypump was making a loud rattling noise, and the TP2 foreline pressure was not coming down at a normal rate. We wonder if the TP2 drypump has somehow been damaged - we leave it for Chub to investigate and give a more professional assessment of the situation and the appropriate course of action.
The PSL shutter will remain closed overnight, and the main volume and annuli are valved off. We spun up TP1 and TP3 and decided to leave them on (but they have negligible load).
Overnight pressure trends don't suggest anything went awry after the initial interlock trip. Some watchdog script that monitors vacuum pressure and closes the PSL shutter in the event of pressure exceeding some threshold needs to be implemented. Another pending task is to make sure that backup disk for c1vac actually is bootable and is a plug-and-play replacement.
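A minimal sketch of the kind of pressure watchdog meant here, assuming a P1a pressure channel name and the 3 mtorr threshold mentioned later in this thread (both are placeholders to be checked against the actual channel list):

import time
import epics  # pyepics

P1A_CHANNEL = 'C1:Vac-P1a_pressure'         # assumed channel name
SHUTTER_CHANNEL = 'C1:AUX-PSL_ShutterRqst'  # PSL shutter request channel
THRESHOLD_TORR = 3e-3                       # 3 mtorr (assumed trip level)

while True:
    p = epics.caget(P1A_CHANNEL)
    if p is not None and p > THRESHOLD_TORR:
        epics.caput(SHUTTER_CHANNEL, 0)     # request the PSL shutter closed
    time.sleep(10)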
Bob and Chub concluded that the drypump that serves as TP2's forepump had failed. Steve had told me the whereabouts of a spare Agilent IDP-7. This was meant to be a replacement for the TP3 foreline pump when it failed, but we decided to swap it in while diagnosing the failed drypump (which had 2182 hours continuous running according to the hour counter). Sure enough, the spare pump spun up and the TP2fl pressure dropped at a rate consistent with what is expected. I was then able to spin up TP1, TP2 and TP3.
However, when opening V4 (the foreline of TP1 pumped by TP2), I heard a loud repeated click track (~5Hz) from the electronics rack. Shortly after, the interlocks shut down all the TPs again, citing "AC power loss". Something is not right, I leave it to Jon and Chub to investigate.
I can't explain the mechanical switching sound Gautam reported. The relay controlling power to the TP2 forepump is housed in the main AC relay box under the arm tube, not in the Acromag chassis, so it can't be from that. I've cycled through the pumpdown sequence several times and can't reproduce the effect. The Acromag switches for TP2 still work fine.
In any case, I've made modifications to the vacuum interlocks that will help with two of the issues:
C1:AUX-PSL_ShutterRqst --> 0
After finishing this vac work, I began a new pumpdown at ~4:30pm. The pressure fell quickly and has already reached ~1e-5 torr. TP2 current and temp look fine.
PSL shutter was re-opened at 6pm local time. IMC was locked. As of 10pm, the main volume pressure is already back down to the 8e-6 level.
Is this one close to failure as well?
This happened again, about 30,000 seconds ago (~2:06pm local time according to the logfile). The cited error was the same -
Hard to believe there was any real power loss, nothing else in the lab seems to have been affected so I am inclined to suspect a buggy UPS communication channel. The PSL shutter was not closed - I believe the condition is for P1a to exceed 3 mtorr (it is at 1 mtorr right now), but perhaps this should be modified to close the PSL shutter in the event of any interlock tripping. Also, probably not a bad idea to send an email alert to the lab mailing list in the event of a vac interlock failure.
For tonight, I only plan to work with the EX ALS system anyways so I'm closing the PSL shutter, I'll work with Chub to restore the vacuum if he deems it okay tomorrow.
After getting the go ahead from Chub and Jon, I restored the Vacuum state to "Vacuum normal", see Attachment #1. Steps:
controls@c1vac:/opt/target/python/interlocks$ git diff interlock.py
diff --git a/python/interlocks/interlock.py b/python/interlocks/interlock.py
index 28d3366..46a39fc 100755
@@ -52,8 +52,8 @@ class Interlock(object):
         self.pumps = 
         for pump in interlocks['pumps']:
             pm = PumpManager(pump['name'])
-            for condition in pump['conditions']:
-                pm.register_condition(*condition)
+            #for condition in pump['conditions']:
+            #    pm.register_condition(*condition)
So far the pressure is coming down smoothly, see Attachment #2. I'll keep an eye on it.
PSL shutter was opened at 645pm local time. IMC locked almost immediately.
Update 11pm: The pressure has reached 8.5e-6 torr without hiccup.
I slightly cleaned up Gautam's disabling of the UPS-predicated vac interlock and restarted the interlock service. This interlock is intended to protect the turbo pumps after a power outage, but it has proven disruptive to normal operations with too many false triggers. It will be reenabled once a new UPS has been installed. For now, as it has been since 2001, the vac pumps are unprotected against an extended power outage.
This activity seems to have closed the PSL shutter (actually I'm not sure why that happened - the interlock should only trip if P1a exceeds 3 mtorr, and looking at the time series for the last 2 hours, it did not ever exceed this threshold). I saw no reason for it to remain closed so I re-opened it just now.
I vote for not remotely rebooting any of the vacuum / PSL subsystems. In the event of something going catastrophically wrong, someone should be on hand to take action in the lab.
I did the following:
I think this completes the pre-pumpdown alignment checks we usually do. The detailed plan for tomorrow is here: please have a look and lmk if I missed something.
[chub, koji, gautam]
Close up photos of the EY and IY chambers may be found here.
Update on the display manager of c1vac: I was able to get it working again by running sudo systemctl restart display-manager. Now I can interact with the MEDM screens on c1vac. It is a bit annoying that this machine doesn't have the users directory so I don't have access to the many convenient StripTool templates though - maybe I'll make local copies tomorrow for the pumpdown.
Overnight, the pressure of the main volume only rose by 10 mtorr, so there was no need to run the roughing pumps again. So we went straight to the turbos - hooked up the AUX drypump and set it up to back TP2. Initially, we tried having both TP2 and TP3 act as backing pumps for TP1, but the wimpy TP3 current kept exceeding the interlock threshold. So we decided to pump down with TP3 valved off, with only TP2 backing TP1. This went smoothly - we had to keep an eye on P2 to make sure it stayed below 1 torr. It took ~1 hour to go from 500 mtorr to 100 mtorr, but after that, I could almost immediately open up RV2 completely. A safe setting seems to be to have RV2 open by between 0.5 and 1 turn (out of the full range of 7 turns) until the pressure drops to ~100 mtorr. Then we can crank it open. We are, at the time of writing, at ~8e-5 torr and the pressure is coming down steadily.
I had to manually clear the IG error on the CC1 gauge and re-enable the High Voltage, so that we have a readback of the main volume pressure in that range. I made a script to do this (it enables the HV; the IG error still has to be cleared by pushing the appropriate buttons on the Hornet); it lives at /opt/target/python/serial/turnHornetON.py. I guess it'll take a few days to hit 8e-6 torr, but I don't see any reason not to leave the turbos running over the weekend.
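For reference, the script essentially just sends the gauge its "ion gauge on" command over its serial line (via the IOLAN); a rough sketch, where the host/port and the exact Hornet command string are assumptions and not copied from turnHornetON.py:

import socket

HORNET_ADDR = ('c1vac-iolan', 4001)   # TCP port mapped to the Hornet's serial line (assumed)

# '#<addr>IG1' is the manual-style "ion gauge on" command; the exact syntax and
# device address are assumptions to be checked against the Hornet manual.
with socket.create_connection(HORNET_ADDR, timeout=2.0) as s:
    s.sendall(b'#01IG1\r')
    print(s.recv(64).decode(errors='replace'))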
Remaining tasks are (i) disconnect the roughing pump line and (ii) pump down the annuli, which will be done later today. Both were done at ~2pm; now we are in the vacuum normal config. I'll set the two small turbos to run in "Standby Mode" before I head home today. I think TP3 may be close to end-of-life - the TP3 current went up to 1 A even while evacuating the small volume of the annular line (which was already at 1 torr) with the AUX drypump backing it. The interlock condition is set to trip at 1.2 A, and this pump is nominally supposed to be able to back TP1 during the pumpdown of the main volume from 500 mtorr, which it wasn't able to do.
I've been monitoring the status of the pumpdown remotely with ndscope lookbacks of C1:Vac-CC1_pressure. This morning, I saw that the channel was putting out a constant value (a signature of the EPICS server being frozen). caget did not work either. Then I tried ssh-ing into c1vac to see if there were any issues, but I was unable to. The machine isn't responding to ping either. The EPICS value has been frozen since ~1030pm PDT 26 May 2019.
I will try and head to campus later today to check on it. Isn't an email alert or something supposed to be sent out in such an event?
The vacuum itself was fine - CC1 gauge reported a pressure of 1.3e-5 torr. Note to self: the C1:Vac-CC1_HORNET_PRESSURE channel, which is the analog readback of the Hornet gauge and which is hooked up to an Acromag ADC in the c1auxex chassis, is independent of the status of the c1vac machine, and so can serve as a diagnostic.
However, I was unable to interact with c1vac in any way, the monitor hooked up directly to it was showing a frozen display. So I hard-rebooted the system. It took a few minutes to come back online - but even after 10 minutes of waiting, still no display. In the process of the reboot, several valves were closed off - when the EPICS processes restart, there are momentary instances where the readback channels get an "undefined" value, which prompts the main interlock process to transition to a "SAFE" state.
Running df -h, I saw that the /var partition was completely full. Maybe this was somehow interfering with the machine running smoothly? Two files in particular, daemon.log and daemon.log.1 were ~1GB each. The contents of these files seemed to be just the readbacks for the caget and caput commands. So I cleared both these files, and now the /var partition usage is only 26%. I also got the display back up and running on the physical monitor hooked up to the c1vac machine's VGA port. Let's see if this has improved the stability situation. The CPU load is still high (~6-7), with most of this coming from the modbus process. Why is this so high? c1susaux has more Acromag units but claims a much lower load of 0.71. Is the CPU of the c1vac machine somehow inferior?
In the meantime, I ssh-ed into c1vac and restored the "Vacuum normal" valve config. During this little escapade, the main volume pressure rose to ~6e-5 torr. It's coming back down smoothly.
Unrelated to this work: we had turned the RGA off for the vent, I powered it back on and re-initialized it this morning.
Gautam and I debugged a communications problem with TP3 that was causing its python service to fail. We traced the problem back to the querying of the pump controller for its operational parameters (speed, voltage, temp). Some small percentage of the time (~5%, indeterministically), the pump controller is returning an invalid response which causes the service to shut itself down and signal a NO COMM error.
As a temporary fix, I wrapped the failing query in an exception handler to continue past this particular error. However, we suspect the microprocessor in the TP3 controller may be beginning to fail. There is a spare controller sitting right next to it in the vacuum rack. We will ask Chub to install the replacement in the near future.
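The temporary fix is simply to catch and log the occasional bad reply instead of letting the service exit; schematically (the query function, exception types, and logger usage are placeholders, not the actual code):

import logging

def poll_tp3(query_parameters):
    """Poll the TP3 controller, tolerating an occasional garbled reply.

    `query_parameters` stands in for whatever call the serial client makes to
    read back speed/voltage/temperature; the real function names differ.
    """
    try:
        return query_parameters()
    except (ValueError, IndexError) as err:
        # ~5% of replies come back malformed; log and skip this polling cycle
        # instead of shutting the service down with a NO COMM error.
        logging.warning('TP3 controller returned an invalid response: %s', err)
        return None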
gautam: This pump is responsible for pumping the annular volume under normal operations. While this problem is being resolved, the annular volume is valved off (as it has been since July 2019 anyway, which is when this problem first manifested).
There was a jump in the main volume pressure at ~6pm PDT yesterday. The cause is unknown, but the pressure doesn't seem to be coming back down (but also isn't increasing alarmingly).
I wanted to look at the RGA scans to see if there were any clues as to what changed, but it looks like the daily RGA scans stopped updating on Dec 24, 2019. The c0rga machine responsible for running these scans doesn't respond to ssh. Not much to be done until the lockdown is over, I guess...