Gautam & Steve,
Our controller is back, with the Osaka maintenance completed. We swapped it in this morning.
The TP-1 Osaka maglev controller [ model TCO10M, ser V3F04J07 ] needs maintenance. The alarm LED is on, indicating that we need Lv2 service.
The turbo and the controller are in good working order.
Our maintenance level 2 service price is $...... It consists of a complete disassembly of the controller for internal cleaning of all ICBs, replacement of all main board capacitors, replacement of all internal cooling units, ROM battery replacement, re-assembly, and mandatory final testing to make sure it meets our factory specifications. Turnaround time is approximately 3 weeks.
RMA 5686 has been assigned to Caltech's returning TC010M controller. Attached please find our RMA forms. Complete and return them to us via email, along with your PO, prior to shipping the controller.
Osaka Vacuum USA, Inc.
510-770-0100 x 109
Our TP-1 TG390MCAB is 9 years old. What is the life expectancy of this turbo?
The Osaka maglev turbopumps are designed with a 100,000-hour (~10 operating years) life span, but as you know, most of our end-users run their Osaka maglev turbopumps continuously for well over 10-15 years. The 100,000-hour design value is based on the aluminum rotor material being rotated at the given speed, but the design fudge factor has somehow extended the practical life span.
We should have the cost of a new maglev & controller in next year's budget. I put the quote into the wiki.
Steve reported to me that the CC1 Hornet gauge was not reporting the IFO pressure. After some cable tracing at EX, I found that the power to the unit had been accidentally disconnected. I re-connected the power and manually turned on the HV on the CC gauge (perhaps this can be automated in the new vacuum paradigm). An IFO pressure of 8e-6 torr is being reported now.
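For what it's worth, a minimal sketch of how that HV turn-on might be automated in the new system, assuming the gauge's HV enable and a Pirani readback are exposed as EPICS channels (both channel names below are hypothetical):

from epics import caget, caput  # pyepics

# Hypothetical sketch: enable the cold-cathode HV automatically once the
# Pirani readback confirms the pressure is low enough for CC operation.
CC_HV_CHANNEL = 'C1:Vac-CC1_HV_ON'      # hypothetical channel name
PIRANI_CHANNEL = 'C1:Vac-P1a_pressure'  # hypothetical channel name
CC_SAFE_PRESSURE = 1e-3  # torr; CC gauges should not run at high pressure

def maybe_enable_cc_hv():
    pressure = caget(PIRANI_CHANNEL)
    if pressure is not None and pressure < CC_SAFE_PRESSURE:
        caput(CC_HV_CHANNEL, 1)  # turn on the gauge high voltage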
Jon and I stuck an extender card into the eurocrate at 1X8 earlier today (~5pm PT), to see if the box was getting +24V DC from the Sorensen or not. Upon sticking the card in, the FAIL LEDs on all the VME cards came on. We immediately removed the extender card. Without any intervention from us, after ~1 minute, the FAIL LEDs went off again. Judging by the main volume pressure (Attachment #1) and the Vacuum MEDM screen (Attachment #2), this did not create any issues and the c1vac1 computer is still responsive.
But Steve can perhaps run a check in the AM to confirm that this activity didn't break anything.
Is there a reason why extender cards shouldn't be stuck into eurocrates?
The vacuum and MC are OK
The 40m vacuum envelope has one large single O-ring on the OOC west side. All other doors have double O-rings with pumped annuli.
There are 3 spacers to protect the o-ring. They should not be removed!
The cryo-pump static seal to VC1 is also Viton. All gate valves and right-angle valve plates have single Viton o-ring seals.
There are small single Viton o-rings on all optical-quality viewports. Helium will permeate through these quickly, so leak-checking time is limited to 5-10 minutes.
All other seals are copper gaskets. We have 2 manual right-angle valves with metal dynamic seals [ VATRING ], serving as VV1 & RV1.
Our 4 ion pumps were closed off for a long time. I estimated their pressure to be around ~1 Torr. After talking with Koji, we decided not to vent them.
It'd still be useful to wire their position sensors, but make sure we do not actuate the valves.
The cryo pump was regenerated to 1e-4 Torr about 2 years ago. Its pressure can be ~2 Torr with the charcoal powder. It is a dirty system at room temperature.
Do not actuate VC1 and VC2, and keep the cryo pump's manual valve closed.
If someone feels we should vent them for some reason, let us know here in the elog before Monday morning.
Wiring of the power, Ethernet, and indicator lights for the vacuum Acromag chassis is complete. Even though this crate will only use +24V DC, I wired the +/-15V connector and indicator lights as well to conform to the LIGO standard. There was no wiring diagram available, so I had to reverse-engineer the wiring from the partially complete c1susaux crate. Attached is a diagram for future use. The crate is ready for software development to begin on Monday.
Vent 80 is nearly complete; the instrument is almost at atmosphere. All four ion pump gate valves have been disconnected, though the position sensors are still connected, and all annulus valves are open. The TP1 and TP3 controllers have been disconnected from AC power. VC1 and VC2 have been disconnected and must remain closed. Currently, the RGA is being vented through the needle valve; the RGA itself had been shut off at the beginning of the vent preparations. VM1 and VM3 could not be actuated. The condition status is still listed as Unidentified because of the disconnected valves.
Gautam, Aaron, Chub and Steve,
Vent 81 is complete.
The 4 ion pumps and the cryo pump are at ~1-4 Torr (estimated, as we have no gauges there); all other parts of the vacuum envelope are at atmosphere. The P2 & P3 gauges are out of order.
V1 and VM1 are in a locked state. We suspect this is because of some interlock logic.
TP1 and TP3 controllers are turned off.
Valve conditions are as shown: ready to be opened, closed, moved, or rewired. To re-iterate: VC1, VC2, and the ion pump valves shouldn't be re-connected during the vac upgrade.
Thanks for all of your help.
As I was turning off the lights in the VEA, I heard a rattling sound from near the PSL enclosure. I followed it to a valve - I couldn't see a label on this valve in my brief effort to find one, but it is on the south-west corner of the IMC table, so maybe VABSSCI or VABSSCO? The power cable is somehow spliced with an attachment that looks to be bringing gas in/out of the valve (see Attachment #1), and the nut on the bottom was loose; the whole power cable + metal attachment was responsible for the rattling. I finger-tightened the nut and the sound went away.
I checked the IMC alignment following the vent, for which the manual beam block placed on the PSL table was removed. The alignment is okay; after minor touchup, the MC Trans was ~1200 cts, which is roughly what it was pre-vent. I've closed the PSL shutter again.
Gautam, Aaron, Chub & Steve,
ETMY heavy door replaced by light one.
We did the following: measured 950 particles/cf-min of 0.5 micron at the SP table, wiped the crane and its cable, wiped the chamber, placed the heavy door on a clean merostat-covered stand, dry-wiped the o-rings, and isopropanol-wiped the aluminum light cover.
Chub & Steve,
We swapped out our Varian V70D "bear-can" turbo for a factory-clean replacement. The new Agilent TwisTorr 84 FS turbo pump [ model x3502-64002, sn IT17346059 ] with intake screen, fan, and vent valve was installed, along with its controller [ model 3508-64001, sn IT1737C383 ] and a larger drypump, the IDP-7 [ model x3807-64010, sn MY17170019 ].
Next things to do:
We still need a detailed test procedure to be posted.
gautam: Koji and I were just staring at the vacuum screen, and realized that the drypumps, which are the backing pumps for TP2 and TP3, are not reflected on the MEDM screen. This should be rectified.
Steve also mentioned that the new small turbo controller does not directly interface with the drypump. So we need some system to delay the start of the turbo until the drypump has been engaged. Does this system exist?
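If it doesn't, a minimal sketch of such a delayed-start sequence might look like the following (the channel names and the 1 torr backing-pressure threshold are assumptions for illustration; presumably the real logic would live in the interlock code):

import time
from epics import caget, caput

# Hypothetical sketch: engage the backing drypump, wait for the foreline
# to rough down, then start the turbo. Channel names are illustrative.
caput('C1:Vac-DP2_ON', 1)                   # start the backing drypump
while caget('C1:Vac-PTP2_pressure') > 1.0:  # torr; assumed safe backing pressure
    time.sleep(5)
caput('C1:Vac-TP2_START', 1)                # now safe to spin up the turbo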
Linked is the pumpdown procedure, contained in the old 40m documentation. The relevant procedure is "All Off --> Vacuum Normal" on page 11.
[Chub, Koji, Gautam]
We replaced the EY and IOO chamber heavy doors by 10:10 am PST. Torquing was done in two rounds: first at 25 ft-lb, then at 45 ft-lb (we trust the calibration on the torque wrench, but how reliable is this? And how important are these numbers in ensuring a smooth pumpdown?). All went smoothly. The interior of the IOO chamber was found to be dirty when Koji ran a wipe along some surfaces.
For this pumpdown, we aren't so concerned with having the IFO in an operating state as we will certainly vent it again early next year. So we didn't follow the full close-up checklist.
Jon, Chub, and Koji are working on starting the pumpdown now... In order not to have to wear laser safety goggles while we closed doors and pumped down, I turned off all the 1064 nm lasers in the lab.
Per the discussion yesterday, I valved off the N2 line in the drill press room at 11 am PST this morning, so as to avoid any accidental software-induced gate-valve actuation during the holidays. The line pressure is steadily dropping...
Attachment #1 shows that while the main volume pressure was stable overnight, the pumpspool pressure has been steadily rising. I think this is to be expected, as the turbo pumps aren't running and the valves can't preserve the <1 mtorr pressure over long timescales?
Attachment #2 shows the current VacOverview MEDM screen status.
Independent question: Are all the turbo forelines vented automatically? We manually did it for the main roughing line.
Larry W came by the 40m, and reported that there was a campus-wide power glitch (he was here to check if our networking infrastructure was affected). I thought I'd check the status of the vacuum.
I decided to check the systemctl process status on c1vac:
controls@c1vac:~$ sudo systemctl status modbusIOC.service
● modbusIOC.service - ModbusIOC Service via procServ
Loaded: loaded (/etc/systemd/system/modbusIOC.service; enabled)
Active: active (running) since Thu 2019-01-03 14:53:49 PST; 11min ago
Main PID: 16533 (procServ)
├─16533 /usr/bin/procServ -f -L /opt/target/modbusIOC.log -p /run/...
├─16534 /opt/epics/modules/modbus/bin/linux-x86_64/modbusApp /opt/...
Jan 03 14:53:49 c1vac systemd: Started ModbusIOC Service via procServ.
Warning: Unit file changed on disk, 'systemctl daemon-reload' recommended.
So something did happen today that required a restart of the modbus processes. But clearly not everything has come back up gracefully. A few lines of dmesg (there are many more segfaults):
[1706033.718061] python: segfault at 8 ip 000000000049b37d sp 00007fbae2b5fa10 error 4 in python2.7[400000+31d000]
[1706252.225984] python: segfault at 8 ip 000000000049b37d sp 00007fd3fa365a10 error 4 in python2.7[400000+31d000]
[1720961.451787] systemd-udevd: starting version 215
[1782064.269844] audit: type=1702 audit(1546540443.159:38): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1782064.269866] audit: type=1302 audit(1546540443.159:39): item=0 name="/cvs/cds/caltech/target/c1vac/.git/objects/85/tmp_obj_uAXhPg" inode=173019272 dev=00:21 mode=0100444 ouid=1001 ogid=1001 rdev=00:00 nametype=NORMAL
[1782064.365240] audit: type=1702 audit(1546540443.255:40): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1782064.365271] audit: type=1302 audit(1546540443.255:41): item=0 name="/cvs/cds/caltech/target/c1vac/.git/objects/58/tmp_obj_KekHsn" inode=173019274 dev=00:21 mode=0100444 ouid=1001 ogid=1001 rdev=00:00 nametype=NORMAL
[1782064.460620] audit: type=1702 audit(1546540443.347:42): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1782064.460652] audit: type=1302 audit(1546540443.347:43): item=0 name="/cvs/cds/caltech/target/c1vac/.git/objects/cb/tmp_obj_q62Pdr" inode=173019276 dev=00:21 mode=0100444 ouid=1001 ogid=1001 rdev=00:00 nametype=NORMAL
[1782064.545449] audit: type=1702 audit(1546540443.435:44): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1782064.545480] audit: type=1302 audit(1546540443.435:45): item=0 name="/cvs/cds/caltech/target/c1vac/.git/objects/e3/tmp_obj_gPI4qy" inode=173019277 dev=00:21 mode=0100444 ouid=1001 ogid=1001 rdev=00:00 nametype=NORMAL
[1782064.640756] audit: type=1702 audit(1546540443.527:46): op=linkat ppid=21820 pid=22823 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=4294967295 comm="git" exe="/usr/bin/git" res=0
[1783440.878997] systemd: Unit serial_TP3.service entered failed state.
[1784682.147280] systemd: Unit serial_TP2.service entered failed state.
[1786407.752386] systemd: Unit serial_MKS937b.service entered failed state.
[1792371.508317] systemd: serial_GP316a.service failed to run 'start' task: No such file or directory
[1795550.281623] systemd: Unit serial_GP316b.service entered failed state.
[1796216.213269] systemd: Unit serial_TP3.service entered failed state.
[1796518.976841] systemd: Unit serial_GP307.service entered failed state.
[1796670.328649] systemd: serial_Hornet.service failed to run 'start' task: No such file or directory
[1797723.446084] systemd: Unit serial_MKS937b.service entered failed state.
I don't know enough about the new system so I'm leaving this for Jon to debug. Attachment #3 shows that the analog readout of the P1 pressure gauge suggests that the IFO is still under vacuum, so no random valve openings were effected (as expected, since we valved off the N2 line for this very purpose).
Yes, for TP2 and TP3. They both have a small vent valve that opens automatically on shutdown.
Looks like I didn't restart all the daqd processes last night, so the data was not in fact being recorded to frames. I just restarted everything, and it looks like the data for the last 3 minutes is being recorded. Is it reasonable that the TP1 current channel is reporting 0.75 A of current draw now, when the pump is off? Also, the temperature readback of TP3 seems a lot jumpier than that of TP2; this probably has to do with the old controller having fewer ADC bits or something, but perhaps the SMOO needs to be adjusted.
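(For context: SMOO is the smoothing field on an EPICS analog-input record, where VAL = VAL*SMOO + (1 - SMOO)*new_reading, so values closer to 1 smooth more heavily. Bumping it for the jumpy channel might look like the sketch below; the channel name is hypothetical.)

from epics import caput

# Hypothetical sketch: add smoothing to the jumpy TP3 temperature readback.
# In an EPICS ai record, VAL = VAL*SMOO + (1 - SMOO)*new_reading.
caput('C1:Vac-TP3_temp.SMOO', 0.9)  # channel name is illustrative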
Gautam and I updated the framebuilder config file, adding the newly-added channels to the list of those to be logged.
[Jon, Koji, Chub, Gautam]
The second pumpdown with the new vacuum system was completed successfully today. A time history is attached below.
We started with the main volume still at 12 torr from the Dec. pumpdown. Roughing from 12 to 0.5 torr took approximately two hours, at which point we valved out RP1 and RP3 and valved in TP1 backed by TP2 and TP3. We additionally used the AUX dry pump connected to the backing lines of TP2 and TP3, which we found to boost the overall pump rate by a factor of ~3. The manual hand-crank valve directly in front of TP1 was used to throttle the pump rate, to avoid tripping software interlocks. If the crank valve is opened too quickly, the pressure differential between the main volume (TP1 intake) and TP1 exhaust becomes >1 torr, tripping the V1 valve-close interlock. Once the main volume pressure reached 1e-2 torr, the crank valve could be opened fully.
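For reference, the V1 interlock condition described above might be expressed along these lines (a sketch only; the channel names and the structure of the real interlock code may differ):

from epics import caget, caput

# Sketch of the differential-pressure condition: close V1 if the pressure
# difference between the main volume (TP1 intake) and the TP1 exhaust
# exceeds 1 torr. Channel names are illustrative.
MAX_TP1_DIFFERENTIAL = 1.0  # torr

def check_tp1_differential():
    intake = caget('C1:Vac-P1a_pressure')    # main volume / TP1 intake
    exhaust = caget('C1:Vac-PTP1_pressure')  # TP1 exhaust
    if intake - exhaust > MAX_TP1_DIFFERENTIAL:
        caput('C1:Vac-V1_CLOSE', 1)          # command V1 closed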
We allowed the pumpdown to continue until reaching 9e-4 torr in the main volume. At this point we valved off the main volume, valved off TP2 and TP3, and then shut down all turbo pumps/dry pumps. We will continue pumping tomorrow under the supervision of an operator. If the system continues to perform problem-free, we will likely leave the turbos pumping on the main volume and annuli after tomorrow.
We installed a local controls terminal for the vacuum system on the desk in front of the vacuum rack (pictured below). This console is connected directly to c1vac and can be used to monitor/control the system even during a network outage or power failure. The entire pumpdown was run from this station today.
To open a controls MEDM screen, open a terminal and execute the alias
Similarly, to open a monitor-only MEDM screen, execute the alias
Overnight, the pressure increased from 247 uTorr to 264 uTorr over a period of 30000 seconds. Assuming an IFO volume of 33,000 liters, this corresponds to an average leak rate of ~20 uTorr L / s. It'd be interesting to see how this compares with the spec'd leak rates of the Viton O-ring seals and valves / outgassing rates. The two channels in the screenshot monitor the same pressure from the same sensor: the top pane is a digital readout, while the bottom is a calibrated analog readout that is subsequently digitized into the CDS system.
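As a quick back-of-the-envelope check of that number (plain arithmetic on the figures quoted above; the 33,000 L volume is the stated assumption):

# Leak-rate estimate from the overnight pressure rise quoted above.
dP = 264e-6 - 247e-6  # torr
dt = 30000.0          # seconds
V = 33000.0           # liters (assumed IFO volume)
Q = dP * V / dt
print('%.1e torr*L/s' % Q)  # ~1.9e-5, i.e. ~20 uTorr*L/s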
Connected the manual gate valve status indicator to the Acromag box this morning. Labeled the temporary cable (a 50' 9p DSUB; a properly sized cable will be ordered shortly) and the panel RV2.
[Jon, Gautam, Chub]
We continued the pumpdown of the IFO today. The main volume pressure has reached 1.9e-5 torr and is continuing to fall. The system has performed without issue all day, so we'll leave the turbos continuously running from here on in the normal pumping configuration. Both TP2 and TP3 are currently backing for TP1. Once the main volume reaches operating pressure, we can transition TP3 to pump the annuli. They have already been roughed to ~0.1 torr. At that point the speed of all three turbo pumps can also be reduced. I've finished final edits/cleanup of the interlock code and MEDM screens.
All the python code running on c1vac is archived to the git repo:
This includes both the interlock code and the serial device clients for interfacing with gauges and pumps.
We're still using the same base MEDM monitor/control screens, but they have been much improved. Improvements:
Note: The apparent glitches in the pressure and TP diagnostic channels are due to the interlock system being taken down to implement some of these changes.
As of 8pm local time, the IFO seems to have equilibrated to atmospheric pressure (I don't hear the hiss of in-rushing air near 1X8, and P1a reports 760 torr). The pumpspool looks healthy and there are no signs in the TP diagnostics channels that anything bad happened to the pumps. Chub is working on making the N2 setup more robust; we plan to take the EY door off at 9am tomorrow morning with Bob's help.
* I took this opportunity to follow the instructions on pg 29 of the manual and set the calibration for the SuperBee Pirani gauge to 760 torr so that it is in better agreement with our existing P1a Pirani gauge. The correction was ~8% (820-->760).
Steve came by the lab today, and looked at the status of the upgraded vacuum system. He recommended pumping on the RGA volume, since it has not been pumped on for ~3 months on account of the vacuum upgrade. The procedure (so we may script this operation in the future) was:
CC4 pressure has been steadily falling. Steve recommends leaving things in this state over the weekend. He also recommends turning the RGA unit on so that the temperature rises and there is a bakeout of the RGA. The temperature may be read off manually using a probe attached to it.
I've attached my handwritten notes covering all the serial communications in the vac system, and the relevant wiring for all the adapters, etc. I'll work with Chub to produce final documentation, but in the meantime this may be a useful reference.
The N2 ran out this weekend (again with no reminder email, but I haven't found the time to set up the Python mailer yet). So all the valves Steve and I had opened were closed (rightly so; that's what the interlocks are supposed to do). Chub will post an elog about the new N2 valve setup in the drill-press room, but we now have sufficient line pressure in the N2 line again. So Chub and I re-opened the valves to keep pumping on the RGA.
I reset the remote of this git repo to the 40m version instead of Jon's personal one, to ensure consistency between what's on the vacuum machine and what's in the git repo. There is now an N2 checker Python mailer that will email the 40m list if all the tank pressures are below 600 PSI (>12 hours for someone to react before the main N2 line pressure drops and the interlocks kick in). For now, the script just runs as a cron job every 3 hours, but perhaps we should integrate it with the interlock process?
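For reference, the core of such a checker might look like the sketch below. The channel names, addresses, and mail relay are placeholders; the actual script in the repo is authoritative.

import smtplib
from email.mime.text import MIMEText
from epics import caget

# Sketch of an N2 tank-pressure checker/mailer of the kind described above.
TANKS = ['C1:Vac-N2T1_pressure', 'C1:Vac-N2T2_pressure']  # hypothetical channels
THRESHOLD = 600.0  # PSI

def check_and_mail():
    pressures = [caget(ch) for ch in TANKS]
    if all(p is not None and p < THRESHOLD for p in pressures):
        msg = MIMEText('N2 tank pressures: %s PSI - swap a tank soon' % pressures)
        msg['Subject'] = 'N2 tanks below %.0f PSI' % THRESHOLD
        msg['From'] = 'c1vac@example.edu'    # placeholder address
        msg['To'] = '40m-list@example.edu'   # placeholder list address
        s = smtplib.SMTP('localhost')        # assumes a local mail relay
        s.sendmail(msg['From'], [msg['To']], msg.as_string())
        s.quit()

if __name__ == '__main__':
    check_and_mail()  # invoked every 3 hours from cron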
The pressure of the main volume increased from ~1 mtorr to 50 mtorr over the past 24 hours (86 ksec). This rate is about 1000x the number reported on Jan 10. Do we suspect a vacuum leak?
Overnight, the pressure increased from 247 uTorr to 264 uTorr over a period of 30000 seconds. Assuming an IFO volume of 33,000 liters, this corresponds to an average leak rate of ~20 uTorr L / s.
I looked into this a bit today. Did a walkthrough of the lab, didn't hear any obvious hissing (makes sense, that presumably would signal a much larger leak rate).
Attachment #1: Data from the 30 ksec we had the main vol valved off on Jan 10, but from the gauges we have running right now (the CC gauges have not had their HV enabled yet so we don't have that readback).
Attachment #2: Data from ~150 ksec from Friday night till now.
Interpretation: The number quoted from Jan 10 is from the cold-cathode gauge (~20 utorr increase). In the same period, the Pirani gauge reports an increase of ~5 mtorr (=250x the number reported by the cold-cathode gauge). So which gauge do we trust more in this regime? Additionally, the rate at which the annuli pressures are increasing seems consistent between Jan 10 and now, at ~100 mtorr every 30 ksec.
I don't think this is conclusive, but at least the leak rates between Jan 10 and now don't seem that different for the annuli pressures. Moreover, for the Jan 10 pumpdown, we had the IFO at low pressure for several days over the Christmas break, which presumably gave time for some outgassing that was then cleaned up by the TPs on Jan 10, whereas for the current pumpdown we don't have that luxury.
Do we want to do a systematic leak check before resuming the pumpdown on Monday? The main differences in vacuum I can think of are
This entry by Steve says that the "expected" outgassing rate is 3-5 mtorr per day, which doesn't match either the current observation or that from Jan 10.
We can pump down (or vent) the annuli. If there is a leak between the main volume and the annuli, we will be able to see the effect on the leak rate. If there is a leak in an outer o-ring, pumping down (or venting) the annuli should again temporarily decrease (or increase) the leak rate..., I guess. If the leak rate does not depend on the pressure of the annuli, we can conclude that it is internal outgassing.
As planned, we valved off the main volume and the annuli from the turbo-pumps at ~730 PM PST. At this time, the main volume pressure was 30 uTorr. It started rising at a rate of ~200 uTorr/hr, which translates to ~5 mtorr/day, which is in the ballpark of what Steve said is "normal". However, the calibration of the Hornet gauge seems to be piecewise-linear (see Attachment #1), so we will have to observe overnight to get a better handle on this number.
We decided to vent the IY and EY chamber annular volumes and check whether this made any dramatic change in the rate of the main volume pressure increase, which would presumably signal a leak from the outside. However, we saw no such increase - so right now, the working hypothesis is still that the main volume pressure increase is being driven by outgassing of something inside the vacuum.
Let's leave things in this state overnight - V1 and V5 closed so that neither the main volume nor the annuli are being pumped, and get some baseline numbers for what the outgassing rate is.
I guess we forgot to close V5, so we were indeed pumping on the ITMY and ETMY annuli, but the other three were isolated and suggest a leak rate of ~200-300 mtorr/day; see Attachment #1 (consistent with my earlier post).
As for the main volume: according to CC1, the pressure saturates at ~250 uTorr and is stable, while the Pirani P1a reports ~100x that pressure. I guess the cold-cathode gauge is supposed to be more accurate at low pressures, but how well do we believe the calibration on either gauge? Either way, based on last night's test (see Attachment #2), we can set an upper limit of 12 mtorr/day. This is 2-3x the number Steve said is normal, but perhaps this is down to the fact that outgassing from the main volume is higher immediately after a vent and in-chamber work. It is also a 5x lower rate of pressure increase than what was observed on Feb 2.
I am resuming the pumpdown with the turbo-pumps; let's see how long it takes to get down to the nominal operating pressure of 8e-6 torr - it usually takes ~1 week. V1, VASV, VASE and VABS were opened at 1030am PST. Per Chub's request (see #14435), I ran RP1 and RP3 for ~30 seconds; he will check if the oil level has changed.
Pumpdown looks healthy, so I'm leaving the TPs on overnight. At some point, we should probably get the RGA going again. I don't know that we have a "reference" RGA trace that we can compare the scan to; we should check with Steve. The high power (1 W) beam has not yet been sent into the vacuum; we should probably add an interlock condition that closes the PSL shutter before that.
[chub, steve, gautam]
Steve came by the lab today. He advised us to turn the RGA on again, now that the main volume pressure is < 20 uTorr. I did this by running the RGAset.py script on c0rga - the temperature of the unit was 22 C in the morning; after ~3 hours of the filament being on, the temperature has already risen to 34 C. Steve says this is normal. We also opened VM1 (I had to edit interlocks.yaml to allow VM1 to open when CC1 < 20 uTorr instead of 10 uTorr), so that the RGA volume is exposed to the main volume. The nightly scans should run now; Steve suggests ignoring the first few while the pumpdown is still approaching nominal pressure. Note that we probably want to migrate all the RGA stuff to the new c1vac machine.
Other notes from Steve:
The full 1 W is again being sent into the IMC. We have left the PBS+HWP combo installed as Rana pointed out that it is good to have polarization control after the PMC but before the EOM. The G&H mirror setup used to route a pickoff of the post-EOM beam along the east edge of the PSL table to the AUX laser beat setup was deemed too flaky and has been bypassed. Centering on the steering mirror and subsequently the IMC REFL photodiode was done using an IR viewer - this technique allows one to geometrically center the beam on the steering mirror and PD, to the resolution of the eye, whereas the voltage maximization technique using the monitor port and an o'scope doesn't allow the former. Nominal IMC transmission of ~15,000 counts has been recovered, and the IMC REFL level is also around 0.12, consistent with the pre-vent levels.
One of the XT1111 units (XT1111a) in the new vacuum system has malfunctioned. So all valves are closed, PSL shutter is also closed, until this is resolved.
Pressure of the main volume seems to have stabilized - see Attachment #3, so it should be fine to leave the IFO in this state overnight.
The whole point of the upgrade was to move to a more reliable system - but it seems quite flaky already.
I sent Gautam instructions to first try stopping the modbus service, power cycling the Acromag chassis, then restarting the service. I've seen the Acromags go into an unresponsive state after a strong electrical transient or shorted signal wires, and the unit has to be power cycled to be reset.
If this doesn't resolve it, I'll come in tomorrow to help with the Acromag replacement. We have plenty of spares.
The problem encountered with the vac controls was indeed resolved via the recommendation I posted yesterday. The Acromags had gone into a protective state (likely caused by an electrical transient in one of the signals) that could only be cleared by power cycling the units. After resetting the system, the main volume pressure dropped quickly and is now < 2e-5 torr, so normal operations can resume. For future reference, below is the procedure to safely reset these units from a trouble state.
If the Acromags lock up whenever there is an electrical spike, shouldn't we have them on a UPS to smooth out these ripples? And wasn't the idea to have some handshake/watchdog system to avoid silently dying computers?
The problem encountered with the vac controls was indeed resolved via the recommendation I posted yesterday. The Acromags had gone into a protective state (likely caused by an electrical transient in one of the signals)
The Acromags are on the UPS. I suspect the transient came in on one of the signal lines. Chub tells me he unplugged one of the signal cables from the chassis around the time things died on Monday, although we couldn't reproduce the problem by doing that again today.
In this situation it wasn't the software that died, but the Acromag units themselves. I have an idea to detect future occurrences using a "blinker" signal: one Acromag outputs a periodic signal which is directly sensed by another Acromag. This can be implemented as another polling condition enforced by the interlock code.
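A sketch of what that polling condition might look like (channel names are hypothetical; in practice the toggling and the readback check would be folded into the interlock's existing polling loop):

import time
from epics import caget, caput

# Sketch of the proposed "blinker" watchdog: the interlock process toggles
# an output on one Acromag, and a second Acromag reads it back. If the
# readback stops following the output, the units are presumed hung.
def blinker_ok(cycles=3, period=1.0):
    for _ in range(cycles):
        new_state = 0 if caget('C1:Vac-BLINKER_OUT') else 1
        caput('C1:Vac-BLINKER_OUT', new_state)
        time.sleep(period)
        if caget('C1:Vac-BLINKER_IN') != new_state:
            return False  # readback did not follow; trip the interlock
    return True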
While working on the vac controls today, I also took care of some of the remaining to-do items. Below is a summary of what was done, and what still remains.
sudo mkfs -t ext4 /dev/sdb                                # format the backup disk
sudo dd if=/dev/sda of=/dev/sdb bs=64K conv=noerror,sync  # clone the system disk block-for-block
controls@c1vac:~$ sudo dd if=/dev/sda of=/dev/sdb bs=64K conv=noerror,sync
[sudo] password for controls:
^C283422+0 records in
283422+0 records out
18574344192 bytes (19 GB) copied, 719.699 s, 25.8 MB/s
I'm rebooting the IOLAN server to load new serial ports. The interlocks might trip when the pressure gauge readbacks cut out.
Today I implemented protection of the vac system against extended power losses. Previously, the vac controls system (both old and new) could not communicate with the APC Smart-UPS 2200 providing backup power. This was not an issue for short glitches, but for extended outages the system had no way of knowing it was running on dwindling reserve power. An intelligent system should sense the outage and put the IFO into a controlled shutdown, before the batteries are fully drained.
What enabled this was a workaround Gautam and I found for communicating with the UPS serially. Although the UPS has a serial port, neither the connector pinout nor the low-level command protocol are released by APC. The only official way to communicate with the UPS is through their high-level PowerChute software. However, we did find "unofficial" documentation of APC's protocol. Using this information, I was able to interface the the UPS to the IOLAN serial device server. This allowed the UPS status to be queried using the same Python/TCP sockets model as all the other serial devices (gauges, pumps, etc.). I created a new service called "serial_UPS.service" to persistently run this Python process like the others. I added a new EPICS channel "C1:Vac-UPS_status" which is updated by this process.
With all this in place, I added new logic to the interlock.py code which closes all valves and stops all pumps in the event of a power failure. To be conservative, this interlock is also tripped when the communications link with the UPS is disconnected (i.e., when the power state becomes unknown). I tested the new conditions against both communication failure (by disconnecting the serial cable) and power failure (by pressing the "Test" button on the UPS front panel). This protects TP2 and TP3. However, I discovered that TP1---the pump that might be most damaged by a sudden power failure---is not on the UPS. It's plugged directly into a 240V outlet along the wall. This is because the current UPS doesn't have any 240V sockets. I'd recommend we get one that can handle all the turbo pumps.
Pin 1: RxD
Pin 2: TxD
Pin 5: GND
Baud rate: 2400
Data bits: 8
Stop bits: 1
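As an illustration of the Python/TCP-sockets model mentioned above, a status query to the UPS through the IOLAN might look roughly like this. The hostname, TCP port, and the exact command bytes are assumptions based on the unofficial protocol notes, not a verified implementation:

import socket

# Sketch of a UPS status query through the IOLAN terminal server,
# following the unofficial APC "smart" serial protocol.
IOLAN_HOST = 'c1vac-iolan'  # placeholder hostname
UPS_PORT = 4000             # placeholder port mapped to the UPS serial line

def query_ups_status():
    s = socket.create_connection((IOLAN_HOST, UPS_PORT), timeout=5)
    try:
        s.sendall(b'Y')            # enter "smart" mode; UPS should reply 'SM'
        s.recv(16)
        s.sendall(b'Q')            # request the status-flag register
        return s.recv(16).strip()  # e.g. online vs. on-battery flags
    finally:
        s.close()

The serial_UPS.service process would publish this result to C1:Vac-UPS_status, and interlock.py trips the controlled shutdown if the flags indicate on-battery operation or if there is no response.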
Work is completed and the vac system is back in its nominal state.
The vac controls system is going down for migration from Python 2.7 to 3.4. Will advise when it is back up.