Sigh. Do we have a spare sat box?
A more comprehensive report has been uploaded here. I'll zip the data files and add them there too. In summary:
I'll upload the data and analysis notebook + liso fit files to the wiki as well shortly. The data, a Jupyter notebook making the plots, and the LISO fit files have been uploaded here.
I didn't do it this time but it'd be nice to also do the noise measurement and get an estimate for the shot-noise intercept current.
While I have the data, I will fit this and post a more complete report on the wiki.
There was no improvement to the situation overnight. So, I did the following today:
IMC is now locked again, I will monitor for glitching/stability.
Update 6pm PDT: as shown in Attachment #1, there is a huge difference in the stability of the lock after the sat box swap. Let's hope it stays this way for a while...
I'll leave the MC1 box open overnight and see if that improves the situation, and if not, I'll switch in the SRM satellite box tomorrow.
I will be in the Clean and Bake lab today from 11:30am to 4pm
Hmm I can't seem to export with the colorbar, might be just my phone though. I tried to add some "cursors" with the temperature at a few spots, but the font color contrast is poor so you have to squint really hard to see the temperatures in the photo I attached.
does the FLIR have an option to export image with a colorbar?
How about just leave the lid open? or more open? I don't know what else can be done in the near term. Maybe swap with the SRM sat box to see if that helps?
Judging by the summary pages, some 18 hours after this change was made and the board re-installed, the MC1 shadow sensors began to report frequent glitches. I can't think of a plausible causal connection, especially given the 18 hour time lag, but also hard to believe there isn't one? As a result, the IMC is no longer able to stay locked for extended periods of time. I did the usual cable squishing, and also took off the lid to see if that helps the situation.
While the reduced series resistance means there is more current flowing through the slow path,
The attached FLIR camera image re-inforces what we already know, that the thermal environment inside the satellite box is horrible. The absolute temperature calibration may be off, but it was difficult to touch the components with a bare finger, so I'd say its definitely > 70 C.
I implemented this change today. We only had 100 ohm, 3W resistors in stock (no 200 ohm with adequate power rating). Assuming 10 V is dropped across this resistor, the power dissipation is V^2/R ~ 1 W, so we should have sufficient margin. DCC entry has been updated with new schematic and photo of the component side of the board. Note that the series resistance of the fast actuation path was untouched.
While the vacuum system was knocked out, I measured the RF transimpedance (using the AM laser setup, didn't do the shot noise intercept current measurement for now) of all the RFPDs (except PMC REFL). At the very least, the following photodiodes are suspect:
For the remaining photodiodes, I measure a transimpedance that is within ~20% of what is on the wiki page. The notches may benefit from some retuning. While I have the data, I will fit this and post a more complete report on the wiki.
Update July 6 1145am: WFS response plots now have legends mapping quadrants, and I've also added the response of a spare PDA10CF (which is now the new POP22/POP110 photodiode).
I will be in the Clean and Bake lab today from 11am to 4pm.
As expected, the requested voltage no longer exceeds the Acromag DAC range, it is now more like 2.5 V. However, I still notice that the MC REFL spot moves somewhat diagonally on the camera image - so maybe the coil gains are seriously imbalanced? Anyway, the WFS control signals can once again be safely offloaded to the slow bias voltages once again, preserving the fast ADC range for other actuation.
The Johnson noise of the series resistor has now increased by a factor of 2, from ~6.4 pA/rtHz to 12.8 pA/rtHz. Assuming a current to force coefficient of 1.6 mN/A per coil, the length noise of the cavity is expected to be 12.8e-12 * 0.064/0.25/(2*pi*100)^2 ~ 8e-18 m/rtHz at 100 Hz. In frequency units, this is 80 uHz/rtHz. I think our IMC noise is at least 10 times higher than this at 100 Hz (in any case, the noise of the coil driver is NOT dominated by the series resistance). Attachment #1 confirms that there isn't any significant MCF noise increase, and I will check with the arm cavity too. Nevertheless, we should, if possible, align the optic better and use as high a series resistance as possible.
The watchdog for MC1 was disabled and the board was pulled out for this work. After it was replaced, the IMC re-locks readily.
But this does not solve the MC1 issue. Only we can do right now is to make the output resister half, for example.
I will be in the Clean and Bake lab from 11pm to 4pm
This earthquake tripped all suspensions and ITMX got stuck. The watchdogs were restored and the stuck optic was released. The IFO was re-aligned, POX/POY and PRMI on carrier locking all work okay.
Per the discussion at the meeting today, the plan of action is:
If I missed something, please add here.
I want some input about what the short-term (next two weeks) commissioning goals should be.
I will be in the Clean and Bake lab today from 10am to 4pm.
I propose we go for all CAPS for all channel names. The lower case names is just a holdover from Steve/Alan from the 90's. All other systems are all CAPS.
It avoids us having to force them all to UPPER in the scripts and channel lists.
This work is finally complete. The dry pump replacement was finished quickly but the controls updates required some substantial debugging.
For one, the mailer code I had been given to install would not run against Python 3.4 on c1vac, the version run by the vac controls since about a year ago. There were some missing dependencies that proved difficult to install (related to Debian Jessie becoming unsupported). I ultimately solved the problem by migrating the whole system to Python 3.5. Getting the Python keyring working within systemd (for email account authentication) also took some time.
Edit: The new interlock flag channel is named C1:Vac-interlock_flag.
Along the way, I discovered why the interlocks had been failing to auto-close the PSL shutter: The interlock was pointed to the channel C1:AUX-PSL_ShutterRqst. During the recent c1psl upgrade, we renamed this channel C1:PSL-PSL_ShutterRqst. This has been fixed.
The main volume is being pumped down, for now still in a TP3-backed configuration. As of 8:30 pm the pressure had fallen back to the upper 1E-6 range. The interlock protection is fully restored. Any time an interlock is triggered in the future, the system will send an immediate notification to 40m mailing list. 👍
The vac system is going down at 11 am today for planned maintenance:
The machine needed a hard reboot as it was un-ssh-able.
The exact time that the machine went down is unknown because the blinkys were not DQ-ed. I've now added these to the EDCU to make these channels actually useful, and we may look back on the reliability (or otherwise) of the Acromag system. To my memory, this is the ~5th time one of the new Acromag servers has needed a hard reboot. While this may be less frequent (?) than the VME machines, perhaps there is some other reason for these dropouts. Maybe something to do with the martian network?
Anyway the machine is back up and running now.
I will be at the 40m today from 11am to 4pm.
We will advise when the work is completed.
The PSL shutter was closed from the vacuum interlock trip. Today, I did the following:
All looks good for now. I will probably get back to PRFPMI locking Monday.
Before the vacuum fracas, the locking was pretty robust. With some human servoing of the input beam, I could maintain locks for ~1 hour. My primary goals were:
I didn't succeed in either so far.
I guess apart from this, we want to run the ALS scan to try and infer something about the absorption-induced thermal lens. I guess at this point, the costs outweigh the benefits in trying to bring in the SRC as well, since we will be changing the SRC config?
In ELOG 15368, I had claimed that the POP QPD based feedback servo actuating on the PRM stabilized the lock. I now believe this scheme of sensing using the POP QPD and feeding back to the PRM is not a good topology for stabilizing the PRC angular motion.
I would also like to bring up the topic of implementing some WFS for the interferometer fields again, there doesn't seem to be any mention of this in the procurement/planning for the BHD. It is not obvious to me yet that we need WFS and not just DC QPDs from a noise point of view, but at least we should discuss this.
Tip Seals were replaced on the forepumps for TP2 and TP3, and both are ready to be installed back onto the forelines.
TP2 Forepump Ultimate Pressure: 180 mtorr
TP3 Forepump Ultimate Pressure: 120 mtorr
The four 4x25DSUB and single 8x25DSUB feedthrough flanges have arrived and will be picked up from the dock and brought to the 40M lab.
For this particular email service, ideally the email should be sent out as soon as the interlock is tripped, so this would require a line of code to be added to the main interlock code. Which I guess would require a restart of the interlock service. So let me know when you guys plan to do the dry-pump tip seal replacement operation (when I presume valves will be closed anyways) so that we can do this in a minimally invasive way.
Ok, this can be added pretty easily. Its value will just be toggled between 1 and 0 every time the interlock server raises/clears the existing string channel. Adding the channel will require restarting the whole vac IOC, so I'll do it at a time when Jordan is on hand in case something fails to come back up.
I will be at the 40m today from 9am to 3pm.
I think we should discuss interlock possibilities at a 40m meeting. I'm reluctant to make the system more complicated, but perhaps we can find ways to reduce the reliance on the turbo pump readbacks. I agree they've proven to be the least reliable.
While we may be able to improve the tolerance to certain kinds of hardware malfunctions (and if so, we should), I don't see interlocks triggering on abnormal behavior of critical equipment as the root problem. As I see it, our bigger problem is with all the malfunctioning, mostly end-of-lifetime pieces of vacuum equipment still in use. If we can address the hardware problems, as I'm trying to do with replacements [ELOG 15412], I think that in itself will make the interlocking much less of an issue.
So why not just have a special mode for the interlock code during pumpdown and venting, and during normal operation we expect the main volume pressure to be <100uTorr so the interlock trips if this condition is violated? These can just be EPICS buttons on the Vac control MEDM screen. Both of these procedures are not "business as usual", and even if we script them in the future, it's likely to have some operator supervising, so I don't think it's unreasonable to have to switch between these modes. I just think the pressure gauges have demonstrated themselves to be much more reliable than these TP serial readbacks (as you say, they worked once upon a time, but that is already evidence of its flakiness?). The Pirani gauges are not ultra-reliable, they have failed in the past, but at least less frequently than this serial comm glitching. In fact, if these readbacks are so flaky, it's not impossible that they don't signal a TP shutdown? I just think the real power of having these multi-channel diagnostics is lost without some AND logic - a turbopump failure is likely to result in an increase in pump current and temperature increase and pump speed decrease, so it's not the individual channel values that should be determining if an interlock is tripped.
It would be better to have a flag channel, might be useful for the summary pages too. I will make it if it is too much trouble.
I've created a purchase list of hardware needed to restore the aging vacuum system. This wasn't planned as part of the BHD upgrade, but I've added it to the BHD procurement list since hardware replacements have become necessary.
The list proposes replacing the aging TP3 Varian turbo pump with the newer Agilent model which has already replaced TP2. It seems I was mistaken in believing we already had a second Agilent pump on hand. A thorough search of the lab has not turned it up, and Steve himself has told me he doesn't remember ordering a second one. Fortunately Steve did leave us a detailed Agilent parts list [ELOG 14322].
It also proposes replacing the glitching TP2 Agilent controller with a new one. The existing one can be sent back for repair and then retained as a spare. Considering that one of these controllers is already malfunctioning after < 2 years, I think it's a very good idea to have a spare on hand.
Below is our current list of vacuum hardware issues. Items that this purchase list will address (limited to only the most urgent) are highlighted in yellow.
I removed the backing pumps for TP2 and TP3 today to test ultimate pressure and determine if they need a tip seal replacement. This was done with Jon backing me on Zoom. We closed off TP3 and powered down TP3 and the auxilliary pump, in order to remove the forepumps from the exhaust line.
Once pumps were removed I connected a Pirani gauge to the pump directly and pumped down, results as follows:
TP2 Forepump (Agilent IDP 7):
TP3 Forepump (Varian SH 110):
TP3 forepump defintely needs a new tip seal, and while the pressure on TP2 Forepump was good there was a significant amount of particulate that came out of the exhaust line, so a new tip seal might not be needed but it is recommeded.
I agree with your assessment, Jordan. If I'm not mistaken the scroll pump for TP2 is new; we had a very early failure with the last new scroll pump (the forepump for TP3) tip seals at just over 5000 hours. Glad to see my replacement seals held up for over 60K hours. If this is the trend with these pumps, we can simply run them to around 60000 hours and replace the seals at that time, rather than waiting for failure! - Chub
I definitely think that protecting the vacuum envelope is a priority - but I don't think it should be at the expense of commissioning time. But if you think these extra interlocks are essential to the safety of the vacuum system, I withdraw my request.
Right, I doubt they were ever recorded or used for interlocks. But the readbacks did work at one point in the past. There's a photo of the old vac monitor screen on p. 19 of E1500239 (last updated 2017) which shows the fields once alive.
I don't disagree that the pressure gauges would register the change. What I'm not sure about is whether the change would violate any of the existing interlock conditions, triggering a shutdown. Looking at what we have now, the only non-pump-related conditions I see that might catch it are the diffpres conditions:
abs(P2 - PTP2) > 1 torr (for a TP2 failure)
abs(P3 - PTP3) > 1 torr (for a TP3 failure)
abs(P1a - P2) > 1 torr (for either a TP2 or TP3 failure)
For the P1a-P2 differential, the threshold of 1 torr is the smallest value that in practice still allows us to pump down the IFO without having to disable the interlocks (P1a-P2 is the TP1 intake/exhaust differential). The purpose of the P2-PTP2/P3-PTP3 differentials is to prevent V4/5 from opening and suddenly exposing the spinning turbo to high pressure. I'm not aware of a real damage threshold calculation that any one has done; I think < 1 torr is lore passed down by Steve.
If a turbo pump fails, the rate it would backstream is unknown (to me, at least) and likely depends on the failure mode. The scenario I'm concerned about is if the backstream rate is slower than the conduction time through the pumspool and into the main volume. In that case, the pressure gauges will rise more or less together all the way up to atmosphere, likely never crossing the 1 torr differential pressure thresholds.
There's already a channel C1:Vac-error_status, where if the value is anything other than an empty string, there is an interlock tripped. Does that work?
I agree there were MEDM fields, but I can't find any record of these channels being recorded till 2018 December, so I don't agree that they were being digitally monitored. You can also look back in the elog (e.g. here and here) and see that the display fields are just blank. I would then assume that no interlocks were dependent on these channels, because otherwise the vacuum interlocks would be perpetually tripped.
Sorry but I'm having trouble imagining a scenario how the pressure gauges wouldn't register this before the IFO volume is compromised. Is there some back of the envelope calculations I can do to understand this? Since both the pressure gauges and the TP diagnostic channels are being monitored via EPICS, the refresh rate is similar, so I don't see how we can have a pump temperature / speed / current threshold tripped but NOT have this be registered on all the pressure gauges, seems like a bit of a contrived scenario to me. Our thresholds currently seem to be arbitrary numbers anyway, or are they based on some expected backstreaming rate? Isn't this scenario degenerate with a leak elsewhere in the vacuum envelope that would be caught by the differential pressure interlocks?
For the email alert, can you expose a soft channel that is a flag - if this flag is not 1, then the service will send out an email.
Looking at images of the old vac screens, the TP2/3 rotation speed and status string were digitally monitored. However I don't know if there were software interlocks predicated on those.
The temperature and current interlocks are implemented precisely because the pumps can shut themselves off. The concern is not about damaging the pumps (their internal logic protects against that); it's that a pump could automatically shut down and back-vent the IFO to atmosphere. Another interlock (e.g., the pressure differentials) might catch it, but it would depend on the back-vent rate and the scenario has never been tested. The temperature and current interlocks are set to trip just before the pump reaches its internal shut-down threshold.
One way we might be able to reduce our reliance on the flaky serial readbacks is to implement rotation-speed hardware interlocks. The old vac documentation alludes to these, but as far as Chub and I could determine in 2018, they never actually existed. The older turbo controllers, at least, had an analog output proportional to speed which could be used to control a relay to interrupt the V4/5 control signals. I'll look into this for the new controllers. If it could be done, we could likely eliminate the layer of serial-readback interlocks altogether.
That would be awesome if you're willing to volunteer. I agree this would be great to have.
I will be at the 40m today from 9:30am to 4pm.
I missed the vacuum discussion on the call today, but I have some questions/comments:
At the very least, I think we should consider making the interlock code have levels (like interrupts on a micro controller). So if the pressure gauges are communicating and are reporting acceptable pressure readings, we should be able to reject unphysical readbacks from the TP controllers.
I still don’t understand why TP2 can’t back TP1, but we just disable all the software interlock conditions contingent on TP2 readbacks. This pump is far newer than TP3, and unless I’ve misunderstood something major about the vacuum infrastructure, I don’t really see why we should trust this flaky serial readbacks for any actionable interlocks, at least without some AND logic (since temperature, current and speed aren’t really independent variables).
I also think we should finally implement the email alert in the event the vacuum interlock is tripped. I can implement this if no one else volunteers.
This might also be a good reminder to get the documentation in order about the new vacuum system.
I replaced an empty N2 cylinder, there are now two empty tanks in the outside rack.
[Jon, Jordan, Koji]
Today Jordan reconfigured the vac system to allow pumping of the main volume resume, with Jon and Koji remotely advising. All clear to resume normal IFO activities. However, the vac system is operating in a temporary configuration that will have to be reverted as we locate replacement components. Details below.
Since serial readback of the TP2 controller seems to be failing, we reconfigured the system with TP3 now backing for TP1. TP2 was valved off (at V4) and shut down until we can replace its controller.
TP3 has its own problems, however. It was valved off in January after its temperature readback began glitching and spuriously triggering the interlocks [ELOG 15140]. However the problem appears to be limited only one readback (rotation speed, current, voltage are fine) and there is enough redundancy in the pump-dependent interlock conditions to safely connect it to the main volume.
We also discovered that sometime since January, the TP3 dry pump has failed. The foreline pressure had risen to 165 torr. Since the TP2 and TP3 dry pumps are not interchangeable (Agilent vs. Varian), we instead valved in the auxiliary dry pump and disconnected the failed dry pump using a KF blank. This is a temporary arrangement until the permanent dry pump can be repaired. Jordan removed it to replace the tip seals and will test it in the bake lab before reinstalling.
With this configuration in place, we proceeded to pump down the main volume without issue (attachment 1). We monitored the pumpdown for about 45 min., until the pressure had reached ~1E-5 torr and TP3 had been transitioned to standby (low-speed) mode.
ITMU01 / ITMU02 as well as the five E1800089 mirrors came back to the 40m. Instead, the two ETM spares (ETMU06 / ETMU08) were delivered to GariLynn.
Jordan worked on transportation.
Note that the E1800089 mirrors are together with the ITM container in the precious optics cabinet.
I will be at the 40m today at 10am to deliver optics to Downs and to replace the TP2 controller.
Didn't mean to sound whiny. I will wait until the vacuum team tells me it is okay.
The vacuum safety policy and design are not clear to me, and I don't know what the first and second defense is. Since we had limited time and bandwidth during the remotely-supported recovery work today, we wanted to work step by step.
The pressure rising rate is 20mtorr/day, and turning on TP3 early next week will resume the main-volume pumping without too much hustle. If you need the IFO time now, contact with Jon and use backing with TP3.
I still don't understand why restoring the vacuum is contingent on this functionality working. All the TPs have their own internal logic to shutdown the pump if some damage threshold is exceeded. Plus, we have the pressure-sensor based interlocks to protect the main volume as well as pumps. While the extra redundancy from the readbacks from the controller is useful, clearly it isn't the first line of defense?
The main volume pressure is currently ~10mTorr. If we pump down before this reaches 500mTorr, the procedure is pretty straightforward. Otherwise, we have to do the dance with the manual throttling valve (judging by current rate of increase, unlikely to exceed this over the weekend, but I lose IFO time).
Obviously I don't want to rush this and have some permanent damage, so I'll stay out of this unless otherwise instructed.
Jon and Koji remotely supported Jordan's resetting the TP2 controller.
From the operator's console in front of the vac rack:
Open a terminal window (click the LXTerminal icon on the desktop)
Type "control" + enter to open the vac controls screen
Toggle all the open valves closed (edit by KA: and manually close RV2 by rotating the gate valve handle )
Turn OFF TP2 by clicking the "Off' button. Make sure the status changes and the rotation speed falls to zero (you'll also hear the pump spinning down)
The other pumps (TP1, TP3) can be left running
Once TP2 has stopped spinning, go to the back of the rack and locate the ethernet cable running from the back of the TP2 controller to the IOLAN server (near the top of the rack). Disconnect and reconnect the cable at each end, verifying it is firmly locked in place.
From the front of the rack, power down the TP2 controller (I don't quite remember for the Agilent, but you might have to move the slider on the front from "Remote" to "Local" first)
Wait about 30 seconds, then power it back on. If you had to move the slider to shut it down, revert it back to the "Remote" position.
Go back to the controls screen on the console. If the pump came back up and is communicating serially again, its status will say something other than "NO COMM"
Turn TP2 back on. Verify that it spins up to its nominal speed (66 kRPM)
At this point you can reopen any valves you initially closed (any that were already closed before, leave closed)
TP2 was stopped and at this moment the glitches were gone. Jordan powercycled the TP2 controller and we brought up the TP2 back at the full speed.
However, the glitches came back as before. Obviously we can't go on from here, and we've decided to stop the recovery process here today.
- We left TP1/2/3 running while the valves including RV2 were closed.
- When Jordan is back in the lab next week, we'll try to use TP3 as the backing of TP1 so that we can resume the main volume pumping.
- Currently, TP3 does not have interlocking and that is a risk. Jon is going to implement it.
- Meanwhile, we will try to replace the controller of TP2. We are supposed to have this in the lab. Ask Chub about the location.
- Once we confirm the stability of the diagnostic signals for TP2, we will come back to the nominal pumping scheme.
I will be in the Clean and Bake lab today from 12pm to 4pm.
1. I agree that it's likely that it was the temp signal glitch.
Recom #2: I approve to reopen the valves to pump down the main volume. As long as there is no frequent glitch, we can just bring the vacuum back to normal with the current software setup.
2. Recom #1 is also reasonable. You can use simple logic like if we register 10 consecutive samples that exceed the threshold, we can activate the interlock. I feel we should still keep the temp interlock. Switching between pumping mode and the normal operation may cause unexpected omission of the interlocks when it is necessary.
3. We should purchase the UPS battery / replacement rotary TIP seal. Once they are in hand, we can stop the vacuum and execute the replacement. Can one person (who?) accomplish everything with some remote help?
4. The lab temp: you mean, 12degC swing with the AC on!?
The pumpspool UPS has its "Replace Battery" indicator light on. Might be a good chance to change the UPS, but at the very least, we should put in fresh batteries (last replacement was in Aug 2017).
I'll say this again - the pumpspool area is noisier than I remember it being, I think one / both of the roughing pumps backing TP2 / TP3 need tip-seal replacements.
BTW, EX is 5C hotter than EY, by virtue of the tarnac outside? In fact, judging by Steve's thermometers, EX reports a 12C swing in 24 hours between 30 C and 18 C (so almost no temperature control) while EY reports a 5C swing between 20 and 25 C. This is borne out by the ETM Oplev data I think...
It looks like the main vacuum interlock was tripped due to a serial communication error from the TP2 controller. With Rana/Koji's permission, I will open V1 and expose the main volume to TP1 again (#2 in last section).
Recommended course of action:
There appears to have been some sort of vacuum failure.
ldas-pcdev1 was down, so the summary pages weren't being generated. I have now switched over to ldas-pcdev6. I suspect some forepump failure, will check up later today unless someone else wants to take care of this.
There was no interlock action, and I don't check the vacuum status every half hour, so there was a period of time last night there was high circulating power in the arm cavities when the main volume pressure was higher than nominal. I have now closed the PSL shutter until the issue is resolved.