I've generated specifications for the new BHD optics. This includes the suspended relay mirrors as well as the breadboard optics (but not the OMCs).
To design the mode-matching telescopes, I updated the BHD mode-matching scripts to reflect Koji's draft layout (Dec. 2019) and used A La Mode to optimize ROCs and positions. Of the relay optics, only a few have an AOI small enough for curvature (astigmatism) and most of those do not have much room to move. This reduced the optimization considerably.
These ROCs should be viewed as a first approximation. Many of the distances I had to eyeball from Koji's drawings. I also used the Gaussian PRC/SRC modes from the current IFO, even though the recycling cavities will both slightly change. I set up a running list of items like these that we still need to resolve in the BHD README.
At a glance, all the specifications can be seen in the optics summary spreadsheet.
The LO beam originates from the PR2 transmission (POP), near ITMX. It is relayed to the BHD beamsplitter (and mode-matched to the OMCs) via the following optical sequence:
The resulting beam profile is shown in Attachment 1.
The AS beam is relayed from the SRM to the BHD beamsplitter (and mode-matched to the OMCs) via the following sequence:
A lens is used because there is not enough room on the BHD breadboard for a pair of (low-AOI) telescope mirrors, like there is in the LO path. The resulting beam profile is shown in Attachment 2.
Nice, and we should also permanently install the camera server (c1cam) which is still sitting on the electronics bench. It is running an adapted version of the Python 2/Debian 8 site code. Maybe if COVID continues long enough I'll get around to making the Python 3 version we've long discussed.
There's this elog from Stephen about better 1064 sensitivity from Basler. We should consider getting one if he finds that its actual SNR is as good as we would expect from the QE improvement.
Hang and I have reanalyzed the BHD telescope designs, with the goal of identifying sufficiently non-degenerate locations for ASC actuation. Given the limited room to reposition optics and the requirement to remain insensitive to small positioning errors, we conclude it is not possible put sufficient Gouy phase separation between the AS1/AS2 and LO1/LO2 locations. However, we can make the current layout work if we instead actuate AS1/AS4 and LO1/LO4. This would require actuating one optic on the breadboard for each relay path. If possible, we believe this offers the simplest solution (i.e., least modification to the current layout).
Radius of curvatures:
For the initial phase of BHD testing, we recently discussed whether the mode-matching telescopes could be built with 100% stock optics. This would allow the optical system to be assembled more quickly and cheaply at a stage when having ultra-low loss and scattering is less important. I've looked into this possibility and conclude that, yes, we do have a good stock optics option. It in fact achieves comprable performance to our optimized custom-curvature design [ELOG 15357]. I think it is certainly sufficient for the initial phase of BHD testing.
It turns out our usual suppliers (e.g., CVI, Edmunds) do not have enough stock options to meet our requirements. This is for two reasons:
However I found that Lambda Research Optics carries 1" and 2" super-polished mirror blanks in an impressive variety of stock curvatures. Even more, they're polished to comprable tolerances as I had specificied for the custom low-scatter optics [DCC E2000296]: irregularity < λ/10 PV, 10-5 scratch-dig, ROC tolerance ±0.5%. They can be coated in-house for 1064 nm to our specifications.
From modeling Lambda's stock curvature options, I find it still possible to achieve mode-matching of 99.9% for the AS beam and 98.6% for the LO beam, if the optics are allowed to move ±1" from their current positions. The sensitivity to the optic positions is slightly increased compared to the custom-curvature design (but by < 1.5x). I have not run the stock designs through Hang's full MC corner-plot analysis which also perturbs the ROCs [ELOG 15339]. However for the early BHD testing, the sensitivity is secondary to the goal of having a quick, cheap implementation.
The following tables show the best telescope designs using stock curvature options. It assumes the optics are free to move ±1" from their current positions. For comparison, the values from the custom-curvature design are also given in parentheses.
The AS relay path is shown in Attachment 1:
The LO relay path is shown in Attachment 2:
I've created a new tab in the BHD procurement spreadsheet ("Stock MM Optics Option") listing the part numbers for the above telescope designs, as well as their fabrication tolerances. The total cost is $2.8k + the cost of the coatings (I'm awaiting a quote from Lambda for the coatings). The good news is that all the curved substrates will receive the same HR/AR coatings, so I believe they can all be done in a single coating run.
MM_total = (MM_vert + MM_horiz) / 2.
The large astigmatic MM loss in the LO case is mainly due to the strong LO4 curvature (R=0.15m) with a 10 deg AOI. I looked again at whether LO1 could be increased from R=5m to the next higher stock value of 7.5m, as this would allow weaker curvatures on LO3 and LO4. However, no, that is not possible---it reduces the LO1-LO2 Gouy phase separation to only 18 deg.
There is, however, a good stock-curvature option if we want to reconsider actuating LO4 instead of LO2 (attachment 1). It achieves 99.2% MM with the OMCs, allowing positions to vary +/-1" from the current design. The LO1-LO4 Gouy phase separation is 72 deg.
Alternatively, we could look at reducing the AOI on LO3 and LO4 (keeping LO1-LO2 actuation).
Hmm? T1300364 suggests MM_total = Sqrt(MM_Vert * MM_Horiz)
I don't think we ever discussed why the angular RMS of the ETMs is so much higher than the ITMs. Maybe that's a separate matter because, even assuming the worst case, the actuation range requirement is
(0.82 μrad RMS) x (15 μrad/μrad) x (10 safety factor) = 0.12 mrad
which is still only order 1% of the pitch/yaw pointing range of the Small Optic Suspensions, according to P1600178 (sec. IV. A). Can we check this requirement off the list?
We computed the required actuation range for the telescope design in elog:15357. The result is summarized in the table below. Here we assume we misalign an IFO mirror by 1 urad, and then compute how many urad do we need to move the (AS1, AS4) or (LO1, LO2) mirrors to simultaneously correct for the two gouy phases.
The most demanding ifo mirrors are the ETMs and the BS, for every 1 urad misalignment the telescope needs to move 10-15 urad to correct for that. However, it is unlikely for those mirrors to move more 100 nrad for a locked ifo with ASC engaged. Thus a few urad actuation should be sufficient. For the recycling mirrors, every 1 urad misalignment also requires ~ 1 urad actuation.
As a result, if we could afford 10 urad actuation range for each telescope suspension, then the gouy phase separations we have should be fine.
We looked at the oplev spectra from gps 1274418500 for 512 sec. This should be a period when the ifo was locked in the PRFPMI state according to elog:15348. We just focused on the yaw data for now. Please see the attached plots. The solid traces are for the ASD, and the dotted ones are the cumulative rms. The total rms for each mirror is also shown in the legend.
I am now confused... The ITMs looked somewhat reasonable in that at least the < 1 Hz motion was suppressed. The total rms is ~ 0.1 urad, which was what I would expect naively (~ x100 times worse than aLIGO).
There seems to be no low-freq suppression on the ETMs though... Is there no arm ASC at the moment???
After further astigmatism/tolerance analysis [ELOG 15380, 15387] our conclusion is that the stock-optic telescope designs [ELOG 15379] are sufficient for the first round of BHD testing. However, for the final BHD hardware we should still plan to procure the custom-curvature optics [DCC E2000296]. The optimized custom-curvature designs are much more error-tolerant and have high probability of achieving < 2% mode-matching loss. The stock-curvature designs can only guarantee about 95% mode-matching.
Below are the final distances between optics in the relay paths. The base set of distances is taken from the 2020-05-21 layout. To minimize the changes required to the CAD model, I was able to achieve near-maximum mode-matching by moving only one optic in each relay path. In the AS path, AS3 moves inwards (towards the BHDBS) by 1.06 cm. In the LO path, LO4 moves backwards (away from the BHDBS) by 3.90 cm.
[Jon, Jordan, Koji]
Today Jordan reconfigured the vac system to allow pumping of the main volume resume, with Jon and Koji remotely advising. All clear to resume normal IFO activities. However, the vac system is operating in a temporary configuration that will have to be reverted as we locate replacement components. Details below.
Since serial readback of the TP2 controller seems to be failing, we reconfigured the system with TP3 now backing for TP1. TP2 was valved off (at V4) and shut down until we can replace its controller.
TP3 has its own problems, however. It was valved off in January after its temperature readback began glitching and spuriously triggering the interlocks [ELOG 15140]. However the problem appears to be limited only one readback (rotation speed, current, voltage are fine) and there is enough redundancy in the pump-dependent interlock conditions to safely connect it to the main volume.
We also discovered that sometime since January, the TP3 dry pump has failed. The foreline pressure had risen to 165 torr. Since the TP2 and TP3 dry pumps are not interchangeable (Agilent vs. Varian), we instead valved in the auxiliary dry pump and disconnected the failed dry pump using a KF blank. This is a temporary arrangement until the permanent dry pump can be repaired. Jordan removed it to replace the tip seals and will test it in the bake lab before reinstalling.
With this configuration in place, we proceeded to pump down the main volume without issue (attachment 1). We monitored the pumpdown for about 45 min., until the pressure had reached ~1E-5 torr and TP3 had been transitioned to standby (low-speed) mode.
Looking at images of the old vac screens, the TP2/3 rotation speed and status string were digitally monitored. However I don't know if there were software interlocks predicated on those.
The temperature and current interlocks are implemented precisely because the pumps can shut themselves off. The concern is not about damaging the pumps (their internal logic protects against that); it's that a pump could automatically shut down and back-vent the IFO to atmosphere. Another interlock (e.g., the pressure differentials) might catch it, but it would depend on the back-vent rate and the scenario has never been tested. The temperature and current interlocks are set to trip just before the pump reaches its internal shut-down threshold.
One way we might be able to reduce our reliance on the flaky serial readbacks is to implement rotation-speed hardware interlocks. The old vac documentation alludes to these, but as far as Chub and I could determine in 2018, they never actually existed. The older turbo controllers, at least, had an analog output proportional to speed which could be used to control a relay to interrupt the V4/5 control signals. I'll look into this for the new controllers. If it could be done, we could likely eliminate the layer of serial-readback interlocks altogether.
That would be awesome if you're willing to volunteer. I agree this would be great to have.
Right, I doubt they were ever recorded or used for interlocks. But the readbacks did work at one point in the past. There's a photo of the old vac monitor screen on p. 19 of E1500239 (last updated 2017) which shows the fields once alive.
I don't disagree that the pressure gauges would register the change. What I'm not sure about is whether the change would violate any of the existing interlock conditions, triggering a shutdown. Looking at what we have now, the only non-pump-related conditions I see that might catch it are the diffpres conditions:
abs(P2 - PTP2) > 1 torr (for a TP2 failure)
abs(P3 - PTP3) > 1 torr (for a TP3 failure)
abs(P1a - P2) > 1 torr (for either a TP2 or TP3 failure)
For the P1a-P2 differential, the threshold of 1 torr is the smallest value that in practice still allows us to pump down the IFO without having to disable the interlocks (P1a-P2 is the TP1 intake/exhaust differential). The purpose of the P2-PTP2/P3-PTP3 differentials is to prevent V4/5 from opening and suddenly exposing the spinning turbo to high pressure. I'm not aware of a real damage threshold calculation that any one has done; I think < 1 torr is lore passed down by Steve.
If a turbo pump fails, the rate it would backstream is unknown (to me, at least) and likely depends on the failure mode. The scenario I'm concerned about is if the backstream rate is slower than the conduction time through the pumspool and into the main volume. In that case, the pressure gauges will rise more or less together all the way up to atmosphere, likely never crossing the 1 torr differential pressure thresholds.
There's already a channel C1:Vac-error_status, where if the value is anything other than an empty string, there is an interlock tripped. Does that work?
I've created a purchase list of hardware needed to restore the aging vacuum system. This wasn't planned as part of the BHD upgrade, but I've added it to the BHD procurement list since hardware replacements have become necessary.
The list proposes replacing the aging TP3 Varian turbo pump with the newer Agilent model which has already replaced TP2. It seems I was mistaken in believing we already had a second Agilent pump on hand. A thorough search of the lab has not turned it up, and Steve himself has told me he doesn't remember ordering a second one. Fortunately Steve did leave us a detailed Agilent parts list [ELOG 14322].
It also proposes replacing the glitching TP2 Agilent controller with a new one. The existing one can be sent back for repair and then retained as a spare. Considering that one of these controllers is already malfunctioning after < 2 years, I think it's a very good idea to have a spare on hand.
Below is our current list of vacuum hardware issues. Items that this purchase list will address (limited to only the most urgent) are highlighted in yellow.
I think we should discuss interlock possibilities at a 40m meeting. I'm reluctant to make the system more complicated, but perhaps we can find ways to reduce the reliance on the turbo pump readbacks. I agree they've proven to be the least reliable.
While we may be able to improve the tolerance to certain kinds of hardware malfunctions (and if so, we should), I don't see interlocks triggering on abnormal behavior of critical equipment as the root problem. As I see it, our bigger problem is with all the malfunctioning, mostly end-of-lifetime pieces of vacuum equipment still in use. If we can address the hardware problems, as I'm trying to do with replacements [ELOG 15412], I think that in itself will make the interlocking much less of an issue.
So why not just have a special mode for the interlock code during pumpdown and venting, and during normal operation we expect the main volume pressure to be <100uTorr so the interlock trips if this condition is violated? These can just be EPICS buttons on the Vac control MEDM screen. Both of these procedures are not "business as usual", and even if we script them in the future, it's likely to have some operator supervising, so I don't think it's unreasonable to have to switch between these modes. I just think the pressure gauges have demonstrated themselves to be much more reliable than these TP serial readbacks (as you say, they worked once upon a time, but that is already evidence of its flakiness?). The Pirani gauges are not ultra-reliable, they have failed in the past, but at least less frequently than this serial comm glitching. In fact, if these readbacks are so flaky, it's not impossible that they don't signal a TP shutdown? I just think the real power of having these multi-channel diagnostics is lost without some AND logic - a turbopump failure is likely to result in an increase in pump current and temperature increase and pump speed decrease, so it's not the individual channel values that should be determining if an interlock is tripped.
Ok, this can be added pretty easily. Its value will just be toggled between 1 and 0 every time the interlock server raises/clears the existing string channel. Adding the channel will require restarting the whole vac IOC, so I'll do it at a time when Jordan is on hand in case something fails to come back up.
It would be better to have a flag channel, might be useful for the summary pages too. I will make it if it is too much trouble.
The vac system is going down at 11 am today for planned maintenance:
We will advise when the work is completed.
This work is finally complete. The dry pump replacement was finished quickly but the controls updates required some substantial debugging.
For one, the mailer code I had been given to install would not run against Python 3.4 on c1vac, the version run by the vac controls since about a year ago. There were some missing dependencies that proved difficult to install (related to Debian Jessie becoming unsupported). I ultimately solved the problem by migrating the whole system to Python 3.5. Getting the Python keyring working within systemd (for email account authentication) also took some time.
Edit: The new interlock flag channel is named C1:Vac-interlock_flag.
Along the way, I discovered why the interlocks had been failing to auto-close the PSL shutter: The interlock was pointed to the channel C1:AUX-PSL_ShutterRqst. During the recent c1psl upgrade, we renamed this channel C1:PSL-PSL_ShutterRqst. This has been fixed.
The main volume is being pumped down, for now still in a TP3-backed configuration. As of 8:30 pm the pressure had fallen back to the upper 1E-6 range. The interlock protection is fully restored. Any time an interlock is triggered in the future, the system will send an immediate notification to 40m mailing list. 👍
I looked into how the new UPS devices suggested by Chub would communicate with the vac interlocks. There are several possible ways, listed in order of preference:
I recommend we proceed with ordering the Tripp Lite 36HW20 for TP1 and Tripp Lite 1AYA6 for TP2 and TP3 (and other 120V electronics). As far as I can tell, the only difference between the two 120V options is that the 6FXN4 model is TAA-compliant.
As suggested last week, Hang and I have reviewed the A+ BHD status (DRD, CDD, and reviewers' comments) and compiled a list of key unanswered questions which could be addressed through Finesse analysis.
In anticipation of others helping with this modeling effort, we've tried to break questions into self-contained projects and estimated their level of difficulty. As you'll see, they range from beginner to Finesse guru.
Here is the procedure for setting up the three new BHD front-ends (c1bhd, c1sus2, c1ioo - replacement). This plan is based on technical advice from Rolf Bork and Keith Thorne.
The overall topology for each machine is shown here. As all our existing front-ends use (obsolete) Dolphin PCIe Gen1 cards for IPC, we have elected to re-use Dolphin Gen1 cards removed from the sites. Different PCIe generations of Dolphin cards cannot be mixed, so the only alternative would be to upgrade every 40m machine. However the drivers for these Gen1 Dolphin cards were last updated in 2016. Consequently, they do not support the latest Linux kernel (4.x) which forces us to install a near-obsolete OS for compatibility (Debian 8).
Chub has placed the order for two new UPS units (115V for TP2/3 and a 220V version for TP1).
They will arrive within the next two weeks.
This year we've struggled with vacuum controls unreliability (e.g., spurious interlock triggers) caused by decaying hardware. Here are details of the vacuum refurbishment plan I described on the 40m call this week.
☑ Refurbish TP2 and TP3 dry pumps. Completed [ELOG 15417].
☑ Automated notifications of interlock-trigger events. Email to 40m list and a new interlock flag channel. Completed [ELOG 15424].
☐ Replace failing UPS.
☐ Remove interlock dependencies on TP2/TP3 serial readbacks. Due to persistent glitching [ELOG 15140, ELOG 15392].
Unlike TP2 and TP3, the TP1 readbacks are real analog signals routed to Acromags. As these have caused us no issues at all, the plan is to eliminate dependence on the TP2/3 digital readbacks in favor of the analog controller outputs. All the digital readback channels will continue to exist, but the interlock system will no longer depend on them. This will require adding 2 new sinking BI channels each for TP2 and TP3 (for a total of 4 new channels). We have 8 open Acromag XT1111 channels in the c1vac system [ELOG 14493], so the new channels can be accommodated. The below table summarizes the proposed changes.
To carry out the next steps of the vac refurbishment plan [ELOG 15499], I've ordered parts necessary for interfacing the UPS units and the analog TP2/3 controller outputs with c1vac. The purchase list is appended to the main BHD list and is located here. Some parts we already had in the boxes of Acromag materials. Jordan is gathering what we do already have and staging it on the vacuum controls console table - please don't move them or put them away.
This afternoon Jordan is going to carry out a test of the V4 and V5 hardware interlocks. To inform the interlock improvement plan , we need to characterize exactly how these work (they pre-date the 2018 upgrade). I have provided him a sequence of steps for each test and will also be backing him up on Zoom.
We will close V1 as a precaution but there should be no other impact to the IFO. The tests are expected to take <1 hour. We will advise when they are completed.
This test has been completed. The IFO configuration has been reverted to nominal.
For future reference: yes, both the V4 and V5 hardware interlocks were found to still be connected and work. A TTL signal from the analog output port of each pump controller (TP2 and TP3) is connected to an auxiliary relay inside the main valve relay box. These serve the purpose of interupting the (Acromag) control signal to the primary V4/5 relay. This interrupt is triggered by each pump's R1 setpoint signal, which is programmed to go low when the rotation speed falls below 80% of the low-speed setting.
That's great news we won't have to worry about a new timing fanout for the two new machines, c1bhd and c1sus2. And there's no plan to change Dolphin IPC drivers. The plan is only to install the same (older) version of the driver on the two new machines and plug into free slots in the existing switch.
The new dolphin eventually helps us. But the installation is an invasive change to the existing system and should be done at the installation stage of the 40m BHD.
The vac system is going down now for planned repairs [ELOG 15499]. It will likely take most of the day. Will advise when it's back up.
Vacuum work is completed. The TP2 and TP3 interlocks have been overhauled as proposed in ELOG 15499 and seem to be performing reliably. We're now back in the nominal system state, with TP2 again backing for TP1 and TP3 pumping the annuli. I'll post the full implementation details in the morning.
I did not get to setting up the new UPS units. That will have to be scheduled for another day.
Yesterday I completed the switchover of small turbo pump interlocks as proposed in ELOG 15499. This overhaul altogether eliminates the dependency on RS232 readbacks, which had become unreliable (glitchy) in both controllers. In their place, the V4(5) valve-close interlocks are now predicated on an analog controller output whose voltage goes high when the rotation speed is >= 80% of the nominal setpoint. The critical speed is 52.8 krpm for TP2 and 40 krpm for TP3. There already exist hardware interlocks of V4(5) using the same signals, which I have also tested.
Unlike the TP1 controller, which exposes simple relays whose open/closed states are sensed by Acromags, what the TP2(3) controllers output is an energized 24V signal for controlling such a relay (output circuit pictured below). I hadn't appreciated this difference and it cost me time yesterday. The ultimate solution was to route the signals through a set of new 24V Phoenix Contact relays installed inside the Acromag chassis. However, this required removing the chassis from the rack and bringing it to the electronics bench (rather than doing the work in situ, as I had planned). The relays are mounted to the second DIN rail opposite the Acromags. Each TP2(3) signal controls the state of a relay, which in turn is sensed using an Acromag XT1111.
The TP2(3) "normal-speed" signals are already in use by hardware interlocks of V4(5). Each signal is routed into the main AC relay box, where it controls an "interrupter" relay through which the Acromag control signal for the main V4(5) relay is passed. These signals are now shared with the digital controls system using a passive DB15 Y-splitter. The signal routing is shown below.
The new turbo-pump-related interlock conditions and their channel predicates are listed below. The full up-to-date channel list and wiring assignments for c1vac are maintained here.
There are two new channels, both of which provide a binary indication of whether the pump speed is outside its nominal range. I did not have enough 24V relays to also add the C1:Vac-TP2(3)_fail channels listed in ELOG 15499. However, these signals are redundant with the existing interlocks, and the existing serial "Status" readback will already print failure messages to the MEDM screens. All of the TP2(3) serial readback channels remain, which monitor voltage, current, operational status, and temperature. The pump on/off and low-speed mode on/off controls remain implemented with serial signals as well.
The new analog readbacks have been added to the MEDM controls screens, circled below:
I'm in the lab this morning to interface the two new UPS units with the digital controls system. Will be out by lunchtime. The disruptions to the vac system should be very brief this time.
I'm leaving the lab shortly. We're not ready to switch over the vac equipment to the new UPS units yet.
The 120V UPS is now running and interfaced to c1vac via a USB cable. The unofficial tripplite python package is able to detect and connect to the unit, but then read queries fail with "OS Error: No data received." The firmware has a different version number from what the developers say is known to be supported.
The 230V UPS is actually not correctly installed. For input power, it has a general type C14 connector which is currently plugged into a 120V power strip. However this unit has to be powered from a 230V outlet. We'll have to identify and buy the correct adapter cable.
With the 120V unit now connected, I can continue to work on interfacing it with python remotely. The next implementation I'm going to try is item #2 of this plan [ELOG 15446].
The vac controls are going down now to pull and test software changes. Will advise when the work is completed.
The vac work is completed. All of the vacuum equipment is now running on the new 120V UPS, except for TP1. The 230V TP1 is still running off wall power, as it always has. After talking with Tripp Lite support today, I believe there is a problem with the 230V UPS. I will post a more detailed note in the morning.
Yesterday's UPS switchover was mostly a success. The new Tripp Lite 120V UPS is fully installed and is communicating with the slow controls system. The interlocks are configured to trigger a controlled shutdown upon an extended power outage (> ~30 s), and they have been tested. All of the 120V pumpspool equipment (the full c1vac/LAN/Acromag system, pressure gauges, valves, and the two small turbo pumps) has been moved to the new UPS. The only piece of equipment which is not 120V is TP1, which is intended to be powered by a separate 230V UPS. However that unit is still not working, and after more investigation and a call to Tripp Lite, I suspect it may be defective. A detailed account of the changes to the system follow below.
Unfortunately, I think I damaged the Hornet (the only working cathode ionization gauge in the main volume) by inadvertently unplugging it while switching over equipment to the new UPS. The electronics are run from multiple daisy-chained power strips in the bottom of the rack and it is difficult to trace where everything goes. After the switchover, the Hornet repeatedly failed to activate (either remotely or manually) with the error "HV fail." Its compatriot, the Pirani SuperBee, also failed about a year ago under similar circumstances (or at least its remote interface did, making it useless for digital monitoring and control). I think we should replace them both, ideally with ones with some built-in protection against power failures.
Four new soft channels per UPS have been created, although the interlocks are currently predicated on only C1:Vac-UPS120V_status.
These new readbacks are visible in the MEDM vacuum control/monitor screens, as circled in Attachment 1:
Yesterday I brought with me a custom power cable for the 230V UPS. It adapts from a 208/120V three-phase outlet (L21-20R) to a standard outlet receptacle (5-15P) which can mate with the UPS's C14 power cable. I installed the cable and confirmed that, at the UPS end, 208V AC was present split-phase (i.e., two hot wires separated 120 deg in phase, each at 120V relative to ground). This failed to power on the unit. Then Jordan showed up and suggested to try powering it instead from a single-phase 240V outlet (L6-20R). However we found that the voltage present at this outlet was exactly the same as what the adapter cable provides: 208V split-phase.
This UPS nominally requires 230V single-phase. I don't understand well enough how the line-noise-isolation electronics work internally, so I can think of three possible explanations:
I called Tripp Lite technical support. They thought the unit should work as powered in the configuration I described, so this leads me to suspect #3.
@Chub and Jordan: Can you please look into somehow replacing this unit, potentially with a U.S.-specific model? Let's stick with the Tripp Lite brand though, as I already have developed the code to interface those.
Unlike our older equipment, which communicates serially with the host via RS232/485, the new UPS units can be connected with a USB 3.0 cable. I found a great open-source package for communicating directly with the UPS from within Python, Network UPS Tools (NUT), which eliminates the dependency on Tripp Lite's proprietary GUI. The package is well documented, supports hundreds of power-management devices, and is available in the Debian package manager from Jessie (Debian 8) up. It consists of a large set of low-level, device-specific drivers which communicate with a "server" running as a systemd service. The NUT server can then be queried using a uniform set of programming commands across a huge number of devices.
I document the full set-up procedure below, as we may want to use this with more USB devices in the future.
First, install the NUT package and its Python binding:
$ sudo apt install nut python-nut
This automatically creates (and starts) a set of systemd processes which expectedly fail, since we have not yet set up the config. files defining our USB devices. Stop these services, delete their default definitions, and replace them with the modified definitions from the vacuum git repo:
$ sudo systemctl stop nut-*.service
$ sudo rm /lib/systemd/system/nut-*.service
$ sudo cp /opt/target/services/nut-*.service /etc/systemd/system
$ sudo systemctl daemon-reload
Next copy the NUT config. files from the vacuum git repo to the appropriate system location (this will overwrite the existing default ones). Note that the file ups.conf defines the UPS device(s) connected to the system, so for setups other than c1vac it will need to be edited accordingly.
$ sudo cp /opt/target/services/nut/* /etc/nut
Now we are ready to start the NUT server, and then enable it to automatically start after reboots:
$ sudo systemctl start nut-server.service
$ sudo systemctl enable nut-server.service
If it succeeds, the start command will return without printing any output to the terminal. We can test the server by querying all the available UPS parameters with
$ upsc 120v
which will print to the terminal screen something like
device.mfr: Tripp Lite
device.model: Tripp Lite UPS
driver.version.data: TrippLite HID 0.81
ups.mfr: Tripp Lite
ups.model: Tripp Lite UPS
Here 120v is the name assigned to the 120V UPS device in the ups.conf file, so it will vary for setups on other systems.
If all succeeds to this point, what we have set up so far is a set of command-line tools for querying (and possibly controlling) the UPS units. To access this functionality from within Python scripts, a set of official Python bindings are provided by the python-nut package. However, at the time of writing, these bindings only exist for Python 2.7. For Python 3 applications (like the vacuum system), I have created a Python 3 translation which is included in the vacuum git repo. Refer to the UPS readout script for an illustration of its usage.
Now that the old APC Smart-UPS 2200 is no longer in use by the vacuum system, I looked into whether it can be repurposed for the framebuilder machine. Yes, it can. The max power consumption of the framebuilder (a SunFire X4600) is 1.137kW. With fresh batteries, I estimate this UPS can power the framebuilder for >10 min. and possibly as long as 30 min., depending on the exact load.
@Chub/Jordan, this UPS is ready to be moved to rack 1X6/1X7. It just has to be disconnected from the wall outlet. All of the equipment it was previously powering has been moved to the new UPS. I have ordered a replacement battery (APC #RBC43) which is scheduled to arrive 9/09-11.
On Friday, I grabbed the Zurich Instruments HF2LI lock-in amplifier and brought it home. As time permits, I will work towards developing a similar readout script as we have for the SR785.
As promised some time ago, I've obtained input noise spectra from the sites calibrated to physical units. They are located in a new subdirectory of the BHD repo: A+/input_noises. I've heavily annotated the notebook that generates them (input_noises.ipynb) with aLOG references, to make it transparent what filters, calibrations, etc. were applied and when the data were taken. Each noise term is stored as a separate HDF5 file, which are all tracked via git LFS.
So far there are measurements of the following sources:
These can be used, for example, to make more realistic Hang's bilinear noise modeling [ELOG 15503] and Yehonathan's Monte Carlo simulations [ELOG 15539]. Let me know if there are other specific noises of interest and I will try to acquire them. It's a bit time-consuming to search out individual channel calibrations, so I will have to add them on a case-by-case basis.
Assembled is the list of dead pressure gauges. Their locations are also circled in Attachment 1.
For replacements, I recommend we consider the Agilent FRG-700 Pirani Inverted Magnetron Gauge. It uses dual sensing techniques to cover a broad pressure range from 3e-9 torr to atmosphere in a single unit. Although these are more expensive, I think we would net save money by not having to purchase two separate gauges (Pirani + hot/cold cathode) for each location. It would also simplify the digital controls and interlocking to have a streamlined set of pressure readbacks.
For controllers, there are two options with either serial RS232/485 or Ethernet outputs. We probably want the Agilent XGS-600, as it can handle all the gauges in our system (up to 12) in a single controller and no new software development is needed to interface it with the slow controls.
Now that the new Agilent full-range gauges (FRGs) have been received, I'm putting together an installation plan. Since my last planning note in Sept. (ELOG 15577), two more gauges appear to be malfunctioning: CC2 and PAN. Those are taken into account, as well. Below are the proposed changes for all the sensors in the system.
Update to the gauge replacement plan (15692), based on Jordan's walk-through today. He confirmed:
Based on this info (and also info from Gautam that the PAN gauge is still working), I've updated the plan as follows. In summary, I now propose we install the fifth FRG in the TP1 foreline (PTP1 location) and leave P2 and P3 where they are, as they are no longer needed elsewhere. Any comments on this plan? I plan to order all the necessary gaskets, blanks, etc. tomorrow.
I've investigated the vacuum controls failure that occurred last night. Here's what I believe happened.
From looking at the system logs, it's clear that there was a sudden loss of power to the control computer (c1vac). Also, the system was actually down for several hours. The syslog shows normal EPICS channel writes (pressure readback updates, etc., and many of them per minute) which suddenly stop at 4:12 pm. There are no error or shutdown messages in the syslog or in the interlock log. The next activity is the normal start-up messaging at 7:39 pm. So this is all consistent with the UPS suddenly failing.
According to the Tripp Lite manual, the FAULT icon indicates "the battery-supported outlets are overloaded." The failure of the TP2 dry pump appears to have caused this. After the dry pump failure, the rising pressure in the TP2 foreline caused TP2's current draw to increase way above its normal operating range. Attachment 1 shows anomalously high TP2 current and foreline pressure in the minutes just before the failure. The critical system-wide failure is that this overloaded the UPS before overloading TP2's internal protection circuitry, which would have shut down the pump, triggering interlocks and auto-notifications.
First, there are too many electronics on the 1 kVA UPS. The reason I asked us to buy a dual 208/120V UPS (which we did buy) is to relieve the smaller 120V UPS. I envision moving the turbo pumps, gauge controllers, etc. all to the 5 kVA unit and reserving the smaller 1 kVA unit for the c1vac computer and its peripherals. We now have the dual 208/120V UPS in hand. We should make it a priority to get that installed.
Second, there are 1 Hz "blinker" channels exposed for c1vac and all the slow controls machines, each reporting the machine's alive status. I don't think they're being monitored by any auto-notification program (running on a central machine), but they could be. Maybe there already exists code that could be co-opted for this purpose? There is an MEDM screen displaying the slow machine statuses at Sitemap > CDS > SLOW CONTROLS STATUS, pictured in Attachment 2. This is the only way I know to catch sudden failures of the control computer itself.
As requested, I placed an order for an assortment of new RF cables: SMA male-male, RG405.
They're expected to arrive mid next week.
Attached is the layout for the "intermediate" CDS upgrade option, as was discussed on Wednesday. Under this plan:
Existing FEs stay where they are (they are not moved to a single rack)
Dolphin IPC remains PCIe Gen 1
RFM network is entirely replaced with Dolphin IPC
Please send me any omissions or corrections to the layout.
I re-ordered the below cables, this time going with flexible, double-shielded RG316-DS. Jordan will pick up and return the RG-405 cables after the holidays.
That's fine, we didn't actually request those. We bought and already have in hand new PCIe x4 cables for the chassis-host connection. They're 3 m copper cables, which was based on the assumption of the time that host and chassis would be installed in the same rack.
Koji asked me assemble a detailed breakdown of the parts received from LHO, which I do based on the high-res photos that Gautam posted of the shipment.
Also, I looked into the mix-up regarding the number of PCIe slots in the new Supermicro servers. The motherboard actually has six PCIe slots and is on the CDS list of boards known to be compatible. The mistake (mine) was in selecting a low-profile (1U) chassis that only exposes one of these slots. But at least it's not a fundamental limitation.
One option is to install an external PCIe expansion chassis that would be rack-mounted right above the FE. It is automatically configured by the system BIOS, so doesn't require any special drivers. It also supports hot-swapping of PCIe cards. There are also cheap ribbon-cable riser cards that would allow more cards to be connected for testing, although this is not as great for permanent mounting.
It may still be better to use the machines offered by Keith Thorne from LLO, as they're more powerful anyway. But if there is going to be an extended delay before those can be received, we should be able to use the machines we already have in conjunction with one of these PCIe expansion options.
Indeed T1800302 is the document I was alluding to, but I completely missed the statement about >3 GHz speed. There is an option for 3.4 GHz processors on the X10SRi-F board, but back in 2019 I chose against it because it would double the cost of the systems. At the time I thought I had saved us $5k. Hopefully we can get the LLO machines in the near term---but if not, I wonder if it's worth testing one of these to see whether the performance is tolerable.
Can you please provide a link to this "list of boards"? The only document I can find is T1800302....
I confirm that PCIe 2.0 motherboards are backwards compatible with PCIe 1.x cards, so there's no hardware issue. My main concern is whether the obsolete Dolphin drivers (requiring linux kernel <=3.x) will work on a new system, albeit one running Debian 8. The OSS PCIe card is automatically configured by the BIOS, so no external drivers are required for that one.
Please also confirm that there are no conflicts w.r.t. the generation of PCIe slots, and the interfaces (Dolphin, OSSI) we are planning to use - the new machines we have are "PCIe 2.0" (though i have no idea if this is the same as Gen 2).
I've produced updated diagrams of the CDS layout, taking the comments in 15476 into account. I've also converted the 40m's diagrams from Omnigraffle ($150/license) to the free, cloud-based platform draw.io. I had never heard of draw.io, but I found that it has most all the same functionality. It also integrates nicely with Google Drive.
Attachment 1: The planned CDS upgrade (2 new FEs, fully replace RFM network with Gen 1 Dolphin IPC)
Attachment 2: The current 40m CDS topology
The most up-to-date diagrams are hosted at the following links:
Please send me any further corrections or omissions. Anyone logged in with LIGO.ORG credentials can also directly edit the diagrams.
I've started writing up a rough testing sequence for getting the three new front-ends operational (c1bhd, c1sus2, c1ioo). Since I anticipate this plan undergoing many updates, I've set it up as a Google doc which everyone can edit (log in with LIGO.ORG credentials).
Link to planning document
Please have a look and add any more tests, details, or concerns. I will continue adding to it as I read up on CDS documentation.
Today I moved the c1bhd machine from the control room to a new test area set up behind (west of) the 1X6 rack. The test stand is pictured in Attachment 1. I assembled one of the new IO chassis and connected it to the host.
The chassis was then powered on and LED lights illuminated indicating that all the components have power. The assembled chassis is pictured in Attachment 2.
Following the procedure outlined T1900700, the system failed the very first test of the communications link between chassis and host, which is to check that all PCIe cards installed in both the host and the expansion chassis are detected. The Dolpin host adapter card is detected:
07:06.0 PCI bridge: Stargen Inc. Device 0102 (rev 02) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=07, secondary=0e, subordinate=0e, sec-latency=0
I/O behind bridge: 00002000-00002fff
Prefetchable memory behind bridge: 00000000c0200000-00000000c03fffff
Capabilities:  Power Management version 2
Capabilities:  MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities:  Express Downstream Port (Slot+), MSI 00
Capabilities:  Subsystem: Device 0000:0000
Kernel driver in use: pcieport
However the OSS PCIe adapter card linking the host to the IO chassis was not detected, nor were any of the cards in the expansion chassis. Gautam previously reported that the OSS card was not detected by the host (though it was not connected to the chassis then). Even now connected to the IO chassis, the card is still not detected. On the chassis-side OSS card, there is a red LED illuminated indicating "HOST CARD RESET" as pictured in Attachment 3. This may indicate a problem with the card on the host side. Still more debugging to be done.
Today I continued with assembly and testing of the new front-ends. The main progress is that the IO chassis is now communicating with the host, resolving the previously reported issue.
Unfortunately, though, it turns out one of the two (host-side) One Stop Systems PCIe cards sent from Hanford is bad. After some investigation, I ultimately resolved the problem by swapping in the second card, with no other changes. I'll try to procure another from Keith Thorne, along with some spares.
Also, two of the three switching power supplies sent from Livingston (250W Channel Well PSG400P-89) appear to be incompatible with the Trenton BPX6806 PCIe backplanes in these chassis. The power supply cable has 20 conductors and the connector on the board has 24. The third supply, a 650W Antec EA-650, does have the correct cable and is currently powering one of the IO chassis. I'll confirm this situation with Keith and see whether they have any more Antecs. If not, I think these supplies can still be bought (not obsolete).
I've gone through all the hardware we've received, checked against the procurement spreadsheet. There are still some missing items:
Once the PCIe communications link between host and IO chassis was working, I carried out the testing procedure outlined in T1900700. This performs a series checks to confirm basic operation/compatibility of the hardware and PCIe drivers. All of the cards installed in both the host and the expansion chassis are detected and appear correctly configured, according to T1900700. In the below tree, there is one ADC, one 16-ch DIO, one 32-ch DO, and one DolphinDX card:
+-05.0-[05-20]----00.0-[06-20]--+-00.0-[07-08]----00.0-----00.0 Contec Co., Ltd Device 86e2
| | +-03.0-[0e]--
| | +-04.0-[0f]--
| | +-06.0-[10-11]----00.0-----04.0 PLX Technology, Inc. PCI9056 32-bit 66MHz PCI <-> IOBus Bridge
| | +-07.0---
| | +-08.0---
| | +-0a.0---
| | \-0b.0---
| +-0a.0-[1e-1f]----00.0-[1f]----00.0 Contec Co., Ltd Device 8632
\-08.0-[21-2a]--+-00.0 Stargen Inc. Device 0101
Before I start building/testing RTCDS models, I'd like to move the new front ends to an isolated subnet. This is guaranteed to prevent any contention with the current system, or inadvertent changes to it.
Today I set up another of the Supermicro servers sent by Livingston in the 1X6 test stand area. The intention is for this machine to run a cloned, bootable image of the current fb1 system, allowing it to function as a bootserver and DAQ server for the FEs on the subnet.
However, this hard disk containing the fb1 image appears to be corrupted and will not boot. It seems to have been sitting disconnected in a box since ~2018, which is not a stable way to store data long term. I wasn't immediately able to recover the disk using fsck. I could spend some more time trying, but it might be most time-effective to just make a new clone of the fb1 system as it is now.
Some progress today towards setting up an isolated subnet for testing the new front-ends. I was able to recover the fb1 backup disk using the Rescatux disk-rescue utility and successfully booted an fb1 clone on the subnet. This machine will function as the boot server and DAQ server for the front-ends under test. (None of these machines are connected to the Martian network or, currently, even the outside Internet.)
Despite the success with the framebuilder, front-ends cannot yet be booted locally because we are still missing the DHCP and FTP servers required for network booting. On the Martian net, these processes are running not on fb1 but on chiara. And to be able to compile and run models later in the testing, we will need the contents of the /opt/rtcds directory also hosted on chiara.
For these reasons, I think it will be easiest to create another clone of chiara to run on the subnet. There is a backup disk of chiara and I attempted to boot it on one of the LLO front-ends, but without success. The repair tool I used to recover the fb1 disk does not find a problem with the chiara disk. However, the chiara disk is an external USB drive, so I suspect there could be a compatibility problem with these old (~2010) machines. Some of them don't even recognize USB keyboards pre-boot-up. I may try booting the USB drive from a newer computer.
Edit: I removed one of the new, unused Supermicros from the 1Y2 rack and set it up in the test stand. This newer machine is able to boot the chiara USB disk without issue. Next time I'll continue with the networking setup.