The service had failed at 16:09 yesterday. I just restarted it and am now able to fetch data again.
Unrelated to this work: I restarted the httpd service on nodus a couple of times this afternoon while experimenting with the summary pages.
Please try it out and tell me about any problems in getting fresh data.
To test a hypothesis, I have left the PSL shutter closed. I notice significant glitches in the dark electronics offsets on all the 11 MHz photodiode I/Q demodulated input channels, which appear coherent. These are non-negligible in magnitude - for now they are uncalibrated in cts, but for an estimate, the POX11 channel shows a shift of ~20 cts (~200uV at the input to the whitening board), while the PDH fringe is ~200 cts pk2pk. A first look is in Attachment #1. The fact that it's in all the 11 MHz channels makes me suspect something in the RF chain, maybe some amplifier? I'll open the shutter tomorrow.
A big factor in how much IFO locking activities can take place is how cooperative the IMC is.
Since the c1psl upgrade, the IMC duty cycle has definitely deteriorated. I took a measurement of the dark noise at the IMC error point with 1 Hz FFT binwidth, with all electrical connections to the IMC servo board except the Acromag and Eurocrate power disconnected. I was horrified at the prominence of 60 Hz harmonics - see Attachment #1. In the past, this kind of feature has been indicative of some error in the measurement technique - but I confirmed that the lines remain even if I unplug the GPIB box, and all combinations of floating/grounded inputs that I tried. We know for sure that there is some excess noise imprinted on the laser light post upgrade. While these lines almost certainly are not responsible for the PCdrive RMS going bonkers, surely this kind of electrical situation isn't good?
Attachment #2 shows the same information translated to frequency noise units, taking into account the complementary sensitivity function, L/(1+L) - the sum contribution of the 60 Hz peaks to the RMS is ~11.5% of the total over the entire band (c.f. 1.7 % that is expected if the noise at multiples of 60 Hz was approximately equal to the surrounding noise levels). Moreover, the measured RMS is 55 times higher than a LISO model.
How can this be fixed?
Provided the IMC is cooperative, the input pointing isn't drifting, and the RF offsets aren't jumping around too much, the locking sequence is now pretty robust.
Most of the analysis uses data between the GPS times 1274418176 and 1274419654 that are recorded to frames.
Here, I provide some details of the sequence. Obviously, I am presenting one of the quickest transitions to the fully locked state, I don't claim that every attempt is so smooth. But it is pretty cool that the whole thing can be done in ~3 minutes.
See Attachment #1 for the labels.
This particular lock held for ~20 minutes so I could run some loop characterization measurements etc.
I am struggling to explain:
In order to estimate the free-running DARM displacement noise, I measured the DARM OLTF using the usual IN1/IN2 prescription. The measured data was then used to fit some model paramters for a loop model that can be used over a larger frequency range.
In summary, the UGF is ~150 Hz and phase margin is ~30 deg. This loop would probably benefit from some low-pass filter being turned on.
I am able to realize ~8 kHz UGF with ~60 degrees of phase margin on the CARM loop OLTF (combination of analog and digital signal paths).
The response of the PRFPMI length degrees of freedom as measured in the LSC PDs was characterized. Two visualizations are in Attachment #1 and Attachment #2.
This isn't meant to be a serious budget, mainly it was to force myself to write the code for generating this more easily in the future.
The measured RIN of the arm cavity transmission when the PRFPMI is locked is ~10x in RMS relative to the single arm POX/POY lock. It is not yet clear to me where the excess is coming from.
Attachment #1 shows the comparison.
I looked at some DC signals for the buildup of the carrier and sideband fields in various places. The results are shown in Attachments #1 and #2.
I agree, I think the PRC excess angular motion, PIT in particular, is a dominant contributor to the RIN. Attachments #1-#3 support this hypothesis. In these plots, "XARM" should really read "COMM" and "YARM" should really read "DIFF", because the error signals from the two end QPDs are mixed to generate the PIT and YAW error signals for these ASC servos - this is some channel renaming that will have to be done on the ASC model. The fact that the scatter plot between these DoFs has some ellipticity probably means the basis transformation isn't exactly right, because if they were truly orthogonal, we would expect them to be uncorrelated?
I guess what this means is that the stability of the lock could be improved by turning on some POP QPD based feedback control, I'll give it a shot.
- PRC TT misalignment (~3Hz)
Don't can you check the correlation between the POP QPD and the arm RIN
There were many locklosses from the point where the arm powers were somewhat stabilized. Attachments #1 and #2 show two individual locklosses. I think what is happening here is that the BS seismometer X channel is glitching, and creating a transient in the angular feedforward filter that blows the lock. The POP QPD based feedback loop cannot suppress this transient, apparently. For now, I get around this problem by boosting the POP QPD feedback loop a bit, and then turning the feedforward filters off. The fact that the other seismometer channels don't report any transient makes me think the problem is either with the seismometer itself, or the readout electronics. The seismometer masses were recently recentered, so I'm leaning towards the latter.
I didn't explicitly check the data, but I am reasonably certain the same effect is responsible for many PRMI locklosses even with the arms held off resonance (though the tolerance to excursions there is higher). Pity really, the feedforward filters were a big help in the lock acquisition...
The CARM loop now has a UGF of ~12 kHz with a phase margin of ~60 degrees. These values of conventional stability indicators are good. The CARM optical gain that best fits the measurements is 9 MW/m.
I've been working on understanding the loop better, here are the notes.
Attachment #1 shows a block diagram of the loop topology.
Attachment #2 shows the OLGs of the two actuation paths.
Attachment #3 and #4 show the model, overlaid with measurements of the loop OLG and crossover TF respectively.
Attachment #5 shows the evolution of the CARM OLG at a few points in the lock acquisition sequence.
Now the I have a model I believe, I need to think about whether there is any benefit to changing some of these loop shapes. I've already raised the possibility of changing the shape of the boosts on the CM board, with which we could get a bit more suppression in the 100 Hz - 1kHz region (noise budget of laser frequency noise --> DARM required to see if this is necessary).
Attachments #1 and Attachments #2 are in the style of elog15356, but with data from a more recent lock. It'd be nice to calibrate the ASDC channel (and in general all channels) into power units, so we have an estimate of how much sideband power we expect, and the rest can be attributed to carrier leakage to ASDC.
On the basis of Attachments #1, the PRG is ~19, and at times, the arm transmission goes even higher. I'd say we are now in the regime where the uncertainty of the losses in the recycling cavity - maybe beamsplitter clipping? is important in using this info to try and constrain the arm cavity losses. I'm also not sure what to make of the asymmetry between TRX and TRY. Allegedly, the Y arm is supposed to be lossier.
This is very interesting. Do you have the ASDC vs PRG (~ TRXor TRY) plot? That gives you insight on what is the cause of the low recycling gain.
I implemented an ASC servo for the PRC, with the POP QPD as a sensor, and the PRM as the actuator. This has improved the stability of the lock (longer locks are possible), and also reduced the RIN of the arm transmission.
Attachment #1 shows the in-loop error signal suppression, and some out-of-loop monitors (POP22 and POPDC).
Attachment #2 compares the arm transmission RIN with the PRFPMI locked, with and without PRC ASC. The 3 Hz bump is definitely squished, but I think we can do better yet.
Attachments #3-5 are in the style of elog15361. No Oplev signals yet, I'll add them soon.
I guess what this means is that the stability of the lock could be improved by turning on some POP QPD based feedback control, I'll give it a shot
The 40m summary pages have been revived. I've not had to make any manual interventions in the last 5 days, so things seem somewhat stable, but I'm sure there will need to be multiple tweaks made. The primary use of the pages right now are for vacuum, seismic and PSL diagnostics.
For these initial attempts, I was just trying to transition MICH to REFL55Q. I agree, the PRCL situation may be more complicated.
Which 1f signals are you going to use? PRCL has sign flipping at the carrier critical coupling. So if the IFO is close to that condition, 1f PRCL suffers from the sign flipping or large gain variation.
I am inclined to believe that the arm cavity losses are such that the IFO is overcoupled. Some calculations, validated with Finesse modeling also suggest that there isn't a sign change for the CARM error signal when the IFO goes from being undercoupled to overcoupled, but I may have made a mistake here?
Thoughts from others?
This EQ seems to have knocked all suspensions out. ITMX was stuck. It is now released, and the IMC is locked again. It looks like there are some serious aftershocks going on so let's keep an eye on things.
I found that there is an issue with the MC1 slow bias voltages.
I usually offload the DC part of the output voltage from the WFS servos to the slow bias voltage sliders, so as to preserve maximum actuation range from the fast system. However, today, I found that this servo wasn't working well at all. So I dug a little deeper. Looking at the EPICS database records:
Around 5pm local time, the three vertex FEs crashed. AFAIK, no one was in the lab or working on anything CDS related, so this is worrying.
There appears to have been some sort of vacuum failure.
ldas-pcdev1 was down, so the summary pages weren't being generated. I have now switched over to ldas-pcdev6. I suspect some forepump failure, will check up later today unless someone else wants to take care of this.
There was no interlock action, and I don't check the vacuum status every half hour, so there was a period of time last night there was high circulating power in the arm cavities when the main volume pressure was higher than nominal. I have now closed the PSL shutter until the issue is resolved.
It looks like the main vacuum interlock was tripped due to a serial communication error from the TP2 controller. With Rana/Koji's permission, I will open V1 and expose the main volume to TP1 again (#2 in last section).
Recommended course of action:
The pumpspool UPS has its "Replace Battery" indicator light on. Might be a good chance to change the UPS, but at the very least, we should put in fresh batteries (last replacement was in Aug 2017).
I'll say this again - the pumpspool area is noisier than I remember it being, I think one / both of the roughing pumps backing TP2 / TP3 need tip-seal replacements.
BTW, EX is 5C hotter than EY, by virtue of the tarnac outside? In fact, judging by Steve's thermometers, EX reports a 12C swing in 24 hours between 30 C and 18 C (so almost no temperature control) while EY reports a 5C swing between 20 and 25 C. This is borne out by the ETM Oplev data I think...
I still don't understand why restoring the vacuum is contingent on this functionality working. All the TPs have their own internal logic to shutdown the pump if some damage threshold is exceeded. Plus, we have the pressure-sensor based interlocks to protect the main volume as well as pumps. While the extra redundancy from the readbacks from the controller is useful, clearly it isn't the first line of defense?
The main volume pressure is currently ~10mTorr. If we pump down before this reaches 500mTorr, the procedure is pretty straightforward. Otherwise, we have to do the dance with the manual throttling valve (judging by current rate of increase, unlikely to exceed this over the weekend, but I lose IFO time).
Obviously I don't want to rush this and have some permanent damage, so I'll stay out of this unless otherwise instructed.
Didn't mean to sound whiny. I will wait until the vacuum team tells me it is okay.
The vacuum safety policy and design are not clear to me, and I don't know what the first and second defense is. Since we had limited time and bandwidth during the remotely-supported recovery work today, we wanted to work step by step.
The pressure rising rate is 20mtorr/day, and turning on TP3 early next week will resume the main-volume pumping without too much hustle. If you need the IFO time now, contact with Jon and use backing with TP3.
I missed the vacuum discussion on the call today, but I have some questions/comments:
At the very least, I think we should consider making the interlock code have levels (like interrupts on a micro controller). So if the pressure gauges are communicating and are reporting acceptable pressure readings, we should be able to reject unphysical readbacks from the TP controllers.
I still don’t understand why TP2 can’t back TP1, but we just disable all the software interlock conditions contingent on TP2 readbacks. This pump is far newer than TP3, and unless I’ve misunderstood something major about the vacuum infrastructure, I don’t really see why we should trust this flaky serial readbacks for any actionable interlocks, at least without some AND logic (since temperature, current and speed aren’t really independent variables).
I also think we should finally implement the email alert in the event the vacuum interlock is tripped. I can implement this if no one else volunteers.
This might also be a good reminder to get the documentation in order about the new vacuum system.
I agree there were MEDM fields, but I can't find any record of these channels being recorded till 2018 December, so I don't agree that they were being digitally monitored. You can also look back in the elog (e.g. here and here) and see that the display fields are just blank. I would then assume that no interlocks were dependent on these channels, because otherwise the vacuum interlocks would be perpetually tripped.
Sorry but I'm having trouble imagining a scenario how the pressure gauges wouldn't register this before the IFO volume is compromised. Is there some back of the envelope calculations I can do to understand this? Since both the pressure gauges and the TP diagnostic channels are being monitored via EPICS, the refresh rate is similar, so I don't see how we can have a pump temperature / speed / current threshold tripped but NOT have this be registered on all the pressure gauges, seems like a bit of a contrived scenario to me. Our thresholds currently seem to be arbitrary numbers anyway, or are they based on some expected backstreaming rate? Isn't this scenario degenerate with a leak elsewhere in the vacuum envelope that would be caught by the differential pressure interlocks?
For the email alert, can you expose a soft channel that is a flag - if this flag is not 1, then the service will send out an email.
So why not just have a special mode for the interlock code during pumpdown and venting, and during normal operation we expect the main volume pressure to be <100uTorr so the interlock trips if this condition is violated? These can just be EPICS buttons on the Vac control MEDM screen. Both of these procedures are not "business as usual", and even if we script them in the future, it's likely to have some operator supervising, so I don't think it's unreasonable to have to switch between these modes. I just think the pressure gauges have demonstrated themselves to be much more reliable than these TP serial readbacks (as you say, they worked once upon a time, but that is already evidence of its flakiness?). The Pirani gauges are not ultra-reliable, they have failed in the past, but at least less frequently than this serial comm glitching. In fact, if these readbacks are so flaky, it's not impossible that they don't signal a TP shutdown? I just think the real power of having these multi-channel diagnostics is lost without some AND logic - a turbopump failure is likely to result in an increase in pump current and temperature increase and pump speed decrease, so it's not the individual channel values that should be determining if an interlock is tripped.
I definitely think that protecting the vacuum envelope is a priority - but I don't think it should be at the expense of commissioning time. But if you think these extra interlocks are essential to the safety of the vacuum system, I withdraw my request.
It would be better to have a flag channel, might be useful for the summary pages too. I will make it if it is too much trouble.
For this particular email service, ideally the email should be sent out as soon as the interlock is tripped, so this would require a line of code to be added to the main interlock code. Which I guess would require a restart of the interlock service. So let me know when you guys plan to do the dry-pump tip seal replacement operation (when I presume valves will be closed anyways) so that we can do this in a minimally invasive way.
Ok, this can be added pretty easily. Its value will just be toggled between 1 and 0 every time the interlock server raises/clears the existing string channel. Adding the channel will require restarting the whole vac IOC, so I'll do it at a time when Jordan is on hand in case something fails to come back up.
In ELOG 15368, I had claimed that the POP QPD based feedback servo actuating on the PRM stabilized the lock. I now believe this scheme of sensing using the POP QPD and feeding back to the PRM is not a good topology for stabilizing the PRC angular motion.
I would also like to bring up the topic of implementing some WFS for the interferometer fields again, there doesn't seem to be any mention of this in the procurement/planning for the BHD. It is not obvious to me yet that we need WFS and not just DC QPDs from a noise point of view, but at least we should discuss this.
I want some input about what the short-term (next two weeks) commissioning goals should be.
Before the vacuum fracas, the locking was pretty robust. With some human servoing of the input beam, I could maintain locks for ~1 hour. My primary goals were:
I didn't succeed in either so far.
I guess apart from this, we want to run the ALS scan to try and infer something about the absorption-induced thermal lens. I guess at this point, the costs outweigh the benefits in trying to bring in the SRC as well, since we will be changing the SRC config?
The PSL shutter was closed from the vacuum interlock trip. Today, I did the following:
All looks good for now. I will probably get back to PRFPMI locking Monday.
The machine needed a hard reboot as it was un-ssh-able.
The exact time that the machine went down is unknown because the blinkys were not DQ-ed. I've now added these to the EDCU to make these channels actually useful, and we may look back on the reliability (or otherwise) of the Acromag system. To my memory, this is the ~5th time one of the new Acromag servers has needed a hard reboot. While this may be less frequent (?) than the VME machines, perhaps there is some other reason for these dropouts. Maybe something to do with the martian network?
Anyway the machine is back up and running now.
Per the discussion at the meeting today, the plan of action is:
If I missed something, please add here.
This earthquake tripped all suspensions and ITMX got stuck. The watchdogs were restored and the stuck optic was released. The IFO was re-aligned, POX/POY and PRMI on carrier locking all work okay.
I implemented this change today. We only had 100 ohm, 3W resistors in stock (no 200 ohm with adequate power rating). Assuming 10 V is dropped across this resistor, the power dissipation is V^2/R ~ 1 W, so we should have sufficient margin. DCC entry has been updated with new schematic and photo of the component side of the board. Note that the series resistance of the fast actuation path was untouched.
As expected, the requested voltage no longer exceeds the Acromag DAC range, it is now more like 2.5 V. However, I still notice that the MC REFL spot moves somewhat diagonally on the camera image - so maybe the coil gains are seriously imbalanced? Anyway, the WFS control signals can once again be safely offloaded to the slow bias voltages once again, preserving the fast ADC range for other actuation.
The Johnson noise of the series resistor has now increased by a factor of 2, from ~6.4 pA/rtHz to 12.8 pA/rtHz. Assuming a current to force coefficient of 1.6 mN/A per coil, the length noise of the cavity is expected to be 12.8e-12 * 0.064/0.25/(2*pi*100)^2 ~ 8e-18 m/rtHz at 100 Hz. In frequency units, this is 80 uHz/rtHz. I think our IMC noise is at least 10 times higher than this at 100 Hz (in any case, the noise of the coil driver is NOT dominated by the series resistance). Attachment #1 confirms that there isn't any significant MCF noise increase, and I will check with the arm cavity too. Nevertheless, we should, if possible, align the optic better and use as high a series resistance as possible.
The watchdog for MC1 was disabled and the board was pulled out for this work. After it was replaced, the IMC re-locks readily.
But this does not solve the MC1 issue. Only we can do right now is to make the output resister half, for example.
While the vacuum system was knocked out, I measured the RF transimpedance (using the AM laser setup, didn't do the shot noise intercept current measurement for now) of all the RFPDs (except PMC REFL). At the very least, the following photodiodes are suspect:
For the remaining photodiodes, I measure a transimpedance that is within ~20% of what is on the wiki page. The notches may benefit from some retuning. While I have the data, I will fit this and post a more complete report on the wiki.
Update July 6 1145am: WFS response plots now have legends mapping quadrants, and I've also added the response of a spare PDA10CF (which is now the new POP22/POP110 photodiode).
Judging by the summary pages, some 18 hours after this change was made and the board re-installed, the MC1 shadow sensors began to report frequent glitches. I can't think of a plausible causal connection, especially given the 18 hour time lag, but also hard to believe there isn't one? As a result, the IMC is no longer able to stay locked for extended periods of time. I did the usual cable squishing, and also took off the lid to see if that helps the situation.
While the reduced series resistance means there is more current flowing through the slow path,
The attached FLIR camera image re-inforces what we already know, that the thermal environment inside the satellite box is horrible. The absolute temperature calibration may be off, but it was difficult to touch the components with a bare finger, so I'd say its definitely > 70 C.
Hmm I can't seem to export with the colorbar, might be just my phone though. I tried to add some "cursors" with the temperature at a few spots, but the font color contrast is poor so you have to squint really hard to see the temperatures in the photo I attached.
I'll leave the MC1 box open overnight and see if that improves the situation, and if not, I'll switch in the SRM satellite box tomorrow.
does the FLIR have an option to export image with a colorbar?
How about just leave the lid open? or more open? I don't know what else can be done in the near term. Maybe swap with the SRM sat box to see if that helps?
There was no improvement to the situation overnight. So, I did the following today:
IMC is now locked again, I will monitor for glitching/stability.
Update 6pm PDT: as shown in Attachment #1, there is a huge difference in the stability of the lock after the sat box swap. Let's hope it stays this way for a while...
A more comprehensive report has been uploaded here. I'll zip the data files and add them there too. In summary:
I'll upload the data and analysis notebook + liso fit files to the wiki as well shortly. The data, a Jupyter notebook making the plots, and the LISO fit files have been uploaded here.
I didn't do it this time but it'd be nice to also do the noise measurement and get an estimate for the shot-noise intercept current.
While I have the data, I will fit this and post a more complete report on the wiki.
I injected some sensing lines and measured their responses in the various photodiodes, with the interferometer in a few different configurations. The results are summarized in Attachments #1 - #3. Even with the PRMI (no arm cavities) locked on 1f error signals, the MICH and PRCL signals show up in nearly the same quadrature in the REFL port photodiodes, except REFL165. I am now thinking if the output (actuation) matrix has something to do with this - part of the MICH control signal is fed back to the PRM in order to minimize the appearance of the MICH dither in the PRCL error signal, but maybe this matrix element is somehow horribly mistuned?
Some other mysteries that I will investigate further:
I blew the long lock last night because I forgot to not clear the ASS offsets when trying to find the right settings for running the ASS system at high power. Will try again tonight...
Lock the PRMI on carrier and measure the sensing matrix, see if the MICH and PRCL signals look sensible in 1f and 3f photodiodes.
This problem reared its ugly head again. I am inclined to believe the problem is electronic and not on the light, since the POY channels seem immune to this issue (see Attachment #1). I will investigate in the daytime tomorrow. Note that while the POX photodiode head has ~twice the transimpedance than POY (per measurement), the POY signal gets amplified by a ZHL-500-HLN amplifier before heading to the demod electronics (nominal gain is 19dB = x9). There is also some imbalance in the light level at the photodiodes I guess, because overall, the PDH fringe is ~twice as large for the Y arm as the X arm. Basically, the y-axes of the attached plot cannot be directly compared between POX and POY.
Mostly this is an annoyance - right now, the POX signal is only used for locking and dither aligning the X arm cavity, and so once that is done, the locking can proceed (as long as the other channels, e.g. REFL11, aren't glitching as well...)
I re-connected the 3 accelerometers located near the MC1/MC3 chamber. It was a bit tedious to get the cabling sorted - I estimate the cable is ~80m long, and the excess length had to be wound around a spool (see Attachment #1), which wasn't really a 1 person job. It's neat-ish for now, but I'm not entirely satisfied. I think we should get shorter cables (~20m), and also mount the pre-amp/power units in a rack instead of leaving it on the floor. The pre-amp settings are x100 for all three channels. The MC2 channels are powered, but are unconnected to the seismometers - it was too tedious to unroll the other spool yesterday. Apart from this, the cable for the "Z" channel had to be re-seated in the strain relief clamp.
I did not enable any of the CDS filters that convert the raw signal into physical units, so for now, these channels are just recording raw counts.
Update 7pm: the spectra in the current config are here - not sure what to make of the MC2_Z channel appearing to show lower noise?
Update July 13 2020 430pm: This afternoon, I hooked up the MC2 accelerometer channels too...
In an effort to make a second usable workstation, I did the following (remotely) on rossa today (not necessarily in this order, I wasn't maintaining a live log so I forgot):
So, in summary, rossa is now all set up for use during lock acquisition. However, until this machine has undergone a few months of testing, we should freeze the pianosa config and not mess with it.
Note that this version of the "crtools" is rather new. Please, use them and if there is an issue, report the errors! I am going to occassionally try lock acquisition using rossa.
wiped and install Debian 10 on rossa today
still to be done: config it as CDS workstation
please don't try to "fix" it in the meantime
This is strange - I was definitely able to launch medm when I was working on this machine remotely on Friday. But now, there does seem to be a problem with this shared library being missing.
First of all, I installed mlocate to find where the shared library files are installed. Then I made the symlink, and now sitemap seems to work again.
Weirdly, my changes to /etc/resolv.conf got overwritten somehow. Was this machine rebooted? Uptime suggests it's only been running for ~6 hours at the time of writing of this elog.
sudo apt install mlocate
sudo ln -s /usr/lib/x86_64-linux-gnu/libreadline.so.7 /usr/lib/x86_64-linux-gnu/libreadline.so.6
when I try 'sitemap' on rossa I get:
medm: error while loading shared libraries: libreadline.so.6: cannot open shared object file: No such file or directory
Indeed, this is now fixed by following instructions from here. I rebooted rossa at ~1250 PDT and confirmed that resolv.conf didn't get overwritten. The resolv.conf file also now has the following useful lines at the head:
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
yes, I rebooted yesterday to fix the 'steaking white lines' problem in the video/display
maybe we're supposed to edit something besides resolv.conf since that gets over-written on boot for some linux OS