I've added marked-up schematics + high-res photographs of the SRM coil driver board and dewhitening board to the 40m DCC Document tree (D1700217 and D1700218).
In the attached marked-up schematics, I've also added the proposed changes which Rana and I discussed earlier today. For the thick-film -> thin-film resistor switching, I will try to make a quick LISO model to see if we can get away with replacing just a few rather than re-stuffing the whole board.
Another change I think should be made, but I forgot to include on the markups: on the dewhitening board, we should probably replace the decoupling capacitors C41 and C52 with equivalent-value electrolytic caps (they are currently tantalum caps, which I think are susceptible to failing by shorting input to output).
I've made the LISO models for the dewhitening board and coil driver boards I pulled out.
Attached is a plot of the current noise in the current configuration (i.e. the dewhitening board just has a gain x3 stage, propagated through the coil driver path), with the top 3 noise contributions: the op-amps (op3 and op5) are the LT1125s on the coil driver board in the bias path, while "R12" is the Johnson noise of the 1k input resistance to the OP27 in the signal path.
Assuming the OSEMs have an actuation gain of 0.016 N/A (so 0.064 N/A for 4 OSEMs), the current noise of ~1e-10 A/rtHz translates to a displacement noise of ~3e-15m/rtHz at ~100Hz (assuming a mirror mass of 0.25kg).
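For reference, the conversion assumed here is just the free-mass response well above the pendulum resonance (with N the number of coils, alpha the per-coil actuation gain, and i_n the current noise):

\[ x(f) = \frac{N\,\alpha\,i_n(f)}{m\,(2\pi f)^2} \]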
I have NOT included the noise from the LM6321 current buffers as I couldn't find anything about their noise characteristics in the datasheet. LISO files used to generate this plot are attached.
I first set the bias sliders to 0 on the MEDM screen (after checking that the nominal values were stored), then shut down the watchdogs, and then pulled out the boards for inspection + photo-taking.
I've uploaded high-res photos + marked-up schematics to the same DCC page linked in the previous elog. I've noted the S/Ns of the ITM, BS and SRM boards on the page; I think it makes sense to collect everything on one page, and I guess eventually we will unify everything to one or two versions.
To take the photos, I tried to reproduce the "LED light painting" technique reported here. I mounted the Canon EOS Rebel T3i on a tripod, and used some A3 sheets of paper to make a white background against which the board to be photographed was placed. I also used the new macro lens we recently got. I then played around with the aperture and exposure time till I got what I judged to be good photos. The room lights were turned off, and I used the LED on my phone to do the "painting", from about a metre away. I think the photos have turned out pretty well; the component values are readable.
I've spent the last week investigating various parts of the DAC -> OSEM coil signal chain in order to add these noises to the MICH NB. Here is what I have thus far.
I wanted to match a noise model to a noise measurement for the coil-driver de-whitening boards. The main objectives were:
After ~3months without any problems on the slow machine front, I had to reboot c1psl, c1susaux and c1iscaux today. The control room StripTool traces were not being displayed for all the PSL channels so I ran testSlowMachines.bash to check the status of the slow machines, which indicated that these three slow machines were dead. After rebooting the slow machines, I had to burt-restore the c1psl snapshot as usual to get the PMC to lock. Now, both PMC and IMC are locked. I also had to restart the StripTool traces (using scripts/general/startStrip.sh) to get the unresponsive traces back online.
Steve tells me that we probably have to do a reboot of the vacuum slow machines sometime soon too, as the vacuum indicator channels on the MEDM screen are unresponsive.
Had to reboot c1psl, c1susaux, c1auxex, c1auxey and c1iscaux today. PMC has been relocked. ITMX didn't get stuck. According to this thread, there have been two instances in the last 10 days in which c1psl and c1susaux have failed. Since we seem to be doing this often lately, I've made a little script that uses the netcat utility to check which slow machines respond to telnet, it is located at /opt/rtcds/caltech/c1/scripts/cds/testSlowMachines.bash.
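The check the script does is essentially just seeing which machines accept a TCP connection on the telnet port. A minimal Python sketch of the same idea (the actual script uses netcat; the host list here is just an illustrative subset):

import socket

hosts = ['c1psl', 'c1susaux', 'c1auxex', 'c1auxey', 'c1iscaux']   # illustrative subset
for host in hosts:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(2)
    try:
        s.connect((host, 23))            # port 23 = telnet
        print('%s : responding' % host)
    except (socket.timeout, socket.error):
        print('%s : not responding, probably needs a reboot/keying' % host)
    finally:
        s.close()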
This measurement has been troublesome - I was plagued by large 60Hz harmonics (see Attachment #1), the cause of which was unknown. I powered all electronics used in the measurement setup from the same power strip (one of the new surge-protecting ones Steve recently acquired for us), but the harmonics remained present. Yesterday, Koji helped me troubleshoot this issue. We did various things; I'll list them here in the order we did them:
Today, I tried to repeat the measurement, with the newly made twisted ribbon cable, but the large 60Hz harmonics were back. Then I realized we had also disconnected the WiFi extender and GPIB box yesterday.
Turns out that connecting the Prologix box to the SR785 (even with no power) is the culprit! Disconnecting the Prologix box makes these harmonics go away. I was using the box labelled "Santuzza.martian" (192.168.113.109), but I double-checked with the box labelled "vanna.martian" (192.168.113.105, also a different DC power supply adapter for the box), the effect is the same. I checked various combinations like
but it looks like connecting the GPIB box to the analyzer is what causes the problem. This was reproducible on both SR785s in the lab. So to make this measurement, I had to do things the painful way - acquire the spectrum by manually pushing buttons with the GPIB box disconnected, then re-connect the box and download the data using SRmeasure --getdata. I don't fully understand what is going on, especially since if the input connector is directly terminated using a 50ohm BNC terminator, there are no harmonics, regardless of whether the GPIB box is connected or not. But it is worth keeping this problem in mind for future low-noise measurements. My elog searches did not reveal past reports of similar problems, has anyone seen something like this before?
It also looks like my previous measurement of the de-whitening board noises was plagued by the same problem (I took all those spectra with the GPIB boxes connected). I will repeat this measurement.
At the meeting this week, it was decided that
I also think it would be a good idea to up the 100-ohm resistors in the bias path on the ITM coil driver boards to 1 kohm wire-wound. Since the dominant noise on the coil-driver boards is from the voltage noise of the op-amps in the bias path, this would definitely be an improvement. Looking at the current values of the bias MEDM sliders, a 10x increase in the resistance for ITMX will not be possible (the yaw bias is ~-1.5V, and scaling that by 10 to keep the same bias current would presumably run out of range on the slider), but perhaps we can go for a 4x increase?
The plan is to then re-install the boards, and see if we can
We can then take a call on how much to up the series resistance in the DAC signal path.
Now that I have figured out the cause of the harmonics, I will also try and measure the combined electronics noise of de-whitening board + coil driver board and compare it to the model.
I've given Steve a list of the thin-film resistors we need to implement the changes discussed in the preceding elogs - but I figured it would be good to see if we can realize the projected improvement in MICH displacement noise just by fixing the BS Oplev loop shape and turning the existing whitening on. Before re-installing the boards, however, I did make a few changes:
Photos of all the boards were taken prior to re-installation, and have been uploaded to the 40m Google Photos page - I will update schematics + photos on the DCC page once other planned changes are implemented.
I also measured the transfer functions on the de-whitened signal paths on all the boards before re-installing them. I then fit everything using LISO, and updated the filter banks in Foton to match these measurements - the original filters were copied over from FM9 and FM10 to FM7 and FM8. The new filters are appended with the suffix "_0517", and live in FM9 and FM10 of the coil output filter banks. The measured TFs (for ITMs and BS) are summarized in Attachment #1, while Attachment #2 contains the data and LISO file used to do the fits (path to the .bod files in the .fil file will have to be changed appropriately). I used 2 complex pole pairs at ~10 Hz, two complex zero pairs at ~100Hz, real poles at ~15Hz and ~3kHz, and real zeros at ~100Hz and ~550Hz for the fits. The fits line up well with the measured data, and are close enough to the "expected" values (as calculated from component values) to be explained by tolerances on the installed components - I omit the plots here.
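For the record, the fitted shape can be reconstructed approximately from the quoted poles and zeros with something like the following (scipy sketch; the Qs of the complex pairs are placeholders - the actual fitted values are in the attached LISO files):

import numpy as np
import scipy.signal as sig

def cpair(f0, Q):
    # s-plane roots of a complex pole/zero pair at f0 with quality factor Q
    w0 = 2*np.pi*f0
    return list(np.roots([1, w0/Q, w0**2]))

# Qs here are placeholders -- the fitted values live in the attached LISO files
zs = cpair(100, 2) + cpair(100, 2) + [-2*np.pi*100, -2*np.pi*550]
ps = cpair(10, 2) + cpair(10, 2) + [-2*np.pi*15, -2*np.pi*3000]
k = np.abs(np.prod(ps) / np.prod(zs))          # normalise to unity DC gain
f = np.logspace(0, 4, 500)
w, h = sig.freqs_zpk(zs, ps, k, worN=2*np.pi*f)
# abs(h) / angle(h) can then be overlaid on the measured TFs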
After re-installing the boards in the Eurocrate, restoring rough alignment, and updating the filter banks with the most recent measured values, I wanted to see if I could turn the whitening on for one of the optics (ITMY) smoothly before trying to do so in the full DRMI. Switching off the "SimDW_0517" filter (FM9) should switch the signal path on the de-whitening board from bypass to de-whitened, and I had confirmed last week with an extender board that the voltage at the appropriate backplane connector pin does change as expected when the FM9 MEDM button is toggled (for both ITMs, BS and SRM). But today I was not able to engage this transition smoothly - the optic seems to get kicked around when I engage the whitening. I will need to investigate this further.
Unrelated to this work: the ETMY Oplev HeNe is dead (see Attachment #3). I thought we had just replaced this laser a couple of months ago - what is the expected lifetime of these? Perhaps the power supply at the Y-end is wonky and somehow damaging the HeNe heads?
I think the reason I am unable to engage the de-whitening is that the OL loop is injecting a ton of control noise - see Attachment #1. With the OL loop off (i.e. just local damping loops engaged for the ITMs), the RMS control signal at 100Hz is ~6 orders of magnitude (!) lower than with the OL loop on. So turning on the whitening was just railing the DAC I guess (since the whitening has something like 60dB gain at 100Hz).
The Oplev loops for the ITMs use an "Ellip15" low-pass filter to do the roll-off (2nd order elliptic low pass with 15dB stopband attenuation and 2dB ripple). I confirmed that with the OL loops disabled, I was able to turn on the whitening for ITMY smoothly.
Now that the ETMY OL HeNe has been replaced, I restored alignment of the IFO. Both arms lock fine (I was also able to engage the ITMY coil driver whitening smoothly with the arm locked). However, something funny is going on with ASS - running the dither seems to inject huge offsets into ITMY pitch and yaw such that it almost immediately breaks the lock. This probably has to do with some EPICS values not being restored correctly after the recent slow-machine restarts (for instance, the c1iscaux restart caused all the LSC RFPD whitening gains to be reset to random values; I had to burt-restore the POX11 and POY11 values before I could get the arms to lock). I will have to investigate further.
GV edit 2pm 31 May: After talking to Koji at the meeting, I realized I did not specify what channel the attached spectra are for - it is C1:SUS-ITMY_ULCOIL_OUT.
We tried to debug the mysterious sudden failure of ASS - here is a summary of what we did tonight. These are just notes for now, so I don't forget tomorrow.
What are the problems/symptoms?
What are the (known) changes since the servos were last working?
Hypotheses plus checks (indented bullets) to test them:
For whatever reason, it appears that dithering the cavity mirrors at the frequencies and amplitudes that worked ~3 weeks ago is no longer giving us the correct error signals for dither alignment. We are out of ideas for tonight, TBC tomorrow...
Steve alerted me that the IMC wouldn't lock. Reboots for c1susaux, c1iool0 today. I tried using the reset button instead of keying the crates. This worked for c1iool0, but not for c1susaux. So I had to key the latter crate. The machine took a good 5-10 minutes before coming back up, but eventually it did. Now IMC locks fine.
I started by checking if shaking an optic in pitch really moves it in pitch - i.e. how much PIT to YAW coupling is there. The motivation being that if we aren't really dithering the optics in orthogonal DoFs, the demodulated error signals carry mixed information which confuses the dither alignment servos. First, I checked with a low-frequency dither (~4Hz) and looked at the green transmission on the video monitors. The spot seemed to respond reasonably orthogonally to both pitch and yaw excitations on either ITMY or ETMY. But looking at the Oplev control signal spectra, there seems to be a significant amount of cross coupling. For ITMY YAW, ETMY PIT, and ETMY YAW, the peak in the orthogonal degree of freedom at the excitation frequency is roughly 20% of the height of the peak in the driven DoF. But for ITMY PIT, the peaks in the orthogonal DoFs are almost of equal height. This remains true even when I changed the excitation frequencies to the nominal dither alignment servo frequencies.
I then tried to see if I could get parts of the ASS working. I tried to manually align the ITM, ETM and TTs as best as I could. There are many "alignment references" - prior to the coil driver board removal, I had centered all Oplevs and also checked that both X and Y green beams had nominal transmission levels (~0.4 for GTRY, ~0.5 for GTRX). Then there are the Transmon QPDs. After trying various combinations, I was able to get good IR transmission, and reasonable GTRY.
Next, I tried running the ASS loops that use error signals demodulated at the ETM dither frequencies (so actuation is on the ITM and TT1 as per the current output matrix which I did not touch for tonight). This worked reasonably well - Attachment #1 shows that the servos were able to recover good IR transmission when various optics in the Y arm were disturbed. I used the same oscillator frequencies as in the existing burt snapshot. But the amplitudes were tweaked.
Unfortunately I had no luck enabling the servos that demodulate the ITM dithers.
The plan for daytime work tomorrow is to check the linearity of the error signals in response to static misalignment of some optics, and then optimize the elements of the output matrix.
I am uploading a .zip file with Sensoray screen-grabs of all the test-masses in their best aligned state from tonight (except ITMX face, which for some reason I can't grab).
And for good measure, the Oplev spot positions - Attachment #3.
While Gautam is working on the restoration of the Y-arm ASS, I worked on the X-arm.
Looks like there was a power glitch at around 10am today.
All frontends, FB, Megatron, and Optimus were offline. Chiara reports an uptime of 666 days, so it looks like its UPS works fine. PSL was tripped, probably the end lasers too (yet to check). Slow machines seem alright (they respond to ping, and I can also telnet into them).
Since all the frontends have to be re-started manually, I am taking this opportunity to investigate some cds issues like the lack of a dmesg log file on some of the frontends. So the IFO will be offline for sometime.
GV Jun 5 6pm: From my discussion with Jamie, I gather that the dmesg output is not written to a file because our front-ends are diskless (this is also why the ring buffer, which is what we are reading from when running "dmesg", gets cleared periodically).
[Koji, Rana, Gautam]
The state this work was started in was as indicated in the previous elog - c1ioo wasn't ssh-able, but was responding to ping. We then did the following:
Why does ntpdate behave this way? And only on one of the frontends? And what is the remaining RFM error?
Koji then restarted the IMC autolocker and FSS slow processes on megatron. The IMC locked almost immediately. The MC2 transmon indicated a large shift in the spot position, and the PMC transmission is also pretty low (while the lab temperature equilibrates after the AC being off during peak daytime heat). So the MC transmission is ~14500 counts, while we are used to more like 16,500 counts nowadays.
Re-alignment of the IFO remains to be done. I also did not restart the end lasers, or set up the Marconi with nominal params.
Attachment #3 - Status of the Master Timing Sequencer after various reboots and power cycling of front ends and associated electronics.
Attachment #4 - Warning lights on C1IOO
Now IFO work like fixing ASS can continue...
Rana suggested taking a look at the Y-arm test mass actuator TFs (measured by driving the coils one at a time, with only local damping loops on, using the Oplev to measure the response to a given drive). Attached are the results from this measurement (I used the Oplev pitch error signal for all 8 measurements). Although the magnitude responses for all coils have the expected 1/f^2 shape, there seems to be some significant (~10dB) asymmetry in both the ETM and ITM coils. The phase response is also not well understood. If we are just measuring the TF of a pendulum with 1 Hz resonant frequency, then at and above 10Hz, I would expect the phase to be either 0 or 180 deg. Looks like there is a notch at 60 Hz somewhere, but it is unclear to me where the ~90 degree phase at ~100Hz is coming from.
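(For reference, the expectation quoted above is just the single-pendulum response

\[ \frac{x}{F}(f) = \frac{1/m}{\omega_0^2 - \omega^2 + i\,\omega_0\omega/Q}\,, \qquad \omega = 2\pi f,\ \ f_0 = \omega_0/2\pi \approx 1\,\mathrm{Hz}, \]

which well above f_0 falls off as 1/f^2 with the phase pinned at 0 or 180 degrees, depending on the overall sign of the actuation chain.)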
For the ITM, the UL OSEM was replaced during the 2016 summer vent - the coil that is in there is now of the short OSEM variety, perhaps it has a different number of turns or something. I don't recall any coil balancing being done after this OSEM swap. For the ETM, it is unclear to me how long this situation has been like this.
Yesterday night, I tried to measure the ASS output matrix by stepping the ITM, ETM and TTs in PIT and YAW, and looking at the response in the various ASS error signals. During this test, I found the ETM and ITM pitch and yaw error signals to be highly coupled (the input matrix was diagonal). As Rana suggested, I think the whole coil driver signal chain from DAC output to coil driver board output has to be checked before attempting to fix ASS. Results from this investigation to follow.
Note: The OSEM calibration hasn't been done in a while (though the HeNes have been swapped out), but as Attachment #2 shows, if we believe the shadow sensor calibration, then the relative calibrations of the ITM and ETM Oplevs agree. So we can directly compare the TFs for the ITM and ETM.
I repeated the test of driving C1:SUS-<Optic>_<coil>_EXC individually and measuring the transfer function to C1:SUS-<Optic>_OPLEV_PERROR for Optic in (ITMX, ITMY, ETMX, ETMY, BS), coil in (LLCOIL, LRCOIL, ULCOIL, URCOIL).
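Explicitly, the 20 excitation/readback channel pairs measured were (trivial Python just to enumerate them):

optics = ['ITMX', 'ITMY', 'ETMX', 'ETMY', 'BS']
coils = ['ULCOIL', 'URCOIL', 'LLCOIL', 'LRCOIL']
pairs = [('C1:SUS-%s_%s_EXC' % (opt, coil), 'C1:SUS-%s_OPLEV_PERROR' % opt)
         for opt in optics for coil in coils]      # 20 excitation/readback pairs in total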
There seems to be a few dB imbalance in the coils in both ETMs, as well as ITMX. ITMY and the BS seem to have pretty much identical TFs for all the coils - I will cross-check using OPLEV_YERROR, but is there any reason why we shouldn't adjust the gains in the coil output (not output matrix) filter banks to correct for this observed imbalance? The Oplev calibrations for the various optics are unknown, so it may not be fair to compare the TFs between optics (I guess the same applies to comparing TF magnitudes from coil to OPLEV_PERROR and OPLEV_YERROR, perhaps we should fix the OL calibrations before fiddling with coil gains...)
The anomalous behaviour of ITMY_UL (10dB greater than the others) was traced down to a rogue x3 gain in the filter module. This has been removed, and now the Y arm ASS works fine (with the original dither servo settings). The X arm dither still doesn't converge - I double-checked the digital filters and all seems in order, will investigate the analog part of the drive electronics now.
I investigated the analog electronics in the coil driver chain by using awggui to drive a given channel with Uniform noise between DC and 8kHz, with an overall gain of 1000 cts. This test was done for both ITMs and the BS. The Whitening/De-Whitening was off during the test. I measured the spectra in
Attachment #1 - There is good agreement between all 3 measurements. To convert the DTT spectrum to Vrms/rtHz, I multiplied the Y-axis by 10V / ( 2*sqrt(2) * 2^15 cts). Between DC and ~1kHz, the measured spectrum everywhere is flat, as expected given the test conditions. The AI filter response is also seen.
Attachment #2 - Zoomed in view of Attachment #1 (without the AI filter part).
*The DTT plots have been coarse-grained to keep the PDF file size manageable. X (Y) axes are shared for all the plots in columns (rows).
Similar verification remains to be done for the ETMs, after which the test has to be repeated with the Whitening/DeWhitening engaged. But it's encouraging that things make sense so far (except perhaps the coil balancing can be better as suggested by the previous elog).
I've left both arms locked. The Y-arm dither alignment is working well again, but for the X arm, the loops that actuate on the BS are still weird. Nothing obvious in the tests so far though.
GV 6pm 8 Jun 2017: I realized the X arm transmission was being monitored by the high-gain PD and not the QPD (which is how we usually run the ASS). The ASC mini screen suggested the transmitted beam was reasonably well centered on the X end QPD, and so I switched to this after which the X end dither alignment too converged. Possibly the beam was falling off the other PD, which is why the BS loops, which control the beam spot position on the ETM, were acting weirdly.
Not related to this work:
I noticed the X-arm LSC servo was often hitting its limit - so I reduced the gain from 0.03 to 0.02. This reduced the control signal RMS, and re-acquiring lock at this lower gain wasn't a problem either. See attachment #3 (will be rotated later) for control signal spectra at this revised setting.
*Another issue with the IMC autolocker I've noticed in the recent past: sometimes, the mcup script doesn't get run even though the MC catches a TEM00 mode. So the IMC servo remains in acquisition state (e.g. boosts and WFS servos don't get turned on). Looking at the autolocker log doesn't shed much light - the "saw a flash" log message gets printed, but while normally the mcup script gets run at this point, in these cases, the MC just remains in this weird state.
It happened again. MC2 UL seems to have gotten the biggest glitch. It's a rather small jump in the signal level compared to what I have seen in the recent past in connection with suspect Satellite boxes, and LL and UR sensors barely see it.
I will squish Sat box cables and check the cabling at the coil driver board end as well, given that these are two areas where there has been some work recently. WFS loops will remain off till I figure this out. At least the (newly centered) DC spot positions on the WFS and MC2 TRANS QPD should serve as some kind of reference for good MC alignment.
GV edit 9pm: I tightened up all the cables, but that doesn't seem to have helped. There was another, larger glitch just now. UR and LL basically don't see it at all (see Attachment #2). It also seems to be a much slower process than the glitches seen on MC1, with the misalignment happening over a few seconds. I have to see if this is consistent with a glitch in the bias voltage to one of the coils, which gets low-passed by a 4xpole@1Hz filter.
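As a quick sanity check of that timescale, here is a sketch of the step response of a 4-pole low pass at 1 Hz (assuming four real coincident poles, which may not be exactly what is on the board):

import numpy as np
import scipy.signal as sig

w0 = 2*np.pi*1.0                               # 1 Hz pole frequency
den = np.poly([-w0]*4)                         # (s + w0)^4
sys = sig.TransferFunction([w0**4], den)       # unity DC gain, 4-pole low pass
t, y = sig.step(sys, T=np.linspace(0, 5, 1000))
# y reaches ~90% of its final value in roughly a second, i.e. a step in the bias
# voltage would appear at the coil as a ~1 s transition rather than an instantaneous jump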
Once we had the beam approximately centered on all of the above 3 PDs, we turned on the locking for the IMC, and it seems to work just fine. We are waiting another hour before switching on the angular alignment for the mirrors, to make sure the alignment holds with the WFS turned off.
Reboots for c1susaux, c1iscaux, c1auxex today. I took this opportunity to squish the Sat. Box cabling for MC2 (both on the Sat box end and also at the vacuum feedthrough), as some work has been going on there recently - maybe something got accidentally jiggled in the process and was causing the MC2 alignment to jump around.
Relocked PMC to offload some of the DC offset, and re-aligned IMC after c1susaux reboot. PMC and IMC transmission back to nominal levels now. Let's see if MC2 is better behaved after this sat. box. voodoo.
Interestingly, since Feb 6, there were no slow machine reboots for almost 3 months, while there have been three reboots in the last three weeks. Not sure what (if anything) to make of that.
I tried playing around with the Oplev loop shape on ITMY, in order to see if I could successfully engage the Coil Driver whitening. Unfortunately, I had no success tonight.
I was trying to guess a loop shape that would work - I guess this will need some more careful thought about loop shape optimization. I was basically trying to keep all the existing filters, and modify the low-passing to minimize control noise injection. Adding a 4th-order elliptic low pass with corner at 50Hz and stopband attenuation of 60dB yielded a stable loop with upper UGF of ~6Hz and ~25deg of phase margin (which is on the low side). But I was able to successfully engage this loop, and as seen in Attachment #1, the noise performance above 50Hz is vastly improved. But it also seems that there is some injection of noise around 6Hz. In any case, as soon as I tried to engage the dewhitening, the DAC output quickly saturated. The whitening filter for the ITMs has ~40dB of gain at ~40Hz already, so it looks like the high-frequency roll-off has to be more severe.
I am not even sure if the Elliptic filter is the right choice here - it does have the steepest roll off for a given filter order, but I need to look up how to achieve good roll off without compromising on the phase margin of the overall loop. I am going to try and do the optimization in a more systematic way, and perhaps play around with some of the other filters' poles and zeros as well to get a stable controller that minimizes control noise injection everywhere.
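For what it's worth, candidate roll-offs like this can be sketched quickly in scipy before committing anything to foton - e.g. for the 4th-order elliptic above (the 2dB passband ripple is my assumption; the corner and stopband attenuation are the quoted values):

import numpy as np
import scipy.signal as sig

f = np.logspace(0, 3, 1000)
# 4th-order elliptic low pass: 50 Hz corner, 60 dB stopband attenuation, 2 dB ripple assumed
z, p, k = sig.ellip(4, 2, 60, 2*np.pi*50, btype='low', analog=True, output='zpk')
w, h = sig.freqs_zpk(z, p, k, worN=2*np.pi*f)
# the quantities that matter for the loop: roll-off above 50 Hz vs. phase lag near the ~6 Hz UGF
print('phase at 6 Hz: %.1f deg' % np.angle(h[np.argmin(abs(f - 6))], deg=True))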
Reboots for c1psl, c1iool0, c1iscaux today. MC autolocker log was complaining that the C1:IOO-MC_AUTOLOCK_BEAT EPICS channel did not exist, and running the usual slow machine check script revealed that these three machines required reboots. PMC was relocked, IMC Autolocker was restarted on Megatron and everything seems fine now.
I tried all versions of power cycling and debugging this problem known to me, including those suggested in this thread and from a more recent time. I am leaving things as is for the night, and will look into this more tomorrow. I've also shut down the ETMX watchdog for the time being. Looks like this has been down since 24 Jun 8am UTC.
To re-cap, every time I tried to do this in the last month or so, the optic would get kicked around. I suspected that the main cause was the insufficient low-pass filtering on the Oplev loops, which was causing the DAC rms to rail when the whitening was turned on.
I had tried some loop-tweaking of the OL loops by hand without much success last week - today I had a little more success. The existing OL loops comprise the following:
The elliptic low pass was too shallow. For a first pass at loop shaping today, I checked if the resonant gain filter had any effect on the transmitted power RMS profile - it turns out it had negligible effect. So I disabled this filter and replaced the elliptic low pass with a 5th-order ELP with 2dB passband ripple and 80dB stopband attenuation. I also adjusted the overall loop gain to put the upper UGF of the OL loops around 2Hz. Looking at the spectrum of one coil output in this configuration (ITMY UL), I determined that the DAC rms was no longer in danger of railing.
However, I was still unable to smoothly engage the de-whitening. The optic again kept getting kicked around each time I tried. So I tried engaging the de-whitening on the ITM with just the local damping loop on, but with the arm locked. This transition was successful, but not smooth. Looking at the transmon spot on the camera, every time I engage the whitening, the spot gets a sizeable kick (I will post a video shortly). In my ~10 trials this afternoon, the arm is able to stay locked when turning the whitening on, but always loses lock when turning the whitening off.
The issue here is certainly not the DAC rms railing. I had a brief discussion with Gabriele just now about this, and he suggested checking for some electronic voltage offset between the two paths (de-whitening engaged and bypassed). I also wonder if this has something to do with some latency between the actual analog switching of paths (done by a slow machine) and the fast computation by the real time model? To be investigated.
GV 170628 11pm: I guess this isn't a viable explanation, as the de-whitening switching is handled by one of the BIO cards, which is itself handled by the fast FEs, so there isn't any question of latency.
With the Oplev loops disengaged, the initial kick given to the optic when engaging the whitening settles down in about a second. Once the ITM was stable again, I was able to turn on both Oplev loops without any problems. I did not investigate the new Oplev loop shape in detail, but compared to the original loop shape, there wasn't a significant difference in the TRY spectrum in this configuration (plot to follow). This remains to be done in a systematic manner.
Plots to support all of this to follow later in the evening.
Attachment #1: Video of ETMY transmission CCD while engaging whitening. I confirmed that this "glitch" happens while engaging the whitening on the UL channel. This is reminiscent of the Satellite Box glitches seen recently. In that case, the problem was resolved by replacing the high-current buffer in the offending channel. Perhaps something similar is the problem here?
Attachment #2: Summary of the ITMY UL coil output spectra under various conditions.
There were a few more flaky things in the Expansion chassis - the IDE connectors don't have "keys" that fix the orientation they should go in, and the whole timing card assembly is kind of difficult and not exactly secure. But for now, things are back to normal it seems.
I attempted to re-lock the DRMI and try and realize some of the noise improvements we have identified. Summary elog, details to follow.
Basically after this point, I was unable to repeat stuff I did earlier in the evening just a couple of hours ago. The single arm locks catch quickly, and seem stable over the hour timescale, but when I run the X arm dither, the BS PITCH loop starts to oscillate at ~0.1 Hz. Moreover, I am unable to acquire the PRMI carrier lock. I must have changed a setting somewhere that I am not catching right now (although I've scripted most of these things for repeatability, so I am at a loss as to what I'm missing). The only change I can think of is that I changed the BS Oplev loop shape. But I went back into the filter file archives and restored these to their original configuration. Hopefully I'll have better luck figuring this out tomorrow.
I'm going to go squish cables and the usual sat. box voodoo, hopefully that settles it.
I am not attempting a full characterization tonight, but the important changes since the May locks are in the de-whitening boards and coil driver boards. I did not attempt to engage the coil-dewhitening, but the PD whitening works fine.
As a quick check, I tested the hypothesis that the BS OL loop A2L coupling dominates between ~10-50Hz. The attached control signal spectra [Attachment #2] supports this hypothesis. Now to actually change the loop shape.
I've centered Oplevs of all vertex optics, and also the beams on the REFL and AS PDs. The ITMs and BS have been repeatedly aligned since re-installing their respective coil driver electronics, but the SRM alignment needed some adjustment of the bias sliders.
Full characterization to follow. Some things to check:
Lesson learnt: Don't try and change too many things at once!
GV July 5 1130am: Looks like the MICH loop gain wasn't set correctly when I took the attached spectra, seems like the bump around 300Hz was caused by this. On later locks, this feature wasn't present.
Reboots for c1susaux, c1iscaux today.
I've been making NBs on my laptop, thought I would get the copy under version control up-to-date since I've been negligent in doing so.
The code resides in /ligo/svncommon/NoiseBudget, which as a whole is a git directory. For neatness, most of Evan's original code has been put into the sub-directory /ligo/svncommon/NoiseBudget/H1NB/, while my 40m-specific adaptations of it are in the sub-directory /ligo/svncommon/NoiseBudget/NB40. So to make a 40m noise budget, you would have to clone the repo, edit the parameter file accordingly, and run, for example, python C1NB.py C1NB_2017_04_30.py. I've tested that it works in its current form. I had to install a font package in order to make the code run (with sudo apt-get install tex-gyre), and also had to comment out calls to GwPy (it kept throwing up an error related to the package "lal"; I opted against trying to debug this problem as I am using nds2 instead of GwPy to get the time series data anyways).
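For reference, the nds2 calls to pull a stretch of time series data look like the following (the server/port, GPS time and channel below are illustrative placeholders, the real ones are set in the parameter file):

import nds2

conn = nds2.connection('nds40.ligo.caltech.edu', 31200)   # placeholder server/port
t0 = 1180000000                                           # placeholder GPS start time
bufs = conn.fetch(t0, t0 + 64, ['C1:LSC-DARM_IN1_DQ'])    # placeholder channel
data = bufs[0].data                                       # numpy array of the time series
fs = bufs[0].channel.sample_rate                          # sample rate, needed for the PSD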
There are a few things I'd like to implement in the NB like sub-budgets, I will make a tagged commit once it is in a slightly neater state. But the existing infrastructure should allow making of NBs from the control room workstations now.
We spent some time trying to get the noise-budgeting code running today. I guess eventually we want this to be usable on the workstations so we cloned the git repo into /ligo/svncommon. The main objective was to see if we had all the dependencies for getting this code running already installed. The way Evan has set the code up is with a bunch of dictionaries for each of the noise curves we are interested in - so we just commented out everything that required real IFO data. We also commented out all the gwpy stuff, since (if I remember right) we want to be using nds2 to get the data.
Running the code with just the gwinc curves produces the plots it is supposed to, so it looks like we have all the dependencies required. It now remains to integrate actual IFO data; I will try and set up the infrastructure for this using the archived frame data from the 2016 DRFPMI locks.
About 2 weeks ago, I noticed some odd behaviour of the LSC TRY data stream. Its DC value seems to be drifting ~10x more than TRX. Both signals come from the transmission QPDs. At the time, we were dealing with various CDS FE issues but things have been stable on that end for the last two weeks, so I looked into this a bit more today. It seems like one particular channel is bad - Quadrant 4 of the ETMY TRANS QPD. Furthermore, there is a bump around 150Hz, and some features above 2kHz, that are only present for the ETMY channels and not the ETMX ones.
Since these spectra were taken with the PSL shutter closed and all the lab room lights off, it would suggest something is wrong in the electronics - to be investigated.
The drift in TRY can be as large as 0.3 (with 1.0 being the transmitted power in the single arm lock). This seems unusually large, indeed we trigger the arm LSC loops when TRY > 0.3. Attachment #2 shows the second trend of the TRX and TRY 16Hz EPICS channels for 1 day. In the last 12 hours or so, I had left the LSC master switch OFF, but the large drift of the DC value of TRY is clearly visible.
In the short term, we can use the high-gain THORLABS PD for TRY monitoring.
Attachment #1: State of CDS overview screen as of 9.30AM today morning when I came in.
Looks like there may have been a power glitch, although judging by the wall StripTool traces, if there was one, it happened more than 8 hours ago. FB is down at the moment, so I can't trend data to find out when this happened.
All FEs and FB are unreachable from the control room workstations, but Megatron, Optimus and Chiara are all ssh-able. The latter reports an uptime of 704 days, so all seems okay with its UPS. Slow machines are all responding to ping as well as telnet.
Recovery process to begin now. Hopefully it isn't as complicated as the most recent effort [FAMOUS LAST WORDS]
I am unable to get FB to reboot to a working state. A hard reboot throws it into a loop of "Media Test Failure. Check Cable".
Jetstor RAID array is complaining about some power issues, the LCD display on the front reads "H/W Monitor", with the lower line cycling through "Power#1 Failed", "Power#2 Failed", and "UPS error". Going to 192.168.113.119 on a martian machine browser and looking at the "Hardware information" confirms that System Power #1 and #2 are "Failed", and that the UPS status is "AC power loss". So far I've been unable to find anything on the elog about how to handle this problem, I'll keep looking.
In fact, looks like this sort of problem has happened in the past. It seems one power supply failed back then, but now somehow two are down (but there is a third which is why the unit functions at all). The linked elog thread strongly advises against any sort of power cycling.
A bit more digging on the diagnostics page of the RAID array reveals that the two power supplies actually failed on Jun 2 2017 at 10:21:00. Not surprisingly, this was the date and approximate time of the last major power glitch we experienced. Apart from this, the only other error listed on the diagnostics page is "Reading Error" on "IDE CHANNEL 2", but these errors precede the power supply failure.
Perhaps the power supplies are not really damaged, and it's just in some funky state since the power glitch. After discussing with Jamie, I think it should be safe to power cycle the Jetstor RAID array once the FB machine has been powered down. Perhaps this will bring back one or both of the faulty power supplies. If not, we may have to get new ones.
The problem with FB may or may not be related to the state of the Jestor RAID array. It is unclear to me at what point during the boot process we are getting stuck at. It may be that because the RAID disk is in some funky state, the boot process is getting disrupted.
After a couple of minutes, the front LCD display seemed to indicate that it had finished running some internal checks. The messages indicating failure of power units, which was previously constantly displayed on the front LCD panel, was no longer seen. Going back to the control room and checking the web diagnostics page, everything seemed back to normal.
Jamie suggested verifying that the problem is indeed with the disk and not with the controller, so I tried switching the original boot disk to Slot #1 (from Slot #0 where it normally resides), but the same problem persists - the green "OK" indicator light keeps flashing even in Slot #1, which was verified to be a working slot using the spare 2.5 inch disk. So I think it is reasonable to conclude that the problem is with the boot disk itself.
The disk is a Seagate Savvio 10K.2 146GB disk. The datasheet doesn't explicitly suggest any recovery options. But Table 24 on page 54 suggests that a blinking LED means that the disk is "spinning up or spinning down". Is this indicative of any particular failure mode? Any ideas on how to go about recovery? Is it even possible to access the data on the disk if it doesn't spin up to the nominal operating speed?
I think this is a boot disk failure. I put the spare 2.5 inch disk into slot #1. The OK indicator of the disk became solid green almost immediately, and it was recognized in the BIOS boot section as "Hard Disk". On the contrary, with the original disk in slot #0, the "OK" indicator keeps flashing and the BIOS can't find the hard disk.
Seems like the connector on this particular disk is of the SAS variety (and not SATA). I'll ask Steve to order a SAS to USB cable. In the meantime I'm going to see if the people at Downs have something we can borrow.
If we have a SATA/USB adapter, we can test whether the disk is still responding or not. If it is still responding, we can probably salvage the files.
Chiara used to have a 2.5" disk that is connected via USB3. As far as I know, we have remote and local backup scripts running (TBC), so we can borrow the USB/SATA interface from Chiara.
If the disk is completely gone, we need to rebuild the disk according to Jamie, and I don't know how to do it. (Don't we have any spare copy?)
I couldn't find an external docking setup for this SAS disk, seems like we need an actual controller in order to interface with it. Mike Pedraza in Downs had such a unit, so I took the disk over to him, but he wasn't able to interface with it in any way that allows us to get the data out. He wants to try switching out the logic board, for which we need an identical disk. We have only one such spare at the 40m that I could locate, but it is not clear to me whether this has any important data on it or not. It has "hda RTLinux" written on its front panel with a sharpie. Mike thinks we can back this up to another disk before trying anything, but he is going to try locating a spare in Downs first. If he is unsuccessful, I will take the spare from the 40m to him tomorrow, first to be backed up, and then for swapping out the logic board.
Chatting with Jamie and Koji, it looks like the options we have are:
Keith Thorne sent us two disks: one has the daqd code and the second is the boot disk for the FE machines. Since Jamie managed to successfully compile the daqd code on FB1 yesterday, we decided to try the following: mount the boot disk KT sent us (using a SATA/USB adapter) on /mnt on FB1, get the FEs booted up, and restart the RT models.
I just want to mention that the situation is actually much more dire than we originally thought. The diskless NFS root filesystem for all the front-ends was on that fb disk. If we can't recover it we'll have to rebuild the front end OS as well.
As of right now none of the front ends are accessible, since obviously their root filesystem has disappeared.
While on FB1, Jamie realized he actually had a copy of the /diskless/root directory, which is the NFS filesystem for the FEs, on FB1. So we decided to try and boot some of the FEs with this (instead of starting from scratch with the disks KT sent us). The way things were set up, the FEs were querying the FB machine as the DHCP server. But today, we followed the instructions here to get the FEs to get their IP address from chiara instead. We also added the line
to /etc/exports followed by exportfs -ra on FB1. At which point the FE machine we were testing (c1lsc) was able to boot up.
However, it looks like the NFS filesystem isn't being mounted correctly, for reasons unknown. We commented out some of the rtcds related lines in /etc/rc.local because they were causing a whole bunch of errors at boot (the lines that were touched have been tagged with today's date).
So in summary, the status as of now is:
We will resume recovery efforts on Monday.
Some days ago, I stumbled upon this github page, by a grad student at KIT who developed this code as he was working with Basler GigE cameras. Since we are having trouble installing SnapPy, I figured I'd give this package a try. Installation was very easy, took me ~10mins, and while there isn't great documentation, basic use is very easy - for instance, I was able to adjust the exposure time, and capture an image, all from Pianosa. The attached is some kind of in-built function rendering of the captured image - it is a piece of paper with some scribbles on it near Jigyasa's BRDF measurement setup on the SP table, but it should be straightforward to export the images in any format we like. I believe the axes are pixel indices.
Of course this is only a temporary solution as I don't know if this package will be amenable to interfacing with EPICS servers etc, but seems like a useful tool to have while we figure out how to get SnapPy working. For instance, the HDR image capture routine can now be written entirely as a Python script, and executed via an MEDM button or something.
A rudimentary example file can be found at /opt/rtcds/caltech/c1/scripts/GigE/PyPylon/examples - some of the dictionary keywords to access various properties of the camera (e.g. Exposure time) are different, but these are easy enough to figure out.
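For anyone who wants to try it, the basic capture pattern is very short - roughly the following (method/property names are as I remember them from the package README and may differ slightly; in particular, the exposure key name varies between camera models, as noted above):

import pypylon

cams = pypylon.factory.find_devices()           # enumerate connected GigE cameras
cam = pypylon.factory.create_device(cams[0])
cam.open()
cam.properties['ExposureTimeAbs'] = 10000.0     # property key name differs between camera models
for img in cam.grab_images(1):                  # img comes back as a numpy array
    print(img.shape, img.mean())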
I've been working on improving the 40m FINESSE model I set up sometime last year (where the goal was to model various RC folding mirror scenarios). Specifically, I wanted to get the locking feature of FINESSE working, and also simulate the DRMI (no arms) configuration, which is what I have been working on locking the real IFO to. This elog is a summary of what I have from the last few days of working on this.
GV Edit: EQ pointed out that my method of taking the slope of the error signal to compute the sensing element isn't the most robust - it relies on choosing points to compute the slope that are close enough to the zero crossing and also well within the linear region of the error signal. Instead, FINESSE allows this computation to be done as we do in the real IFO - apply an excitation at a given frequency to an optic and look at the twice-demodulated output of the relevant RFPD (e.g. for the PRCL sensing element in the 1f DRMI configuration, drive PRM and demodulate REFL11 at 11MHz and at the drive frequency). Attachment #4 is the sensing matrix recomputed in this way - in this case, it produces almost identical results to the slope method, but I think the double-demod technique is better in that you don't have to worry about selecting points for computing the slope etc.
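Schematically, each element of the sensing matrix obtained this way is (in my notation)

\[ S_{ij} = \frac{1}{A_j}\,\left|\,\mathcal{D}_{f_\mathrm{dither}}\!\left[\,\mathcal{D}_{f_\mathrm{RF}}\left[P_i(t)\right]\right]\right|\,, \]

where A_j is the dither amplitude applied to optic j, P_i is the power on RFPD i, and \mathcal{D}_f denotes demodulation at frequency f (so the units are W/m in the FINESSE model).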
This morning, all the c1iscex models were dead. Attachment #1 shows the state of the cds overview screen when I came in. The machine itself was ssh-able, so I just restarted all the models and they came back online without fuss.
At around 10:30AM this morning, the PSL mysteriously shut off. Steve and I confirmed that the NPRO controller had the red "OFF" LED lit up. It is unknown why this happened. We manually turned the NPRO back on and the PMC has been stably locked for the last hour or so.
There are so many changes to lab hardware/software that have been happening recently, it's not entirely clear to me what exactly was the problem here. But here are the observations:
Steve says that this kind of behaviour is characteristic of a power glitch/surge, but nothing else seems to have been affected (I confirmed that the X and Y end lasers are ON).
Now that all the front end models are running, I re-aligned the IMC, locked it manually, and then tweaked the alignment some more. The IMC transmission now is hovering around 15300 counts. I re-enabled the Autolocker and FSS Slow loops on Megatron as well.
Today I got the mx/open-mx networking working for the front ends. This required some tweaking to the network interface configuration for the diskless front ends, and recompiling mx and open-mx for the newer kernel. Again, this will all be documented.
Currently, I am unable to engage the coil-dewhitening filters without destroying cavity locks. One reason why this is so is because the present Oplev servos have a roll-off at high frequencies that is not steep enough - engaging the digital whitening + analog de-whitening just causes the DAC output to saturate. Today, Rana and I discussed some ideas about how to approach this problem. This elog collects these thoughts. As I flesh out these ideas, I will update them in a more complete writeup in T1700363 (placeholder for now). Past relevant elogs: 5376, 9680.
Before the CDS went down, I had taken error signal spectra for the ITMs. I will update this elog tomorrow with these measurements, as well as some noise estimates, to get started.