Looks to have worked this time around.
controls@fb1:~ 0$ sudo dd if=/dev/sda of=/dev/sdc bs=64K conv=noerror,sync
33554416+0 records in
33554416+0 records out
2199022206976 bytes (2.2 TB) copied, 55910.3 s, 39.3 MB/s
You have new mail in /var/mail/controls
I was able to mount all the partitions on the cloned disk. Will now try booting from this disk on the spare machine I am testing in the office area now. That'd be a "real" test of if this backup is useful in the event of a disk failure.
The 4TB HGST drives have arrived. I've started the FB1 dd backup process. Should take a day or so.
Gabriele had prepared a C code implementation of his NN for MICH/PRCL state estimation. He had been trying to get it going on some of the machines in WB, but was unsuccessful. The version of RCG he was trying to compile and run the code on was rather dated so we decided to give it a whirl on our new RCG3.4 here at the 40m. Just noting down stuff we tried here:
All models have been reverted to their state prior to this test, and everything on the CDS_OVERVIEW MEDM screen is green now.
I spent this weekend doing a more careful investigation of the DRMI noise. I think I have some new information/insights. Attachment #1 is the noise budget (png attached because pdf takes forever to upload, probably some ImageMagick problem. The last attachment is a tarball of the PDF). Long elog, so here are the Highlights:
The DAC noise is not limiting us anywhere when the coil de-whitening is switched on.
So I don't understand the measured Dark Noise level, and it is limiting us at frequencies > 200Hz. Some busted electronics in the input signal chain? Or can the LSC demod daughter board gain of ~5 explain the observed noise?
Edit 1730 9 Oct: I had missed out the factor of 5 gain in the demod board in calculating the shot noise curve. Attachment #7 shows the corrected shot noise level. Explicitly:
, where is to convert shot noise in W to displacement units.
This seems to be limiting us from saturating the dark noise once the coil de-whitening is engaged. But lack of coherence means the mechanism is not re-injection of SRCL/PRCL sensing noise? Need to think about what this means / how we can mitigate it.
I've also made several changes to the NB code - will push to git once I finish cleaning stuff up, but it is now much faster to make these plots and see what's what.
I measured the output voltage noise of the Q output of the AS55 Demod Board with the PSL shutter closed, using the SR785 (see Attachment #1). The measured noise is consistent with the expected number of ~120nV/rtHz around 100Hz. I had measured the gain of this board from RFPD input to Q output to be ~5.1: so if the PD dark noise is 16nV/rtHz, this would be amplified to ~80nV/rtHz. Still a discrepancy of ~50%. I didn't measure the noise with the PD input terminated. Added the noise of the demod board output with the RFPD input terminated. The level of ~100nV/rtHz seems consistent with the actual PD dark noise being ~80nV/rtHz, as their quadrature sum is around 130nV/rtHz. Need to dig up the schematics for the demod board + daughter board, and check against LISO, to see if this is consistent with what is expected.
Also - I think I was using the wrong value of the DC power on the AS55 photodiode for shot noise calculations - 13mW was for REFL55, not AS55. I did a crude measurement of the power by sticking the Ophir power meter (filter removed) in front of the AS55 PD with the Michelson flashing around, and noticed the maximum value registered was ~1.2mW. So in the DRMI lock, there would be ~2.4mW, which is 10x lower than the value I was assuming. I've made the correction in the NB code, for the next time the plot is generated. A more rigorous measurement would involve sticking the Ophir in front of the AS110 PD during a DRMI lock. The light from the AS port is split by a 50-50 BS to the AS55 and AS110 PDs (so measuring at AS110 is a reasonable proxy for power at AS55), and the AS110 signals are not used for triggering in the DRMI lock, so this is feasible.
I keep adding traces to this plot, here is the most complete one I have now. Looks like the input noise to the D040179 (measured at "Q out" SMA jack of D990511 with RFPD input terminated) is ~10nV/rtHz. This supports the hypothesis that something is wonky on the daughter board, because the purple trace should only be the quad sum of the orange and green traces. I will pull it out and have a look.
Some other follow-ups on the questions raised at the meeting:
I tried replacing the AD797s on the daughter board with OP27s, and saw no significant improvement in the electronics noise of the demod board. Note that according to LISO, in this configuration, the voltage noise of the Op27 is expected to dominate the total noise of the daughter board. Measurement condition was that the RFPD input was terminated, but the LO input was still being driven (SR785 input range is -50dBVpk for all traces, and the input ranging was set to "UpOnly"). Need to do a more systematic investigation to figure out where this excess noise is coming from. I will upload photos of the board later.
This supports the hypothesis that something is wonky on the daughter board, because the purple trace should only be the quad sum of the orange and green traces. I will pull it out and have a look.
I worked on the daughter board a little more in the evening. I have somehow managed to make the dark noise ~25% worse [Attachment #1].
After making all these changes, I re-installed the card in the eurocrate and repeated the measurement. The Q channel noise was close to the expected value (~50nV/rtHz), but the I channel is twice as noisy. I will continue this investigation tomorrow.
Here is the marked up schematic with the board as it is stuffed. Annoyingly, there is a capacitor (C1) which according to the schematic is supposed to be open, but is stuffed in our board. I can't find any elog about this, and its a pain to measure the value of this capacitance. I will upload all of this + LISO + noise model/measurements to a 40m AS55 daughter board DCC page.
Steve reported problems getting the X arm locked. Alignment sliders were inaccessible. Eurocrate key turning reboots for c1susaux, c1auxex,c1auxey, c1iscaux and c1aux. Usual precautions were taken for ITMX.
This is becoming a once-a-week thing .
Attachment #1 - Measured / modelled noises for AS55 demod board. I've plotted quadrature sum of the LISO trace with the SR785 noise floor with input terminated to ground via 50ohm. Note that these measurements were made after all the changes in the marked up schematic in the previous elog were implemented.
Both channels should be identical - I don't understand why the I channels are noisier than their Q counterparts. This is almost certainly a problem on the daughter board, as the orange traces are pretty much identical for both channels.
The dark red curves were measured by shorting the inputs to D040179 to ground via 50ohms using some Pomona minigrabbers - I wanted to avoid ripping the daughter board out, but this probably explains the excess noise compared to the green trace at low frequencies. All other measurements were made with the board installed in the LSC rack eurocrate, with the LO input driven at the nominal level (I didn't measure this yesterday but a measurement from ~6months ago says that this level is 1.5dBm).
Wall StripTool traces showed that IMC has not been locked for at least 8 hours when I came in this morning. Going to the IMC autolocker log, it looks like the last timestamp was at ~6pm yesterday. Megatron was responding to ping, but I couldn't ssh into it. So I went over to the machine and did a hard-reboot via front panel power switch. The computer took ~10mins to come back online and respond to ping. Once it did, I was able to ssh into it. However, trying the usual commands to restart the IMC autolocker and FSS Slow loops didn't work. Specifically, monitoring the logfile with tail -f Autolocker.log, I would see that the autolocker seemed to get stuck after starting the "blinky" script. Trying to restart the process using sudo initctl restart MCautolocker, init would print to shell that the restart had worked, and reported the PID, but the logfile wouldn't update "live" as it should when tail is used with the -f option. All very strange .
Anyways, as a last resort, I kill -9'ed the PID for the init instance, and init automatically restarted the Autolocker - this did the trick, IMC is locked now and logfile seems to be getting updated normally.
I also cleared a bunch of matlab crash dump files in the home directory.
Koji suggested looking at the output of the AS55 demod board on a fast oscilloscope to look for differences in the two channel outputs (if there is some high-frequency oscillations, for example, we could miss this information in the SR785 spectra). Besides, I was only looking at spectra out to a few kHz on the SR785. I grabbed this data with a 300MHz BW Tektronix oscilloscope (battery mode) today. Input impedance of both channels were set to 1Mohm, and the measurement was made with the RFPD input terminated, output of the daughter board is what is measured. The vertical scaling of the channels was set to the minimum allowed, 1mV/div.
Attachment #1 shows that there is indeed a visible difference between the two channels - the (noisier) I channel has a much larger DC offset of ~5mV compared to the Q channel (I tried switching channels on the O'scope and the larger DC offset remained on the I channel, so seems real). There is also some kind of oscillation going on in the I channel, although the frequency is pretty low, with the peaks spaced ~50us apart. Indeed, in the ASD of the acquired data, the excess power in the I channel at 20kHz and higher harmonics are evident (see Attachment #2). Anyway all of this points to something being anomalous on the daughter board I channel signal path - I will pull it out and monitor the outputs at various points along the signal path with the fast scope to see if I can narrow down what's going on where.
We took a closer look at the AS55 demod board today. The procedure was to just be as thorough as possible, and check the behaviour of the circuit (both Transfer Function and Noise) stage by stage. Checking the transfer function was the key.
During this process, we found that the reason why the Q channels had lower noise than the I channels was because of the gain of the AD829 stage of the circuit was 0dB rather than 4dB (which is what it should be according to the component values used). Specifically, resistor R12, which is supposed to be 1.30kohm, was measured to be 1.03kohm. Replacing this resistor, the transfer functions (see Attachment #1) and noise levels (see Attachment #2) match the expectations from LISO. Some notes:
Assming the whitened ADC noise level is much below this (should only be ~10nV/rtHz), and given the measured sensing element of 6.2e8 V/m, this means that the dark noise sets a maximum achievable sensitivity of 2e-16m/rtHz.
To figure out what (if anything) is to be done next, we need to first figure out what is the goal. In the end, we care about DARM and not MICH. The optical gain for the former is ~300x the latter, so the dark noise contribution gets scaled by this factor (giving us a number of 7e-19 m/rtHz). There are certainly many noises above that level which have to be handled first. Indeed, looking at the DARM spectrum from DRFPMI lock back in March 2016, it looks like the current 1f DRMI (with coils de-whitened) Michelson sensitivity is within a factor of 2 of DARM in the full lock (albeit with vertex DoFs on 3f signals, and no coil de-whitening). Koji pointed out that we need to consider the photodiode resonant circuit itself too.
TODO: Upload all this onto the DCC
While working on the IFO tonight, I noticed that the blinky status lights on c1iscex and c1iscey were frozen (but those on the other 3 FEs seemed fine). But all other lights on the CDS overview screen were green I couldn't access testpoints from these machines, and the EPICS readbacks for models on these FEs (e.g. Oplev servo inputs outputs etc) were frozen at some fixed value. This lasted for a good 5 minutes at least. But the blinky lights started blinking again without me doing anything. Not sure what to make of this. I am also not sure how to diagnose this problem, as trending the slow EPICS records of the CPU execution cycle time (for example) doesn't show any irregularity.
I was looking at the ASDC channel on dataviewer, and toggling various settings like whitening gain. At some point, the signal just froze. So I quit dataviewer and tried restarting it, at which point it complained about not being able to connect to FB. This is when I brought up the CDS_OVERVIEW medm screen, and noticed the frozen 1pps indicator lights. There was certainly something going on with the end FEs, because I was able to ping the machine, but not ssh into it. Once the 1pps lights came back, I was able to ssh into c1iscex and c1iscey, no problems.
Could it be that some of the mx processes stalled, but the systemctl routine automatically restarted them after some time?
So this wasn't just an EPICS freeze? I don't see how this had anything to do with any of the work I did earlier today. I didn't modify any of the running front ends, didn't touch either of the end station machines or the DAQ, and didn't modify the network in any way. I didn't leave anything running.
If you couldn't access test points then it sounds like it was more than just EPICS. It sounds like maybe the end machines somehow fell of the network momentarily. Was there anything else going on at the time?
The signal path for the ASDC signal is AS55 PD --> D990543 (interface board) --> D990694 (whitening board) --> D000076 (AA board) --> ADC Ch 31. Everything in this signal chain should be able to handle signals in the range +/- 10V, which should correspond to the full range of our +/-10V, 16bit ADCs. But the ASDC signal seems to saturate at ~2000 counts (i.e. turning up the analog whitening gain doesn't make the signal get any bigger than this). I investigated this a little more today.
So the problem lies somewhere downstream of the D990694. There are other anomalous behaviours of this channel - e.g. engaging the analog whitening filters changes the DC offset of the signal. I am going to pull out this board to check it out.
Why does this matter? I want to calibrate the ASDC level (and eventually the other PD DC signals as well) into Watts. This is useful for IFO diagnostics, noise budgeting the shot noise level etc.
According to the AS55 schematic, the DC transimpedance is 66.7 ohms. I claim that the DC power on the AS55 photodiode during a DRMI (no arms) lock is ~1mW. The C30642 photodiode (InGaAs) responsivity is ~0.8 A/W. So I'd expect ~50mV to be the signal level into the ADC (assuming gain of all the other electronics in the signal chain at the start of this elog is unity). This corresponds to ~163 counts (since the ADC conversion factor is 2^16 counts over 20volts). The DC signal level I observed is ~200 counts. So things seem roughly consistent.
*Note: Despite my above statement, I don't think it is true that the AS110 PD has more light on it - the BS splitting the light between
AS55 and AS110 PDs is a 50-50 BS, and using the crude method of putting an Ophir power meter in front of both PDs and
monitoring the power while the Michelson was swinging around freely showed roughly the same maximum value.
Last night, I collected ~30mins of data for the vertex seismometer channels and the POP QPD PIT/YAW signals with the PRMI locked on carrier (angular FF OFF). The ITM Oplev loops weren't DC coupled, as they are in the full IFO locking sequence, but I feel like the angular FF filters can be improved - there are frequent sharp dives in the AS110 signal level which are correlated with large amplitude motion of the POP spot on the control room CCD monitor.
Repeating the frequency domain multicoherence analysis using BS_X and BS_Y seismometer channels as witnesses suggest that we can win significantly (see Attachment #1).
I've never really implemented feedforward filters - I was planning on using ericq's latest entry on this subject as a guide. From what I gather, the procedure is as follows:
Some notes from Rana from some years ago: https://nodus.ligo.caltech.edu:8081/40m/11519
If anyone has pointers / other considerations I should take into account, please post here.
This happened again just now - it was roughly this time when this happened last night as well.
There was certainly an EPICS freeze of the kind we were used to seeing prior to replacing the martian wireless router sometime in late 2015 (or early 2016?). I was trying to run the dither alignment servos on the Y-arm at this time, and all the StripTool traces flatlined.
I took the opportunity to try accessing testpoints from the iscey ADCs - specifically C1:SUS-TRY_OUT, and it seemed to work just fine. However, I couldn't ssh into c1iscey.
Looking at the dmesg once I was able to ssh in eventually (~2 minutes deadtime tonight, I feel like it was longer yesterday but can't quantify), I see the following: not sure if there are any clues in here, or whether this is the correct log to check. But there are many instances of the nfs server related message in the log. Note that the system time-stamp corresponds to when this freeze happened.
[5461308.784018] nfs: server 192.168.113.201 not responding, still trying
[5461412.936284] nfs: server 192.168.113.201 OK
[5461412.937130] systemd: Starting Journal Service...
[5461412.947947] systemd-journald: Received SIGTERM from PID 1 (systemd).
[5461412.996063] systemd: Unit systemd-journald.service entered failed state.
[5461413.002627] systemd: systemd-journald.service has no holdoff time, scheduling restart.
[5461413.008983] systemd: Stopping Journal Service...
[5461413.014664] systemd: Starting Journal Service...
[5461413.044262] systemd: Started Journal Service.
[5461413.694838] systemd-journald: Received request to flush runtime journal from PID 1
[steve, jamie, gautam]
The machine that now serves as out Frame Builder, FB1, was sitting on top of megatron. I decided that this wasn't ideal, and asked Steve to get some alternative mounting solution. Today, he procured some shelves to put FB1 on. Jamie suggested looking for the slider-rail that came with the machine, and using that instead, as it will allow us to slide FB1 out of the rack as we do megatron and the old FB. But as luck would have it, the distance between the rack vertical posts is 26 inches, but the rail is 27 inches. So we had to accept using the less ideal solution of putting FB1 on two shelves, with no sliding option. Photo to be uploaded shortly.
For this work, I had to shutdown FB1 for about 1 hour between 3pm and 4pm. It seems to have come back up fine now.
Alex is going to have an undergrad work on a calibration optimization project on the 40m RTCDS system. For this purpose, we wanted to setup a "Simulated DARM loop". Today, Alex and I set this up. I figured we can use the c1tst model for this purpose. We basically copied the topology from Figure 2 of the h(t) paper. Attached are screenshots of the MEDM screens of the system we setup, and the simulink block diagram - the main screen can be accessed from the "SIM PLANT" tab in the sitemp.
It remains to setup the appropriate filters in the filter banks, and an EPICS channel monitor for monitoring the single excitation testpoint in the model. We also did not set up any DQ channels for the time being, as it is not even clear to me what channels need to be DQ-ed.
None of the 3 dd backups I made were bootable - at boot, selecting the drive put me into grub rescue mode, which seemed to suggest that the /boot partition did not exist on the backed up disk, despite the fact that I was able to mount this partition on a booted computer. Perhaps for the same reason, but maybe not.
After going through various StackOverflow posts / blogs / other googling, I decided to try cloning the drives using ddrescue instead of dd.
This seems to have worked for nodus - I was able to boot to console on the machine called rosalba which was lying around under my desk. I deliberately did not have this machine connected to the martian network during the boot process for fear of some issues because of having multiple "nodus"-es on the network, so it complained a bit about starting the elog and other network related issues, but seems like we have a plug-and-play version of the nodus root filesystem now.
chiara and fb1 rootfs backups (made using ddrescue) are still not bootable - I'm working on it.
Nov 6 2017: I am now able to boot the chiara backup as well - although mysteriously, I cannot boot it from the machine called rosalba, but can boot it from ottavia. Anyways, seems like we have usable backups of the rootfs of nodus and chiara now. FB1 is still a no-go, working on it.
controls@fb1:~ 0$ sudo dd if=/dev/sda of=/dev/sdc bs=64K conv=noerror,sync
33554416+0 records in
33554416+0 records out
2199022206976 bytes (2.2 TB) copied, 55910.3 s, 39.3 MB/s
You have new mail in /var/mail/controls
Eurocrate key turning reboots today morning for c1psl and c1aux.c1auxex and c1auxey are also down but I didn't bother keying them for now. PSL FSS slow loop is now active again (its inactivity was what prompted me to check status of the slow machines).
Note that the EPCIS channels for PSL shutter are hosted on c1aux.But looks like the slow machine became unresponsive at some point during the weekend, so plotting the trend data for the PSL shutter channel would have you believe that the PSL shutter was open all the time. But the MC_REFL DC channel tells a different story - it suggests that the PSL shutter was closed at ~4AM on Sunday, presumably by the vacuum interlock system. I wonder:
Eurocrate key turning reboots today morning for and c1susaux, c1auxex and c1auxey. Usual precautions were taken to minimize risk of ITMX getting stuck.
The IFO hasn't been aligned in ~1week, so I recovered arm and PRM alignment by locking individual arms and also PRMI on carrier. I will try recovering DRMI locking in the evening.
As far as MC1 glitching is concerned, there hasn't been any major one (i.e. one in which MC1 is kicked by such a large amount that the autolocker can't lock the IMC) for the past 2 months - but the MC WFS offsets are an indication of when smaller glitches have taken place, and there were large DC offsets on the MC WFS servo outputs, which I offloaded to the DC MC suspension sliders using the MC WFS relief script.
I'd like for the save-restore routine that runs when the slow machines reboot to set the watchdog state default to OFF (currently, after a key-turning reboot, the watchdogs are enabled by default), but I'm not really sure how this whole system works. The relevant files seem to be in the directory /cvs/cds/caltech/target/c1susaux. There is a script in there called startup.cmd, which seems to be the initialization script that runs when the slow machine is rebooted. But looking at this file, it is not clear to me where the default values are loaded from? There are a few "saverestore" files in this directory as well:
Are the "default" channel values loaded from one of these?
Some days ago, I had tried to measure the SRCL->MICH and PRCL->MICH cross couplings using broadband noise injected between 120-180 Hz, a frequency band chosen arbitrarily, in hindsight, I could have done a more broadband test. I've spent some time including the infrastructure to calculate "White-Noise TFs" in the noise budgeting code, where a transfer function is estimated by injecting a "broadband" excitation into a channel of interest, and looking at the resulting response in MICH. I figured this would be useful to estimate other couplings as well, e.g. laser intensity nosie, oscillator noise etc.
I estimate the transfer function of the coupling using the relation (MICH is the median ASD of the MICH error signal in the below expression, and similarly for PRCL)
Attachments #1 and #2 show the spectra of the MICH, PRCL and SRCL signals during 'quiet' times and during the injection, while Attachment #3 shows the calculated coupling TFs using the above relation. These are significantly different (more than 10dB lower) than the numbers I reported in elog 13367, where the measurement was made using swept sine. As can be seen in the attached plots, the injected broadband excitation is visible above the nominal noise level, and I calculated the white noise TFs using ~5mins of data which should be plenty, so I'm not sure atm what to make of the answers from swept-sine and broadband injections being so different.
Attachment #4 shows the noise budget from the October 8 DRMI lock with the updated SRCL->MICH and PRCL->MICH couplings (assumed flat, extrapolated from Attachment #2 in the 120-180Hz band). If these updated coupling numbers are to be believed, then there is still some unexplained noise around 100Hz before we hit the PD dark noise. To be investigated. But if Attachment #4 is to be believed, it is not surprising that there isn't significant coherence between SRCL/PRCL and MICH around 100Hz.
Nov 8 1600: Updating NB to inculde estimated Oplev A2L.
I hadn't re-locked the DRMI after the work on the AS55 demod board. Tonight, I was able to recover the DRMI locking with the old settings.
The feature in the PRCL spectrum (uncalibrated, y-axis is cts/rtHz) at ~1.6kHz is mysterious, I wonder what that's about.
Wasted some time tonight futzing around with various settings because I couldn't catch a DRMI lock, thinking I may have to re-tune demod phases etc given that I've been mucking around the LSC rack a fair bit. But fortunately, the problem turned out to be that the correct feedforward filters were not enabled in the angular feedforward path (seems like these are not SDF monitored). Clue was that there was more angular motion of the POP spot on the CCD than I'm used to seeing, even in the PRMI carrier lock.
After fixing this, lock was acquired within seconds, and the locks are as robust as I remember them - I just broke one after ~20mins locked because I went into the lab. I've been putting off looking at this angular feedforward stuff and trying out some ideas rana suggested, seems like it can be really useful.
As part of the pre-lock work, I dither aligned arms, and then ran the PRCL/MICH dithers as well, following which I re-centered the ITM, PRM and BS Oplevspots onto their respective QPDs - they have not been centered for a couple of months now.
I'm now going to try and measure some other couplings like PSL RIN->MICH, Marconi phase noise->MICH etc.
I tried measuring the coupling of PSL intensity noise by driving some broadband noise bandpassed between 80-300Hz using the spare DAC channel at 1Y3 that I had set up for this purpose a couple of weeks ago (via a battery powered SR560 buffer set to low-noise operation mode because I'm not sure if the DAC output can drive a ~20m long cable). I was monitoring the MC2 TRANS QPD Sum channel spectrum while driving this broadband noise - the "nominal" spectrum isn't very clean, there are a bunch of notches from a 60Hz comb and a forest of peaks over a broad hump from 300Hz-1kHz (see Attachment #1).
I was able to increase the drive to the AOM till the RIN in the band being driven increased by ~10x, and saw no change in the MICH error signal spectrum [see Attachment #1] - during this measurement, the RFPD whitening was turned on for REFL11, REFL55 and AS55, and the ITM coil drivers were de-whitened, so as to get a MICH spectrum that is about as "low-noise" as I've gotten it so far.
I tried increasing the drive further, but at this point, started seeing frequent MC locklosses - I'm not convinced this is entirely correlated to my AOM activities, so I will try some more, but at the very least, this places an upper bound on the coupling from intensity noise to MICH.
The Oplev trace is missing for now, as I have not re-measured the A2L coupling since modifying the Oplev loop shape (specifically the low pass filter and overall gain) to allow engageing the coil de-whitening.
The averaging for the white noise TFs plotted is computed using median averaging - I have used a python transcription of Sujan's matlab code. I use scipy.signal.spectrogram to compute the fft bins (I've set some defaults like 8s fft length and a tukey window), and then take the median average using np.median(). I've also incorporated the ln(2) correction factor.
It seems like GwPy has some in-built capability to compute median (and indeed various other percentile) averages, but since we aren't using it, I just coded this up.
why no oplev trace in the NB ?
also, this method would work better if we had a median averaging python PSD instead of mean averaging as in Welch's method.
We've been talking about increasing the series resistance for the coil driver path for the test masses. One consequence of this will be that we have reduced actuation range.
This may not be a big deal since for almost all of the LSC loops, we currently operate with a limiter on the output of the control filter bank. The value of the limit varies, but to get an idea of what sort of "threshold" velocities we are looking at, I calculated this for our Finesse 400 arm cavities. The calculation is rather simplistic (see Attachment #1), but I think we can still draw some useful conclusions from it:
So, from this rough calculation, it seems like we would lose ~25% efficiency in locking the arm cavity if we up the series resistance from 400ohm to 1kohm. Doesn't seem like a big deal, becuase currently, the single arm locking
There hasn't been a big glitch that misaligns MC1 by so much that the autolocker can't lock for at least 3 months, seems like there was one ~an hour ago.
I disabled autolocker and feedback to the PSL, manually aligned MC1 till the MC_REFL spot looked right on the CCD to me, and then re-engaged the autolocker, all seems to have gone smoothly.
I wanted to use the foton.py utility for my NB tool, and I remember Chris telling me that it was shipping as standard with the newer versions of gds. It wasn't available in the versions of gds available on our workstations - the default version is 2.15.1. So I downloaded gds-2.17.15 from http://software.ligo.org/lscsoft/source/, and installed it to /ligo/apps/linux-x86_64/gds-2.17.15/gds-2.17.15. In it, there is a file at GUI/foton/foton.py.in - this is the one I needed.
Turns out this was more complicated than I expected. Building the newer version of gds throws up a bunch of compilation errors. Chris had pointed me to some pre-built binaries for ubuntu12 on the llo cds wiki, but those versions of gds do not have foton.py. I am dropping this for now.
Jamie pointed out that the compile and install instructions are different for c1dnn:
See also: https://nodus.ligo.caltech.edu:8081/40m/13383.
I think these build instructions have to be run on the c1lsc frontend - in the past, I have been able to compile and install models on any computer with the shared drive mounted (including the control room workstations), but I'm guessing that something has changed since the RCG upgrade. Jamie can correct me on this if I'm wrong.
Pianosa just crashed and ate my elog, along with all the DTT/Foton windows I had open, so more details tomorrow... This workstation has been crashing ~once a month for the last 6 months.
Below ~100Hz, the hypothesis is that the BS oplev A2L contribution dominates the MICH displacement noise. I wanted to see if I could mitigate this my modifying the BS Oplev loop shape.
I've been banging my head against optimal loop shaping, with the OL loop as a test-case, without much success - as was the case with coating PSO, the magic is in smartly defining the cost function, but right now, my optimizer seems to be pushing most of the roots I'm making available for it to place to high frequencies. I've got a term in there that is supposed to guard against this, need to tweak further...
Attachment #2: Eye-fits of measured OL A2L coupling TFs to a 1/f^2 shape, with the gain being the parameter "fitted". I used these value, and the DQ-ed OL error signal in lock, to estimate the red curve labelled "A2L" in Attachment #1. The dots are the measurement, and the lines are the 1/f^2 estimates.
If you go through this thread of elogs, there are lots of pictures of the SOS assembly with the optic in it from the vent last year. I think there are many different perspectives, close ups of the standoffs, and of the OSEMs in their holders in that thread.
This elog has a measurement of the pendulum resonance frequencies with ruby standoffs - although the ruby standoff used was cylindrical, and the newer generation will be in the shape of a prism. There is also a link in there to a document that tells you how to calculate the suspension resonance frequencies using analytic equations.
I've incorporated the functionality to generate sub-budgets for the various grouped traces in the NBs (e.g. the "A2L" trace is really the quadrature sum of the A2L coupling from 6 different angular servos).
For now, I'm only doing this for the A2L coupling, and the AUX length loop coupling groups. But I've set up the machinery in such a way that doing so for more groups is easy.
Here are the sub-budget plots for last night's lock - for the OL plot, there are only 3 lines (instead of 6) because I group the PIT and YAW contributions in the function that pulls the data from the nds server, and don't ever store these data series individually. This should be rectified, because part of the point of making these sub-budgets is to see if there is a particularly bad offender in a given group.
I'll do a quick OL loop noise budget for the ITM loops tomorrow.
I also wonder if it is necessary to measure the Oplev A2L coupling from lock to lock? This coupling will be dependant on the spot position on the optic, and though I run the dither alignment servos to minimize REFL_DC, AS_DC, I don't have any intuition for how the offset from center of optic varies from lock to lock, and if this is at all significant. I've been using a number from a measurement made in May. Need to do some algebra...
I disabled the OL loops for ITMX, ITMY and BS at GPStime 1194897655 to come up with an Oplev noise budget. OL spots were reasonably well centered - by that, I mean that the PIT/YAW error signals were less than 20urad in absolute value.
Attachment #1 is a first look at the DTT spectra - I wonder why the BS Oplev signals don't agree with the ITMs at ~1Hz? Perhaps the calibration factor is off? The sensing noise not really flat above 100Hz - I wonder what all those peaky features are. Recall that the ITM OLs have analog whitening filters before the ADC, but the BS doesn't...
In Attachment #2, I show comparison of the error signal spectra for ITMY and SRM - they're on the same stack, but the SRM channels don't have analog de-whitening before the ADC.
For some reason, DTT won't let me save plots with latex in the axes labels...
I noticed yesterday evening that I wasn't able to engage the single arm locking servos - turned out that they weren't getting triggered, which in turn pointed me to the fact that the arm transmssion channels seemed dead. Poking around a little, I found that there was a red light on the CDS overview screen for c1rfm.
Not sure how to debug further...
* Fix seems to be to restart the sender RFM models (c1scx, c1scy, c1asx, c1asy).
I calibrated the BS oplev PIT and YAW error signals as follows:
The numbers are:
BS Pitch 15 / 130 (old/new) urad/counts
BS Yaw 14 / 170 (old/new) urad/counts
I bet the calibration is out of date; probably we replaced the OL laser for the BS and didn't fix the cal numbers. You can use the fringe contrast of the simple Michelson to calibrate the OLs for the ITMs and BS.
The numbers I have from the fitting don't agree very well with the OSEM readouts. Attachment #1 shows the Oplev pitch and yaw channels, and also the OSEM ones, while I swept the ASC_PIT offset. The output matrix is the "naive" one of (+1,+1,-1,-1). SUSPIT_IN1 reports ~30urad of motion, while SUSYAW_IN1 reports ~10urad of motion.
From the fits, the BS calibration factors were ~x8 for pitch and x12 for yaw - so according to the Oplev channels, the applied sweep was ~80urad in pitch, and ~7urad in yaw.
Seems like either (i) neither the Oplev channels nor the OSEMs are well diagonalized and that their calibration is off by a factor of ~3 or (ii) there is some significant imbalance in the actuator gains of the BS coils...
Need to double check against OSEM readout during the sweep.
Per our discussions in the meetings over the last week, I've tried to put together a simple Oplev noise budget. The only two terms in this for now are the dark noise and a model for the seismic noise, and are plotted together with the measured open-loop error signal spectra.
I restored the nodus crontab (copied over from the Nov 17 backup of the same at /opt/rtcds/caltech/c1/scripts/crontab/crontab_nodus.20171117080001. There wasn't a crontab, so I made one using sudo crontab -e.
This crontab is supposed to execute some backup scripts, send pizza emails, check chiara disk usage, and backup the crontab itself.
I've commented out the backup of nodus' /etc and /export for now, while we get back to fully operational nodus (though we also have a backup of /cvs/cds/caltech/nodus_backup on the external LaCie drive), they can be re-enabled by un-commenting the appropriate lines in the crontab.
The post OS migration admin for nodusa bout apache, elogd, svn, iptables, etc can be found in https://wiki-40m.ligo.caltech.edu/NodusUpgradeNov2017
Update: The svn dump from the old svn was done, and it was imported to the new svn repository structure. Now the svn command line and (simple) web interface is running. "websvn" is not installed.
Confirmed that this crontab is running - the daily backup of the crontab seems to have successfully executed, and there is now a file crontab_nodus.ligo.caltech.edu.20171122080001 in the directory quoted below. The $HOSTNAME seems to be "nodus.ligo.caltech.edu" whereas it was just "nodus", so the file names are a bit longer now, but I guess that's fine...
What is the best way to set this test up?
I think we need a QPD to monitor the spot rather than a single element PD, to answer this question about the sensor noise. Ideally, we want to shoot the HeNe beam straight at the QPD - but at the very least, we need a lens to size the beam down to the same size as we have for the return beam on the Oplevs. Then there is the power - Steve tells me we should expect ~2mW at the output of these HeNes. Assuming 100kohm transimpedance gain for each quadrant and Si responsivity of 0.4A/W at 632nm, this corresponds to 10V (ADC limit) for 250uW of power - so it would seem that we need to add some attenuating optics in the way.
Also, does anyone know of spare QPDs we can use for this test? We considered temporarily borrowing one of the vertex OL QPDs (mark out its current location on the optics table, and move it over to the SP table), but decided against it as the cabling arrangement would be too complicated. I'd like to use the same DAQ electronics to acquire the data from this test as that would give us the most direct estimate of the sensor noise for supposedly no motion of the spot, although by adding 3 optics between the HeNe and the QPD, we are introducing possible additional jitter couplings...
For the OL NB, probably don't have to fudge any seismic noise, since that's a thing we want to suppress. More important is "what the noise would be if the suspended mirrors were no moving w.r.t. inertial space".
For that, we need to look at the data from the OL test setup that Steve is putting on the SP table.
I've attempted to visualize the various components of the cost function in the way I've defined it for the current iteration of the Oplev optimal control loop design code. For each term in the cost function, the way the cost is computed depends on the ratio of the abscissa value to some threshold value (set by hand for now) - if this ratio is >1, the cost is the logarithm of the ratio, whereas if the ratio is <1, the cost is the square of the ratio. Continuity is enforced at the point at which this transition happens. I've plotted the cost function for some of the terms entering the code right now - indicated in dashed red lines are the approximate value of each of these costs for our current Oplev loop - the weights were chosen so that each of the costs were O(10) for the current controller, and the idea was that the optimizer could drive these down to hopefully O(1), but I've not yet gotten that to happen.
Based on the meeting yesterday, some possible ideas:
I've setup a test setup on the ITMY Oplev table. Details + pics to follow, but for now, be aware that
Here are some pics of the setup: https://photos.app.goo.gl/DHMINAV7aVgayYcf1.None of the existing Oplev input/output steering optics were touched. Steve can make modifications as necessary, perhaps we can make similar mods to the SRM Oplev QPD and the BS one to run the HeNe test for a few days...
too complex; just shoot straight from the HeNe to the QPD. We lower the gain of the QPD by changing the resistors; there's no sane reason to keep the existing 100k resistors for a 2 mW beam. The specular reflection of the QPD must be dumped on a black glass V dump (not some flimsy anodized aluminum or dirty razor stack)
Here are a couple of preliminary plots of the noise from a 20minute stretch of data - the new curve is the orange one, labelled sensing, which is the spectrum of the PIT/YAW error signal from the HeNe beam single bounce off a single steering mirror onto the QPD, normalized to account for the difference in QPD sum. The peaky features that were absent in the dark noise are present here.
I am a bit confused about the total sum though - there is ~2.5mW of light incident on the PD, and the transimpedance gain is 10.7kohm. So I would expect 2.5e-3 mW * 0.4A/W * 10.7 kV/A ~ 10.7V over 4 quadrants. The ADC is 16 bit and has a range +/- 10V, so 10.7 V should be ~35,000 cts. But the observed QPD sum is ~14,000 counts. The reflected power was measured to be ~250uW, so ~10% of the total input power. Not sure if this is factored into the photodiode efficiency value of 0.4A/W. I guess there is some fraction of the QPD that doesn't generate any photocurrent (i.e. the grooves defining the quadrants), but is it reasonable that when the Oplev beam is well centered, ~50% of the power is not measured? I couldn't find any sneaky digital gains between the quadrant channels to the sum channel either... But in the Oplev setup, the QPD had ~250uW of power incident on it, and was reporting a sum of ~13,000 counts with a transimpedance gain of 100kohm, so at least the scaling seems to hold...
I guess we wan't to monitor this over a few days, see how stationary the noise profile is etc. I didn't look at the spectrum of the intensity noise during this time.
Pizza mail didn't go out last weekend - looking at logfile, it seems like the "sendmail" service was missing. I installed sendmail following the instructions here: https://tecadmin.net/install-sendmail-server-on-centos-rhel-server/
Except that to start the sendmail service, I used systemctl and not init.d. i.e. I ran systemctl start sendmail.service (as root). Test email to myself works. Let's see if it works this weekend. Of course this isn't so critical, more important are the maintenance emails that may need to go out (e.g. disk usage alert on chiara / N2 pressure check, which looks like nodus' responsibilities).
I don't think this is really a problem - we offload to the fast channels and not to the slow (although we really should offload to the slow channels). I think the best approach is to use the ezcaservo utility to offload the DC part of the ASS control signals to the slow channels, so as to not waste fast channel DAC counts on DC offsets. In principle, this approach should be somewhat immune to the slow channel calibration not being perfect.
While staring at epics records all day I noticed something about the PIT/YAW offset sliders and ASS offset offloading to slow channels scripts that I'm not sure others are aware off, so I'll briefly discuss it in this post.
The PIT and YAW sliders directly control soft channels that are hosted on the slow machine. Secondary epics records disentangle them for the individual coils:
These channels are the direct input for the physical output channels that generate the control voltage.
The fast channels for PIT and YAW have a numerical correction factor built in that accounts for differences between the OSEMs, but the slow channels don't. This means that the slow PIT/YAW controls are not entirely orthogonal but have crosstalk on the order of 10 percent. This in itself is not that dramatic, however the offload offsets scripts for the dither alignment use the fast PIT/YAW values as inputs, which represent the necessary adjustments to the OSEMs only after the individual correction factors have been applied. The offloading to slow knows nothing of this calibration difference between the OSEMs. The result is that there is a ~10 percent of the offset correction error on the mirror alignment AFTER offloading. This will of course converge after a few iterations, but in any case it is recommendable to run the dither alignment again after offloading and not offload the new offsets to the fast channels.
[Koji, Jamie(remote), gautam]
Problems raised in elogs in the thread of 13474 and also 13436 seem to be solved.
I would make a detailed post with how the problems were fixed, but unfortunately, most of what we did was not scientific/systematic/repeatable. Instead, I note here some general points (Jamie/Koji can addto /correct me):
Some other general remarks:
Koji suggested trying to simply retsart the ASS model to see if that fixes the weird errors shown in Attachment #2. This did the trick. But we are now faced with more confusion - during the restart process, the various indicators on the CDS overview MEDM screen froze up, which is usually symptomatic of the machines being unresponsive and requiring a hard reboot. But we waited for a few minutes, and everything mysteriously came back. Over repeated observations and looking at the dmesg of the frontend, the problem seems to be connected with an unresponsive NFS connection. Jamie had noted sometime ago that the NFS seems unusually slow. How can we fix this problem? Is it feasible to have a dedicated machine that is not FB1 do the NFS serving for the FEs?
Looking at the dmesg on c1iscex for example, at least part of the problem seems to be associated with FB1 (192.168.113.201, see Attachment #1). The "server" can be unresponsive for O(100) seconds, which is consistent with the duration for which we see the MEDM status lights go blank, and the EPICS records get frozen. Note that the error timestamped ~4000 was from last night, which means there have been at least 2 more instances of this kind of freeze-up overnight.
I don't know if this is symptomatic of some more widespread problem with the 40m networking infrastructure. In any case, all the CDS overview screen lights were green today morning, and MC autolocker seems to have worked fine overnight.
I have also updated the wiki page with the updated daqd restart commands.
Unrelated to this work - Koji fixed up the MC overview screen such that the MC autolocker button is now visible again. The problem seems to do with me migrating some of the c1ioo EPICS channels from the slow machine to the fast system, as a result of which the EPICS variable type changed from "ENUM" to something that was not "ENUM". In any case, the button exists now, and the MC autolocker blinky light is responsive to its state.
I don't think the problem is fb1. The fb1 NFS is mostly only used during front end boot. It's the rtcds mount that's the one that sees all the action, which is being served from chiara.
Yesterday, while we were bringing the CDS system back online, we noticed that the control room wall StripTool traces for the seismic BLRMS signals did not come back to the levels we are used to seeing even after restarting the PEM model. There are no red lights on the CDS overview screen indicative of DAQ problems. Trending the DQ-ed seismometer signals (these are the calibrated (?) seismometer signals, not the BLRMS) over the last 30 days, it looks like
I poked around at the electronics rack (1X5/1X6) which houses the 1U interface box for these signals - on its front panel, there is a switch that has selectable positions "UVW" and "XYZ". It is currently set to the latter. I am assuming the former refers to velocities in the xyz directions, and the latter is displacement in these directions. Is this the nominal state? I didn't spend too much time debugging the signal further for now.