I'm not sure if this has something to do with the model restarts / new RCG, but while I was re-enabling the MC watchdogs, I noticed the RMS sensor voltage channels on ITMX hovering around ~100mV, even though local damping was on (in which configuration I would expect <1mV if everything is working normally). I was confused by this behaviour, and after staring at the ITMX suspension screen for a while, I noticed that the inputs to the "ASCP" and "ASCY" servos were "-nan", and the outputs were 10^20 cts (see Attachment #1).
Digging a little deeper, I found that the same problem existed on ITMY, ETMX, ETMY, PRM (but not BS or SRM) - reasons unknown for now.
I have to check where this signal is coming from, but for now I just turned the "ASC Input" switch off. More investigation to be done, but in the meantime, ASS dither alignment may not be possible.
After consulting with Jamie, I have just disabled all outputs to the suspensions other than local damping loop outputs. I need to figure out how to get this configuration into the safe.snap file, so that the models start up in this safer configuration until we are sure of what is going on.
GV Edit 28 Oct 0026: Seems like this problem is seen at the sites as well. I wonder if the problem is related.
Today, with Johannes' help, I cleaned the fiber tips of the photodiodes. The effect of the cleaning was dramatic - see Attachments #1-4, which show the X beat PD under axial illumination, the X beat PD under oblique illumination, the Y beat PD under axial illumination, and the Y beat PD under oblique illumination, respectively. They look much cleaner now, and the feature that looked like a scratch has vanished.
The cleaning procedure followed was:
I will repeat this procedure for all fiber connections once I start putting the box back together - I'm almost done with the new box, just waiting on some hardware to arrive.
Today, I borrowed the fiber microscope from Johannes and took a look at the fibers coupled to the PDs. The PD labelled "BEAT PD AUX Y" has an end that seems scratched (Attachments #1 and #2). The scratch seems to be on (or at least very close to) the core. The other PD (Attachments #3 and #4) doesn't look very clean either, but at least the area near the core seems undamaged. The two attachments for each PD correspond to the two available lighting settings on the fiber microscope.
I have not attempted to clean them yet, though I have borrowed the cleaning supplies from Johannes to facilitate this. I also plan to inspect the ends of all other fiber connections before re-installing them.
Seems like this modification didn't really work. There were several large MC1 glitches, and one of them misaligned MC1 so much that the IMC didn't relock for the last ~6 hours. I re-aligned MC1 manually, and now it is locked fine.
Now that all the CDS overview lights are green, I decided to switch back the coil driver outputs to their original state so that the MC optics could be damped and the IMC relocked. I also restored the static PIT/YAW bias values to their original values.
MC1 has been quiet over the last couple of days, let's see how it behaves in the next few days. In all the glitches I have observed, if the IMC is locked and the WFS loops are enabled, the loops are able to correct for the DC misalignment caused by the glitch. But the mcwfsoff script is currently set up in such a way that the output history is cleared between IMC locks. I made two copies of the mcwfson/mcwfsoff scripts, called mcwfsunhold/mcwfshold respectively. They live in /opt/rtcds/caltech/c1/scripts/MC/WFS. I've also modified the autolocker script to call these modified scripts, such that when the IMC loses lock, the WFS servo outputs are held while the input is turned off. The hope is that in this configuration, the autolocker can catch a lock even if there is a glitch on MC1.
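The hold/unhold logic is essentially just toggling the input enable and the output hold on each WFS servo filter bank. Here is a minimal sketch of the idea using pyepics - the channel names and record suffixes below are placeholders for illustration, not the actual EPICS records the installed scripts touch:

# Sketch only: hold the MC WFS servo outputs and disable their inputs.
# Channel names / suffixes are illustrative placeholders, not verified.
from epics import caput

WFS_SERVOS = ['WFS1_PIT', 'WFS1_YAW', 'WFS2_PIT', 'WFS2_YAW']

def mcwfs_hold():
    for servo in WFS_SERVOS:
        base = 'C1:IOO-MC_' + servo
        caput(base + '_HOLD', 1)   # hold the current filter-bank output
        caput(base + '_INEN', 0)   # turn the servo input off

def mcwfs_unhold():
    for servo in WFS_SERVOS:
        base = 'C1:IOO-MC_' + servo
        caput(base + '_INEN', 1)   # re-enable the input
        caput(base + '_HOLD', 0)   # release the held output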
I haven't tried locking the arms yet, but I think other IFO work discussed at the meeting (like arm loss estimation / cavity scans etc) can proceed.
That's why the Autolocker clears the outputs - we don't want to be holding the offsets from the last ms of lock, when it was all messed up. Instead, it would be best to have a slow (~mHz) relief script that takes the WFS control signals and puts them onto the MC SUS sliders. This would then re-align the MC to the input beam rather than the input beam to the MC, which is not the best idea.
Seems like this modification didn't really work.
I spent some time today trying to debug this issue.
Jamie and I had opened up the c1sus frontend to try and replace the RFM card before we realized that the problem was in the RCG code generator. During this process, we had disconnected all of the back-panel cabling to this machine (2 ethernet cables, dolphin cable, and RFM cables/fibers). I thought I may have accidentally returned the cables to the wrong positions - but all the status indicator lights indicate that everything is working as it should, and I also confirmed that the cabling is as it is in the pictures of the rack on the wiki page.
Looking at the Simulink model diagram (see Attachment #1 for example), it looks like (at least some of) these channels are actually on the dolphin network, and not the RFM network (with which we were experiencing problems). This suggests that the problem is something deeper. Although I did see nans in some of the ETMX ASC channels as well, and those channels are piped over the RFM network. Even more puzzling is that the ASC MEDM screen (Attachment #3) and the Simulink diagram (Attachment #2) suggest that there is an output matrix in between the input signals and the output angular control signals to the suspensions. As Attachment #4 shows, the rows corresponding to ITMX PIT and YAW are zero (I confirmed using z read <matrixElement>). Attachment #3 shows that the output of all the servo banks except CARM_YAW is zero, but CARM_YAW has no matrix element going to the ITMs (also confirmed with z read <servoOutputChannel>). So 0 x 0 should be 0, but for some reason the model doesn't give this output?
GV Edit: As EricQ just pointed out to me, nan x 0 is still nan, which probably explains the whole issue. Poking a little further, it seems like this is an SDF issue - the SDF table isn't able to catch differences for this hold output channel.
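Just to spell out EricQ's point with a quick numerical example - a zero output matrix element does not protect the downstream signal, because nan times zero is still nan:

import numpy as np

servo_outputs = np.array([0.0, 0.0, float('nan')])   # e.g. CARM_YAW stuck at nan
itmx_row      = np.array([0.0, 0.0, 0.0])            # ITMX row of the output matrix

print(servo_outputs * itmx_row)         # [ 0.  0. nan] -- the nan survives the zero
print(np.dot(itmx_row, servo_outputs))  # nan -- the whole matrix-vector product is poisoned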
As I was writing this elog, I noticed that, as mentioned above, the CARM_YAW output was "nan". When I restart the model (thankfully this didn't crash c1lsc!), it seems to default to this state. Opening up the filter module, I saw that the "hold output" was enabled.
All the points above stand - CARM_YAW output shouldn't have been going anywhere as per the output matrix, but it seems to have been responsible? Seems like a bug in any case if a model restarts with a field as "nan".
Anyways the problem seems to have been resolved so I'm going to try locking and dither aligning the arms now.
Rolf mentioned that a simple update could fix several of the CDS issues we are facing (e.g. inability to open up testpoints), but he didn't seem to have any insight into this particular issue. Jamie will try and recompile all the models and then we have to see if that fixes the remaining problems.
Leaving LSC mode OFF for now while CDS is still under investigation
Not really related to this work: We saw that the safe.snap file for c1oaf seems to have gotten overwritten at some point. I restored the EPICS values from a known good time, and over-wrote the safe.snap file.
In the aftermath of the accidental vent, it looks like the RGA was shutdown.
We followed the instructions in this elog to restart the RGA.
Seems to be working now, Steve says we just need to wait for it to warm up before we can collect a reliable scan.
We have a good RGA scan now. There was no scan for 3 months.
At Rolf/Rich Abbott's request, we performed a check of the UPS today.
Steve believed that the UPS was functioning as it should, and the recent accidental vent was because the UPS batteries were insufficiently charged when the test was performed. Today, we decided to try testing the UPS.
We first closed V1, VM1 and VA6 using the MEDM screen. We prepared to pull power on all these valves by loosening the power connections (but not detaching them). [During this process, I lost the screw holding the power cord fixed to the gate valve V1 - we are looking for a replacement right now but it seems to be an odd size. It is cable tied for now.]
The battery charge indicator LEDs on the UPS indicated that the batteries were fully charged.
Next, we hit the "Test" button on the UPS - it has to be held down for ~3 seconds for the test to be actually initiated, which seems to be a safety feature of the UPS. Once the test is underway, the LED indicators on the UPS will indicate that the load is on the UPS batteries. The test itself lasts for ~5 seconds, after which the UPS automatically reverts to the nominal configuration of supplying power from the main line (no additional user input is required).
In this test, one of the five battery charge indicator LEDs went off (5 ON LEDs indicate full charge).
So on the basis of this test, it would seem that the UPS is functioning as expected. It remains to be investigated if the various hardware/software interlocks in place will initiate the right sequence of valve closures when required.
Never hit "O" on the Vacuum UPS!
Note: the "all off" configuration should be all valves closed! This should be fixed now.
In case of emergency, you can close V1 by disconnecting its actuating power as shown in Atm3, provided you have pneumatic pressure of 60 PSI.
In case you want to use it, I had profiled the Lightwave NPRO sometime back, and we were even using it as the AUX X laser for a short period of time.
As for using the AS laser for mode spectroscopy: don't we want to match the beam into the cavity as best as possible, and then use some technique to disturb the input mode (like the dental tooth scraper technique from Chris Mueller's thesis)?
Johannes and I did an arm scan of the X arm today (arm controlled with ALS, monitoring IR transmission) - only 2 IR FSRs were scanned, but there should be sufficient information in there to extract the modulation depth and mode matching - can we use Kaustubh's/Naomi's code? The Y arm ALS needs to be touched up so I don't have a Y arm scan yet. Note that to get a good arm scan measurement, the High Gain Thorlabs PD should be used as the transmission PD.
I worked a little bit on the Y arm ALS today.
I am now going to measure the OLTFs of both green PDH loops to check that the overall loop gain is okay, and also check the measurement against EricQ's LISO model of the (modified) AUX green PDH servos. Results to follow.
Some weeks ago, I had moved some of the Green steering optics on the PSL table around, in order to flip some mirror mounts and try and get angles of incidence closer to ~45deg on some of the steering mirrors. As a result of this work, I can see some light on the GTRY CCD when the X green shutter is open. It is unclear if there is also some scattered light on the RFPDs. I will post pictures + a more detailed investigation of the situation on the PSL table later, there are multiple stray green beams on the PSL table which should probably be dumped.
As I was writing this elog, I saw the X green lock drop abruptly. During this time, the X arm stayed locked to the IR, and the Y arm beat on the control room network analyzer did not jump (at least not by an amount visible to the eye). Toggling the X end shutter a few times, the green TEM00 lock was re-acquired, but the beatnote has moved on the control room analyzer by ~40MHz. On Friday evening however, the X green lock held for >1 hour. Need to keep an eye on this.
Attachment #1 shows the results of my measurements tonight (SR785 data in Attachment #2). Both loops have a UGF of ~10kHz, with ~55 degrees of phase margin.
Excitation was injected via SR560 at the PDH error point, amplitude was 35mV. According to the LED indicators on these boxes, the low frequency boost stages were ON. Gain knob of the X end PDH box was at 6.5, that of the Y end PDH box was at 4.9. I need to check the schematics to interpret these numbers. GV Edit: According to this elog, these numbers mean that the overall gain of the X end PDH box is approx. 25dB, while that of the Y end PDH box is approx. 15dB. I believe the Y end Lightwave NPRO has an actuator discriminant ~5MHz/V, while the X end Innolight is more like 1MHz/V.
Not sure what to make of the X PDH loop measurement being so much noisier than the Y end, I need to think about this.
More detailed analysis to follow.
I had some trouble getting the daqd processes up and running again using Jamie's instructions.
With Jamie's help however, they are back up and running now. The problem was that the mx infrastructure didn't come back up on its own. So prior to running sudo systemctl restart daqd_*, Jamie ran sudo systemctl start mx. This seems to have done the trick.
c1iscey was still showing red fields on the CDS overview screen so Jamie did a soft reboot. The machine came back up cleanly, so I restarted all the models. But the indicator lights were still red. Apparently the mx processes weren't running on c1iscey. The way to fix this is to run sudo systemctl start mx_stream. Now everything is green.
Now we are going to work on trying the fix Rolf suggested on c1iscex.
It turns out the problem was just a bent pin on the SCSI cable, likely from having to stretch things a bit to reach optimus from the RAID unit.
I hooked it up to megatron, and it was automatically recognized and mounted.
I had to turn off the new FB machine and remove it from the rack to be able to access megatron though, since it was just sitting on top. FB needs a rail to sit on!
At a cursory glance, the filesystem appears intact. I have copied over the archived DRFPMI frame files to my user directory for now, and Gautam is going to look into getting those permanently stored on the LDAS copy of 40m frames, so that we can have some redundancy.
Also, during this time, one of the HDDs in the RAID unit failed its SMART tests, so the RAID unit wanted it replaced. There were some spare drives in a little box directly under the unit, so I've installed one and am currently incorporating it back into the RAID.
There are two more backup drives in the box. We're running a RAID 5 configuration, so we can only lose one drive at a time before data is lost.
Here is what was done (Jamie will correct me if I am mistaken).
So while we are in a better state now, the problem isn't fully solved.
Comment: seems like there is an in-built timeout for testpoints opened with DTT - if the measurement is inactive for some time (unsure how much exactly but something like 5mins), the testpoint is automatically closed.
After getting the go ahead from Jamie, I recompiled all the FE models against the same version of RCG that we tested on the c1iscex models.
To do so:
IFO alignment needs to be redone, but at least we now have an (admittedly roundabout) way of getting testpoints. Did a quick check for "nan-s" on the ASC screen, saw none. So I am re-enabling watchdogs for all optics.
GV 23 August 9am: Last night, I re-aligned the TMs for single arm locks. Before the model restarts, I had saved the good alignment on the EPICS sliders, but the x3 gains on the coil driver filter banks have to be manually turned on at the moment (i.e. the safe.snap file has them off). ALS noise looked good for both arms, so just for fun, I tried transitioning control of both arms to ALS (in the CARM/DARM basis as we do when we lock DRFPMI, using the Transition_IR_ALS.py script), and was successful.
I completed the revamp of the box, and re-installed the box on the PSL table today. I think it would be ideal to install this on one of the electronic racks, perhaps 1X2 would be best. We would have to re-route the fibers from the PSL table to 1X2, but I think they have sufficient length, and this way, the whole arrangement is much cleaner.
Did a quick check to make sure I could see beat notes for both arms. I will now attempt to measure the ALS noise with this revamped box, to see if the improved power supply and grounding arrangement, as well as fiber cleaning, has had any effect.
Photos + power budget + plan of action for using this box to characterize the green PDH locking to follow.
For quick reference: here is the AM/PM measurement done when we re-installed the repaired Innolight NPRO on the new X endtable.
Since the single arm locking and dither alignment seemed to work alright after the CDS overhaul, I decided to try some recycling cavity locking tonight.
Why should this have changed? I was just on the AS table and did re-center the beam onto the REFL 55 RFPD, but I had also done this in April/May when I was last doing DRMI locking. But I can't explain the apparent factor of ~4 increase in light level. I think I have some measurements of the light levels at various PDs from April 2017, I will see how the present levels line up.
Of course dataviewer won't cooperate when I am trying to monitor testpoints.
I may be missing something obvious, but I am quitting for tonight, will look into this more tomorrow.
Unrelated to this work: looking at the GTRY spot on the CCD monitor, there seems to be some excess angular motion. Not sure where this is coming from. In the past, this sort of problem has been symptomatic of something going wonky with the Oplev loops. But I took loop measurements for ITMY and ETMY PIT and YAW, they look normal. I will investigate further when I am doing some more ALS work.
A couple of weeks ago, I was trying to modernize the python version of the FSS Slow temperature control loops, when I accidentally ended up deleting it. There was no svn backup. So the old Perl PID script has been running for the last few days.
Today, I checked out the latest version that Andrew and co. have running in the PSL lab. I had to make some important modifications for the script to work for the 40m setup.
python FSSSlow.py -i FSSSlowPy.ini
Then I stopped the Perl process on megatron by running
sudo initctl stop FSSslow
and started the Python process by running
sudo initctl start FSSslowPy
I have now committed the files FSSSlow.py and FSSSlowPy.ini to the 40m svn. Things seem to be stable for the last 20 mins or so, let's keep an eye on this though - although we had been running the Python PID loop for some months, this version is a slightly modified one.
The initctl stuff still isn't very robust - I think both the Autolocker and the FSS slow servos have to be manually restarted if megatron is shutdown/restarted for whatever reason. It doesn't seem to be a problem with the initctl routine itself - looking at the logs, I can see that init is trying to start both processes, but is failing to do so each time. To be investigated. The wiki procedure to restart this process is up to date.
GV Edit 0000 25 Aug 2017: I had to add a line to the script that checks MC transmission before enabling the PID loop. Change has been committed to svn. Now, when the MC loses lock or if the PSL shutter is kept closed for an extended period of time, the temperature loop doesn't rail.
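For context, the gating is conceptually just a check on the MC transmission before each PID step - a minimal sketch is below. The channel names, threshold and gains here are placeholders for illustration; the real values live in FSSSlowPy.ini and the script itself:

# Sketch of a gated, integral-only PID step for the FSS slow (NPRO temperature) loop.
# Channel names, threshold and gains are illustrative placeholders.
import time
from epics import caget, caput

MC_TRANS_CH  = 'C1:IOO-MC_TRANS_SUM'   # placeholder lock indicator
ERROR_CH     = 'C1:PSL-FSS_FAST_MON'   # placeholder error signal (fast actuator monitor)
CONTROL_CH   = 'C1:PSL-FSS_SLOW_OUT'   # placeholder slow temperature offset
TRANS_THRESH = 1000.0                  # consider the IMC "locked" above this

KI = -0.01                             # placeholder integral gain
integrator = 0.0

while True:
    if caget(MC_TRANS_CH) > TRANS_THRESH:
        err = caget(ERROR_CH)
        integrator += KI * err
        caput(CONTROL_CH, integrator)
    # if the IMC is unlocked or the PSL shutter is closed, skip the step so the
    # temperature output doesn't rail on a meaningless error signal
    time.sleep(1.0)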
I tried some DRMI locking again tonight, but had no success. Here is the story.
Looks like I will have to embark on the REFL55 LSC electronics investigation. I was able to successfully lock the PRC on carrier and sideband, and the Michelson lock also seems to work fine, all of which seem to point to a hardware problem with the REFL55 signal chain.
I did a quick check by switching the output of the REFL55 demod board to the inputs normally used by AS55 signals on the whitening board. Setting the whitening gain to +18dB for these channels had the same effect - ADC overflow galore. So looks like the whitening board isn't to blame. I will have to check the demod board out.
Looks like MC1 got another big kick just under 4 hours ago. None of the other optics show any evidence of a glitch so it seems unlikely that this was some sort of global event. It's been well behaved for ~2weeks now. IMC was unlocked. I manually re-aligned MC1, at which point the autolocker was able to lock the IMC.
Looking at this plot, it seems that the LR and UL coils saw the largest kicks. UR barely saw it. Not sure what (if anything) to make of this - apparently the optic moved by ~20urad with the UR magnet approximately at the pivot.
Attachment #1 - Photo of the revamped beat setup. The top panel has to be installed. New features include:
Attachment #2 - Power budget inside the box. Some of these FC/APC connectors seem to not offer good coupling between the two fibers. Specifically, the one on the front panel meant to accept the PSL light input fiber seems particularly bad. Right now, the PSL light is entering the box through one of the front panel connectors marked "PSL + X out". I've also indicated the beat amplitude measured with an RF analyzer. Need to do the math now to confirm if these match the expected amplitudes based on the power levels measured.
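The expected beat amplitude arithmetic is simple enough - a sketch with made-up power levels (the measured numbers are in Attachment #2), assuming perfect mode overlap, a nominal InGaAs responsivity, and a 50 ohm load (the PD's internal RF gain is ignored here):

import numpy as np

# Placeholder inputs -- substitute the measured values from the power budget
P_psl = 1.0e-3    # W of PSL light reaching the beat PD
P_arm = 0.5e-3    # W of AUX light reaching the beat PD
R     = 0.9       # A/W, nominal InGaAs responsivity at 1064 nm
Z     = 50.0      # ohm, RF load impedance
eta   = 1.0       # heterodyne (mode overlap) efficiency, <= 1 in practice

i_beat   = 2 * R * np.sqrt(eta * P_psl * P_arm)                # peak beat photocurrent [A]
P_rf_dBm = 10 * np.log10((i_beat / np.sqrt(2))**2 * Z / 1e-3)  # RF power into 50 ohm [dBm]
print('expected beat amplitude: %.1f dBm' % P_rf_dBm)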
Attachment #3 - We repeated the measurement detailed here. The X arm (locked to IR) was used for this test. The "X" delay line electronics were connected to the X green beat PD, while the "Y" delay line electronics were connected to the X IR beat PD. I divided the phase tracker Hz calibration factor by 2 to get IR Hz for the Y arm channels. IR beat was at ~38MHz, green beat was at ~76MHz. The broadband excess noise seen in the previous test is no longer present. Indeed, below ~20Hz, the IR beat seems less noisy. So seems like the cleaning / electronics revamp did some good.
Further characterization needs to be done, but the results of this test are encouraging. If we are able to get this kind of out of loop ALS noise with the IR beat, perhaps we can avoid having to frequently fine-tune the green beat alignment on the PSL table. It would also be ideal to mount this whole 1U setup in an electronics rack instead of leaving it on the PSL table.
GV Edit: I've added better photos to the 40m Google Photos page. I've also started a wiki page for this box / the proposed IR ALS system. For the moment, all that is there is the datasheet to the Fiber Couplers used, I will populate this more as I further characterize the setup.
What are the critical filesystems? I've also indicated the size of these disks and the volume currently used, and the current backup situation.
Not backed up
LDAS pulls files from nodus daily via rsync, so there's no cron job for us to manage. We just allow incoming rsync.
Local backup on /media/40mBackup on chiara via daily cronjob
Remote backup to ldas-cit.ligo.caltech.edu::40m/cvs via daily cronjob on nodus
Currently mounted on Megatron, not backed up.
Then there is Optimus, but I don't think there is anything critical on it.
So, based on my understanding, we need to back up a whole bunch of stuff, particularly the boot disks and root filesystems for Chiara, Megatron and Nodus. We should also test that the backups we make are useful (i.e. we can recover current operating state in the event of a disk failure).
Please edit this elog if I have made a mistake. I also don't have any idea about whether there is any sort of backup for the slow computing system code.
It is unclear when this was last done, and since I modified the coil driver electronics for the ITMs and BS recently, I figured it would be useful to get this calibration done. The primary motivation was to see if we could resolve the discrepancy between the current ALS noise (using POX as a sensor) and the Izumi et al. plot.
Because we are planning to change the coil driver electronics further soon anyways, we decided to do the calibration at a single frequency for tonight. For future reference, the extension of this method to calibrate the actuator over a wider range of frequencies is here. The procedure followed, and the relevant numbers from tonight, are as follows.
Once these calibrations were updated, we decided to control the arms with ALS, and look at the POX spectrum. Y-arm ALS wasn't so stellar tonight, especially at low frequencies. I can see the GTRY spot moving on the CCD monitor, so something is wonky. To be investigated. But the X arm ALS noise looked pretty good.
Seems like updating the calibration did the job; see the attached comparison plot.
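For reference, the arithmetic for the single-frequency calibration is sketched below, assuming the suspended optic behaves as a free mass (1/f^2) well above the ~1 Hz pendulum resonance. The numbers are placeholders, not tonight's measured values:

# Single-frequency actuator calibration sketch (placeholder numbers).
f0          = 211.1       # Hz, frequency of the injected line (placeholder)
drive_cts   = 300.0       # counts, amplitude of the injected line (placeholder)
line_meters = 2.0e-12     # m, line height read off the calibrated error signal (placeholder)

A_f0 = line_meters / drive_cts    # actuator gain at f0, in m/count

# quote the usual figure of merit, valid well above the pendulum resonance
A_free_mass = A_f0 * f0**2        # m*Hz^2 / count
print('actuator gain: %.3f nm/count/f^2' % (A_free_mass * 1e9))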
I was having a chat with EricQ about this today, just noting some points from our discussion down here so that I remember to look into this tomorrow.
Can we make use of the Jetstor raid array for some kind of consolidated 40m CDS backup system? Once we've gotten everything of interest out of it...
Last night, while we were working on the ALS, I noticed the GTRY spot moving around (in PITCH) on the CCD monitor in the control room at ~1-2Hz. The operating condition was that the arm was locked to the IR, and the PSL green shutter was closed, so that only the arm transmissions were visible on the CCD screens. There was no such noticeable movement of the GTRX spot. When looking at the out-of-loop ALS noise in this configuration (but now with the PSL green shutter open of course), the Y arm ALS noise at low frequencies was much worse than the X arm.
Today, I looked into this a little more. I first checked that the Y-endtable enclosure was closed off as usual (as I had done some tweaking to the green input pointing some days ago). There are various green ghost beams on the Y-endtable. When time permits, we should make an effort to cleanly dump these. But the enclosure was closed as usual.
Then I looked at the in-loop Oplev error signal spectra for the ITMY and ETMY Oplev loops. There was high coherence between ETMYP Oplev error signal and GTRY. So I took a loop transfer function measurement - the upper UGF was around 3.5Hz. I increased the loop gain such that the upper UGF was around 4.5Hz, with phase margin ~30degrees. Doing so visibly reduced the angular movement of the GTRY spot on the CCD. Attachment #1 shows the Oplev loop TF after the gain increase, while Attachment #2 compares the GTRX and GTRY spectra (DC value is approximately the same for both, around 0.4). GTRY still seems a bit noisier at low frequencies, but the out-of-loop ALS noise for the Y arm now lines up much more closely with its reference trace from a known good time.
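For the record, since the open-loop gain is roughly 1/f around the UGF, the required gain change is just the ratio of target to measured UGF:

ugf_measured = 3.5   # Hz, measured upper UGF
ugf_target   = 4.5   # Hz, desired upper UGF
# for a ~1/f open-loop shape near the UGF, gain scales linearly with UGF
print('scale the loop gain by %.2f' % (ugf_target / ugf_measured))   # ~1.3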
Y-arm ALS wasn't so stellar tonight, especially at low frequencies. I can see the GTRY spot moving on the CCD monitor, so something is wonky. To be investigated.
MC autolocker and FSS loops were stuck because c1psl was unresponsive. I rebooted it and did a burtrestore to enable PSL locking. Then the IMC locked fine.
c1susaux and c1iscaux were also unresponsive so I keyed those crates as well, after taking the usual steps to avoid ITMX getting stuck - but it still got stuck when the Sat. Box. connectors were reconnected after the reboot, so I had to shake it loose with bias slider jiggling. This is annoying and also not very robust. I am afraid we are going to knock the ITMX magnets off at some point. Is this problem indicative of the fact that the ITMX magnets were somehow glued on in a skewed way? Or can we make the situation better by just tweaking the OSEM-holding fixtures on the cage?
In any case, I've started listing stuff down here for things we may want to do when we vent next.
A couple of minutes ago, Larry W swapped the fibers to our 40m Edgeswitch (BROCADE FWS 648G) to a faster connection. This is the switch to which our gateway machine, NODUS, is connected. The actual swap itself happened at the core router in Bridge, and took only a few seconds. After the switch, I double checked that I was able to ssh into nodus from my laptop, and Larry informed me that everything is working as expected on his end.
Larry also tells us that the other edgeswitch at the 40m (Foundry Networks), to which most of our GC network machines are connected, is a 100 Mbps switch, and so we should re-route the connections from this switch to the BROCADE switch at our convenience to take advantage of the faster connection.
Today I tried debugging the mysterious increase in REFL55 signal levels in the DRMI configuration. I focused on the demod board, because last week, I had tried routing these signals through different channels on the whitening board, and saw the same effect.
Based on my tests, everything on the Demod board seems to work as expected. I need to think more about what else could be happening here - specifically do a more direct test on the whitening board.
All connections have been restored until further debugging later in the evening.
We did an ingenious checkup of the whitening board tonight.
I've restored all the connections that we messed with at the LSC rack to their original positions.
The TT alignment seems to be drifting around more than usual after we disconnected one of the channels - when I came in today afternoon, the spot on the AS camera had drifted by ~1 spot diameter so I had to manually re-align TT1.
After our Demod/Whitening electronics investigations suggested nothing obviously wrong, I decided to give DRMI locking another go tonight.
Surprisingly, there was no evidence of REFL55 behaving weirdly tonight, and I was able to easily lock the DRMI on 1f error signals using the recipe I've been using in the last few months.
Not sure what to make of all this.
I got in a ~15 minute lock, but I wasn't prepared to do any sort of characterization/ sensing / attempt to turn on coil-dewhitening, and I'm too tired to try again tonight. I was however able to whiten the error signals, as I have been able to do in the past. There is a ~45Hz bump in MICH that I haven't seen in the past.
I'll try and do some characterization tomorrow eve, but it's encouraging to at least get back to the pre-FB-failure state of locking.
There was a pretty large glitch in MC1 about an hour ago. The misalignment was so large that the autolocker wasn't able to lock the IMC. I manually re-aligned MC1 using the bias sliders, and now IMC locks fine. Attached is a 90 second plot of 2K data from the OSEMs showing the glitch. Judging from the wall StripTool, the IMC was well behaved for ~4 hours before this glitch - there is no evidence of any sort of misalignment building up, judging from the WFS control signals.
I re-enabled the MC SUS damping and IMC locking for some IFO work just now.
MC1, MC2 and MC3 damping turned off to see glitching action at 9:57am
I haven't done an exhaustive check just yet, but I have loaded a few testpoints in dataviewer, and ran a script that uses testpoint channels (specifically the ALS phase tracker UGF setting script), all seems good.
So if I remember correctly, the major CDS fix now required is to solve the model unloading issue.
Thanks to Jamie/Jonathan Hanks/KT for getting us back to this point! Here are the details:
After reading logs and code, it was a simple daqdrc config change.
The daqdrc should read something like this:
configure channels begin end;
What had happened was that tpconfig was put before the configure channels begin end. So when daqd_rcv went to configure its test points, it did not have the channel list configured and could not match test points to the right model & machine. Dave and I suspect that this is so that it can do a request directly to the correct front end instead of a general broadcast to all awgtpman instances. Simply reordering the config fixes it.
I tested by opening a test point in dataviewer and verifying that testpoints had opened/closed by using diag -l. Xmgr/grace didn't seem to be able to keep up with the test point data over a remote connection.
You can find this in the logs by looking for entries like the following while the daqd is starting up. When we looked, we saw that there was an entry for every model.
Unable to find GDS node 35 system c1daf in INI fiels
I did some work today to see if I could use the IR beat for ALS control. Initial tests were encouraging.
I will now embark on the noise budgeting.
I am leaving the green beat electronics on the PSL table in the switched state for further testing...
Now that the DRMI locking seems to be repeatable again, I want to see if I can improve the measured MICH noise. Recall that the two dominant sources of noise were
In preparation for some locking attempts today evening, I did the following:
Hopefully, I can successfully engage a similar transition tonight with the DRMI locked. The main difference compared to this daytime test is going to be that the MICH control signal is also going to be routed to the BS.
Tasks for tonight, if all goes well:
Unrelated to this work: the PMC was locked near the upper rail of the PZT, so I re-locked it closer to the middle of the range.
Tonight, I was able to lock the DRMI, turn on the whitening filters for the sensing PDs, and also turn on the coil de-whitening filters for ITMX, ITMY and BS. However, I didn't see the expected improvement in the MICH spectrum between ~50-300 Hz. Sad.
I basically went through the list of tasks I made in the previous elog. Some notes:
Attachment #1: Comparison of MICH_ERR with and without the BS de-whitening. Note that the two ITMs have their coils de-whitened in both sets of traces.
Attachment #2: Spectra of MICH output and one of the BS coil outputs in both states. The DAC RMS increases by ~30x when the de-whitening is engaged, but is still well within limits.
So it looks like the switching of paths is happening correctly. The "CDS BIO STATUS" MEDM screen also shows the appropriate bits toggling when I turn the de-whitening on/off. There is no broadband coherence with MCF between 50-300 Hz so it seems unlikely that this could be frequency noise.
Clearly I am missing something. But anyways I have a good amount of data, may be useful to put together the post CDS/electronics modification DRMI noise budget. More analysis to follow.
I was unable to download data using nds2. Gabriele had reported similar problems a week ago but I hadn't followed up on this.
I repeated steps 5-7 from elog 13161, and now it seems that I can get data from the nds2 servers again. Unclear why the nds2 server had to be restarted. I wonder if this is somehow related to the mysterious acromag EPICS server tmux session dropout.
MC autolocker was not working - PCdrive was railed at its upper rail for ~2 hours judging by the wall StripTool trace. I tried restarting the init processes on megatron, but that didn't fix the problem. The reason seems to have been related to c1iool0 failing - after keying the crate, autolocker came back fine and MC caught lock almost immediately.
Additionally, c1susaux, c1auxex, c1auxey and c1iscaux are also down. I'm not planning on using the IFO tonight so I am not going to reboot these now.
So it would seem that there is some other noise which has a 1/f^2 shape and is at the same level we expected the DAC noise to be at. Rana suggested checking coherence with MC transmission to see if this could be laser intensity noise.
I also want to re-do the actuator calibrations for the vertex optics again before re-posting the revised noise budget.
A wiper script is not yet set up for our new Frame-Builder. The disk usage is ~80% now, so I think we should start running a wiper script that manages the overall disk usage and deletes old frame files as needed.
From what I could find on the elog, the way this was done was by running a cron job on FB. There is a perl script, /opt/rtcds/caltech/c1/target/fb/wiper.pl, which from what I could understand, runs a bunch of du commands on different directories to determine if there is a need to delete any files.
I copied this script over to /opt/rtcds/caltech/c1/target/daqd/wiper.pl. This is the directory in which all the new FB stuff resides. Conveniently, the script has a "dry-run" option, which I tried running on FB1. However, I get the following error message:
So it would seem that for some reason, the du commands aren't working. From what I could tell, there aren't any directory paths specific to the old FB machine that need to be changed. I believe the script was working prior to the FB disk crash - unfortunately it doesn't look like this script was under version control but I don't think any changes have been made to this script.
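For reference, the logic the script implements is straightforward - run du over the frame directories, parse out the size, and compare the total against the keep threshold. A minimal Python sketch of that check (an illustration only, not a translation of wiper.pl; the directory list is assumed):

import subprocess

FRAME_DIRS   = ['/frames/full', '/frames/trend/second', '/frames/trend/minute']  # assumed layout
PERCENT_KEEP = 85.0   # same role as the keep threshold in wiper.pl

def dir_usage_kb(path):
    """Disk usage of path in kB, parsed from 'du -sk' output."""
    out = subprocess.check_output(['du', '-sk', path]).decode()
    size_kb, _dirname = out.split(None, 1)   # this is the splitting step that fails in the Perl script
    return int(size_kb)

total_kb = sum(dir_usage_kb(d) for d in FRAME_DIRS)
print('combined frame usage: %d kB' % total_kb)
# wiper.pl then compares this (as a percentage of the /frames partition size) against
# PERCENT_KEEP, and deletes the oldest frame files if the usage is above the threshold.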
I've been working on analyzing the data from the DRMI locks last week.
Here are the results of the sensing measurement.
The plotting utility is a work in progress - I've basically adapted EricQ's scripts and added a few features like plotting the uncertainties in magnitude and phase of the calculated sensing elements. Possible further stuff to implement:
Also, the value I've used for the BS actuator calibration is not a measured one - rather, I estimated what it will be by scaling the old value by the same ratio which the ITMs have changed by post de-whitening board mods. The ITM actuator coefficients were recently measured here. I will re-do the BS calibrations over the weekend.
Noise budgeting to follow - it looks like I didn't set the AS55 demod phase to the previously determined optimal value of -82degrees, I had left it at -42 degrees. To be fixed for the next round of locking.
I downloaded a segment of data from the time when the DRMI was locked with the BS and ITM coil driver de-whitening switched on, and looked at coherence between MC transmission and the MICH error signal. Attachment #1 doesn't show any broadband high coherence between 60-300Hz, so it cannot explain the noise in the full range between 60-300Hz.
The DQ channel for the MC transmission is recorded at 1024 Hz, so to calculate the coherence, I had to decimate the 16K MICH data.
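For reference, the decimate-then-coherence step looks like this in Python (scipy); the data fetching over nds2 is not shown, and the segment length is just a reasonable choice:

import scipy.signal as sig

fs_mich  = 16384   # Hz, MICH error signal DQ rate
fs_trans = 1024    # Hz, MC transmission DQ rate

def mich_mctrans_coherence(mich, mc_trans):
    # bring the 16k MICH data down to the 1k rate of the MC transmission channel
    mich_ds = sig.decimate(mich, fs_mich // fs_trans, ftype='fir', zero_phase=True)
    # coherence with 4-second segments
    f, coh = sig.coherence(mc_trans, mich_ds, fs=fs_trans, nperseg=4 * fs_trans)
    return f, coh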
Since we have the AOM installed, I suppose we can actually measure the intensity noise coupling to MICH by driving a line in the AOM.
I also checked for coherence in the 60-300Hz band between MICH/PRCL and MICH/SRCL, and didn't see any appreciable coherence. Need to think about this more.
Rana suggested checking coherence with MC transmission to see if this could be laser intensity noise.
After trying to debug this issue using the Perl debugger, I concluded that the problem is in the part of the code that splits the output of the "du" command into directory and disk usage. For whatever reason, this isn't working. The version of Perl running on the new FB1 machine is 5.20.2, whereas I suspect the version running on the old FB machine was 5.14.2 (which is the version on all the Ubuntu 12 workstations and megatron). Unclear whether downgrading the Perl version is the right way to go.
The FB1 disk is now getting close to full, the usage is up to 85% today.
It is a little different - specifically, the way the splitting of the output of the "du" command into disk usage and directory is different (see Attachment #1). Apart from this, some of the parameters (e.g. what percentage to keep free) are different.
I changed the percentages to match what we had here, and edited a couple of other lines to print out the files that will be deleted. The dry run seemed to work okay, it produced the output below. Not sure why "df -h" reports a different use percentage though...
Since the script seems to be working now, I am going to set it up on FB1's crontab. Thanks, Chris!
controls@fb1:/opt/rtcds/caltech/c1/target/daqd 0$ ./wiper.pl
Mon Sep 18 17:47:06 PDT 2017
Dry run, will not remove any files!!!
You need to rerun this with --delete argument to really delete frame files
Directory disk usage:
Combined 20167664476k or 19694984m or 19233Gb
/frames size 25097525144k at 80.36%
/frames is below keep value of 85.00%
Will not delete any files
df reported usage 80.36%
controls@fb1:/opt/rtcds/caltech/c1/target/daqd 0$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda4 2.0T 1.7T 152G 92% /
udev 10M 0 10M 0% /dev
tmpfs 13G 177M 13G 2% /run
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/sda2 19G 3.7G 14G 21% /var
/dev/sda1 461M 65M 373M 15% /boot
/dev/sdb1 24T 19T 3.5T 85% /frames
192.168.113.104:/home/cds/rtcds 2.0T 1.6T 291G 85% /opt/rtcds
192.168.113.104:/home/cds/rtapps 2.0T 1.6T 291G 85% /opt/rtapps
tmpfs 6.3G 0 6.3G 0% /run/user/1001
Attached is the version of the wiper script we use on the CryoLab cymac. It works with perl v5.20.2. Is this different from what you have?
I did a further check on the wiper script by changing the "percent_keep" from 85.0 to 75.0, and running the script in "dry_run" mode again. The script then output to console the names of all the files it would delete in order to free up the required amount of space (but didn't actually delete any files as it was a dry run). Seemed to be sensible.
To set up the cron job, I did the following on FB1:
controls@fb1:~ 0$ sudo systemctl status cron.service
● cron.service - Regular background program processing daemon
Loaded: loaded (/lib/systemd/system/cron.service; enabled)
Active: active (running) since Mon 2017-09-18 18:16:58 PDT; 27min ago
Main PID: 30183 (cron)
└─30183 /usr/sbin/cron -f
Sep 18 18:16:58 fb1 cron: (CRON) INFO (Skipping @reboot jobs -- not system startup)
Sep 18 18:17:01 fb1 CRON: pam_unix(cron:session): session opened for user root by (uid=0)
Sep 18 18:17:01 fb1 CRON: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Sep 18 18:17:01 fb1 CRON: pam_unix(cron:session): session closed for user root
Sep 18 18:25:01 fb1 CRON: pam_unix(cron:session): session opened for user root by (uid=0)
Sep 18 18:25:01 fb1 CRON: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Sep 18 18:25:01 fb1 CRON: pam_unix(cron:session): session closed for user root
Sep 18 18:35:01 fb1 CRON: pam_unix(cron:session): session opened for user root by (uid=0)
Sep 18 18:35:01 fb1 CRON: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Sep 18 18:35:01 fb1 CRON: pam_unix(cron:session): session closed for user root
controls@fb1:~ 0$ crontab -l
# Edit this file to introduce tasks to be run by cron.
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
# For more information see the manual pages of crontab(5) and cron(8)
# m h dom mon dow command
33 3 * * * /opt/rtcds/caltech/c1/target/daqd/wiper.cron
Let's see if this works.
I borrowed the HP impedance test kit from Rich Abbott today. The purpose is to profile the impedance of the NPRO PZTs, as part of the AUX PDH servo investigations. It is presently at the X-end. I will do the test in the coming days.
We set up a measurement of the AUX X laser AM today. Some notes:
Attachment #1 shows a preliminary scan from tonight - we looked at the region 10kHz-10MHz, with an IF bandwidth of 100Hz, 16 averages, and 801 log-spaced frequencies. The idea was to get an idea of where some promising notches in the AM lie, and do more fine-bandwidth scans around those points. Data + code used to generate this plot in Attachment #2.
Rana points out that some of the AM could also be coming from beam jitter - so to put this hypothesis to test, we will put a lens to focus the spot more tightly onto the PD, repeat the measurement, and see if we get different results.
There were a whole bunch of little illegal things Rana spotted on the EX table which he will make a separate post about.
I am running 40 more scans with the same params for some statistics - should be done by the morning.
Update 12:00 21 Sep: Attachment #3 shows schematically the arrangement we use for the AM measurement. A similar sketch for the proposed PM measurement strategy to follow. After lunch, Steve and I will lay out a longish BNC cable from the LSC rack to the IOO rack, from where there is already a long cable running to the X end. This is to facilitate the PM measurement.
Update 18:30 21 Sep: Attachment #4 was generated using Craig's nice plotting utility. The TF magnitude plot was converted to RIN/V by dividing by the DC voltage of the PDA 55 of ~2.3V (the assumption is that there isn't a significant difference between the DC gain and RF transimpedance gain of the PDA 55 in the measurement band). The right-hand columns are generated by calculating the deviation of individual measurements from the mean value. We're working on improving this utility and aesthetics - specifically, using these statistics to compute coherence; this is a work in progress. Git repo details to follow.
There are only 23 measurements (I was aiming for 40) because of a network connectivity issue that caused the script to stall - this is also something to look into. But this sample already suggests that these measurement parameters give consistent results on repeated measurements above 100kHz.
TO CHECK: PDA 55 is in 0dB gain setting, at which it has a BW of 10MHz (claimed in datasheet).
Some math about the relation between coherence and the standard deviation of transfer function measurements:
--- relation to variance in TF magnitude. We estimate the variance using the usual variance estimator, and can then back out the coherence using this relation.
--- relation to variance in TF phase. Should give a coherence profile that is consistent with that obtained using the preceding equation.
It remains to code all of this up into Craig's plotting utility.
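For completeness, I believe the relevant standard single-input results (e.g. Bendat & Piersol) relating the coherence \gamma^2 and the number of averages n to the spread of repeated TF measurements are:

\frac{\sigma_{|\hat{H}|}}{|H|} \approx \frac{\sqrt{1-\gamma^2}}{|\gamma|\sqrt{2n}}   (normalized random error of the TF magnitude)

\sigma_{\hat{\phi}} \approx \frac{\sqrt{1-\gamma^2}}{|\gamma|\sqrt{2n}}  [rad]   (standard deviation of the TF phase)

Given the standard deviation estimated from the repeated sweeps, either expression can be inverted to back out an effective coherence, and the two estimates should be consistent.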
We laid out a 45m long BNC cable from the LSC rack to the IOO rack via overhead cable trays. There is ~5m excess length on either side, which have been coiled up and cable-tied for now. The ends are labelled "TO LSC RACK" and "TO IOO RACK" on the appropriate ends. This is to facilitate hooking up the output of the DFD for making a PM measurement of the AUX X laser. There is already a long cable that runs from the IOO rack to the X end.
I've been working on setting up some scripts for measuring the DAC noise.
In all the DRMI noise budgets I've posted, the coil-driver noise contribution has been based on this measurement, which could be improved in a couple of ways:
I only managed to get in measurements for the BS and ITMX today. ITMY to be measured later, and data/analysis to follow.
The ITMX and BS alignments have been restored after this work in case anyone else wants to work with the IFO.
Some slow machine reboots were required today - c1susaux was down, and later, the MC autolocker got stuck because of c1iool0 being unresponsive. I thought we had removed all dependency of the autolocker on c1iool0 when we moved the "IFO-STATE" EPICS variable to the c1ioo model, but clearly there is still some dependency. To be investigated.
Gabriele reported problems with the nds2 server again. I restarted it again.
update: had to do it again at 1730 today - unclear why nds2 is so flaky. Log files don't suggest anything obvious to me...