As part of the ongoing debugging, I've switched the MC1 and MC3 satellite boxes. Both MC1 and MC3 have their watchdogs shudown for the moment.
In the last 3.5 hours, there has been nothing conclusive - no evidence of any glitching in either MC1 or MC3 sensor channels. I am going to hold off on doing the LEMO cable swap test for a few more hours, to see if we can rule out the satellite box.
I suppose before directory listings were turned off we should have fixed the script to make an index.html, but that's how it goes with "up"-grades. How about re-allow directory listing so that our old links for webview work again?
EQ: https://nodus.ligo.caltech.edu:30889/FE is live
I tried to follow these instructions today to make the Simulink Webview accessible:
controls@nodus|public_html > ln -sfn /users/public_html/FE /export/home/
But...I got a "403 Forbidden" message. What is the secret handshake to get this to work? And why have we added this extra step of security?
This link works for me: https://nodus.ligo.caltech.edu:30889/FE/c1als_slwebview.html. The problem with just /FE/ is that there is no index.html, and we have turned off automatic directory listings.
IIRC, this arrangement was due to the fact that authentication of some of the folders (maybe the wikis) was broken during the nodus upgrade, so there was sensitive information being publicly displayed. This setup gives us discretion over what gets exposed.
In the filter banks there were various version of a 'dewhite' filter. They were all approximately z=150, p=15, g =1 @ DC, but with ~1% differences. I don't trust their provenance and so I've enforced symmetry and fixed their names to reflect what they are (150:15).
The filters were made in response to a measurement of the pentek whitening boards in 2015 (ELOG 11550), but this level of accuracy probably isn't important.
After the repair of the faulty LEMO cable, I left MC1 with it's watchdog off overnight. Unfortunately, it looks like the problem still persists. The first attachment shows a second trend plot for the past 15 hours. Towards the left end of the plot, you can see where I re-connected the LEMO cable for the LR/UR channels.
A couple of months ago, I added a BLRMS block for the IMC optics that calculates BLRMS for the shadow sensor output as well as the coil output. Looking at this trend overnight, I noticed that the glitches appear in the coil outputs as well, as shown in the plot below, which is for a 1 hour stretch last night (I used the full data from a 16Hz coil output channel and not the BLRMS, I am not sure if there is a DQ'ed version of the coil outputs?).
Zooming in further to one of these glitches, we can see that the glitches in the coil and shadow sensor signals are in fact coincident.
But given that the watchdog was turned off all this time, the only voltage going to the coils should be the DC bias voltages. So does this not support the hypothesis that the problem lies in the part of the signal chain that supplies the bias voltage to the coils?
Never mind, the "coil output" channel isn't a true readback of the voltage to the coil, but is the calculated damping output (which is not sent to the coils when the watchdog is shutdown...
Summary pages show no kicking in the ETMX watchdogs from midnight to 6 AM (0800 - 1400 UTC):
Last night, I plugged the ETMX suspension coils back into the satellite box. Tonight, we turned on the damping loops for ETMX. Rana centered the Oplev so we can use that as an additional diagnostic to see if the optic gets kicked around overnight. We will re-assess the situation tomorrow.
Sometime earlier today, Lydia noticed that the +/- 5V Sorensens at the X end were not displaying their nominal voltage/current values (as per the stickers on them). She corrected this.
After some investigation, Rana found that on the AA filter board end, one of the 4pin LEMOs from the whitening board had one of its wires come unstuck from where it was soldered (this presumably happened while we were squishing cables tonight, as the LR channel was fine before that). Also, there was no heat shrink used on any of the solder joints.
The faulty cable has been re-soldered (with heat shrink) and replaced. All 5 sensor signals appear normal on dataviewer now. I am leaving things in this state for the night, let us see if the glitches return overnight.
PSL shutter remains closed
Seems like this stops working every ~2 years. Its been busted since early 2016 according to cron, so I fixed up the paths and restored some missing files and committed things to the SVN (with comments!) and now its working and grabbing the Web viewable versions of the front end models. Just need to restore its viewability and then the world can watch our models any time.
Back in 2011, JoeB wrote some entries on how to automatically update the Simulink webview stuff.
Somehow, the cron broke down over the years. I reran the matlab file by hand today and it worked fine, so now you can see the up to date models using the internet.
The story is: we currently don't expose the whole /users/public_html folder. Instead, we are symlinking the folders from public_html to /export/home/ on nodus, which is where apache looks for things
So, I fixed the links on the Core Optics page by running:
controls@nodus|~ > ln -sfn /users/public_html/40m_phasemap /export/home/
During the course of Rana's inspection of the general state of the IFO, he commented that there seemed to be several seismic-related IMC lock losses in the time that he had been observing it. This issue looked suspiciously like the the MC1 glitches I had noticed sometime late last year, especially since each time the IMC would unlock, we could see significant amounts of motion on MC REFL. To diagnose, we did the following:
Sure enough, there were several glitches that occurred in all 5 sensor channels. These glitches varied in size from a few counts (the smaller ones) to 60-70counts for the bigger ones. In the past, squishing the LEMO connector on the front of the PD whitening board (D000210) had apparently made the glitching go away. So tonight, for starters, we squished everything else - Sat. Box connectors, the breakout board from Sat. Box to whitening board on the back of 1X6, and the DB connector on the front of the whitening board. This had no effect - the glitching remained consistent.
Next, Rana pulled out two of the three 4pin LEMOs, and left only those coresponding to UL/LL plugged in - but the glitching persisted in these two channels. We then pulled out the board. It was installed in 1998, but has a sticker on it that says "fixed in 2003". Not sure what the fix was. Visual inspection of the circuit didn't show anything obviously faulty, but it did look like the two MAX333A quad switches (these control whether the whitening is bypassed or not) had been replaced at some point. There are other undesirable features, such as the use of thick film resistors, but nothing that would explain the glitchy behaviour.
Next, we re-inserted the whitening board back into its original slot in the Eurocrate, but switched the cables (both D sub and LEMO, but only on the whitening board end) between the boards for MC1 and MC3 (i.e. MC1 cables were routed through the whitening board that was originally used for MC3, and vice-versa). But the glitches remained consistent on the MC1 channels. So it looks like the board is not a likely culprit.
Finally, we went in and squished all the cables from the PD whitening board to the ADC (via an AA filter board). For some of the LEMO cables from the whitening board, the LEMO backshells were not properly tightened. Rana fixed these before putting them back in. Some of the connectors were also not pushed in tightly enough, Rana heard the click when he pushed them in. The cables from the adaptor board to the ADC itself looked fine, it was screwed on at both ends, and all these connections looked snug enough. In the interest of completeness, Rana also pushed in the backplane connectors on the Eurocrate (these supply the signals from the BIO cards to switch the whitening ON/OFF). The one corresponding to MC1 was indeed a little loose.
Coming back to the control room, we saw that the MC1 LR sensor was dead. After some investigation, Rana found that on the AA filter board end, one of the 4pin LEMOs from the whitening board had one of its wires come unstuck from where it was soldered (this presumably happened while we were squishing cables tonight, as the LR channel was fine before that). Also, there was no heat shrink used on any of the solder joints. Could this explain the glitchy behaviour? Perhaps, but the glitches remained in the 3 channels that were connected. Anyways, I will repair this cable tomorrow, and we can see if this has fixed the problem or not..
Some misc points:
PSL shutter is closed, MC1 watchdog is shutdown for the night.
We should consider upgrading a few of our workstations to Ubuntu 14 LTS to see how painful it is to run our scripts and DTT and DV. Better to upgrade a bit before we are forced to by circumstance.
I would recommend upgrading the workstations to one of the reference operating systems, either SL7 or Debian squeeze, since that's what the sites are moving towards. If you do that you can just install all the control room software from the supported repos, and not worry about having to compile things from source anymore.
Oot on the streets and in the chat rooms, people often ask, "What is up with the MC_F calibration?".
Not being sure of the wiring in the c1ioo model, I have formed this screencap of today's model and put it here. The MC_LENGTH and MC_FREQ are the filter banks which would calibrate these channels. In the filter banks there were various version of a 'dewhite' filter. They were all approximately z=150, p=15, g =1 @ DC, but with ~1% differences. I don't trust their provenance and so I've enforced symmetry and fixed their names to reflect what they are (150:15). I have also turned on one filter in MC_FREQ so that now the whitening of the Pentek Interface board is compensated.
Why is this TF 1/f? It should be -20 dB/decade if MC_F is in units of Hz* and MCL is a pendulum response. Perhaps its because the combination of the Koji summing box, the Thorlabs HV driver, and the Pomona box forms an additional 1/f ? IF so, this would explain the TF we see. Once we get confirmation from Koji, we can load the TF into the MC_FREQ filter bank and then MC_F will be in units of Hz (as will the summary pages).
(along the way I've also turned off the craaaazzzy servo input enable tickling that gets put in the MC AutoLocker every April Fool's leap year - resist the temptation)
Since we have a frequency counter system here and some oscillators, I wonder if we can just calibrate the MC_L and MC_F directly using a mixer lashed up to one of the counters. If so, and we can get the stabilized laser frequency noise down below 10 mHz/rHz, maybe this is a viable alternative method to the photon calibrators. Counting zero crossings is more honest than counting photons.
Found that the BS whitening was off. Gautam says that "it has always been that way" and "there's nothing in the elog about this" and "I have no special relationship with Putin".
I looked at DV and DTT while turning the OSEM whitening back on. As expected, the sensor noise improved by 10x above 10 Hz. The time series shows no problems - its just less fuzzy now.
All OSEM spectra after the switch show on upper panel of plot. Lower panel shows comparison of BS UL before/after. To rotate the DTT PDF landscape output I typed this:
pdftk BS-white.pdf cat 1N output BSwhite.pdf
"if you see something, do something"
The "apt-get update" was failing on some machines because it couldn't find the 'Debian squeeze' repos, so I made some changes so that Megatron could be upgraded.
I think Jamie set this up for us a long time ago, but now the LSC has stopped supporting these versions of the software. We're running Ubuntu12 and 'squeeze' is meant to support Ubuntu10. Ubuntu12 (which is what LLO is running) corresponds to 'Debian-wheezy' and Ubuntu14 to 'Debian-Jessie' and Ubuntu16 to 'debian-stretch'.
I followed the instructions from software.ligo.org (https://wiki.ligo.org/DASWG/DebianWheezy) and put the recommended lines into the /etc/apt/sources.list.d/lsc-debian.list file.
but I still got 1 error (previously there were ~7 errors):
W: Failed to fetch http://software.ligo.org/lscsoft/debian/dists/wheezy/Release Unable to find expected entry 'contrib/binary-i386/Packages' in Release file (Wrong sources.list entry or malformed file)
Restarting now to see if things work. If its OK, we ought to change our squeeze lines into wheezy for all workstations so that our LSC software can be upgraded.
ITMY is not like the others. Real or just OSEM madness?
Yes, writing minute trends causes hourly FB crashes in the current state of things. The "raw" minute trending is turned on, but I think that these are unknown to nds.
Did we turn off minute trend writing in one of the recent FrameBuilder debug sessions? Seems we only have second trends in 2016. Maybe this explains why its so slow to get minute trends? Dataviewer has to rebuild it from second trend.
controls@nodus|frames > l
drwx------ 2 root root 16384 Jun 8 2009 lost+found/
drwxr-xr-x 2 controls controls 4096 Jul 14 2015 tmp/
-rw-r--r-- 1 controls controls 0 Jul 14 2015 test-file
drwxr-xr-x 5 controls controls 4096 Apr 7 2016 trend/
drwxr-xr-x 4 root root 4096 Apr 11 2016 archive/
drwxr-xr-x 789 controls controls 36864 Jan 13 19:34 full/
controls@nodus|frames > cd trend
controls@nodus|trend > l
drwxr-xr-x 258 controls controls 3342336 Jul 6 2015 minute_raw/
drwxr-xr-x 387 controls controls 36864 Nov 5 2015 minute/
drwxr-xr-x 969 controls controls 36864 Jan 13 19:49 second/
Minute trend data seems not available using the NDS2 server. Its super slow using dataviewer from the control room.
Did some digging into the NDS2 config on megatron. It hasn't been updated in 2 years.
All of the stuff is run by the user 'nds2mgr'. The CronTab for this user was running all the channel name updates and server restarts at 3 AM each day; I've moved it to 5:05 AM. I don't know the password for this user, so I just did 'sudo su nds2mgr' to become him.
On megatron, in /home/nds2mgr/nds2-megatron/ there is a list of channels and configs. The file for the minute trend (C-M-ChanList.txt), hasn't been updated since Nov-2015. ???
After Koji's leap second fix, we were playing around with the X arm locking. In particular, we were playing around with the limit value on the X arm LSC filter bank - the nominal value is 4000, we wanted to see if we could increase this without kicking the optic while acquiring arm lock. We initially increased it to 8000, and then turned it off altogether. Then we rapidly turned the output of the servo ON/OFF, and looked at the arm transmission to see if it came back to the level before unlocking, as an indication of whether the optic was kicked.
These trials suggested a value of 8000 for the limiter was OK, so we left the LSC mode on with the limiter set to 8000. But just as we were about to leave for the night, I noticed on the wall Striptool that the X arm was unlocked. Investigating, we found that the green wasn't even locking to a HOM. Further investigation of the Oplev spot showed that ETMX had received a large kick (both pitch and law errors were ~200urad). ITMX was unaffected.
We initially tried lowering the LSC limit value back to 4000, then used first the Oplev spot and then the green to align the arm. But turning on LSC misaligned the arm after acquiring lock. So we decided to leave LSC off, thinking that the notorious ETMX suspension problems have resurfaced. As a diagnostic, we figured we'd leave the watchdog tripped, and use the Oplev to see if the optic was getting kicked. But the act of turning the watchdog off kicked the optic again (WHY?!).
Looking at the ETMX sus screen, turning off all the damping and LSC (but watchdog on) still leaves a non-zero offset in the "Vmon" field, between 0.02-0.05V depending on the coil. Turning the watchdog OFF takes all these to 0.009V, although I can see the LR value fluctuating between 0.004V and 0.009V. I went to the Xend and squished all the cables on the Sat. Box, but the problem persisted.
At this time, I can't think of any explanation, so I am giving up for the night. To avoid unnecessarily kicking the optic, I am going to unplug the suspension from the Sat. Box and leave one of our tester boxes plugged in, lets see if that sheds any light on the situation...
I think I fixed the DC error issue
1. I added the leap second (leapsecond ?) entry for 2016/12/31, 23:60:00 UTC to daqdrc
set gps_leaps = 820108813 914803214 1119744016;
set gps_leaps = 820108813 914803214 1119744016 1167264018;
2. Restarted FB and all realtime models
Now I don't see any RED light.
The attached file is a python notebook that you can use to get data. Minimal syntax.
"## Get some 40m data using NDS"
PEM config file was also lacking a section named "summary", which is necessary for all parent tabs; this has now been solved. I have deactivated the MEDM pages because Praful's screencap script seemed to be broken (I should have logged this, I apologize).
Pages still not working: PEM and MEDM blank.
The old control room AC has been stick in heating mode for about 2 months. It's thermostate and fan belt was finally replaced. It was calibrated and set to 71 F ( just behind 1X6 on west wall ) around 1pm.
Out belt; sad inside
at 4 pm Rana cried
It must be too tight.
Control room to outside door was realigned.
It is self closing now.
Control room to IFO door lock optimized to soft closing.
All other doors lubricated by Alex of the key shop.
I installed a DC PD (Thorlabs PDA 520) in the beam path to AS55. I placed a 2" 90/10 BS on a flip mount that picks of enough light for the PD to spit out ~8V when the port is bright. Single arm continuous signal will be ~2V. While most of the light still continues towards AS55, the displacement from the BS moves the beam off AS55, so I used the flip mount in case anyone needs to use AS55. The current configuration is UP.
When we're done with loss investigations the flip mount should be removed from the bench.
I hooked the PD up to an ethernet-enabled scope and started scripting the loss map measurement (scope can receive commands via http so we can automate the data acquisition). The scope that was present at the bench and had been used for the MC ringdown measurements had a 'scrambled' screen that I couldn't fix so I had to retrieve another scope ("scope1"). I'll try to find out what's wrong with it but we may have to send it in for repair.
The IFO is more or less back to an operational state. Some details:
One error persists - the "DC" indicator (data concentrator?) on the CDS medm screen for the various models spontaneously go red and return to green often. Is this a known issue with an easy fix?
It was requested this morning.
This is one of those unsolved door lock acquisition problems. Its been happening for years.
Please ask facilities to increase the strength of the door tensioner so that it closes with more force.
The door was not locked this morning.
Please do not use this door if you can not close it!
Last person leaving the lab should check that the latch is cut by the strike plate.
As stated in elog 12618, using an oscilloscope to average the reflected powers and thus circumventing all filtering yielded much better results than before:
XARM: 21 +/- 35 ppm
YARM: 69 +/- 45 ppm
We can probably decrease the measurement uncertainty further by using a larger photodiode that is more suited for DC measurements. It will be placed in the AS pathtemporarily. If we get below 10 ppm systematic errors will begin to matter. To get those under control I will have to re-determine the visibility in the arm cavities and the modulation indices. The numbers to match from an estimate via the power recycing gain are <= 50 ppm arm average from elog 12586. Once the measurement scheme is up and running, we can proceed to generate ETM lossmaps. ITM will still be tricky but let's see what we can do.
Following Yutaro's approach, we can move the beams on the optcs in a deterministic way by several mm on the ETMs. Moving the beam is achieved by introducing offsets into the ASS auto alignment. As an example, the Yaw dither for ETMY is shown:
Each of the 8 test mass rotational degrees of freedom is driven by a particular frequency, and 2 signals are digitally demodulated in the real-time system: The arm transmission ("T") and the LSC arm length feedback signal to the ETM (L). The T signal feeds back to the input pointing, aka Tip Tilts and BS. This maximizes the transmission for a given test mass orientation. The L feedback controls the beam position on the mirrors in the arms. It minimizes the coupling of the dither to the length feedback, which is achieved when the beam goes through the axis of the rotational motion. This is where we introduce the offset:
The signal C1:ASS-YARM_ETM_YAW_L_DEMOD_I_OFFSET (for this example) moves the locking point of the dither-to-length coupling and thus moves the beam around on the ETM. This is true for the PIT and YAW of all test masses except ITMX. In the current configuration the TTs optimize the alignment into the YARM, and for the X we only have the BS, which is why the beam spot on ITMX cannot be independently controlled as-is. We could, however, for the sake of this measurement, temporarily temporarily give TT authority to the XARM feedback to control the ITMX beam position. I imagine something like dither-aligning with ASS the normal way, and then run a customized script in which the XARM is treated as the YARM, feecback to the BS is cut, and the YAW signals are inverted due to the reflection on BS.
Knowing the angle of the offset gives us a way to calculate the beam spot displacement with the cavity geometry. For best results I want to make sure our OpLev calibration is still good (laser power decay, although last time this was done was only about a year ago), which would be analogous to elog 11831.
As for ITM beam position, this scheme only works partially, because it would require the beam to steer further off its axis than in the ETM case. This is problematic because of the spacing between tip tilts and ITMs. I summarize:
The summary pages were not successfully generated for a long period of time at the end of 2016 due to syntax errors in the PEM and Weather configuration files.
These errors caused the INI parser to crash and brought down the whole gwsumm system. It seems that changes in the configuration of the Condor daemon at the CIT clusters may have made our infrastructure less robust against these kinds of problems (which would explain why there wasn't a better error message/alert), but this requires further investigation.
In any case, the solution was as simple as correcting the typos in the config side (on the nodus side) and restarting the cron jobs (on the cluster side, by doing `condor_rm 40m && condor_submit DetectorChar/condor/gw_daily_summary.sub`). Producing pages for the missing days will take some time (how to do so for a particular day is explained in the wiki https://wiki-40m.ligo.caltech.edu/DailySummaryHelp).
RXA: later, Max sent us this secret note:
However, I realize it might not be clear from the page which are the key steps. These are just running:
1) ./DetectorChar/bin/gw_daily_summary --day YYYYMMDD --file-tag some_custom_tag To create pages for day YYYYMMDD (the file-tag option is not strictly necessary but will prevent conflict with other instances of the code running simultaneously).
2) sync those days back to nodus by doing, eg: ./DetectorChar/bin/pushnodus 20160701 20160702
This must all be done from the cluster using the 40m shared account.
./DetectorChar/bin/gw_daily_summary --day YYYYMMDD --file-tag some_custom_tag
./DetectorChar/bin/pushnodus 20160701 20160702
[lydia, ericq, gautam]
We set about following the instructions linked in the previous elog. A few notes/remarks:
Here is a link to an elog with the steps I had to follow the last time there was a similar power glitch.
The RAID array restart was also done not too long ago, we should also do a data consistency check as detailed here, if not already..
If someone hasn't found the time to do this, I can take care of it tomorrow afternoon after I am back.
Does "done" mean they are OK or they are somehow damaged? Do you mean the workstations or the front end machines?
The computers are all done.
megatron and optimus are not responding to ping commands or ssh -- please power them up if they are off; we need them to get data remotely
Jamie started the fm40m Raid rebuilding. It has been beeping since the power outage.
Summary pages have no reading since power glitch.
Valve configuration: vacuum normal
Vacuum envelope: 23C
Rga head: 44C
Caltech Facilities promissed to email the 40m facility drawings in Cad format.
I organized the old of optical , vacuum and facility layout drawings on paper in the old cabinet.
Manasa pointed me to the CAD drawings in the 40m SVN and I've now uploaded them to the 40m DCC Tree so that EricG and SteveV can convert them into SolidWorks.
There was a power glitch last night around 1:15am
The vacuum was not effected.
PSL laser turned on, PMC locked, PSL shutter opened and MC locked.
IR lasers at the ends turned on.
East arm air cond turned on.
The last power glitch was at Nov 3, 2016
In this video: https://youtu.be/iphcyNWFD10, the comments focus on the orange crocs, my wrinkled shirt, and the first aid kit.
Max tells us that soem conf files were bad and that he did something and now some pages are being made. But the PEM and MEDM pages are bank. Also the ASC tab looks bogus to me.
Dead again. No outputs for the past month. We really need a cron job to check this out rather than wait for someone to look at the web page.
MC unlocked, Autolocker waiting for c1iool0 EPICS channels to respond. c1iool0 was responding to ping, but not to telnet. Keyed the crate and its coming back now.
There's many mentions of c1iool0 in the recent past, so it seems like its demise must be imminent. Good thing we have an Acromag team on top of things!
Also, the beam on WFS2 is too high and the autolocker is tickling the Input switch on the servo board too much: this is redundant / conflicting with the MC2 tickler.
The WFS gains are supposedly maximized already. If we remotely try to increase the gain, the two MAX4106 chips in the RF path will oscillate with each other.
We should insert a bi-directional coupler (if we can find some LEMO to SMA converters) and find out how much actual RF is getting into the demod board.
Koji responding to Rana
> For the rough calibration below 10 Hz, we can use the SUS OSEM cal: the SUSPIT and SUSYAW error signals are in units of micro-radians.
I can believe the calibration for the individual OSEMs. But the input matrix looked pretty random, and I was not sure how it was normalized.
If we accept errors by a factor of 2~3, I can just naively believe the calibration factors.
> If the RF signals at the demod input are low enough, we can consider either increasing the light power on the WFS or increasing the IMC mod. depth.
The demod chip has the conversion factor of about the unity. We increased the gains of the AF stages in the demod and whitening boards. However, we only have the RMS of 1~20 counts. This means that we have really small RF signals. We should check what's happening at the RF outputs of the WFS units. Do we have any attenuators in the RF chain? Can we skip them without making the WFS units unstable?
At Hanford, there is this issue with laser jitter turning into an IMC error point noise injection. I wonder if we can try out taking the acoustic band WFS signal and adding it to the MC error point as a digital FF. We could then look at the single arm error signal to see if this makes any improvement. There might be too much digital delay in the WFS signals if the clock rate in the model is too low.