As suggested, I ran the script cds/rebootC1LSC.sh
I got a timeout error when the script tried closing the PSL shutter ('C1:AUX-PSL_ShutterRqst' not found), but Rana and I had closed the shutter before leaving last night. c1sus is down, so the script got 'no route to host' for c1sus; I think I need to reset c1sus for the script to run completely. Nonetheless, c1lsc was rebooted, which crashed c1ioo and left the c1lsc FE status all red (probably because c1sus wasn't restarted).
I reset c1lsc, c1sus, and c1ioo.
I noticed that the script gives the command 'ssh c1XXX', but we have been getting 'no route to host' with that form. Instead, the machines are currently only reachable as c1XXX.martian. I'm not sure why this is, so I just appended .martian to the hostnames in rebootC1LSC.sh.
This time, the script does run. I did get 'no route to host' on c1ioo, so I think I need to reset that machine again. After the reset, the script failed to log in to c1ioo and c1lsc.
Fri Sep 6 13:09:05 2019
After lunch, I reset the computers again and tried the script again. There is again no route to host for c1ioo. I'm going inside to shut off the power to c1ioo, since the reset button seems not to be working. I still can't log in from nodus, so I'm bringing a keyboard and monitor over to plug in directly.
On reset, c1ioo repeatedly reaches the screen in attachment 1 before going black. Holding down shift or ctrl+alt+f1 doesn't get me a command prompt. After waiting and searching the elog for >>3 min, we decided to follow these instructions to cycle the power of c1ioo. The same problem recurred after power up. I found some instructions online saying the Sun Fire X4600 can hang during reboot if it has become too hot ("reboot during a thermal shutdown"); I did notice that the temperature light was on earlier in this procedure, so perhaps that is the problem. I followed the wiki instructions to shut down the computer again (pressed the power button, unplugged the 4 power supplies from the back of the machine) and left it unplugged for 10-30 min (Fri Sep 6 14:46:18 2019).
Fri Sep 6 15:03:31 2019
Rana plugged in the power supplies and reset the machine again.
Fri Sep 6 16:30:37 2019
c1ioo is still unreachable! I pressed reset once, and the reset button flashes white. The yellow warning light is still on.
Fri Sep 6 16:54:21 2019
The reset light has stopped flashing, but I still can't access c1ioo. I reset once more, this time watching c1ioo on a monitor directly. I'm still seeing the same boot screen repeatedly. I do see that CPU0 is not clocking, which seems weird.
Following gautam's elog here, I found the Sun Fire X4600 manual for locating faulty CPUs. After the white reset light stopped flashing, I held down the power button to turn off the system. Before shutdown, all of the CPUs displayed amber lights; after shutdown, only the leftmost CPU (as viewed from the back, presumably CPU0) displayed an amber light. The manual says this is evidence that the CPU or DIMM is faulty. Following the manual, I removed the standby power, then followed these instructions for replacing the CPU to remove the CPU module; Gautam has also done this before.
Fri Sep 6 20:09:01 2019
I pulled the leftmost CPU module out, following the instructions above. The CPU module matches the physical layout and part number of the Sun Fire X4600 M2 8-DIMM CPU module; pressing the fault reminder button gives amber indicators at the DIMM ejectors, indicating faulty DIMMs (see attachment). The other indicator LEDs did not illuminate.
I located several spare DIMMs in the digital cabinet along Y arm (and a couple with misc computer components in the control room), but didn't find the correct one for this CPU module. The DIMM is Sun PN 371-1764-01; I found it online and ordered eight. Please let me know if this is incorrect.
To protect the CPU module, I've put it in an ESD safe bag with some bubble wrap and a note. It's on the E shop bench.
There was an alarm sound from the Smart-UPS 2200 sitting under the workstation. I see that the 'replace battery' light is red, and this elog tells me that these batteries are replaced every ~1-4 years; the last replacement was March 2016. Holding down the 'test' button for 2-3 seconds results in the alarm sound and does not clear the replace-battery indicator.
Saw these slightly delayed.
Q1: Not sure--is it a safe operation for me to remove the DIMM on CPU0, replace CPU0 (with no DIMM), and boot up to try this?
Q2: Specifically, it's this DIMM. The CPU core is compatible with DDR2, clock rate up to 333 MHz (DDR2-667) and 1, 2, or 4 GB of memory.
Q1 Can we run the machine with the reduced # of cores?
Q2 We might be able to order them quickly. What's the spec and configuration of the DIMMs (e.g. DDR2-667 MHz ECC 4GB x4, or even more specific, like Samsung 2GB DDR2 RAM PC2-6400 240-Pin DIMM M378T5663EH3), so that we can identify the exact part?
Q3 Can we scavenge the old OMC RT machine or even megatron to extract the memories?
One pair of DIMM cards from the Sunstone box had the same Sun part number as those in c1ioo, so I swapped them in and reinstalled c1ioo's CPU0. c1ioo now boots up and seems ready to go; I'm able to log on from nodus. I also reinstalled optimus' CPU0, and optimus boots up with no problems.
While opening up optimus, I noticed a box labelled 'SUNSTONE' sitting below the rack--it contains two CPU modules of a similar type to those in c1ioo! I'm going to try swapping in the DIMM cards from this SUNSTONE box; I didn't find any elogs about sunstone--where are these modules from?
I reset c1lsc and c1sus, then ran rebootC1LSC.sh as before. All models started by the script are running with minimal red lights; c1oaf, c1cal, c1dnn, c1daf, and c1omc are not started by the script. I manually started these in the order c1cal->c1oaf->c1daf->c1dnn. Starting c1dnn crashed the other FE on c1ioo, so I reset all three FE again, and ran the script again (this time, including the startup for c1cal, c1oaf, and c1daf, but excluding c1dnn).
Except for c1dnn and c1omc, all models are started. The status lights are attached.
[rika, aaron, rana]
We are getting the MC locked in anticipation of making some WFS transfer function measurements.
The PSL screen was all white boxes, so I keyed the PSL crate and burt restored the settings from 11:19am Sep 5 (somewhat earlier than we started rebooting computers). Following this, I ran Milind's unstick.py and then the PSL autolocker script; both worked on the first go, great work Milind!
The modecleaner autolocking script is having substantially more trouble. Rana found that pitch and yaw sliders for all MC optics have been swapped--we think it's because the camera at MC2 has been rotated. Note that for now, sliding pitch gives a change in yaw, and sliding yaw changes pitch.
We noticed that with the WFS servo on, the modecleaner would be well aligned for a while (MC trans ~ 14000), only to lose lock after several minutes. We held the MC2_TRANS_PIT/YAW outputs at 0, so the MC2 QPD does not affect the WFS loop; the beam is well centered on WFS1/2, but not on the MC2 QPD, and with this signal out of the loop MC TRANS recovers to ~15000 counts (consistent with the quiet times over the last 90 days, see attachment 2). Attachment 1 shows the MC lock degrading, followed by some noise where we lost lock, and finally a visible increase in MC trans when we remove the MC2 QPD from the WFS loop.
MC1 Pitch 4.4762 Yaw 4.4669
MC2 Pitch 3.7652 Yaw -1.5482
MC3 Pitch -0.4159 Yaw 1.1477
After locking the MC automatically, we turned off the autolocker and adjusted the alignment to center the beam on the QPD.
We then ran the MC autolocker again. Finally, Rana moved the optics to the best alignment.
MC1 Pitch 4.4942 Yaw 4.6956
MC2 Pitch 3.7652 Yaw -1.5600
MC3 Pitch -0.3789 Yaw 1.1477
We used diaggui to measure the response of WFS1/WFS2/MC2 pitch (yaw) to excitations in MC1/MC2/MC3 pitch (yaw). Seeing fluctuations of amplitude ~1 on the MCX_PIT/YAW_OUT channels, we used an amplitude 0.01 excitation at 20 Hz. We will work on scripting some of this tomorrow.
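For the scripting, the core of the analysis is just assembling the measured responses into a sensing matrix and inverting it to get an output matrix. A minimal numpy sketch of that step (the sensor/optic labels match our measurement, but every number is a placeholder, not data):

import numpy as np

# Rows: sensors (WFS1, WFS2, MC2 trans QPD); columns: excited optics (MC1, MC2, MC3), pitch only.
# Each entry would be the demodulated response at the 20 Hz excitation (signed magnitude),
# in sensor counts per count of drive. Placeholder values for illustration.
sensors = ['WFS1_PIT', 'WFS2_PIT', 'MC2_TRANS_PIT']
optics = ['MC1_PIT', 'MC2_PIT', 'MC3_PIT']
S = np.array([[ 0.8, -0.2,  0.1],
              [ 0.3,  0.9, -0.4],
              [-0.1,  0.2,  0.7]])

# The output (actuation) matrix that diagonalizes the loop is the inverse;
# the pseudo-inverse handles the case of more sensors than optics.
A = np.linalg.pinv(S)
print('condition number:', np.linalg.cond(S))
print('output matrix:\n', np.round(A, 3))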
We should also have a plan for the next couple weeks so we are organized; heavily adapted from. Here's what I'm thinking this morning:
It would be good to script some of what we did yesterday. I'm checking out some scripts I'd used for Qryo and armloss measurements to remember the best way to do this.
I noticed yesterday that the PSL ShutterRqst box is white, and I've seen timeout errors when, e.g., the reboot script tries to open/close the PSL shutter. The shutter itself should be operable, so I should find the aux machine that serves this channel and restart it.
We identified the Jenne laser and found a long optical fiber that might be able to transport our beam to the AP table.
Now we're searching for documentation on using this laser. Kevin and John measured a TF last year. Koji advised that we needn't worry too much, the current limit is already set correctly and we need only power on the laser.
We moved the breadboard (including a couple PDs, collimating lenses, laser, steering mirrors, etc) over to the AP table, and set it on top of the panel next to the WFS. We mounted the laser on the AP table, and added one lens with f~68 mm after the laser to fit the beam on a single quadrant; the beam was about 1mm diameter (measured by eye) when it entered the QPD. We turned the laser driver on at ~19.4 mA, and directed it to WFS2 via the last two steering mirrors before WFS2.
We monitored the QPD segments' DC level with ndscope on a laptop, and were able to send the beam to each of the four quadrants in turn. We set up the Agilent network analyzer to drive the laser's amplitude modulation and sent the RF signal from the LEMO output on the QPD head directly to the network analyzer. We will take the measurements tomorrow morning.
At Seiji and Gautam's suggestion, we added an additional RF photodiode (NewFocus 1611) to the system so we can calibrate our transfer functions. The configuration is now laser -> BS --> lenses -> QPD and BS --> lenses -> RFPD. We added lenses to get the beams focused on the RFPD and QPD heads, and are again set up for TF measurement.
We took the following data. These parameters were consistent across all measurements:
After taking the data for segment 1, I moved the beam to segment 2. The beam didn't fit on segment 2 without partially illuminating segment 1 (tested by maximizing the signal on segment 2, then blocking the beam: if the beam is entirely on one segment, only that segment should be affected, but in this case we found that segment 1's DC signal also changed when the beam was blocked). We readjusted the telescoping lenses to get the beam a bit smaller, and now the beam fits on segment 2. We know it is entirely on segment 2 because small beam movements do not change the signal on segment 2.
We are trying to take the remaining data, but AGmeasure keeps hanging while sending the data (after taking the measurement, over 10 min). We tried restarting the network analyzer to no avail. I was able to grab the data by cancelling the measurement and running
I've uploaded the spectrum for segment 1 in the meantime. Zero model is on the way.
When I finished up the measurements on WFS2, I removed the cables from the AP table and closed the cover.
EDIT: I forgot to switch the LEMO connector to measure the other segments, so we measured the RF signal from segment 1 even when the beam was on segments 2-4. We'll have to try again tomorrow.
We are at it again. Rika is setting up the TF measurement, I'm looking into scripting the WFS sensing matrix measurement we made earlier in the week so we can return to it next week.
While measuring the TF of SEG4, about 1% of the beam was leaking onto SEG1.
We finished the measurements for SEG2-4 and generated the figure by running PDH_calibrate.ipynb.
edit: We observed during segment 2 measurements that blocking the beam reduced the DC level of segment 1 by less than 1%, but still clearly observable. As you can see in the plots, something is suspicious about the normalization of these TFs. We took segment 1 data a few days before the other segments, so perhaps we weren't getting the full beam on the reference PD during the later measurements? When I make this measurement for WFS1, I will try to fix some of these problems by choosing different telescoping optics, and I will consider whether removing the QPD heads from their table will improve the measurement.
I'm scripting the WFS sensing matrix measurements. I haven't really scripted DTT before, so I'm trying to find documentation or existing scripts. I came across this elog where Gautam measured a sensing matrix during DRMI lock, and he pointed me to some .xml files used for these measurements.
When I put away the lenses we had used for measuring the RF transfer functions of the QPD heads, I saw that I'd removed them from the cabinet containing green endtable optics, but hadn't noticed the sign forbidding their removal. I'll talk with Koji/Gautam about what happened and what should be done.
I wanted to make a zero model of this circuit to get a handle on the results. I couldn't import zero on pianosa, so I tried pip installing zero, but the install failed because it couldn't find matplotlib version 3.0.3. I finally got it to install using
Oddly, even though I can now import zero when I open a python3 session from the command line, when I open a jupyter notebook and switch to a python3 kernel, the zero module is still unavailable. I think I recall that conda manages the jupyter environment -- is pip managing an entirely separate environment (annoying)?
edit: Yeah, it was something like that. I reminded myself how this works with this article.
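For the record, here's a minimal sketch of how to check which interpreter the notebook kernel is actually using and install into that environment directly, rather than relying on whichever pip is first on the PATH (assuming the package's pip name matches the import name, as it does for zero):

import subprocess
import sys

# The interpreter the notebook kernel is running; whatever environment this
# lives in is the one the notebook imports from.
print(sys.executable)

# Install into that same environment, bypassing any other pip on the PATH.
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'zero'])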
We noticed last week that the MC2 trans camera has pitch and yaw swapped; I rotated what I thought was the correct camera by 90 degrees clockwise (as viewed from above, like in the attachment), but I now have doubts. It's the camera on the right in the attachment.
I'm using the notebooks from rana as a starting point, and making a script to measure and fill the WFS sensing matrix. It lives at /users/aaron/WFS/scripts/WFSsensingMatrix.ipynb for now. Here's what it does; what's been tested is in green, untested is goldenrod, uncoded is fire brick.
To run these on pianosa, I ran (inside the jupyter notebook)
I'm getting an error when starting the nds2 connection
conn = nds2.connection('192.168.113.201', 31200)
Failed to establish a connection[INFO: Request SASL authentication protocol]+
I didn't find anything on the elog about this error, but I'm looking at the nds user manual. The problem was that I didn't have a valid Kerberos ticket; I opened one on pianosa with my albert.einstein credentials (note that the realm must be in all caps: LIGO.ORG).
I'm now able to run the scripts Rana mentions, but I haven't been able to grab the channels I want (eg C1:SUS-MC1_ASCPIT_IN1_OUT); it says the channel isn't found. When I check how many of the Caltech channels are available (conn.count_channels('C1*')), there are none. I was connecting to nds.ligo.caltech.edu, but this must be the wrong server (it has all the channels for the sites). fb and fb1 (and the IP they point to, 192.168.113.201) cannot be connected to, giving the error 'Error occurred trying to write to socket.'
I recall that in the cryo lab, we need to use port 8088 to get data from cymac1, and indeed substituting 31200 -> 8088 lets me access the C1 channels (I can count the channels), but no matter what time I request, nds tells me there is no data available (gap). Gautam came by and diagnosed that the gaps I'm seeing in the frames' data are real, fb is down (see elog).
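For reference, here's the minimal nds2 usage that worked for past data (once fb is serving frames again); the server and port are the ones above, and the GPS times are placeholders:

import nds2

# fb's martian IP, using the port that responds (8088 rather than 31200)
conn = nds2.connection('192.168.113.201', 8088)

# sanity check: the C1 channels should be visible
print(conn.count_channels('C1*'))

# fetch a short stretch of past data (placeholder GPS start/stop times)
bufs = conn.fetch(1252800000, 1252800010, ['C1:SUS-MC1_ASCPIT_IN1_OUT'])
print(bufs[0].data[:10])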
Continuing, I'm going to modify the script to grab live data. I'm using the iterate and next methods. I noticed that the MC2_TRANS pit/yaw channels are not saved to frames, even though WFS1/2 pit/yaw are. Since I expect I'll want to look back at these channels, I followed the instructions for adding a daq channel, uncommenting the following line in /opt/rtcds/caltech/c1/chans/daq/C1IOO.ini:
I made a backup of the old version of this .ini file, which can be found in /users/aaron/backups/190917_C1IOO.ini. I did not remake the model, as I couldn't find the c1ioo model in /opt/rtcds/caltech/c1/userapps/trunk or from the matlab command prompt. I restarted the fb via telnet, but didn't restart the model or check the svn (got an error?). The _DQ channels are now reachable on dataviewer, so things seem to be working.
I also tried importing cdsutils, so I can control awg in the same script that we read out the sensing matrix, but I'm getting the python3 error when I import cdsutils:
I tried pip upgrading cdsutils, but it's already up-to-date. I get the above error even if I switch to a python 2 kernel; cdsutils is installed in the python2.7 directory, so I don't know why pip is finding it when I'm running a python 3 kernel. I can move on from this for now, but it would be useful to be able to script the excitation along with the measurement.
Tangentially related, Rika wanted to be running some jupyter notebooks while working on donatella. I ran, on donatella:
hm, that didn't work. Also jupyter is installed when you install conda, so I'm not sure how there is a version of conda but not of jupyter. I also see that pip and pip3 are not recognized commands on donatella.
I noticed that some of the functions in the scipy signal processing toolbox are out of date on pianosa. The Chebyshev filter design functions and the welch PSD estimator now accept additional kwargs (for example, before you needed to give the IIR filter design functions a cutoff frequency normalized to the Nyquist rate, but now you can give the physical frequencies and the sampling rate separately).
I want to update this package, but I hesitate to break everyone's existing scripts.
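To illustrate the change (made-up parameters): the filter design functions used to need the corner frequency pre-normalized to Nyquist, while newer scipy (>= 1.2) accepts the physical frequency plus an fs kwarg; welch likewise takes fs and returns a frequency axis in Hz.

import numpy as np
from scipy import signal

fs = 2048.0   # Hz, made-up sampling rate
fc = 100.0    # Hz, made-up corner frequency

# old style: normalize the corner frequency to the Nyquist rate by hand
b_old, a_old = signal.cheby1(4, 0.5, fc / (fs / 2))

# new style (scipy >= 1.2): pass the physical frequency and fs directly
b_new, a_new = signal.cheby1(4, 0.5, fc, fs=fs)

# welch with fs gives the PSD on a frequency axis in Hz
x = np.random.randn(int(10 * fs))
f, Pxx = signal.welch(x, fs=fs, nperseg=1024)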
New DIMM cards have arrived. I stored them in the digital cabinet along y arm.
I booted Rossa in rescue mode; though I see no errors on bootup, I still see the same error ("a problem has occurred") after boot, and a prompt to logout. I powered rossa off/on (single short press of power button), no change.
Booting in debug mode, I see that the error occurs when mounting /cvs/cds, with the error
Which is odd, because when I boot in recovery mode, it mounts /cvs/cds successfully.
I booted in emergency mode by adding to the boot command
but didn't have the appropriate root password to troubleshoot further (the usual two didn't work).
I wanted to measure the RF transimpedance of the WFS heads, as outlined above.
Summary: Measurement is not done.
I set up the spectrum analyzer to make the WFS head RF transfer function measurement (V/W) on WFS1. I placed the Jenne laser on the AP table, along with the reference PD power supply, laptop, and laser power supply. The Agilent output AM modulates the laser; the reference PD is again NewFocus 1611, with its AC output sent to Agilent's R channel and DC output sent to an oscilloscope;
I closed the PSL shutter while checking for a location to place the breadboard, and opened it while writing this. Headed back to Cryo to pick up the large incandescent bulb we'd borrowed over the summer.
I measured the RF response of the fiber-coupled NewFocus 1611, calibrating out the cable delay. The laser current was set to 20.0 mA, and the RF power going into the splitter was -10 dBm. The DC voltage was 1.87 V, and Gautam and I measured the power from the fiber at 344uW.
Something still looks very wrong -- the PD is supposed to be flat out to 1GHz, and physical units pending, need food.
The fiber-coupled PD seems to have a factor of ~1.5 difference in responsivity compared to the free-space PD. There are some differences in the two ways I made the measurement that I don't yet understand.
I measured relative responsivities of the fiber and free coupled NewFocus 1611 PDs (scaled by the Jenne AM transfer function).
I made the measurement in two ways, see attachment three. In attachment one, I show the response for separately measuring the two PDs relative to a pickoff of the source (two-port thru calibration). In attachment two I measure the relative responses directly, without picking off a reference (three-port calibration). I scaled the transfer functions by their DC voltages; both PDs have transimpedances of 700 V/A.
However, there are some clear differences in the response (an overall ~0.5 dB offset that may be explained by a miscalibrated DC level; an apparent periodicity in attachment 1) that I don't yet understand. The free path of the non-fiber PD is ~5-6 inches, which accounts for the ~45 degrees of phase advance of the fiber-coupled relative to the free-space-coupled PD signal (12.7 cm / (c / 300 MHz) * 360 degrees ~ 45 degrees).
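A quick check of that number (the path length is my ~5 inch estimate):

c = 299792458.0   # m/s
f = 300e6         # Hz, frequency where the phase difference is read off
L = 0.127         # m, estimated extra free-space path of the non-fiber PD

phase_deg = 360.0 * f * L / c
print('%.1f deg' % phase_deg)   # ~45.7 degrees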
Mon Oct 7 14:51:53 2019. I closed the PSL shutter to measure the WFS head responsivity.
I made a thru calibration as in this elog, treating laser, reference PD, and WFS RF output as a three-port device. The DC current supplied to the laser is 20.0 mA in all cases. The Agilent spectrum analyzer supplies a -10 dBm excitation to Jenne laser's AM port, and A/B is measured with 20dB attenuation on each input port. Results are in /users/aaron/WFS/data/191007/. The calibration had 100 averages, all other measurements 32 averages; other parameters found in the yml file, same folder as the data.
I normalized the result by the difference between the dark and bright DC levels of each segment.
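The normalization step, as a short sketch (the variable names are mine, not the layout of the data files in /users/aaron/WFS/data/191007/):

import numpy as np

def normalize_segment(tf_raw, v_bright, v_dark):
    # Scale a segment's raw TF by the change in its DC level between the
    # beam-on and beam-blocked states, i.e. by the power landing on that segment.
    return tf_raw / (v_bright - v_dark)

# made-up example values
tf_raw = np.full(5, 0.3 + 0.1j)
print(normalize_segment(tf_raw, v_bright=2.1, v_dark=0.05))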
Mon Oct 7 17:29:58 2019 opened PSL shutter.
I simulated this circuit with zero, but haven't gotten the results to match the measurements above.
I installed nds2 on donatella with yum, but still can't import nds2.
I installed nds2 again, this time successfully with
conda install -c conda-forge python-nds2-client
% Eqn 41 of
% "Doppler-induced dynamics of fields in FabryĖPerot
% cavities with suspended mirrors", Malik Rakhmanov (2000).
% read in ringdown timeseries:
at = importdata('tek00000.csv');
The backup of /cvs/cds (which runs as a cron job on fb40m; see /cvs/cds/caltech/scripts/backup/000README.txt)
has been down since fb40m was rebooted on March 3.
I was unable to start it because of conflicting ssh keys in /home/controls/.ssh .
With help from Dan Kozak, we got it to work with both sets of keys
( id_rsa, which allows one to ssh between computers in our 113 network without typing a password,
and backup2PB which allows the cron job to push the backup files to the archive in Powell-Booth).
It still goes down every time one reboots fb40m, and I don't have a solution.
A simple solution is for the script to send an email whenever it can't connect via ssh keys
(requiring a restart of ssh-agent with a passphrase), but email doesn't seem to work on fb40m.
I'll see if I can get help on how to have sendmail run on fb40m.
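Once a mail transport is running there, the notification itself is simple; a minimal sketch (placeholder addresses, and it assumes a local MTA or reachable SMTP relay on fb40m, which is exactly the piece that's missing right now):

import smtplib
from email.message import EmailMessage

def notify_backup_failure(detail):
    # Send a short alert when the backup cron job can't authenticate over ssh.
    msg = EmailMessage()
    msg['Subject'] = '40m backup: ssh to ldas-cit failed'
    msg['From'] = 'controls@fb40m'        # placeholder address
    msg['To'] = 'controls@example.org'    # placeholder address
    msg.set_content(detail)
    with smtplib.SMTP('localhost') as s:  # assumes a local MTA/relay
        s.send_message(msg)

notify_backup_failure('rsync.backup could not connect via ssh keys')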
Ever since July 22, the backup script that runs on fb40m has failed to ssh to ldas-cit.ligo.caltech.edu to back up our trend frames and /cvs/cds.
This was a new failure mode which the scripts didn't catch, so I only noticed it when fb40m was rebooted a couple of days ago.
Alex fixed the problem (RAID array was configured with the wrong IP address, conflicting with the outside world), and I modified the script ( /cvs/cds/caltech/scripts/backup/rsync.backup ) to handle the new directory structure Alex made.
Now the backup is current and the automated script should keep it so, at least until the next time fb40m is rebooted...
After the mini boot fest that Jenne did today, I checked whether it fixed the overflow issues that yesterday prevented us from aligning the arms.
I ran the alignment script for the arms, getting 0.85 for TRX and 0.75 for TRY: low values.
After I ran the script, C1SUSVME1 and C1SUSVME2 started having problems with the FE SYNC (counter at 16378). I rebooted those two and fixed the sync problem, but the transmitted powers didn't improve.
Are we still having problems due to MC misalignment?
I've now also trended the MOPA output power for the last 200 days to check a possible correlation with the FSS reflected power. See attachment.
The trend shows that the laser power has decayed, but the FSS reflected power has decayed even faster: a 30% drop in the FSS vs 7% for the MOPA over the last 60 days (attachment 2).