Update 2019 Sep 19 1730: The pin numbers of the IDC 50 connector are all off by 1. i.e. 3-->4 and so on. I will fix this shortly. The problem was because of me looking at the pinout for the wrong gender of IDC50 connectors.
New DIMM cards have arrived. I stored them in the digital cabinet along y arm.
False alarm - the mistake was mine. Looking at the schematic diagram, the AI/Dewhite board, D000316, accepts the inputs from the DAC on the P2 connector. While restoring the connections at 1Y2, I had plugged the outputs of the DAC interface board into the P1 connectors of the AI boards. Having rectified this problem, I am now able to move the beam on the AS camera in both PIT and YAW using TT1 or TT2. So to zero-th order, this subsystem appears to work. A more in-depth analysis of the angular stability of the TTs can only be done once we re-align the arms and lock some cavities.
While debugging this problem, c1lsc models crashed. I ran the reboot script this morning to bring the models back. There was a 0x4000 error on the DC indicators for the c1lsc models (mx_stream error which couldn't be fixed by restarting the mx service) the first time I ran the script so I did it again, now the indicator lights are in their nominal state.
The custom ribbon cables piping the coil driver board outputs to the eLIGO (?) TTs (a.k.a. TT1 and TT2) are damaged. They need to be re-made. I can't find any pin-mapping for them.
While waiting for the LSC photodiode whitening switching cross-connect work to be done, I thought I'd re-align the IFO a bit. However, I was unable to find any beam making it to the REFL/AS ports despite some TT steering. I remembered that Chub had undone the TT connections at 1Y2 as well, and thought I'd check the cabling to make sure all was in order. On going to the rack, however, I found that these connections were damaged at the coil-driver end (see Attachment #1), presumably during the cable extraction. These need to be re-made...😔
In the data we got yesterday, we can see some filter's effect.
But it is not good coherence above 10Hz, so we mesured again. And this time we save the data as xml file.
And also we chaned the frequency regions broader to watch corner frequency of suspension.
Diagnotics test tools
range: 0.1 Hz to 100 Hz
but at low frequency, the mode maching cavity was unloked cause of too much shaking.
So, we saw single frequency TF, and searched the good amplitude.
First, I tried to get TF @0.1~1 Hz .
0.1 to 1 Hz
points: 61 (I think it's too much becous it takes about an hour)
The TFs and coherence of MC1/PIT to each QPD is below. [above window: coherence, below: TF]
During the mesurement, something happened @0.2-0.3Hz so I stopped it.
We found the coherence of WFS1P and WFS2Y is not good, but others are good.
we guess that it could come from alignment which made Q chainging to small.
Finaly, I also got the .xml data of MC1P 1 Hz to 10 Hz. In this time,
1 to 10 Hz
Now we took single frequency 6 TFs (MC1/2/3 PIT/YAW) @7Hz (Because this frequency has good coherence in all channel).
Aaron wrote the script using dtt to making matrics.
Once stop the auto-locker and realigned to make beam to get into QPD again.
After we lock MC, we took TFs from suspension MC1/2/3 PIT/YAW to WFS1/2 PIT/YAW.
Diagnotics test tools
range: 7 Hz to 50 Hz
Column 0: WFS2_PIT 1: WFS2_YAW 2:WFS1_PIT 3: WFS1_YAW 4: TRANCE_PIT 5:TRANCE_YAW
I'm wondering weather the MC1data I saved is correct, becouse I found the channel was changed when I exported MC2 data. So I took MC1 data again.
We got all data for TFs already. Each data is devided to real part and imaginary part. Then we are arranging the datas to obtain TFs.
TF of MC2 is attachiment 1. So tomorrow, I make other TF.
We aligned optics of WFS as it was. Now auto-locker is working to lock MC.
But it still doesn't lock. We notice that the c1lsc machine doesn't work. So we run rebootCILSC.sh.
Now we reset the hardware!
After reset, auto locking didn't work well. Gautum and Aaron reboot slow c1ioo. Then it works, and Gautam returned the MC to a good alignment.
We found the beam is not in the center of the QPD, we (turned off the MC autolocker and MC loop, then) realigned to make beam to get in to the QPD center. Afterwards we start auto locking.
With the WFS on, the maximum MC transmission we observe is 14,700 counts; after the transmission level stabilizes (MC_TRANS pit and yaw brought to 0), the MC transmission is only 14,200 counts. Perhaps the MC_TRANS QPD offsets need adjustment. We relieve the WFS servo of its DC offsets. This is the configuration we'll use for WFS loop measurements this week.
INCORRECT INFO IN THIS ELOG HAS BEEN REMOVED. SEE THIS ELOG FOR THE UPDATED INFO.
With the help of a tester board, I verified the mapping between fast BIO DB37 pins, and pins on the IDC50 connectors that are to be broken out to the whitening boards. I will enlist Chub to implement this mapping in hardware later today.
We continued to check the latch logic. Today we found that latch.py didn't catch the change of LSB but did for MSB. We determined that this happens when the slider value is chaged between the polling for LSB and MSB.
SInce these two should always be related to a single gain value, latch.py was modified so.
Now we don't observe any logic error for ~100 gain transisitions (see attached).
For the investigation of the latch logic issue for the CARM CM board, I have made the LED logic checkers with DB breakout boards. They require the pull up voltage supply of +15V because the acromag digital out is a open corrector (well... open "source") output.
The logic from Pin1 to Pin16 of DB37 can be monitored. The DB15 connector is only for monitoring the latch enable logic.
What Gautam and I found with the logic outputs was that the latch logic works fine but occasionally we found that the top 2 bits and the bottom 4bit were processed independently.
controls@fb1:~ 127$ sudo systemctl status daqd_fw.service
● daqd_fw.service - Advanced LIGO RTS daqd frame writer
Loaded: loaded (/etc/systemd/system/daqd_fw.service; enabled)
Active: active (running) since Tue 2019-09-17 21:32:25 PDT; 17min ago
Main PID: 22040 (daqd_fw)
└─22040 /usr/bin/daqd_fw -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.fw
Sep 17 21:32:31 fb1 daqd_fw: [Tue Sep 17 21:32:31 2019] Producer crc thread - label dqprodcrc pid=22108
Sep 17 21:32:31 fb1 daqd_fw: [Tue Sep 17 21:32:31 2019] [Tue Sep 17 21:32:31 2019] Producer thread - label dqproddbg pid=22109Producer crc... permitted
Sep 17 21:32:31 fb1 daqd_fw: [Tue Sep 17 21:32:31 2019] Producer crc thread put on CPU 0
Sep 17 21:32:31 fb1 daqd_fw: [Tue Sep 17 21:32:31 2019] Producer thread priority error Operation not permitted
Sep 17 21:32:31 fb1 daqd_fw: [Tue Sep 17 21:32:31 2019] Producer thread put on CPU 0
Sep 17 21:32:31 fb1 daqd_fw: [Tue Sep 17 21:32:31 2019] Producer thread - label dqprod pid=22103
Sep 17 21:32:31 fb1 daqd_fw: [Tue Sep 17 21:32:31 2019] Producer thread priority error Operation not permitted
Sep 17 21:32:31 fb1 daqd_fw: [Tue Sep 17 21:32:31 2019] Producer thread put on CPU 0
Sep 17 21:32:35 fb1 daqd_fw: [Tue Sep 17 21:32:35 2019] Minute trender made GPS time correction; gps=1252816371; gps%60=51
Sep 17 21:33:31 fb1 daqd_fw: [Tue Sep 17 21:33:31 2019] ->3: clear crc
drwxr-xr-x 2 controls controls 569344 Aug 23 05:17 12465
drwxr-xr-x 2 controls controls 565248 Aug 23 05:41 12466
drwxr-xr-x 2 controls controls 557056 Aug 23 05:53 12505
drwxr-xr-x 2 controls controls 262144 Aug 23 18:40 12506
drwxr-xr-x 2 controls controls 12288 Sep 17 21:54 12528
Unrelated to this work: c1auxey was keyed.
This meant that no frames were being written since Aug 23, which probably coincides with when the c1lsc frontend crashed. Sad 😢 😭 🙁 .
Came across this while looking up the BIO situation at 1Y2. For reference, the fix Koji mentions can be seen in the attached screenshot (one example, the other BIO cards also have a similar fix). The 16th bit of the BIO is grounded, and some bit-shifting magic is used to implement the desired output.
Yutaro talked about the BIO bug in KAGRA elog. http://klog.icrr.u-tokyo.ac.jp/osl/?r=9536
I think I made the similar change for the 40m model somewhere (don't remember), but be aware of the presense of this bug.
For some reason, the daqd_fw service was dead on FB. This meant that no frames were being written since Aug 23, which probably coincides with when the c1lsc frontend crashed. Sad 😢 😭 🙁 . Simply restarting the fw service does not work, it crashes again after ~20 seconds. The problem may have to do with the indeterminate state of the c1lsc expansion chassis. However, this is not something that can immediately be fixed, as Chub is still working on the wiring there. So in summary, no frame data will be available until we fix this problem (it is still unclear what exactly the problem is). Team WFS can still work by getting online data.
Why were the CDS overview DC indicators not red???
Unrelated to this work: I had to key the c1psl crate to get the IMC autolocker functioning again. However, I found that the key 🔑 turns continuously - as opposed to having two well defined states, ON and OFF. Be careful while handling this.
I'm using the notebooks from rana as a starting point, and making a script to measure and fill the WFS sensing matrix. It lives at /users/aaron/WFS/scripts/WFSsensingMatrix.ipynb for now. Here's what it does; what's been tested is in green, untested is goldenrod, uncoded is fire brick.
To run these on pianosa, I ran (inside the jupyter notebook)
I'm getting an error when starting the nds2 connection
conn = nds2.connection('192.168.113.201', 31200)
Failed to establish a connection[INFO: Request SASL authentication protocol]+
I didn't find anything on the elog about this error, but I'm looking at the nds user manual. The problem was, I didn't have a valid Kerberos ticket; I opened one on Pianosa with my albert.einstein (note all caps ligo.org).
I'm now able to run the scripts Rana mentions, but I haven't been able to grab the channels I want (eg C1:SUS-MC1_ASCPIT_IN1_OUT); it says the channel isn't found. When I check how many of the Caltech channels are available (conn.count_channels('C1*')), there are none. I was connecting to nds.ligo.caltech.edu, but this must be the wrong server (it has all the channels for the sites). fb and fb1 (and the IP they point to, 192.168.113.201) cannot be connected to, giving the error 'Error occurred trying to write to socket.'
I recall that in the cryo lab, we need to use port 8088 to get data from cymac1, and indeed substituting 31200 -> 8088 lets me access the C1 channels (I can count the channels), but no matter what time I request, nds tells me there is no data available (gap). Gautam came by and diagnosed that the gaps I'm seeing in the frames' data are real, fb is down (see elog).
Continuing, I'm going to modify the script to grab live data. I'm using the iterate and next methods. I noticed that the MC2_TRANS pit/yaw channels are not saved to frames, even though WFS1/2 pit/yaw are. Since I expect I'll want to lookback at these channels, I followed the instructions for adding a daq channel, uncommenting the following line in /opt/rtcds/caltech/c1/chans/daq/C1IOO.ini:
I made a backup of the old version of this .ini file, which can be found in /users/aaron/backups/190917_C1IOO.ini. I did not remake the model, as I couldn't find the c1ioo model in /opt/rtcds/caltech/c1/userapps/trunk or from the matlab command prompt. I restarted the fb via telnet, but didn't restart the model or check the svn (got an error?). The _DQ channels are now reachable on dataviewer, so things seem to be working.
I also tried importing cdsutils, so I can control awg in the same script that we read out the sensing matrix, but I'm getting the python3 error when I import cdsutils:
I tried pip upgrading cdsutils, but it's already up-to-date. I get the above error even if I switch to a python 2 kernel; cdsutils is installed in the python2.7 directory, so I don't know why pip is finding it when I'm running a python 3 kernel. I can move on from this for now, but it would be useful to be able to script the excitation along with the measurement.
Tangentially related, Rika wanted to be running some jupyter notebooks while working on donatella. I ran, on donatella:
hm, that didn't work. Also jupyter is installed when you install conda, so I'm not sure how there is a version of conda but not of jupyter. I also see that pip and pip3 are not recognized commands on donatella.
I noticed that some of the functions in the scipy signal processing toolbox were out of date on pianosa. The cheby and welch filters now accept additional kwargs (for eg, before you needed to give IIR filter methods a cutoff frequency normalized to the Nyquist rate, but now you can give it the frequencies and sampling rate separately).
I want to update this package, but I hesitate to break everyone's existing scripts.
Let's not worry about C1LSC until the c1iscaux upgrade is done.
The left one is analog and 90deg rotated.
See also: This issue tracker
We noticed last week that the MC2 trans camera has pitch and yaw swapped; I rotated what I thought is the correct camera by 90 degrees clockwise (as viewed from above, like in the attachment), but I now have doubts. It's the camera on the right in the attachment.
I wanted to make a zero model of this circuit to get a handle on the results. I couldn't import zero on pianosa, and I tried pip installing zero, but was denied due to not finding version 3.0.3 of matplotlib. I finally got it to install using
Oddly, even though I can now import zero when I open a python3 session from the command line, when I open a jupyter notebook and switch to a python3 kernel, the zero module is still unavailable. I think I recall that conda manages the jupyter environment -- is pip managing an entirely separate environment (annoying)?
edit: Yeah, it was something like that. I reminded myself how this works with this article.
When I put away the lenses we had used for measuring the RF transfer functions of the QPD heads, I saw that I'd removed them from the cabinet containing green endtable optics, but hadn't noticed the sign forbidding their removal. I'll talk with Koji/Gautam about what happened and what should be done.
I installed 6 of these in 1Y2. Three were for PD INTF #1-3, and I used three more for the AS110, REFL11, and REFL33 Demod board FEs, where the strain-reflief of the DC power cables to the Eurocrate was becoming a problem. So now there are only 4 units available as spares.
Once the strain-relieving of the Dsub cabling to 1Y3 is done, we can move ahead with testing. I'd like to put this to bed this week if possible.
not need to use DTT. I'm attaching some half-finished notebooks that give the gist.
That's it! Now you have the complex, single frequency TFs. Next you invert the matrix.
The PCB board of the adapter for DIN 96pin to DSUB37 conversion (single DSUB version) was delivered yesterday and I quickly soldered the connectors.
They are ready for use and stored in a JLCPCB cardboard box on a pile of acromag stuff. (Note that the lacel is written on the box with Sharpie)
I'm scripting the WFS sensing matrix measurements. I haven't really scripted DTT before, so I'm trying to find documentation or existing scripts. I came across this elog where Gautam measured a sensing matrix during DRMI lock, and he pointed me to some .xml files used for these measurments.
We are at it again. Rika is setting up the TF measurement, I'm looking into scripting the WFS sensing matrix measurement we made earlier in the week so we can return to it next week.
When we mesuring TF of SEG4, the beam leaking to SEG1 about 1%.
We finished mesurement SEG2-4 and get the figure by running PDH_calibrate.ipynb .
edit: We observed during segment 2 measurements that blocking the beam reduced the DC level of segment 1 by less than 1%, but still clearly observable. As you can see in the plots, something is suspicious about the normalization of these TFs. We took segment 1 data a few days before the other segments, so perhaps we weren't getting the full beam on the reference PD during the later measurements? When I make this measurement for WFS1, I will try to fix some of these problems by choosing different telescoping optics, and I will consider whether removing the QPD heads from their table will improve the measurement.
At Seiji and Gautam's suggestion, we added an additional RF photodiode (NewFocus 1611) to the system so we can calibrate our transfer functions. The configuration is now laser -> BS --> lenses -> QPD and BS --> lenses -> RFPD. We added lenses to get the beams focused on the RFPD and QPD heads, and are again set up for TF measurement.
We took the following data. These parameters were consistent across all measurements:
After taking the data for segment 1, I moved the beam to segment 2. The beam didn't fit on segment 2 without partially illuminating segment 1 (tested by maximizing the signal on segment 2, then blocking the beam. If the beam is entirely on one segment, only that segment should be effected; in this case, we found that segment 1's DC signal also changed when the beam was blocked). We readjusted the telescoping lenses to get the beam a bit smaller, and now the beam fits on segment 2. We know it is entirely on segment 2 because small beam movements do not change the signal on segment 2.
We are trying to take the remaining data, but AGmeasure keeps hanging while sending the data (after taking the measurement, over 10 min). We tried restarting the network analyzer to no avail. I was able to grab the data by cancelling the measurement and running
I've uploaded the spectrum for segment 1 in the meantime. Zero model is on the way.
When I finished up the measurements on WFS2, I removed the cables from the AP table and closed the cover.
EDIT: I forgot to switch the LEMO connector to measure the other segments, so we measured the RF signal from segment 1 even when the beam was on segments 2-4. We'll have to try again tomorrow.
Chub wanted to get the correct part number for the replacement UPS batteries which necessitated opening up the UPS. To be cautious, all the workstations were shutdown at ~9:30am while the unit is pulled out and inspected. While looking at the UPS, we found that the insulation on the main power cord is damaged at both ends. Chub will post photos.
However, despite these precautions, rossa reports some error on boot up (not the same xdisp junk that happened before). pianosa and donatella came back up just fine. It is remotely accessible (ssh-able) though so maybe we can recover it...
please no one touch the UPS: last time it destroyed ROSSA. Please ask Chub to order the replacement batteries so we can do this in a controlled way (fully shutting down ALL workstations first). Last time we wasted 8 hours on ROSSA rebuilding
We identified the Jenne laser and found a long optical fiber that might be able to transport our beam to the AP table.
Now we're searching for documentation on using this laser. Kevin and John measured a TF last year. Koji advised that we needn't worry too much, the current limit is already set correctly and we need only power on the laser.
We moved the breadboard (including a couple PDs, collimating lenses, laser, steering mirrors, etc) over to the AP table, and set it on top of the panel next to the WFS. We mounted the laser on the AP table, and added one lens with f~68 mm after the laser to fit the beam on a single quadrant; the beam was about 1mm diameter (measured by eye) when it entered the QPD. We turned the laser driver on at ~19.4 mA, and directed it to WFS2 via the last two steering mirrors before WFS2.
We monitored the QPD segments' DC level with ndscope on a laptop, and were able to send the beam to each of the four quadrants in turn. We set up the Agilent network analyzer to drive the laser's amplitude modulation and sent the RF signal from the LEMO output on the QPD head directly to the network analyzer. We will take the measurements tomorrow morning.
We should also have a plan for the next couple weeks so we are organized; heavily adapted from. Here's what I'm thinking this morning:
It would be good to script some of what we did yesterday. I'm checking out some scripts I'd used for Qryo and armloss measurements to remember the best way to do this.
I noticed yesterday that the PSL_shutterqst box is white, and I've seen timeout requests when eg the reboot script tries to open/close the PSL shutter. It seems like a shutter that should open, so I should find the aux machine to restart it.
I picked up a unit of D1900068 SR785 accessory box from Dean's office at Downs.
Still removing old cable, terminal blocks and hardware. Once new strain reliefs and cable guides are in place, I will need to disconnect cables and reroute them. Please let me know dates and times when that is not going to interrupt your work!
[rika, aaron, rana]
We are getting the MC locked in anticipation of making some WFS transfer function measurements.
The PSL screen was all white boxes, so I keyed the PSL crate and burt restored the settings from 11:19am Sep 5 (somewhat earlier than we started rebooting computers). Following this, I ran Milind's unstick.py and then the PSL autolocker script; both worked on the first go, great work Milind!
The modecleaner autolocking script is having substantially more trouble. Rana found that pitch and yaw sliders for all MC optics have been swapped--we think it's because the camera at MC2 has been rotated. Note that for now, sliding pitch gives a change in yaw, and sliding yaw changes pitch.
We noticed that with the WFS servo on, the modecleaner would be well aligned for a while (MC trans ~ 14000), only to lose lock after several minutes. We held the MC2_TRANS_PIT/YAW outputs at 0, so the MC2 QPD does not affect the WFS loop; the beam is well centered on WFS1/2, but not on the MC2 QPD, and with this signal out of the loop MC TRANS recovers to ~15000 counts (consistent with the quiet times over the last 90 days, see attachment 2). Attachment 1 shows the MC lock degrading, followed by some noise where we lost lock, and finally a visible increase in MC trans when we remove the MC2 QPD from the WFS loop.
MC1 Pich 4.4762 Yow 4.4669
MC2 Pich 3.7652 Yow -1.5482
MC3 Pich -0.4159 Yow 1.1477
After automatic locking MC, we stopped automatical locking and took alignment to the center of QPD.
And then again did the automatic locking MC. Finaly Rana move to best alignment.
MC1 Pich 4.4942 Yow 4.6956
MC2 Pich 3.7652 Yow -1.5600
MC3 Pich -0.3789 Yow 1.1477
We used diaggui to measure the response of WFS1/WFS2/MC2 pitch (yaw) to excitations in MC1/MC2/MC3 pitch (yaw). Seeing fluctuations of amplitude ~1 on the MCX_PIT/YAW_OUT channels, we used an amplitude 0.01 excitation at 20 Hz. We will work on scripting some of this tomorrow.
One pair of DIMM cards from the Sunstone box had the same Sun part number as those in c1ioo, so I swapped them in and reinstalled c1ioo's CPU0. c1ioo now boots up an seems ready to go, I'm able to log on from nodus. I also reinstalled optimus' CPU0, and optimus boots up with no problems.
While opening up optimus, I noticed a box labelled 'SUNSTONE' sitting below the rack--it contains two CPU modules a similar type as in c1ioo! I'm going to try swapping in the DIMM cards from this SUNSTONE box; I didn't find any elogs about sunstone--where are these modules from?
I reset c1lsc and c1sus, then ran rebootC1LSC.sh as before. All models started by the script are running with minimal red lights; c1oaf, c1cal, c1dnn, c1daf, and c1omc are not started by the script. I manually started these in the order c1cal->c1oaf->c1daf->c1dnn. Starting c1dnn crashed the other FE on c1ioo, so I reset all three FE again, and ran the script again (this time, including the startup for c1cal, c1oaf, and c1daf, but excluding c1dnn).
Except for c1dnn and c1omc, all models are started. The status lights are attached.
Saw these slightly delayed.
Q1: Not sure--is it a safe operation for me to remove the DIMM on CPU0, replace CPU0 (with no DIMM), and boot up to try this?
Q2: Specifically, it's this DIMM. The CPU core is compatible with DDR2, clock rate up to 333 MHz (DDR2-667) and 1, 2, or 4 GB of memory.
Q1 Can we run the machine with the reduced # of cores?
Q2 We might be able to order them quickly. What's the spec and configuration of the DIMMs (like DDR2-667MHz ECC 4GBx4, and even more specs (like Samsung 2GB DDR2 RAM PC2-6400 240-Pin DIMM M378T5663EH3) so that we are to identify the exact spec).
Q3 Can we scavenge the old OMC RT machine or even megatron to extract the memories?
please no one touch the UPS: last time it destroyed ROSSA. Please ask Chub to order the replacement batteries so we can do this in a controlled way (fully shutting down ALL workstations first). Last time we wasted 8 hours on ROSSA rebuilding.
There was an alarm sound from the Smart-UPS 2200 sitting under the workstation. I see that the 'replace battery' light is red, and this elog tells me that these batteries are replaced every ~1-4 years; the last replacement was march 2016. Holding down the 'test' button for 2-3 seconds results in the alarm sound and does not clear the replace battery indicator.
Assuming you are at pianosa, /etc/resolv.conf is like
# Generated by NetworkManager
But this should be like
as indicated in https://nodus.ligo.caltech.edu:8081/40m/14767
I did this change for now. But this might get overridden by Network Manager.
I reset c1lsc, c1sus, and c1ioo.
I noticed that the script gives the command 'ssh c1XXX', but we have been getting no route to host using this command. Instead, the machines are currently only reachable as c1XXX.martian. I'm not sure why this is, so I just appended .martian in rebootC1LSC.sh
This time, the script does run. I did get 'no route to host' on c1ioo, so I think I need to reset that machine again. After reset, the script failed to login to c1ioo and c1lsc.
Fri Sep 6 13:09:05 2019
After lunch, I reset the computers again, and try the script again. There is again no route to host for c1ioo. I'm going inside to shutoff the power to c1ioo, since the reset buttom seems to not be working. I still can't login from nodus, so I'm bringing a keyboard and monitor over to plug in directly.
On reset, c1ioo repeatedly reaches the screen in attachment 1, before going black. Holding down shift or ctrl+alt+f1 doesn't get me a command prompt. After waiting/searching the elog for >>3 min, we decided to follow these instructions to cycle the power of c1ioo. The same problem recurred following power up. I found online some instructions that the SunSystems 4600 can hang during reboot if it has become too hot ("reboot during a thermal shutdown"); I did notice that the temperature light was on earlier in this procedure, so perhaps that is the problem. I followed the wiki instructions to shut down the computer again (pressed power button, unplugged 4 power supplies from back of machine), and left it unplugged for 10-30 min (Fri Sep 6 14:46:18 2019 ).
Fri Sep 6 15:03:31 2019
Rana plugged in the power supplies and reset the machine again.
Fri Sep 6 16:30:37 2019
c1ioo is still unreachable! I pressed reset once, and the reset button flashes white. The yellow warning light is still on.
Fri Sep 6 16:54:21 2019
The reset light has stopped flashing, but I still can't access c1ioo. I reset once more, this time watching c1ioo on a monitor directly. I'm still seeing the same boot screen repeatedly. I do see that CPU0 is not clocking, which seems weird.
Following gautam's elog here, I found the Sun Fire X4600 manual for locating faulty CPUs. After the white reset light stopped flashing, I held down the power button to turn off the system. Before shutdown, all of the CPU displayed amber lights; after shutdown, only the leftmost CPU (as viewed from the back, presumably CPU0) displays an amber light. The manual says this is evidence that the CPU or DIMM is faulty. Following the manual, I remove the standby power, then checked out these Instructions for replacing the CPU to remove the CPU; Gautam also has done this before.
Fri Sep 6 20:09:01 2019 Fri Sep 6 20:09:02 2019
I pulled the leftmost CPU module out, following the instructions above. The CPU module matches the physical layout and part number of the Sun Fire X4600 M2 8-DIMM CPU module; pressing the fault reminder light gives amber indicators at the DIMM ejectors, indicating faulty DIMMs (see). The other indicator LEDs did not illuminate.
I located several spare DIMMs in the digital cabinet along Y arm (and a couple with misc computer components in the control room), but didn't find the correct one for this CPU module. The DIMM is Sun PN 371-1764-01; I found it online and ordered eight. Please let me know if this is incorrect.
To protect the CPU module, I've put it in an ESD safe bag with some bubble wrap and a note. It's on the E shop bench.
As suggested, I ran the script cds/rebootC1LSC.sh
I got a timeout error when the script tried closing the PSL shutter ('C1:AUX-PSL_ShutterRqst' not found), but Rana and I closed the shutter before leaving last night. c1sus is down, so the script found no route to host c1sus; I'm thinking I need to reset c1sus for the script to run completely. Nonetheless, c1lsc was rebooted, which crashed c1ioo and left the c1lsc FE all red (probably because c1sus wasn't restarted).
via Polish chat, GV tells us to RTFE
While going to take some transfer functions of the MC WFS loop, LSC was down. When we tried to restart the FE using 'rtcds restart --all', c1lsc crashed and froze. We manually reset c1lsc, then laboriously determined the correct order of machines to reboot. Here's what works best:
rtcds start c1x04 c1lsc c1ass c1oaf c1cal c1daf
Starting c1dnn crashes the other FE
rtcds restart --all
rtcds restart c1rfm c1sus c1mcs
restarting c1pem crashes the other FE on c1sus
We're seeing a lot of red IPC indicators--perhaps it's an issue with the order we're restarting?
There were a bunch of useless / degenerate channels added - e.g. whitening gains which are alreay burt-snapshot. Maybe there are many more useless channels being trended, but no need to add more.
Copy-pasting wasn't done correctly - the first 4 added channels were duplicates. There are in fact 5 LO power mons, one for each of the frequencies 11, 33, 55, 110 and 165 MHz.
I cleaned up. Basically only the detect-mon channels, and the ALS channels, are new in the setup now. I will review if any extra channels are required later. While checking that the daqd is happy, I noticed c1lsc FEs are in their stuck state, see Attachment #1. I guess a cable was bumped when the strain relief operation was underway. I'm not attempting a remote resuscitation.
I added the list of new c1iscaux channels to /opt/rtcds/caltech/c1/chans/daq/C0EDCU.ini and restarted the framebuilder. Koji had thought some of these channels might have previously existed under slightly different names. However, after looking through C0EDCU.ini and the other _SLOW.ini files, I did not find any candidates for removal. As far as I can tell, all of these channels are being recorded for the first time.
Following the death of rossa, which was hosting the only working environment for the GigE camera software, I've set up a new dedicated rackmount camera server: c1cam (details here). The Python server script is now configured as a persistent systemd service, which automatically starts on boot and respawns after a crash. The server depends on a set of EPICS channels being available to control the camera settings, so c1cam is also running a softIOC service hosting these channels. At the moment only the ETMX camera is set up, but we can now easily add more cameras.
Instructions for connecting to a live video feed are posted here. Any machine on the martian network can stream the feed(s). The only requirement is that the client machine have GStreamer 0.10 installed (all the control room workstations satisfy this).
As much as possible, the code and dependencies are hosted on the /cvs/cds network drive instead of installed locally. The client/server code and the Pylon5, PyPylon, and PyEpics dependencies are all installed at /cvs/cds/rtcds/caltech/c1/scripts/GigE. The configuration files for the soft IOC are located at /cvs/cds/caltech/target/c1cam.
The 40m GigE camera code is a slightly-updated version of the 10+ year-old camera code in use at the sites. Consequently every one of its dependencies is now deprecated. Ultimately, we'd like to upgrade to the following:
This is a long-term project, however, as many of these APIs are very different between Python 2 and 3.
I did some more investigation of what the appropriate cavity geometry would be for the OMC. Unsurprisingly, depending on the incident mode content, the preferred operating point changes. So how do we choose what the "correct" model is? Is it accurate to model the output beam HOM content from NPROs (is this purely determined by the geometry of the lasing cavity?), which we can then propagate through the PMC, IMC, and CARM cavities? This modeling will be written up in the design document shortly.
*Colorbar label errata - instead of 1 W on BS, it should read 1 W on PRM. The heatmaps take a while to generate, so I'll fix that in a bit.
Update 230pm PDT: I realize there are some problems with these plots. The critically coupled f2 sideband getting transmitted through the T=10% SRM should have significantly more power than the transmission through a T=100ppm optic. For similar modulation depth (which we have), I think it is indeed true that there will be x1000 more f2 power than f1 power for both the IFO AS beam and the LO pickoff through the PRC. But if the LO is picked off elsewhere, we have to do the numbers again.
Attachment #1: Two candidate models. The first follows the power law assumption of G1201111, while in the second, I preserved the same scaling, but for the f1 sideband, I set the DC level by assuming a PRG of 45, modulation depth of 0.18, and 100 ppm pickoff from the PRC such that we get 50 mW of carrier light (to act as a local oscillator) for 10 W incident on the back of PRM. Is this a reasonable assumption?
Attachment #2: Heatmaps of the OMC transmission, assuming (i) 0 contrast defect light in the carrier TEM00 mode, (ii) PRG=45 and (iii) 1 W incident on the back of PRM. The color bar limits are preserved for both plots, so the "dark" areas of the plot, which indicate candidate operating points, are darker in the left-hand plot. Obviously, when there is more f1 power incident on the OMC, more of it is transmitted. But my point is that the "best operating point(s)" in both plots are different.
Why is this model refinement necessary? In the aLIGO OMC design, an assumption was made that the light level of the f1 sideband is 1/1000th that of the f2 sideband in the interferometer AS beam. This is justified as the RC lengths are chosen such that the f2 sideband is critically coupled to the AS port, but the f1 is not (it is not quite anti-resonant either). For the BHD application, this assumption is no longer true, as long as the LO beam is picked off after the RF sidebands are applied. There will be significant f1 content as well, and so the mode content of the f1 field is critical in determining the OMC filtering performance.
The internal ribbon cable for the MC1 satellite box was replaced with the one in the spare box. The MC1 box was closed and reinstalled as before. The IMC is locking well.
Now the burnt cable was disassembled and reassembles with a new cable. It is now in the spare box.
The case closed (literally)
I have checked the MC1 satellite box and made a bunch of changes. For now, the glitches coming from the satellite box is gone. I quickly tested the MC1 damping and the IMC locking. The IMC was locked as usual. I still have some cleaning up but will work on them today and tomorrow.
Attachment 1: Result
The noise level of the satellite box was tested with the suspension simulator (i.e., five pair of the LED and PD in a plastic box).
Each plot shows the ASD of the sensor outputs 1) before the modification, 2) after the change, and 3) with the satellite box disconnected (i.e., the noise from the PD whitening filter in the SUS rack).
Before the modification, these five signals showed significant (~0.9) correlation each other, indicating that the noise source is common. After the modification, the spectra are lowered down to the noise level of the whitening filters, and there is no correlation observed anymore. EXCEPT FOR the LR sensor: It seems that the LR has additional noise issue somewhere in the downstream. This is a separate issue.
Attachment 2: Photo of the satellite box before the modification
The thermal environment in the box is terrible. They are too hot to touch. You can see that the flat ribbon cable was burned. The amps, buffers, and regulators generate much heat.
Attachment 3: Where the board was modified
- (upper left corner) Every time I touched C51, the diode output went to zero. So C51 was replaced with WIMA 10uF (50V) cap.
- (lower left area) I found a clear indication of the glitch coming from the PD bias path (U3C). So I first replaced another 10uF (C50) with WIMA 10uF (50V). This did not change the glitch. So I replaced U3 (LT1125). This U3 had unused opamp which had railed to the supply voltage. Pins 14 and 15 of U3 were shorted to ground.
- (lower right corner) Similarly to U3, U6 also had two opamps which are railed due to no termination. U6 was replaced, and Pins 11, 12, 14, and 15 were shorted to ground.
- (middle right) During the course of the search, I suspected that the LR glitch comes from U5. So U5 was replaced to the new chip, but this had no effect.
Attachment 4: Thermal degradation of the internal ribbon cable
Because of the heat, the internal ribbon cable lost the flexibility. The cable is cracked and brittle. It now exposes some wires. This needs to be replaced. I'll work on this later this week.
Attachment 5: Thermal degradation of the board
Because of the excessive heat for those 20years, the bond between the board and the patten were degraded. In conjunction with extremely thin wire pattern, desoldering of the components (particularly LT1125s) was very difficult. I'd want to throw away this board right now if it were possible...
Attachment 6: Shorting the unused opamps
This shows how the pieces of wires were soldered to ground vias to short the unused opamps.
Attachment 7: Comparison of the noise level with the sus simulator and the actual MC1 motion
After the satellite box fix, the sensor outputs were measured with the suspension connected. This shows that the suspension is moving much more than the noise level around 1Hz. However, at the microseismic frequency there is also most no mergin. Considering the use of the adaptive feedforward, we need to lower the noise of the satellite box as well as the noise of the whitening filters.
=> Use better chips (no LT1125, no current buffers), use low noise resistors, better thermal environment.
Started the troubleshoot from the MC1 issue. Gautam showed me how to use the fake PD/LED pair to diagnose the satellite box without involving the suspension mechanics.
This revealed that the MC1 has frequent light level glitches which are common for five sensors. This feature does not exist in the test with the MC3 satellite box. I will open and check the MC1 satellite box to find the cause of this common glitches tomorrow. MC1 is currently shutdown and undamped.
BTW, at the MC3 test, i found that J2 of the satellite box (male Dsub) has all the pins too low (or too short?). I brought the box outside and found that the housing of this connector was half broken down. The connector was reassembled and the metal parts of the housing was bent again so that the housing can hold the connector body tightly.
The MC3 satellite box was restored and connected to the cables. As I touched this box, it is still under probation.