I've put the analog camera back and disconnected the 151 unit GigE. I ran out of time, though, and wasn't able to replace the beamsplitter. I've put all the equipment back where I took it from. The chopper and beam dump mount that Koji had got me for the scatterometer are kept outside, on the table in the control room I was working on earlier. The camera lenses, additional GigEs, wedge beamsplitter, 1050 nm LED and all related equipment are in the GigE box, which was put back into the CCD cameras' cabinet near the X arm.
Note: to clean stuff up, I entered the lab around 9:30 pm on Monday. This might have affected Yehonathan's loss measurement readings (around 57 readings had been recorded by then).
Sorry for the late update.
We run a loss measurement on the Y arm with 50 repetitions.
I want to collect some data with the arms locked to investigate the possibility/usefulness of implementing seismic feedforward for the arms (it is already known to help the IMC length and PRC angular stability at low frequencies). To facilitate diagnostics I modified the file /users/Templates/Seismic/Seismic_vs_TRXTRYandMC.xml to have the correct channel names in light of Lydia's channel name changes in 2016. Looking at the coherence data, the alignment between the Cartesian coordinate system of the seismometers at the ends and the global interferometer coordinate system can be improved.
I don't know whether, for the MISO filter design, there is any difference between using TRX/TRY as the target and using the arm length control signal.
Data collection started at 1249018179. I've set up a script running in a tmux session to turn off the LSC enable in 2 hours.
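For reference, the timer is essentially just a sleep followed by an EPICS write. A minimal sketch (assuming pyepics is available; the channel name below is a placeholder, not necessarily the actual LSC enable channel):

import time
from epics import caput

time.sleep(2 * 60 * 60)                  # wait 2 hours
caput('C1:LSC-ENABLE_PLACEHOLDER', 0)    # then switch the LSC enable off

Running it inside tmux just keeps it alive after the terminal is closed.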
I analyze the 100-repetition loss measurement of the Y arm using the AnalyzeLossData.ipynb notebook.
The mean of the measured loss is ~100 ppm and the variation between the repetitions is ~27%.
In the actual measurement, the system is repeatedly switched between the misaligned and locked states. I plot the misaligned and locked PD readings separately over time.
There seems to be a drift that is correlated between the two readings. This is probably a drift in the power after the MC2. To verify, I plot the ratio between those readings and find no apparent drift:
The variation in the ratio is less than 1%. The loss figure, which is essentially 1 minus this ratio multiplied by a large scale factor, shows a much worse variation. I plot the histogram of the loss figure for each repetition (excluding extremely bad measurements):
The mean is ~100 ppm and the variation is ~27%.
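For reference, a minimal sketch of how the per-repetition loss figure and its histogram can be computed from the two PD readings. This follows my reading of the loss expression quoted further down in this thread, up to the overall scale factor applied in the notebook; the T1/T2 values and file names are placeholders, not necessarily what AnalyzeLossData.ipynb uses:

import numpy as np
import matplotlib.pyplot as plt

T1, T2 = 1.4e-2, 14e-6                          # placeholder ITM / ETM power transmissions
P_locked = np.loadtxt('locked_PD.txt')          # per-repetition PD readings (placeholder files)
P_misaligned = np.loadtxt('misaligned_PD.txt')

ratio = P_locked / P_misaligned                 # the drift-immune quantity plotted above
loss_like = (1.0 - ratio + T1) - T2             # bracketed expression; the notebook rescales this to ppm

plt.hist(loss_like, bins=30)
plt.xlabel('Loss figure (before scaling)')
plt.show()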
We hypothesize that the systematic error in the loss measurement can come from the fact that the requirement on the alignment of the cavity mirrors is not stringent enough.
We repeat the loss measurement with 50 repetitions. This time we changed the thresholds for the dither-align error signals in measureArmLoss.py from 0.5 to 0.3.
We repeat the analysis done before:
We plot the reflected power of the two states on top of each other:
This time it appears there was no drift. The histogram of the loss measurement:
The mean is 104ppm and the variation is 27%.
What I notice this time is that the PD readings in the aligned and misaligned states are anti-correlated. This was also true in the previous run (where there was drift) when looking at short time scales. I plot several time series to demonstrate:
I wonder what can cause this behaviour.
The VEA laptop asia was configured to connect to too many WiFi networks - in its default position at the vertex it kept trying to hop between networks, for some reason preferring ones with poor signal strength. I deleted all entries from the known networks except 40MARS. Now the network connection seems much more stable and reliable.
We check for unexpected drifts in the PD reading (clipping and such). We put a pickoff mirror where the PD used to be and place the PD at the edge of the table such that the beam is focused on it (see attachment).
The arms are completely misaligned. We note the start time of the measurement to be 1249086917.
cdsutils is not working on rossa.
Importing cdsutils produces this error:
In : import cdsutils
OSError Traceback (most recent call last)
<ipython-input-2-949babce8459> in <module>()
----> 1 import cdsutils
/ligo/apps/linux-x86_64/cdsutils-480/lib/python2.7/site-packages/cdsutils/__init__.py in <module>()
---> 55 import awg
56 except ImportError:
/ligo/apps/linux-x86_64/cdsutils-480/lib/python2.7/site-packages/cdsutils/awg.py in <module>()
---> 32 import sys, numpy, awgbase
33 from time import sleep
34 from threading import Thread, Event, Lock
/ligo/apps/linux-x86_64/cdsutils-480/lib/python2.7/site-packages/cdsutils/awgbase.py in <module>()
17 libawg = CDLL('libawg.so')
18 libtestpoint = CDLL('libtestpoint.so')
---> 19 libSIStr = CDLL('libSIStr.so')
/ligo/apps/anaconda/lib/python2.7/ctypes/__init__.pyc in __init__(self, name, mode, handle, use_errno, use_last_error)
365 if handle is None:
--> 366 self._handle = _dlopen(self._name, mode)
368 self._handle = handle
OSError: libSIStr.so: cannot open shared object file: No such file or directory
ML2013 (Matlab R2013) is unable to open Simulink on any of the workstations. We decided to make Matlab R2015b the default version (it is the default for the RCG version we are using).
I commenced the migration procedure, starting by making a tagged commit of the currently running Simulink models. A local backup was also made, and we have the usual chiara-based backup, so I think we're in good shape.
Currently the branch and tag are protected - once we verify that everything works as expected post-migration, I will open it up. I changed the directory structure of the models; I need to confirm that the rtcds compilers don't have any hardcoded paths that may break because of this change.
The symlink to Matlab R2013 was deleted and a new symlink to R2015b was made. I activated the license using the Caltech campus license. Now running matlab from a shell starts up R2015b. Simulink even works 😲.
The requirement on the phase noise on the direct backscatter from the OMC back into the SRM is that it be less than @ 100 Hz, for a safety factor (arbitrarily chosen) of 10 (= 20 dB below unsqueezed vacuum). Assuming 5 optics between the OMC and SRM which contribute incoherently for a factor of sqrt(5), and assuming a total of 1 ppm of the LO power to be backscattered, we need the suspensions to be moving @ 100 Hz. This seems possible to realize with single stage suspensions - I assume we get f^4 filtering from the pendulum at 100 Hz, and that there is an additional 80 dB attenuation (from the stack) of the assumed 1 micron/rtHz motion at 100 Hz, for an overall 160 dB attenuation, yielding 10^-14 m/rtHz at 100 Hz.
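As a sanity check of the last step, the arithmetic is just the assumed input motion multiplied by the total attenuation (a quick sketch; only the 1 micron/rtHz input and the 160 dB total are taken from the argument above):

ground_motion = 1e-6                        # m/rtHz, assumed input motion at 100 Hz
total_attenuation_dB = 160                  # pendulum filtering + 80 dB from the stack
residual = ground_motion * 10 ** (-total_attenuation_dB / 20.0)
print('%.0e m/rtHz at 100 Hz' % residual)   # -> 1e-14 m/rtHz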
This is the same calculation as I had posted a couple of months ago (see elog that this is a reply to), except that Koji pointed out that the LO power is expected to dominate the (carrier) power incident on the OMC cavity(ies). So the more meaningful comparison to make is to have the x-axes of the plots denote the backscatter fraction, rather than the LO power. One subtlety is that because the phase of the scattered field is random, the displacement-noise induced phase noise could show up in the amplitude quadrature. I think that in these quadrature field amplitude units, the RIN and phase noise are directly comparable but I might have missed a factor of 2*pi. But in the worst case, if all the phase noise shows up in the amplitude quadrature, we end up being only ~10dB below unsqueezed vacuum (for 1 ppm backscatter).
For the requirement on the noise in the intensity quadrature - I think this is automatically satisfied because the RIN requirement on the incident LO field is in the mid 10^-9 1/rtHz regime.
I grab 2 hours of PD measurements in the misaligned state using dlData_simple.ipynb.
I get a pretty much normally distributed reading with no drifts (Attachments 1 and 2).
The error in the reading is ~ 0.5%.
I am pretty sure this amount of noise is enough to explain the large scatter in the loss figure measurement.
The reason is that the loss formula involves the expression (1 - P_Locked/P_Misaligned + T1) - T2, where T1 and T2 are the transmissions of the ITM and ETM.
The average of the ratio P_Locked/P_Misaligned is ~ 1.01 for a loss figure of ~ 100ppm.
The standard deviation of the ratio is ~ 1% which is also the standard deviation of the expression in the brackets.
The average of this expression, however, is ~0.01.
The reduction of the mean amplifies the error in the loss measurements by a factor of a few tens!
Couple IR light into fiber with good MM at EY
At ~1am PDT today, all the MC1 shadow sensor readbacks (fast CDS channels and Slow Acromag channels, latter not shown here) went to negative values. Of course a negative value makes no sense. After ~3 hours, they came back to positive values again. But since then, the shadow sensor RMS noise has been significantly higher in the >20 Hz band, and there are frequent glitches which kick the suspension. The IMC has been having trouble staying locked. I claim that this has to do with the Satellite box.
No action being taken now while I work on the ALS. In the past the problem has fixed itself.
We scoped out the 1Y3 rack this morning to figure out what needs to be done hardware-wise. We had not thought about how to power the Acromag crate - the LSC rack electronics are all powered by linear supplies rather than Sorensens, and the linear supplies are operating pretty close to their maximum current drive. The Acromag box draws ~3 A from the 20 V supply; I'm not sure what the current draw will be from the 15 V supply. Options:
I'm going with option #2 unless anyone has strong objections.
We want to verify that we can lock the interferometer with the ALS beat note generated by beating IR pickoffs (rather than the vertex green transmission). The hope is also to make the ALS system good enough that we can bring the CARM offset directly to 0 after the DRMI is locked with the arms held off resonance.
Attachment #1: Shows the layout. The realized MM is ~36%, c.f. the 85% predicted by a la mode. It is difficult to optimize much more given the tight layout, and the fact that these fast lenses require the beam to be well centered on them. They are reasonably well aligned, but I don't want to futz around with the pointing into the doubling crystal. Consequently, I don't have much control over the pointing.
Attachment #2: Shows pictures of the fiber tips at both ends before/after cleaning. The tips are now much cleaner.
The BeatMouth NF1611 DC monitor reports ~580 mV with only the EY light incident on it. This corresponds to ~60 uW of light making it to the photodiode, which is only 25% of what we send in. This is commensurate with the BS loss + mating sleeve losses.
To find the beat between the PSL and EY beams, I had to change the temperature control MEDM slider for the EY laser to -8355 cts (it was 225 cts). Need to check where this lies in the mode-hop scan by actually looking at the X-tal temperature on the front panel of the EY NPRO controller - we want to be at ~39.3 C on the EY X-tal, given the PSL X-tal temp of ~30.61 C. Just checked it; the front panel reports 39.2 C, so I think we're good.
EY enclosure was closed up and ETMY Oplev was re-enabled after my work. Some cleanup/stray beam dumping remains to be done, I will enlist Chub's help on Monday.
I borrowed an old-looking Variac variable transformer from the power supplies cabinet along the y-arm. It is currently in the TCS lab.
I bench tested the functionality of all the c1iscaux Acromag crate channels. Summary: we are not ready for a Monday install, much debugging remains.
I am leaving the crate powered (by bench supplies) in the office area so I have the option to work remotely on this.
With Chub's help, most of the problems have been resolved. Summary: I judge that we are good to go ahead with an install tomorrow.
Since we don't immediately need the CM board, I say we push ahead with the install - at least that will restore the ability to lock PRMI / DRMI. Then we can debug these issues in situ - I'm certain the issue is related to the EPICS/Modbus setup and not the hardware because I verified the physical channel map using the Acromag windows utility.
Any other ideas? The problem persists and it's annoying that the IMC cannot be locked.
> Looking through the manual, I found a recommendation (pg10) that the "IN-" terminal of the Acromag ADC units be tied to the "RTN" pins on the same units. I don't know if this preserves the differential receiving capability of the Acromag ADCs
I suppose we lose the differential capability of an input if the -IN is connected to some defined potential. We should check whether the channels still work as true differential inputs or not.
2. If the multi-bit operation is too complicated to solve, we can use EPICS Calc channels to break a value out into bits and send the individual bits in the same way as the other individual binary channels (see the sketch below).
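For illustration, the bit-breakout logic would be something like the following (sketched in Python with pyepics rather than as actual EPICS Calc records; the channel names are made up):

from epics import caget, caput

# Read a multi-bit value and fan it out to individual binary channels (hypothetical names).
value = int(caget('C1:XXX-MULTIBIT_VALUE'))
for bit in range(4):
    caput('C1:XXX-MULTIBIT_BIT%d' % bit, (value >> bit) & 1)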
This morning, I wanted to move the existing cables going to the P1 connectors of the iLIGO whitening boards to the P2 connectors, to test the modifications made to allow whitening stage switching. Unfortunately, I found that the shrouds weren't installed. Where can I find these?
As it turns out, only one extra shroud needed to be installed - I did this and migrated the cables for the 4 whitening boards from the P1 to P2 connectors. So until the new Acromag box is installed, we have no control over the whitening gains (slow channels), but we do still have control over the whitening filter enable/disable (controlled by fast BIO). I am thinking about the easiest way to test the latter - I think the ambient PD dark noise level is too low to be seen above ADC noise even with the whitening enabled, and setting up drive signals to individual channels is too painful - maybe with +45 dB of whitening gain, the (z,p) whitening filter shape can be seen with just PD/demod chain electronics noise.
I came across an interesting suggestion by Yutaro that KAGRA's low-frequency ALS noise could be limited by the fact that the IMC comes between the point where the frequencies of the PSL and AUX lasers are sensed (i.e. the ALS beat note), and the point where we want them to be equal (i.e. the input of the arm cavity). I wanted to see if the same effect could be at play in the 40m ALS system. A first estimate suggests to me that the numbers are definitely in the ballpark. If this is true, we may benefit from lower noise ALS by picking off the PSL beam for the ALS beat note after the IMC.
Even though the KAGRA phase lock scheme is different from the 40m scheme, the algebra holds. I needed an estimate of how much the arm cavity moves, I used data from a POX lock to estimate this. The estimate is probably not very accurate (since the arm cavity length is more stable than the IMC length, and the measured ALS noise, e.g. this elog, is actually better than what this calculation would have me believe), but should be the right order of magnitude. From this crude estimate, it does look like for f<10 Hz, this effect could be significant. I assumed an IMC pole of 3.8 kHz for this calculation.
I've indicated a "target" ALS performance where the ALS noise would be less than the CARM linewidth, which would hopefully make the locking much easier. Seems like realizing this target will be touch-and-go. But if we can implement length feedforward control for the arm cavities using seismometers, the low frequency motion of the optics should go down. It would be interesting to see if the ALS noise gets better at low frequencies with length feedforward engaged.
* Some updates were made to the plot:
What about just using high-gain feedback to MC2 below 20 Hz for the IMC lock? That would reduce the excess noise if this theory is correct.
Installation: The following equipment was installed in 1Y3, see Attachment #1:
Removal: The following equipment was removed from 1Y3:
I judge that we are good to go ahead with an install tomorrow.
Work done today:
Testing of functionality:
Much testing remains to be done, but I defer further testing till Monday - the main functionality to be verified in the short run is the whitening gain stepping. The strain relief of cables and general cleanup will be undertaken by Chub. The current state of affairs is shown in Attachment #3; it leaves much to be desired in terms of cleanliness.
I will also set up the autoburt for the new machine on Monday. We will also need to add some channels to C0EDCU.ini if we want to trend them over some years (e.g. RF signal powers for monitoring ERA-5 health).
* c1lsc FE was rebooted using the usual script, and everything seems to be healthy in CDS-land again, see Attachment #4.
Here is what is left to do:
Today I set up the autoburt.req file for the c1iscaux channels, and confirmed that the snapshots are getting recorded. There were a lot of channels in the old autoburt.req file which I thought were unnecessary (and several which no longer exist), so now the only channels that are burt-ed are the whitening gains and the states of the AA filters. If someone feels we need more channels to be snapshot-recorded, you can add them to the file.
In the old target directory, there were also various versions of a "saverestore.req" file - why do we need this in addition to an autoburt? I guess it is possible they are used by the IFOconfigure scripts to set up some whitening gains etc...
Started the troubleshoot from the MC1 issue. Gautam showed me how to use the fake PD/LED pair to diagnose the satellite box without involving the suspension mechanics.
This revealed that MC1 has frequent light-level glitches which are common to all five sensors. This feature did not appear in the test with the MC3 satellite box. I will open and check the MC1 satellite box tomorrow to find the cause of these common glitches. MC1 is currently shut down and undamped.
BTW, during the MC3 test, I found that J2 of the satellite box (male D-sub) has all the pins sitting too low (or too short?). I brought the box out and found that the housing of this connector had half broken apart. The connector was reassembled and the metal parts of the housing were bent back so that the housing holds the connector body tightly.
The MC3 satellite box was restored and connected to the cables. Since I touched this box, it is still on probation.
I have checked the MC1 satellite box and made a bunch of changes. For now, the glitches coming from the satellite box are gone. I quickly tested the MC1 damping and the IMC locking. The IMC locked as usual. I still have some cleaning up to do, and will work on it today and tomorrow.
Attachment 1: Result
The noise level of the satellite box was tested with the suspension simulator (i.e., five LED/PD pairs in a plastic box).
Each plot shows the ASD of the sensor outputs 1) before the modification, 2) after the change, and 3) with the satellite box disconnected (i.e., the noise from the PD whitening filter in the SUS rack).
Before the modification, these five signals showed significant (~0.9) correlation with each other, indicating that the noise source is common. After the modification, the spectra are lowered to the noise level of the whitening filters, and there is no correlation observed anymore - EXCEPT FOR the LR sensor: it seems the LR has an additional noise issue somewhere downstream. This is a separate issue.
Attachment 2: Photo of the satellite box before the modification
The thermal environment in the box is terrible. The components are too hot to touch. You can see that the flat ribbon cable was burned. The amps, buffers, and regulators generate a lot of heat.
Attachment 3: Where the board was modified
- (upper left corner) Every time I touched C51, the diode output went to zero. So C51 was replaced with WIMA 10uF (50V) cap.
- (lower left area) I found a clear indication of the glitch coming from the PD bias path (U3C). So I first replaced another 10uF (C50) with a WIMA 10uF (50V). This did not change the glitch. So I replaced U3 (LT1125). This U3 had an unused opamp that had railed to the supply voltage. Pins 14 and 15 of U3 were shorted to ground.
- (lower right corner) Similarly to U3, U6 also had two opamps that were railed due to no termination. U6 was replaced, and pins 11, 12, 14, and 15 were shorted to ground.
- (middle right) During the course of the search, I suspected that the LR glitch comes from U5. So U5 was replaced with a new chip, but this had no effect.
Attachment 4: Thermal degradation of the internal ribbon cable
Because of the heat, the internal ribbon cable lost its flexibility. The cable is cracked and brittle, and now exposes some wires. It needs to be replaced. I'll work on this later this week.
Attachment 5: Thermal degradation of the board
Because of the excessive heat over those 20 years, the bond between the board and the traces has degraded. In conjunction with the extremely thin traces, desoldering the components (particularly the LT1125s) was very difficult. I'd throw away this board right now if it were possible...
Attachment 6: Shorting the unused opamps
This shows how pieces of wire were soldered to ground vias to short the unused opamps.
Attachment 7: Comparison of the noise level with the sus simulator and the actual MC1 motion
After the satellite box fix, the sensor outputs were measured with the suspension connected. This shows that the suspension is moving much more than the noise level around 1 Hz. However, at the microseism frequencies there is almost no margin. Considering the planned use of adaptive feedforward, we need to lower the noise of the satellite box as well as the noise of the whitening filters.
=> Use better chips (no LT1125, no current buffers), use low noise resistors, better thermal environment.
The internal ribbon cable for the MC1 satellite box was replaced with the one in the spare box. The MC1 box was closed and reinstalled as before. The IMC is locking well.
The burnt cable assembly was then taken apart and rebuilt with a new ribbon cable. It is now in the spare box.
The case closed (literally)
I did some more investigation of what the appropriate cavity geometry would be for the OMC. Unsurprisingly, depending on the incident mode content, the preferred operating point changes. So how do we choose what the "correct" model is? Is it accurate to model the output beam HOM content from NPROs (is this purely determined by the geometry of the lasing cavity?), which we can then propagate through the PMC, IMC, and CARM cavities? This modeling will be written up in the design document shortly.
*Colorbar label errata - instead of 1 W on BS, it should read 1 W on PRM. The heatmaps take a while to generate, so I'll fix that in a bit.
Update 230pm PDT: I realize there are some problems with these plots. The critically coupled f2 sideband getting transmitted through the T=10% SRM should have significantly more power than the transmission through a T=100ppm optic. For similar modulation depth (which we have), I think it is indeed true that there will be x1000 more f2 power than f1 power for both the IFO AS beam and the LO pickoff through the PRC. But if the LO is picked off elsewhere, we have to do the numbers again.
Attachment #1: Two candidate models. The first follows the power law assumption of G1201111, while in the second, I preserved the same scaling, but for the f1 sideband, I set the DC level by assuming a PRG of 45, modulation depth of 0.18, and 100 ppm pickoff from the PRC such that we get 50 mW of carrier light (to act as a local oscillator) for 10 W incident on the back of PRM. Is this a reasonable assumption?
Attachment #2: Heatmaps of the OMC transmission, assuming (i) 0 contrast defect light in the carrier TEM00 mode, (ii) PRG=45 and (iii) 1 W incident on the back of PRM. The color bar limits are preserved for both plots, so the "dark" areas of the plot, which indicate candidate operating points, are darker in the left-hand plot. Obviously, when there is more f1 power incident on the OMC, more of it is transmitted. But my point is that the "best operating point(s)" in both plots are different.
Why is this model refinement necessary? In the aLIGO OMC design, an assumption was made that the light level of the f1 sideband is 1/1000th that of the f2 sideband in the interferometer AS beam. This is justified as the RC lengths are chosen such that the f2 sideband is critically coupled to the AS port, but the f1 is not (it is not quite anti-resonant either). For the BHD application, this assumption is no longer true, as long as the LO beam is picked off after the RF sidebands are applied. There will be significant f1 content as well, and so the mode content of the f1 field is critical in determining the OMC filtering performance.
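To make the f1-vs-f2 filtering argument concrete, here is a minimal Fabry-Perot transmission sketch (not the actual OMC design code; the FSR and finesse are placeholders, and the sideband frequencies are only approximately the 40m values):

import numpy as np

def fp_transmission(f_offset, fsr, finesse):
    # Power transmission of an idealized lossless two-mirror cavity vs. offset from resonance.
    return 1.0 / (1.0 + (2.0 * finesse / np.pi) ** 2 * np.sin(np.pi * f_offset / fsr) ** 2)

fsr = 1.0e9       # Hz, placeholder OMC free spectral range
finesse = 400.0   # placeholder finesse
for f in (11.066e6, 55.33e6):   # approximately the 40m f1 and f2 modulation frequencies
    print('%.2f MHz sideband transmission: %.1e' % (f / 1e6, fp_transmission(f, fsr, finesse)))

The numbers simply illustrate that the rejection depends on where each sideband falls relative to the cavity resonances, which is what the operating-point heatmaps above explore for the real OMC geometry.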
I added the list of new c1iscaux channels to /opt/rtcds/caltech/c1/chans/daq/C0EDCU.ini and restarted the framebuilder. Koji had thought some of these channels might have previously existed under slightly different names. However, after looking through C0EDCU.ini and the other _SLOW.ini files, I did not find any candidates for removal. As far as I can tell, all of these channels are being recorded for the first time.
Following the death of rossa, which was hosting the only working environment for the GigE camera software, I've set up a new dedicated rackmount camera server: c1cam (details here). The Python server script is now configured as a persistent systemd service, which automatically starts on boot and respawns after a crash. The server depends on a set of EPICS channels being available to control the camera settings, so c1cam is also running a softIOC service hosting these channels. At the moment only the ETMX camera is set up, but we can now easily add more cameras.
Instructions for connecting to a live video feed are posted here. Any machine on the martian network can stream the feed(s). The only requirement is that the client machine have GStreamer 0.10 installed (all the control room workstations satisfy this).
As much as possible, the code and dependencies are hosted on the /cvs/cds network drive instead of installed locally. The client/server code and the Pylon5, PyPylon, and PyEpics dependencies are all installed at /cvs/cds/rtcds/caltech/c1/scripts/GigE. The configuration files for the soft IOC are located at /cvs/cds/caltech/target/c1cam.
The 40m GigE camera code is a slightly-updated version of the 10+ year-old camera code in use at the sites. Consequently every one of its dependencies is now deprecated. Ultimately, we'd like to upgrade to the following:
This is a long-term project, however, as many of these APIs are very different between Python 2 and 3.
There were a bunch of useless / degenerate channels added - e.g. whitening gains which are already burt-snapshotted. Maybe there are many more useless channels being trended, but there is no need to add more.
Copy-pasting wasn't done correctly - the first 4 added channels were duplicates. There are in fact 5 LO power mons, one for each of the frequencies 11, 33, 55, 110 and 165 MHz.
I cleaned up. Basically only the detect-mon channels and the ALS channels are new in the setup now. I will review later whether any extra channels are required. While checking that the daqd is happy, I noticed that the c1lsc FEs are in their stuck state, see Attachment #1. I guess a cable was bumped when the strain relief operation was underway. I'm not attempting a remote resuscitation.
While setting up to take some transfer functions of the MC WFS loop, we found the LSC front end was down. When we tried to restart the FE using 'rtcds restart --all', c1lsc crashed and froze. We manually reset c1lsc, then laboriously determined the correct order of machines to reboot. Here's what works best:
rtcds start c1x04 c1lsc c1ass c1oaf c1cal c1daf
Starting c1dnn crashes the other FE
rtcds restart --all
rtcds restart c1rfm c1sus c1mcs
restarting c1pem crashes the other FE on c1sus
We're seeing a lot of red IPC indicators--perhaps it's an issue with the order we're restarting?
via Polish chat, GV tells us to RTFE
As suggested, I ran the script cds/rebootC1LSC.sh
I got a timeout error when the script tried closing the PSL shutter ('C1:AUX-PSL_ShutterRqst' not found), but Rana and I closed the shutter before leaving last night. c1sus is down, so the script found no route to host c1sus; I'm thinking I need to reset c1sus for the script to run completely. Nonetheless, c1lsc was rebooted, which crashed c1ioo and left the c1lsc FE all red (probably because c1sus wasn't restarted).
I reset c1lsc, c1sus, and c1ioo.
I noticed that the script gives the command 'ssh c1XXX', but we have been getting 'no route to host' using this command. Instead, the machines are currently only reachable as c1XXX.martian. I'm not sure why this is, so I just appended .martian to the hostnames in rebootC1LSC.sh.
This time, the script does run. I did get 'no route to host' on c1ioo, so I think I need to reset that machine again. After the reset, the script failed to log in to c1ioo and c1lsc.
Fri Sep 6 13:09:05 2019
After lunch, I reset the computers again and tried the script again. There is again no route to host for c1ioo. I'm going inside to shut off the power to c1ioo, since the reset button seems not to be working. I still can't log in from nodus, so I'm bringing a keyboard and monitor over to plug in directly.
On reset, c1ioo repeatedly reaches the screen in Attachment 1 before going black. Holding down shift or ctrl+alt+f1 doesn't get me a command prompt. After waiting and searching the elog for well over 3 minutes, we decided to follow these instructions to cycle the power of c1ioo. The same problem recurred following power-up. I found some instructions online saying that the SunSystems 4600 can hang during reboot if it has become too hot ("reboot during a thermal shutdown"); I did notice that the temperature light was on earlier in this procedure, so perhaps that is the problem. I followed the wiki instructions to shut down the computer again (pressed the power button, unplugged the 4 power supplies from the back of the machine), and left it unplugged for 10-30 min (Fri Sep 6 14:46:18 2019).
Fri Sep 6 15:03:31 2019
Rana plugged in the power supplies and reset the machine again.
Fri Sep 6 16:30:37 2019
c1ioo is still unreachable! I pressed reset once, and the reset button flashes white. The yellow warning light is still on.
Fri Sep 6 16:54:21 2019
The reset light has stopped flashing, but I still can't access c1ioo. I reset once more, this time watching c1ioo on a monitor directly. I'm still seeing the same boot screen repeatedly. I do see that CPU0 is not clocking, which seems weird.
Following gautam's elog here, I found the Sun Fire X4600 manual for locating faulty CPUs. After the white reset light stopped flashing, I held down the power button to turn off the system. Before shutdown, all of the CPUs displayed amber lights; after shutdown, only the leftmost CPU (as viewed from the back, presumably CPU0) displays an amber light. The manual says this is evidence that the CPU or DIMM is faulty. Following the manual, I removed the standby power, then used these instructions for replacing the CPU to remove it; Gautam has also done this before.
Fri Sep 6 20:09:01 2019
I pulled the leftmost CPU module out, following the instructions above. The CPU module matches the physical layout and part number of the Sun Fire X4600 M2 8-DIMM CPU module; pressing the fault reminder light gives amber indicators at the DIMM ejectors, indicating faulty DIMMs (see). The other indicator LEDs did not illuminate.
I located several spare DIMMs in the digital cabinet along Y arm (and a couple with misc computer components in the control room), but didn't find the correct one for this CPU module. The DIMM is Sun PN 371-1764-01; I found it online and ordered eight. Please let me know if this is incorrect.
To protect the CPU module, I've put it in an ESD safe bag with some bubble wrap and a note. It's on the E shop bench.
Assuming you are on pianosa, /etc/resolv.conf currently looks like:
# Generated by NetworkManager
But it should instead be as indicated in https://nodus.ligo.caltech.edu:8081/40m/14767.
I made this change for now, but it might get overridden by NetworkManager.
There was an alarm sound from the Smart-UPS 2200 sitting under the workstation. I see that the 'replace battery' light is red, and this elog tells me that these batteries are replaced every ~1-4 years; the last replacement was march 2016. Holding down the 'test' button for 2-3 seconds results in the alarm sound and does not clear the replace battery indicator.
Please, no one touch the UPS: last time it destroyed ROSSA. Please ask Chub to order the replacement batteries so we can do this in a controlled way (fully shutting down ALL workstations first). Last time we wasted 8 hours rebuilding ROSSA.
Q1: Can we run the machine with a reduced number of cores?
Q2: We might be able to order them quickly. What's the spec and configuration of the DIMMs (e.g. DDR2-667 MHz ECC 4GB x4, or even more specific, like Samsung 2GB DDR2 RAM PC2-6400 240-Pin DIMM M378T5663EH3), so that we can identify the exact part?
Q3: Can we scavenge the old OMC RT machine or even megatron to extract their memory?
Saw these slightly delayed.
Q1: Not sure - is it safe for me to remove the DIMM from CPU0, reinstall CPU0 (with no DIMM), and boot up to try this?
Q2: Specifically, it's this DIMM. The CPU core is compatible with DDR2, clock rate up to 333 MHz (DDR2-667) and 1, 2, or 4 GB of memory.
One pair of DIMM cards from the Sunstone box had the same Sun part number as those in c1ioo, so I swapped them in and reinstalled c1ioo's CPU0. c1ioo now boots up and seems ready to go, and I'm able to log in from nodus. I also reinstalled optimus' CPU0, and optimus boots up with no problems.
While opening up optimus, I noticed a box labelled 'SUNSTONE' sitting below the rack - it contains two CPU modules of a similar type to those in c1ioo! I'm going to try swapping in the DIMM cards from this SUNSTONE box; I didn't find any elogs about sunstone - where are these modules from?
I reset c1lsc and c1sus, then ran rebootC1LSC.sh as before. All models started by the script are running with minimal red lights; c1oaf, c1cal, c1dnn, c1daf, and c1omc are not started by the script. I manually started these in the order c1cal->c1oaf->c1daf->c1dnn. Starting c1dnn crashed the other FE on c1ioo, so I reset all three FEs again and ran the script again (this time including the startup for c1cal, c1oaf, and c1daf, but excluding c1dnn).
Except for c1dnn and c1omc, all models are started. The status lights are attached.
[rika, aaron, rana]
We are getting the MC locked in anticipation of making some WFS transfer function measurements.
The PSL screen was all white boxes, so I keyed the PSL crate and burt restored the settings from 11:19am Sep 5 (somewhat earlier than we started rebooting computers). Following this, I ran Milind's unstick.py and then the PSL autolocker script; both worked on the first go, great work Milind!
The modecleaner autolocking script is having substantially more trouble. Rana found that the pitch and yaw sliders for all MC optics have been swapped - we think it's because the camera at MC2 has been rotated. Note that for now, sliding pitch gives a change in yaw, and sliding yaw changes pitch.
We noticed that with the WFS servo on, the modecleaner would be well aligned for a while (MC trans ~ 14000), only to lose lock after several minutes. We held the MC2_TRANS_PIT/YAW outputs at 0, so the MC2 QPD does not affect the WFS loop; the beam is well centered on WFS1/2, but not on the MC2 QPD, and with this signal out of the loop MC TRANS recovers to ~15000 counts (consistent with the quiet times over the last 90 days, see attachment 2). Attachment 1 shows the MC lock degrading, followed by some noise where we lost lock, and finally a visible increase in MC trans when we remove the MC2 QPD from the WFS loop.
MC1 Pitch 4.4762 Yaw 4.4669
MC2 Pitch 3.7652 Yaw -1.5482
MC3 Pitch -0.4159 Yaw 1.1477
After the MC locked automatically, we stopped the autolocker and adjusted the alignment to center the beam on the QPD.
We then ran the MC autolocker again. Finally, Rana moved to the best alignment.
MC1 Pitch 4.4942 Yaw 4.6956
MC2 Pitch 3.7652 Yaw -1.5600
MC3 Pitch -0.3789 Yaw 1.1477
We used diaggui to measure the response of WFS1/WFS2/MC2 pitch (yaw) to excitations in MC1/MC2/MC3 pitch (yaw). Seeing fluctuations of amplitude ~1 on the MCX_PIT/YAW_OUT channels, we used an amplitude 0.01 excitation at 20 Hz. We will work on scripting some of this tomorrow.
Still removing old cable, terminal blocks and hardware. Once new strain reliefs and cable guides are in place, I will need to disconnect cables and reroute them. Please let me know dates and times when that is not going to interrupt your work!
I picked up a unit of D1900068 SR785 accessory box from Dean's office at Downs.
We should also have a plan for the next couple weeks so we are organized; heavily adapted from. Here's what I'm thinking this morning:
It would be good to script some of what we did yesterday. I'm checking out some scripts I'd used for Qryo and armloss measurements to remember the best way to do this.
I noticed yesterday that the PSL_shutterqst box is white, and I've seen timeouts when e.g. the reboot script tries to open/close the PSL shutter. It seems like a shutter we should be able to actuate, so I should find the aux machine serving this channel and restart it.
We identified the Jenne laser and found a long optical fiber that might be able to transport our beam to the AP table.
Now we're searching for documentation on using this laser. Kevin and John measured a TF last year. Koji advised that we needn't worry too much; the current limit is already set correctly and we need only power on the laser.
We moved the breadboard (including a couple of PDs, collimating lenses, the laser, steering mirrors, etc.) over to the AP table and set it on top of the panel next to the WFS. We mounted the laser on the AP table and added one lens with f ~ 68 mm after the laser to fit the beam on a single quadrant; the beam was about 1 mm in diameter (measured by eye) when it entered the QPD. We turned the laser driver on at ~19.4 mA and directed the beam onto WFS2 via the last two steering mirrors.
We monitored the QPD segments' DC level with ndscope on a laptop, and were able to send the beam to each of the four quadrants in turn. We set up the Agilent network analyzer to drive the laser's amplitude modulation and sent the RF signal from the LEMO output on the QPD head directly to the network analyzer. We will take the measurements tomorrow morning.
Chub wanted to get the correct part number for the replacement UPS batteries, which necessitated opening up the UPS. To be cautious, all the workstations were shut down at ~9:30 am while the unit was pulled out and inspected. While looking at the UPS, we found that the insulation on the main power cord is damaged at both ends. Chub will post photos.
However, despite these precautions, rossa reports some error on boot-up (not the same xdisp junk as before). pianosa and donatella came back up just fine. rossa is remotely accessible (ssh-able), though, so maybe we can recover it...
> Please, no one touch the UPS: last time it destroyed ROSSA. Please ask Chub to order the replacement batteries so we can do this in a controlled way (fully shutting down ALL workstations first). Last time we wasted 8 hours rebuilding ROSSA.