About 30mins ago, I saw another glitch on MC1 - this happened while the Watchdog was shutdown.
In order to further narrow down the cause of the glitch, we switched the Coil Driver Board --> Satellite box DB(15?) connectors on the coil drivers between MC1 and MC3 coil driver boards. I also changed the static PIT/YAW bias voltages to MC1 and MC3 such that MC-REFL is now approximately back to the center of the CCD monitor.
To clarify, I logged into fb1, and ran sudo systemctl restart daqd_*. The only change I made was to uncomment the line quoted below in the master file.
sudo systemctl restart daqd_*.
Looking at the log using systemctl, I see the following (I just tried restarting the daqd processes again):
Aug 11 00:00:31 fb1 daqd_fw: LDASUnexpected::unexpected: Caught unexpected exception "
This is a bug. Please log an LDAS problem report including this message
Aug 11 00:00:31 fb1 daqd_fw: daqd_fw: LDASUnexpected.cc:131: static void LDASTools::Error::LDASUnexpected::unexpected(): Assertion `false' failed.
Aug 11 00:00:32 fb1 systemd: daqd_fw.service: main process exited, code=killed, status=6/ABRT
Aug 11 00:00:32 fb1 systemd: Unit daqd_fw.service entered failed state.
Aug 11 00:00:32 fb1 systemd: daqd_fw.service holdoff time over, scheduling restart.
Aug 11 00:00:32 fb1 systemd: Stopping Advanced LIGO RTS daqd frame writer...
Aug 11 00:00:32 fb1 systemd: Starting Advanced LIGO RTS daqd frame writer...
Aug 11 00:00:32 fb1 systemd: daqd_fw.service start request repeated too quickly, refusing to start.
Aug 11 00:00:32 fb1 systemd: Failed to start Advanced LIGO RTS daqd frame writer.
Aug 11 00:00:32 fb1 systemd: Unit daqd_fw.service entered failed state.
The live data grabbing using cdsutils still seems to be working though - so I've kicked MC1 again, and am grabbing 2 hours of data live on Pianosa.
So we tried this again with a fresh build of daqd_fw, and it still fails. The error message is pointing to an underlying bug in the framecpp library ("LDASTools"), which may be tricky to solve. I'm rustling the appropriate bushes...
So it appears we now have full frames and second, minute, and minute_raw trends.
We are still not able to raise test points with daqd_rcv (e.g. the NDS1 server), which is why dataviewer and nds2-client can't get test points on their own.
We were not able to add the EDCU (EPICS client) channels without daqd_fw crashing.
We have a new kernel image that's supposed to solve the module unload instability issue. In order to try it we'll need to restart the entire system, though, so I'll do that on Monday morning.
I've got the CDS guys investigating the test point and EDCU issues, but we won't get any action on that until next week.
Remaining unresolved issues:
The coil driver electronics for MC1, upstream of the Satellite box, was what was previously MC3 electronics.
Attachment #1 shows that there were no glitches in MC3 sensor channels (which are now physically connected to what was previously MC1 coil driver electronics).
Attachment #2 shows the second trends for a 12 hour period for MC1 and MC3 sensor channels. The MC3 channels look well behaved, but there are frequent glitches (at least 9 in the last 12 hours ) visible in the MC1 channels.
So to recap:
I need to confirm that the output of the coil driver board goes straight to the Sat. Box, but if there are no intermediate elements, the problem is either in the cable from coil driver to sat. box, or downstream of the Satellite box - i.e. vacuum feedthroughs or the suspension itself? The size of the glitches is roughly the same in all 4 face channels (~60-80cts pp).
GV addendum 14 Aug 2017, 10.30am: Attachment #3 shows the second trend for the MC sensor channels over the weekend. While there were many on Saturday, it seems that Sunday was quieter.
To add to Gautam's entry: we swapped the cables at the coil driver side (these are the ones that go from coil driver to sat box). In this state, damping is not useable since the MC1 servos would drive MC3.
~70 counts in the sensor means ~70 microns of motion. Since the watchdogs are off and the coil drivers are swapped, this can't be laser beam getting in to the sensors.
WE have to consider that these are some real strain release type events happening in the suspension wire or wire standoff, so may require a vent to inspect and possible repair MC1.
We used to have similar suspension excursion at ETMX. This was the motivation to replace the stand-offs from Al ones to ruby ones. Did the replacement solve the issue at ETMX?
Decided to try adding in an OP amp just to see if it would work. Added LT1012 and a 100k resistor to the circuit (I originally wanted to do AD743 as it seems to be the best choice according to Zach's elog here, but it said that they are very precious so I went with LT1012 for testing purposes). When heating it with a heating gun, the output voltage went down by a few 0.01V. The maximum voltage was 0.686V. Similar thing happened when I switched to a 10k resistor, where the maximum was 0.705V and it also went down by a few 0.01V upon heating.
I've attached a few pictures showing the circuit.
I didn't realize that the LT1012 needed an additional input to function. I added in +15V and -15V to pins 7 and 4, respectively and placed a 10k resistor and the numbers make more sense now. The voltage showed a negative value, but it became more negative as I heated it up (it's negative due to how a transimpedance amplifier works).
I have attached the new setup and the value it shows (~-3V). It became more negative by about 0.4V, which translates to about a 40K increase in temperature, which makes sense.
In addition, I have attached an updated sketch of the circuit. I will need to do more testing to determine how accurate this is. The next step would be to calculate how much noise there is currently and figure out how to remove this circuit from the breadboard and use a PCB or something like that for final testing in an insulated container.
The reason I chose AD743 initially for the OP amp is because at low frequencies (which is what we are working with), a FET amp such as AD743 will have a low current noise at high impedance, which is what we have in this case. While a FET amp has high voltage noise compared to other OP amps, the current noise becomes more important at high impedance, so it will work better. According to Zach's graphs, the AD743 is best at high impedances, followed by LT1012.
Today, I borrowed the fiber microscope from Johannes and took a look at the fibers coupled to the PDs. The PD labelled "BEAT PD AUX Y" has an end that seems scratched (Attachments #1 and #2). The scratch seems to be on (or at least very close to) the core. The other PD (Attachments #3 and #4) doesn't look very clean either, but at least the area near the core seems undamaged. The two attachments for each PD corresponds to the two available lighting settings on the fiber microscope.
I have not attempted to clean them yet, though I have also borrowed the cleaning supplies to facilitate this from Johannes. I also plan to inspect the ends of all other fiber connections before re-installing them.
Last week, we were talking about reviving the Fiber ALS box. Right now, it's not in great shape. Some changes to be made:
Previous elog thread about work done on this box: elog11650
I'm upgrading the linux kernel for all the front ends to one that is supposedly more stable and won't freeze when we unload RTS models (linux-image-3.2.88-csp). Since it's a different kernel version it requires rebuilds of all kernel-related support stuff (mbuf, symmetricom, mx, open-mx, dolphin) and all the front end models. All the support stuff has been upgraded, but we're now waiting on the front end rebuilds, which takes a while.
Initial testing indicates that the kernel is more stable; we're mostly able to unload/reload RTS modules without the kernel freezing. However, the c1iscey host seems to be oddly problematic and has frozen twice so far on module unloads. None of the other hosts have frozen on unload (yet), though, so still not clear.
We're now seeing some timing errors between the front ends and daqd, resulting in a "0x4000" status message in the 'C1:DAQ-DC0_*_STATUS' channels. Part of the problem was an issue with the IRIG-B/GPS receiver timing unit, which I'll log in a separate post. Another part of the problem was a bug in the symmetricom driver, which has been resolved. That wasn't the whole problem, though, since we're still seeing timing errors. Working with Jonathan to resolve.
I don't think we can say for sure. I was just talking to EricQ about this, he said the glitches were often seen when changing the alignment offsets when aligning the arm. I am pretty sure I have seen the ETMX alignment change abruptly since the Ruby Standoff replacement (the Oplev spot just slides across the MEDM display rapidly), but I can't find an elog where I've put in details. I also haven't done a whole lot of work with the arm cavities where I would have noticed this problem. There is this test that Eric did, and it didn't throw up any red flags. But the suspension can be well behaved for weeks at a time before this problem pops up again.
There was also the flaky power connection to the timing card on the ETMX expansion chassis which was fixed only recently, after which there has been no systematic investigation of the status of ETMX.
If it is true that these events are caused by strain building up in the suspension wire, I wonder how we can take systematic steps to avoid it. From what I remember of the SOS assembly procedure, the (unglued) standoff is slid along the optic with the wire under slight tension until the wire slips into the groove on the standoff. Then the tension in the wire is adjusted till the optic is pitch balanced and at the desired height. But it is easy to imagine imprinting some torsional stresses in the (40 um?) wire during this process of looping it around under the optic and placing it in the groove. But perhaps this mechanism makes a negligible contribution to the effect we are seeing, and some other mechanism is responsible in this case.
Today we saw a weird issue with the GPS receiver (EndRun Technologies Tempus LX). GPS timing on fb1 (which is handled via IRIG-B connection to the receiver with a spectracom card) was off by +18 seconds. We tried resetting the GPS receiver and it still came up with +18 second offset. To be clear, the GPS receiver unit itself was showing a time on it's front panel that looked close enough to 24-hour UTC, but was off by +18s. The time also said "GPS" vertically to the right of the time.
We started exploring the settings on the GPS receiver and found this menu item:
Clock -> "Time Mode" -> "UTC"/"GPS"/"Local"
The setting when we found it was "GPS", which seems logical enough. However, when we switched it to "UTC" the time as shown on the front panel was correct, now with "UTC" vertically to the right of the time, and fb1 was then showing the correct GPS time.
From the manual:
The fact that moving to "UTC" fixed the problem, even though that is supposed to remove the leap second correction, might indicate that there's another bug in the symmetricom driver...
BL-FS300C-PH-LE was replaced after 2,904 hrs It did not explode this time. The 4 monts life period is actually pritty good because this is a $73. cheap bulb. The best-high priced warranty is 5 months.
PS: future option_bulbless laser projector
HITACHI LP-WU3500 PROJECTOR $2,549.00
DLP, WUXGA, 3500 LUMENS, HDMI, 20K HR LASER, 5YR WARRANTY, 1.36-2.34:1 LENS, 24/7 DUTY CYCLE
45x72 IMAGE FROM 8' TO 13'10" LENS TO SCREEN, AND AT 10' APPROX. 154FL OF BRIGHTNESS ON A 1.0 GAIN SCREEN
This would give you everything you are requesting, plus a lamp-less design and 5yr warranty. Ground shipping would be free anywhere in the lower 48, and we would not charge sales tax on orders billing/shipping outside of AZ. If you have any questions or if you would like to order... just let me know!
Tested to make sure that even when only the AD586 was heated that there was no change in the reading. I did so by placing the AD586 away from the rest of the circuit and blowing hot air only on it. There was, in fact, no change.
From Keith Thorne:
Soooo, "UTC" is the correct mode for the GPS receiver.
[johannes, gautam, jamie]
The last entry I found relating to ref cavity was 2011 Aug 19
Tried taking the circuit from the breadboard to the PCB. I attached all the components to adapters that would allow them to be connected to the PCB. From the first picture, the first component is AD586, the second is AD590, and the third is LT1012, along with a resistor across it. I then soldered the connections between the components, as can be seen in the second picture. When I tested out this version of the circuit by hooking it up to the DC source, I got a reading of ~-15V. I will have to check all the connections to make sure there is contact where there should be one, and no contact where there shouldn't be. I had issues attaching the tiny AD590 and LT1012 to its adaptor, so the issue may lie there as well. I'll also check that each component is in working order as well.
Once I figure out where my error is, my plan is to build two more of these and place a metal object such that it contacts only the surface of the AD590s. This would allow me to compare the three values to the actual temperature of the metal, which would then tell me how accurate this setup is.
Note on the resistor: I measured all the resistors and chose three that had exactly 10.00k Ohm. The voltage detected is dependent on the resistor, so if we are to take three identical copies, I ensured that there would be no error due to the resistors being a little different.
The CDS system has now been up moved to a supposedly more stable real-time-patched linux kernel (3.2.88-csp) and RCG r4447 (roughly the head of trunk, intended to be release 3.4). With one major and one minor exception, everything seems to be working:
The remaining issues are:
Issues that have been fixed:
What's the current backup situation?
Good question. We need to figure something out. fb1 root is on a RAID1, so there is one layer of safety. But we absolutely need a full backup of the fb1 root filesystem. I don't have any great suggestions, other than just getting an external disk, 1T or so, and just copying all of root (minus NFS mounts).
We also need to copy chiara's root. What is the best way to get the full image of the root FS?
We may need to restore these root images to a different disk with a different capacity.
Is the dump command good for this?
RFM network is back! Everything green again.
Use of RFM has been turned off in adLigoRTS trunk in favor of the new long-range PCIe networking being developed for the sites. Rolf provided a single-line patch that re-enables it:
controls@c1sus:/opt/rtcds/rtscore/trunk 0$ svn diff
--- src/epics/util/feCodeGen.pl (revision 4447)
+++ src/epics/util/feCodeGen.pl (working copy)
@@ -122,7 +122,7 @@
$diagTest = -1;
$flipSignals = 0;
$virtualiop = 0;
-$rfm_via_pcie = 1;
+$rfm_via_pcie = 0;
$edcu = 0;
$casdf = 0;
$globalsdf = 0;
This patched was applied to RTS source checkout we're using for the FE builds (/opt/rtcds/rtscore/trunk, which is r4447, and is linked to /opt/rtcds/rtscore/release). The following models that use RFM were re-compiled, re-installed, and re-started:
The re-compiled models now see the RFM cards (dmesg log from c1ioo):
[24052.203469] c1x03: Total of 4 I/O modules found and mapped
[24052.203471] c1x03: ***************************************************************************
[24052.203473] c1x03: 1 RFM cards found
[24052.203474] c1x03: RFM 0 is a VMIC_5565 module with Node ID 180
[24052.203476] c1x03: address is 0xffffc90021000000
[24052.203478] c1x03: ***************************************************************************
This cleared up all RFM transmission error messages.
CDS upstream are working to make this RFM usage switchable in a reasonable way.
Now that all the CDS overview lights are green, I decided to switch back the coil driver outputs to their original state so that the MC optics could be damped and the IMC relocked. I also restored the static PIT/YAW bias values to their original values.
MC1 has been quiet over the last couple of days, lets see how it behaves in the next few days. In all the glitches I have observed, if the IMC is locked and WFS loops are enabled, the loops are able to correct for the DC misalignment caused by the glitch. But the mcwfs off script is currently set up in such a way that the output history is cleared between IMC locks. I made two copies of the mcwfson/mcwfsoff scripts, called mcwfsunhold/mcwfshold respectively. They live in /opt/rtcds/caltech/c1/scripts/MC/WFS. I've also modified the autolocker script to call these modified scripts, such that when the IMC loses lock, the WFS servo outputs are held, while the input is turned off. The hope is that in this configuration, the autolocker can catch a lock even if there is a glitch on MC1.
I haven't tried locking the arms yet, but I think other IFO work discussed at the meeting (like arm loss estimation / cavity scans etc) can proceed.
I'm not sure if this has something to do with the model restarts / new RCG, but while I was re-enabling the MC watchdogs, I noticed the RMS sensor voltage channels on ITMX hovering around ~100mV, even though local damping was on (in which configuration I would expect <1mV if everything is working normally). I was confused by this behaviour, and after staring at the ITMX suspension screen for a while, I noticed that the input to the "ASCP" and "ASCY" servos were "-nan", and the outputs were 10^20 cts (see Attachment #1).
Digging a little deeper, I found that the same problem existed on ITMY, ETMX, ETMY, PRM (but not BS or SRM) - reasons unknown for now.
I have to check where this signal is coming from, but for now I just turned the "ASC Input" switch off. More investigation to be done, but in the meantime, ASS dither alignment may not be possible.
After consulting with Jamie, I have just disabled all outputs to the suspensions other than local damping loop outputs. I need to figure out how to get this configuration into the safe.snap file such that until we are sure of what is going on, the models start up in this safer configuration.
gedit 28 Oct 0026: Seems like this problem is seen at the sites as well. I wonder if the problem is related.
Today, with Johannes' help, I cleaned the fiber tips of the photodiodes. The effect of the cleaning was dramatic - see Attachments #1-4, which are X Beat PD, axial illumination, X Beat PD, oblique illumination, Y beat PD, axial illumination, Y beat PD, oblique illumination. They look much cleaner now, and the feature that looked like a scratch has vanished.
The cleaning procedure followed was:
I will repeat this procedure for all fiber connections once I start putting the box back together - I'm almost done with the new box, just waiting on some hardware to arrive.
The PSL HEPA was running noisy at 100V The bearing is wearing out. I turned it down to 30V It is quiet there.
Got it to work. One of the connections was faulty. I decided to check the temperature measured against a thermometer. The sensor showed 26.1 C, but the thermometer showed 25.8 C after I let them both cool down after heating them up. The temperature of the thermometer was dropping at the time of measurement, but the temperature of the sensor was not. This is still a rough version of the final sensor, so I'm not sure what exactly causes this discrepancy.
Seems like this modification didn't really work. There were several large MC1 glitches, and one of them misaligned MC1 so much that the IMC didn't relock for the last ~6 hours. I re-aligned MC1 manually, and now it is locked fine.
that's why the Autolocker clears the outputs; we don't want to be holding the offsets from the last ms of lock when it was all messed up; instead it would be best to have a slow (~mHz) relief script that takes the WFS controls and puts them onto the MC SUS sliders. This would then re-align the MC to the input beam rather than the input to the MC. Which is not the best idea.
Seems like this modification didn't really work.
The JetStor RAID unit that we had been using for frame writing before the fb meltdown has some archived frames from DRFPMI locks that I want to get at. I spent some time today trying to mount it on optimus with no success
The unit was connected to fb via a SCSI cable to a SCSI-to-PCI card inside of fb. I moved the card to optimus, and attached the cable. However, no mountable device corresponding to the RAID seems to show up anywhere.
The RAID unit can tell that it's hooked up to a computer, because when optimus restarts, the RAID event log says "Host Channel 0 - SCSI Bus Reset."
The computer is able to get some sort of signals from the RAID unit, because when I change the SCSI ID, the syslog will say 'detected non-optimal RAID status'.
The PCI card is ID'd fine in lspci as "06:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev c1)"
'lsssci' does not list anything related to the unit
Using 'mpt-status -p', which is somehow associated with this kind of thing returns the disheartening output:
Checking for SCSI ID:0
Checking for SCSI ID:1
Checking for SCSI ID:2
Checking for SCSI ID:3
Checking for SCSI ID:4
Checking for SCSI ID:5
Checking for SCSI ID:6
Checking for SCSI ID:7
Checking for SCSI ID:8
Checking for SCSI ID:9
Checking for SCSI ID:10
Checking for SCSI ID:11
Checking for SCSI ID:12
Checking for SCSI ID:13
Checking for SCSI ID:14
Checking for SCSI ID:15
Nothing found, contact the author
I spent some time today trying to debug this issue.
Jamie and I had opened up the c1sus frontend to try and replace the RFM card before we realized that the problem was in the RCG code generator. During this process, we had disconnected all of the back-panel cabling to this machine (2 ethernet cables, dolphin cable, and RFM cables/fibers). I thought I may have accidentally returned the cables to the wrong positions - but all the status indicator lights indicate that everything is working as it should, and I also confirmed that the cabling is as it is in the pictures of the rack on the wiki page.
Looking at the SimuLink model diagram (see Attachment #1 for example), it looks like (at least some of) these channels are actually on the dolphin network, and not the RFM network (with which we were experiencing problems). This suggests that the problem is something deeper. Although I did see nans in some of the ETMX ASC channels as well, for which the channels are piped over the RFM network. Even more puzzling is that the ASC MEDM screen (Attachment #3) and the SimuLink diagram (Attachment #2) suggest that there is an output matrix in between the input signals and the output angular control signals to the suspensions. As Attachment #4 shows, the rows corresponding to ITMX PIT and YAW are zero (I confirmed using z read <matrixElement>). Attachment #3 shows that the output of all the servo banks except CARM_YAW is zero, but CARM_YAW has no matrix element going to the ITMs (also confirmed with z read <servoOutputChannel>). So 0 x 0 should be 0, but for some reason the model doesn't give this output?
z read <matrixElement>
z read <servoOutputChannel>
GV Edit: As EricQ just pointed out to me, nan x 0 is still nan, which probably explains the whole issue. Poking a little further, it seems like this is an SDF issue - the SDF table isn't able to catch differences for this hold output channel.
As I was writing this elog, I noticed that, as mentioned above, the CARM_YAW output was "nan". When I restart the model (thankfully this didn't crash c1lsc!), it seems to default to this state. Opening up the filter module, I saw that the "hold output" was enabled.
All the points above stand - CARM_YAW output shouldn't have been going anywhere as per the output matrix, but it seems to have been responsible? Seems like a bug in any case if a model restarts with a field as "nan".
Anyways the problem seems to have been resolved so I'm going to try locking and dither aligning the arms now.
Rolf mentioned that a simple update could fix several of the CDS issues we are facing (e.g. inability to open up testpoints), but he didn't seem to have any insight into this particular issue. Jamie will try and recompile all the models and then we have to see if that fixes the remaining problems.
Leaving LSC mode OFF for now while CDS is still under investigation
Not really related to this work: We saw that the safe.snap file for c1oaf seems to have gotten overwritten at some point. I restored the EPICS values from a known good time, and over-wrote the safe.snap file.
My motivation tonight was to get an up-to-date spectrum of a calibrated measurement of the out-of-loop displacement of an arm locked on ALS (using the PDH signal as the out-of-loop sensor) to compare the performance of ALS control noise with the Izumi et al green locking paper.
I was able to fish out the PSD from the paper from the 40m svn, but the comparison as plotted looks kind of fishy. I don't see why the noise from 10-60Hz should be so different/worse. We updated the POX counts to meters conversion by looking at the Hz-calibrated ALSX signal and a ~800Hz line injected on ETMX.
They are synchronised tiny glitches. They are not mechanical.
On Friday, I cleaned up the circuit so that there are only three connections needed (+15V, -15V, GND) and a BNC connector for reading the output. Today, I added in bypass capacitors. The small yellow ones are 0.1 microF ceramic, and the large ones are 100 microF electrolytic. They are used to stabilize the +15V and -15V inputs to the OP amp and minimize fluctuations, since it doesn't have a regulator for stability. I have also attached the circuit diagram for the OP amp only, where 1 are the electrolytic and 2 are the ceramic. The temperature is still about 2 degrees off, but if that difference is constant for all temperatures in our range we can just calibrate it later.
Here is a helpful link on bypass capacitors (thanks to Kevin for sending it to me).
As a note, the electrolytic capacitors do have a polarity, so it is important to place them correctly (the negative side is towards the lower voltage potential, and not always towards ground).
In the aftermath of the accidental vent, it looks like the RGA was shutdown.
We followed the instructions in this elog to restart the RGA.
Seems to be working now, Steve says we just need to wait for it to warm up before we can collect a reliable scan.
We have good RGA scan now. There was no scan for 3 months.
At Rolf/Rich Abbott's request, we performed a check of the UPS today.
Steve believed that the UPS was functioning as it should, and the recent accidental vent was because the UPS batteries were insufficiently charged when the test was performed. Today, we decided to try testing the UPS.
We first closed V1, VM1 and VA6 using the MEDM screen. We prepared to pull power on all these valves by loosening the power connections (but not detaching them). [During this process, I lost the screw holding the power cord fixed to the gate valve V1 - we are looking for a replacement right now but it seems to be an odd size. It is cable tied for now.]
The battery charge indicator LEDs on the UPS indicated that the batteries were fully charged.
Next, we hit the "Test" button on the UPS - it has to be held down for ~3 seconds for the test to be actually initiated, seems to be a safety feature of the UPS. Once the test is underway, the LED indicators on the UPS will indicate that the loading is on the UPS batteries. The test itself lasts for ~5seconds, after which the UPS automatically reverts to the nominal configuration of supplying power from the main line (no additional user input is required).
In this test, one of the five battery charge indicator LEDs went off (5 ON LEDs indicate full charge).
So on the basis of this test, it would seem that the UPS is functioning as expected. It remains to be investigated if the various hardware/software interlocks in place will initiate the right sequence of valve closures when required.
Never hit O on the Vacuum UPS !
Note: the " all off " configuration should be all valves closed ! This should be fixed now.
In case of emergency you can close V1 with disconnecting it's actuating power as shown on Atm3 if you have peumatic pressure 60 PSI
I worked a little bit on the Y arm ALS today.
I am now going to measure the OLTFs of both green PDH loops to check that the overall loop gain is okay, and also check the measurement against EricQ's LISO model of the (modified) AUX green PDH servos. Results to follow.
Some weeks ago, I had moved some of the Green steering optics on the PSL table around, in order to flip some mirror mounts and try and get angles of incidence closer to ~45deg on some of the steering mirrors. As a result of this work, I can see some light on the GTRY CCD when the X green shutter is open. It is unclear if there is also some scattered light on the RFPDs. I will post pictures + a more detailed investigation of the situation on the PSL table later, there are multiple stray green beams on the PSL table which should probably be dumped.
As I was writing this elog, I saw the X green lock drop abruptly. During this time, the X arm stayed locked to the IR, and the Y arm beat on the control room network analyzer did not jump (at least not by an amount visible to the eye). Toggling the X end shutter a few times, the green TEM00 lock was re-acquired, but the beatnote has moved on the control room analyzer by ~40MHz. On Friday evening however, the X green lock held for >1 hour. Need to keep an eye on this.
Attachment #1 shows the results of my measurements tonight (SR785 data in Attachment #2). Both loops have a UGF of ~10kHz, with ~55 degrees of phase margin.
Excitation was injected via SR560 at the PDH error point, amplitude was 35mV. According to the LED indicators on these boxes, the low frequency boost stages were ON. Gain knob of the X end PDH box was at 6.5, that of the Y end PDH box was at 4.9. I need to check the schematics to interpret these numbers. GV Edit: According to this elog, these numbers mean that the overall gain of the X end PDH box is approx. 25dB, while that of the Y end PDH box is approx. 15dB. I believe the Y end Lightwave NPRO has an actuator discriminant ~5MHz/V, while the X end Innolight is more like 1MHz/V.
Not sure what to make of the X PDH loop measurement being so much noisier than the Y end, I need to think about this.
More detailed analysis to follow.
It turns out the problem was just a bent pin on the SCSI cable, likely from having to stretch things a bit to reach optimus from the RAID unit.
I hooked it up to megatron, and it was automatically recognized and mounted.
I had to turn off the new FB machine and remove it from the rack to be able to access megatron though, since it was just sitting on top. FB needs a rail to sit on!
At a cursory glance, the filesystem appears intact. I have copied over the achived DRFPMI frame files to my user directory for now, and Gautam is going to look into getting those permanently stored on the LDAS copy of 40m frames, so that we can have some redundancy.
Also, during this time, one of the HDDs in the RAID unit failed its SMART tests, so the RAID unit wanted it replaced. There were some spare drives in a little box directly under the unit, so I've installed one and am currently incorporating it back into the RAID.
There are two more backup drives in the box. We're running a RAID 5 configuration, so we can only lose one drive at a time before data is lost.
I had some trouble getting the daqd processes up and running again using Jamie's instructions.
With Jamie's help however, they are back up and running now. The problem was that the mx infrastructure didn't come back up on its own. So prior to running sudo systemctl restart daqd_*, Jamie ran sudo systemctl start mx. This seems to have done the trick.
c1iscey was still showing red fields on the CDS overview screen so Jamie did a soft reboot. The machine came back up cleanly, so I restarted all the models. But the indicator lights were still red. Apparently the mx processes weren't running on c1iscey. The way to fix this is to run sudo systemctl start mx_stream. Now everything is green.
Now we are going to work on trying the fix Rolf suggested on c1iscex.
Here is what was done (Jamie will correct me if I am mistaken).
So while we are in a better state now, the problem isn't fully solved.
Comment: seems like there is an in-built timeout for testpoints opened with DTT - if the measurement is inactive for some time (unsure how much exactly but something like 5mins), the testpoint is automatically closed.
After getting the go ahead from Jamie, I recompiled all the FE models against the same version of RCG that we tested on the c1iscex models.
To do so:
IFO alignment needs to be redone, but at least we now have a (admittedly rounabout way) of getting testpoints. Did a quick check for "nan-s" on the ASC screen, saw none. So I am re-enabling watchdogs for all optics.
GV 23 August 9am: Last night, I re-aligned the TMs for single arm locks. Before the model restarts, I had saved the good alignment on the EPICs sliders, but the gain of x3 on the coil driver filter banks have to be manually turned on at the moment (i.e. the safe.snap file has them off). ALS noise looked good for both arms, so just for fun, I tried transitioning control of both arms to ALS (in the CARM/DARM basis as we do when we lock DRFPMI, using the Transition_IR_ALS.py script), and was successful.
Didn't someone look at what the OLG req. should be for these servos at some point? I wonder if we can make a parallel digital path that we switch on after green lock. Then we could make this a simple 1/f box and just add in the digital path (take analog control signal into ADC, filter, and then sum into the control point further down the path to the laser) for the low frequency boost.
The V1 gate valve specs installed at 40m wiki page. VAT model number 10846-UE44-0007 Our main volume pumping goes through this 8" id gate valve V1 to Maglev turbo or Cryo pump to VC1
The ion pumps have 6" id gate valves:VAT 10844-UE44-AAY1, Pneumatic actuator with position indicator and double acting solenoid valve 115V 60Hz Purchased 1999 Dec 22
UHV gate valves 2.5" id. VAT 10836-UE44 Pneumatic actuator with position indicator and double acting solenoid valve 115V 60 hz, IFO to RGA VM1 & RGA to Maglev VM2
mini UHV gate valve 1.5" id. VAT 01032-UE01 2016 cataloge page 14, manual - no position indicator, VM4 next to manual adjustable fine leak valve to RGA
UHV angle valve 1.5" id, model VAT 28432-GE41, Viton plate seal, pneumatic actuator with position indicator & solenoid valve 115V & single acting closing spring MEDM screen: VM3,VC2, V3,V4,V5,V6,VA6,V7 & annuloses Each chamber annulos has 2 valves.
UHV angle valve 1.5" id, model VAT 57132-GE05 go page 208, Metal tip seal, manual actuating only with position indicator, MEDM screen: roughing RV1 and venting VV1 hand wheel needed to close to torque spec
UHV angle valve 1.5" id. model VAT 28432-GE01 Viton plate seal, manual operation only at IT gauges Hornet & Super Bee and ion pumps roughing ports. These are not labeled.
The Cryo pump interlock wiring was added too
Note: all moving valve plate seals are single.
I completed the revamp of the box, and re-installed the box on the PSL table today. I think it would be ideal to install this on one of the electronic racks, perhaps 1X2 would be best. We would have to re-route the fibers from the PSL table to 1X2, but I think they have sufficient length, and this way, the whole arrangement is much cleaner.
Did a quick check to make sure I could see beat notes for both arms. I will now attempt to measure the ALS noise with this revamped box, to see if the improved power supply and grounding arrangement, as well as fiber cleaning, has had any effect.
Photos + power budget + plan of action for using this box to characterize the green PDH locking to follow.
For quick reference: here is the AM/PM measurement done when we re-installed the repaired Innolight NPRO on the new X endtable.
Since the single arm locking and dither alignment seemed to work alright after the CDS overhaul, I decided to try some recycling cavity locking tonight.
Why should this have changed? I was just on the AS table and did re-center the beam onto the REFL 55 RFPD, but I had also done this in April/May when I was last doing DRMI locking. But I can't explain the apparent factor of ~4 increase in light level. I think I have some measurements of the light levels at various PDs from April 2017, I will see how the present levels line up.
Of course dataviever won't cooperate when I am trying to monitor testpoints.
I may be missing something obvious, but I am quitting for tonight, will look into this more tomorrow.
Unrelated to this work: looking at the GTRY spot on the CCD monitor, there seems to be some excess angular motion. Not sure where this is coming from. In the past, this sort of problem has been symptomatic of something going wonky with the Oplev loops. But I took loop measurements for ITMY and ETMY PIT and YAW, they look normal. I will investigate further when I am doing some more ALS work.
A couple of weeks ago, I was trying to modernize the python version of the FSS Slow temperature control loops, when I accidentally ended up deleting it . There was no svn backup. So the old Perl PID script has been running for the last few days.
Today, I checked out the latest version that Andrew and co. have running in the PSL lab. I had to make some important modifications for the script to work for the 40m setup.
python FSSSlow.py -i FSSSlowPy.ini
Then I stopped the Perl process on megatron by running
sudo initctl stop FSSslow
and started the Python process by running
sudo initctl start FSSslowPy
I have now committed the files FSSSlow.py and FSSSlowPy.ini to the 40m svn. Things seem to be stable for the last 20 mins or so, let's keep an eye on this though - although we had been running the Python PID loop for some months, this version is a slightly modified one.
The initctl stuff still isn't very robust - I think both the Autolocker and the FSS slow servos have to be manually restarted if megatron is shutdown/restarted for whatever reason. It doesn't seem to be a problem with the initctl routine itself - looking at the logs, I can see that init is trying to start both processes, but is failing to do so each time. To be investigated. The wiki procedure to restart this process is up to date.
GV Edit 0000 25 Aug 2017: I had to add a line to the script that checks MC transmission before enabling the PID loop. Change has been committed to svn. Now, when the MC loses lock or if the PSL shutter is kept closed for an extended period of time, the temperature loop doesn't rail.
I tried some DRMI locking again tonight, but had no success. Here is the story.
Looks like I will have to embark on the REFL55 LSC electronics investigation. I was able to successfully lock the PRC on carrier and sideband, and the Michelson lock also seems to work fine, all of which seem to point to a hardware problem with the REFL55 signal chain.
I did a quick check by switching the output of the REFL55 demod board to the inputs normally used by AS55 signals on the whitening board. Setting the whitening gain to +18dB for these channels had the same effect - ADC overflow galore. So looks like the whitening board isn't to blame. I will have to check the demod board out.