Since the lab-wide computer shutdown last Wednesday, all the realtime models running on c1lsc have been flaky. The error is always the same:
[58477.149254] c1cal: ADC TIMEOUT 0 10963 19 11027
[58477.149254] c1daf: ADC TIMEOUT 0 10963 19 11027
[58477.149254] c1ass: ADC TIMEOUT 0 10963 19 11027
[58477.149254] c1oaf: ADC TIMEOUT 0 10963 19 11027
[58477.149254] c1lsc: ADC TIMEOUT 0 10963 19 11027
[58478.148001] c1x04: timeout 0 1000000
[58479.148017] c1x04: timeout 1 1000000
[58479.148017] c1x04: exiting from fe_code()
This has happened at least 4 times since Wednesday. The reboot script makes recovery easier, but doing it once in 2 days is getting annoying, especially since we are running many things (e.g. ASS) in custom configurations which have to be reloaded each time. I wonder why the problem persists even though I've power-cycled the expansion chassis? I want to try and do some IFO characterization today so I'm going to run the reboot script again but I'll get in touch with J Hanks to see if he has any insight (I don't think there are any logfiles on the FEs anyways that I'll wipe out by doing a reboot). I wonder if this problem is connected to DuoTone? But if so, why is c1lsc the only FE with this problem? c1sus also does not have the DuoTone system set up correctly...
The last time this happened, the problem apparently fixed itself so I still don't have any insight as to what is causing the problem in the first place . Maybe I'll try disabling c1oaf since that's the configuration we've been running in for a few weeks.
Independent from the problems the vertex machine has been having (I think, unless it's something happening over the shared memory network), I noticed on Friday that the ETMX watchdog was tripped. Today, once again, the ETMX watchdog was tripped. There is no evidence of any abnormal seismic activity around that time, and anyways, none of the other watchdogs tripped. Attachment #1 shows that this happened ~838am PT today morning. Attachment #2 shows the 2k sensor data around the time of the trip. If the latter is to be believed, there was a big impulse in the UL shadow sensor signal which may have triggered the trip. I'll squish cables and see if that helps - Steve and I did work at the EX electronics rack (1X9) on Friday but this problem precedes our working there...
OK, how about this:
The question still remains of how to combine the fast and bias paths in this proposed scheme. I think the following approach works for prototyping at least:
In the longer term, perhaps the Satellite Box revamp can accommodate a bias voltage summation connector.
Bah! Too complex.
I have neglected many practical concerns. Some things that come to mind:
I spent most of today fighting various CDS errors.
Let's see how stable this configuration is. Onto some locking now...
Stability was short-lived it seems. When I came in this morning, all models on c1lsc were dead already, and now c1sus is also dead (Attachment #1). Moreover, MC1 shadow sensors failed for a brief period again this afternoon (Attachment #2). I'm going to wait for some CDS experts to take a look at this since any fix I effect seems to be short-lived. For the MC1 shadow sensors, I wonder if the Trillium box (and associated Sorensen) failure somehow damaged the MC1 shadow sensor/coil driver electronics.
I've left the c1lsc frontend shutdown for now, to see if c1sus and c1ioo can survive without any problems overnight. In parallel, we are going to try and debug the MC1 OSEM Sensor problem - the idea will be to disable the bias voltage to the OSEM LEDs, and see if the readback channels still go below zero, this would be a clear indication that the problem is in the readback transimpedance stage and not the LED. Per the schematic, this can be done by simply disconnecting the two D-sub connectors going to the vacuum flange (this is the configuration in which we usually use the sat box tester kit for example). Attachment #1 shows the current setup at the PD readout board end. The dark DC count (i.e. with the OSEM LEDs off) is ~150 cts, while the nominal level is ~1000 cts, so perhaps this is already indicative of something being broken but let's observe overnight.
Gautam and I tested out the DAC that he installed in the latter half of last week. We confirmed that at least one of the channels is can successfully drive a sine wave (ch10, 1-indexed). We had to measure the output directly on the SCSI connector (breakout in the FE hard drive cabinet along the Y arm), since the SCSI breakout box (D080303) seems not to be working (wiring diagram in Gautam's elog from his SURF years).
Overnight, all models on c1sus and c1ioo seem to have had no stability issues, supporting the hypothesis that timing issues stem from c1lsc. Moreover, the MC1 shadow sensor readouts showed no negative values over a ~12hour period. I think we should just observe this for another day, in any case I don't think there is any urgent IFO related activity scheduled.
I am starting the c1x04 model (IOP) on c1lsc to see how it behaves overnight.
Well, there was apparently an immediate reaction - all the models on c1sus and c1ioo reported an ADC timeout and crashed. I'm going to reboot them and still have c1x04 IOP running, to see what happens.
[97544.431561] c1pem: ADC TIMEOUT 3 8703 63 8767
[97544.431574] c1mcs: ADC TIMEOUT 1 8703 63 8767
[97544.431576] c1sus: ADC TIMEOUT 1 8703 63 8767
[97544.454746] c1rfm: ADC TIMEOUT 0 9033 9 8841
I was preparing for the aLIGO EOM measuement to be carried out tomorrow afternoon.
I did a few modifications to the PLL setup.
Tomorrow I am going to modulate the EOM with the AUX Marconi via an amplifier (probably)
Automated scripts (AGinit.py and AGmeas.py) are in /users/koji/scripts
I will revert the setup once the measurement is done tomorrow.
Rich and I worked on the EOM measurement. After the measurement, the setup was reverted to the nominal state
As part of this slow but systematic debugging, I am turning on the c1lsc model overnight to see if the model crashes return.
Today while Rich Abbott was here, Koji and I had a brief discussion with him about the HV amplifier idea for the coil driver bias path. He gave us some useful tips, perhaps most useful being a topology that he used and tested for an aLIGO ITM ESD driver which we can adapt to our application. It uses a PA95 high voltage amplifier which differs from the PA91 mainly in the output voltage range (up to 900V for the former, "only" 400V for the former. He agrees with the overall design idea of
He also gave some useful suggestions like
I am going to work on making a prototype version of this box for 5 channels that we can test with ETMX. I have been told that the coupling from side coil to longitudinal motion is of the order of 1/30, in which case maybe we only need 4 channels.
For operating the SRC in the "Signal-Recycled" tuning, the SRC macroscopic length needs to be ~4.04m (compared to the current value of ~5.399m), assuming we don't do anything fancy like change the modulation frequencies and not transmit through the IMC. We're putting together a notebook with all the calculations, but today I was thinking about what the signal extraction path should be, specifically which chamber the SRM should be in. Just noting down the thoughts I had here while they're fresh in my head, all this has to be fleshed out, maybe I'm making this out to be more of a problem than it actually is.
The model seems to have run without issues overnight. Not completely related, but the MC1 shadow sensor signals also don't show any abnormal excursions to negative values in the last 48 hours. I'm thinking about re-connecting the satellite box (but preserving the breakout setup at 1X6 for a while longer) and re-locking the IMC. I'll also start c1ass on the c1lsc frontend. I would say that the other models on c1lsc (i.e. c1oaf, c1cal, c1daf) aren't really necessary for basic IFO operation.
A brief follow-up on this since we discussed this at the meeting yesterday: the attached DV screenshot shows the full 2k data for a period of 2 seconds starting just before the watchdog tripped. It is clear that the timescale of the glitch in the UL channel is much faster (~50 ms) compared to the (presumably mechanical) timescale seen in the other channels of ~250 ms, with the step also being much smaller (a few counts as opposed to the few thousand counts seen in the UL channel, and I guess 1 OSEM count ~ 1 um). All this supports the hypothesis that the problem is electrical and not mechanical (i.e. I think we can rule out the Acromag sending a glitchy signal to the coil and kicking the optic). The watchdog itself gets tripped because the tripping condition is the RMS of the shadow sensor outputs, which presumably exceeds the set threshold when UL glitches by a few thousand counts.
After this work of increasing the series resistance on ETMX, there have been numerous occassions where the insufficient misalignment of ETMX has caused problems in locking vertex cavities. Today, I modified the script (located at /opt/rtcds/caltech/c1/medm/MISC/ifoalign/AlignSoft.py) to avoid such problems. The way the misalign script works is to write an offset value to the "TO_COIL" filter bank (accessed via "Output Filters" button on the Suspension master MEDM screen - not the most intuitive place to put an offset but okay). So I just increased the value of this offset from 250 counts to 2500 counts (for ETMX only). I checked that the script works, now when both ETMs are misaligned, the AS55Q signal shows a clean Michelson-like sine wave as it fringes instead of having the arm cavity PDH fringes as well .
Note that the svn doesn't seem to work on the newly upgraded SL7 machines: svn status gives me the following output.
Is it safe to run 'svn upgrade'? Or is it time to migrate to git.ligo.org/40m/scripts?
For the first time after the whirlwind vent, I managed to lock the PRMI.
I don't have the energy to make a DRMI attempt tonight - but the signs are encouraging. I'd like to use the IFO in the next few days to try and recover DRMI locking. The main concern is that the optical path on the AS beam has changed by ~0.3m I estimate. So the demod phase for AS55 may need to be adjusted, but the change due to optical path length only should be ~10degrees so the DRMI locking with the old settings should still work. Perhaps we also want to scan the PRC and SRC with the phase information from the Trans/Refl transfer functions as well.
Don't want to jinx it, but the c1lsc FE models have been stable. Tomorrow, I'd like to re-enable c1cal, since it has some useful channels for NBing. Could c1daf/c1oaf which have significant amounts of custom C code be the culprits?
Here is an other big one
Larry W said that some security issues were flagged on nodus. So I ran
on nodus. The exclude flag is because there were some conflicts related to that particular package. Hopefully this has fixed the problem. It's been a while since the last update, which was in January of this year.
We finished up making the new c1omc model (screenshot attached).
The new channels are only four DAC for ASC into the OMC, and one DAC for the OMC length:
In preparation for attempting some DRMI locking, I did the following:
Not related to this work, but I turned the Agilent NA off since we aren't using it immediately.
While working on the single arm alignment, I noticed that today, i was able to get the X arm transmission back to ~1.22, and the GTRX to 0.52. These are closer to the values I remember from prior to the vent. Running the dither alignment promptly degrades both the green and IR transmissions. Since the pianosa SL7 upgrade, I can't use the sensoray to capture images, but to me, the spot looks a little off-center in Yaw on ETMX in this configuration, I've tried to show this in the phone grab (Atm #2). Maybe indicative of clipping somewhere upstream of ITMX?
Anyways, I'm pushing onwards for now, something to check out in the daytime.
After I effected the series resistance change for ETMX, the X arm ASS didn't work (i.e. IR transmission would degrade if the servo was run). Today, we succeeded in recovering a functional ASS servo .
We then tried to maximize GTRX using the PZT mirrors, but were only successful in reaching a maximum of 0.41. The value I remember from before the vent was 0.5, and indeed, with the IR alignment not quite optimized before we began this work, I saw GTRX of 0.48. But the IR dither servo signals indicate that the cavity axis may have shifted (spot position on the ITM, which is uncontrolled, seems to have drifred significantly, the Pitch signal doesn't stay on the StripTool scale anymore). So we may have to double check that the transmitted beam isn't falling off the GTRX DC PD.
After tweaking the AS55 demod phase, SRM alignment, triggering settings, I got a few brief DRMI locks in tonight, I'm calling it a success (though this isn't really robust yet). The main things to do now are:
I think the main IFO characterization remaining to be done to determine the status of the IFO post vent is to measure the losses of the arm cavities. IMO, we will need to certainly fix the clipping at ETMY before we attempt some serious locking.
I made a script to scan the OMC length at each setpoint for the two TTs steering into the OMC. It is currently located on nodus at /users/aaron/OMC/scripts/OMC_lockScan.py.
I haven't tested it and used some ez.write syntax that I hadn't used before, so I'll have to double check it.
My other qualm is that I start with all PZTs set at 0, and step around alternative +/- values on each PZT at the same magnitude (for example, at some value of PZT1_PIT, PZT1_YAW, PZT2_PIT, I'll scan PZT2_YAW=1, then PZT2_YAW=-1, then PZT2_YAW=2). If there's strong hysteresis in the PZTs, this might be a problem.
It looks like we can have a stable SRC of length 4.044 m without getting any new mirrors, so this is an option to consider in the short-term.
gautam 245pm: Koji pointed out that the G&H mirrors are coated for normal incidence, but looking at the measurement, it looks like the optic has T~75ppm at 45 degree incidence, which is maybe still okay. Alternatively, we could use the -600m SR3 as the single folding mirror in the SRC, at the expense of slightly reduced mode-matching between the arm cavity and SRC.
I took another pass at this. Here is what I have now:
Attachment #1: Composite amplifier design to suppress voltage noise of PA91 at low frequencies.
Attachment #2: Transfer function from input to output.
Attachment #3: Top 5 voltage noise contributions for this topology.
Attachment #4: Current noises for this topology, comparison to current noise from fast path and slow DAC noise.
Attachment #5: LISO file for this topology.
Looks like this will do the job. I'm going to run this by Rich and get his input on whether this will work (this design has a few differences from Rich's design), and also on how to best protect from HV incidents.
Starting c1cal now, let's see if the other c1lsc FE models are affected at all... Moreover, since MC1 seems to be well-behaved, I'm going to restore the nominal eurocrate configuration (sans extender board) tomorrow.
The manufacturer of a vacuum pump supplies a chart for each pump showing pumping speed (volume in unit time) vs pressure. The example, for a fictitious pump, shows the pumping speed is substantially constant over a large pressure range.
By multiplying pumping speed by pressure at which that pumping speed occurs, we get a measure called pump throughput. We can tabulate those results, as shown in the table below, or plot them as a graph of pressure vs pump throughput. As is clear from the chart, pump throughput (which might also be called mass flow) decreases proportionally with PRESSURE, at least over the pressure range where pumping speed is constant.
The roughing pump speed actually will reach 0 l/s at it's ultimate pressure performance.
Our roughing pump pumping speed will slowly drop as chamber pressure drops. Below 10 Torr this decrease is accelerated and bottoms out. This where the Root pump can help. See NASA evaluation of dry rough pumps...What is a root pump
We have been operating succsessfully with a narrow margin. The danger is that the Maglev forline peaks at 4 Torr. This puts load on the small turbo TP2, TP3 & large TP1
The temperature of these TP2 & 3 70 l/s drag turbos go up to 38 C and their rotation speed slow to 45K rpm from 50K rpm because of the large volume 33,000 liters
Either high temp or low rotation speed of drag turbo or long time of overloading can shut down the small turbo pumps......meaning: stop pumping, wait till they cool down
The manual gate valve installed helped to lower peak temp to 32C It just took too long.
We have been running with 2 external fans [one on TP1 & one on TP3] for cooling and one aux drypump to help lowering the foreline pressure of TP2 & 3
The vacuum control upgrade should include adding root pump into the zero pumping speed range.
Atm1, Pump speed chart: TP1 turbo -red, root pump -blue and mechanical pump green. Note green color here representing an oily rotory pump. Our small drypumps [SH-100] typically run above 100 mTorr
They are the forepump of TP2 & 3 Our pumpdown procedure: Oily Leybold rotory pumps ( with safety orifice 350 mT to atm ) rough to 500 mTorr
Here we switch over to TP2 & 3 running at 50k RPM with drypumps SH-100 plus Aux Triscroll
TP1- Maglev rotating full speed when V1 is opened at full volume at 500 mTorr
History: the original design of the early 1990s had no dry scroll pumps. Oil free dry scrools replaced the oily forepumps of TP2 & TP3 in ~2002 at the cost of degrading the forline pressure somewhat.
We had 2 temperature related Maglev failers in 2005 Aug 8 and 2006 April 5 Osaka advised us to use AUX fan to cool TP1 This helped.
Atm2, Wanted Root pump - Leybold EcoDry 65 plus
Atm3, Typical 8 hrs pumpdown from 2007 with TP2 & 3
Atm4, Last pumpdown zoomed in from 400 mT to 1mT with throttled gate valve took 9 hrs The foreline pressure of TP1 peaked at 290 mT, TP3 temperature peaked at 32C
This technic is workable, but 9 hrs is too long.
Atm5, The lowest pressure achived in the 40m Vacuum Envelope 5e-7 Torr with pumps Maglev ~300 l/s, Cryo 1500 l/s and 3 ion pumps of 500 l/s [ in April 2002 at pumpdown 53 day 7 ] with annuloses at ~ 10 mTorr
Atm6, Osaka TG390MCAB Throughput with screen ~300 L/s at 12 cfm backing pump
I had a very fruitful discussion with Rich about this circuit today. He agreed with the overall architecture, but made the following suggestions (Attachment #1 shows the circuit with these suggestions incorporated):
If all this sounds okay, I'd like to start making the PCB layout (with 5 such channels) so we can get a couple of trial boards and try this out in a couple of weeks. Per the current threat matrix and noises calculated, coil driver noise is still projected to be the main technical noise contribution in the 40m PonderSqueeze NB (more on this in a separate elog).
When I came in this morning no light was reaching the MC. One fast machine was dead, c1lsc, and a number of the slow machines: c1susaux, c1iool0, c1auxex, c1auxey, c1iscaux. Gautam walked me through reseting the slow machines manually and the fast machines via the reboot script. The computers are all back online and the MC is again able to lock.
6.2M Bandon, OR did not trip any sus
Glitch, small amplitude, 350 counts & no trip.
I tried unsuccessfully to relock the MC this afternoon.
I came in to find it in a trouble state with a huge amount of noise on C1:PSL-FSS_PCDRIVE visible on the projector monitor. Light was reaching the MC but it was unable to lock.
I don't know what had been wrong, but I could lock the PMC as usual.
The IMC got relocked by AutoLocker. I checked the LSC and confirmed at least Y arm could be locked just by turning on the LSC servos.
I don't know what had been wrong, but I could lock the PMC as usual.
The IMC got relocked by AutoLocker. I checked the LSC and confirmed at least Y arm could be locked just by turning on the LSC servos.
The second big glich trips ETMX sus. There were small earth quakes around the glitches. It's damping recovered.
Small earth quakes and suspensions. Which one is the most free and most sensitive: ITMX
I found c1lsc unresponsive again today. Following the procedure in elog #13935, I ran the rebootC1LSC.sh script to perform a soft reboot of c1lsc and restart the epics processes on c1lsc, c1sus, and c1ioo. It worked. I also manually restarted one unresponsive slow machine, c1aux.
After the restarts, the CDS overview page shows the first three models on c1lsc are online (image attached). The above elog references c1oaf having to be restarted manually, so I attempted to do that. I connect via ssh to c1lsc and ran the script startc1oaf. This failed as well, however.
In this state I was able to lock the MICH configuration, which is sufficient for my purposes for now, but I was not able to lock either of the arm cavities. Are some of the still-dead models necessary to lock in resonant configurations?
All suspension tripped. Their damping restored. The MC is locked.
ITMX-UL & side magnets are stuck.
TP-1 Osaka maglev controller [ model TCO10M, ser V3F04J07 ] needs maintenance. Alarm led on indicating that we need Lv2 service.
The turbo and the controller are in good working order.
Our maintenance level 2 service price is $...... It consists of a complete disassembly of the controller for internal cleaning of all ICB’s, replacement of all main board capacitors, replacement of all internal cooling units, ROM battery replacement, re-assembly, and mandatory final testing to make sure it meets our factory specifications. Turnaround time is approximately 3 weeks.
RMA 5686 has been assigned to Caltech’s returning TC010M controller. Attached please find our RMA forms. Complete and return them to us via email, along with your PO, prior to shipping the cont
Osaka Vacuum USA, Inc.
510-770-0100 x 109
our TP-1 TG390MCAB is 9 years old. What is the life expectancy of this turbo?
The Osaka maglev turbopumps are designed with a 100,000 hours(or ~ 10 operating years) life span but as you know most of our end-users are
running their Osaka maglev turbopumps in excess of 10+, 15+ years continuously. The 100,000 hours design value is based upon the AL material being rotated at
the given speed. But the design fudge factor have somehow elongated the practical life span.
We should have the cost of new maglev & controller in next year budget. I put the quote into the wiki.
I freed ITMX and coarsely realigned the IFO using the OPLEVs. All the alignments were a bit off from overnight.
The IFO is still only able to lock in MICH mode currently, which was the situation before the earthquake. This morning I additionally tried restoring the burt state of the four machines that had been rebooted in the last week (c1iscaux, c1aux, c1psl, c1lsc) but that did not solve it.
Electrician is coming to fix one of the fluorenent light fixture holder in the east arm tomorrow morning at 8am. He will be out by 9am.
The job did not get done. There was no scaffolding or ladder to reach troubled areas.
c1lsc crashed again. I've contacted Rolf/JHanks for help since I'm out of ideas on what can be done to fix this problem.
Rolf came by today morning. For now, we've restarted the FE machine and the expansion chassis (note that the correct order in which to do this is: turn off computer--->turn off expansion chassis--->turn on expansion chassis--->turn on computer). The debugging measures Rolf suggested are (i) to replace the old generation ADC card in the expansion chassis which has a red indicator light always on and (ii) to replace the PCIe fiber (2010 make) running from the c1lsc front-end machine in 1X6 to the expansion chassis in 1Y3, as the manufacturer has suggested that pre-2012 versions of the fiber are prone to failure. We will do these opportunistically and see if there is any improvement in the situation.
Another tip from Rolf: if the c1lsc FE is responsive but the models have crashed, then doing sudo reboot by ssh-ing into c1lsc should suffice* (i.e. it shouldn't take down the models on the other vertex FEs, although if the FE is unresponsive and you hard reboot it, this may still be a problem). I'll modify I've modified the c1lsc reboot script accordingly.
* Seems like this can still lead to the other vertex FEs crashing, so I'm leaving the reboot script as is (so all vertex machines are softly rebooted when c1lsc models crash).