  40m Log, Page 54 of 339
ID   Date   Author   Type   Category   Subject
  13732   Thu Apr 5 19:31:17 2018   gautam   Update   CDS   EPICS processes for c1ioo dead

I found all the EPICS channels for the model c1ioo on the FE c1ioo to be blank just now. The realtime model itself seemed to be running fine, judging by the IMC alignment (the WFS loops seemed to still be running okay). I couldn't find any useful info in dmesg, but I don't know exactly what I'm looking for. So my guess is that somehow the EPICS process for that model died. Unclear why.

  13733   Fri Apr 6 10:00:29 2018   gautam   Update   CDS   CDS puzzle
Quote:

Clearly, there is a discrepancy for f>20Hz. Why?

Spectral leakage

  13734   Fri Apr 6 16:07:18 2018   gautam   Update   CDS   Frequent EPICS dropouts

Kira informed me that she was having trouble accessing past data for her PID tuning tests. The last day of data shows frequent EPICS data dropouts, each lasting up to a few hours. Observations (see Attachment #1 for evidence):

  1. Problem seems to be with writing these EPICS channels to frames, as the StripTool traces do not show the same discontinuities.
  2. Problem does not seem local to c1auxex (new Acromag machine). These dropouts are also happening in other EPICS channel records. I have plotted PMC REFL, which is logged by the slow machine c1psl, and you can see the dropouts happen at the same times.
  3. Problem does not seem to happen to the fast channel DQ records (see ETMX Sensor record plotted for 24 hours, there are no discontinuities).

It is difficult to diagnose how long this has been going on, because once you pull longer stretches of data in dataviewer, any "data freezes" are washed out over the extent of the data plotted.
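
For reference, this kind of "frozen data" check can be scripted against the trend data. Below is a minimal sketch in Python (not the actual analysis done here), assuming the nds2 client is available on the workstations and fb1 serves trend data on port 8088; the channel name and GPS times are placeholders.

import numpy as np
import nds2

# placeholders: adjust the channel, server and GPS span as needed
conn = nds2.connection('fb1', 8088)
start, stop = 1207000000, 1207086400           # roughly one day
chan = 'C1:PSL-PMC_RFPDDC.mean,m-trend'        # minute trend of a slow EPICS channel

buf = conn.fetch(start, stop, [chan])[0]
data = np.asarray(buf.data)

# a "data freeze" shows up as a long run of identical consecutive samples
repeats = np.flatnonzero(np.diff(data) == 0)
print('%d of %d consecutive-sample pairs are identical' % (len(repeats), len(data) - 1))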

Attachment 1: EPICS_dropout.png
  13758   Wed Apr 18 10:44:45 2018   gautam   Update   CDS   slow machine bootfest

All slow machines (except c1auxex) were dead today, so I had to key them all. While I was at it, I also decided to update the MC autolocker screen. Kira pointed out that I needed to change the EPICS input type (in the RTCDS model) to a "binary input", as opposed to an "analog input", which I did. Model recompilation and restart went smoothly. I had to go into the EPICS record manually to change the two choices to "ENABLE" and "DISABLE", as opposed to the default "ON" and "OFF". Anyway, long story short, the MC autolocker controls are a bit more intuitive now, I think.

Attachment 1: MCautolocker_MEDM_revamp.png
  13812   Thu May 3 12:19:13 2018   gautam   Update   CDS   slow machine bootfest

Reboot for c1susaux and c1iscaux today. ITMX precautions were followed. Reboots went smoothly.

IMC is shuttered while Jon does PLL characterization...


Now relocked.

  13898   Wed May 30 16:12:30 2018   Jonathan Hanks   Summary   CDS   Looking at c1oaf issues

When c1oaf starts up there are 446 gain channels that should be set to 0.0 but which end up at 1.0.  An example channel is C1:OAF-ADAPT_CARM_ADPT_ACC1_GAIN.  The safe.snap file states that it should be set to 0.  After model start up it is at 1.0.

We ran some tests, including modifying the safe.snap to make sure it was reading the snap file we were expecting. For this I set the setpoint to 0.5. After restart of the model we saw that the setpoint went to 0.5, but the EPICS value remained at 1.0. I then set the snap file back to its original setting. I ran the EPICS sequencer by hand in a gdb session and verified that the sequencer was setting the field to 0. I also built a custom sequencer that would catch writes by the SDF system to the channel. I only saw one write, the initial write that pushed a 0. I have reverted my changes to the sequencer.

The gain channel can be caput to the correct value and it is not pushed back to 1.0. So there does not appear to be a process actively pushing the value to 1.0. On Rolf's suggestion we ran the sequencer without the kernel object loaded, and saw the same behavior.

This will take some thought.
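
For the record, this kind of comparison can also be done offline. The sketch below is my own illustration (not the sequencer test described above): it parses a BURT-style safe.snap and compares each C1:OAF setpoint against the live EPICS value using pyepics. The file path and the one-value-per-line snap layout are assumptions.

import epics   # pyepics

snapfile = '/opt/rtcds/caltech/c1/target/c1oaf/c1oafepics/burt/safe.snap'   # placeholder path
mismatches = 0
with open(snapfile) as f:
    for line in f:
        parts = line.split()
        # skip BURT header lines and anything that isn't "CHANNEL COUNT VALUE"
        if len(parts) < 3 or not parts[0].startswith('C1:OAF-'):
            continue
        try:
            setpoint = float(parts[2])
        except ValueError:
            continue
        live = epics.caget(parts[0])
        if live is not None and abs(live - setpoint) > 1e-6:
            mismatches += 1
            print('%s  snap=%g  live=%g' % (parts[0], setpoint, live))
print('%d channels differ from safe.snap' % mismatches)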

  13925   Thu Jun 7 12:20:53 2018   gautam   Update   CDS   slow machine bootfest

FSS slow wasn't running, so the PSL PZT voltage was swinging around a lot. The reason was that c1psl was unresponsive. I keyed the crate, and now it's okay. Now ITMX is stuck - Johannes just told me about an un-elogged c1susaux reboot. It seems that ITMX got stuck at ~4:30pm yesterday PT. After some shaking, the optic was freed. Please follow the procedure in future, and if you do a reboot, please elog it and verify that the optic didn't get stuck.

Attachment 1: ITMX_stuck.png
  13934   Fri Jun 8 14:40:55 2018   c1lsc   Update   CDS   i am dead
Attachment 1: 31.png
  13935   Fri Jun 8 20:15:08 2018   gautam   Update   CDS   Reboot script

Unfortunately, this has happened (and seems like it will keep happening) often enough that I set up a script for rebooting the machine in a controlled way; hopefully it will negate the need to repeatedly go into the VEA and hard-reboot the machines. The script lives at /opt/rtcds/caltech/c1/scripts/cds/rebootC1LSC.sh. SVN committed. It worked well for me today. All applicable CDS indicator lights are now green again. Be aware that c1oaf will probably need to be restarted manually in order to make the DC light green. Also, this script won't help you if you try to unload a model on c1lsc and the FE crashes. It relies on c1lsc being ssh-able. The basic logic is (a hedged sketch of this sequence follows the list):

  1. Ask for confirmation.
  2. Shut down all vertex optic watchdogs and close the PSL shutter.
  3. ssh into c1sus and c1ioo, shut down all models on these machines, then soft-reboot them.
  4. ssh into c1lsc and soft-reboot the machine. No attempt is made to unload the models.
  5. Wait 2 minutes for all machines to come back online.
  6. Restart models on all 3 vertex FEs (IOPs first, then rest).
  7. Prompt user for confirmation to re-enable watchdog status and open PSL shutter.
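
A hedged Python sketch of that sequence is below (the real script is a bash script; the watchdog/shutter channel names and the rtcds invocations here are placeholders and have not been checked against the actual script):

import subprocess, time
import epics   # pyepics

def ssh(host, cmd):
    # assumes passwordless ssh from the control room workstation
    subprocess.call(['ssh', host, cmd])

if input('Soft-reboot c1lsc/c1sus/c1ioo? [y/N] ').lower() == 'y':
    epics.caput('C1:PSL-PSL_ShutterRqst', 0)               # placeholder: close PSL shutter
    for optic in ['MC1', 'MC2', 'MC3', 'BS', 'ITMX', 'ITMY', 'PRM', 'SRM']:
        epics.caput('C1:SUS-%s_LATCH_OFF' % optic, 0)      # placeholder: trip watchdog

    for host in ['c1sus', 'c1ioo']:
        ssh(host, 'rtcds stop --all && sudo reboot')       # unload models, then soft reboot
    ssh('c1lsc', 'sudo reboot')                            # no attempt to unload models here

    time.sleep(120)                                        # wait for the FEs to come back

    for host in ['c1lsc', 'c1sus', 'c1ioo']:
        ssh(host, 'rtcds start --all')                     # assumes rtcds starts the IOP first

    input('Press Enter to re-enable watchdogs and open the PSL shutter...')
    # the watchdog re-enable and shutter caputs are omitted here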
Attachment 1: 31.png
  13942   Mon Jun 11 18:49:06 2018   gautam   Update   CDS   c1lsc dead again

Why is this happening so frequently now? Last few lines of error log:

[  575.099793] c1oaf: DAQ EPICS: Int = 199  Flt = 706 Filters = 9878 Total = 10783 Fast = 113
[  575.099793] c1oaf: DAQ EPICS: Number of Filter Module Xfers = 11 last = 98
[  575.099793] c1oaf: crc length epics = 43132
[  575.099793] c1oaf:  xfer sizes = 128 788 100988 100988 
[240629.686307] c1daf: ADC TIMEOUT 0 43039 31 43103
[240629.686307] c1cal: ADC TIMEOUT 0 43039 31 43103
[240629.686307] c1ass: ADC TIMEOUT 0 43039 31 43103
[240629.686307] c1oaf: ADC TIMEOUT 0 43039 31 43103
[240629.686307] c1lsc: ADC TIMEOUT 0 43039 31 43103
[240630.684493] c1x04: timeout 0 1000000 
[240631.684938] c1x04: timeout 1 1000000 
[240631.684938] c1x04: exiting from fe_code()

I fixed it by running the reboot script.

Attachment 1: 36.png
  13947   Mon Jun 11 23:22:53 2018   gautam   Update   CDS   EX wiring confusion

 [Koji, gautam]

Per this elog, we don't need any AIOut channels or Oplev channels. However, the latest wiring diagram I can find for the EX Acromag situation suggests that these channels are hooked up (physically). If this is true, there are 12 ADC channels occupied that we could use for other purposes. Question for Johannes: Is this true? If so, Kira has plenty of channels available for her temperature control stuff.

As an aside, we found that the EPICS channel names for the TRX/TRY QPD gain stages are somewhat strangely named. Looking closely at the schematic (which has now been added to the 40m DCC tree; we can add our custom mods later), they do (somewhat) add up, but I think we should definitely rename them in a more systematic manner, and use an MEDM screen to indicate stuff like x4 or x20 or "Active" etc. BTW, the EX and EY QPDs have different settings. But at least the settings are changed synchronously for all four quadrants, unlike the WFS heads...


Unrelated: I had to key the c1iscaux and c1auxey crates.

  13958   Wed Jun 13 23:23:44 2018   johannes   Update   CDS   EX wiring confusion

It's true.

I went through the wiring of the c1auxex crate today to disentangle the pin assignments. The full detail can be found in Attachment #1; Attachment #2 has less detail but is more eye candy. The red-flagged channels are now marked for removal at the next opportunity. This will free up DAQ channels as follows:

TYPE           Total   Available now   Available after
ADC             24        2               14
DAC             16        8               12
BIO sinking     16        7                7
BIO sourcing     8        8                8

This should be enough for temperature sensing, NPRO diagnostics, and even eventual remote PDH control with new servo boxes.

Attachment 1: c1auxex_channels.pdf
Attachment 2: XEND_slow_wiring.pdf
  13961   Thu Jun 14 10:41:00 2018   gautam   Update   CDS   EX wiring confusion

Do we really have 2 free ADC channels at EX now? I was under the impression we had ZERO free, which is why we wanted to put a new ADC unit in. I think the Vacuum gauge monitor channel, Seis Can Temp Sensor monitor, and Seis Can Heater channels are missing from the wiring diagram. It would also be good to have, in the wiring diagram, a mapping of which signals go to which I/O ports (Dsub, front-panel BNC etc.) on the 4U(?) box housing all the Acromags; this would be helpful in future debugging sessions.

Quote:
 
TYPE   Total   Available now   Available after
ADC     24        2               14

 

  13965   Thu Jun 14 15:31:18 2018   johannes   Update   CDS   EX wiring confusion

Bad wording, sorry. Should have been channels in excess of ETMX controls. I'll add the others to the list as well.

Updated channel list and wiring diagram attached. Labels are 'F' for 'Front' and 'R' for - you guessed it - 'Rear'; the number identifies the slot panel the breakout is attached to.

Attachment 1: XEND_slow_wiring.pdf
Attachment 2: c1auxex_channels.pdf
  14000   Thu Jun 21 22:13:12 2018   gautam   Update   CDS   pianosa upgrade

pianosa has been upgraded to SL7. I've made a controls user account, added it to sudoers, did the network config, and mounted /cvs/cds using /etc/fstab. Other capabilities are being slowly added, but it may be a while before this workstation has all the kinks ironed out. For now, I'm going to follow the instructions on this wiki to try and get the usual LSC stuff working.

  14003   Fri Jun 22 00:59:43 2018   gautam   Update   CDS   pianosa functional, but NO DTT

MEDM, EPICS and dataviewer seem to work, but diaggui still doesn't work (it doesn't work on Rossa either, same problem as reported here, does a fix exist?). So looks like only donatella can run diaggui for now. I had to disable the systemd firewall per the instructions page in order to get EPICS to work. Also, there is no MATLAB installed on this machine yet. sshd has been enabled.

  14007   Fri Jun 22 15:13:47 2018   gautam   Update   CDS   DTT working

Seems like DTT also works now. The trick seems to be to run sudo /usr/bin/diaggui instead of just diaggui. So this is indicative of some conflict between the yum-installed gds and the relic gds from our shared drive. I also have to manually change the NDS settings each time; there is probably a way to set all of this up more smoothly, but I don't know what it is. awggui still doesn't get the correct channels; I'm not sure where I can change the settings to fix that.

Attachment 1: Screenshot_from_2018-06-22_15-12-37.png
  14008   Fri Jun 22 15:22:39 2018   sudo   Update   CDS   DTT working
Quote:

Seems like DTT also works now. The trick seems to be to run sudo /usr/bin/diaggui instead of just diaggui. So this is indicative of some conflict between the yum-installed gds and the relic gds from our shared drive. I also have to manually change the NDS settings each time; there is probably a way to set all of this up more smoothly, but I don't know what it is. awggui still doesn't get the correct channels; I'm not sure where I can change the settings to fix that.

DON"T RUN DIAGGUI AS ROOT

  14027   Wed Jun 27 21:18:00 2018   gautam   Update   CDS   Lab maintenance scripts from NODUS---->MEGATRON

I moved the N2 check script and the disk usage checking script from the (sudo) crontab of nodus to the controls user crontab on megatron.

  14028   Thu Jun 28 08:09:51 2018   Steve   Update   CDS   vacuum pneumatic N2 pressure

The farthest I can go back on channel C1:Vac_N2pres is 320 days.

The C1:Vac-CC1_Hornet pressure gauge started logging Feb. 23, 2018.

Did you update the "low N2 message" email addresses?

 

Quote:

I moved the N2 check script and the disk usage checking script from the (sudo) crontab of nodus to the controls user crontab on megatron.

 

Attachment 1: 320d_N2.png
  14029   Thu Jun 28 10:28:27 2018   rana   Update   CDS   vacuum pneumatic N2 pressure

We disabled logging of the N2 pressure to a text file, since it was filling up disk space. Now it just sends an email to our 40m mailing list, so we'll all get a warning.

The crontab uses the bash-style output redirection '2>&1', which sends both stdout and stderr to the same place; we probably just want stderr, since stdout contains messages without issues and will just fill things up again (i.e. send stdout to /dev/null and keep only stderr).

  14133   Sun Aug 5 13:28:43 2018   gautam   Update   CDS   c1lsc flaky

Since the lab-wide computer shutdown last Wednesday, all the realtime models running on c1lsc have been flaky. The error is always the same:

[58477.149254] c1cal: ADC TIMEOUT 0 10963 19 11027
[58477.149254] c1daf: ADC TIMEOUT 0 10963 19 11027
[58477.149254] c1ass: ADC TIMEOUT 0 10963 19 11027
[58477.149254] c1oaf: ADC TIMEOUT 0 10963 19 11027
[58477.149254] c1lsc: ADC TIMEOUT 0 10963 19 11027
[58478.148001] c1x04: timeout 0 1000000 
[58479.148017] c1x04: timeout 1 1000000 
[58479.148017] c1x04: exiting from fe_code()

This has happened at least 4 times since Wednesday. The reboot script makes recovery easier, but doing it once every 2 days is getting annoying, especially since we are running many things (e.g. ASS) in custom configurations which have to be reloaded each time. I wonder why the problem persists even though I've power-cycled the expansion chassis? I want to try and do some IFO characterization today, so I'm going to run the reboot script again, but I'll get in touch with J Hanks to see if he has any insight (I don't think there are any logfiles on the FEs anyway that I'll wipe out by doing a reboot). I wonder if this problem is connected to DuoTone? But if so, why is c1lsc the only FE with this problem? c1sus also does not have the DuoTone system set up correctly...

The last time this happened, the problem apparently fixed itself, so I still don't have any insight as to what is causing it in the first place. Maybe I'll try disabling c1oaf, since that's the configuration we've been running in for a few weeks.

  14136   Mon Aug 6 00:26:21 2018   gautam   Update   CDS   More CDS woes

I spent most of today fighting various CDS errors.

  • I rebooted c1lsc around 3pm; my goal was to try and do some vertex locking and figure out what the implications were of having only ~30% of the power we used to have at the AS port.
  • Shortly afterwards (~4pm), c1lsc crashed.
  • Using the reboot script, I was able to bring everything back up. But the DC lights on c1sus models were all red, and a 0x4000 error was being reported.
  • This error is indicative of some timing issue, but all the usual tricks (reboot vertex FEs in various order, restart the mx_streams etc) didn't clear this error.
  • I checked the Tempus GPS unit, that didn't report any obvious problems (i.e. front display was showing the correct UTC time).
  • Finally, I decided to shut down all watchdogs, soft reboot all the FEs, soft reboot FB, power cycle all expansion chassis.
  • This seems to have done the trick - I'm leaving c1oaf disabled for now.
  • The remaining red indicators are due to c1dnn and c1oaf being disabled.

Let's see how stable this configuration is. Onto some locking now...

Attachment 1: CDSoverview.png
  14139   Mon Aug 6 14:38:38 2018   gautam   Update   CDS   More CDS woes

Stability was short-lived, it seems. When I came in this morning, all models on c1lsc were dead already, and now c1sus is also dead (Attachment #1). Moreover, MC1 shadow sensors failed for a brief period again this afternoon (Attachment #2). I'm going to wait for some CDS experts to take a look at this since any fix I effect seems to be short-lived. For the MC1 shadow sensors, I wonder if the Trillium box (and associated Sorensen) failure somehow damaged the MC1 shadow sensor/coil driver electronics.

Quote:
 

Let's see how stable this configuration is. Onto some locking now...

Attachment 1: CDScrash.png
Attachment 2: MC1failures.png
  14140   Mon Aug 6 19:49:09 2018   gautam   Update   CDS   More CDS woes

I've left the c1lsc frontend shut down for now, to see if c1sus and c1ioo can survive without any problems overnight. In parallel, we are going to try and debug the MC1 OSEM sensor problem - the idea will be to disable the bias voltage to the OSEM LEDs and see if the readback channels still go below zero; this would be a clear indication that the problem is in the readback transimpedance stage and not the LED. Per the schematic, this can be done by simply disconnecting the two D-sub connectors going to the vacuum flange (this is the configuration in which we usually use the sat box tester kit, for example). Attachment #1 shows the current setup at the PD readout board end. The dark DC count (i.e. with the OSEM LEDs off) is ~150 cts, while the nominal level is ~1000 cts, so perhaps this is already indicative of something being broken, but let's observe overnight.

Attachment 1: IMG_7106.JPG
  14142   Tue Aug 7 11:30:46 2018   gautam   Update   CDS   More CDS woes

Overnight, all models on c1sus and c1ioo seem to have had no stability issues, supporting the hypothesis that timing issues stem from c1lsc. Moreover, the MC1 shadow sensor readouts showed no negative values over a ~12hour period. I think we should just observe this for another day, in any case I don't think there is any urgent IFO related activity scheduled.

  14143   Tue Aug 7 22:28:23 2018   gautam   Update   CDS   More CDS woes

I am starting the c1x04 model (IOP) on c1lsc to see how it behaves overnight.

Well, there was apparently an immediate reaction - all the models on c1sus and c1ioo reported an ADC timeout and crashed. I'm going to reboot them and still have c1x04 IOP running, to see what happens.

[97544.431561] c1pem: ADC TIMEOUT 3 8703 63 8767
[97544.431574] c1mcs: ADC TIMEOUT 1 8703 63 8767
[97544.431576] c1sus: ADC TIMEOUT 1 8703 63 8767
[97544.454746] c1rfm: ADC TIMEOUT 0 9033 9 8841
Quote:

Overnight, all models on c1sus and c1ioo seem to have had no stability issues, supporting the hypothesis that timing issues stem from c1lsc. Moreover, the MC1 shadow sensor readouts showed no negative values over a ~12hour period. I think we should just observe this for another day, in any case I don't think there is any urgent IFO related activity scheduled.

  14146   Wed Aug 8 23:03:42 2018   gautam   Update   CDS   c1lsc model started

As part of this slow but systematic debugging, I am turning on the c1lsc model overnight to see if the model crashes return.

  14149   Thu Aug 9 12:31:13 2018   gautam   Update   CDS   CDS status update

The model seems to have run without issues overnight. Not completely related, but the MC1 shadow sensor signals also don't show any abnormal excursions to negative values in the last 48 hours. I'm thinking about re-connecting the satellite box (but preserving the breakout setup at 1X6 for a while longer) and re-locking the IMC. I'll also start c1ass on the c1lsc frontend. I would say that the other models on c1lsc (i.e. c1oaf, c1cal, c1daf) aren't really necessary for basic IFO operation.

Quote:

As part of this slow but systematic debugging, I am turning on the c1lsc model overnight to see if the model crashes return.

  14151   Thu Aug 9 22:50:13 2018   gautam   Update   CDS   AlignSoft script modified

After this work of increasing the series resistance on ETMX, there have been numerous occasions where ETMX was not misaligned far enough, causing problems in locking the vertex cavities. Today, I modified the script (located at /opt/rtcds/caltech/c1/medm/MISC/ifoalign/AlignSoft.py) to avoid such problems. The way the misalign script works is to write an offset value to the "TO_COIL" filter bank (accessed via the "Output Filters" button on the Suspension master MEDM screen - not the most intuitive place to put an offset, but okay). So I just increased the value of this offset from 250 counts to 2500 counts (for ETMX only). I checked that the script works; now when both ETMs are misaligned, the AS55Q signal shows a clean Michelson-like sine wave as it fringes, instead of having the arm cavity PDH fringes as well.
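
For illustration, the misalign action boils down to a single EPICS write. Here is a hedged sketch; the TO_COIL offset channel name is a guess at the naming pattern (the real names live in the SUS model), and switching the offset on/off in the filter bank is not shown.

import epics   # pyepics

MISALIGN_OFFSET = {'ETMX': 2500, 'ETMY': 250}    # counts, per the change described above

def set_misalign(optic, misaligned=True):
    # placeholder channel name - a guess at the TO_COIL filter-bank offset field
    chan = 'C1:SUS-%s_TO_COIL_1_1_OFFSET' % optic
    epics.caput(chan, MISALIGN_OFFSET[optic] if misaligned else 0)

set_misalign('ETMX')          # push ETMX far enough off that the arm no longer flashes
set_misalign('ETMX', False)   # restore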

Note that the svn doesn't seem to work on the newly upgraded SL7 machines: svn status gives me the following output.

svn: E155036: Please see the 'svn upgrade' command
svn: E155036: Working copy '/cvs/cds/rtcds/userapps/trunk/cds/c1/medm/MISC/ifoalign' is too old (format 10, created by Subversion 1.6)

 Is it safe to run 'svn upgrade'? Or is it time to migrate to git.ligo.org/40m/scripts?

Attachment 1: MichelsonFringing.png
  14166   Wed Aug 15 21:27:47 2018   gautam   Update   CDS   CDS status update

Starting c1cal now, let's see if the other c1lsc FE models are affected at all... Moreover, since MC1 seems to be well-behaved, I'm going to restore the nominal eurocrate configuration (sans extender board) tomorrow.

  14171   Mon Aug 20 15:16:39 2018   Jon   Update   CDS   Rebooted c1lsc, slow machines

When I came in this morning no light was reaching the MC. One fast machine was dead, c1lsc, and a number of the slow machines: c1susaux, c1iool0, c1auxex, c1auxey, c1iscaux. Gautam walked me through resetting the slow machines manually and the fast machines via the reboot script. The computers are all back online and the MC is again able to lock.

  14187   Tue Aug 28 18:39:41 2018   Jon   Update   CDS   C1LSC, C1AUX reboots

I found c1lsc unresponsive again today. Following the procedure in elog #13935, I ran the rebootC1LSC.sh script to perform a soft reboot of c1lsc and restart the epics processes on c1lsc, c1sus, and c1ioo. It worked. I also manually restarted one unresponsive slow machine, c1aux.

After the restarts, the CDS overview page shows the first three models on c1lsc are online (image attached). The above elog references c1oaf having to be restarted manually, so I attempted to do that. I connected via ssh to c1lsc and ran the script startc1oaf. This failed as well, however.

In this state I was able to lock the MICH configuration, which is sufficient for my purposes for now, but I was not able to lock either of the arm cavities. Are some of the still-dead models necessary to lock in resonant configurations?

Attachment 1: CDS_FE_STATUS.png
  14192   Tue Sep 4 10:14:11 2018   gautam   Update   CDS   CDS status update

c1lsc crashed again. I've contacted Rolf/JHanks for help since I'm out of ideas on what can be done to fix this problem.

Quote:

Starting c1cal now, let's see if the other c1lsc FE models are affected at all... Moreover, since MC1 seems to be well-behaved, I'm going to restore the nominal eurocrate configuration (sans extender board) tomorrow.

  14193   Wed Sep 5 10:59:23 2018   gautam   Update   CDS   CDS status update

Rolf came by today morning. For now, we've restarted the FE machine and the expansion chassis (note that the correct order in which to do this is: turn off computer--->turn off expansion chassis--->turn on expansion chassis--->turn on computer). The debugging measures Rolf suggested are (i) to replace the old generation ADC card in the expansion chassis which has a red indicator light always on and (ii) to replace the PCIe fiber (2010 make) running from the c1lsc front-end machine in 1X6 to the expansion chassis in 1Y3, as the manufacturer has suggested that pre-2012 versions of the fiber are prone to failure. We will do these opportunistically and see if there is any improvement in the situation.

Another tip from Rolf: if the c1lsc FE is responsive but the models have crashed, then doing sudo reboot by ssh-ing into c1lsc should suffice* (i.e. it shouldn't take down the models on the other vertex FEs, although if the FE is unresponsive and you hard-reboot it, this may still be a problem). I've modified the c1lsc reboot script accordingly (but see the note below).

* Seems like this can still lead to the other vertex FEs crashing, so I'm leaving the reboot script as is (so all vertex machines are softly rebooted when c1lsc models crash).

Quote:

c1lsc crashed again. I've contacted Rolf/JHanks for help since I'm out of ideas on what can be done to fix this problem.

  14194   Thu Sep 6 14:21:26 2018   gautam   Update   CDS   ADC replacement in c1lsc expansion chassis

Todd E. came by this morning and gave us (i) 1x new ADC card and (ii) 1x roll of 100m (2017 vintage) PCIe fiber. This afternoon, I replaced the old ADC card in the c1lsc expansion chassis, and have returned the old card to Todd. The PCIe fiber replacement is a more involved project (Steve is acquiring some protective tubing to route it from the FE in 1X6 to the expansion chassis in 1Y3), but hopefully the problem was the ADC card with red indicator light, and replacing it has solved the issue. CDS is back to what is now the nominal state (Attachment #1) and Yarm is locked for Jon to work on his IFOcoupling study. We will monitor the stability in the coming days.

Quote:

(i) to replace the old generation ADC card in the expansion chassis which has a red indicator light always on and (ii) to replace the PCIe fiber (2010 make) running from the c1lsc front-end machine in 1X6 to the expansion chassis in 1Y3, as the manufacturer has suggested that pre-2012 versions of the fiber are prone to failure. We will do these opportunistically and see if there is any improvement in the situation.

Attachment 1: CDSoverview.png
  14195   Fri Sep 7 12:35:14 2018   gautam   Update   CDS   ADC replacement in c1lsc expansion chassis

Looks like the ADC was not to blame, same symptoms persist.

Quote:

The PCIe fiber replacement is a more involved project (Steve is acquiring some protective tubing to route it from the FE in 1X6 to the expansion chassis in 1Y3), but hopefully the problem was the ADC card with red indicator light, and replacing it has solved the issue.

Attachment 1: Screenshot_from_2018-09-07_12-34-52.png
  14196   Mon Sep 10 12:44:48 2018   Jon   Update   CDS   ADC replacement in c1lsc expansion chassis

Gautam and I restarted the models on c1lsc, c1ioo, and c1sus. The LSC system is functioning again. We found that restarting only c1lsc, as Rolf had recommended, did in fact kill the models running on the other two machines. We simply reverted the rebootC1LSC.sh script to its previous form, since that does work. I'll keep using that as required until the ongoing investigations find the source of the problem.

Quote:

Looks like the ADC was not to blame, same symptoms persist.

Quote:

The PCIe fiber replacement is a more involved project (Steve is acquiring some protective tubing to route it from the FE in 1X6 to the expansion chassis in 1Y3), but hopefully the problem was the ADC card with red indicator light, and replacing it has solved the issue.

 

  14202   Thu Sep 20 11:29:04 2018   gautam   Update   CDS   New PCIe fiber housed

[steve, yuki, gautam]

The plastic tubing/housing for the fiber arrived a couple of days ago. We routed ~40m of fiber through roughly that length of the tubing this morning, using some custom implements Steve sourced. To make sure we didn't damage the fiber during this process, I'm now testing the vertex models with the plastic tubing just routed casually (= illegally) along the floor from 1X4 to 1Y3 (NOTE THAT THE WIKI PAGE DIAGRAM IS OUT OF DATE AND NEEDS TO BE UPDATED), and have plugged in the new fiber to the expansion chassis and the c1lsc front end machine. But I'm seeing a DC error (0x4000), which is indicative of some sort of timing error (Attachment #1) **. Needs more investigation...

Pictures + more procedural details + proper routing of the protected fiber along cable trays after lunch. If this doesn't help the stability problem, we are out of ideas again, so fingers crossed...

** In the past, I have been able to fix the 0x4000 error by manually rebooting fb (simply restarting the daqd processes on fb using sudo systemctl restart daqd_* doesn't seem to fix the problem). Sure enough, this seems to have done the job this time as well (Attachment #2). So my initial impression is that the new fiber is functioning alright.

Quote:

The PCIe fiber replacement is a more involved project (Steve is acquiring some protective tubing to route it from the FE in 1X6 to the expansion chassis in 1Y3)

Attachment 1: PCIeFiberSwap.png
Attachment 2: PCIeFiberSwap_FBrebooted.png
  14203   Thu Sep 20 16:19:04 2018   gautam   Update   CDS   New PCIe fiber install postponed to tomorrow

[steve, gautam]

This didn't go as smoothly as planned. While there were no issues with the new fiber over the ~3 hours that I left it plugged in, I didn't realize the fiber has distinct ends for the "HOST" and "TARGET" (-5 points to me, I guess). So although we had plugged in the ends correctly (by accident) for the pre-lunch test, we switched the ends while routing the fiber on the overhead cable tray (because the "HOST" end of the cable is close to the reel and we felt it would be easier to do the routing the other way).

Anyway, we will fix this tomorrow. For now, the old fiber was re-connected, and the models are running. IMC is locked.

Quote:

Pictures + more procedural details + proper routing of the protected fiber along cable trays after lunch. If this doesn't help the stability problem, we are out of ideas again, so fingers crossed...

  14206   Fri Sep 21 16:46:38 2018   gautam   Update   CDS   New PCIe fiber installed and routed

[steve, koji, gautam]

We took another pass at this today, and it seems to have worked - see Attachment #1. I'm leaving CDS in this configuration so that we can investigate stability. IMC could be locked. However, due to the vacuum slow machine having failed, we are going to leave the PSL shutter closed over the weekend.

Attachment 1: PCIeFiber.png
Attachment 2: IMG_5878.JPG
  14208   Fri Sep 21 19:50:17 2018   Koji   Update   CDS   Frequent time out

Multiple realtime processes on c1sus are suffering from frequent timeouts. This eventually knocks out c1sus (the process, not the machine).

Obviously this has started since the fiber swap this afternoon.

gautam 10pm: there are no clues as to the origin of this problem in the c1sus frontend dmesg logs. The only clue (see Attachment #3) is that the "ADC" error bit in the CDS status word is red - but opening up the individual ADC error log MEDM screens shows no errors or overflows. Not sure what to make of this. The IOP model on this machine (c1x02) reports an error in the "Timing" bit of the CDS status word, but from the previous exchange with Rolf / J Hanks, this is down to a misuse of ADC0 Ch31, which is supposed to be reserved for a DuoTone diagnostic signal but which we use for some other signal (one of the MC suspension shadow sensors, iirc). The response is also not consistent with this CDS manual, which suggests that an "ADC" error should just kill the models. There are no obvious red indicator lights in the c1sus expansion chassis either.

Attachment 1: 33.png
Attachment 2: 49.png
Attachment 3: Screenshot_from_2018-09-21_21-52-54.png
  14210   Sat Sep 22 00:21:07 2018   Koji   Update   CDS   Frequent time out

[Gautam, Koji]

We had another crash of c1sus, and Gautam did a full power cycle of c1sus. It was a struggle to recover all the frontends, but this solved the timing issue.

We went through a full reset of c1sus and rebooted all the other RT hosts, as well as daqd and fb1.

Attachment 1: 23.png
  14253   Sun Oct 14 16:55:15 2018   not gautam   Update   CDS   pianosa upgrade

DASWG is not what we want to use for config; we should use the K. Thorne LLO instructions, like I did for ROSSA.

Quote:

pianosa has been upgraded to SL7. I've made a controls user account, added it to sudoers, did the network config, and mounted /cvs/cds using /etc/fstab. Other capabilities are being slowly added, but it may be a while before this workstation has all the kinks ironed out. For now, I'm going to follow the instructions on this wiki to try and get the usual LSC stuff working.

  14267   Fri Nov 2 12:07:16 2018   rana   Update   CDS   NDScope

https://alog.ligo-wa.caltech.edu/aLOG/index.php?callRep=44971

Let's install Jamie's new Data Viewer

  14293   Tue Nov 13 21:53:19 2018   gautam   Update   CDS   RFM errors

This problem resurfaced, which I noticed when I couldn't get the single arm locks going.

The fix was NOT restarting the c1rfm model, which just brought the misery of all vertex FEs crashing and the usual dance to get everything back.

Restarting the sender models (i.e. c1scx and c1scy) seems to have done the trick though.

Attachment 1: RFMerrors.png
  14344   Tue Dec 11 14:33:29 2018   gautam   Update   CDS   NDScope

NDscope is now running on pianosa. To be really useful, we need the templates, so I've made /users/Templates/NDScope_templates where these will be stored. Perhaps someone can write a parser to convert dataviewer .xml to something ndscope can understand. To get it installed, I had to run:

sudo yum install ndscope
sudo yum install python34-gpstime
sudo yum install python34-dateutil
sudo yum install python34-requests

I also changed the PYTHONPATH variable in .bashrc to include the python3.4 site-packages directory.

Quote:

https://alog.ligo-wa.caltech.edu/aLOG/index.php?callRep=44971

Let's install Jamie's new Data Viewer

Attachment 1: ndscope.png
  14356   Thu Dec 13 22:56:28 2018   gautam   Update   CDS   Frames

[koji, gautam]

We looked into the /frames situation a bit tonight. Here is a summary:

  1. We have already lost some second trend data from the period since the new FB has been running (~August 2017).
  2. The minute trend data is still safe from that period, we believe.
  3. The Jetstor has ~2TB of trend data in the /frames/trend folder.
    • This is a combination of "second", "minute_raw" and "minute".
    • It is not clear to us what the distinction is between "minute_raw" and "minute", except that the latter seems to go back farther in time than the former.
    • Even so, the minute trend folder from October 2011 is empty - how did we manage to get the long term trend data?? From the folder volumes, it appears that the oldest available trend data is from ~July 24 2015.

Plan of action:

  1. The wiper script needs to be tweaked a bit to allow more storage for the minute trends (which we presumably want to keep for long term).
  2. We need to clear up some space on FB1 to transfer the old trend data from Jetstor to FB1.
  3. We need to revive the data backup via LDAS. Also summary pages.

BTW - the last chiara (shared drive) backup was October 16, 6 am. dmesg showed a bunch of errors; Koji is now running fsck in a tmux session on chiara; let's see if that repairs the errors. We missed the opportunity to swap in the 4TB backup disk, so we will do this at the next opportunity.
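
For bookkeeping, the folder-volume survey mentioned above is easy to script. A rough sketch (paths as on the Jetstor mount described above; this is not necessarily how the numbers quoted here were obtained):

import os, time

base = '/frames/trend'
for kind in ('second', 'minute_raw', 'minute'):
    top = os.path.join(base, kind)
    if not os.path.isdir(top):
        continue
    total, oldest = 0, None
    for root, dirs, files in os.walk(top):
        for fname in files:
            path = os.path.join(root, fname)
            total += os.path.getsize(path)
            mtime = os.path.getmtime(path)
            oldest = mtime if oldest is None else min(oldest, mtime)
    when = time.strftime('%Y-%m-%d', time.gmtime(oldest)) if oldest else 'n/a'
    print('%-12s %8.1f GB   oldest file: %s' % (kind, total / 1e9, when))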

  14359   Fri Dec 14 14:25:36 2018   Koji   Update   CDS   chiara backup

fsck of the chiara backup disk (UUID="90a5c98a-22fb-4685-9c17-77ed07a5e000") was done, but it required many files to be fixed, so the backed-up files are not reliable now.
On top of that, the disk stopped being recognized by the machine.

I went to the disk and disconnected the USB and then the power supply, which was/is connected to the UPS.
Then they were reconnected. This made the disk come back as /media/90a5c98a-22fb-4685-9c17-77ed07a5e000. (*)
After unmounting this disk, I ran "sudo mount -a" so that it is mounted the way fstab specifies.
Now I am running the backup script manually so that we at least maintain a snapshot of the day.

(*) This is the same situation we found at the recovery from the power shutdown. So my hypothesis is that on Oct 16 at 7 AM, during the backup, there was a USB failure or disk failure or something that unmounted the disk. This caused some files to get damaged. It also caused the disk to be mounted as /media/90a5c98a-22fb-4685-9c17-77ed07a5e000. So we have not had a backup since then.
Update (20:00): The disk connection failed again. I think this disk is no longer reliable.

 

Attachment 1: fsck_log.log
sudo fsck -yV UUID="90a5c98a-22fb-4685-9c17-77ed07a5e000"           [238/276]
[sudo] password for controls:
fsck from util-linux 2.20.1
[/sbin/fsck.ext4 (1) -- /media/40mBackup] fsck.ext4 -y /dev/sde1
e2fsck 1.42 (29-Nov-2011)
/dev/sde1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Error reading block 527433852 (Attempt to read block from filesystem resulted in
 short read) while getting next inode from scan.  Ignore error? yes

... 283 more lines ...
  14374   Thu Dec 20 17:17:41 2018   gautam   Update   CDS   Logging of new Vacuum channels

Added the following channels to C0EDCU.ini:

[C1:Vac-P1b_pressure]
units=torr
[C1:Vac-PRP_pressure]
units=torr
[C1:Vac-PTP2_pressure]
units=torr
[C1:Vac-PTP3_pressure]
units=torr
[C1:Vac-TP2_rot]
units=kRPM
[C1:Vac-TP3_rot]
units=kRPM

Also modified the old P1 channel to

[C1:Vac-P1a_pressure]
units=torr

Unfortunately, we realized too late that we didn't have these channels in the frames, so the data from this test pumpdown was not logged, but future data will be. I say we should also log diagnostics from the pumps, such as temperature, current etc. After making the changes, I restarted the daqd processes.
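
As a sanity check before (or after) restarting daqd, one can confirm that the newly added slow channels actually respond over EPICS. A minimal sketch using pyepics, with the channel list copied from above:

import epics   # pyepics

new_channels = [
    'C1:Vac-P1a_pressure', 'C1:Vac-P1b_pressure', 'C1:Vac-PRP_pressure',
    'C1:Vac-PTP2_pressure', 'C1:Vac-PTP3_pressure',
    'C1:Vac-TP2_rot', 'C1:Vac-TP3_rot',
]
for chan in new_channels:
    val = epics.caget(chan, timeout=2.0)
    print('%-24s %s' % (chan, 'NO RESPONSE' if val is None else val))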


Things to add to ASA wiki page once the wiki comes back online:

  1. What is the safe way to clean the cryo pump if we want to use it again?
  2. What are safe conditions to turn the RGA on?