  40m Log
ID   Date   Author   Type   Category   Subject
  12321   Thu Jul 21 15:03:13 2016   varun   Update   CDS   DAFI Update

1) I have added the status summary of the DAFI block to the main FE status overview screen in the c1lsc column. (attachment 1)

2) I have edited all the kissel matrix buttons appropriately, and given them appropriate labels. (attachment 2)

Attachment 1: festatus.png
Attachment 2: matrices.png
  12324   Thu Jul 21 22:02:35 2016   varun   Update   CDS   DAFI update: Frequency warping

The code for frequency warping contained a "printf()" command, which had caused the system to crash in another instance (refer to elog 12320). Hence, I tried running the code today after removing this line. Unfortunately, this did not work: the model still crashed. Attached is a screenshot of the FE status.

Attachment 1: 07212016.png
  12336   Tue Jul 26 09:56:34 2016   ericq   Update   CDS   c1susaux restarted

c1susaux (which controls watchdogs and alignments for all non-ETM optics) was down; the last BURT was done yesterday around 2PM.

I restarted it by keying the crate. I restored the BURT snapshot from yesterday.

  12361   Mon Aug 1 20:09:37 2016   rana   Update   CDS   DAFI Update

I found the DAFI screen as a button inside of the LSC screen - I think it's more logically found from the sitemap, so I'll move it into there as well.

Quote:

1) I have added the status summary of the DAFI block to the main FE status overview screen in the c1lsc column. (attachment 1)

2) I have edited all the kissel matrix buttons appropriately, and given them appropriate labels. (attachment 2)

Gautam and I noticed a 60 Hz + harmonics hum which comes from the DAFI. It's the noisiest thing in the control room. It goes away when we unplug the fiber coming into the control room FiBox receiver, so it's not a ground loop on this end. Probably a ground loop at the LSC rack.

Upon further investigation we noticed that the FiBox at the LSC rack had its gain turned all the way up to +70 dB. This seemed too much, so we reduced it to ~20 (?) so that we could use more of the DAC range. Also, it is powered by an AC/DC converter plugged in to the LSC rack power strip. We cannot use this for a permanent install - we must power the FiBox using the same power supplies as are used for the LSC electronics. Probably we'll have to make a little box that takes the fused rack power of 15 V and turns it into +12 V with a regulator (max current of 0.15 A), making sure that the FiBox doesn't pollute the rest of the LSC stuff with its nasty internal DC-DC converters.

We also put a high pass in the output filter banks of DAFI. For the PEM channels we put in a 60 Hz comb. We then routed the Y-end Guralp in through the boxes and out the output, mostly bypassing the frequency shifting and AGC. It seems that there is still a problem with GUR2.

Does anyone know which one is GUR1 and which one is GUR2? I don't remember the result of the Guralp cable switching adventures - maybe Koji or Steve does. According to the trend it was totally dead before March and in March it became alive enough for us to see ~30 ADC counts of action, so way smaller than GUR111 or GUR snoopy or whatever it's called.

  12539   Fri Oct 7 20:25:14 2016   Koji   Update   CDS   Power-cycled c1psl and c1iool0

Found that the MC autolocker kept failing. It turned out that c1iool0 and c1psl had gone bad and did not accept EPICS commands.

Went to the rack and power cycled them. BURT restored with the snapshot files from 5:07 today.

The PMC lock was restored, IMC was locked, WFS turned on, and WFS output offloaded to the bias sliders.

The PMC seemed highly misaligned, but I didn't bother to touch it this time.

  12542   Mon Oct 10 11:48:05 2016   gautam   Update   CDS   Power-cycled c1susaux, realigned PMC, spots centered on WFS1 and WFS2

[Koji, Gautam]

We did the following this morning:

  1. I re-aligned the PMC - transmission level on the scope on the PSL table is now ~0.72V which is around what I remember it being
  2. The spot had fallen off WFS 2 - so we froze the output of the MC WFS servo, and turned the servo off. Then we went to the table to re-center the spot on the WFS. The alignment had drifted quite a bit on WFS2, and so we had to change the scale on the grid on the MEDM screen to +/-10 (from +/- 1) to find the spot and re-center it using the steering mirror immediately before the WFS. It would appear that the dark offsets are different on WFS1 and WFS2, so the "SUM" reads ~2.5 on WFS1 and ~0.3 on WFS2 when the spots are well centered
  3. Coming back to the control room, we ran the WFSoffsets script and turned on the WFS servo again. Trying to run the relief servo, we were confronted by an error message that c1susaux needed to be power cycled (again). This is of course the slow machine that the ITMX suspension is controlled by, and in the past, power cycling c1susaux has resulted in the optic getting stuck. An approach that seems to work (without getting ITMX stuck)  is to do the following:
    • Save the alignment of the optic, turn off Oplev servo
    • Move the bias sliders on IFO align to (0,0) slowly
    • Turn the watchdog for ITMX off
    • Unplug the cables running from the satellite box to the vacuum feedthrough
    • Power cycle the slow machine. Be aware that when the machine comes back on, the offset sliders are reset to the value in the saved file! So before plugging the cables back in, it would be advisable to set these to (0,0) again, to avoid kicking the optic while plugging the cables back in
    • Plug in the cables, restore alignment and Oplev servos, check that the optic isn't stuck
  4. Y green beat touch up - I tweaked the alignment of the first mirror steering the PSL green (after the beam splitter to divide PSL green for X and Y beats) to maximize the beat amplitude on a fast scope. Doing so increased the beat amplitude on the scope from about 20mVpp to ~35mVpp. A detailed power budget for the green beats is yet to be done

It is unfortunate we have to do this dance each time c1susaux has to be restarted, but I guess it is preferable to repeated unsticking of the optic, which presumably applies considerable shear force on the magnets...


After Wednesday's locking effort, Eric had set the IFO to the PRMI configuration, so that we could collect some training data for the PRC angular feedforward filters and see if the filter has changed since it was last updated. We should have plenty of usable data, so I have restored the arms now.

  12592   Wed Nov 2 22:56:45 2016   gautam   Update   CDS   c1pem revamped

Installing the BLRMS 2k blocks turned out to be quite non-trivial due to a whole host of CDS issues that had to be debugged, but I've restored everything to a good state now, and the channels are being logged. Detailed entry with all the changes to follow.

  12595   Thu Nov 3 12:38:42 2016   gautam   Update   CDS   c1pem revamped

A number of changes were made to C1PEM and some library parts. Recall that the motivation was to add BLRMS channels for all our suspension coils and shadow sensor PDs, which we are first testing out on the IMC mirrors.

Here is the summary:

BLRMS_2k library block

  • The custom C code block in this library part was named 'BLRMSFILTER', which conflicted with the name of the function call in the C code it is linked to and led to compilation errors
  • Even though the part was found in /opt/rtcds/userapps/release/cds/c1/models and not in the common repository, just to be safe, I made a copy of the part called BLRMS_2k_40m which lives in the above directory. I also made a copy of the code it calls in /opt/rtcds/userapps/release/cds/c1/src

C1PEM model + filter channels

  • Adding the updated BLRMS_2k_40m library part still resulted in some compilation errors - specifically, it was telling me to check for missing links around the ADC parts
  • Eric suggested that the error messages might not be faithfully reporting what the problem is - true enough, the problem lay in the fact that c1pem wasn't updated to follow the namespace convention that we now use in all the RT models - the compiler was getting confused by the fact that the BLRMS stuff was in a namespace block called 'SUS', but the rest of the PEM stuff wasn't in such a block
  • I revamped c1pem to add namespace blocks called PEM and DAF, and put the appropriate stuff in the blocks, after which there were no more compilation errors
  • However, this namespace convention messed up the names of the filter modules and associated channels - this was resolved with Eric's help (find and replace did the job, this is a familiar problem that we had encountered not too long ago when C1IOO was similarly revamped...)
  • There was one last twist in that the model would compile and install, but just would not start. I tried the usual voodoo of restarting all the models, and even did a soft reboot of c1sus, to no avail. Looking at dmesg, I tracked the problem down to a burt restore issue - the solution was to press the little 'BURT' button next to c1pem on the CDS overview MEDM screen as soon as it appeared while restarting the model

All the channels seem to exist, and FB seems to not be overloaded judging by the performance overnight up till the power outage. I will continue to monitor this...

GV Edit 3 Nov 2016 7pm:

I had meant to check the suitability of the filters used - there is a detailed account of the filters implemented in BLRMSFILTER.c here, and I quickly looked at the file on hand to make sure the BP filters made sense (see Attachment #1). The BP filters are 8th order elliptical filters and the lowpass filters are 16th order elliptical filters scaled for the appropriate frequency band. These are somewhat different from what we use on the seismometer BLRMS channels, where the filters are order 4, but I don't think we are significantly overloaded on the computational aspect, and the lowpass filters have sufficiently steep roll-off, so these should be okay...
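For readers unfamiliar with the BLRMS signal chain, here is a minimal pure-Python sketch of the idea (band-pass, square, low-pass, square root). It is not the BLRMSFILTER.c implementation: it swaps the elliptic filters described above for a single second-order band-pass and a one-pole low-pass, and the function name, Q value and corner frequency are assumptions chosen only for illustration.

```python
import math


def blrms(samples, fs, f_center, q=2.0, f_lowpass=0.5):
    """Band-limited RMS: band-pass -> square -> low-pass -> square root.

    Illustrative stand-in only: one biquad band-pass and a one-pole
    low-pass, not the elliptic filters used in the real code.
    """
    # Second-order band-pass with unity gain at f_center.
    w0 = 2.0 * math.pi * f_center / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    # Smoothing coefficient of the one-pole low-pass.
    k = 1.0 - math.exp(-2.0 * math.pi * f_lowpass / fs)

    x1 = x2 = y1 = y2 = 0.0
    mean_sq = 0.0
    out = []
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2  # b1 is zero for this filter
        x2, x1 = x1, x
        y2, y1 = y1, y
        mean_sq += k * (y * y - mean_sq)  # running average of the squared signal
        out.append(math.sqrt(mean_sq))
    return out


# Example: a unit-amplitude 10 Hz tone sampled at 2048 Hz, band centred on 10 Hz.
fs = 2048
tone = [math.sin(2.0 * math.pi * 10.0 * i / fs) for i in range(4 * fs)]
print(blrms(tone, fs, 10.0)[-1])  # settles near 1/sqrt(2)
```

Higher-order filters, as in the real block, give steeper band edges at the cost of more computation per sample, which is the trade-off discussed above.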

Attachment 1: BLRMSresp.pdf
  12597   Thu Nov 3 13:36:16 2016   ericq   Update   CDS   c1pem revamped

It seems that the EX and EY BLRMS banks were missing the BP and LP filters for the 0.03-0.1 and 0.1-0.3 bands. I've copied over the filters from the BS seismometer.

However, if it looks like the integrated C code BLRMS block works out well, we could replace the seismometers' filter module heavy BLRMS blocks and cut down on the PEM model bloat.

  12599   Fri Nov 4 18:31:05 2016   Lydia   Update   CDS   c1auxex channels/pins for Acromag

Here are the channels we are planning to switch over from c1auxex to Acromag, and their current pin numbers on the existing VME boards. 

Analog inputs: 

C1:SUS-ETMX_UL_AIOut    #C0 S0
C1:SUS-ETMX_LL_AIOut    #C0 S1
C1:SUS-ETMX_UR_AIOut    #C0 S2
C1:SUS-ETMX_LR_AIOut    #C0 S3
C1:SUS-ETMX_Side_AIOut    #C0 S4
C1:SUS-ETMX_OL_SEG1    #C0 S5
C1:SUS-ETMX_OL_SEG2    #C0 S6
C1:SUS-ETMX_OL_SEG3    #C0 S7
C1:SUS-ETMX_OL_SEG4    #C0 S8
C1:SUS-ETMX_OL_X    #C0 S9
C1:SUS-ETMX_OL_Y    #C0 S10
C1:SUS-ETMX_OL_S    #C0 S11
C1:SUS-ETMX_ULPD    #C0 S12
C1:SUS-ETMX_LLPD    #C0 S13
C1:SUS-ETMX_URPD    #C0 S14
C1:SUS-ETMX_LRPD    #C0 S15
C1:SUS-ETMX_SPD    #C0 S16
C1:SUS-ETMX_ULV    #C0 S17
C1:SUS-ETMX_LLV    #C0 S18
C1:SUS-ETMX_URV    #C0 S19
C1:SUS-ETMX_LRV    #C0 S20
C1:SUS-ETMX_SideV    #C0 S21
C1:SUS-ETMX_ULPD_MEAN    #C0 S12
C1:SUS-ETMX_LLPD_MEAN    #C0 S13
C1:SUS-ETMX_SDPD_MEAN    #C0 S16

Analog Outputs:

C1:ASC-QPDX_S1WhiteGain    #C0 S0
C1:ASC-QPDX_S2WhiteGain    #C0 S1
C1:ASC-QPDX_S3WhiteGain    #C0 S2
C1:ASC-QPDX_S4WhiteGain    #C0 S3
C1:SUS-ETMX_ULBiasAdj    #C0 S4
C1:SUS-ETMX_LLBiasAdj    #C0 S5
C1:SUS-ETMX_URBiasAdj    #C0 S6
C1:SUS-ETMX_LRBiasAdj    #C0 S7
C1:LSC-EX_GREENLASER_TEMP    #C0 S0    (This appears to have the same pin as another channel. Is it not being used?)

Binary Outputs:

C1:SUS-ETMX_UL_ENABLE    #C0 S0
C1:SUS-ETMX_LL_ENABLE    #C0 S1
C1:SUS-ETMX_UR_ENABLE    #C0 S2
C1:SUS-ETMX_LR_ENABLE    #C0 S3
C1:SUS-ETMX_SD_ENABLE    #C0 S4
C1:ASC-QPDX_GainSwitch1    #C0 S7
C1:ASC-QPDX_GainSwitch2    #C0 S8
C1:ASC-QPDX_GainSwitch3    #C0 S9
C1:ASC-QPDX_GainSwitch4    #C0 S10
C1:AUX-GREEN_X_Shutter2    #C0 S15
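
Pin clashes like the one flagged for C1:LSC-EX_GREENLASER_TEMP can be found mechanically. The following is a small sketch, assuming each line has the form "channel #C<n> S<m>" as in the lists above; the function name is made up for illustration, and each list (analog in, analog out, binary out) should be checked separately since the same numbers recur across board types.

```python
import re
from collections import defaultdict

# Matches lines such as "C1:SUS-ETMX_ULBiasAdj    #C0 S4".
LINE = re.compile(r"^(?P<chan>\S+)\s+#C(?P<c>\d+)\s+S(?P<s>\d+)")


def shared_pins(lines):
    """Return {(C, S): [channels]} for every pin listed more than once."""
    groups = defaultdict(list)
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            groups[(int(m.group("c")), int(m.group("s")))].append(m.group("chan"))
    return {pin: chans for pin, chans in groups.items() if len(chans) > 1}


analog_outputs = """
C1:ASC-QPDX_S1WhiteGain    #C0 S0
C1:ASC-QPDX_S2WhiteGain    #C0 S1
C1:ASC-QPDX_S3WhiteGain    #C0 S2
C1:ASC-QPDX_S4WhiteGain    #C0 S3
C1:SUS-ETMX_ULBiasAdj    #C0 S4
C1:SUS-ETMX_LLBiasAdj    #C0 S5
C1:SUS-ETMX_URBiasAdj    #C0 S6
C1:SUS-ETMX_LRBiasAdj    #C0 S7
C1:LSC-EX_GREENLASER_TEMP    #C0 S0
""".splitlines()

for pin, chans in shared_pins(analog_outputs).items():
    print(pin, chans)
```

Run on the analog input list, the same check would also report the _MEAN channels, which share pins with the corresponding PD channels.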

  12600   Sat Nov 5 15:45:44 2016   rana   Update   CDS   c1auxex channels/pins for Acromag

We don't need to record any of the AIOut channels, the OL channels (since we record them fast), or the _MEAN channels (I think they must be CALC records or just bogus).

  12604   Mon Nov 7 19:49:44 2016   Johannes   Update   CDS   acromag chassis hooked up to PSL

[Lydia, Johannes]

We're waiting on the last couple electrical components to arrive that are needed to complete the acromag chassis, but it is essentially operational. Right now it is connected to the PSL Mephisto's diagnostics port, for which only a single XT1221 A/D unit is needed. We assigned the IP address 192.168.113.121 to it. For the time being I'm running a tmux session on megatron (named "acromag") that grabs and broadcasts the epics channels, with Lydia's original channel definitions. Since the chassis is 4U tall, there's not really any place in the rack for it, so we might want to move it to the X-end before we start shuffling rack components around. Once we finalize its location we can proceed with adding the channels to the frames.

For the eventual gradual replacement of the slow machines, we need to put some thought into the connectors we want in the chassis. If we want to replicate the VME crate connectors we probably need to make our own PCBs for them, as there don't seem to be panel-mount screw terminal blocks readily available for DIN 41612 connectors. Furthermore, if we want to add whitening/AA filters, the chassis may actually be large enough to accommodate them, and arranging things on the inside is quite flexible. There are a few things to be considered when moving forward, for example how many XT units we can practically fit in the chassis (space availability, heat generation, and power requirements) and thus how many channels/connectors we can support with each.

Steve: 1X3 has plenty of room

Attachment 1: acromag_chassis_location.jpg
Attachment 2: acromag_chassis_top_view.jpg
  12607   Tue Nov 8 17:51:09 2016   Lydia   Update   CDS   acromag chassis hooked up to PSL

We set up the chassis in 1X7 today. Steve is ordering a longer 25 pin cable to reach. Until then the PSL diagnostic channels will not be usable.

  12608   Wed Nov 9 11:40:44 2016   ericq   Update   CDS   safe.snap BURT files now in svn

This is long overdue, but our burt files for SDF now live in the LIGO userapps SVN, as they should.

The canonical files live in places like /opt/rtcds/userapps/release/cds/c1/burtfiles/c1x01_safe.snap and are symlinked to files like /opt/rtcds/caltech/c1/target/c1x01/c1x01epics/burt/safe.snap
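
A sketch of how such a link could be created for one model. It assumes the reading that the userapps file is the real file and the target-area safe.snap is the symlink pointing at it; the helper name is hypothetical and the example runs in a temporary directory rather than under /opt/rtcds.

```python
import os
import tempfile


def link_safe_snap(model, canonical_dir, target_root):
    """Point <target_root>/<model>/<model>epics/burt/safe.snap at the canonical file."""
    canonical = os.path.join(canonical_dir, f"{model}_safe.snap")
    link_dir = os.path.join(target_root, model, f"{model}epics", "burt")
    os.makedirs(link_dir, exist_ok=True)
    link = os.path.join(link_dir, "safe.snap")
    # Replace an existing symlink; a regular file is left alone and os.symlink
    # will raise FileExistsError, so nothing is silently overwritten.
    if os.path.islink(link):
        os.remove(link)
    os.symlink(canonical, link)
    return link


# Demonstration in a scratch directory with the same layout as the paths above.
with tempfile.TemporaryDirectory() as root:
    canonical_dir = os.path.join(root, "userapps", "burtfiles")
    os.makedirs(canonical_dir)
    open(os.path.join(canonical_dir, "c1x01_safe.snap"), "w").close()
    link = link_safe_snap("c1x01", canonical_dir, os.path.join(root, "target"))
    print(os.path.islink(link), os.readlink(link))
```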

  12610   Thu Nov 10 19:02:03 2016   gautam   Update   CDS   EPICS Freezes are back

I've been noticing over the last couple of days that the EPICS freezes are occurring more frequently again. Attached is an instance of StripTool traces flatlining. Not sure what has changed recently in terms of the network to cause the return of this problem... Also, they don't occur simultaneously on multiple workstations, but they do pop up on both pianosa and rossa.

Not sure if it is related, but we have had multiple slow machine crashes today as well. Specifically, I had to power cycle C1PSL, C1SUSAUX, C1AUX, C1AUXEX and C1IOOL0 at some point today.

Attachment 1: epicsFreezesBack.png
  12613   Mon Nov 14 14:21:06 2016   gautam   Summary   CDS   Replacing DIMM on Optimus

I replaced the suspected faulty DIMM earlier today (actually I replaced a pair of them as per the Sun Fire X4600 manual). I did things in the following sequence, which was the recommended set of steps according to the maintenance manual and also the set of graphics on the top panel of the unit:

  1. Checked that Optimus was shut down
  2. Removed the power cables from the back to cut the standby power. Two of the fan units near the front of the chassis were displaying fault lights, perhaps this has been the case since the most recent power outage after which I did not reboot Optimus
  3. Took off the top cover, removed CPU 6 (labelled "G" in the unit). The manual recommends finding faulty DIMMs by looking for an LED that is supposed to indicate the location of the bad card, but I couldn't find any such LEDs in the unit we have, perhaps this is an addition to the newer modules?
  4. Replaced the topmost (w.r.t the orientation the CPU normally sits inside the chassis) DIMM card with one of the new ones Steve ordered
  5. Put everything back together, powered Optimus up again. Reboot went smoothly, fan unit fault lights which I mentioned earlier did not light up on the reboot so that doesn't look like an issue.

I then checked for memory errors using edac-utils, and over the last couple of hours, found no errors (corrected or otherwise, see Praful's earlier elog for the error messages that we were getting prior to the DIMM swap). I guess we will need to monitor this for a while more before we can say that the issue has been resolved.

Looking at dmesg after the reboot, I noticed the following error messages (not related to the memory issue I think):

[   19.375865] k10temp 0000:00:18.3: unreliable CPU thermal sensor; monitoring disabled
[   19.375996] k10temp 0000:00:19.3: unreliable CPU thermal sensor; monitoring disabled
[   19.376234] k10temp 0000:00:1a.3: unreliable CPU thermal sensor; monitoring disabled
[   19.376362] k10temp 0000:00:1b.3: unreliable CPU thermal sensor; monitoring disabled
[   19.376673] k10temp 0000:00:1c.3: unreliable CPU thermal sensor; monitoring disabled
[   19.376816] k10temp 0000:00:1d.3: unreliable CPU thermal sensor; monitoring disabled
[   19.376960] k10temp 0000:00:1e.3: unreliable CPU thermal sensor; monitoring disabled
[   19.377152] k10temp 0000:00:1f.3: unreliable CPU thermal sensor; monitoring disabled

I wonder if this could explain why the fans on Optimus often go into overdrive and make a racket? For the moment, the fan volume seems normal, comparable to the other SunFire X4600s we have running like megatron and FB...
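
To keep track of which sensors report this across reboots, the k10temp lines can be pulled out of saved dmesg output. A minimal sketch, assuming the message format shown above; the function name is made up for illustration.

```python
import re

# Matches e.g. "[   19.375865] k10temp 0000:00:18.3: unreliable CPU thermal sensor; ..."
PATTERN = re.compile(
    r"^\[\s*(?P<t>\d+\.\d+)\]\s+k10temp\s+(?P<dev>[0-9a-f:.]+):\s+unreliable CPU thermal sensor"
)


def disabled_k10temp_devices(dmesg_text):
    """Return the PCI addresses whose k10temp monitoring was disabled."""
    devices = []
    for line in dmesg_text.splitlines():
        m = PATTERN.match(line.strip())
        if m:
            devices.append(m.group("dev"))
    return devices


sample = """
[   19.375865] k10temp 0000:00:18.3: unreliable CPU thermal sensor; monitoring disabled
[   19.375996] k10temp 0000:00:19.3: unreliable CPU thermal sensor; monitoring disabled
"""
print(disabled_k10temp_devices(sample))
```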

  12615   Mon Nov 14 19:32:51 2016   rana   Summary   CDS   Replacing DIMM on Optimus

I did apt-get update and then apt-get upgrade on optimus. All systems are nominal.

  12632   Mon Nov 21 19:54:13 2016   Johannes   Update   CDS   acromag chassis hooked up to PSL

[Lydia, Johannes]

We connected and powered up the Acromag chassis today. It lives in 1X4 and is powered by the Sorensen +20V power supply in 1X5 via the fuse rail on the side of 1X4. For this we had to branch off the 20V path to the dewhitening and anti-image filter crate of the c1susaux-driven SOS optics. After confirming that none of the daughter modules in the crate draw from the 20V line, we added a wire leading to a new fuse for this unit and ran a power cable from there.

The diagnostic connector of the PSL laser is now connected to the unit and a tmux session was created on megatron that interfaces with the chassis and broadcasts the EPICS channels. We need to watch out in the coming days for epics freezes/outages, as in the past these seemed to occur around the same times we were toying with the Acromags.

Quote:

We set up the chassis in 1X7 today. Steve is ordering a longer 25 pin cable to reach. Until then the PSL diagnostic channels will not be usable.

 

Attachment 1: acromag_chassis.jpg
  12649   Wed Nov 30 11:56:56 2016   Lydia   Update   CDS   Wiring for Acromag auxex replacement

I've attached a schematic for how we will connect the Acromag modules to the slow channel I/O currently going to c1auxex. The following changes are made:

  • We are getting rid of the slow readbacks from the Anti-Image and Oplev boards, as Rana says they are unnecessary.
  • The whitening switching for the QPD is currently done by a Contec "fast" binary I/O module, but can be managed by Acromag instead. This allows CAB_1Y9_34 to be fed directly into the Acromag box since all of its connections can now be managed slow.
  • There's no need to change the PD whitening scheme around (since the signals never get huge), so we can set those to always be on and then lose those Contec channels. This means all of the necessary pins on CAB_1Y9_10 can go to Acromag.
  • All the other backplane cables go to the fast machines only.

 

Attachment 1: auxex_acromag.pdf
  12651   Wed Nov 30 14:54:01 2016   Johannes   Update   CDS   Slow machine replacement

I was talking with Larry yesterday, and he suggested the rack-mounted supermicro machines SYS-5017A-EP (~$400) or SYS-5018A-FTN4 (~$600) that he uses for moving data around in LIGO. They have 2 gigabit ethernet ports and can thus function as modbus gateways, conveniently placed in the rack close to the slow DAQ/DIO chassis and running some local ubuntu or other distro (I think Aidan uses CentOS in the PSL lab). These only have atom processors, which would be sufficient for the slow machine replacement, but there are many more powerful models with sometimes subtle differences. If we move towards a more complete GigECam coverage in the lab it could be better to kill two birds with one stone and get something a little faster that can do the video capture/processing, since these machines will be distributed more or less strategically around the lab. Just a thought, as I currently have no clear idea what resources are required for this or how much we're throwing at this GigECam upgrade.

 

Quote:

I've attached a schematic for how we will connect the Acromag modules to the slow channel I/O currently going to c1auxex. The following changes are made:

  • We are getting rid of the slow readbacks from the Anti-Image and Oplev boards, as Rana says they are unnecessary.
  • The whitening switching for the QPD is currently done by a Contec "fast" binary I/O module, but can be managed by Acromag instead. This allows CAB_1Y9_34 to be fed directly into the Acromag box since all of its connections can now be managed slow.
  • There's no need to change the PD whitening scheme around (since the signals never get huge), so we can set those to always be on and then lose those Contec channels. This means all of the necessary pins on CAB_1Y9_10 can go to Acromag.
  • All the other backplane cables go to the fast machines only.

 

 

  12677   Wed Dec 14 19:16:57 2016   Lydia   Update   CDS   Acromag Binary I/O testing

I looked into converting the QPD whitening switches for the X end to Acromag.

  • To test this out and be able to freely toggle filters without messing anything up, I added a temporary dummy cdsFiltCtrl module (ACROMAG_BIO_TEST) to the c1scx model.
  • The filters can be toggled from the automatically generated medm screen medm/c1scx/C1SCX_ACROMAG_BIO_TEST.adl
  • The control output of the dummy filter bank is sent to a channel named C1:SCX-ACROMAG_SWCTRL.
  • I was able to read in the appropriate bits from there and send them to the appropriate acromag channel using a calcout channel.
    • I couldn't get individual bo channels to work. This Acromag module is configured to write to 4 channels at a time, so I set that up with an analog output channel. The calcout channel shifts each relevant bit from C1:SCX-ACROMAG_SWCTRL to the right place for writing to the Acromag. 
  • I connected the Acromag XT1111 Binary I/O unit to a temporary power supply and verified that toggling the filters on and off changed the output appropriately. This is a sinking output model so the output pin is connected to the return if the switch is on. 
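
The bit shuffling done by the calcout channel can be written out as follows. This is only a sketch of the logic, not the actual EPICS record: the function name is made up, and the source bit positions in the example are hypothetical since the entry does not say which bits of C1:SCX-ACROMAG_SWCTRL carry the filter states.

```python
def pack_bits(control_word, source_bits):
    """Build an output word whose bit i is bit source_bits[i] of control_word."""
    out = 0
    for dest, src in enumerate(source_bits):
        out |= ((control_word >> src) & 1) << dest
    return out


# Hypothetical mapping: output bits 0..3 taken from control-word bits 2, 4, 0, 1.
value = pack_bits(0b10100, [2, 4, 0, 1])
print(bin(value))  # bits 2 and 4 are set in the input, so output bits 0 and 1 are set
```

Writing the four bits as one integer matches the approach described above, where a single analog output record writes to 4 channels at a time.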

The plan from here:

  • Determine how to configure these outputs to be compatible with the QPD whitening board.
  • Modify the SUS PD whitening board to always use the analog filter and remove digital option in models.
  • Test DACs 
  • Verify that the QPD whitening gain switches aren't doing anything
  • Assemble new Acromag box for X end and connect to QPD whitening, SUS PD whitening and SOS driver boards
  12699   Tue Jan 10 16:20:11 2017   Steve   Update   CDS   power glitch......Raid is rebuilding

Jamie started the fm40m Raid rebuilding. It has been beeping since the power outage.

The summary pages have shown no readings since the power glitch.

 

Attachment 1: rebuilding_in_progress.png
  12700   Tue Jan 10 21:47:00 2017   rana   Update   CDS   power glitch

Does "done" mean they are OK or they are somehow damaged? Do you mean the workstations or the front end machines?

The computers are all done.

megatron and optimus are not responding to ping commands or ssh -- please power them up if they are off; we need them to get data remotely

  12701   Tue Jan 10 22:55:43 2017   gautam   Update   CDS   power glitch - recovery steps

Here is a link to an elog with the steps I had to follow the last time there was a similar power glitch.

The RAID array restart was also done not too long ago. We should also do a data consistency check as detailed here, if that has not been done already.

If someone hasn't found the time to do this, I can take care of it tomorrow afternoon after I am back.

Quote:

Does "done" mean they are OK or they are somehow damaged? Do you mean the workstations or the front end machines?

The computers are all done.

megatron and optimus are not responding to ping commands or ssh -- please power them up if they are off; we need them to get data remotely

 

  12702   Wed Jan 11 16:35:03 2017   gautam   Update   CDS   power glitch - recovery progress

[lydia, ericq, gautam]

We set about following the instructions linked in the previous elog. A few notes/remarks:

  1. It is important to run the ntpdate commands before restarting the models. Sometimes, multiple restarts of the models were required to turn all the indicator blocks on the MEDM screen green.
  2. There was also an issue of multiple ntpd processes running on the same machine, which obviously caused all sorts of timing havoc. EricQ helped us diagnose and fix these. At the moment, all the lights are green on the CDS status MEDM screen
  3. On the hardware side, apart from the usual suspects of frontends/megatron/optimus/fb needing to be rebooted, I noticed that the ETMX OSEM lights were off on the control room monitors. Investigation pointed to the two 20V Sorensens at the X end outputting 0V, 0A after the power glitch. We turned down both dials, and then gradually ramped them up again. Both Sorensens now read +/-20V, 0.3A, which is in agreement with the label stuck onto them.
  4. Restarted MC autolocker and FSS Slow scripts on megatron. I have not yet looked at the status of the nds2 server on megatron.
  5. 11 MHz Marconi has yet to be restarted - but I am unable to get even the IMC locked at the moment. For some reason, the RMS of the MC1 and MC3 coils are way higher than what I am used to seeing (~5mV rms as compared to the <1mV rms I am used to seeing for a damped optic). I will investigate further. Leaving MC autolocker disabled for now.
  12708   Thu Jan 12 17:31:51 2017   gautam   Update   CDS   DC errors

The IFO is more or less back to an operational state. Some details:

  1. The IMC mirror excess motion alluded to in the previous elog was due to some timing issues on c1sus. The "DAC" and "DK" blocks in the c1x02 diag word were red instead of green. Restarting all the models on c1sus fixed the problem
  2. When c1ioo was restarted, all of Koji's changes (digital) to the MC WFS servo were lost as they were not committed to the SDF. Eric suggested that I could just restore them from burt snapshots, which is what I did. I used the c1iooepics.snap file from 12:19PM PST on 26 December 2016, which was a time when the WFS servo was working well as per this elog by Koji. I have also committed all the changes to the SDF. IMC alignment has been stable for the last 4 hours.
  3. Johannes aligned and locked the arms today. There was a large DC offset on POX11, which was zeroed out by closing the PSL shutter and running LSC offsets. Both arms lock and stay aligned now.
  4. The doubling oven controller at the Y end was switched off. Johannes turned it on.
  5. Eric and I started a data consistency check on the RAID array yesterday, it has completed today and indicated no issues
  6. NDS2 is now running again on megatron so channel access from outside should(???) be possible again.

One error persists - the "DC" indicators (data concentrator?) on the CDS medm screen for the various models spontaneously go red and return to green often. Is this a known issue with an easy fix?

  12715   Fri Jan 13 21:41:23 2017   Koji   Update   CDS   DC errors

I think I fixed the DC error issue.

1. I added the leap second (leapsecond ?) entry for 2016/12/31, 23:59:60 UTC to daqdrc


[OLD]
set gps_leaps = 820108813 914803214 1119744016;
[NEW]
set gps_leaps = 820108813 914803214 1119744016 1167264018;

2. Restarted FB and all realtime models

Now I don't see any RED light.
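
For reference, the new gps_leaps value can be cross-checked with a few lines of Python. This sketch counts GPS seconds at a given UTC midnight from the GPS epoch (1980-01-06 00:00:00 UTC) and adds the GPS-UTC offset in force at that moment, which is 18 s from 2017-01-01 onwards. The function name is made up, and whether daqd expects the second of the leap itself or the first second after it is not stated in this entry, so the existing entries in the file should be taken as the guide to the convention.

```python
from datetime import datetime

GPS_EPOCH = datetime(1980, 1, 6)  # GPS time zero, 00:00:00 UTC


def gps_at_utc_midnight(year, month, day, gps_minus_utc):
    """GPS seconds at 00:00:00 UTC on the given date.

    gps_minus_utc is the number of leap seconds accumulated since the
    GPS epoch that are in effect at that instant (caller-supplied).
    """
    elapsed = datetime(year, month, day) - GPS_EPOCH
    return elapsed.days * 86400 + gps_minus_utc


# First UTC midnight after the 2016-12-31 leap second, with GPS-UTC = 18 s.
print(gps_at_utc_midnight(2017, 1, 1, 18))
```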

  12727   Tue Jan 17 20:47:23 2017   rana   Update   CDS   Simulink Webview updated

Seems like this stops working every ~2 years. It's been busted since early 2016 according to cron, so I fixed up the paths, restored some missing files, and committed things to the SVN (with comments!). Now it's working and grabbing the Web viewable versions of the front end models. Just need to restore its viewability and then the world can watch our models any time.

Quote:

Back in 2011, JoeB wrote some entries on how to automatically update the Simulink webview stuff.

Somehow, the cron broke down over the years. I reran the matlab file by hand today and it worked fine, so now you can see the up to date models using the internet.

https://nodus.ligo.caltech.edu:30889/FE/

 

  12754   Wed Jan 25 14:30:20 2017   gautam   Update   CDS   slow machine bootfest

[gautam, lydia]

We rebooted c1psl, c1iscaux and c1aux which were all showing the typical symptom of responding to ping but not to telnet (and also blanked out epics fields on the MEDM screens). Keyed all these crates.
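
The "answers ping but not telnet" symptom can be checked from a workstation before walking to the racks. A rough sketch, assuming the slow machines listen on the standard telnet port 23; the function name is made up, and a port that accepts a connection does not prove that the EPICS side is healthy.

```python
import socket


def tcp_port_open(host, port=23, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

Calling tcp_port_open("c1psl") and likewise for the other slow machines gives a quick list of which crates likely need to be keyed.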

Restored burt snapshots for c1psl, PMC locked fine, and IMC is also locked now.

Johannes forgot to elog this yesterday, but he rebooted c1susaux following the usual procedure to avoid getting ITMX stuck. 

 

  12762   Fri Jan 27 17:07:52 2017   Lydia   Update   CDS   slow machine bootfest

Rebooted c1iscaux, c1auxex and c1auxey, which were all not responding to telnet. The watchdogs for the ETMs were turned off and then I keyed all 3 crates. All slow machines are responding to telnet now. Both green lasers locked to the arms so I didn't do any burt restore.

  12763   Fri Jan 27 17:49:41 2017   jamie   Update   CDS   test of new daqd code on fb1

Just FYI I'm running a test of updated daqd code on fb1. 

fb1 has its own fiber to the daq network switch, so nothing had to be modified to do this test. This *should* not affect anything in the rest of the system, but as we all know these are famous last words....  If something is going haywire, and you can't get in touch with me and can't figure what else to do, you can just log on to fb1 and shut it down.  It's not writing any data to any of the network filesystems.

The daqd code under test is from the latest advLigoRTS 3.2.1 tag, which has daqd stability fixes that will hopefully address the problems we were seeing last time I tried this upgrade.  We'll see...

I'm going to let it run over the weekend, and will check in periodically.

  12765   Fri Jan 27 20:52:36 2017   gautam   Update   CDS   test of new daqd code on fb1

I'm not sure if this is related, but since this morning, I've noticed that the data concentrator errors have returned. Looking at daqd.log, there is a 1 second timing mismatch error that is being generated. Usually, manually running ntpdate on the front ends fixes this problem, but it did not work today.

Attachment 1: DCerrors.png
  12766   Fri Jan 27 21:21:35 2017   gautam   Update   CDS   c1pem revamped

The coil and PD BLRMS are useful tools in identifying when glitches occur in the PD readout, so I thought it would be good to install them for ITMY, ETMX and SRM (since I plan to switch the MC3 satellite box, which we suspect to be problematic, with the SRM one). For this purpose, I had to install some IPC SHMEM blocks in C1SUS and recompile. 24 IPC channels were added to pipe the coil, PD and Oplev signals from C1SUS to C1PEM - the recompilation went smoothly, and it doesn't look like the model computation time has increased significantly or that the model is any closer to timing out.

However, I was unable to install the BLRMS blocks in C1PEM, as when I tried to compile the model with BLRMS for these extra 24 channels, I got a compilation error saying that I have exceeded the maximum allowed 499 testpoints per channel. Is there any workaround to this? It would be possible to create a custom BLRMS block that doesn't have all those testpoints, maybe this is the way to go? Especially if we want to install these channels for all our SOS optics, and also replace the current Seismic BLRMS with this scheme for consistency?

GV edit: I have implemented this scheme - after backing up the original BLRMS_2k part, I made a new one with no testpoints and only EPICS readouts. Doing so allowed me to recompile c1pem without any issues; the CPU time seems to have gone up by 3us, from ~55us to 58us. So the BLRMS data record is only available at 16Hz, since there are no DQ channels in the BLRMS block - do we want these in any case? Let's see how this does over the weekend...

  12769   Sat Jan 28 12:05:57 2017   jamie   Update   CDS   test of new daqd code on fb1
Quote:

I'm not sure if this is related, but since this morning, I've noticed that the data concentrator errors have returned. Looking at daqd.log, there is a 1 second timing mismatch error that is being generated. Usually, manually running ntpdate on the front ends fixes this problem, but it did not work today.

If this problem started before ~4pm on Friday then it's probably unrelated, since I didn't start any of these tests until after that.  If unexplained problems persist then we can try shutting off the fb1 daqd and see if that helps.

  12770   Mon Jan 30 18:41:41 2017   jamie   Update   CDS   TEST ABORTED of new daqd code on fb1

I just aborted the fb1 test and reverted everything to the nominal configuration.  Everything looks to be operating nominally.  Front ends are mostly green except for c1rfm and c1asx which are currently not being acquired by the DAQ, and an unknown IPC error with c1daf.  Please let me know if any unusual problems are encountered.

The behavior of daqd on fb1 with the latest release (3.2.1) was not improved.  After turning on the full pipe it was back to crashing every 10 minutes or so when the full and second trend frames were being written out.  lame.  back to the drawing board...

  12776   Tue Jan 31 15:08:13 2017   ericq   Metaphysics   CDS   Minute Trend Koan

A novice was learning at the feet of Master Daqd. At the end of the lesson he looked through his notes and said, “Master, I have a few questions. May I ask them?”

Master Daqd nodded.

"Do we record minute trends of our data?"

"Yes, we record raw minute trends in /frames/trend/minute_raw"

"I see. Do we back up minute trends?"

"Yes, we back up all frames present in /frames/trend/minute"

"Wait, this means we are not recording our current trends! What is the reason for the existence of separate minute and minute_raw trends?"

“The knowledge you seek can be answered only by the gods.”

"Can we resume recording the minute trends?"

Master Daqd nodded, turned, and threw himself off the railing, falling to his death on the rocks below.

Upon seeing this, the novice was enlightened. He proceeded to investigate how to convert raw minute trends to minute trends so that historical records could be preserved, and precisely when Master Daqd started throwing himself off the mountain when asked to record minute trends.

  12777   Tue Jan 31 17:28:36 2017   rana   Summary   CDS   Minute Trend Koan

Someone installed "Debian" on allegra. Why? Dataviewer doesn't work on there. Is there some advantage to making this thing have a different OS than the others? Any objections to going back to Ubuntu12?

  12779   Tue Jan 31 20:25:26 2017   ericq   Summary   CDS   Minute Trend Koan
Quote:

Someone installed "Debian" on allegra. Why? Dataviewer doesn't work on there. Is there some advantage to making this thing have a different OS than the others? Any objections to going back to Ubuntu12?

My elog negligence punchcard is getting pretty full... It's pretty much for the same reason as using Debian for optimus; much of the workstation software is getting packaged for Debian, which could offload our need for setting things up in a custom 40m way. Hacking the debian-focused software.ligo.org repos into Ubuntu has caused me headaches in the past. Allegra wasn't being used often, so I figured it was a good test bed for trying things out.

The dataviewer issue was dataviewer's inability to pull the `fb` out of `fb:8088` in the NDSSERVER env variable. I made a quick fix for it in the dataviewer launching script, but there is probably a better way to do it.
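For reference, the host/port split involved here can be sketched as follows. This is only an illustration of the parsing step, not the actual change made to the dataviewer launching script; the function name `split_ndsserver`, the handling of a comma-separated list, and the fallback port are assumptions made for the example.

```python
def split_ndsserver(value, default_port=8088):
    """Split an NDSSERVER-style string such as 'fb:8088' into (host, port).

    Assumptions for this sketch: only the first comma-separated entry is
    used, and a missing port falls back to default_port.
    """
    first = value.split(",")[0].strip()
    host, sep, port = first.partition(":")
    return host, (int(port) if sep and port else default_port)


# In a launcher the string would come from the NDSSERVER environment
# variable; the literal below is the example value from this entry.
host, port = split_ndsserver("fb:8088")
print(host, port)
```

A launcher could then hand only the host part to a tool that cannot parse the combined `host:port` form itself.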

  12781   Tue Jan 31 22:15:02 2017   Johannes   Update   CDS   vme crate backplane adapter boards

I made a crude sketch of how Lydia and I envision solving the connector situation on the back of the vme crates. Essentially, the side panels of each crate extend about 2" (52 mm) beyond the edge of the DIN connectors. This is plenty of space for a simple PCB. The connector of choice is D-Sub. We can split the 64 used pins into either 2x 37-pin D-Subs or 2x 25-pin + 1x 15-pin D-Subs. The former needs fewer cables but leaves a few unused leads. A quick Google search showed me that it is much cheaper to get twisted pair cables for 15 and 25 pin D-Subs. From what I remember, the used pins on the DIN connectors are concentrated at the low-number and high-number ends, so we might not need the 'middle' connector in many cases if we decide to break it up into three. I have to check this with Lydia though.
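The pin budget for the two options can be checked with a few lines. This is just the arithmetic from the paragraph above; the names `USED_PINS`, `options` and `spare_pins` are made up for the example, and it assumes all 64 used pins have to be carried through.

```python
USED_PINS = 64  # pins in use on each DIN connector, per the text above

# Candidate D-Sub combinations, as lists of connector pin counts.
options = {
    "2x 37-pin": [37, 37],
    "2x 25-pin + 1x 15-pin": [25, 25, 15],
}


def spare_pins(connectors, used=USED_PINS):
    """Return the number of D-Sub pins left over after carrying `used` signals."""
    return sum(connectors) - used


for name, connectors in options.items():
    print(f"{name}: {sum(connectors)} pins, {spare_pins(connectors)} spare")
```

Under that assumption the two-connector option leaves 10 leads unused and the three-connector option leaves 1.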

The D-Sub connectors would be panel mounted, for which we need a narrow panel piece with dsub cutouts. We can run horizontal struts across the vme crate from side panel to side panel. This way the force upon cable (dis)connection is mostly on the panel which is attached to the struts which are attached to the crate. This will also prevent gravitational sag or cable strain from pulling on the DIN connection, and we can use twisted pair cables with backshell, screws, and strain reliefs.

I was looking into getting started with the PCB when Altium complained that the license has expired and needs to be renewed. This is a relatively simple board layout so some free software out there is probably enough.

Attachment 1: vme_backplane_conn_sketch.jpg
  12791   Thu Feb 2 18:28:29 2017   rana   Summary   CDS   Minute Trend Koan

and the song remains the same...

The version of SVN on these workstations is ahead of the one on the other workstations, so now we can't do 'svn up' on any of the Ubuntu12 machines. On allegra and optimus I get this error:

controls@allegra|GWsummaries> svn up
Updating '.':
svn: E180001: Unable to connect to a repository at URL 'file:///cvs/cds/caltech/svn/trunk/GWsummaries'
svn: E180001: Unable to open an ra_local session to URL
svn: E180001: Unable to open repository 'file:///cvs/cds/caltech/svn/trunk/GWsummaries'

Quote:
Quote:

Someone installed "Debian" on allegra. Why? Dataviewer doesn't work on there. Is there some advantage to making this thing have a different OS than the others? Any objections to going back to Ubuntu12?

My elog negligence punchcard is getting pretty full... It's pretty much for the same reason as using Debian for optimus; much of the workstation software is getting packaged for Debian, which could offload our need for setting things up in a custom 40m way. Hacking the debian-focused software.ligo.org repos into Ubuntu has caused me headaches in the past. Allegra wasn't being used often, so I figured it was a good test bed for trying things out.

The dataviewer issue was dataviewer's inability to pull the `fb` out of `fb:8088` in the NDSSERVER env variable. I made a quick fix for it in the dataviewer launching script, but there is probably a better way to do it.

I'm not sure if it's possible to downgrade our chans repo back to the old one, but I highly recommend that no one do 'svn upgrade' in any of our repos until we remove all of the Debian installs in the 40m lab or hire a full-time sysadmin.

  12794   Fri Feb 3 11:03:06 2017   jamie   Update   CDS   more testing fb1; DAQ DOWN DURING TEST

More testing of fb1 today.  DAQ DOWN UNTIL FURTHER NOTICE.

Testing Wednesday did not resolve anything, but Jonathan Hanks is helping.

  12796   Fri Feb 3 11:40:34 2017   ericq   Summary   CDS   /cvs/cds/caltech/chans back on svn1.6

I was able to bring back svn 1.6 formatting to /cvs/cds/caltech/chans by doing the following on nodus:

cd /cvs/cds/caltech
mkdir newchans
cd newchans
svn co https://nodus.ligo.caltech.edu:30889/svn/trunk/chans ./
rm -rf ../chans/.svn
mv ./.svn ../chans/

Note that I used the http address for the repository. The svn repository doesn't live at file:///cvs/cds/caltech/svn anymore; all of our checkouts (e.g. in the scripts directory) use http to get the one true repo location, regardless of where it lives on nodus' filesystem. (I suppose we could also use https://nodus.martian:30889/svn to stick to the local network, but I don't think we're that limited by the caltech network speed)

Presumably, at some point we will want to introduce a newer operating system into the 40m, as ubuntu 12.04 hits end-of-life in April 2017. Ubuntu 16.04 includes svn 1.8, so we'll also hit this issue if we choose that OS. 


Aside from the svn issues, this directory (/cvs/cds/caltech/chans) only contains pre-2010 channels. Filters and DAQ ini files currently live in /opt/rtcds/caltech/c1/chans, which is not under version control. It's also not clear to me why summary page configurations should be kept in this /cvs/cds place.

  12797   Sat Feb 4 12:00:59 2017   rana   Summary   CDS   /cvs/cds/caltech/chans back on svn1.6

True - it's an issue. Koji and I are updating zita into Ubuntu16 LTS. If it looks like it's OK with various tools we'll swap over the others into it. Until then I figure we're best off turning allegra back into Ubuntu12 to avoid a repeat of this kind of conflict. Once the workstations in the LLO control room are running smoothly on a new OS for a year, we can transfer into that. I don't think any of us wants to be the CDS beta tester for DV or DTT.

  12798   Sat Feb 4 12:20:39 2017   jamie   Summary   CDS   /cvs/cds/caltech/chans back on svn1.6
Quote:

True - it's an issue. Koji and I are updating zita into Ubuntu16 LTS. If it looks like it's OK with various tools we'll swap over the others into it. Until then I figure we're best off turning allegra back into Ubuntu12 to avoid a repeat of this kind of conflict. Once the workstations in the LLO control room are running smoothly on a new OS for a year, we can transfer into that. I don't think any of us wants to be the CDS beta tester for DV or DTT.

Just to be clear, since there seems to be some confusion, the SVN issue has nothing to do with Debian vs. Ubuntu.  SVN made non-backwards compatible changes to their working copy data format that breaks newer checkouts with older clients.  You will run into the exact same problem with newer Ubuntu versions.

I recommend the 40m start moving towards the reference operating systems (Debian 8 or SL7) as that's where CDS is moving.  By moving to newer Ubuntu versions you're moving away from CDS support, not towards it.
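One rough way to tell which side of the working copy format change a given checkout is on is to look for the `wc.db` file that Subversion 1.7 and later keep in the top-level `.svn` directory. The sketch below assumes that the presence of `.svn/wc.db` is a sufficient marker; the function name `working_copy_layout` is made up for the example, and it should be run on the root of the checkout since newer layouts keep no `.svn` in subdirectories.

```python
from pathlib import Path


def working_copy_layout(path):
    """Classify the Subversion working-copy metadata found under `path`."""
    svn_dir = Path(path) / ".svn"
    if not svn_dir.is_dir():
        return "no .svn directory here"
    if (svn_dir / "wc.db").is_file():
        # SQLite metadata store used by Subversion 1.7 and later.
        return "1.7 or newer (wc.db present)"
    return "pre-1.7 (no wc.db)"


# Example: classify the current directory.
print(working_copy_layout("."))
```

A checkout reported as "1.7 or newer" is the kind that an older client, such as svn 1.6, would refuse to work with.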

  12799   Sat Feb 4 12:29:20 2017   jamie   Summary   CDS   /cvs/cds/caltech/chans back on svn1.6

No, not confused on that point. We just will not be testing OS versions at the 40m or running multiple OS's on our workstations. As I've said before, we will only move to so-called 'reference' systems once they've been in use for a long time.

Quote:
Quote:

True - it's an issue. Koji and I are updating zita into Ubuntu16 LTS. If it looks like it's OK with various tools we'll swap over the others into it. Until then I figure we're best off turning allegra back into Ubuntu12 to avoid a repeat of this kind of conflict. Once the workstations in the LLO control room are running smoothly on a new OS for a year, we can transfer into that. I don't think any of us wants to be the CDS beta tester for DV or DTT.

Just to be clear, since there seems to be some confusion, the SVN issue has nothing to do with Debian vs. Ubuntu.  SVN made non-backwards compatible changes to their working copy data format that breaks newer checkouts with older clients.  You will run into the exact same problem with newer Ubuntu versions.

I recommend the 40m start moving towards the reference operating systems (Debian 8 or SL7) as that's where CDS is moving.  By moving to newer Ubuntu versions you're moving away from CDS support, not towards it.

 

  12800   Sat Feb 4 12:50:01 2017   jamie   Summary   CDS   /cvs/cds/caltech/chans back on svn1.6
Quote:

No, not confused on that point. We just will not be testing OS versions at the 40m or running multiple OS's on our workstations. As I've said before, we will only move to so-called 'reference' systems once they've been in use for a long time.

Ubuntu16 is not to my knowledge used for any CDS system anywhere.  I'm not sure how you expect to have better support for that.  There are no pre-compiled packages of any kind available for Ubuntu16.  Good luck, you big smelly doofuses. Nyah, nyah, nyah.

  12803   Mon Feb 6 15:18:08 2017   gautam   Update   CDS   slow machine bootfest

Had to reboot c1psl, c1susaux, c1auxex, c1auxey and c1iscaux today. PMC has been relocked. ITMX didn't get stuck. According to this thread, there have been two instances in the last 10 days in which c1psl and c1susaux have failed. Since we seem to be doing this often lately, I've made a little script that uses the netcat utility to check which slow machines respond to telnet. It is located at /opt/rtcds/caltech/c1/scripts/cds/testSlowMachines.bash.

The script can be executed by ./testSlowMachines.bash.
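For anyone who wants the same check from Python, a rough equivalent is sketched below. This is not the contents of testSlowMachines.bash; it simply tries to open a TCP connection to the telnet port (23) on each machine. The function names `responds` and `check_machines` and the 2 second timeout are assumptions for the example, and the hostnames only resolve on the lab network.

```python
import socket

# Slow machines named in this entry.
SLOW_MACHINES = ["c1psl", "c1susaux", "c1auxex", "c1auxey", "c1iscaux"]


def responds(host, port=23, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_machines(hosts, port=23, timeout=2.0):
    """Map each hostname to whether it accepted a connection."""
    return {host: responds(host, port, timeout) for host in hosts}
```

Calling `check_machines(SLOW_MACHINES)` from a control room workstation would return a dictionary with one True/False entry per machine, which is enough to decide which crates need to be keyed.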

  12810   Tue Feb 7 19:14:59 2017   Johannes   Update   CDS   vme crate backplane adapter board layout

After fighting with Altium for what seems like an eternity I have finished putting my vision of the vme crate backplane adapter board into an electronic format. It is dimensioned to fill the back space of the crate exactly. The connectors are panel mount and the PCB attaches to the connectors with screws, such that the whole thing will be mechanically much more stable than the current configuration. A mounting bracket will attach to horizontal struts that need to be installed in the crates, mechanical drawings to follow.

Attachment 1: vme_backplane.pdf
  12893   Mon Mar 20 11:18:58 2017   gautam   Update   CDS   No internet connectivity on control room machines

There is no internet connectivity on any of the control room machines. 

I have been trying to debug by tracing the cabling situation in the rack in the office area, and will update if/when this problem has been resolved. I had last come into the lab on Saturday and there was no problem then. The 40m wireless network servicing the office area seems to work fine.

 

  12894   Mon Mar 20 14:39:44 2017   gautam   Update   CDS   No internet connectivity on control room machines

Koji diagnosed that the NAT router was to blame for this problem. I simply power cycled this router, and now the connectivity has been restored. 

It was possible to log into nodus and then to pianosa - and it was also possible to log into the various control room machines once logged into nodus. However, the outward packets seemed to not get transmitted. Anyways, power cycling the NAT Router unit seems to have done the job.

Quote:

There is no internet connectivity on any of the control room machines. 

I have been trying to debug by tracing the cabling situation in the rack in the office area, and will update if/when this problem has been resolved. I had last come into the lab on Saturday and there was no problem then. The 40m wireless network servicing the office area seems to work fine.

 

 
