40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 290 of 344  Not logged in ELOG logo
ID Date Author Typedown Category Subject
  11134   Wed Mar 11 19:15:03 2015 KojiSummaryLSCROUGH calibration of the darm spectrum during the full PRFPMI lock

I made very rough calibration of the DARM spectra before and after the transition for the second lock on Mar 8.

The cavity pole (expected to be 4.3kHz) was not compensated. Also the servo bump was not compensated.

[Error calibration]

While the DARM/CARM were controlled with ALS, the calibration of them are provided by the ALS phase tracker calibration.
i.e 1 degree = 19.23kHz

This means that the calibration factor is

DARM [deg] * 19.23e3 [Hz/deg] / c [m Hz] * lambda [m] * L_arm [m]
= DARM* 19.23e3/299792458*1064e-9*38.5 = 2.6e-9 *DARM [m]

[Feedback calibration]

Then, the feedback signal was calibrated by the suspension response (f=1Hz, Q=5)
so that the error and feedback signals can match at 100Hz.

This gave me the DC factor of 5e-8.


The spectra at 1109832200 (ALS only, even not on the resonance) and 1109832500 (after DARM/CARM transitions) were taken.
Jenne said that the whitening filters for AS55Q was not on.

  11147   Thu Mar 19 16:58:19 2015 SteveSummaryIOOMC alignment not drifting; PSL beam is drifting

Polaris mounts ordered.

Quote:

In the attached plot you can see that the MC REFL fluctuations started getting larger on Feb 24 just after midnight. Its been bad ever since. What happened that night or the afternoon of Feb 23?
The WFS DC spot positions were far off (~0.9), so I unlocked the IMC and aligned the spots on there using the nearby steering mirrors - lets see if this helps.

Also, these mounts should be improved. Steve, can you please prepare 5 mounts with the Thorlabs BA2 or BA3 base, the 3/4" diameter steel posts, and the Polanski steel mirror mounts? We should replace the mirror mounts for the 1" diameter mirrors during the daytime next week to reduce drift.

 

  11148   Thu Mar 19 17:11:32 2015 steveSummarySUSoplev laser summary updated

           March  19, 2015   2  new  JDSU 1103P, sn P919645 & P919639 received from Thailand through Edmond Optics. Mfg date 12/2014............as spares

  11149   Fri Mar 20 10:51:09 2015 SteveSummaryIOOMC alignment not drifting; PSL beam is drifting

Are the two  visible small srews holding the adapter plate only?

If yes, it is the weakest point of the IOO path.

  11158   Mon Mar 23 09:42:29 2015 SteveSummaryIOO4" PSL beam path posts

To achive the same beam height each components needs their specific post height.

 Beam Height Base Plate Mirror Mount Lens Mount  Waveplates-Rotary 0.75" OD. SS Post Height                      
             
4" Thorlabs BA2   Newport LH-1   2.620"  
4" Thorlabs BA2 Polaris K1     2.620"  
4" Thorlabs BA2 Polaris K2     2.220"  
4" Thorlabs BA2   Thorlabs LMR1   2.750"  
4" Thorlabs BA2     New Focus 9401 2.120"  
4" Thorlabs BA2 Newport U100     2.620"  
4" Thorlabs BA2 Newport U200     2.120"  
4" Newport 9021   LH-1   2.0" PMC-MM lens with xy translation stage: Newport 9022, 9065A    Atm3
4" Newport 9021   LH-1   1.89 MC-MM lens with translation stage: Newport 9022, 9025        Atm2

We have 2.625" tall, 3/4" OD SS posts for Polaris K1 mirror mounts: 20 pieces

Ordered Newport LH-1 lens mounts with axis height 1.0 yes

 

  11171   Wed Mar 25 18:27:34 2015 KojiSummaryGeneralSome maintainance

- I found that the cable for the AS55 LO signal had the shielding 90% broken. It was fixed.

- The Mon5 monitor in the control room was not functional for months. I found a small CRT down the east arm.
It is now set as MON5 showing the picture from cameras. Steve, do we need any safety measure for this CRT?

  11173   Wed Mar 25 18:48:11 2015 KojiSummaryLSC55MHz demodulators inspection

[Koji Den EricG]

We inspected the {REFL, AS, POP}55 demodulators.

Short in short, we did the following changes:

- The REFL55 PD RF signal is connected to the POP55 demodulator now.
Thus, the POP55 signals should be used at the input matrix of the LSC screens for PRMI tests.

- The POP55 PD RF signal is connected to the REFL55 demodulator now.

- We jiggled the whitening gains and the whitening triggers. Whitening gains for the AS, REFL, POP PDs are set to be 9, 21, 30dB as before.
However, the signal gain may be changed. The optimal gains should be checked through the locking with the interferometer.


- Test 1

Inject 55.3MHz signal to the demodulators. Check the amplitude in the demodulated signal with DTT.
The peak height in the spectrum was calibrated to counts (i.e. it is not counts/rtHz)
We check the amplitude at the input of the input filters (e.g. C1:LSC-REFL55_I_IN1). The whitening gains are set to 0dB.
And the whitening filters were turned off.

REFL55
f_inj = 55.32961MHz -10dBm
REFL55I @999Hz  22.14 [cnt]
REFL55Q @999Hz  26.21 [cnt]


f_inj = 55.33051MHz -10dBm
REFL55I @ 99Hz  20.26 [cnt]  ~200mVpk at the analog I monitor
REFL55Q @ 99Hz  24.03 [cnt]


f_inj = 55.33060MHz -10dBm
REFL55I @8.5Hz  22.14 [cnt]
REFL55Q @8.5Hz  26.21 [cnt]


----
f_inj = 55.33051MHz -10dBm
AS55I   @ 99Hz 585.4 [cnt]
AS55Q   @ 99Hz 590.5 [cnt]   ~600mVpk at the analog Q monitor

f_inj = 55.33051MHz -10dBm
POP55I  @ 99Hz 613.9 [cnt]   ~600mVpk at the analog I monitor
POP55Q  @ 99Hz 602.2 [cnt]

We wondered why the REFL55 has such a small response. The other demodulators seems to have some daughter board. (Sigg amp?)
This maybe causing this difference.

-----

- Test 2

We injected 1kHz 1Vpk AF signal into whitening board. The peak height at 1kHz was measured.
The whitening filters/gains were set to be the same condition above.

f_inj = 1kHz 1Vpk
REFL55I 2403 cnt
REFL55Q
2374 cnt
AS55I   2374 cnt
AS55Q   2396 cnt
POP55I  2365 cnt
POP55Q
  2350 cnt

So, they look identical. => The difference between REFL55 and others are in the demodulator.

  11180   Fri Mar 27 20:32:17 2015 KojiSummaryLSCLocking activity

- Adjutsed the IMC WFS operating point. The IMC refl is 0.42-0.43.

- The arms are aligned with ASS

- The X arm green was aligned with ASX. PZT offsets slides were adjusted to offload the servo outputs.

- I tried the locking once and the transition was successfull. I even tried the 3f-1f transition but the lock was lost. I wasn't sure what was the real cause.

I need to go now. I leave the IFO at the state that it is waiting for the arms locked with IR for the full locking trial.

  11235   Wed Apr 22 11:48:30 2015 manasaSummaryGeneralDelay line frequency discriminator for FOL error signal

Since the Frequency counters have not been a reliable error signal for FOL PID loop, we will put together an analog delay line frequency dicriminator as an alternative method to obtain the beat frequency.

The configuration will be similar to what was done in elog 4254 in the first place.

For a delay line frequency dicriminator, the output at the mixer is proportional to cos(\theta_{b}) where \theta_{b} = 2 \pi f_{b}L/v

L - cable length asymmetry, fb - beat frequency and v - velocity of light in the cable.

The linear output signal canbe obtained for  0< \theta_{b}<\pi

For our purpose in FOL, if we would like to measure beat frequency over a bandwidth of 200MHz, this would correspond to a cable length difference of 0.5 m (assuming the speed of light in the coaxial cable is ~ 2x108m/s.

  11236   Wed Apr 22 14:56:18 2015 manasaSummaryGeneralDelay line frequency discriminator for FOL error signal

[Koji, Manasa]

Since the bandwidth of the fiber PD is ~ 1GHz, we could design the frequency discriminator to have a wider bandwidth (~ 500MHz). The output from the frequency discriminator could then be used to define the range setting of the frequency counter for readout or may be even error signal to the PID loop.

A test run for the analog DFD with cable length difference of 27cm gave a linear output signal with zero-crossing at ~206MHz.

Detailed schematic of the setup and plot (voltage vs frequency) will be updated shortly.

  11245   Fri Apr 24 21:31:20 2015 KojiSummaryCDSautomatic mxstream resetting

We were too much annoyed by frequent stall of mxstream. We'll update the RCG when time comes (not too much future).

But for now, we need an automatic mxstream resetting.

I found there is such a script already.

/opt/rtcds/caltech/c1/scripts/cds/autoMX

So this script was registered to crontab on megatron.
It is invoked every 5 minutes.

# Auto MXstream reset when it fails
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /opt/rtcds/caltech/c1/scripts/cds/autoMX >> /opt/rtcds/caltech/c1/scripts/cds/autoMX.log

  11252   Sun Apr 26 00:56:21 2015 ranaSummaryComputer Scripts / Programsproblems with new restart procedures for elogd and apache

Since the nodus upgrade, Eric/Diego changed the old csh restart procedures to be more UNIX standard. The instructions are in the wiki.

After doing some software updates on nodus today, apache and elogd didn't come back OK. Maybe because of some race condition, elog tried to start but didn't get apache. Apache couldn't start because it found that someone was already binding the ELOGD port. So I killed ELOGD several times (because it kept trying to respawn). Once it stopped trying to come back I could restart Apache using the Wiki instructions. But the instructions didn't work for ELOGD, so I had to restart that using the usual .csh script way that we used to use.

  11267   Fri May 1 20:33:31 2015 ranaSummaryComputer Scripts / Programsproblems with new restart procedures for elogd and apache

Same thing again todaysad. So I renamed the /etc/init/elog.conf so that it doesn't keep respawning bootlessly. Until then restart elog using the start script in /cvs/cds/caltech/elog/ as usual.

I'll let EQ debug when he gets back - probably we need to pause the elog respawn so that it waits until nodus is up for a few minutes before starting.

Quote:

Since the nodus upgrade, Eric/Diego changed the old csh restart procedures to be more UNIX standard. The instructions are in the wiki.

After doing some software updates on nodus today, apache and elogd didn't come back OK. Maybe because of some race condition, elog tried to start but didn't get apache. Apache couldn't start because it found that someone was already binding the ELOGD port. So I killed ELOGD several times (because it kept trying to respawn). Once it stopped trying to come back I could restart Apache using the Wiki instructions. But the instructions didn't work for ELOGD, so I had to restart that using the usual .csh script way that we used to use.

 

  11268   Sun May 3 01:04:19 2015 ranaSummaryPEMSeismo signals are bad

https://ldas-jobs.ligo.caltech.edu/~max.isi/summary/day/20150502/pem/seismic/

Looks like some of our seismometers are oscillating, not mounted well, or something like that. No reason for them to be so different.

Which Guralp is where? And where are our accelerometers mounted?

  11270   Mon May 4 10:21:09 2015 manasaSummaryGeneralDelay line frequency discriminator for FOL error signal

Attached is the schematic of the analog DFD and the plot showing the zero-crossing for a delay line length of 27cm. The bandwidth for the linear output signal obtained roughly matches what is expected from the length difference (370MHz) .

We could use a smaller cable to further increase our bandwidth. I propose we use this analog DFD to determine the range at which the frequency counter needs to be set and then use the frequency counter readout as the error signal for FOL.

 

  11272   Mon May 4 12:42:34 2015 manasaSummaryGeneralDelay line frequency discriminator for FOL error signal

Koji suggested that I make a cosine fit for the curve instead of a linear fit.

I fit the data to V(f) = A + B cos(2\pi f_{b}L/v) 
where L - cable length asymmetry (27 cm) , fb - beat frequency and v - velocity of light in the cable (2*10m/s)

The plot with the cosine fit is attached. 

Fit coefficients (with 95% confidence bounds):
       A =      0.4177  (0.3763, 0.4591)
       B =       2.941  (2.89, 2.992)

  11300   Mon May 18 14:46:20 2015 manasaSummaryGeneralDelay line frequency discriminator for FOL error signal

Measuring the voltage noise and frequency response of the Analog Delay-line Frequency Discriminator (DFD)

The schematic and an actual photo of the setup is shown below. The setup was checked to be physically sturdy with no loose connections or moving parts.

The voltage noise at the output of the DFD was measured using an SR785 signal analyzer while simultaneously monitoring the signal on an oscilloscope.

The noise at the output of the DFD was measured for no RF input and at several RF input frequencies including the zero crossing frequency and the optimum operating frequency of the DFD (20MHz).

The plot below show the voltage noise for different RF inputs to the DFD. It can be seen that the noise level is slightly lower at the zero crossing frequency where the amplitude noise is eliminated by the DFD.

I also did measurements to obtain the frequency response of the setup as the cable length difference has changed from the prior setup. The cable length difference is 21cm and the obtained linear signal at the output of the DFD extends over ~ 380MHz which is good enough for our purposes in FOL. A cosine fit to the data was done as before. //edit- Manasa: The gain of SR560 was set to 20 to obtain the data shown below//

Fit Coefficients (with 95% confidence bounds):
       a =     -0.8763  (-1.076, -0.6763)
       b =       3.771  (3.441, 4.102)

Data and matlab scripts are zipped and attached.

  11368   Mon Jun 22 12:57:09 2015 ericqSummaryLSCX/Y green beat mode overlap measurement redone

I took measurements at the green beat setup on the PSL table, and found that our power / mode overlap situation is still consistent with what Koji and Manasa measured last September [ELOG 10492]. I also measured the powers at the BBPDs with the Ophir power meter.

Both mode overlaps are around 50%, which is fine. 

The beatnote amplitudes at the BBPD outputs at a frequency of about 50MHz are -20.0 and -27.5 dBm for the X and Y beats, respectively. This is consistent with the measured optical power levels and a PD response of ~0.25 A/W at 532nm. The main reason for the disparity is that there is much more X green light than Y green light on the table (factor of ~20), and the greater amount of green PSL light on the Y BBPD (factor of ~3) does not quite make up for it. 

One way to punch up the Y beat a little might be to adjust the pickoff optics. Of 25uW of Y arm transmitted green light incident on the polarizing beamsplitter that seperates the X and Y beams, only 13uW makes it to the Y BBPD, but this would only win us a couple dBms at most. 

In any case, with the beat setup as it exists, it looks like we should design the next beatbox iteration to accept RF inputs of around -20 to -30 dBm. 


In the style of the referenced ELOG, here are today's numbers. 

            XARM   YARM 
o BBPD DC output (mV)
 V_DARK:   +  1.0  + 2.2
 V_PSL:    +  7.1  +21.3
 V_ARM:    +165.0  + 8.2


o BBPD DC photocurrent (uA)
I_DC = V_DC / R_DC ... R_DC: DC transimpedance (2kOhm)

 I_PSL:       3.6   10.7
 I_ARM:      82.5    4.1


o Expected beat note amplitude
I_beat_full = I1 + I2 + 2 sqrt(e I1 I2) cos(w t) ... e: mode overwrap (in power)

I_beat_RF = 2 sqrt(e I1 I2)

V_RF = 2 R sqrt(e I1 I2) ... R: RF transimpedance (2kOhm)

P_RF = V_RF^2/2/50 [Watt]
     = 10 log10(V_RF^2/2/50*1000) [dBm]

     = 10 log10(e I1 I2) + 82.0412 [dBm]
     = 10 log10(e) +10 log10(I1 I2) + 82.0412 [dBm]

for e=1, the expected RF power at the PDs [dBm]
 P_RF:      -13.2  -21.5


o Measured beat note power (no alignment done)     
 P_RF:      -20.0  -27.5  [dBm] (53.0MHz and 46.5MHz) 
    e:       45.7   50.1  [%]                         

 

  11370   Mon Jun 22 14:53:37 2015 ranaSummaryLSCX/Y green beat mode overlap measurement redone
  • Why is there a factor of 20 power difference? Some of it is the IR laser power difference, but I thought that was just a factor of 4 in green.
  • Why is the mode overlap only 50% and not more like 75%?
  • IF we have enough PSL green power, we could do the Y-beat with a 80/20 instead of a 50/50 and get better SNR.
  • The FFD-100 response is more like 0.33 A/W at 532 nm, not 0.25 A/W.

In any case, this signal difference is not big, so we should not need a different amplifier chain for the two signals. The 20 dB of amplification in the BeatBox was a fine way, but not great in circuit layout.

The BBPD has an input referred current noise of 10 pA/rHz and a transimpedance of 2 kOhm, so an output voltage noise of 20 nV/rHz (into 50 Ohms). This would be matched by an Amp with NF = 26 dB, which is way worse than anything we could bur from mini-circuits, so we should definitely NOT use anything like the low-noise, low output power amps used currently (e.g. ZFL-1000LN....never, ever use these for anything). We should use a single ZHL-3A-S (G = 25 dB, NF < 6 dB, Max Out = 30 dBm) for each channel (and nothing else) before driving the cables over to the LSC rack into the aLIGO demod board. I just ordered two of these now.

  11384   Tue Jun 30 11:33:00 2015 JamieSummaryCDSprepping for CDS upgrade

This is going to be a big one.  We're at version 2.5 and we're going to go to 2.9.3.

RCG components that need to be updated:

  • mbuf kernel module
  • mx_stream driver
  • iniChk.pl script
  • daqd
  • nds

Supporting software:

  • EPICS 3.14.12.2_long
  • ldas-tools (framecpp) 1.19.32-p1
  • libframe 8.17.2
  • gds 2.16.3.2
  • fftw 3.3.2

Things to watch out for:

  • RTS 2.6:
    • raw minute trend frame location has changed (CRC-based subdirectory)
    • new kernel patch
  • RTS 2.7:
    • supports "commissioning frames", which we will probably not utilize.  need to make sure that we're not writing extra frames somewhere
  • RTS 2.8:
    • "slow" (EPICS) data from the front-end processes is acquired via DAQ network, and not through EPICS.  This will increase traffic on the DAQ lan.  Hopefully this will not be an issue, and the existing network infrastructure can handle it, but it should be monitored.
  11390   Wed Jul 1 19:16:21 2015 JamieSummaryCDSCDS upgrade in progress

The CDS upgrade is now underway

Here's what's happened so far:

  • Installed and linked in all the RTS supporting software packages in /opt/rtapps (only on front end machines and fb):
    controls@c1lsc ~ 2$ find /opt/rtapps/ -mindepth 1 -maxdepth 1 -type l -ls
    12582916    0 lrwxrwxrwx   1 controls 1001           12 Jul  1 13:16 /opt/rtapps/gds -> gds-2.16.3.2
    12603452    0 lrwxrwxrwx   1 controls 1001           10 Jul  1 13:17 /opt/rtapps/fftw -> fftw-3.3.2
    12603451    0 lrwxrwxrwx   1 controls 1001           15 Jul  1 13:16 /opt/rtapps/libframe -> libframe-8.17.2
    12603450    0 lrwxrwxrwx   1 controls 1001           13 Jul  1 13:16 /opt/rtapps/libmetaio -> libmetaio-8.2
    12582915    0 lrwxrwxrwx   1 controls 1001           34 Jul  1 15:24 /opt/rtapps/framecpp -> ldas-tools-1.19.32-p1/linux-x86_64
    12582914    0 lrwxrwxrwx   1 controls 1001           20 Jul  1 13:15 /opt/rtapps/epics -> epics-3.14.12.2_long
  • Checked out the RTS source for the version we'll be using: 2.9.4

/opt/rtcds/rtscore/tags/advLigoRTS-2.9.4

  • built and installed all of the RTS components:
    • mbuf
    • mx_stream
    • daqd
    • nds
    • awgtpman
       
  • mx_stream is not working. Unknown why. It won't start on the front end machines (only tested on c1lsc so far) with the following error:
    controls@c1lsc ~ 1$ /opt/rtcds/caltech/c1/target/fb/mx_stream -s c1x04 c1lsc c1ass c1oaf c1cal -d fb:0
    mmapped address is 0x7ff7b71a0000
    send len = 263596
    mx_connect failed Remote Endpoint is Closed
    controls@c1lsc ~ 1$
    
    Have contact Keith T. and Rolf B. for backup.  This is a blocker, since this is what ferries the data from the front ends.
     
  • Rebuilt almost all models.  This was good.  Initially nothing would compile because of IPC creation errors, so I moved the old chans/ipc/C1.ipc file out of the way and generated a new one and then everything compiled (of course senders have to be compiled before receivers).
    I only had to fix a couple of things in the models themselves:
    • c1ioo - unterminated FiltCtrl inputs
    • C1_SUS_SINGLE_CONTROL - unterminated FiltCtrl inputs
    • c1oaf - bad part named "STATIC". There is some hacky namespace stuff going on in the RCG. I was able to just explode that part and it now works.
    • c1lsc - unterminated FiltCtrl inputs
    Haven't installed or tried to run anything yet, but the fact they compile is good.
    Some models are not compiling because they have C code in src blocks that are throwing errors:
    • c1lsc
    • c1cal
    It shouldn't be too hard to fix whatever is causing those compile errors.

That's it for today.  Will pick up again first thing tomorrow

  11392   Tue Jul 7 17:22:16 2015 JessicaSummary Time Delay in ALS Cables

I measured the transfer functions in the delay line cables, and then calculated the time delay from that.

The first cable had a time delay of 1272 ns and the second had a time delay of 1264 ns. 

Below are the plots I created to calculate this. There does seem to be a pattern in the residual plots however, which was not expected. 

The R-Square parameter was very close to 1 for both fits, indicating that the fit was good. 

  11393   Tue Jul 7 18:27:54 2015 JamieSummaryCDSCDS upgrade: progress!

After a couple of days of struggle, I made some progress on the CDS upgrade today:

Front end status:

  • RTS upgraded to 2.9.4, and linked in as "release":

/opt/rtcds/rtscore/release -> tags/advLigoRTS-2.9.4

  • mbuf kernel module built installed
  • All front ends have been rebooted with the latest patched kernel (from 2.6 upgrade)
  • All models have been rebuilt, installed, restarted.  Only minor model issues had to be corrected (unterminated unused inputs mostly).
  • awgtpman rebuilt, and installed/running on all front-ends
  • open-mx upgraded to 1.5.2:

/opt/open-mx -> open-mx-1.5.2

  • All front ends running latest version of mx_stream, built against 2.9.4 and open-mx-1.5.2.

We have new GDS overview screens for the front end models:

It's possible that our current lack of IRIG-B GPS distribution means that the 'TIM' status bit will always be red on the IOP models.  Will consult with Rolf.

There are other new features in the front ends that I can get into later.

DAQ (fb) status:

  • daqd and nds rebuilt against 2.9.4, both now running on fb

40m daqd compile flags:

cd src/daqd
./configure --enable-debug --disable-broadcast --without-myrinet --with-mx --enable-local-timing --with-epics=/opt/rtapps/epics/base --with-framecpp=/opt/rtapps/framecpp
make
make clean
install daqd /opt/rtcds/caltech/c1/target/fb/

However, daqd has unfortunately been very unstable, and I've been trying to figure out why.  I originally thought it was some sort of timing issue, but now I'm not so sure.

I had to make the following changes to the daqdrc:

set gps_leaps = 820108813 914803214 1119744016;

That enumerates some list of leap seconds since some time.  Not sure if that actually does anything, but I added the latest leap seconds anyway:

set symm_gps_offset=315964803;

This updates the silly, arbitrary GPS offset, that is required to be correct when not using external GPS reference.

Finally, the last thing I did that finally got it running stably was to turn off all trend frame writing:

# start trender;
# start trend-frame-saver;
# sync trend-frame-saver;
# start minute-trend-frame-saver;
# sync minute-trend-frame-saver;
# start raw_minute_trend_saver;

For whatever reason, it's the trend frame writing that that was causing things daqd to fall over after a short amount of time.  I'll continue investigating tomorrow.

 

We still have a lot of cleanup burt restores, testing, etc. to do, but we're getting there.

  11395   Wed Jul 8 17:46:20 2015 JessicaSummaryGeneralUpdated Time Delay Plots

I re-measured the transfer function for Cable B, because the residuals in my previous post for cable B indicated a bad fit. 

I also realized I had made a mistake in calculating the time delay, and calculated more reasonable time delays today. 

Cable A had a delay of 202.43 +- 0.01 ns.

Cable B had a delay of 202.44 +- 0.01 ns. 

  11396   Wed Jul 8 20:37:02 2015 JamieSummaryCDSCDS upgrade: one step forward, two steps back

After determining yesterday that all the daqd issues were coming from the frame writing, I started to dig into it more today.  I also spoke to Keith Thorne, and got some good suggestions from Gerrit Kuhn at GEO.

I  realized that it probably wasn't the trend writing per se, but that turning on more writing to disk was causing increased load on daqd, and consequently on the system itself.  With more frame writing turned on the memory consuption increased to the point of maxing out the physical RAM.  The system the probably starting swaping, which certainly would have choked daqd.

I noticed that fb only had 4G of RAM, which Keith suggested was just not enough.  Even if the memory consumption of daqd has increased significantly, it still seems like 4G would not be enough.  I opened up fb only to find that fb actually had 8G of RAM installed!  Not sure what happend to the other 4G, but somehow they were not visible to the system.  Koji and I eventually determined, via some frankenstein operations with megatron, that the RAM was just dead.  We then pulled 4G of RAM from megatron and replaced the bad RAM in fb, so that fb now has a full 8G of RAM cool.

Unfortunately, when we got fb fully back up and running we found that fb is not able to see any of the other hosts on the data concentrator network sad.  mx_info, which displays the card and network status for the myricom myrinet fiber card, shows:

MX Version: 1.2.16
MX Build: controls@fb:/opt/src/mx-1.2.16 Tue May 21 10:58:40 PDT 2013
1 Myrinet board installed.
The MX driver is configured to support a maximum of:
    8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
===================================================================
Instance #0:  299.8 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
    Status:        Running, P0: Wrong Network
    Network:    Myrinet 10G

    MAC Address:    00:60:dd:46:ea:ec
    Product code:    10G-PCIE-8AL-S
    Part number:    09-03916
    Serial number:    352143
    Mapper:        00:60:dd:46:ea:ec, version = 0x63e745ee, configured
    Mapped hosts:    1

                                                        ROUTE COUNT
INDEX    MAC ADDRESS     HOST NAME                        P0
-----    -----------     ---------                        ---
   0) 00:60:dd:46:ea:ec fb:0                            D 0,0

Note that all front end machines should be listed in the table at the bottom, and they're not.   Also note the "Wrong Network" note in the Status line above.  It appears that the card has maybe been initialized in a bad state?  Or Koji and I somehow disturbed the network when we were cleaning up things in the rack.  "sudo /etc/init.d/mx restart" on fb doesn't solve the problem.  We even rebooted fb and it didn't seem to help.

In any event, we're back to no data flow.  I'll pick up again tomorrow.

  11397   Wed Jul 8 21:02:02 2015 JamieSummaryCDSCDS upgrade: another step forward, so we're back to where we started (plus a bit?)

Koji did a bit of googling to determine that 'Wrong Network' status message could be explained by the fb myrinet  operating in the wrong mode:
(This was the useful link to track down the issue (KA))
 

    Network:    Myrinet 10G

I didn't notice it before, but we should in fact be operating in "Ethernet" mode, since that's the fabric we're using for the DC network.  Digging a bit deeper we found that the new version of mx (1.2.16) had indeed been configured with a different compile option than the 1.2.15 version had:

controls@fb ~ 0$ grep '$ ./configure' /opt/src/mx-1.2.15/config.log          
  $ ./configure --enable-ether-mode --prefix=/opt/mx
controls@fb ~ 0$ grep '$ ./configure' /opt/src/mx-1.2.16/config.log
  $ ./configure --enable-mx-wire --prefix=/opt/mx-1.2.16
controls@fb ~ 0$

So that would entirely explain the problem.  I re-linked mx to the older version (1.2.15), reloaded the mx drivers, and everything showed up correctly:

controls@fb ~ 0$ /opt/mx/bin/mx_info
MX Version: 1.2.12
MX Build: root@fb:/root/mx-1.2.12 Mon Nov  1 13:34:38 PDT 2010
1 Myrinet board installed.
The MX driver is configured to support a maximum of:
    8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
===================================================================
Instance #0:  299.8 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
    Status:        Running, P0: Link Up
    Network:    Ethernet 10G

    MAC Address:    00:60:dd:46:ea:ec
    Product code:    10G-PCIE-8AL-S
    Part number:    09-03916
    Serial number:    352143
    Mapper:        00:60:dd:46:ea:ec, version = 0x00000000, configured
    Mapped hosts:    6

                                                        ROUTE COUNT
INDEX    MAC ADDRESS     HOST NAME                        P0
-----    -----------     ---------                        ---
   0) 00:60:dd:46:ea:ec fb:0                              1,0
   1) 00:25:90:0d:75:bb c1sus:0                           1,0
   2) 00:30:48:be:11:5d c1iscex:0                         1,0
   3) 00:30:48:d6:11:17 c1iscey:0                         1,0
   4) 00:30:48:bf:69:4f c1lsc:0                           1,0
   5) 00:14:4f:40:64:25 c1ioo:0                           1,0
controls@fb ~ 0$

The front end hosts are also showing good omx info (even though they had been previously as well):

controls@c1lsc ~ 0$ /opt/open-mx/bin/omx_info
Open-MX version 1.5.2
 build: controls@fb:/opt/src/open-mx-1.5.2 Tue May 21 11:03:54 PDT 2013

Found 1 boards (32 max) supporting 32 endpoints each:
 c1lsc:0 (board #0 name eth1 addr 00:30:48:bf:69:4f)
   managed by driver 'igb'

Peer table is ready, mapper is 00:30:48:d6:11:17
================================================
  0) 00:30:48:bf:69:4f c1lsc:0
  1) 00:60:dd:46:ea:ec fb:0
  2) 00:25:90:0d:75:bb c1sus:0
  3) 00:30:48:be:11:5d c1iscex:0
  4) 00:30:48:d6:11:17 c1iscey:0
  5) 00:14:4f:40:64:25 c1ioo:0
controls@c1lsc ~ 0$

This got all the mx_stream connections back up and running.

Unfortunately, daqd is back to being a bit flaky.  With all frame writing enabled we saw daqd crash again.  I then shut off all trend frame writing and we're back to a marginally stable state: we have data flowing from all front ends, and full frames are being written, but not trends.

I'll pick up on this again tomorrow, and maybe try to rebuild the new version of mx with the proper flags.

  11398   Thu Jul 9 13:26:47 2015 JamieSummaryCDSCDS upgrade: new mx 1.2.16 installed

I rebuilt/installed mx 1.2.16 to use "ether-mode", instead of the default MX-10G:

controls@fb /opt/src/mx-1.2.16 0$ ./configure --enable-ether-mode --prefix=/opt/mx-1.2.16
...
controls@fb /opt/src/mx-1.2.16 0$ make
..
controls@fb /opt/src/mx-1.2.16 0$ make install
...

I then rebuilt/installed daqd so that it properly linked against the updated mx install:

controls@fb /opt/rtcds/rtscore/release/src/daqd 0$ ./configure --enable-debug --disable-broadcast --without-myrinet --with-mx --with epics=/opt/rtapps/epics/base --with-framecpp=/opt/rtapps/framecpp --enable-local-timing
...
controls@fb /opt/rtcds/rtscore/release/src/daqd 0$ make
...
controls@fb /opt/rtcds/rtscore/release/src/daqd 0$ install daqd /opt/rtcds/caltech/c1/target/fb/

It's now back to running and receiving data from the front ends (still not stable yet, though).

  11400   Thu Jul 9 16:50:13 2015 JamieSummaryCDSCDS upgrade: if all else fails try throwing metal at the problem

I roped Rolf into coming over and adding his eyes to the problem.  After much discussion we couldn't come up with any reasonable explanation for the problems we've been seeing other than daqd just needing a lot more resources that it did before.  He said he had some old Sun SunFire X4600s from which we could pilfer memory.  I went over to Downs and ripped all the CPU/memory cards out of one of his machines and stuffed them into fb:

fb now has 8 CPU and 16G of RAM

Unfortunately, this is still not enough.  Or at least it didn't solve the problem; daqd is showing the same instabilities, falling over a couple of minutes after I turn on trend frame writing.  As always, before daqd fails it starts spitting out the following to the logs:

[Thu Jul  9 16:37:09 2015] main profiler warning: 0 empty blocks in the buffer

followed by lines like:

[Thu Jul  9 16:37:27 2015] GPS MISS dcu 44 (ASX); dcu_gps=1120520264 gps=1120519812

right before it dies.

I'm no longer convinced that this is a resource issue, though, judging by the resource usage right before the crash:

top - 16:47:32 up 48 min,  5 users,  load average: 0.91, 0.62, 0.61
Tasks:   2 total,   0 running,   2 sleeping,   0 stopped,   0 zombie
Cpu(s):  8.9%us,  0.9%sy,  0.0%ni, 89.1%id,  0.9%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  15952104k total, 13063468k used,  2888636k free,   138648k buffers
Swap:  1023996k total,        0k used,  1023996k free,  7672292k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12016 controls  20   0 8098m 4.4g 104m S  106 29.1   6:45.79 daqd
 4953 controls  20   0 53580 6092 5096 S    0  0.0   0:00.04 nds

Load average less than 1 per CPU, plenty of free memory (~3G free, 0 swap), no waiting for IO (0.9%wa), etc.  daqd is utilizing lots of  threads, which should be spread across many cpus, so even the >100%CPU should be ok.   I'm at a loss...

  11402   Mon Jul 13 01:11:14 2015 JamieSummaryCDSCDS upgrade: current assessment

daqd is still behaving unstably.  It's still unclear what the issue is.

The current failures look like disk IO contention.  However, it's hard to see any evidince of daqd is suffering from large IO wait while it's failing.

The frame size itself is currently smaller than it was before the upgrade:

controls@fb /frames/full 0$ ls -alth 11190 | head
total 369G
drwxr-xr-x 321 controls controls  36K Jul 12 22:20 ..
drwxr-xr-x   2 controls controls 268K Jun 23 06:06 .
-rw-r--r--   1 controls controls  67M Jun 23 06:06 C-R-1119099984-16.gwf
-rw-r--r--   1 controls controls  68M Jun 23 06:06 C-R-1119099968-16.gwf
-rw-r--r--   1 controls controls  69M Jun 23 06:05 C-R-1119099952-16.gwf
-rw-r--r--   1 controls controls  69M Jun 23 06:05 C-R-1119099936-16.gwf
-rw-r--r--   1 controls controls  67M Jun 23 06:05 C-R-1119099920-16.gwf
-rw-r--r--   1 controls controls  68M Jun 23 06:05 C-R-1119099904-16.gwf
-rw-r--r--   1 controls controls  68M Jun 23 06:04 C-R-1119099888-16.gwf
controls@fb /frames/full 0$ ls -alth 11208 | head
total 17G
drwxr-xr-x   2 controls controls  20K Jul 13 01:00 .
-rw-r--r--   1 controls controls  45M Jul 13 01:00 C-R-1120809632-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 01:00 C-R-1120809408-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 00:56 C-R-1120809392-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 00:56 C-R-1120809376-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 00:56 C-R-1120809360-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 00:55 C-R-1120809344-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 00:55 C-R-1120809328-16.gwf
controls@fb /frames/full 0$

This would seem to indicate that it's not an increase in frame size that's to blame.

Because slow data is now transported to daqd over the MX data concentrator network rather than via EPICS (RTS 2.8), there is more network on the MX network.   I note also that the channel lists have increased in size:

controls@fb /opt/rtcds/caltech/c1/chans/daq 0$ ls -alt archive/C1LSC* | head -20
-rw-r--r-- 1 4294967294 4294967294 262554 Jul  6 18:21 archive/C1LSC_150706_182146.ini
-rw-r--r-- 1 4294967294 4294967294 262554 Jul  6 18:16 archive/C1LSC_150706_181603.ini
-rw-r--r-- 1 4294967294 4294967294 262554 Jul  6 16:09 archive/C1LSC_150706_160946.ini
-rw-r--r-- 1 4294967294 4294967294  43366 Jul  1 16:05 archive/C1LSC_150701_160519.ini
-rw-r--r-- 1 4294967294 4294967294  43366 Jun 25 15:47 archive/C1LSC_150625_154739.ini
...

I would have thought, though, that data transmission errors would show up in the daqd status bits.

  11404   Mon Jul 13 18:12:50 2015 JamieSummaryCDSCDS upgrade: left running in semi-stable configuration

I have been watching daqd all day and I don't feel particularly closer to understanding what the issues are.  However, things are

Interestingly, though, the stability appears highly variable at the moment.  This morning, daqd was very unstable and was crashing within a couple of minutes of starting.  However this afternoon, things seemed much more stable.  As of this moment, daqd has been running for for 25 minutes now, writing full frames as well as minute and second trends (no minute_raw), without any issues.  What has changed?

To reiterate, I have been closing watching disk IO to /frames.  I see no indication that there is any disk contention while daqd is failing.  It's still possible, though, that there are disk IO issues affecting daqd at a level that is not readily visible.  From dstat, the frame writes are visible, but nothing else.

I have made one change that could be positively affecting things right now: I un-exported /frames from NFS.  This eliminates anything external from reading /frames over the network.  In particular, it also shuts off the transfer of frames to LDAS.  Since I've done this, daqd has appeared to be more stable.  It's NOT totally stable, though, as the instance that I described above did eventually just die after 43 minutes, as I was writing this.

In any event, as things are currently as stable as I've seen them, I'm leaving it running in this configuration for the moment, with the following relevant daqdrc parameters:

start main 16;
start frame-saver;
sync frame-saver;
start trender 60 60;
start trend-frame-saver;
sync trend-frame-saver;
start minute-trend-frame-saver;
sync minute-trend-frame-saver;
start profiler;
start trend profiler;
  11406   Tue Jul 14 09:08:37 2015 JamieSummaryCDSCDS upgrade: left running in semi-stable configuration

Overnight daqd restarted itself only about twice an hour, which is an improvement:

controls@fb /opt/rtcds/caltech/c1/target/fb 0$ tail logs/restart.log
daqd: Tue Jul 14 03:13:50 PDT 2015
daqd: Tue Jul 14 04:01:39 PDT 2015
daqd: Tue Jul 14 04:09:57 PDT 2015
daqd: Tue Jul 14 05:02:46 PDT 2015
daqd: Tue Jul 14 06:01:57 PDT 2015
daqd: Tue Jul 14 06:43:18 PDT 2015
daqd: Tue Jul 14 07:02:19 PDT 2015
daqd: Tue Jul 14 07:58:16 PDT 2015
daqd: Tue Jul 14 08:02:44 PDT 2015
daqd: Tue Jul 14 09:02:24 PDT 2015

Un-exporting /frames might have helped a bit.  However, the problem is obviously still not fixed.

  11408   Tue Jul 14 10:28:02 2015 ericqSummaryCDSCDS upgrade: left running in semi-stable configuration

There remains a pattern to some of the restarts, the following times are all reported as restart times. (There are others in between, however.)

daqd: Tue Jul 14 00:02:48 PDT 2015
daqd: Tue Jul 14 01:02:32 PDT 2015
daqd: Tue Jul 14 03:02:33 PDT 2015
daqd: Tue Jul 14 05:02:46 PDT 2015
daqd: Tue Jul 14 06:01:57 PDT 2015
daqd: Tue Jul 14 07:02:19 PDT 2015
daqd: Tue Jul 14 08:02:44 PDT 2015
daqd: Tue Jul 14 09:02:24 PDT 2015
daqd: Tue Jul 14 10:02:03 PDT 2015

Before the upgrade, we suffered from hourly crashes too:

daqd_start Sun Jun 21 00:01:06 PDT 2015
daqd_start Sun Jun 21 01:03:47 PDT 2015
daqd_start Sun Jun 21 02:04:04 PDT 2015
daqd_start Sun Jun 21 03:04:35 PDT 2015
daqd_start Sun Jun 21 04:04:04 PDT 2015
daqd_start Sun Jun 21 05:03:45 PDT 2015
daqd_start Sun Jun 21 06:02:43 PDT 2015
daqd_start Sun Jun 21 07:04:42 PDT 2015
daqd_start Sun Jun 21 08:04:34 PDT 2015
daqd_start Sun Jun 21 09:03:30 PDT 2015
daqd_start Sun Jun 21 10:04:11 PDT 2015

So, this isn't neccesarily new behavior, just something that remains unfixed. 

  11409   Tue Jul 14 11:57:27 2015 jamieSummaryCDSCDS upgrade: left running in semi-stable configuration
Quote:

There remains a pattern to some of the restarts, the following times are all reported as restart times. (There are others in between, however.)

daqd: Tue Jul 14 00:02:48 PDT 2015
daqd: Tue Jul 14 01:02:32 PDT 2015
daqd: Tue Jul 14 03:02:33 PDT 2015
daqd: Tue Jul 14 05:02:46 PDT 2015
daqd: Tue Jul 14 06:01:57 PDT 2015
daqd: Tue Jul 14 07:02:19 PDT 2015
daqd: Tue Jul 14 08:02:44 PDT 2015
daqd: Tue Jul 14 09:02:24 PDT 2015
daqd: Tue Jul 14 10:02:03 PDT 2015

Before the upgrade, we suffered from hourly crashes too:

daqd_start Sun Jun 21 00:01:06 PDT 2015
daqd_start Sun Jun 21 01:03:47 PDT 2015
daqd_start Sun Jun 21 02:04:04 PDT 2015
daqd_start Sun Jun 21 03:04:35 PDT 2015
daqd_start Sun Jun 21 04:04:04 PDT 2015
daqd_start Sun Jun 21 05:03:45 PDT 2015
daqd_start Sun Jun 21 06:02:43 PDT 2015
daqd_start Sun Jun 21 07:04:42 PDT 2015
daqd_start Sun Jun 21 08:04:34 PDT 2015
daqd_start Sun Jun 21 09:03:30 PDT 2015
daqd_start Sun Jun 21 10:04:11 PDT 2015

So, this isn't neccesarily new behavior, just something that remains unfixed. 

That's interesting, that we're still seeing those hourly crashes.

We're not writing out the full set of channels, though, and we're getting more failures than just those at the hour, so we're still suffering.

  11412   Tue Jul 14 16:51:01 2015 JamieSummaryCDSCDS upgrade: problem is not disk access

I think I have now determined once and for all that the daqd problems are NOT due to disk IO contention.

I have mounted a tmpfs at /frames/tmp and have told daqd to write frames there.  The tmpfs exists entirely in RAM.  There is essentially zero IO wait for such a filesystem, so daqd should never have trouble writing out the frames.

But yet daqd continues to fail with the "0 empty blocks in the buffer" warnings.  I've been down a rabbit hole.

  11414   Tue Jul 14 17:14:23 2015 EveSummarySummary PagesFuture summary pages improvements

Here is a list of suggested improvements to the summary pages. Let me know if there's something you'd like for me to add to this list!

  • A lot of plots are missing axis labels and titles, and I often don't know what to call these labels. I could use some help with this.
  • Check the weather and vacuum tabs to make sure that we're getting the expected output. Set the axis labels accordingly. 
  • Investigate past periods of missing data on DataViewer to see if the problem was with the data requisition process, the summary page production process, or something else.
  • Based on trends in data over the past three months, set axis ranges accordingly to encapsulate the full data range.
  • Create a CDS tab to store statistics of our digital systems. We will use the CDS signals to determine when the digital system is running and when the minute trend is missing. This will allow us to exclude irrelevant parts of the data.
  • Provide duty ratio statistics for the IMC.
  • Set triggers for certain plots. For example, for channels C1:LSC-XARM OUT DQ and page 4 LIGO-T1500123–v1 C1:LSC-YARM OUT DQ to be plotted in the Arm LSC Control signals figures, C1:LSCTRX OUT DQ and C1:LSC-TRY OUT DQ must be higher than 0.5, thus acting as triggers.
  • Include some flag or other marking indicating when data is not being represented at a certain time for specific plots.
  • Maybe include some cool features like interactive plots.
  11415   Wed Jul 15 13:19:14 2015 JamieSummaryCDSCDS upgrade: reducing mx end-points as last ditch effort

I tried one last thing, suggested by Keith and Gerrit.  I tried reducing the number of mx end-points on fb to zero, which should reduce the total number of fb threads, in the hope that the extra threads were causing the chokes.

On Tue, Jul 14 2015, Keith Thorne <kthorne@ligo-la.caltech.edu> wrote:
> Assumptions
>  1) Before the upgrade (from RCG 2.6?), the DAQ had been working, reading out front-ends, writing frames trends
>  2) In upgrading to RCG 2.9, the mx start-up on the frame builder was modified to use multiple end-points
> (i.e. /etc/init.d/mx has a line like
> # 1 10G card - X2
> MX_MODULE_PARAMS="mx_max_instance=1 mx_max_endpoints=16 $MX_MODULE_PARAMS"
>  (This can be confirmed by the daqd log file with lines at the top like
> 263596
> MX has 16 maximum end-points configured
> 2 MX NICs available
> [Fri Jul 10 16:12:50 2015] ->4: set thread_stack_size=10240
> [Fri Jul 10 16:12:50 2015] new threads will be created with the stack of size 10
> 240K
>
> If this is the case, the problem may be that the additional thread on the frame-builder (one per end-point) take up so many slots on the 8-core
> frame-builder that they interrupt the frame-writing thread, thus preventing the main buffer from being emptied.  
>
> One could go back to a single end-point. This only helps keep restart of front-end A from hiccuping DAQ for front-end B.
>
> You would have to remove code on front-ends (/etc/init.d/mx_stream) that chooses endpoints. i.e.
> # find line number in rtsystab. Use that to mx_stream slot on card (0-15)
> line_num=`grep -v ^# /etc/rtsystab | grep --perl-regexp -n "^${hostname}\s" | se
> d 's/^\([0-9]*\):.*/\1/g'`
> line_off=$(expr $line_num - 1)
> epnum=$(expr $line_off % 2)
> cnum=$(expr $line_off / 2)
>
>     start-stop-daemon --start --quiet -b -m --pidfile /var/log/mx_stream0.pid --exec /opt/rtcds/tst/x2/target/x2daqdc0/mx_stream -- -e 0 -r "$epnum" -W 0 -w 0 -s "$sys" -d x2daqdc0:$cnum -l /opt/rtcds/tst/x2/target/x2daqdc0/mx_stream_logs/$hostname.log

As per Keith's suggestion, I modified the mx startup script to only initialize a single endpoint, and I modified the mx_stream startup to point them all to endpoint 0.  I verified that indeed daqd was a single MX end-point:

MX has 1 maximum end-points configured

It didn't help.  After 5-10 minutes daqd crashes with the same "0 empty blocks" messages.

I should also mention that I'm pretty sure the start of these messages does not seem coincident with any frame writing to disk; further evidence that it's not a disk IO issue.

Keith is looking at the system now, so we if he can see anything obvious.  If not, I will start reverting to 2.5.

  11417   Wed Jul 15 18:19:12 2015 JamieSummaryCDSCDS upgrade: tentative stabilty?

Keith Thorne provided his eyes on the situation today and had some suggestions that might have helped things

Reorder ini file list in master file.  Apparently the EDCU.ini file (C0EDCU.ini in our case), which describes EPICS subscriptions to be recorded by the daq, now has to be specified *after* all other front end ini files.  It's unclear why, but it has something to do with RTS 2.8 which changed all slow channels to be transported over the mx network.  This alone did not fix the problem, though.

Increase second trend frame size.  Interestingly, this might have been the key.  The second trend frame size was increased to 600 seconds:

start trender 600 60;

The two numbers are the lengths in seconds for the second and minute trends respectively.  They had been set to "60 60", but Keith suggested that longer second trend frames are better, for whatever reason.  It seems he may be right, given that daqd has been running and writing full and trend frames for 1.5 hours now without issue. 


As I'm writing this, though, the daqd just crashed again.  I note, though, that it's right after the hour, and immediately following writing out a one hour minute trend file.  We've been seeing these hour, on the hour, crashes of daqd for quite a while now.  So maybe this is nothing new.  I've actually been wondering if the hourly daqd crashes were associated with writing out the minute trend frames, and I think we might have more evidence to point to that.

If increasing the size of the second trend frames from 60 seconds (35M) to 600 seconds (70M) made a difference in stability, could there be an issue since writing out files that are smaller than some value?  The full frames are 60M, and the minute trends are 35M.

  11427   Sat Jul 18 15:37:19 2015 JamieSummaryCDSCDS upgrade: current status

So it appears we have found a semi-stable configuration for the DAQ system post upgrade:

Here are the issues:

daqd

dadq is running mostly stably for the moment, although it still crashes at the top of every hour (see below).  Here are some relevant points of about the current configuration:

  • recording data from only a subset of front-ends, to reduce the overall load:
    • c1x01
    • c1scx
    • c1x02
    • c1sus
    • c1mcs
    • c1pem
    • c1x04
    • c1lsc
    • c1ass
    • c1x05
    • c1scy
  • 16 second main buffer:
    start main 16;
  • trend lengths: second: 600, minute: 60
    start trender 600 60;
  • writing to frames:
    • full
    • second
    • minute
    • (NOT raw minute trends)
  • frame compression ON

This elliminates most of the random daqd crashing.  However, daqd still crashes at the top of every hour after writing out the minute trend frame. Still unclear what the issue is, but Keith is investigating.  In some sense this is no worse that where we were before the upgrade, since daqd was also crashing hourly then.  It's still crappy, though, so hopefully we'll figure something out.

The inittab on fb automatically restarts daqd after it crashes, and monit on all of the front ends automatically restarts the mx_stream processes.

front ends

The front end modules are mostly running fine.

One issue is that the execution times seem to have increased a bit, which is problematic for models that were already on the hairy edge.  For instance, the rough aversage for c1sus has some from ~48us to 50us.  This is most problematic for c1cal, which is now running at ~66us out of 60, which is obviously untenable.  We'll need to reduce the load in c1cal somehow.

All other front end models seem to be working fine, but a full test is still needed.

There was an issue with the DACs on c1sus, but I rebooted and everything came up fine, optics are now damped:

  11437   Wed Jul 22 22:06:42 2015 EveSummarySummary PagesFuture summary pages improvements

- CDS Tab

We want to monitor the status of the digital control system.

1st plot
Title: EPICS DAQ Status
I wonder we can plot the binary numbers as statuses of the data acquisition for the realtime codes.
We want to use the status indicators. Like this:
https://ldas-jobs.ligo-wa.caltech.edu/~detchar/summary/day/20150722/plots/H1-MULTI_A8CE50_SEGMENTS-1121558417-86400.png

channels:
C1:DAQ-DC0_C1X04_STATUS
C1:DAQ-DC0_C1LSC_STATUS
C1:DAQ-DC0_C1ASS_STATUS
C1:DAQ-DC0_C1OAF_STATUS
C1:DAQ-DC0_C1CAL_STATUS

C1:DAQ-DC0_C1X02_STATUS
C1:DAQ-DC0_C1SUS_STATUS
C1:DAQ-DC0_C1MCS_STATUS
C1:DAQ-DC0_C1RFM_STATUS
C1:DAQ-DC0_C1PEM_STATUS

C1:DAQ-DC0_C1X03_STATUS
C1:DAQ-DC0_C1IOO_STATUS
C1:DAQ-DC0_C1ALS_STATUS

C1:DAQ-DC0_C1X01_STATUS
C1:DAQ-DC0_C1SCX_STATUS
C1:DAQ-DC0_C1ASX_STATUS

C1:DAQ-DC0_C1X05_STATUS
C1:DAQ-DC0_C1SCY_STATUS
C1:DAQ-DC0_C1TST_STATUS

1st plot
Title: IOP Fast Channel DAQ Status
These have two bits each. How can we handle it?
If we need to shrink it to a single bit take "AND" of them.
C1:FEC-40_FB_NET_STATUS (legend: c1x04, if a legend placable)
C1:FEC-20_FB_NET_STATUS (legend: c1x02)
C1:FEC-33_FB_NET_STATUS (legend: c1x03)
C1:FEC-19_FB_NET_STATUS (legend: c1x01)
C1:FEC-46_FB_NET_STATUS (legend: c1x05)

3rd plot
Title C1LSC CPU Meters
channels:
C1:FEC-40_CPU_METER (legend: c1x04)
C1:FEC-42_CPU_METER (legend: c1lsc)
C1:FEC-48_CPU_METER (legend: c1ass)
C1:FEC-22_CPU_METER (legend: c1oaf)
C1:FEC-50_CPU_METER (legend: c1cal)
The range is from 0 to 75 except for c1oaf that could go to 500.
Can we plot c1oaf with the value being devided by 8? (Then the legend should be c1oaf /8)

4th plot
Title C1SUS CPU Meters
channels:
C1:FEC-20_CPU_METER (legend: c1x02)
C1:FEC-21_CPU_METER (legend: c1sus)
C1:FEC-36_CPU_METER (legend: c1mcs)
C1:FEC-38_CPU_METER (legend: c1rfm)
C1:FEC-39_CPU_METER (legend: c1pem)
The range is be from 0 to 75 except for c1pem that could go to 500.
Can we plot c1pem with the value being devided by 8? (Then the legend should be c1pem /8)

5th plot
Title C1IOO CPU Meters
channels:
C1:FEC-33_CPU_METER (legend: c1x03)
C1:FEC-34_CPU_METER (legend: c1ioo)
C1:FEC-28_CPU_METER (legend: c1als)
The range is be from 0 to 75.

6th plot
Title C1ISCEX CPU Meters
channels:
C1:FEC-19_CPU_METER (legend: c1x01)
C1:FEC-45_CPU_METER (legend: c1scx)
C1:FEC-44_CPU_METER (legend: c1asx)
The range is be from 0 to 75.

7th plot
Title C1ISCEY CPU Meters
channels:
C1:FEC-46_CPU_METER (legend: c1x05)
C1:FEC-47_CPU_METER (legend: c1scy)
C1:FEC-91_CPU_METER (legend: c1tst)
The range is be from 0 to 75.

=====================

IOO

We want a duty ratio plot for the IMC. C1:IOO-MC_TRANS_SUM >1e4 is the good period.

Duty ratio plot looks like the right plot of the following link
https://ldas-jobs.ligo-wa.caltech.edu/~detchar/summary/day/20150722/lock/segments/

=====================

SUS: OPLEV

OL_PIT_INMON and OL_YAW_INMON are good for the slow drift monitor.
But their sampling rate is too slow for the PSDs.
Can you use
C1:SUS-ETM_OPLEV_PERROR
C1:SUS-ETM_OPLEV_YERROR
etc...
For the PSDs? They are 2kHz sampling DQ channels. You would be able to plot
it up to ~1kHz. In fact, we want to monitor the PSD from 100mHz to 1kHz.
How can you set up the resolution (=FFT length)?

=====================

LSC / ASC / ALS tabs

Let's make new tabs LSC, ASC, and ALS

LSC:

We should have a plot for
C1:LSC-TRX_OUT_DQ
C1:LSC-TRY_OUT_DQ
C1:LSC-POPDC_OUT_DQ
It's OK to use the minute trend for now.
You can check the range using dataviewer.

ASC:

Let's use
C1:SUS_MC1_ASCPIT_OUT16 (legend: IMC WFS)
C1:ASS-XARM_ITM_YAW_OSC_CLKGAIN (legend: XARM ASS)
C1:ASS-YARM_ITM_YAW_OSC_CLKGAIN (legend: YARM ASS)
C1:ASX-XARM_M1_PIT_OSC_CLKGAIN (legend: XARM Green ASS)
as the status indicators. There is no YARM Green ASS yet.

ALS:

Title: ALS Green transmission
We want a time series of
ALS-TRX_OUT16
ALS-TRY_OUT16

Title: ALS Green beatnote
Another time series
ALS-BEATX_FINE_Q_MON
ALS-BEATY_FINE_Q_MON

Title: Frequency monitor
We have frequency counter outputs, but I have to talk to Eric to know the channel names

  11441   Thu Jul 23 20:57:15 2015 JessicaSummaryGeneralApplying Pre-filter to data before IIR Wiener Filtering

I updated my bandpass filter and have included the bode plot below in Figure 1. It is a fourth order elliptic bandpass filter with a passband ripple of 1dB and a stopband attenuation of 30 dB. It emphasizes the area between 3 and 40 Hz.

Below, I applied this filter to the huddle test data. The results from this were only slightly better in the targeted region than when no pre-filter was applied. 

When I pre-filtered the mode cleaner data and then used an IIR wiener filter, I found that the results did not differ much from the data that was not pre-filtered. I'm not sure yet if I'm targeting the right region of this data with my bandpass filter, and will be looking more into choosing a better region. Also, I am only using certain regions of ff when calculating the transfer function, and need to optimize that region also. I uploaded the code I used to make these plots to github.

  11456   Tue Jul 28 20:42:50 2015 JessicaSummaryGeneralNew Seismometer Data Coherence

I was looking at the new seismometer data and plotted the coherence between the different arms of C1:PEM_GUR1 and C1:PEM_GUR2. There was not much coherence in the X arms, Y arms, or Z arms of each seismometer, but there were within the x and y arms of the seismometer.

I think the area we should focus on with filtering is lower ranges, between 0.01 and 0.1, because that it where coherence is most clearly high. It is higher in high frequencies but also incredibly noisy, meaning it probably wouldn't be good to try to filter there. 

  11457   Wed Jul 29 10:34:42 2015 IgnacioSummaryLSCCoherence of arms and seismometers

Jessica and I took 45 mins  (GPS times from 1122099200 to 1122101950) worth of data from the following channels:

C1:IOO-MC_L_DQ (mode cleaner)
C1:LSC-XARM_IN1_DQ (X arm length)
C1:LSC-YARM_IN1_DQ (Y arm length)

and for the STS, GUR1, and GUR2 seismometer signals.

The PSD for MCL and the arm length signals is shown below,

I looked at the coherence between the arm length and each of the three seismometers, plot overload incoming below,

For the coherence between STS and XARM and YARM,

  

For GUR1,

  

Finally for GUR2,

 

A few remarks:

1) From the coherence plots, we can see that the arm length signals are coherent with the seismometer signals the most from 0.5 - 50 Hz. This is most evident in the coherence with STS. I think subtraction will be most useful in this range. This agrees with what we see in the PSD of the arm length signals, the magnitude of the PSD starts increasing from 1 Hz and reaches a maximum at about 30 Hz. This is indicative of which frequencies most of the noise is present.

2) Eric did not remember which of  GUR1 and GUR2 corresponded to the ends of XARM and YARM. So, I went to the end of XARM, and jumped for a couple seconds to disturb whatever Gurald was in there. Using dataviewer I determined it was GUR1. Anyways, my point is, why is GUR1 less coherent with both arms and not just XARM?  Since it is at the end of XARM, I was expecting GUR1 to be more coherent with XARM. Is it because, though different arms, the PSD's of both arms are roughly the same? 

3) Similarly, GUR2 shows about the same levels of coherence for both arms, but it is more coherent. Is GUR2 noisier because of its location?

Code: ARMS_COH.m.zip

  11458   Wed Jul 29 11:15:21 2015 JessicaSummaryLSCPSDs of Arms with seismometer subtraction

Ignacio and I downloaded data from the STS, GUR1, and GUR2 seismometers and from the mode cleaner and the x and y arms. The PSDs along the arms have the most noise at frequencies greater than 1 Hz, so we should focus on filtering in that area. The noise levels start dropping at around 30 Hz, but are still much higher than is seen at frequencies below 1 Hz. However, because the spectra is so low at frequencies below that, Wiener filtering alone injected a significant amount of noise into those frequencies and did not do much to reduce the noise at higher frequencies. Pre-filtering will be required, and I have started implementing a pre-filter, but with no improvements yet. So far, I have tried making a bandpass filter, but a highpass filter may be ideal in this case because so much of the noise is above 1 Hz. 

  11461   Wed Jul 29 21:40:39 2015 KojiSummaryCDSStatus of the frame data syncing

The trend data hasn't been synced with LDAS since Jul 27 5AM local.

40m:

controls@nodus|minute > pwd
/frames/trend/minute
controls@nodus|minute > ls -l 11222 | tail
total 590432
-rw-r--r-- 1 controls controls 35758781 Jul 29 11:59 C-M-1122228000-3600.gwf
-rw-r--r-- 1 controls controls 35501472 Jul 29 12:59 C-M-1122231600-3600.gwf
-rw-r--r-- 1 controls controls 35296271 Jul 29 13:59 C-M-1122235200-3600.gwf
-rw-r--r-- 1 controls controls 35459901 Jul 29 14:59 C-M-1122238800-3600.gwf
-rw-r--r-- 1 controls controls 35550346 Jul 29 15:59 C-M-1122242400-3600.gwf
-rw-r--r-- 1 controls controls 35699944 Jul 29 16:59 C-M-1122246000-3600.gwf
-rw-r--r-- 1 controls controls 35549480 Jul 29 17:59 C-M-1122249600-3600.gwf
-rw-r--r-- 1 controls controls 35481070 Jul 29 18:59 C-M-1122253200-3600.gwf
-rw-r--r-- 1 controls controls 35518238 Jul 29 19:59 C-M-1122256800-3600.gwf
-rw-r--r-- 1 controls controls 35514930 Jul 29 20:59 C-M-1122260400-3600.gwf

 

LDAS Minute trend:

[koji.arai@ldas-pcdev3 C-M-11]$ pwd
/archive/frames/trend/minute-trend/40m/C-M-11
[koji.arai@ldas-pcdev3 C-M-11]$ ls -l | tail
-rw-r--r-- 1 1001 1001 35488497 Jul 26 19:59 C-M-1121997600-3600.gwf
-rw-r--r-- 1 1001 1001 35477333 Jul 26 21:00 C-M-1122001200-3600.gwf
-rw-r--r-- 1 1001 1001 35498259 Jul 26 21:59 C-M-1122004800-3600.gwf
-rw-r--r-- 1 1001 1001 35509729 Jul 26 22:59 C-M-1122008400-3600.gwf
-rw-r--r-- 1 1001 1001 35472432 Jul 26 23:59 C-M-1122012000-3600.gwf
-rw-r--r-- 1 1001 1001 35472230 Jul 27 00:59 C-M-1122015600-3600.gwf
-rw-r--r-- 1 1001 1001 35468199 Jul 27 01:59 C-M-1122019200-3600.gwf
-rw-r--r-- 1 1001 1001 35461729 Jul 27 02:59 C-M-1122022800-3600.gwf
-rw-r--r-- 1 1001 1001 35486755 Jul 27 03:59 C-M-1122026400-3600.gwf
-rw-r--r-- 1 1001 1001 35467084 Jul 27 04:59 C-M-1122030000-3600.gwf

 

  11465   Thu Jul 30 11:47:54 2015 KojiSummaryCDSStatus of the frame data syncing

Today it was synced at 5AM but that was all.

40m:

controls@nodus|minute > pwd
/frames/trend/minute
controls@nodus|minute > ls -l 11222|tail
-rw-r--r-- 1 controls controls 35521183 Jul 29 21:59 C-M-1122264000-3600.gwf
-rw-r--r-- 1 controls controls 35509281 Jul 29 22:59 C-M-1122267600-3600.gwf
-rw-r--r-- 1 controls controls 35511705 Jul 29 23:59 C-M-1122271200-3600.gwf
-rw-r--r-- 1 controls controls 35809690 Jul 30 00:59 C-M-1122274800-3600.gwf
-rw-r--r-- 1 controls controls 35752082 Jul 30 01:59 C-M-1122278400-3600.gwf
-rw-r--r-- 1 controls controls 35927246 Jul 30 02:59 C-M-1122282000-3600.gwf
-rw-r--r-- 1 controls controls 35775843 Jul 30 03:59 C-M-1122285600-3600.gwf
-rw-r--r-- 1 controls controls 35648583 Jul 30 04:59 C-M-1122289200-3600.gwf
-rw-r--r-- 1 controls controls 35643898 Jul 30 05:59 C-M-1122292800-3600.gwf
-rw-r--r-- 1 controls controls 35704049 Jul 30 06:59 C-M-1122296400-3600.gwf
controls@nodus|minute > ls -l 11223|tail
total 139616
-rw-r--r-- 1 controls controls 35696854 Jul 30 08:02 C-M-1122300000-3600.gwf
-rw-r--r-- 1 controls controls 35675136 Jul 30 08:59 C-M-1122303600-3600.gwf
-rw-r--r-- 1 controls controls 35701754 Jul 30 09:59 C-M-1122307200-3600.gwf
-rw-r--r-- 1 controls controls 35718038 Jul 30 10:59 C-M-1122310800-3600.gwf

LDAS Minute trend:

[koji.arai@ldas-pcdev3 C-M-11]$ pwd
/archive/frames/trend/minute-trend/40m/C-M-11
[koji.arai@ldas-pcdev3 C-M-11]$ ls -l |tail
-rw-r--r-- 1 1001 1001 35518238 Jul 29 19:59 C-M-1122256800-3600.gwf
-rw-r--r-- 1 1001 1001 35514930 Jul 29 20:59 C-M-1122260400-3600.gwf
-rw-r--r-- 1 1001 1001 35521183 Jul 29 21:59 C-M-1122264000-3600.gwf
-rw-r--r-- 1 1001 1001 35509281 Jul 29 22:59 C-M-1122267600-3600.gwf
-rw-r--r-- 1 1001 1001 35511705 Jul 29 23:59 C-M-1122271200-3600.gwf
-rw-r--r-- 1 1001 1001 35809690 Jul 30 00:59 C-M-1122274800-3600.gwf
-rw-r--r-- 1 1001 1001 35752082 Jul 30 01:59 C-M-1122278400-3600.gwf
-rw-r--r-- 1 1001 1001 35927246 Jul 30 02:59 C-M-1122282000-3600.gwf
-rw-r--r-- 1 1001 1001 35775843 Jul 30 03:59 C-M-1122285600-3600.gwf
-rw-r--r-- 1 1001 1001 35648583 Jul 30 04:59 C-M-1122289200-3600.gwf

  11502   Thu Aug 13 12:06:39 2015 Jessica SummaryIOOBetter predicted subtraction did not work as well Online

Yesterday I adjusted the preweighting of my IIR fit to the transfer function of MC2, and also managed to reduce the number of poles and zeros from 8 to 6, giving a smoother rolloff. The bode plots are pictured here:

The predicted IIR subtraction was very close to the predicted FIR subtraction, so I thought these coefficients would lead to a better online filter.

However, the actual subtraction of the MCL was not as good and noise was injected into the Y arm.

The final comparison of the subtraction factors between the online and offline data showed that the preweighting, while it improved the offline subtraction, needs more work to improve the online subtraction also.

  11509   Fri Aug 14 23:49:34 2015 KojiSummaryGeneralB&K Shaker fixed

I fixed a shaker that was claimed to be broken. I had to cut the rubber membrane to open the head.

Once it was opened, the cause of the trouble was obvious. The soldering joint could not put up with the motion of the head.

It is interesting to see that the spring has the damping layer between the metal sheets.

After the repair the DC resistance was measured. It was 1.9Ohm. The side of the shaker chassis said "3.5Ohm, Max 15VA". So it can take more than 4A (wow).

I gave 2A DC from the bench top supply and turn the current on and off. I could confirm the head was moving.

I'll claim the use of this shaker for the seismometer development.

  11524   Sat Aug 22 15:48:32 2015 KojiSummaryLSCArm locking recovery

As per Ignacio's request, I restored the arm locking.

- MC WFS relief

- Slow DC restored to ~0V

- Turned off DARM/CARM

- XARM/YARM turned on

- XARM/YARM ASS& Offset offloading

  11551   Tue Sep 1 02:44:44 2015 KojiSummaryCDSc1oaf, c1mcs modified for the IMC angular FF

[Koji, Ignacio]

In order to allow us to work on the IMC angular FF, we made the signal paths from PEM to MC SUSs.
In fact, there already were the paths from c1pem to c1oaf. So, the new paths were made from c1oaf to c1mcs. (Attachment 1~3)

After some debugging those two models started running. The additional cost of the processing time is insignificant.
FB was restarted to accomodate the change.

Once the modification of the models was completed, the OAF screens were modified. It seemed that the Kissel button
for the output matrix haven't been updated for the PRM ASC implementation. This was fixed as the button was updated this time.
In addition, the button for the FM matrix was also made and pasted.

 

  11569   Thu Sep 3 19:52:24 2015 ranaSummarySUSSUS drift monitor

Since Andrey's SUS Drift mon screen back in 2007, we've had several versions which used different schemes and programming languages. Diego made an update back in January.

Today I added his stuff to the SVN since it was lost in the NFS disks somewhere. Its in SUS/DRIFT_MON/.

Since we've been updating our userapps directory recently to pull in the screens and scripts from the sites, we also got a copy of the Thomas Abbott drift mon stuff which is better (Diego actually removed the yellow/red functionality as part of the 'upgrade'), but more complicated. For now we have the old one. I updated the good values with all optics roughly aligned just a few minutes ago.
 

ELOG V3.1.3-