40m Log
ID Date Author Type Category Subject
11032   Sat Feb 14 22:14:02 2015   Koji   Summary   LSC   [HOW TO] 3f modulation cancellation

When I finished my measurements, the modulation setup was reverted to the conventional one.
If someone wants to use the 3f cancellation setting, it can be restored by following this HOW-TO.

The 3f cancellation can be realized by adding a carefully adjusted delay line and attenuation for the 55MHz modulation
on the frequency generation box at the 1X2 rack.  Here is the procedure:

1) Turn off the frequency generation box

There is a toggle switch at the rear of the unit. It's better to turn it off before touching any cables.
The outputs of the frequency generation box are generally high-level, and we never want to operate
the amplifiers without proper impedance matching.

2) Remove the small SMA cable between 55MHz out and 55MHz in (left arrow in Attachment 1).

According to the photo by Alberto (svn: /docs/upgrade08/RFsystem/frequencyGenerationBox/photos/DSC_2410.JPG),
this 55MHz out is the output of the frequency multiplier, and the 55MHz in is the input of the amplifier stages.
Therefore, the cable length between these two connectors sets the relative phase between the modulations at 11MHz and 55MHz.

3) Add a delay line box with cables (Attachment 2).

Connect the cables from the delay line box to the 55MHz in/out connectors. I used 1.5m BNC cables.
The delay line box was set to have 28ns delay.

4) Set the attenuation of the 55MHz EOM drive (right arrow in Attachment 1) to 10dB.

Rotate the attenuator for the 55MHz EOM from the nominal 0dB to 10dB.

5) Turn on the frequency generation box

For reference, Attachment 3 shows the characteristics of the delay line cable/box combination when the 3f modulation reduction
was realized. It had 1.37dB attenuation and a +124deg phase shift. This phase change corresponds to a time delay of 48ns.
Note that the response of the short cable used for the measurement was calibrated out using the CAL function of the network analyzer.
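A phase measurement only fixes the delay modulo full cycles, so the 48ns figure implies a choice of cycle count. A minimal Python sketch, assuming a 55.33MHz modulation frequency (the f2 used in the demodulator tests on this page) and that a pure delay tau produces a phase of -360*f*tau degrees; the cycle counts scanned are illustrative.

```python
def phase_to_delay(phase_deg, freq_hz, n_cycles):
    """Delay [s] whose phase lag at freq_hz wraps to phase_deg.

    A pure delay tau gives phase = -360 * f * tau (mod 360), so
    tau = (360 * n_cycles - phase_deg) / (360 * f) for some integer n_cycles.
    """
    return (360.0 * n_cycles - phase_deg) / (360.0 * freq_hz)


F2 = 55.33e6  # Hz, assumed 55MHz modulation frequency

# Candidate delays for the measured +124deg phase shift
for n in range(1, 5):
    print(n, round(phase_to_delay(124.0, F2, n) * 1e9, 1), "ns")
```

n_cycles = 3 gives about 48ns; n_cycles = 2 (about 30ns) is implausible, since the delay box alone was set to 28ns before adding two 1.5m cables.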

11033   Sun Feb 15 16:20:44 2015   Koji   Summary   LSC   [ELOG LIST] 3f modulation cancellation

## Summary of the ELOGS

3f modulation cancellation theory http://nodus.ligo.caltech.edu:8080/40m/11005

3f modulation cancellation adjustment setup http://nodus.ligo.caltech.edu:8080/40m/11029

Recipe for the 3f modulation cancellation http://nodus.ligo.caltech.edu:8080/40m/11032

Modulation depth analysis http://nodus.ligo.caltech.edu:8080/40m/11036

11034   Sun Feb 15 20:55:48 2015   rana   Summary   LSC   [ELOG LIST] 3f modulation cancellation

I wonder if DRMI can be locked on 3f using this lower 55 MHz modulation depth. It seems that PRMI should be unaffected, but that the 3*f2 signals for SRCL will be too puny. Is it really possible to scale up the overall modulation depths by 3x to compensate for this?

11035   Mon Feb 16 00:08:44 2015   Koji   Summary   LSC   [ELOG LIST] 3f modulation cancellation

This KTP crystal has a maximum allowed RF power of 10W (=32Vpk) and V_pi = 230V. This corresponds to a maximum allowed
modulation depth of 32*Pi/230 = 0.44. So we can probably achieve gamma_1 of ~0.4 and gamma_2 of ~0.13. That's not x3 but x2.
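The arithmetic behind these numbers, as a small Python sketch. It assumes the 10W is delivered into a 50Ohm load, which is what makes 10W correspond to ~32Vpk; with the unrounded 31.6Vpk the limit comes out at ~0.43 rather than 0.44.

```python
import math


def vpk_from_power(p_watt, r_ohm=50.0):
    """Peak voltage of a sine delivering p_watt into r_ohm: P = Vpk^2 / (2 R)."""
    return math.sqrt(2.0 * p_watt * r_ohm)


def mod_depth(vpk, v_pi):
    """Phase modulation depth [rad] of an EOM with half-wave voltage v_pi."""
    return math.pi * vpk / v_pi


vpk = vpk_from_power(10.0)     # 10W maximum RF power (assumed 50 Ohm)
m_max = mod_depth(vpk, 230.0)  # V_pi = 230V
print(f"Vpk = {vpk:.1f} V, max depth = {m_max:.2f} rad")
```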

Kiwamu's triple resonant circuit (LIGO-G1000297-v1) actually shows modulation depths up to ~0.7. Therefore it is purely an issue of
how to deliver sufficient modulation power. (In fact, his measurement shows some nonlinearity above a modulation depth of ~0.4,
so we should keep the power delivered to the crystal within the 10W maximum.)

This means that we need to review our RF system (again!)

- Review the infamous crazy attn/amp combinations in the frequency generation box.
- Use a Teledyne Cougar amplifier (A2CP2596) right before the triple resonant box. This should be installed close to the triple resonant box in order to
minimize the effects of reflections due to imperfect impedance matching.
- Review and refine the triple resonant circuit: it's not built on a PCB but on a universal board. I don't think we need triple
resonance; double is OK as the 29.5MHz signal is small.

We want a +28V supply at 1X1 for the Teledyne amp and the AOM driver. Do we have any unused Sorensen?

11036   Mon Feb 16 01:45:12 2015   Koji   Summary   LSC   modulation depth analysis

Based on the measured modulation profiles, the depth of each modulation was estimated.
The cost function was the sum of squared relative errors, which was minimized.
Sidebands of order -8, -12 to -14, and n >= 7 are not included in the estimation for the nominal case.
Sidebands of order -7 to -9, -11 to -15, and n >= 7 are not included in the estimation for the 3f reduced case.

## Nominal modulation

m_f1 = 0.194
m_f2 = 0.234
theta_f1f2 = 41.35deg
m_IMC = 0.00153

## 3f reduced modulation

m_f1 = 0.191
m_f2 = 0.0579
theta_f1f2 = 180deg
m_IMC = 0.00149

(Sorry! There are no error bars. The data have too few statistics...)
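To see what these fitted depths imply, here is a minimal Python sketch that evaluates the field component at 3*f1 (33MHz) for the nominal and the 3f reduced settings. Assumptions, not taken from this entry: the field is modeled as E = exp(i m1 sin(w1 t)) * exp(i m2 sin(w2 t + theta)) with w2 = 5*w1 (11MHz and 55MHz), theta is the fitted theta_f1f2 in that convention, and the small IMC modulation is ignored. Bessel functions are computed by their power series to keep the block dependency-free.

```python
import cmath
import math


def bessel_j(n, x, terms=30):
    """Bessel function of the first kind J_n(x) for integer n, by power series."""
    if n < 0:
        return (-1) ** (-n) * bessel_j(-n, x, terms)
    return sum((-1) ** k * (x / 2.0) ** (2 * k + n)
               / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))


def sideband(order, m1, m2, theta_deg, ratio=5, kmax=3):
    """Complex amplitude of the field component at order * f1.

    With w2 = ratio * w1, the Jacobi-Anger expansion puts the product
    J_{order - ratio*k}(m1) * J_k(m2) * exp(i k theta) at order * f1.
    """
    theta = math.radians(theta_deg)
    return sum(bessel_j(order - ratio * k, m1) * bessel_j(k, m2)
               * cmath.exp(1j * k * theta)
               for k in range(-kmax, kmax + 1))


nominal = abs(sideband(3, 0.194, 0.234, 41.35))
reduced = abs(sideband(3, 0.191, 0.0579, 180.0))
print(f"|3f1| nominal = {nominal:.2e}, reduced = {reduced:.2e}")
print(f"suppression = {20 * math.log10(nominal / reduced):.1f} dB")
```

In this idealized model the J_3(m1) term and the J_2(m1)*J_1(m2) cross term nearly cancel at theta = 180deg, giving a suppression of roughly 34dB.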

11106   Fri Mar 6 00:59:13 2015   rana   Summary   IOO   MC alignment not drifting; PSL beam is drifting

In the attached plot you can see that the MC REFL fluctuations started getting larger on Feb 24 just after midnight. It's been bad ever since. What happened that night or the afternoon of Feb 23?
The WFS DC spot positions were far off (~0.9), so I unlocked the IMC and re-centered the spots on the WFS using the nearby steering mirrors. Let's see if this helps.

Also, these mounts should be improved. Steve, can you please prepare 5 mounts with the Thorlabs BA2 or BA3 base, the 3/4" diameter steel posts, and the Polaris steel mirror mounts? We should replace the mirror mounts for the 1" diameter mirrors during the daytime next week to reduce drift.

11109   Fri Mar 6 13:48:17 2015   dark kiwamu   Summary   IOO   triple resonance circuit

I was asked by Koji to point out where a schematic of the triple resonant circuit is.
It seems that I had posted a schematic of what currently is installed (see elog 4562 from almost 4 yrs ago!).

(Some transformer story)
Then I immediately noticed that it did not show two components, which were wideband RF transformers. In order to get an effective turns ratio of 1:9.8 (as indicated in the schematic) from a Coilcraft transformer kit in the electronics table, I had put two more transformers in series with the PWB1040L shown in the schematic. If I am not mistaken, this PWB1040L must be followed by a PWB1015L and a PWB-16-AL, in order from the input side to the EOM side. This gives an impedance ratio of 96, or an effective turns ratio of sqrt(96) = 9.8.
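The impedance ratio of transformers in series is the product of the stage ratios, and the turns ratio is its square root. A small Python sketch; the per-part impedance ratios (1:4, 1:1.5, 1:16) are an assumption read off the Coilcraft part numbers, not stated in the entry, but their product reproduces the quoted 96.

```python
import math

# (part, impedance ratio) in order from the input side to the EOM side.
# The ratios are assumed from the Coilcraft part numbers.
CASCADE = [("PWB1040L", 4.0), ("PWB1015L", 1.5), ("PWB-16-AL", 16.0)]


def cascade_ratio(stages):
    """Total impedance ratio of series transformers: product of stage ratios."""
    z = 1.0
    for _name, ratio in stages:
        z *= ratio
    return z


z_total = cascade_ratio(CASCADE)
turns = math.sqrt(z_total)  # impedance scales as the square of the turns ratio
print(f"impedance ratio 1:{z_total:g}, turns ratio 1:{turns:.1f}")
```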

Also, if one wants to review and/or upgrade the circuit, this document may be helpful:
https://wiki-40m.ligo.caltech.edu/Electronics/Multi_Resonant_EOM?action=AttachFile&do=get&target=design_EOM.pdf
This is a document that I wrote some time ago describing how I wanted to make the circuit better. Apparently I did not get a chance to do it.

11112   Fri Mar 6 19:54:15 2015   rana   Summary   IOO   MC alignment not drifting; PSL beam is drifting

MC Refl alignment follow up: the alignment from last night still seems good today. We should keep an eye on the MC WFS DC spots and not let them get beyond 0.5.

11134   Wed Mar 11 19:15:03 2015   Koji   Summary   LSC   ROUGH calibration of the darm spectrum during the full PRFPMI lock

I made a very rough calibration of the DARM spectra before and after the transition for the second lock on Mar 8.

The cavity pole (expected to be 4.3kHz) was not compensated. Also the servo bump was not compensated.

[Error calibration]

While DARM and CARM were controlled with ALS, their calibration is given by the ALS phase tracker calibration,
i.e. 1 degree = 19.23kHz.

This means that the calibration factor is

DARM [deg] * 19.23e3 [Hz/deg] / c [m Hz] * lambda [m] * L_arm [m]
= DARM* 19.23e3/299792458*1064e-9*38.5 = 2.6e-9 *DARM [m]
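The same conversion as a Python sketch, using only the constants quoted above: the frequency shift df maps to an arm length change via dL/L = df/nu with nu = c/lambda.

```python
C = 299792458.0        # speed of light [m/s]
LAMBDA = 1064e-9       # laser wavelength [m]
L_ARM = 38.5           # arm length [m]
HZ_PER_DEG = 19.23e3   # ALS phase tracker calibration [Hz/deg]


def als_deg_to_m(darm_deg):
    """Convert an ALS phase-tracker reading [deg] to arm length change [m].

    dL / L = df / nu with nu = c / lambda, so dL = df * lambda * L / c.
    """
    df = darm_deg * HZ_PER_DEG
    return df * LAMBDA * L_ARM / C


print(f"{als_deg_to_m(1.0):.2e} m/deg")
```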

[Feedback calibration]

Then, the feedback signal was calibrated with the suspension response (f=1Hz, Q=5)
so that the error and feedback signals match at 100Hz.

This gave a DC factor of 5e-8.
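A sketch of the feedback calibration shape, assuming the suspension response is modeled as a simple second-order low-pass with f0 = 1Hz, Q = 5 and the DC factor quoted above (its units are not given in the entry, so they are left generic here). Well above resonance the response falls as (f0/f)^2, i.e. by about 1e-4 at 100Hz.

```python
def pendulum_response(f, f0=1.0, q=5.0, dc=5e-8):
    """|dc * f0^2 / (f0^2 - f^2 + i f f0 / Q)|, a 2nd-order low-pass."""
    h = dc * f0**2 / complex(f0**2 - f**2, f * f0 / q)
    return abs(h)


for f in (0.1, 1.0, 10.0, 100.0):
    print(f"{f:6.1f} Hz: {pendulum_response(f):.3e}")
```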

Spectra were taken at GPS 1109832200 (ALS only, not even on resonance) and 1109832500 (after the DARM/CARM transitions).
Jenne said that the whitening filters for AS55Q were not on.
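The spectra are labeled by GPS time. Converting to UTC requires the GPS-UTC leap second offset; below is a minimal Python sketch with a hand-entered, partial leap second table (valid from 1999 on, when GPS-UTC was 13s). For real work a maintained time library would be preferable to this table.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

# (first GPS second with the new offset, GPS-UTC offset [s]); partial table.
LEAPS = [
    (820108814, 14),   # 2006-01-01
    (914803215, 15),   # 2009-01-01
    (1025136016, 16),  # 2012-07-01
    (1119744017, 17),  # 2015-07-01
]


def gps_offset(gps):
    """GPS-UTC offset in seconds for a GPS time after 1999-01-01."""
    offset = 13
    for start, value in LEAPS:
        if gps >= start:
            offset = value
    return offset


def gps_to_utc(gps):
    """UTC datetime for a GPS time in seconds."""
    return GPS_EPOCH + timedelta(seconds=gps - gps_offset(gps))


for t in (1109832200, 1109832500):
    print(t, gps_to_utc(t).isoformat())
```

With this table, 1109832200 is 2015-03-08 06:43:04 UTC.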

11147   Thu Mar 19 16:58:19 2015   Steve   Summary   IOO   MC alignment not drifting; PSL beam is drifting

Polaris mounts ordered.


11148   Thu Mar 19 17:11:32 2015   steve   Summary   SUS   oplev laser summary updated

March 19, 2015: 2 new JDSU 1103P, s/n P919645 & P919639, received from Thailand through Edmund Optics. Mfg date 12/2014. As spares.

11149   Fri Mar 20 10:51:09 2015   Steve   Summary   IOO   MC alignment not drifting; PSL beam is drifting

Are the two small visible screws the only thing holding the adapter plate?

If so, it is the weakest point of the IOO path.

11158   Mon Mar 23 09:42:29 2015   Steve   Summary   IOO   4" PSL beam path posts

To achieve the same beam height, each component needs its own specific post height.

 Beam Height | Base Plate   | Mount (mirror / lens / waveplate rotary) | 0.75" OD SS Post Height | Notes
 4"          | Thorlabs BA2 | Newport LH-1                             | 2.620"                  |
 4"          | Thorlabs BA2 | Polaris K1                               | 2.620"                  |
 4"          | Thorlabs BA2 | Polaris K2                               | 2.220"                  |
 4"          | Thorlabs BA2 | Thorlabs LMR1                            | 2.750"                  |
 4"          | Thorlabs BA2 | New Focus 9401                           | 2.120"                  |
 4"          | Thorlabs BA2 | Newport U100                             | 2.620"                  |
 4"          | Thorlabs BA2 | Newport U200                             | 2.120"                  |
 4"          | Newport 9021 | LH-1                                     | 2.0"                    | PMC-MM lens with xy translation stage: Newport 9022, 9065A (Atm3)
 4"          | Newport 9021 | LH-1                                     | 1.89"                   | MC-MM lens with translation stage: Newport 9022, 9025 (Atm2)

We have 2.625" tall, 3/4" OD SS posts for Polaris K1 mirror mounts: 20 pieces

Ordered Newport LH-1 lens mounts with an axis height of 1.0".
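The post heights in the table follow from a simple stack-up: base plate thickness + post + mount axis height = beam height. A Python sketch of that relation; the 0.375" (3/8") thickness for the Thorlabs BA2 is an assumption not stated in the entry, combined with the 1.0" LH-1 axis height quoted above.

```python
def post_height(beam_height, base_thickness, axis_height):
    """Post length [in] so that base + post + mount axis height = beam height."""
    return beam_height - base_thickness - axis_height


# Hypothetical stack-up: 4" beam, BA2 assumed 0.375" thick, LH-1 axis at 1.0"
print(f'{post_height(4.0, 0.375, 1.0):.3f}"')
```

This gives 2.625", within a few thousandths of the 2.620" listed for the BA2 + LH-1 row.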

11171   Wed Mar 25 18:27:34 2015   Koji   Summary   General   Some maintenance

- I found that the cable for the AS55 LO signal had its shielding 90% broken. It was fixed.

- The Mon5 monitor in the control room had not been functional for months. I found a small CRT down the east arm.
It is now set up as MON5, showing the camera images. Steve, do we need any safety measures for this CRT?

11173   Wed Mar 25 18:48:11 2015   Koji   Summary   LSC   55MHz demodulators inspection

[Koji Den EricG]

We inspected the {REFL, AS, POP}55 demodulators.

In short, we made the following changes:

- The REFL55 PD RF signal is connected to the POP55 demodulator now.
Thus, the POP55 signals should be used at the input matrix of the LSC screens for PRMI tests.

- The POP55 PD RF signal is connected to the REFL55 demodulator now.

- We jiggled the whitening gains and the whitening triggers. Whitening gains for the AS, REFL, POP PDs are set to 9, 21, 30dB as before.
However, the overall signal gain may have changed. The optimal gains should be checked by locking the interferometer.

- Test 1

Inject a 55.3MHz signal into the demodulators and check the amplitude of the demodulated signal with DTT.
The peak height in the spectrum was calibrated to counts (i.e. it is not counts/rtHz).
We checked the amplitude at the input of the input filters (e.g. C1:LSC-REFL55_I_IN1). The whitening gains were set to 0dB,
and the whitening filters were turned off.

 Demod    f_inj [MHz]   Level    Beat    I [cnt]   Q [cnt]   Note
 REFL55   55.32961      -10dBm   999Hz    22.14     26.21
 REFL55   55.33051      -10dBm    99Hz    20.26     24.03    ~200mVpk at the analog I monitor
 REFL55   55.33060      -10dBm   8.5Hz    22.14     26.21
 ----
 AS55     55.33051      -10dBm    99Hz   585.4     590.5     ~600mVpk at the analog Q monitor
 POP55    55.33051      -10dBm    99Hz   613.9     602.2     ~600mVpk at the analog I monitor

We wondered why REFL55 has such a small response. The other demodulators seem to have some daughter board (Sigg amp?).
This may be causing the difference.
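To put a number on the REFL55 deficit, a small Python sketch comparing the 99Hz I-phase peaks from Test 1 in dB, and converting the -10dBm injection to a peak voltage assuming a 50Ohm input.

```python
import math


def dbm_to_vpk(p_dbm, r_ohm=50.0):
    """Peak voltage of a sine at p_dbm into r_ohm."""
    p_watt = 1e-3 * 10 ** (p_dbm / 10.0)
    return math.sqrt(2.0 * p_watt * r_ohm)


# I-phase peak heights at 99Hz from Test 1 [counts]
counts = {"REFL55": 20.26, "AS55": 585.4, "POP55": 613.9}

print(f"-10dBm into 50 Ohm = {dbm_to_vpk(-10.0) * 1e3:.0f} mVpk")
for name, c in counts.items():
    print(f"{name}: {20 * math.log10(c / counts['REFL55']):+.1f} dB relative to REFL55")
```

AS55 and POP55 come out roughly 29-30dB above REFL55.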

-----

- Test 2

We injected a 1kHz, 1Vpk AF signal into the whitening board. The peak height at 1kHz was measured.
The whitening filters/gains were set to the same conditions as above.

f_inj = 1kHz, 1Vpk
 REFL55I  2403 cnt    REFL55Q  2374 cnt
 AS55I    2374 cnt    AS55Q    2396 cnt
 POP55I   2365 cnt    POP55Q   2350 cnt

So, they look identical. => The difference between REFL55 and the others is in the demodulator.

11180   Fri Mar 27 20:32:17 2015   Koji   Summary   LSC   Locking activity

- Adjusted the IMC WFS operating point. The IMC refl is 0.42-0.43.

- The arms were aligned with ASS.

- The X arm green was aligned with ASX. The PZT offset sliders were adjusted to offload the servo outputs.

- I tried locking once and the transition was successful. I even tried the 3f-1f transition, but the lock was lost. I'm not sure what the real cause was.

I need to go now. I left the IFO in the state where it is waiting for the arms to be locked with IR for the full locking trial.

11235   Wed Apr 22 11:48:30 2015   manasa   Summary   General   Delay line frequency discriminator for FOL error signal

Since the frequency counters have not been a reliable error signal for the FOL PID loop, we will put together an analog delay line frequency discriminator as an alternative method to obtain the beat frequency.

The configuration will be similar to the one originally used in elog 4254.

For a delay line frequency discriminator, the output at the mixer is proportional to $\cos(\theta_{b})$, where $\theta_{b} = 2 \pi f_{b}L/v$,

with L the cable length asymmetry, $f_{b}$ the beat frequency, and v the velocity of light in the cable.

A monotonic output signal, approximately linear around the zero crossing at $\theta_{b} = \pi/2$, can be obtained for $0< \theta_{b}<\pi$.

For our purpose in FOL, if we would like to measure the beat frequency over a bandwidth of 200MHz, this would correspond to a cable length difference of 0.5 m (assuming the speed of light in the coaxial cable is ~ 2x10^8 m/s).
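The 0.5m figure follows from requiring theta_b to sweep from 0 to pi across the band, i.e. a span of v/(2L), with the zero crossing (theta_b = pi/2) at v/(4L). A Python sketch assuming v = 2x10^8 m/s as above; the 0.27m and 0.21m lengths used in later entries are included for comparison, and measured spans may differ from this idealized estimate.

```python
def dfd_linear_range(delta_l, v=2e8):
    """Beat-frequency span [Hz] over which theta_b = 2 pi f L / v goes 0 -> pi."""
    return v / (2.0 * delta_l)


def dfd_zero_crossing(delta_l, v=2e8):
    """Beat frequency [Hz] at theta_b = pi/2, where cos(theta_b) crosses zero."""
    return v / (4.0 * delta_l)


for dl in (0.50, 0.27, 0.21):
    print(f"L = {dl:.2f} m: span {dfd_linear_range(dl) / 1e6:.0f} MHz, "
          f"zero crossing {dfd_zero_crossing(dl) / 1e6:.0f} MHz")
```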

11236   Wed Apr 22 14:56:18 2015   manasa   Summary   General   Delay line frequency discriminator for FOL error signal

[Koji, Manasa]

Since the bandwidth of the fiber PD is ~1GHz, we could design the frequency discriminator to have a wider bandwidth (~500MHz). The output of the frequency discriminator could then be used to set the range of the frequency counter for readout, or maybe even as the error signal for the PID loop.

A test run of the analog DFD with a cable length difference of 27cm gave a linear output signal with a zero crossing at ~206MHz.

Detailed schematic of the setup and plot (voltage vs frequency) will be updated shortly.

11245   Fri Apr 24 21:31:20 2015   Koji   Summary   CDS   automatic mxstream resetting

We were too annoyed by the frequent stalls of mxstream. We'll update the RCG when the time comes (in the not-too-distant future).

But for now, we need automatic mxstream resetting.

I found there is such a script already.

/opt/rtcds/caltech/c1/scripts/cds/autoMX

So this script was registered to crontab on megatron.
It is invoked every 5 minutes.

# Auto MXstream reset when it fails
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /opt/rtcds/caltech/c1/scripts/cds/autoMX >> /opt/rtcds/caltech/c1/scripts/cds/autoMX.log

11252   Sun Apr 26 00:56:21 2015   rana   Summary   Computer Scripts / Programs   problems with new restart procedures for elogd and apache

Since the nodus upgrade, Eric/Diego changed the old csh restart procedures to be more UNIX standard. The instructions are in the wiki.

After doing some software updates on nodus today, apache and elogd didn't come back OK. Maybe because of some race condition, elog tried to start but didn't get apache. Apache couldn't start because it found that someone was already binding the ELOGD port. So I killed ELOGD several times (because it kept trying to respawn). Once it stopped trying to come back I could restart Apache using the Wiki instructions. But the instructions didn't work for ELOGD, so I had to restart that using the usual .csh script way that we used to use.

11267   Fri May 1 20:33:31 2015   rana   Summary   Computer Scripts / Programs   problems with new restart procedures for elogd and apache

Same thing again today. So I renamed /etc/init/elog.conf so that it doesn't keep respawning bootlessly. Until then, restart elog using the start script in /cvs/cds/caltech/elog/ as usual.

I'll let EQ debug when he gets back - probably we need to pause the elog respawn so that it waits until nodus is up for a few minutes before starting.


11268   Sun May 3 01:04:19 2015   rana   Summary   PEM   Seismo signals are bad

https://ldas-jobs.ligo.caltech.edu/~max.isi/summary/day/20150502/pem/seismic/

Looks like some of our seismometers are oscillating, not mounted well, or something like that. No reason for them to be so different.

Which Guralp is where? And where are our accelerometers mounted?

11270   Mon May 4 10:21:09 2015   manasa   Summary   General   Delay line frequency discriminator for FOL error signal

Attached are the schematic of the analog DFD and the plot showing the zero crossing for a delay line length of 27cm. The bandwidth of the linear output signal roughly matches what is expected from the length difference (370MHz).

We could use a shorter cable to further increase our bandwidth. I propose we use this analog DFD to determine the range at which the frequency counter needs to be set, and then use the frequency counter readout as the error signal for FOL.

11272   Mon May 4 12:42:34 2015   manasa   Summary   General   Delay line frequency discriminator for FOL error signal

Koji suggested that I make a cosine fit for the curve instead of a linear fit.

I fit the data to $V(f) = A + B\cos(2\pi f_{b}L/v)$,
where L is the cable length asymmetry (27 cm), $f_{b}$ the beat frequency, and v the velocity of light in the cable (2x10^8 m/s).

The plot with the cosine fit is attached.

Fit coefficients (with 95% confidence bounds):
A =      0.4177  (0.3763, 0.4591)
B =       2.941  (2.89, 2.992)
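Because L and v are held fixed in this fit, the model is linear in A and B and reduces to ordinary linear least squares, which can be solved in closed form from the 2x2 normal equations. A minimal Python sketch (an illustration, not the MATLAB script that was actually used), checked here on synthetic data rather than the measurement; the confidence bounds quoted above would additionally need the residual variance and the inverse normal matrix, which are omitted.

```python
import math


def fit_cosine(freqs, volts, delta_l=0.27, v=2e8):
    """Least-squares fit of V = A + B cos(2 pi f L / v) with L and v fixed.

    The model is linear in A and B, so the 2x2 normal equations
    [n, sum c; sum c, sum c^2] [A, B]^T = [sum y, sum c y]^T are solved directly.
    """
    c = [math.cos(2 * math.pi * f * delta_l / v) for f in freqs]
    n = len(freqs)
    sc, scc = sum(c), sum(x * x for x in c)
    sy, scy = sum(volts), sum(x * y for x, y in zip(c, volts))
    det = n * scc - sc * sc
    a = (scc * sy - sc * scy) / det
    b = (n * scy - sc * sy) / det
    return a, b


# Synthetic check with made-up data (not the measured data):
fs = [10e6 * k for k in range(1, 40)]
vs = [0.4 + 2.9 * math.cos(2 * math.pi * f * 0.27 / 2e8) for f in fs]
print(fit_cosine(fs, vs))
```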

11300   Mon May 18 14:46:20 2015   manasa   Summary   General   Delay line frequency discriminator for FOL error signal

Measuring the voltage noise and frequency response of the Analog Delay-line Frequency Discriminator (DFD)

The schematic and an actual photo of the setup are shown below. The setup was checked to be physically sturdy with no loose connections or moving parts.

The voltage noise at the output of the DFD was measured using an SR785 signal analyzer while simultaneously monitoring the signal on an oscilloscope.

The noise at the output of the DFD was measured for no RF input and at several RF input frequencies including the zero crossing frequency and the optimum operating frequency of the DFD (20MHz).

The plot below shows the voltage noise for different RF inputs to the DFD. The noise level is slightly lower at the zero crossing frequency, where amplitude noise is rejected by the DFD.

I also measured the frequency response of the setup, since the cable length difference has changed from the prior setup. The cable length difference is now 21cm, and the linear signal at the output of the DFD extends over ~380MHz, which is good enough for our purposes in FOL. A cosine fit to the data was done as before. //edit- Manasa: The gain of the SR560 was set to 20 to obtain the data shown below//

Fit Coefficients (with 95% confidence bounds):
a =     -0.8763  (-1.076, -0.6763)
b =       3.771  (3.441, 4.102)

Data and matlab scripts are zipped and attached.

11368   Mon Jun 22 12:57:09 2015   ericq   Summary   LSC   X/Y green beat mode overlap measurement redone

I took measurements at the green beat setup on the PSL table, and found that our power / mode overlap situation is still consistent with what Koji and Manasa measured last September [ELOG 10492]. I also measured the powers at the BBPDs with the Ophir power meter.

Both mode overlaps are around 50%, which is fine.

The beatnote amplitudes at the BBPD outputs at a frequency of about 50MHz are -20.0 and -27.5 dBm for the X and Y beats, respectively. This is consistent with the measured optical power levels and a PD response of ~0.25 A/W at 532nm. The main reason for the disparity is that there is much more X green light than Y green light on the table (factor of ~20), and the greater amount of green PSL light on the Y BBPD (factor of ~3) does not quite make up for it.

One way to punch up the Y beat a little might be to adjust the pickoff optics. Of the 25uW of Y arm transmitted green light incident on the polarizing beamsplitter that separates the X and Y beams, only 13uW makes it to the Y BBPD, but this would only win us a couple of dB at most.

In any case, with the beat setup as it exists, it looks like we should design the next beatbox iteration to accept RF inputs of around -20 to -30 dBm.

In the style of the referenced ELOG, here are today's numbers.

                     XARM     YARM
o BBPD DC output (mV)
  V_DARK:           +  1.0    + 2.2
  V_PSL:            +  7.1    +21.3
  V_ARM:            +165.0    + 8.2

o BBPD DC photocurrent (uA)
  I_DC = V_DC / R_DC ... R_DC: DC transimpedance (2kOhm)
  I_PSL:               3.6     10.7
  I_ARM:              82.5      4.1

o Expected beat note amplitude
  I_beat_full = I1 + I2 + 2 sqrt(e I1 I2) cos(w t) ... e: mode overlap (in power)
  I_beat_RF = 2 sqrt(e I1 I2)
  V_RF = 2 R sqrt(e I1 I2) ... R: RF transimpedance (2kOhm)
  P_RF = V_RF^2/2/50 [W]
       = 10 log10(V_RF^2/2/50*1000) [dBm]
       = 10 log10(e I1 I2) + 82.0412 [dBm]   (I1, I2 in A)
       = 10 log10(e) + 10 log10(I1 I2) + 82.0412 [dBm]

  For e=1, the expected RF power at the PDs [dBm]:
  P_RF:              -13.2    -21.5

o Measured beat note power (no alignment done)
  P_RF:              -20.0    -27.5  [dBm] (53.0MHz and 46.5MHz)
  e:                  45.7     50.1  [%]
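The expected RF powers can be reproduced from the DC photocurrents with the formula above. A small Python sketch (photocurrents in A, 2kOhm RF transimpedance, 50Ohm load, as in the table). It prints both 10^(dP/10) and 10^(dP/20) for the measured-minus-expected power difference dP: with e defined as the power overlap in the formula, e = 10^(dP/10) (about 0.21 and 0.25), while the tabulated 45.7% and 50.1% correspond to 10^(dP/20) of the rounded dB values.

```python
import math

R_RF = 2e3  # RF transimpedance [Ohm]


def expected_beat_dbm(i1, i2, e=1.0, r=R_RF):
    """RF beat power [dBm] into 50 Ohm: V_RF = 2 R sqrt(e I1 I2), P = V_RF^2/2/50."""
    v_rf = 2.0 * r * math.sqrt(e * i1 * i2)
    return 10.0 * math.log10(v_rf**2 / 2.0 / 50.0 * 1e3)


# DC photocurrents from the table [A]: (I_PSL, I_ARM)
arms = {"X": (3.6e-6, 82.5e-6), "Y": (10.7e-6, 4.1e-6)}
measured = {"X": -20.0, "Y": -27.5}  # dBm

for name, (i_psl, i_arm) in arms.items():
    p_exp = expected_beat_dbm(i_psl, i_arm)
    dp = measured[name] - p_exp
    print(f"{name}: expected {p_exp:.1f} dBm, "
          f"10^(dP/10) = {10 ** (dp / 10):.3f}, 10^(dP/20) = {10 ** (dp / 20):.3f}")
```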

11370   Mon Jun 22 14:53:37 2015   rana   Summary   LSC   X/Y green beat mode overlap measurement redone

• Why is there a factor of 20 power difference? Some of it is the IR laser power difference, but I thought that was just a factor of 4 in green.
• Why is the mode overlap only 50% and not more like 75%?
• IF we have enough PSL green power, we could do the Y-beat with an 80/20 instead of a 50/50 and get better SNR.
• The FFD-100 response is more like 0.33 A/W at 532 nm, not 0.25 A/W.

In any case, this signal difference is not big, so we should not need a different amplifier chain for the two signals. The 20 dB of amplification in the BeatBox was a fine approach, but the circuit layout was not great.

The BBPD has an input referred current noise of 10 pA/rHz and a transimpedance of 2 kOhm, so an output voltage noise of 20 nV/rHz (into 50 Ohms). This would be matched by an amp with NF = 26 dB, which is way worse than anything we could buy from Mini-Circuits, so we should definitely NOT use anything like the low-noise, low output power amps used currently (e.g. ZFL-1000LN....never, ever use these for anything). We should use a single ZHL-3A-S (G = 25 dB, NF < 6 dB, Max Out = 30 dBm) for each channel (and nothing else) before driving the cables over to the LSC rack into the aLIGO demod board. I just ordered two of these now.
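A rough check of the equivalent noise figure: F = 1 + (vn / v_J)^2, comparing the BBPD output noise with the Johnson noise v_J = sqrt(4kTR) of a 50Ohm source. This Python sketch assumes T = 290K and the open-circuit Johnson noise convention; it gives about 27dB, in line with the ~26dB above (the exact value depends on the noise-matching convention).

```python
import math

K_B = 1.380649e-23  # Boltzmann constant [J/K]


def noise_figure_db(vn, r=50.0, t=290.0):
    """NF [dB] of an amplifier whose input-referred voltage noise is vn [V/rtHz]."""
    v_johnson = math.sqrt(4.0 * K_B * t * r)
    return 10.0 * math.log10(1.0 + (vn / v_johnson) ** 2)


i_n = 10e-12    # BBPD input-referred current noise [A/rtHz]
z_t = 2e3       # transimpedance [Ohm]
vn = i_n * z_t  # 20 nV/rtHz at the output
print(f"output noise {vn * 1e9:.0f} nV/rtHz, equivalent NF ~ {noise_figure_db(vn):.1f} dB")
```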

11384   Tue Jun 30 11:33:00 2015   Jamie   Summary   CDS   prepping for CDS upgrade

This is going to be a big one.  We're at version 2.5 and we're going to go to 2.9.3.

RCG components that need to be updated:

• mbuf kernel module
• mx_stream driver
• iniChk.pl script
• daqd
• nds

Supporting software:

• EPICS 3.14.12.2_long
• ldas-tools (framecpp) 1.19.32-p1
• libframe 8.17.2
• gds 2.16.3.2
• fftw 3.3.2

Things to watch out for:

• RTS 2.6:
  • raw minute trend frame location has changed (CRC-based subdirectory)
  • new kernel patch
• RTS 2.7:
  • supports "commissioning frames", which we will probably not utilize. Need to make sure that we're not writing extra frames somewhere.
• RTS 2.8:
  • "slow" (EPICS) data from the front-end processes is acquired via the DAQ network, and not through EPICS. This will increase traffic on the DAQ LAN. Hopefully this will not be an issue, and the existing network infrastructure can handle it, but it should be monitored.

11390   Wed Jul 1 19:16:21 2015   Jamie   Summary   CDS   CDS upgrade in progress

## The CDS upgrade is now underway

Here's what's happened so far:

• Installed and linked in all the RTS supporting software packages in /opt/rtapps (only on front end machines and fb):
    controls@c1lsc ~ 2$ find /opt/rtapps/ -mindepth 1 -maxdepth 1 -type l -ls
    12582916    0 lrwxrwxrwx   1 controls 1001   12 Jul  1 13:16 /opt/rtapps/gds -> gds-2.16.3.2
    12603452    0 lrwxrwxrwx   1 controls 1001   10 Jul  1 13:17 /opt/rtapps/fftw -> fftw-3.3.2
    12603451    0 lrwxrwxrwx   1 controls 1001   15 Jul  1 13:16 /opt/rtapps/libframe -> libframe-8.17.2
    12603450    0 lrwxrwxrwx   1 controls 1001   13 Jul  1 13:16 /opt/rtapps/libmetaio -> libmetaio-8.2
    12582915    0 lrwxrwxrwx   1 controls 1001   34 Jul  1 15:24 /opt/rtapps/framecpp -> ldas-tools-1.19.32-p1/linux-x86_64
    12582914    0 lrwxrwxrwx   1 controls 1001   20 Jul  1 13:15 /opt/rtapps/epics -> epics-3.14.12.2_long
• Checked out the RTS source for the version we'll be using, 2.9.4:
    /opt/rtcds/rtscore/tags/advLigoRTS-2.9.4
• Built and installed all of the RTS components:
  • mbuf
  • mx_stream
  • daqd
  • nds
  • awgtpman
• mx_stream is not working, for unknown reasons. It won't start on the front end machines (only tested on c1lsc so far), failing with the following error:
    controls@c1lsc ~ 1$ /opt/rtcds/caltech/c1/target/fb/mx_stream -s c1x04 c1lsc c1ass c1oaf c1cal -d fb:0
    send len = 263596
    mx_connect failed Remote Endpoint is Closed
    controls@c1lsc ~ 1$
  I have contacted Keith T. and Rolf B. for backup. This is a blocker, since this is what ferries the data from the front ends.
• Rebuilt almost all models. This was good. Initially nothing would compile because of IPC creation errors, so I moved the old chans/ipc/C1.ipc file out of the way and generated a new one, and then everything compiled (of course, senders have to be compiled before receivers). I only had to fix a couple of things in the models themselves:
  • c1ioo - unterminated FiltCtrl inputs
  • C1_SUS_SINGLE_CONTROL - unterminated FiltCtrl inputs
  • c1oaf - bad part named "STATIC". There is some hacky namespace stuff going on in the RCG. I was able to just explode that part and it now works.
  • c1lsc - unterminated FiltCtrl inputs
  Haven't installed or tried to run anything yet, but the fact they compile is good. Some models are not compiling because they have C code in src blocks that is throwing errors:
  • c1lsc
  • c1cal
  It shouldn't be too hard to fix whatever is causing those compile errors.

That's it for today. Will pick up again first thing tomorrow.

11392   Tue Jul 7 17:22:16 2015   Jessica   Summary   Time Delay in ALS Cables

I measured the transfer functions of the delay line cables, and then calculated the time delay from them. The first cable had a time delay of 1272 ns and the second had a time delay of 1264 ns. Below are the plots I created to calculate this. There does seem to be a pattern in the residual plots, however, which was not expected. The R-squared parameter was very close to 1 for both fits, indicating that the fit was good.

11393   Tue Jul 7 18:27:54 2015   Jamie   Summary   CDS   CDS upgrade: progress!
After a couple of days of struggle, I made some progress on the CDS upgrade today:

## Front end status:

• RTS upgraded to 2.9.4, and linked in as "release":
    /opt/rtcds/rtscore/release -> tags/advLigoRTS-2.9.4
• mbuf kernel module built and installed
• All front ends have been rebooted with the latest patched kernel (from the 2.6 upgrade)
• All models have been rebuilt, installed, and restarted. Only minor model issues had to be corrected (mostly unterminated unused inputs).
• awgtpman rebuilt, and installed/running on all front ends
• open-mx upgraded to 1.5.2:
    /opt/open-mx -> open-mx-1.5.2
• All front ends running the latest version of mx_stream, built against 2.9.4 and open-mx-1.5.2.

We have new GDS overview screens for the front end models:

It's possible that our current lack of IRIG-B GPS distribution means that the 'TIM' status bit will always be red on the IOP models. Will consult with Rolf.

There are other new features in the front ends that I can get into later.

## DAQ (fb) status:

• daqd and nds rebuilt against 2.9.4, both now running on fb

40m daqd compile flags:

    cd src/daqd
    ./configure --enable-debug --disable-broadcast --without-myrinet --with-mx --enable-local-timing --with-epics=/opt/rtapps/epics/base --with-framecpp=/opt/rtapps/framecpp
    make
    make clean
    install daqd /opt/rtcds/caltech/c1/target/fb/

However, daqd has unfortunately been very unstable, and I've been trying to figure out why. I originally thought it was some sort of timing issue, but now I'm not so sure. I had to make the following changes to the daqdrc:

    set gps_leaps = 820108813 914803214 1119744016;

That enumerates a list of leap seconds since some reference time. Not sure if that actually does anything, but I added the latest leap second anyway:

    set symm_gps_offset=315964803;

This updates the silly, arbitrary GPS offset, which is required to be correct when not using an external GPS reference.
Finally, the last thing I did, which got it running stably, was to turn off all trend frame writing:

    # start trender;
    # start trend-frame-saver;
    # sync trend-frame-saver;
    # start minute-trend-frame-saver;
    # sync minute-trend-frame-saver;
    # start raw_minute_trend_saver;

For whatever reason, it's the trend frame writing that was causing daqd to fall over after a short amount of time. I'll continue investigating tomorrow.

We still have a lot of cleanup, burt restores, testing, etc. to do, but we're getting there.

11395   Wed Jul 8 17:46:20 2015   Jessica   Summary   General   Updated Time Delay Plots

I re-measured the transfer function for Cable B, because the residuals in my previous post for cable B indicated a bad fit. I also realized I had made a mistake in calculating the time delay, and calculated more reasonable time delays today. Cable A had a delay of 202.43 +- 0.01 ns. Cable B had a delay of 202.44 +- 0.01 ns.

11396   Wed Jul 8 20:37:02 2015   Jamie   Summary   CDS   CDS upgrade: one step forward, two steps back

After determining yesterday that all the daqd issues were coming from the frame writing, I started to dig into it more today. I also spoke to Keith Thorne, and got some good suggestions from Gerrit Kuhn at GEO.

I realized that it probably wasn't the trend writing per se, but that turning on more writing to disk was causing increased load on daqd, and consequently on the system itself. With more frame writing turned on, the memory consumption increased to the point of maxing out the physical RAM. The system then probably started swapping, which certainly would have choked daqd.

I noticed that fb only had 4G of RAM, which Keith suggested was just not enough. Regardless of how much the memory consumption of daqd has increased, 4G still seems like it would not be enough. I opened up fb only to find that fb actually had 8G of RAM installed! Not sure what happened to the other 4G, but somehow they were not visible to the system.
Koji and I eventually determined, via some frankenstein operations with megatron, that the RAM was just dead. We then pulled 4G of RAM from megatron and replaced the bad RAM in fb, so that fb now has a full 8G of RAM.

Unfortunately, when we got fb fully back up and running, we found that fb is not able to see any of the other hosts on the data concentrator network. mx_info, which displays the card and network status for the Myricom Myrinet fiber card, shows:

    MX Version: 1.2.16
    MX Build: controls@fb:/opt/src/mx-1.2.16 Tue May 21 10:58:40 PDT 2013
    1 Myrinet board installed.
    The MX driver is configured to support a maximum of:
        8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
    ===================================================================
    Instance #0:  299.8 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
        Status:         Running, P0: Wrong Network
        Network:        Myrinet 10G

        MAC Address:    00:60:dd:46:ea:ec
        Product code:   10G-PCIE-8AL-S
        Part number:    09-03916
        Serial number:  352143
        Mapper:         00:60:dd:46:ea:ec, version = 0x63e745ee, configured
        Mapped hosts:   1

                                                            ROUTE COUNT
    INDEX    MAC ADDRESS     HOST NAME                        P0
    -----    -----------     ---------                        ---
       0) 00:60:dd:46:ea:ec fb:0                              D 0,0

Note that all front end machines should be listed in the table at the bottom, and they're not. Also note the "Wrong Network" note in the Status line above. It appears that the card may have been initialized in a bad state? Or Koji and I somehow disturbed the network when we were cleaning up things in the rack. "sudo /etc/init.d/mx restart" on fb doesn't solve the problem. We even rebooted fb and it didn't seem to help.

In any event, we're back to no data flow. I'll pick up again tomorrow.

11397   Wed Jul 8 21:02:02 2015   Jamie   Summary   CDS   CDS upgrade: another step forward, so we're back to where we started (plus a bit?)
Koji did a bit of googling to determine that the 'Wrong Network' status message could be explained by the fb Myrinet card operating in the wrong mode: (This was the useful link to track down the issue (KA))

    Network:       Myrinet 10G

I didn't notice it before, but we should in fact be operating in "Ethernet" mode, since that's the fabric we're using for the DC network. Digging a bit deeper, we found that the new version of mx (1.2.16) had indeed been configured with a different compile option than the 1.2.15 version had:

controls@fb ~ 0$ grep '$ ./configure' /opt/src/mx-1.2.15/config.log
  $ ./configure --enable-ether-mode --prefix=/opt/mx
controls@fb ~ 0$ grep '$ ./configure' /opt/src/mx-1.2.16/config.log
  $ ./configure --enable-mx-wire --prefix=/opt/mx-1.2.16
controls@fb ~ 0$

So that would entirely explain the problem.  I re-linked mx to the older version (1.2.15), reloaded the mx drivers, and everything showed up correctly:

controls@fb ~ 0$ /opt/mx/bin/mx_info
MX Version: 1.2.12
MX Build: root@fb:/root/mx-1.2.12 Mon Nov 1 13:34:38 PDT 2010
1 Myrinet board installed.
The MX driver is configured to support a maximum of:
    8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
===================================================================
Instance #0:  299.8 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
    Status:        Running, P0: Link Up
    Network:       Ethernet 10G

    MAC Address:   00:60:dd:46:ea:ec
    Product code:  10G-PCIE-8AL-S
    Part number:   09-03916
    Serial number: 352143
    Mapper:        00:60:dd:46:ea:ec, version = 0x00000000, configured
    Mapped hosts:  6

                                                        ROUTE COUNT
INDEX    MAC ADDRESS     HOST NAME                        P0
-----    -----------     ---------                        ---
   0) 00:60:dd:46:ea:ec fb:0                             1,0
   1) 00:25:90:0d:75:bb c1sus:0                          1,0
   2) 00:30:48:be:11:5d c1iscex:0                        1,0
   3) 00:30:48:d6:11:17 c1iscey:0                        1,0
   4) 00:30:48:bf:69:4f c1lsc:0                          1,0
   5) 00:14:4f:40:64:25 c1ioo:0                          1,0
controls@fb ~ 0$

The front-end hosts are also showing good omx_info output (although they had been before as well):

controls@c1lsc ~ 0$ /opt/open-mx/bin/omx_info
Open-MX version 1.5.2
 build: controls@fb:/opt/src/open-mx-1.5.2 Tue May 21 11:03:54 PDT 2013

Found 1 boards (32 max) supporting 32 endpoints each:
 c1lsc:0 (board #0 name eth1 addr 00:30:48:bf:69:4f)
   managed by driver 'igb'

Peer table is ready, mapper is 00:30:48:d6:11:17
================================================
  0) 00:30:48:bf:69:4f c1lsc:0
  1) 00:60:dd:46:ea:ec fb:0
  2) 00:25:90:0d:75:bb c1sus:0
  3) 00:30:48:be:11:5d c1iscex:0
  4) 00:30:48:d6:11:17 c1iscey:0
  5) 00:14:4f:40:64:25 c1ioo:0
controls@c1lsc ~ 0$

This got all the mx_stream connections back up and running.

Unfortunately, daqd is back to being a bit flaky.  With all frame writing enabled we saw daqd crash again.  I then shut off all trend frame writing and we're back to a marginally stable state: we have data flowing from all front ends, and full frames are being written, but not trends.

I'll pick up on this again tomorrow, and maybe try to rebuild the new version of mx with the proper flags.
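The broken and fixed states differ only in a few fields of the mx_info status block (Network, Status, Mapped hosts). A minimal Python sketch of a check for this failure mode; the helper names are hypothetical, not part of the mx distribution, and the parser assumes the `Key: value` line layout shown in the outputs above:

```python
import re


def parse_mx_info(text: str) -> dict:
    """Extract the Status, Network and Mapped hosts fields from mx_info output.

    Assumes one 'Key: value' pair per line, as in the mx_info dumps in this log.
    """
    fields = {}
    for key in ("Status", "Network", "Mapped hosts"):
        m = re.search(rf"^\s*{re.escape(key)}:\s*(.+?)\s*$", text, re.MULTILINE)
        if m:
            fields[key] = m.group(1)
    return fields


def check_mx_info(text: str, expected_hosts: int) -> list:
    """Return a list of problems found (an empty list means everything looks fine)."""
    fields = parse_mx_info(text)
    problems = []
    # The DC network fabric is Ethernet, so anything else means the wrong build/mode.
    if "Ethernet" not in fields.get("Network", ""):
        problems.append(f"network is {fields.get('Network')!r}, expected Ethernet")
    if "Link Up" not in fields.get("Status", ""):
        problems.append(f"status is {fields.get('Status')!r}")
    mapped = int(fields.get("Mapped hosts", "0"))
    if mapped < expected_hosts:
        problems.append(f"only {mapped} of {expected_hosts} hosts mapped")
    return problems
```

Fed the saved output of /opt/mx/bin/mx_info with `expected_hosts=6`, this would report three problems for the broken state (Myrinet network, "Wrong Network" status, one mapped host) and none for the fixed one.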

11398   Thu Jul 9 13:26:47 2015 JamieSummaryCDSCDS upgrade: new mx 1.2.16 installed

I rebuilt/installed mx 1.2.16 to use "ether-mode", instead of the default MX-10G:

controls@fb /opt/src/mx-1.2.16 0$ ./configure --enable-ether-mode --prefix=/opt/mx-1.2.16
...
controls@fb /opt/src/mx-1.2.16 0$ make
...
controls@fb /opt/src/mx-1.2.16 0$ make install
...

I then rebuilt/installed daqd so that it properly linked against the updated mx install:

controls@fb /opt/rtcds/rtscore/release/src/daqd 0$ ./configure --enable-debug --disable-broadcast --without-myrinet --with-mx --with-epics=/opt/rtapps/epics/base --with-framecpp=/opt/rtapps/framecpp --enable-local-timing
...
controls@fb /opt/rtcds/rtscore/release/src/daqd 0$ make
...
controls@fb /opt/rtcds/rtscore/release/src/daqd 0$ install daqd /opt/rtcds/caltech/c1/target/fb/

It's now back to running and receiving data from the front ends (still not stable yet, though).
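Because the root cause was a configure flag, the mode an mx tree was built for can be read back from its autoconf config.log, which records the invocation on a line beginning with "$ ./configure" (that is what the grep above matches). A minimal Python sketch of the same check, with hypothetical helper names, that could be run before relinking /opt/mx:

```python
from typing import Optional


def configure_line(config_log: str) -> Optional[str]:
    """Return the './configure ...' invocation recorded in an autoconf config.log."""
    for line in config_log.splitlines():
        stripped = line.strip()
        if stripped.startswith("$ ./configure"):
            return stripped[2:]  # drop the leading "$ "
    return None


def built_for_ethernet(config_log: str) -> bool:
    """True if the tree was configured with --enable-ether-mode."""
    line = configure_line(config_log)
    return line is not None and "--enable-ether-mode" in line.split()
```

For example, `built_for_ethernet(open("/opt/src/mx-1.2.16/config.log").read())` would have returned False for the original 1.2.16 build (configured with --enable-mx-wire) and True after this rebuild.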

11400   Thu Jul 9 16:50:13 2015 JamieSummaryCDSCDS upgrade: if all else fails try throwing metal at the problem

I roped Rolf into coming over and adding his eyes to the problem.  After much discussion we couldn't come up with any reasonable explanation for the problems we've been seeing other than daqd just needing a lot more resources than it did before.  He said he had some old Sun SunFire X4600s from which we could pilfer memory.  I went over to Downs and ripped all the CPU/memory cards out of one of his machines and stuffed them into fb:

fb now has 8 CPUs and 16G of RAM

Unfortunately, this is still not enough.  Or at least it didn't solve the problem; daqd is showing the same instabilities, falling over a couple of minutes after I turn on trend frame writing.  As always, before daqd fails it starts spitting out the following to the logs:

[Thu Jul  9 16:37:09 2015] main profiler warning: 0 empty blocks in the buffer

followed by lines like:

[Thu Jul  9 16:37:27 2015] GPS MISS dcu 44 (ASX); dcu_gps=1120520264 gps=1120519812

right before it dies.
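Since these two messages are the only warning before a crash, it can help to pull them out of the logs automatically. A small Python sketch; the regular expressions are written against the exact lines quoted above and are an assumption about the daqd log format in general. For the GPS MISS line shown, dcu_gps and gps differ by 452 s, which the sketch reports as `lag_s`:

```python
import re

# Patterns for the two precursor messages quoted above.
EMPTY_BLOCKS = re.compile(
    r"^\[(?P<ts>[^\]]+)\] main profiler warning: 0 empty blocks in the buffer"
)
GPS_MISS = re.compile(
    r"^\[(?P<ts>[^\]]+)\] GPS MISS dcu (?P<dcu>\d+) \((?P<name>\w+)\); "
    r"dcu_gps=(?P<dcu_gps>\d+) gps=(?P<gps>\d+)"
)


def scan_daqd_log(lines):
    """Yield (timestamp, kind, details) for each precursor message found."""
    for line in lines:
        if m := EMPTY_BLOCKS.match(line):
            yield m["ts"], "empty_blocks", {}
        elif m := GPS_MISS.match(line):
            lag = int(m["dcu_gps"]) - int(m["gps"])
            yield m["ts"], "gps_miss", {"dcu": int(m["dcu"]), "name": m["name"], "lag_s": lag}
```

Timestamping the first "0 empty blocks" warning this way makes it easier to compare against frame-write times or other system activity.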

I'm no longer convinced that this is a resource issue, though, judging by the resource usage right before the crash:

top - 16:47:32 up 48 min,  5 users,  load average: 0.91, 0.62, 0.61
Tasks:   2 total,   0 running,   2 sleeping,   0 stopped,   0 zombie
Cpu(s):  8.9%us,  0.9%sy,  0.0%ni, 89.1%id,  0.9%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  15952104k total, 13063468k used,  2888636k free,   138648k buffers
Swap:  1023996k total,        0k used,  1023996k free,  7672292k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12016 controls  20   0 8098m 4.4g 104m S  106 29.1   6:45.79 daqd
4953 controls  20   0 53580 6092 5096 S    0  0.0   0:00.04 nds

Load average less than 1 per CPU, plenty of free memory (~3G free, 0 swap), no waiting for IO (0.9%wa), etc.  daqd is utilizing lots of  threads, which should be spread across many cpus, so even the >100%CPU should be ok.   I'm at a loss...
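The numbers in that top header can be turned into the quantities the argument relies on. A minimal Python sketch, assuming the procps header format shown above (values in kB) and the 8 CPUs fb now has; counting buffers and page cache as reclaimable is the usual interpretation, not something top reports directly:

```python
import re


def top_summary(header: str, n_cpus: int) -> dict:
    """Summarize load, memory and swap from the header lines of `top`."""
    load1 = float(re.search(r"load average:\s*([\d.]+)", header).group(1))
    mem = re.search(
        r"Mem:\s*(\d+)k total,\s*(\d+)k used,\s*(\d+)k free,\s*(\d+)k buffers", header
    )
    swap = re.search(
        r"Swap:\s*(\d+)k total,\s*(\d+)k used,\s*(\d+)k free,\s*(\d+)k cached", header
    )
    total, used, free, buffers = map(int, mem.groups())
    cached = int(swap.group(4))
    return {
        "load_per_cpu": load1 / n_cpus,
        "free_gib": free / 1024**2,
        # Buffers and page cache can be reclaimed, so count them as available.
        "available_gib": (free + buffers + cached) / 1024**2,
        "swap_used_kib": int(swap.group(2)),
    }
```

For the header above and 8 CPUs this gives about 0.11 load per CPU, about 2.8 GiB strictly free, about 10.2 GiB available once buffers and cache are included, and no swap in use.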

11402   Mon Jul 13 01:11:14 2015 JamieSummaryCDSCDS upgrade: current assessment

daqd is still behaving unstably.  It's still unclear what the issue is.

The current failures look like disk IO contention.  However, it's hard to see any evidence that daqd is suffering from large IO wait while it's failing.

The frame size itself is currently smaller than it was before the upgrade:

controls@fb /frames/full 0$ ls -alth 11190 | head
total 369G
drwxr-xr-x 321 controls controls  36K Jul 12 22:20 ..
drwxr-xr-x   2 controls controls 268K Jun 23 06:06 .
-rw-r--r--   1 controls controls  67M Jun 23 06:06 C-R-1119099984-16.gwf
-rw-r--r--   1 controls controls  68M Jun 23 06:06 C-R-1119099968-16.gwf
-rw-r--r--   1 controls controls  69M Jun 23 06:05 C-R-1119099952-16.gwf
-rw-r--r--   1 controls controls  69M Jun 23 06:05 C-R-1119099936-16.gwf
-rw-r--r--   1 controls controls  67M Jun 23 06:05 C-R-1119099920-16.gwf
-rw-r--r--   1 controls controls  68M Jun 23 06:05 C-R-1119099904-16.gwf
-rw-r--r--   1 controls controls  68M Jun 23 06:04 C-R-1119099888-16.gwf
controls@fb /frames/full 0$ ls -alth 11208 | head
total 17G
drwxr-xr-x   2 controls controls  20K Jul 13 01:00 .
-rw-r--r--   1 controls controls  45M Jul 13 01:00 C-R-1120809632-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 01:00 C-R-1120809408-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 00:56 C-R-1120809392-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 00:56 C-R-1120809376-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 00:56 C-R-1120809360-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 00:55 C-R-1120809344-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 00:55 C-R-1120809328-16.gwf
controls@fb /frames/full 0$

This would seem to indicate that it's not an increase in frame size that's to blame.

Because slow data is now transported to daqd over the MX data concentrator network rather than via EPICS (RTS 2.8), there is more traffic on the MX network. I note also that the channel lists have increased in size:

controls@fb /opt/rtcds/caltech/c1/chans/daq 0$ ls -alt archive/C1LSC* | head -20
-rw-r--r-- 1 4294967294 4294967294 262554 Jul  6 18:21 archive/C1LSC_150706_182146.ini
-rw-r--r-- 1 4294967294 4294967294 262554 Jul  6 18:16 archive/C1LSC_150706_181603.ini
-rw-r--r-- 1 4294967294 4294967294 262554 Jul  6 16:09 archive/C1LSC_150706_160946.ini
-rw-r--r-- 1 4294967294 4294967294  43366 Jul  1 16:05 archive/C1LSC_150701_160519.ini
-rw-r--r-- 1 4294967294 4294967294  43366 Jun 25 15:47 archive/C1LSC_150625_154739.ini
...

I would have thought, though, that data transmission errors would show up in the daqd status bits.
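The frame listings also encode daqd's uptime: frame file names follow the LIGO convention <site>-<type>-<GPS start>-<duration>.gwf, so a missing frame shows up as a break in the GPS sequence. A minimal Python sketch of such a gap check (the function name is hypothetical):

```python
import re

# <site>-<type>-<GPS start>-<duration>.gwf, e.g. C-R-1120809632-16.gwf
FRAME_RE = re.compile(r"^(?P<site>[A-Z]+)-(?P<type>[^-]+)-(?P<start>\d+)-(?P<dur>\d+)\.gwf$")


def frame_gaps(names):
    """Return (expected_start, actual_start) pairs where consecutive frames are not contiguous."""
    frames = sorted(
        (int(m["start"]), int(m["dur"]))
        for m in map(FRAME_RE.match, names)
        if m
    )
    gaps = []
    for (start, dur), (next_start, _) in zip(frames, frames[1:]):
        if next_start != start + dur:
            gaps.append((start + dur, next_start))
    return gaps
```

On the two listings above, the pre-upgrade frames are contiguous, while the post-upgrade listing has one gap from GPS 1120809424 to 1120809632 (224 s, fourteen 16 s frames), which may correspond to a daqd restart.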

11404   Mon Jul 13 18:12:50 2015 JamieSummaryCDSCDS upgrade: left running in semi-stable configuration

I have been watching daqd all day and I don't feel particularly closer to understanding what the issues are.  However, things are

Interestingly, though, the stability appears highly variable at the moment.  This morning, daqd was very unstable and was crashing within a couple of minutes of starting.  However, this afternoon, things seemed much more stable.  As of this moment, daqd has been running for 25 minutes, writing full frames as well as minute and second trends (no minute_raw), without any issues.  What has changed?

To reiterate, I have been closely watching disk IO to /frames.  I see no indication that there is any disk contention while daqd is failing.  It's still possible, though, that there are disk IO issues affecting daqd at a level that is not readily visible.  From dstat, the frame writes are visible, but nothing else.

I have made one change that could be positively affecting things right now: I un-exported /frames from NFS.  This eliminates anything external from reading /frames over the network.  In particular, it also shuts off the transfer of frames to LDAS.  Since I've done this, daqd has appeared to be more stable.  It's NOT totally stable, though, as the instance that I described above did eventually just die after 43 minutes, as I was writing this.

In any event, as things are currently as stable as I've seen them, I'm leaving it running in this configuration for the moment, with the following relevant daqdrc parameters:

start main 16;
start frame-saver;
sync frame-saver;
start trender 60 60;
start trend-frame-saver;
sync trend-frame-saver;
start minute-trend-frame-saver;
sync minute-trend-frame-saver;
start profiler;
start trend profiler;
11406   Tue Jul 14 09:08:37 2015 JamieSummaryCDSCDS upgrade: left running in semi-stable configuration

Overnight daqd restarted itself only about twice an hour, which is an improvement:

controls@fb /opt/rtcds/caltech/c1/target/fb 0$ tail logs/restart.log
daqd: Tue Jul 14 03:13:50 PDT 2015
daqd: Tue Jul 14 04:01:39 PDT 2015
daqd: Tue Jul 14 04:09:57 PDT 2015
daqd: Tue Jul 14 05:02:46 PDT 2015
daqd: Tue Jul 14 06:01:57 PDT 2015
daqd: Tue Jul 14 06:43:18 PDT 2015
daqd: Tue Jul 14 07:02:19 PDT 2015
daqd: Tue Jul 14 07:58:16 PDT 2015
daqd: Tue Jul 14 08:02:44 PDT 2015
daqd: Tue Jul 14 09:02:24 PDT 2015

Un-exporting /frames might have helped a bit. However, the problem is obviously still not fixed.

11408   Tue Jul 14 10:28:02 2015 ericqSummaryCDSCDS upgrade: left running in semi-stable configuration

There remains a pattern to some of the restarts; the following times are all reported as restart times. (There are others in between, however.)

daqd: Tue Jul 14 00:02:48 PDT 2015
daqd: Tue Jul 14 01:02:32 PDT 2015
daqd: Tue Jul 14 03:02:33 PDT 2015
daqd: Tue Jul 14 05:02:46 PDT 2015
daqd: Tue Jul 14 06:01:57 PDT 2015
daqd: Tue Jul 14 07:02:19 PDT 2015
daqd: Tue Jul 14 08:02:44 PDT 2015
daqd: Tue Jul 14 09:02:24 PDT 2015
daqd: Tue Jul 14 10:02:03 PDT 2015

Before the upgrade, we suffered from hourly crashes too:

daqd_start Sun Jun 21 00:01:06 PDT 2015
daqd_start Sun Jun 21 01:03:47 PDT 2015
daqd_start Sun Jun 21 02:04:04 PDT 2015
daqd_start Sun Jun 21 03:04:35 PDT 2015
daqd_start Sun Jun 21 04:04:04 PDT 2015
daqd_start Sun Jun 21 05:03:45 PDT 2015
daqd_start Sun Jun 21 06:02:43 PDT 2015
daqd_start Sun Jun 21 07:04:42 PDT 2015
daqd_start Sun Jun 21 08:04:34 PDT 2015
daqd_start Sun Jun 21 09:03:30 PDT 2015
daqd_start Sun Jun 21 10:04:11 PDT 2015

So, this isn't necessarily new behavior, just something that remains unfixed.

11409   Tue Jul 14 11:57:27 2015 jamieSummaryCDSCDS upgrade: left running in semi-stable configuration

Quote:
> There remains a pattern to some of the restarts; the following times are all reported as restart times. (There are others in between, however.)
>
> daqd: Tue Jul 14 00:02:48 PDT 2015
> daqd: Tue Jul 14 01:02:32 PDT 2015
> daqd: Tue Jul 14 03:02:33 PDT 2015
> daqd: Tue Jul 14 05:02:46 PDT 2015
> daqd: Tue Jul 14 06:01:57 PDT 2015
> daqd: Tue Jul 14 07:02:19 PDT 2015
> daqd: Tue Jul 14 08:02:44 PDT 2015
> daqd: Tue Jul 14 09:02:24 PDT 2015
> daqd: Tue Jul 14 10:02:03 PDT 2015
>
> Before the upgrade, we suffered from hourly crashes too:
>
> daqd_start Sun Jun 21 00:01:06 PDT 2015
> daqd_start Sun Jun 21 01:03:47 PDT 2015
> daqd_start Sun Jun 21 02:04:04 PDT 2015
> daqd_start Sun Jun 21 03:04:35 PDT 2015
> daqd_start Sun Jun 21 04:04:04 PDT 2015
> daqd_start Sun Jun 21 05:03:45 PDT 2015
> daqd_start Sun Jun 21 06:02:43 PDT 2015
> daqd_start Sun Jun 21 07:04:42 PDT 2015
> daqd_start Sun Jun 21 08:04:34 PDT 2015
> daqd_start Sun Jun 21 09:03:30 PDT 2015
> daqd_start Sun Jun 21 10:04:11 PDT 2015
>
> So, this isn't necessarily new behavior, just something that remains unfixed.

That's interesting, that we're still seeing those hourly crashes. We're not writing out the full set of channels, though, and we're getting more failures than just those at the hour, so we're still suffering.

11412   Tue Jul 14 16:51:01 2015 JamieSummaryCDSCDS upgrade: problem is not disk access

I think I have now determined once and for all that the daqd problems are NOT due to disk IO contention.

I have mounted a tmpfs at /frames/tmp and have told daqd to write frames there. The tmpfs exists entirely in RAM. There is essentially zero IO wait for such a filesystem, so daqd should never have trouble writing out the frames. Yet daqd continues to fail with the "0 empty blocks in the buffer" warnings.

I've been down a rabbit hole.

11414   Tue Jul 14 17:14:23 2015 EveSummarySummary PagesFuture summary pages improvements

Here is a list of suggested improvements to the summary pages. Let me know if there's something you'd like me to add to this list!

• A lot of plots are missing axis labels and titles, and I often don't know what to call these labels. I could use some help with this.
• Check the weather and vacuum tabs to make sure that we're getting the expected output. Set the axis labels accordingly.
• Investigate past periods of missing data on DataViewer to see if the problem was with the data acquisition process, the summary page production process, or something else.
• Based on trends in data over the past three months, set axis ranges to encompass the full data range.
• Create a CDS tab to store statistics of our digital systems. We will use the CDS signals to determine when the digital system is running and when the minute trend is missing. This will allow us to exclude irrelevant parts of the data.
• Provide duty ratio statistics for the IMC.
• Set triggers for certain plots. For example, for channels C1:LSC-XARM_OUT_DQ and C1:LSC-YARM_OUT_DQ to be plotted in the Arm LSC Control signals figures, C1:LSC-TRX_OUT_DQ and C1:LSC-TRY_OUT_DQ must be higher than 0.5, thus acting as triggers.
• Include some flag or other marking indicating when data is not being represented at a certain time for specific plots.
• Maybe include some cool features like interactive plots.

11415   Wed Jul 15 13:19:14 2015 JamieSummaryCDSCDS upgrade: reducing mx end-points as last ditch effort

I tried one last thing, suggested by Keith and Gerrit. I tried reducing the number of mx end-points on fb to a single end-point (endpoint 0), which should reduce the total number of fb threads, in the hope that the extra threads were causing the chokes.

On Tue, Jul 14 2015, Keith Thorne <kthorne@ligo-la.caltech.edu> wrote:
> Assumptions
> 1) Before the upgrade (from RCG 2.6?), the DAQ had been working, reading out front-ends, writing frames trends
> 2) In upgrading to RCG 2.9, the mx start-up on the frame builder was modified to use multiple end-points
> (i.e. /etc/init.d/mx has a line like
> # 1 10G card - X2
> MX_MODULE_PARAMS="mx_max_instance=1 mx_max_endpoints=16 $MX_MODULE_PARAMS"
>  (This can be confirmed by the daqd log file with lines at the top like
> 263596
> MX has 16 maximum end-points configured
> 2 MX NICs available
> [Fri Jul 10 16:12:50 2015] ->4: set thread_stack_size=10240
> [Fri Jul 10 16:12:50 2015] new threads will be created with the stack of size 10240K
>
> If this is the case, the problem may be that the additional thread on the frame-builder (one per end-point) take up so many slots on the 8-core
> frame-builder that they interrupt the frame-writing thread, thus preventing the main buffer from being emptied.
>
> One could go back to a single end-point. This only helps keep restart of front-end A from hiccuping DAQ for front-end B.
>
> You would have to remove code on front-ends (/etc/init.d/mx_stream) that chooses endpoints. i.e.
> # find line number in rtsystab. Use that to mx_stream slot on card (0-15)
> line_num=`grep -v ^# /etc/rtsystab | grep --perl-regexp -n "^${hostname}\s" | sed 's/^\([0-9]*\):.*/\1/g'`
> line_off=$(expr $line_num - 1)
> epnum=$(expr $line_off % 2)
> cnum=$(expr $line_off / 2)
>
> start-stop-daemon --start --quiet -b -m --pidfile /var/log/mx_stream0.pid --exec /opt/rtcds/tst/x2/target/x2daqdc0/mx_stream -- -e 0 -r "$epnum" -W 0 -w 0 -s "$sys" -d x2daqdc0:$cnum -l /opt/rtcds/tst/x2/target/x2daqdc0/mx_stream_logs/$hostname.log
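The script above derives each front end's stream slot from its position in /etc/rtsystab. A rough Python transcription of that arithmetic, meant only to make the mapping explicit (the function name and the rtsystab contents in the example are hypothetical). It shows why removing this code leaves every front end on endpoint 0:

```python
def mx_stream_slot(rtsystab: str, hostname: str) -> tuple:
    """Reproduce the epnum/cnum arithmetic of the quoted mx_stream init script.

    Like `grep -v ^# | grep -n`, non-comment lines (blank lines included) are
    numbered from 1; the host's offset (number - 1) is then split into
    epnum = offset % 2 and cnum = offset // 2.
    """
    lines = [l for l in rtsystab.splitlines() if not l.startswith("#")]
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if fields and fields[0] == hostname:
            offset = number - 1
            return offset % 2, offset // 2
    raise KeyError(f"{hostname} not found in rtsystab")
```

With a hypothetical three-host rtsystab listing c1lsc, c1sus and c1ioo in that order, the hosts map to (0, 0), (1, 0) and (0, 1).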

As per Keith's suggestion, I modified the mx startup script to only initialize a single endpoint, and I modified the mx_stream startup to point them all to endpoint 0.  I verified that indeed daqd was a single MX end-point:

MX has 1 maximum end-points configured

It didn't help.  After 5-10 minutes daqd crashes with the same "0 empty blocks" messages.

I should also mention that I'm pretty sure the onset of these messages is not coincident with any frame writing to disk; further evidence that it's not a disk IO issue.

Keith is looking at the system now, so we'll see if he can spot anything obvious.  If not, I will start reverting to 2.5.

11417   Wed Jul 15 18:19:12 2015 JamieSummaryCDSCDS upgrade: tentative stability?

Keith Thorne provided his eyes on the situation today and had some suggestions that might have helped things:

Reorder ini file list in master file.  Apparently the EDCU.ini file (C0EDCU.ini in our case), which describes EPICS subscriptions to be recorded by the daq, now has to be specified *after* all other front end ini files.  It's unclear why, but it has something to do with RTS 2.8 which changed all slow channels to be transported over the mx network.  This alone did not fix the problem, though.
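A minimal sketch of that reordering in Python, assuming the master file is a plain list of .ini/.par paths, one per line, with '#' comments (an assumption about the format; the paths in the example are hypothetical). It keeps the relative order of all other lines and moves EDCU entries to the end:

```python
def is_edcu_entry(line: str) -> bool:
    """True for a non-comment master-file line naming an EDCU .ini file."""
    stripped = line.strip()
    return not stripped.startswith("#") and stripped.upper().endswith("EDCU.INI")


def edcu_last(master_text: str) -> str:
    """Return the master file text with EDCU entries moved after all other lines."""
    lines = master_text.splitlines()
    others = [l for l in lines if not is_edcu_entry(l)]
    edcu = [l for l in lines if is_edcu_entry(l)]
    return "\n".join(others + edcu) + "\n"
```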

Increase second trend frame size.  Interestingly, this might have been the key.  The second trend frame size was increased to 600 seconds:

start trender 600 60;

The two numbers set the frame lengths for the second trends (in seconds) and the minute trends (in minutes), respectively.  They had been set to "60 60", but Keith suggested that longer second trend frames are better, for whatever reason.  It seems he may be right, given that daqd has been running and writing full and trend frames for 1.5 hours now without issue.

As I'm writing this, though, daqd just crashed again.  I note, though, that it's right after the hour, and immediately following writing out a one-hour minute trend file.  We've been seeing these on-the-hour crashes of daqd for quite a while now.  So maybe this is nothing new.  I've actually been wondering if the hourly daqd crashes were associated with writing out the minute trend frames, and I think we might have more evidence to point to that.

If increasing the size of the second trend frames from 60 seconds (35M) to 600 seconds (70M) made a difference in stability, could there be an issue with writing out files that are smaller than some threshold?  The full frames are 60M, and the minute trends are 35M.
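Whether crashes cluster at the top of the hour can be checked directly from logs/restart.log. A minimal Python sketch, assuming the restart-log line formats quoted in this thread ("daqd: Tue Jul 14 05:02:46 PDT 2015" and "daqd_start Sun Jun 21 00:01:06 PDT 2015"); the 5-minute window is an arbitrary choice:

```python
from datetime import datetime


def near_top_of_hour(restart_lines, window_min=5):
    """Split restart times into those within `window_min` minutes after the hour and the rest."""
    near, other = [], []
    for line in restart_lines:
        tokens = line.split()[-6:]  # weekday, month, day, time, zone, year
        del tokens[4]               # drop the zone ('PDT'); strptime %Z is unreliable
        t = datetime.strptime(" ".join(tokens), "%a %b %d %H:%M:%S %Y")
        (near if t.minute < window_min else other).append(t)
    return near, other
```

Applied to the ten overnight restarts from Jul 14 shown by `tail logs/restart.log`, it places six within five minutes after the hour and four elsewhere; all eleven pre-upgrade restarts from Jun 21 fall inside the window.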

11427   Sat Jul 18 15:37:19 2015 JamieSummaryCDSCDS upgrade: current status

So it appears we have found a semi-stable configuration for the DAQ system post upgrade:

Here are the issues:

## daqd

daqd is running mostly stably for the moment, although it still crashes at the top of every hour (see below).  Here are some relevant points about the current configuration:

• recording data from only a subset of front-ends, to reduce the overall load:
  • c1x01
  • c1scx
  • c1x02
  • c1sus
  • c1mcs
  • c1pem
  • c1x04
  • c1lsc
  • c1ass
  • c1x05
  • c1scy
• 16 second main buffer:
    start main 16;
• trend lengths: second: 600, minute: 60
    start trender 600 60;
• writing to frames:
  • full
  • second
  • minute
  • (NOT raw minute trends)
• frame compression ON

This eliminates most of the random daqd crashing.  However, daqd still crashes at the top of every hour after writing out the minute trend frame.  It's still unclear what the issue is, but Keith is investigating.  In some sense this is no worse than where we were before the upgrade, since daqd was also crashing hourly then.  It's still crappy, though, so hopefully we'll figure something out.

The inittab on fb automatically restarts daqd after it crashes, and monit on all of the front ends automatically restarts the mx_stream processes.

## front ends

The front end modules are mostly running fine.

One issue is that the execution times seem to have increased a bit, which is problematic for models that were already on the hairy edge.  For instance, the rough average for c1sus has gone from ~48us to 50us.  This is most problematic for c1cal, which is now running at ~66us out of 60, which is obviously untenable.  We'll need to reduce the load in c1cal somehow.
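The ~60 us limit is consistent with a 16 kHz model, whose cycle period is 1/16384 s, about 61 us; that these models run at 16384 Hz is an assumption here. A quick Python check of the headroom for the numbers quoted above:

```python
RATE_HZ = 16384              # assumed model rate (16 kHz)
BUDGET_US = 1e6 / RATE_HZ    # time available per cycle, ~61.04 us


def budget_fraction(exec_us: float, budget_us: float = BUDGET_US) -> float:
    """Fraction of the per-cycle time budget used by a model."""
    return exec_us / budget_us


for name, exec_us in [("c1sus before", 48), ("c1sus after", 50), ("c1cal", 66)]:
    print(f"{name}: {exec_us} us = {budget_fraction(exec_us):.0%} of {BUDGET_US:.1f} us")
```

This puts c1sus at roughly 79% before and 82% after the upgrade, and c1cal at about 108% of the cycle, i.e. overrunning every cycle.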

All other front end models seem to be working fine, but a full test is still needed.

There was an issue with the DACs on c1sus, but I rebooted and everything came up fine; the optics are now damped.

11437   Wed Jul 22 22:06:42 2015 EveSummarySummary PagesFuture summary pages improvements

- CDS Tab

We want to monitor the status of the digital control system.

1st plot
Title: EPICS DAQ Status
I wonder if we can plot the binary values as the data acquisition statuses of the realtime codes.
We want to use status indicators, like this:
https://ldas-jobs.ligo-wa.caltech.edu/~detchar/summary/day/20150722/plots/H1-MULTI_A8CE50_SEGMENTS-1121558417-86400.png

channels:
C1:DAQ-DC0_C1X04_STATUS
C1:DAQ-DC0_C1LSC_STATUS
C1:DAQ-DC0_C1ASS_STATUS
C1:DAQ-DC0_C1OAF_STATUS
C1:DAQ-DC0_C1CAL_STATUS

C1:DAQ-DC0_C1X02_STATUS
C1:DAQ-DC0_C1SUS_STATUS
C1:DAQ-DC0_C1MCS_STATUS
C1:DAQ-DC0_C1RFM_STATUS
C1:DAQ-DC0_C1PEM_STATUS

C1:DAQ-DC0_C1X03_STATUS
C1:DAQ-DC0_C1IOO_STATUS
C1:DAQ-DC0_C1ALS_STATUS

C1:DAQ-DC0_C1X01_STATUS
C1:DAQ-DC0_C1SCX_STATUS
C1:DAQ-DC0_C1ASX_STATUS

C1:DAQ-DC0_C1X05_STATUS
C1:DAQ-DC0_C1SCY_STATUS
C1:DAQ-DC0_C1TST_STATUS

2nd plot
Title: IOP Fast Channel DAQ Status
These have two bits each. How should we handle that?
If we need to shrink each to a single bit, take the "AND" of the two bits.
C1:FEC-40_FB_NET_STATUS (legend: c1x04, if a legend can be placed)
C1:FEC-20_FB_NET_STATUS (legend: c1x02)
C1:FEC-33_FB_NET_STATUS (legend: c1x03)
C1:FEC-19_FB_NET_STATUS (legend: c1x01)
C1:FEC-46_FB_NET_STATUS (legend: c1x05)
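A sketch of the suggested reduction in Python, assuming the two status bits are bit 0 and bit 1 of each FB_NET_STATUS value and that a set bit means "good" (both assumptions should be checked against the RCG documentation):

```python
def fb_net_ok(status: int) -> bool:
    """Collapse a two-bit FB_NET_STATUS value into one flag by AND-ing its two bits."""
    bit0 = status & 0b01
    bit1 = (status >> 1) & 0b01
    return bool(bit0 and bit1)
```

Equivalently, `status & 0b11 == 0b11`; the explicit form just makes the "AND of the two bits" visible. Mapping this over each channel's samples gives the single-bit series for the status-indicator plot.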

3rd plot
Title C1LSC CPU Meters
channels:
C1:FEC-40_CPU_METER (legend: c1x04)
C1:FEC-42_CPU_METER (legend: c1lsc)
C1:FEC-48_CPU_METER (legend: c1ass)
C1:FEC-22_CPU_METER (legend: c1oaf)
C1:FEC-50_CPU_METER (legend: c1cal)
The range is from 0 to 75, except for c1oaf, which could go to 500.
Can we plot c1oaf with the value divided by 8? (Then the legend should be c1oaf /8.)

4th plot
Title C1SUS CPU Meters
channels:
C1:FEC-20_CPU_METER (legend: c1x02)
C1:FEC-21_CPU_METER (legend: c1sus)
C1:FEC-36_CPU_METER (legend: c1mcs)
C1:FEC-38_CPU_METER (legend: c1rfm)
C1:FEC-39_CPU_METER (legend: c1pem)
The range is from 0 to 75, except for c1pem, which could go to 500.
Can we plot c1pem with the value divided by 8? (Then the legend should be c1pem /8.)

5th plot
Title C1IOO CPU Meters
channels:
C1:FEC-33_CPU_METER (legend: c1x03)
C1:FEC-34_CPU_METER (legend: c1ioo)
C1:FEC-28_CPU_METER (legend: c1als)
The range is from 0 to 75.

6th plot
Title C1ISCEX CPU Meters
channels:
C1:FEC-19_CPU_METER (legend: c1x01)
C1:FEC-45_CPU_METER (legend: c1scx)
C1:FEC-44_CPU_METER (legend: c1asx)
The range is from 0 to 75.

7th plot
Title C1ISCEY CPU Meters
channels:
C1:FEC-46_CPU_METER (legend: c1x05)
C1:FEC-47_CPU_METER (legend: c1scy)
C1:FEC-91_CPU_METER (legend: c1tst)
The range is from 0 to 75.

=====================

IOO

We want a duty ratio plot for the IMC. C1:IOO-MC_TRANS_SUM >1e4 is the good period.

Duty ratio plot looks like the right plot of the following link
https://ldas-jobs.ligo-wa.caltech.edu/~detchar/summary/day/20150722/lock/segments/
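A minimal Python sketch of the duty ratio using the C1:IOO-MC_TRANS_SUM > 1e4 criterion above, assuming evenly spaced samples (e.g. minute trends) so that the fraction of samples equals the fraction of time; the function names are hypothetical:

```python
def duty_ratio(trans_sum, threshold=1e4):
    """Fraction of samples in which the IMC counts as locked (TRANS_SUM above threshold)."""
    if not trans_sum:
        raise ValueError("no samples")
    locked = sum(1 for x in trans_sum if x > threshold)
    return locked / len(trans_sum)


def locked_segments(times, trans_sum, threshold=1e4):
    """Return (start, end) pairs of contiguous locked stretches.

    `end` is the time of the first unlocked sample, or the last sample time
    if the IMC is still locked at the end of the data.
    """
    segs, start = [], None
    for t, x in zip(times, trans_sum):
        if x > threshold and start is None:
            start = t
        elif x <= threshold and start is not None:
            segs.append((start, t))
            start = None
    if start is not None:
        segs.append((start, times[-1]))
    return segs
```

The segment list is what a segment-style plot like the linked one would draw, and `duty_ratio` gives the single summary number.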

=====================

SUS: OPLEV

OL_PIT_INMON and OL_YAW_INMON are good for the slow drift monitor.
But their sampling rate is too slow for the PSDs.
Can you use
C1:SUS-ETM_OPLEV_PERROR
C1:SUS-ETM_OPLEV_YERROR
etc...
For the PSDs? They are 2kHz sampling DQ channels. You would be able to plot
it up to ~1kHz. In fact, we want to monitor the PSD from 100mHz to 1kHz.
How can you set up the resolution (=FFT length)?
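The resolution question comes down to df = 1/T_fft: resolving features near 100 mHz needs FFT segments of at least ~10 s, and with 2048 Hz sampling (assumed here to be the actual rate of the "2kHz" DQ channels) the upper limit is the Nyquist frequency of 1024 Hz. A small Python helper to pick a power-of-two FFT length; asking for several bins up to the lowest frequency of interest is an assumption, not a rule:

```python
import math


def fft_params(fs_hz: float, f_min_hz: float, bins_per_fmin: int = 4) -> dict:
    """Choose a power-of-two FFT length giving at least `bins_per_fmin` bins up to f_min."""
    df_target = f_min_hz / bins_per_fmin                 # required frequency resolution
    nfft = 2 ** math.ceil(math.log2(fs_hz / df_target))  # round up to a power of two
    return {
        "nfft": nfft,
        "fft_length_s": nfft / fs_hz,
        "resolution_hz": fs_hz / nfft,
        "nyquist_hz": fs_hz / 2,
    }
```

For 2048 Hz data and a 100 mHz lower limit this gives nfft = 131072 (64 s segments, 15.6 mHz resolution); with `bins_per_fmin=1` it gives 32768 samples (16 s). Averaging N such segments then needs roughly N times that much data.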

=====================

LSC / ASC / ALS tabs

Let's make new tabs LSC, ASC, and ALS

LSC:

We should have a plot for
C1:LSC-TRX_OUT_DQ
C1:LSC-TRY_OUT_DQ
C1:LSC-POPDC_OUT_DQ
It's OK to use the minute trend for now.
You can check the range using dataviewer.

ASC:

Let's use
C1:SUS_MC1_ASCPIT_OUT16 (legend: IMC WFS)
C1:ASS-XARM_ITM_YAW_OSC_CLKGAIN (legend: XARM ASS)
C1:ASS-YARM_ITM_YAW_OSC_CLKGAIN (legend: YARM ASS)
C1:ASX-XARM_M1_PIT_OSC_CLKGAIN (legend: XARM Green ASS)
as the status indicators. There is no YARM Green ASS yet.

ALS:

Title: ALS Green transmission
We want a time series of
ALS-TRX_OUT16
ALS-TRY_OUT16

Title: ALS Green beatnote
Another time series
ALS-BEATX_FINE_Q_MON
ALS-BEATY_FINE_Q_MON

Title: Frequency monitor
We have frequency counter outputs, but I have to talk to Eric to know the channel names

11441   Thu Jul 23 20:57:15 2015 JessicaSummaryGeneralApplying Pre-filter to data before IIR Wiener Filtering

I updated my bandpass filter and have included the bode plot below in Figure 1. It is a fourth order elliptic bandpass filter with a passband ripple of 1dB and a stopband attenuation of 30 dB. It emphasizes the area between 3 and 40 Hz.

Below, I applied this filter to the huddle test data. The results from this were only slightly better in the targeted region than when no pre-filter was applied.

When I pre-filtered the mode cleaner data and then used an IIR wiener filter, I found that the results did not differ much from the data that was not pre-filtered. I'm not sure yet if I'm targeting the right region of this data with my bandpass filter, and will be looking more into choosing a better region. Also, I am only using certain regions of ff when calculating the transfer function, and need to optimize that region also. I uploaded the code I used to make these plots to github.

11456   Tue Jul 28 20:42:50 2015 JessicaSummaryGeneralNew Seismometer Data Coherence

I was looking at the new seismometer data and plotted the coherence between the different arms of C1:PEM_GUR1 and C1:PEM_GUR2. There was not much coherence in the X arms, Y arms, or Z arms of each seismometer, but there was within the x and y arms of the seismometer.

I think the area we should focus on with filtering is the lower range, between 0.01 and 0.1, because that is where coherence is most clearly high. It is higher at high frequencies but also incredibly noisy, meaning it probably wouldn't be good to try to filter there.

11457   Wed Jul 29 10:34:42 2015 IgnacioSummaryLSCCoherence of arms and seismometers

Jessica and I took 45 mins  (GPS times from 1122099200 to 1122101950) worth of data from the following channels:

C1:IOO-MC_L_DQ (mode cleaner)
C1:LSC-XARM_IN1_DQ (X arm length)
C1:LSC-YARM_IN1_DQ (Y arm length)

as well as the STS, GUR1, and GUR2 seismometer signals.

The PSD for MCL and the arm length signals is shown below,

I looked at the coherence between the arm length and each of the three seismometers, plot overload incoming below,

For the coherence between STS and XARM and YARM,

For GUR1,

Finally for GUR2,

A few remarks:

1) From the coherence plots, we can see that the arm length signals are most coherent with the seismometer signals from 0.5 to 50 Hz. This is most evident in the coherence with STS. I think subtraction will be most useful in this range. This agrees with what we see in the PSD of the arm length signals: the magnitude of the PSD starts increasing from 1 Hz and reaches a maximum at about 30 Hz. This indicates at which frequencies most of the noise is present.

2) Eric did not remember which of GUR1 and GUR2 corresponded to the ends of XARM and YARM. So, I went to the end of XARM and jumped for a couple of seconds to disturb whichever Guralp was in there. Using dataviewer, I determined it was GUR1. Anyway, my point is: why is GUR1 less coherent with both arms, and not just with XARM? Since it is at the end of XARM, I was expecting GUR1 to be more coherent with XARM. Is it because, although they are different arms, the PSDs of both arms are roughly the same?

3) Similarly, GUR2 shows about the same levels of coherence for both arms, but it is more coherent. Is GUR2 noisier because of its location?

Code: ARMS_COH.m.zip
