40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 289 of 344  Not logged in ELOG logo
ID Date Author Type Category Subjectup
  2167   Mon Nov 2 10:56:09 2009 kiwamuUpdateLSCcron job works succesfully & no timing jitter

As I wrote on Oct.27th, the cron job works every Sunday.

I found it worked well on the last Sunday (Nov.1st).

And I can not find any timing jitter in the data, its delay still stay 3*Ts.

  16316   Wed Sep 8 18:00:01 2021 KojiUpdateVACcronjobs & N2 pressure alert

In the weekly meeting, Jordan pointed out that we didn't receive the alert for the low N2 pressure.

To check the situation, I went around the machines and summarized the cronjob situation.
[40m wiki: cronjob summary]
Note that this list does not include the vacuum watchdog and mailer as it is not on cronjob.

Now, I found that there are two N2 scripts running:

1. /opt/rtcds/caltech/c1/scripts/Admin/n2Check.sh on megatron and is running every minute (!)
2. /opt/rtcds/caltech/c1/scripts/Admin/N2check/pyN2check.sh on c1vac and is running every 3 hours.

Then, the N2 log file was checked: /opt/rtcds/caltech/c1/scripts/Admin/n2Check.log

Wed Sep 1 12:38:01 PDT 2021 : N2 Pressure: 76.3621
Wed Sep 1 12:38:01 PDT 2021 : T1 Pressure: 112.4
Wed Sep 1 12:38:01 PDT 2021 : T2 Pressure: 349.2
Wed Sep 1 12:39:02 PDT 2021 : N2 Pressure: 76.0241
Wed Sep 1 12:39:02 PDT 2021 : N2 pressure has fallen to 76.0241 PSI !

Tank pressures are 94.6 and 98.6 PSI!

This email was sent from Nodus.  The script is at /opt/rtcds/caltech/c1/scripts/Admin/n2Check.sh

Wed Sep 1 12:40:02 PDT 2021 : N2 Pressure: 75.5322
Wed Sep 1 12:40:02 PDT 2021 : N2 pressure has fallen to 75.5322 PSI !

Tank pressures are 93.6 and 97.6 PSI!

This email was sent from Nodus.  The script is at /opt/rtcds/caltech/c1/scripts/Admin/n2Check.sh

...

The error started at 11:39 and lasted until 13:01 every minute. So this was coming from the script on megatron. We were supposed to have ~20 alerting emails (but did none).
So what's happened to the mails? I tested the script with my mail address and the test mail came to me. Then I sent the test mail to 40m mailing list. It did not reach.
-> Decided to put the mail address (specified in /etc/mailname , I believe) to the whitelist so that the mailing list can accept it.
I did run the test again and it was successful. So I suppose the system can now send us the alert again.
And alerting every minute is excessive. I changed the check frequency to every ten minutes.

What's happened to the python version running on c1vac?
1) The script is running, spitting out some error in the cron report (email on c1vac). But it seems working.
2) This script checks the pressures of the bottles rather than the N2 pressure downstream. So it's complementary.
3) During the incident on Sept 1, the checker did not trip as the pressure drop happened between the cronjob runs and the script didn't notice it.
4) On top of them, the alert was set to send the mails only to an our former grad student. I changed it to deliver to the 40m mailing list. As the "From" address is set to be some ligox...@gmail.com, which is a member of the mailing list (why?), we are supposed to receive the alert. (And we do for other vacuum alert from this address).

 

 

 

 

  11311   Tue May 19 16:18:57 2015 ericqUpdateGeneralcrons fixed

I wrapped rampdown.py in rampdown.sh, which is just these lines:

#!/bin/bash
source /ligo/cdscfg/workstationrc.sh
/opt/rtcds/caltech/c1/scripts/SUS/rampdown.py > /dev/null 2>&1

This is now what megatron's cron runs. It appears to be working.

I also added the workstationrc line to the n2 and chiara HDD checking scripts that run on nodus, which should resolve the issue from ELOG 11249

  8141   Sat Feb 23 00:34:28 2013 yutaUpdateComputerscrontab in op340m deleted and restored (maybe)

I accidentally overwrote crontab in op340m with an empty file.
By checking /var/cron in op340m, I think I restored it.

But somehow, autolockMCmain40m does not work in cron job, so it is currently running by nohup.

What I did:
  1. I ssh-ed op340m to edit crontab to change MC autolocker to usual power mode. I used "crontab -e", but it did not show anything. I exited emacs and op340m.
  2. Rana found that the file size of crontab went 0 when I did "crontab -e".
  3. I found my elog #6899 and added one line to crontab

55 * * * *  /opt/rtcds/caltech/c1/scripts/general/scripto_cron /opt/rtcds/caltech/c1/scripts/MC/autolockMCmain40m >/cvs/cds/caltech/logs/scripts/mclock.cronlog 2>&1

  4. It didn't run correctly, so Rana used his hidden power "nohup" to run autolockMCmain40m in background.
  5. Koji's hidden magic "/var/cron/log" gave me inspiration about what was in crontab. So, I made a new crontab in op340m like this;

34 * * * *  /opt/rtcds/caltech/c1/scripts/general/scripto_cron /opt/rtcds/caltech/c1/scripts/MC/autolockMCmain40m >/cvs/cds/caltech/logs/scripts/mclock.cronlog 2>&1
55 * * * * /opt/rtcds/caltech/c1/scripts/general/scripto_cron /opt/rtcds/caltech/c1/scripts/PSL/FSS/RCthermalPID.pl >/cvs/cds/caltech/logs/scripts/RCthermalPID.cronlog 2>&1
07 * * * * /opt/rtcds/caltech/c1/scripts/general/scripto_cron /opt/rtcds/caltech/c1/scripts/PSL/FSS/FSSSlowServo >/cvs/cds/caltech/logs/scripts/FSSslow.cronlog 2>&1
00 * * * * /opt/rtcds/caltech/c1/burt/autoburt/burt.cron >> /opt/rtcds/caltech/c1/burt/burtcron.log
13 * * * * /cvs/cds/caltech/conlog/bin/check_conlogger_and_restart_if_dead
14,44 * * * * /opt/rtcds/caltech/c1/scripts/SUS/rampdown.pl > /dev/null 2>&1


  6. It looks like some of them started running, but I haven't checked if they are working or not. We need to look into them.

Moral of the story:
  crontab needs backup.

  8146   Sat Feb 23 15:26:26 2013 yutaUpdateComputerscrontab in op340m updated

I found some daily cron jobs for op340m I missed last night. Also, I edited timings of hourly jobs to maintain consistency with the past. Some of them looks old, but I will leave as it is for now.
At least, burt, FSSSlowServo and autolockMCmain40m seems like they are working now.
If you notice something is missing, please add it to crontab.

07 * * * * /opt/rtcds/caltech/c1/burt/autoburt/burt.cron >> /opt/rtcds/caltech/c1/burt/burtcron.log
13 * * * * /opt/rtcds/caltech/c1/scripts/general/scripto_cron /opt/rtcds/caltech/c1/scripts/PSL/FSS/FSSSlowServo >/cvs/cds/caltech/logs/scripts/FSSslow.cronlog 2>&1
14,44 * * * * /cvs/cds/caltech/conlog/bin/check_conlogger_and_restart_if_dead
15,45 * * * * /opt/rtcds/caltech/c1/scripts/SUS/rampdown.pl > /dev/null 2>&1
55 * * * *  /opt/rtcds/caltech/c1/scripts/general/scripto_cron /opt/rtcds/caltech/c1/scripts/MC/autolockMCmain40m >/cvs/cds/caltech/logs/scripts/mclock.cronlog 2>&1
59 * * * * /opt/rtcds/caltech/c1/scripts/general/scripto_cron /opt/rtcds/caltech/c1/scripts/PSL/FSS/RCthermalPID.pl >/cvs/cds/caltech/logs/scripts/RCthermalPID.cronlog 2>&1

00 0 * * * /var/scripts/ntp.sh > /dev/null 2>&1
00 4 * * * /opt/rtcds/caltech/c1/scripts/RGA/RGAlogger.cron >> /cvs/cds/caltech/users/rward/RGA/RGAcron.out 2>&1
00 6 * * * /cvs/cds/scripts/backupScripts.pl
00 7 * * * /opt/rtcds/caltech/c1/scripts/AutoUpdate/update_conlog.cron

  8147   Sat Feb 23 15:46:16 2013 ranaUpdateComputerscrontab in op340m updated

According to Google, you can add a line in the crontab to backup the crontab by having the cronback.py script be in the scripts/ directory. It needs to save multiple copies, or else when someone makes the file size zero it will just write a zero size file onto the old backup.

  5123   Fri Aug 5 13:51:51 2011 steveUpdateSUScross coupling

We need a plan how to minimize cross coupling in the OSEMs now

  6274   Fri Feb 10 23:19:09 2012 kiwamuUpdateIOOcross talk causing fake seimometer signals

[ Koji / Kiwamu ]

The frequent unlock of the MC are most likely unrelated to ground motion.

Although the reason why MC became unstable is still unclear.

 

 

There are two facts which suggest that the ground motion and the MC unlock are unrelated :

(1) It turned out that the seismometer signals (C1:PEM-SEIS-STS_AAA ) have a big cross talk with the MC locking signals.

    For example, when we intentionally unlocked the MC, the seismometer simultaneously showed a step-shaped signals, which looked quite similar to what we have observed.

    I guess there could be some kind of electrical cross talk happening between some MC locking signals and the seismometer channels.

    So we should not trust the signals from the STS seismometers. This needs a further investigation.

(2) We looked at the OSEM and oplev signals of some other suspended optics, and didn't find any corresponding fluctuations.

    The suspensions we checked are ETMX, ETMY, ITMX and MC1.

     None of them showed an obvious sign of the active ground motions in the past 24 hours or so.

Quote from #6266

It seems that somehow the seismic noise became louder from about 1:00 AM.

  1561   Fri May 8 02:39:02 2009 pete, ranaUpdateLockingcrossover

attached plot shows MC_IN1/MC_IN2.  needs work.

This is supposed to be a measurement of the relative gain of the MCL and AO paths in the CM servo. We expect there to

be a more steep slope (ideally 1/f). Somehow the magnitude is very shallow and so the crossover is not stable. Possible

causes? Saturations in the measurement, broken whitening filters, extremely bad delay in the digital system? needs work.

 

  1517   Fri Apr 24 16:02:31 2009 steveHowToVACcryo pump interlock rule

I tested the cryopump interlock today. It is touchy. I do not have full confidence in it.

I'm proposing that VC1 gate valve should be kept closed while nobody is working in the 40m lab.

How to open gate valve:

1, confirm temp of 12K on the gauge at the  bottom of the cryopump

2, if medm screen cryo reads OFF( meaning warm) hit reset will result reading ON (meaning cold 12K )

3, open VC1 gate valve if P1 is not higher than 20 mTorr

 

VC1 was closed at 18:25,

IFO condition: not pumped,

expected leak plus  out gassing should be less than 5 mTorr/day

The RGA is in bg-mode, annuloses are closed off

  1532   Wed Apr 29 10:20:14 2009 steveHowToVACcryo pump interlock rule is waved

Quote:

I tested the cryopump interlock today. It is touchy. I do not have full confidence in it.

I'm proposing that VC1 gate valve should be kept closed while nobody is working in the 40m lab.

How to open gate valve:

1, confirm temp of 12K on the gauge at the  bottom of the cryopump

2, if medm screen cryo reads OFF( meaning warm) hit reset will result reading ON (meaning cold 12K )

3, open VC1 gate valve if P1 is not higher than 20 mTorr

 

VC1 was closed at 18:25,

IFO condition: not pumped,

expected leak plus  out gassing should be less than 5 mTorr/day

The RGA is in bg-mode, annuloses are closed off

 The Cryo pump is running reliably since April 22 hence there is no need to close VC1 repeatedly.

The photo switch interlock was put back onto the H2 vapor pressure gauge and it is working.

  1527   Tue Apr 28 09:27:32 2009 steveConfigurationVACcryopump deserves some credit

Congratulation Yoichi and Peter for full rf locking at night. Let's remember that the cryopump was shaking the hole vac envelope and ifo during this full lock.

  1610   Wed May 20 01:41:19 2009 peteUpdateVACcryopump probably not it

I found some neat signal analysis software for my mac (http://www.faberacoustical.com/products/), and took a spectrum of the ambient noise coming from the cryopump.  The two main noise peaks from that bad boy were nowhere near 3.7 kHz.

  9856   Fri Apr 25 22:20:01 2014 ranaUpdateIOOcsh/tcsh hackery combatted

To make the mcwfson/off scripts work from rossa (and not just Jamie's pet machine) I swapped the sh-bang line at the top of the script to use 'env bash' instead of 'env csh' in the case of mcwfsoff and 'env tcsh' in the case of mcwfson.

The script was failing to work due to $OSTYPE being defined for pianosa csh/tcsh, but not on rossa.

During debugging I also bypassed the ezcawrapper for ezcaswitch so that now when ezcaswitch is called, it directly runs the binary and not the script which calls the binary with numerous retries. In the future, all new scripts will be rewritten to use cdsutils, but until then beware of ezcaswitch failures.

WFS scripts checked into the SVN.

This was all in an effort to get Koji to allow me to upgrade pianosa to ubuntu 12 so that I can have ipython notebook on there.

Objections to upgrading pianosa? (chiara and megatron are already running ubuntu 12)

  8164   Mon Feb 25 22:42:32 2013 yutaSummaryAlignmentcurrent IFO situation

[Jenne,Yuta]

Both arms are aligned starting from Y green.
We have all beams unclipped except for IPANG. I think we should ignore IPANG and go on to PRMI locking and FPMI locking using ALS.
IPANG/IPPOS and oplev steering mirrors are kept un-touched after pumping until now.

Current alignment situation:
 - Yarm aligned to green (Y green transmission ~240 uW)
 - TT1/TT2 aligned to Yarm (TRY ~0.86)
 - BS and Xarm alined to each other (TRX ~ with MI fringe in AS)
 - X green is not aligned yet
 - PRMI aligned

Current output beam situation:
 IPPOS - Coming out clear but off in yaw. Not on QPD.
 IPANG - Coming out but too high in pitch and clipped half of the beam. Not on QPD.
 TRY   - On PD/camera.
 POY   - On PD.
 TRX   - On PD/camera.
 POX   - On PD.
 REFL  - Coming out clear, on camera (centered without touching steering mirrors).
 AS    - Coming out clear, on camera (centered without touching steering mirrors).
 POP   - Coming out clear, on camera (upper left on camera).


Oplev values:

Optic    Pre-pump(pit/yaw)    PRFPMI aligned(pit/yaw)
ITMX    -0.26 /  0.60         0.25 /  0.95
ITMY    -0.12 /  0.08         0.50 /  0.39
ETMX    -0.03 / -0.02        -0.47 /  0.19
ETMY     0.37 / -0.62        -0.08 /  0.80
BS      -0.01 / -0.18        -1    /  1 (almost off)
PRM     -0.34 /  0.03        -1    /  1 (almost off)


 All values +/- ~0.01. So, oplevs are not useful for alignment reference.

OSEM values:
Optic    Pre-pump(pit/yaw)    PRFPMI aligned(pit/yaw)
ITMX    -1660 / -1680        -1650 / -1680
ITMY    -1110 /   490        -1070 /   440
ETMX     -330 / -5380         -380 / -5420
ETMY    -1890 /   490        -1850 /   430
BS        370 /   840          360 /   800
PRM      -220 /  -110         -310 /  -110


  All values +/- ~10.
  We checked that if there's ~1200 difference, we still see flash in Watec TR camera. So, OSEM values are quite good reference for optic alignment.

IPANG drift:
  On Saturday, when Rana, Manasa, and I are trying to get Y arm flash, we noticed IPANG was drifting quite a lot in pitch. No calibration is done yet, but it went off the IPANG QPD within ~1 hour (attached).
  When I was aligning Yarm and Xarm at the same time, TRY drifted within ~1 hour. I had to tweak TT1/TT2 mainly in yaw to keep TRY. I also had to keep Yarm alignment to Y green. I'm not sure what is drifting so much. Suspects are TT2, PR2/PR3, Y arm and Y green.

  I made a simple script(/opt/rtcds/caltech/c1/scripts/Alignment/ipkeeper) for keeping input pointing by centering the beam on IPPOS/IPANG using TT1/TT2. I used this for keeping input pointing while scanning Y arm alignment to search for Y arm flash this weekend (/opt/rtcds/caltech/c1/scripts/Alignment/scanArmAlignment.py). But now we have clipped IPANG.


So, what's useful for alignment after pumping?:

  Optic alignment can be close by restoring OSEM values. For input pointing, IPPOS/IPANG are not so useful. Centering the beam on REFL/AS (POP) camera is a good start. But green works better.

  3689   Mon Oct 11 16:09:10 2010 yutaSummarySUScurrent OSEM outputs

Background:
 The output range of the OSEM is 0-2V.
 So, the OSEM output should fluctuate around 1V.
 If not, we have to modify the position of it.

What I did:
 Measured current outputs of the 5 OSEMs for each 8 suspensions by reading sensor outputs(C1:SUS-XXX_YYPDMON) on medm screens.

Result:

  BS ITMX ITMY PRM SRM MC1 MC2 MC3
UL 1.20 0.62 1.69 1.18 1.74 1.25 0.88 1.07
UR 1.21 0.54 1.50 0.99 1.77 1.64 1.46 0.31
LR 1.39 0.62 0.05 0.64 2.06 1.40 0.31 0.19
LL 1.19 0.88 0.01 0.64 1.64 1.00 0.05 1.03
SD 1.19 0.99 0.97 0.79 1.75 0.71 0.77 0.93

 White: OK (0.8~1.2)
 Yellow: needs to be fixed
 Red: BAD. definitely need fix

  5176   Wed Aug 10 15:39:33 2011 jamieUpdateSUScurrent SUS input diagonalization overview

Below is the overview of all the core IFO suspension input diagonalizatidons.

Summary: ITMY, PRM, BS are really bad (in that order) and are our top priorities.

UPDATE:

I had originally put the condition number of the calculated input matrix (M) in the last column.  However, after some discussion we decided that this is not in fact what we want to look at.  The condition number of a matrix is unity if the matrix is completely diagonal.  However, even our ideal input matrix is not diagonal, so the "best" condition number for the input matrix is unclear.

What instead we do know is that the matrix, B, that describes the difference between the calculated input matrix, M, and the ideal input matrix, M0: should be diagonal (identity, in fact):

M = M0 B

B should be diagonal (identity, in fact), and it's condition number should ideally be 1.  So now we calculate B-1, since it can be calculated from the pre-inverted input matrix:

B-1 = M-1 * M0

From that we calculate cond(B) == cond(B-1).

cond(B) is our new measure of the "badness" of the OSEMS.

new summary: ITMY, PRM, BS are really bad (in that order) and are our top priorities.

 

TM    INPUT MATRIX (M)
 cond(M) cond(B)
 PRM PRM.png        pit     yaw     pos     side    butt
UL   -2.000  -2.000  -2.000  -0.345   2.097 
UR   -0.375  -0.227  -0.312  -0.060   0.247 
LR    1.060   1.075   0.971   0.143  -0.984 
LL   -0.565  -0.698  -0.717  -0.141   0.672 
SD    1.513   1.485   1.498   1.000  -1.590
 75.569 106.756
 SRM SRM.png  

      pit     yaw     pos     side    butt
UL    0.791   1.060   1.114  -0.133   1.026 
UR    1.022  -0.940   1.052  -0.061  -1.027 
LR   -0.978  -0.987   0.886  -0.031   0.903 
LL   -1.209   1.013   0.948  -0.103  -1.043 
SD    0.286   0.105   1.249   1.000   0.030

 2.6501 3.90776
 BS  BS.png  

      pit     yaw     pos     side    butt
UL    1.420   0.818  -0.069   0.352   1.038 
UR    0.276  -1.182   1.931  -0.217  -0.905 
LR   -1.724  -0.274   1.940  -0.254   0.862 
LL   -0.580   1.726  -0.060   0.315  -1.194 
SD    0.560   0.171  -3.535   1.000   0.075 

9.8152 7.28516
ITMX ITMX.png       pit     yaw     pos     side    butt
UL    0.437   1.015   1.050  -0.065   0.714 
UR    0.827  -0.985   1.129  -0.221  -0.957 
LR   -1.173  -1.205   0.950  -0.281   1.245 
LL   -1.563   0.795   0.871  -0.125  -1.084 
SD   -0.581  -0.851   2.573   1.000  -0.171 
 4.08172 4.69811
 ITMY  ITMY.png  

      pit     yaw     pos     side    butt
UL    0.905  -0.884  -0.873   0.197   0.891 
UR   -1.095   1.088   1.127  -0.252  -1.115 
LR   -0.012  -0.028   0.002   0.001   0.030 
LL    1.988  -2.000  -1.998   0.451   1.964 
SD    4.542  -4.608  -4.621   1.000   4.517 

 801.453 774.901
 ETMX  ETMX.png        pit     yaw     pos     side    butt
UL    0.344   0.475   1.601   0.314   1.043 
UR    0.283  -1.525   1.786  -0.071  -1.181 
LR   -1.717  -1.569   0.399  -0.102   0.938 
LL   -1.656   0.431   0.214   0.283  -0.837 
SD    0.995  -2.632  -0.999   1.000  -0.110 
 4.26181 4.33518
 ETMY  ETMY.png        pit     yaw     pos     side    butt
UL   -0.212   1.272   1.401  -0.127   0.941 
UR    0.835  -0.728   1.534  -0.101  -1.054 
LR   -0.953  -1.183   0.599  -0.066   0.827 
LL   -2.000   0.817   0.466  -0.092  -1.177 
SD   -0.172   0.438   2.238   1.000  -0.008 
 4.04847 4.33725

 

  6862   Sun Jun 24 00:10:45 2012 yutaUpdateGreen Lockingcurrent beat electronics

I moved amplifiers for beat PD at PSL table to 1X2 rack. Current beat setup from PD to ADC is shown below. Setup for X beat and Y beat are almost the same except for minor difference like cable kind for the delay line.

Currently, DC power for amplifiers ZHL-1000LN+ is supplied by Aligent E3620A.
I tried to use power supply from the side of 1X1 rack, but fuse plug(Phoenix Contact ST-SI-UK-4) showed red LED, so I couldn't use it.
Measured amplification was +25 dB for 10-100 MHz.

Measured gain from RF input to monitor output of the beat box was ~ -1 db for 10-100 MHz.

beatsetup20120623.png

  3948   Thu Nov 18 16:32:21 2010 yutaSummaryCDScurrent damping status for all optics c1sus handles

Summary:
   I set Q-values for each ringdown of PRM, BS, ITMX, ITMY, MC1, MC2, MC3 to ~5 using QAdjuster.py.
   Here are the results;
c1susdampings.png

  Red ringdowns indicate the second try after gain setting.

Note:

  - ITMX and ITMY are referred according to MEDM screens in this entry.
  - ITMX(south) OSEM positions are currently so bad(LL and SD are all the way in/out).
        I have to change IFO_ALIGN slider values to check the damping servo. For SIDE, I couldn't do that. I reverted the slider change after the damping checking.
  - ITMY(west) somehow has opposite coil gain sign.
       Usually for the other optics, UL,UR,LR,LL is 1,-1,1,-1. But for ITMY to damp, they are -1,1,-1,1.
  - PRM damps, but ringdown doesn't look nice. There must be something funny going on.
  - SRM doesn't have OSEMs put in now.

  13421   Thu Nov 9 10:51:37 2017 gautamSummaryLSCcurrent procedure for compiling and installing c1dnn code

Jamie pointed out that the compile and install instructions are different for c1dnn:

cd /opt/rtcds/caltech/c1/rtbuild/test/nn-test
make c1dnn
make install-c1dnn

See also: https://nodus.ligo.caltech.edu:8081/40m/13383.

I think these build instructions have to be run on the c1lsc frontend - in the past, I have been able to compile and install models on any computer with the shared drive mounted (including the control room workstations), but I'm guessing that something has changed since the RCG upgrade. Jamie can correct me on this if I'm wrong.

  13411   Mon Nov 6 18:22:48 2017 jamieSummaryLSCcurrent procedure for running c1dnn code

This is the current procedure to start the c1dnn model:

$ ssh c1lsc
$ sudo systemctl start rts-epics@c1dnn
$ sudo systemctl start rts-awgtpman@c1dnn
$ sudo /usr/bin/cset proc -s rts-c1dnn --exec /opt/rtcds/caltech/c1/target/c1dnn/bin/c1dnn -- -m c1dnn
...

Then to shutdown:

...
Ctrl-C
$ sudo systemctl stop rts-awgtpman@c1dnn
$ sudo systemctl stop rts-epics@c1dnn

The daqd already knows about this model, so nothing should need to be done to the daqd to make the dnn channels available.

  3405   Thu Aug 12 11:19:04 2010 kiwamuUpdateCDScurrent status

Current status of the new CDS test (summaries can be found on the wiki)

- Cables 

  All the necessary cables are in our hands, such as SCSIs, 37 pin D-sub and 3 pin power cables.

  With a big help from Jenne and Koji, we made 37 pin F/M cables and 3 pin power cables on the last Tuesday.

  Now all the cables are connected to the machines except for the 3 pin power cables.

  To install the power cables I will switch off Sorensens at new 1X5  for a minute.

 

- Suspension model file 

  The model file named C1SUS was made and I succeeded in compiling and building it. So it is ready for the test.

  But some medm screens look like not correctly complied.

  For example the channel names which are listed on C1SUS_MONITOR_ADC0.adl aren't correctly assigned.

 

- ADCs 

 All the ADC channels are working. Also the channel assignments are correct.

 I checked all the ADC channels if it was working while I ran the front end realtime code C1SUS.

 By injecting a signal from a function generator directly to the SCSIs, I could clearly see the numbers hopping on medm screens.

 

- DACs 

  None of them are working although the computer can recognize that all three DAC cards are mounted.

  I think something in the IOP file and the model file may be wrong because this symptom looks similar to that of C1ISCEX we tested two weeks before ( see this entry ) 

  I have to fix it anyhow because the DACs are very necessary part for this damping test.

 

- Binary Outputs

  They didn't work at all. Even the computer didn't recognize the binary output cards.

  I think I should put some magical components on the model files.

 

- Conversion of the filter coefficients 

  The conversion of the filter coefficients has been doen yesterday.

  It looks running well because I can load and unload these filters on medm screens.

  I manually copied the filter coefficients from the current suspension filter file to that of new filter file.

 The current file can be found under /cvs/cds/caltech/chans/,  and the new one is under /cvs/cds/rtcds/caltech/c1/chans


Suspended works

   - Binary Outputs

   - simlink realtime model with the new CDS PARTS ( see the notice from Joe )

 

To be done

   - let DACs work

   - damping tests

 

  3408   Thu Aug 12 14:01:53 2010 josephbUpdateCDScurrent status

First, awesome progress.

Second, the Binary Output parts do in fact need to be added to the c1sus model.  They don't need to appear in the IOP though.  (They are somehow automatically taken care of by the IOP code). 

It looks like in the 2004 revision in SVN, the cdsCDO32 part (located in the CDS_PARTS.mdl/IO_PARTS) was broken.  I fixed it and updated the svn with it, so a svn update should pull the corrected version.

I'm not sure whats wrong with the DAQs.  I'll try to take a closer look at the model file tonight and see if I can make suggestions.  When you write outputs on DAC_0 or DAC_1 is the C1SUS GDS TP screen showing anything?

  5252   Wed Aug 17 03:10:06 2011 kiwamuUpdateGeneralcurrent status of in-vac work

[Jamie / Suresh / Kiwamu] 

The in-vac work is ongoing.

Before we run out our energy we are posting this entry to briefly report the current status.

- (done) BS earthquake stop adjustment.

- (done) PRM earthquake stop adjustment

- (done) MC spot position check => They are almost the same within 10 %.

- (done) Injection and alignment of the ABSL laser to make the beam brighter in the vertex region.

- (done) POY => We repositioned an in-vac steering mirror to get the POY beam hitting the center of the steering mirror.

                           It's now coming out from the chamber.

(done)  IPANG => realigned two mirrors on the ETMY table to get the IPANG out from the chamber. Now it's reaching the ETMY optical table.

                             It needs a final touch before we pump down.  We revisited it later in the night after realigning the IFO and it is well aligned now.

- (done)  POP => We have aligned the ABSL laser injected from the AS port to reach the REFL camera.  We turned it up to max power of 300mW and used it as a substitute for the PRC beam.

                         Even this was not enough to see anything in the POP beam path after the PR2 (tip-tilt).  So we used a green beam from the Y-arm as a guide of the POP beam path because the ABSL (POP) beam was too dim to work with.

                         We placed a lens and a CCD camera to detect the green and then blocked the Y-green.  It was then possible to see the ABSL-POP beam in the CCD camera.  The lens and the CCD are markers for this beam. 

                         Do not remove these markers unless absolutely necessary.

-(done) POX => We located the ABSL (mimicking POX) beam on the POXM1 mirror and adjusted the mirror to ensure that the beam exits at the right height and a convenient location on the POX table. 

- (0%) OSEM mid-range adjustment

- (0%) IPPOS

- (0%) oplev re-alignment

  6514   Tue Apr 10 11:08:29 2012 taraUpdatePSLcurved mirror behind AOM removed

We removed the curved mirror behind the AOM (ROC=0.3m) on PSL table. The mirror is now in PSL lab. See PSL:905 for more detail.

  15416   Fri Jun 19 11:02:10 2020 ChubUpdateGeneralcustom feedthrough flanges are here!

The four 4x25DSUB and single 8x25DSUB feedthrough flanges have arrived and will be picked up from the dock and brought to the 40M lab.

  6473   Fri Mar 30 17:37:09 2012 steveUpdateGeneralcutting green welding glass for beam dumps

Quote:

Schott, green welding glass, shade 14, 3 mm thick  was measured in the beam path of 1.2W, S polarization of 1064nm at ~1 mm diameter size as MC reflected path.

Absorption 95%, R 5% at incident angle 25-50 degrees. It looks like the perfect material for beam trap.

 

 The CIT Chemistry Glass Shop cuts turned out to be sloppy using diamond disc blade  cutter.

East coast Precision Glass & Optics  offered scribed cut and polished side. Their quote price was high and time consuming.

The GLASS HOUSE  shop in Pasadena 626 / 796-9151 on Walnut did a good and cheap job. Oscar the cutter will finish the rest of the cutting by next Friday, April 6

He used scribed cutting technic and his 1" x  0.5"  pieces are good. Bob will pick up them up.

  3770   Sat Oct 23 14:42:01 2010 yutaSummaryCDSdamped MC suspensions

After replacing filters, MC suspensions damped  last night.

Further measurement next time.

  5264   Thu Aug 18 15:54:35 2011 steveUpdateSUSdamped and undamped OSEMs

damped sus at atm1 and freeswingging sus at atm2

 

  3692   Mon Oct 11 22:04:28 2010 yutaUpdateCDSdamping for MCs are basically working

Background:
 Even if we can't use the Dataviewer to get the time series of each 4 DOF displacements, we can still use StripTool to monitor the ringdowns and see if the damping servo is working.

What I did:
1. Excited vibration by kicking the mirror randomly (by putting some offsets randomly and turing the filters on and off randomly).

2. Turned all the servo off by clicking "ShutDown" button.

3. Turned all the servo on by clicking "Normal" button.

3. Monitored each 4 DOF displacements with StripTool and see if there are any considerably low-Q ringdown after turning on the servo.
 The values I monitored are as follows;
  C1:SUS_MCX_SUSPOS_INMON
  C1:SUS_MCX_SUSPIT_INMON
  C1:SUS_MCX_SUSYAW_INMON
  C1:SUS_MCX_SDSEN_INMON  (X=1,2,3)

All the settings I used for this observation are automatically saved here;
  /cvs/cds/caltech/burt/autoburt/snapshots/2010/Oct/11/21:07/c1mcs.epics

Result:
 Attached is the screenshots of StripTool Graph window for each 3 MCs.
 As you can see, the dampings for each DOF, each MCs are basically working.

Note:
 Do NOT turn off all the damping servo by clicking "Damp" buttons or setting the SUSXXX_GAIN to 0. It crashes c1mcs.

Next work:
 - check and relate the signal sign with the actual moving direction of the optics
 - fix data aquisition system
 - measure Q-values when the servo is on and off using DAQ and Dataviewer

  2939   Mon May 17 10:57:16 2010 steveConfigurationSUSdamping restored to ETMYs

ETMY-south sus damping was restored

  5534   Sat Sep 24 01:21:11 2011 kiwamuUpdateSUSdamping test

As a suspension test I am leaving all of the suspensions restored and damped with OSEMS but without oplevs

  11104   Thu Mar 5 20:44:30 2015 JenneUpdateSUSdamprestore script updated

I just realized that the "damprestore" script that can be called from the watchdog screen did not have the new oplev names.  I have updated it, and added it to the svn.

  6529   Thu Apr 12 20:56:07 2012 DenUpdatePEMdaq

GUR1 XYZ, GUR2 XYZ, MC_F channels are now recorded at 256 Hz.

EDIT by JCD:  What Den means to say here is that (a) he modified some .ini files, and (b) he restarted the fb.

  7713   Wed Nov 14 21:59:09 2012 DenUpdateCDSdaq errors

I tried to add a test point to C1MCS model and spent next two hours rebooting front-ends, restarting models and realigning MC.

dmesg told me that DAQ channels can not be allocated as they already exist. Last time we met this problem Jamie emailed Alex about it. Jamie, what is the output? Restarting iop model does not help this time.

  6576   Thu Apr 26 20:44:23 2012 JamieUpdateCDSdaq network failure, c1ioo failing to start models

Den tried adding a SINGLE acquire channel to the c1ioo, which for some reason hung c1ioo and took down the entire DAQ network (at least all communication between the front ends and the fb).  We recovered by restarting c1ioo and restarting mx_stream on all the rest of the front-ends

After "recovering", though, c1ioo is failing to load models, or at least it's IOP. Here is the tail of dmesg when trying to start the IOP:

[ 1751.140283] c1x03: Initializing space for daqLib buffers
[ 1751.140284] c1x03: Initializing Network
[ 1751.140285] c1x03: Found 1 frameBuilders on network
[ 1751.250658] CPU 2 is now offline
[ 1751.250657] c1x03: Sync source = 4
[ 1751.250657] c1x03: Waiting for EPICS BURT Restore = 1
[ 1751.310008] c1x03: Waiting for EPICS BURT 0
[ 1751.310008] c1x03: BURT Restore Complete
[ 1751.310008] c1x03: Initialized servo control parameters.
[ 1751.311699] c1x03: DAQ Ex Min/Max = 1 3
[ 1751.311699] c1x03: DAQ XEx Min/Max = 3 53
[ 1751.311733] c1x03: DAQ Tp Min/Max = 10001 10007
[ 1751.311733] c1x03: DAQ XTp Min/Max = 10007 10507
[ 1751.311737] c1x03: DIRECT MEMORY MODE of size 64
[ 1751.311737] c1x03: daqLib DCU_ID = 33
[ 1751.311737] c1x03: Invalid num daq chans = 0
[ 1751.311737] c1x03: DAQ init failed -- exiting

The chan file for this model (/opt/rtcds/caltech/c1/chans/daq/C1X03.ini) looks totally fine, has two un-acquired channels uncommented, and has otherwise not been touched. The C1:FEC-33_MSGDAQ is also reading: "ERROR reading DAQ file!"

I'm at a loss for what is going on. I've tried restarting every CDS process on the machine, restarting the model multiple times, restarting fb, and even restarting the entire c1ioo machine, all to no affect.

  7093   Mon Aug 6 19:37:50 2012 JamieUpdateCDSdaqd and CDS network problems today

For some reason this afternoon we've been experiencing a lot of problems with the framebuilder, and with the CDS network in general.  The framebuilder has been very unresponsive, although the daqd logs seem to indicate that things are ok.  All models will loose contact with fb for very long stretches.  Attempts to kill/restart daqd don't seem to fix the problem.

These problems seem to be associated with the general CDS network issues as well.  The network seems to become very slow, and the workstations all become very slow.  The later I assume is because of the network and that so much of the work we do is on network mounted filesystems (/opt/rtcds, /ligo, etc.).

My current speculation is that daqd on fb is doing something stupid, like trying to read or write a bunch of stuff from /frames, which is also network mounted, and that clogs up the entire network.  Some serious network debugging is going to be needed to figure out what's going on, though.

I'm afraid daqd is caught in some bad state now, though.  It's not responding to anything, and every attempt to kill it seems to bring it back into the bad state.  Hopefully I can get Alex to help me figure out what's going on tomorrow.   Maybe it will clear up on it's own tonight...

  7094   Mon Aug 6 19:54:53 2012 JamieUpdateCDSdaqd and CDS network problems today

It looks like daqd is indeed caught in some bad state.  It seems to die at some point after making GPS corrections to minute trender:

...
[Mon Aug  6 19:45:13 2012] Minute trender made GPS time correction; gps=1028342727; gps%60=27
tail: `fb/logs/daqd.log' has been replaced;  following end of new file
263596
MX endpoint opened
startup file interpreter thread tid=140334118615312
calling yyparse(5, 6)
[Mon Aug  6 19:50:08 2012] ->5: #set avoid_reconnect
[Mon Aug  6 19:50:08 2012] ->5: set thread_stack_size=102400
[Mon Aug  6 19:50:08 2012] new threads will be created with the stack of size 102400K
[Mon Aug  6 19:50:08 2012] ->5: set allow_tpman_connect_fail
[Mon Aug  6 19:50:08 2012] ->5: #set dcu_status_check=5
[Mon Aug  6 19:50:08 2012] ->5: #set symm_gps_offset=-1
[Mon Aug  6 19:50:08 2012] ->5: #set symm_gps_offset=31535998
[Mon Aug  6 19:50:08 2012] ->5: ##set symm_gps_offset=347155213
[Mon Aug  6 19:50:08 2012] ->5: #set symm_gps_offset=378691215
[Mon Aug  6 19:50:08 2012] ->5: #set symm_gps_offset=378691212
[Mon Aug  6 19:50:08 2012] ->5: #set symm_gps_offset=315964799
[Mon Aug  6 19:50:08 2012] ->5: set symm_gps_offset=315964801
[Mon Aug  6 19:50:08 2012] ->5: set debug=0
[Mon Aug  6 19:50:08 2012] ->5: set log=2
[Mon Aug  6 19:50:08 2012] ->5: set zero_bad_data=0
[Mon Aug  6 19:50:08 2012] ->5: set dcu_status_check=9
[Mon Aug  6 19:50:08 2012] ->5: set controller_dcu=33
[Mon Aug  6 19:50:08 2012] ->5: set master_config="/opt/rtcds/caltech/c1/target/fb/master"
[Mon Aug  6 19:50:10 2012] finished configuring data channels
[Mon Aug  6 19:50:10 2012] ->5: configure channels begin end
Unable to find GDS node 90 system c1x00 in INI files
Unable to find GDS node 92 system c1tst2 in INI files
Unable to find GDS node 95 system c1x10 in INI files
[Mon Aug  6 19:50:10 2012] ->5: tpconfig "/opt/rtcds/caltech/c1/target/gds/param/testpoint.par"
[Mon Aug  6 19:50:10 2012] ->5: set gps_leaps = 820108813
[Mon Aug  6 19:50:10 2012] ->5: set detector_name="CIT"
[Mon Aug  6 19:50:10 2012] ->5: set detector_prefix="C1"
[Mon Aug  6 19:50:10 2012] ->5: set detector_longitude=-90.7742403889
[Mon Aug  6 19:50:10 2012] ->5: set detector_latitude=30.5628943337
[Mon Aug  6 19:50:10 2012] ->5: set detector_elevation=.0
[Mon Aug  6 19:50:10 2012] ->5: set detector_azimuths=1.1,4.7123889804
[Mon Aug  6 19:50:10 2012] ->5: set detector_altitudes=1.0,2.0
[Mon Aug  6 19:50:10 2012] ->5: set detector_midpoints=2000.0, 2000.0
[Mon Aug  6 19:50:10 2012] ->5: set num_dirs = 10
[Mon Aug  6 19:50:10 2012] ->5: set frames_per_dir=225
[Mon Aug  6 19:50:10 2012] ->5: set full_frames_per_file=1
[Mon Aug  6 19:50:10 2012] ->5: set full_frames_blocks_per_frame=16
[Mon Aug  6 19:50:10 2012] ->5: set frame_dir="/frames/full", "C-R-", ".gwf"
[Mon Aug  6 19:50:10 2012] ->5: set trend_num_dirs=10
[Mon Aug  6 19:50:10 2012] ->5: set trend_frames_per_dir=1440
[Mon Aug  6 19:50:10 2012] ->5: set trend_frame_dir= "/frames/trend/second", "C-T-", ".gwf"
[Mon Aug  6 19:50:10 2012] ->5: set raw-minute-trend-dir="/frames/trend/minute_raw"
[Mon Aug  6 19:50:10 2012] ->5: set nds-jobs-dir="/opt/rtcds/caltech/c1/target/fb"
[Mon Aug  6 19:50:10 2012] ->5: set minute-trend-num-dirs=10
[Mon Aug  6 19:50:10 2012] ->5: set minute-trend-frames-per-dir=24
[Mon Aug  6 19:50:10 2012] ->5: set minute-trend-frame-dir="/frames/trend/minute", "C-M-", ".gwf"
[Mon Aug  6 19:50:10 2012] ->5: start main 10
[Mon Aug  6 19:50:12 2012] main started
[Mon Aug  6 19:50:12 2012] ->5: start profiler
[Mon Aug  6 19:50:12 2012] ->5: # comment out this block to stop saving data
[Mon Aug  6 19:50:12 2012] frame saver started
[Mon Aug  6 19:50:12 2012] ->5: start frame-saver
[Mon Aug  6 19:50:13 2012] ->5: sync frame-saver
[Mon Aug  6 19:50:13 2012] ->5: start trender
[Mon Aug  6 19:50:13 2012] trender started
[Mon Aug  6 19:50:13 2012] trend frame saver started
[Mon Aug  6 19:50:13 2012] ->5: start trend-frame-saver
[Mon Aug  6 19:50:14 2012] ->5: sync trend-frame-saver
[Mon Aug  6 19:50:14 2012] minute trend frame saver started
[Mon Aug  6 19:50:14 2012] ->5: start minute-trend-frame-saver
[Mon Aug  6 19:50:14 2012] Done creating ADC structures
[Mon Aug  6 19:50:15 2012] ->5: sync minute-trend-frame-saver
[Mon Aug  6 19:50:15 2012] raw minute trend frame saver started
[Mon Aug  6 19:50:15 2012] ->5: start raw_minute_trend_saver
[Mon Aug  6 19:50:15 2012] ->5: #frame-writer "225.225.225.1" broadcast="131.215.113.0" all
[Mon Aug  6 19:50:15 2012] ->5: #sleep 5
[Mon Aug  6 19:50:15 2012] producer started
[Mon Aug  6 19:50:15 2012] ->5: start producer
[Mon Aug  6 19:50:15 2012] ->5: start epics dcu
[Mon Aug  6 19:50:15 2012] MX receiver thread started
[Mon Aug  6 19:50:15 2012] edcu started
[Mon Aug  6 19:50:15 2012] ->5: start epics server "C0:DAQ-DC0_" "C1:DAQ-DC0_"
[Mon Aug  6 19:50:15 2012] epics server started
[Mon Aug  6 19:50:15 2012] ->5: start listener 8087
[Mon Aug  6 19:50:15 2012] ->5: start listener 8088 1
[Mon Aug  6 19:50:15 2012] ->5: sleep 60
[Mon Aug  6 19:50:15 2012] Epics server started
[Mon Aug  6 19:50:15 2012] EDCU has 2553 channels configured; first=0

[Mon Aug  6 19:50:18 2012] Minute trender made GPS time correction; gps=1028343032; gps%60=32
...

The "tail:..." line indicates that the log was moved and replaced, which indicates a daqd restart.  As far as I know this was not manually triggered.

After the restart the same thing happens again.  About once every five minutes.

  7095   Mon Aug 6 20:08:45 2012 JamieUpdateCDSdaqd and CDS network problems today

When daqd is caught in this state it can not be killed.  It's in "uninterruptable sleep" ('D' state in the top output below).  This usually indicates that it's waiting for the kernel, usually due to some missing or hung IO.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                      
28038 controls  20   0 4430m 2.0g  19m D    0 27.1   0:15.00 daqd                                                                                         

The memory footprint also seems to be getting big.  It's clearly trying to do something stupid that it can't handle.

  3828   Fri Oct 29 18:37:33 2010 yutaSummaryCDSdaqd and current CDS status

Background:
  Before Joe left(~ 1 hour ago), fb was working for a while. But after he left, daqd core dumped.
  This is maybe because we started c1sus and c1rms again for a delay measurement, just before he left.

What I did:
  I restarted IOP(c1x02) and FE models.
  Now it seems OK (we can use dataviewer and diaggui), but daqd reports bunch of errors like;

CA.Client.Exception...............................................
    Warning: "Identical process variable names on multiple servers"
    Context: "Channel: "C1:SUS-ITMX_TO_COIL_0_3_INMON", Connecting to: 192.168.113.85:42367, Ignored: c1susdaq:42367"
    Source File: ../cac.cpp line 1208
    Current Time: Fri Oct 29 2010 18:07:39.132686519
..................................................................


tp node xx invalid  (xx is 38 to 36)

Current CDS status:

MC damp dataviewer diaggui AWG c1ioo c1sus c1iscex RFM Sim.Plant  ...
                   ... 

   Please add other stuff you need.

Below is an example of how the color code works:

06-25-2009.NN_24ThreatLevel.GJH2L69BK.1.jpg

  9536   Tue Jan 7 23:53:35 2014 JamieUpdateCDSdaqd can't connect to c1vac1, c1vac2

dadq is logging the following error messages to it's log related to the fact that it can't connect to c1vac1 and c1vac2:

CAC: Unable to connect because "Connection timed out"
CA.Client.Exception...............................................
    Warning: "Virtual circuit disconnect"
    Context: "c1vac2.martian:5064"
    Source File: ../cac.cpp line 1127
    Current Time: Tue Jan 07 2014 23:50:53.355609430
..................................................................
CAC: Unable to connect because "Connection timed out"
CA.Client.Exception...............................................
    Warning: "Virtual circuit disconnect"
    Context: "c1vac1.martian:5064"
    Source File: ../cac.cpp line 1127
    Current Time: Tue Jan 07 2014 23:50:53.356568469
..................................................................

 Not sure if this is related to the full /frames issue that we've been seeing.

  14889   Tue Sep 17 14:01:46 2019 gautamUpdateCDSdaqd fw dead

For some reason, the daqd_fw service was dead on FB. This meant that no frames were being written since Aug 23, which probably coincides with when the c1lsc frontend crashed. Sad 😢 😭 🙁 . Simply restarting the fw service does not work, it crashes again after ~20 seconds. The problem may have to do with the indeterminate state of the c1lsc expansion chassis. However, this is not something that can immediately be fixed, as Chub is still working on the wiring there. So in summary, no frame data will be available until we fix this problem (it is still unclear what exactly the problem is). Team WFS can still work by getting online data.

Why were the CDS overview DC indicators not red???


Unrelated to this work: I had to key the c1psl crate to get the IMC autolocker functioning again. However, I found that the key 🔑 turns continuously - as opposed to having two well defined states, ON and OFF. Be careful while handling this.

  14891   Tue Sep 17 21:34:07 2019 gautamUpdateCDSdaqd fw dead no more

Summary:

  1. Frames seem to be written again.yesSlowly but surely, we are converging to an operable state...
  2. No frames are available for the period 23 Aug to 17 September 2019
  3. Don't edit the C0EDCU.ini file unless you know what you're doing.
  4. If you make some changes to the RT system/channel list or reboot FEs, please make sure all the dependent systems are back up and running. There shouldn't be a need to willy-nilly reboot things.
  5. Tomorrow I will prepare the map of BIO channels for Chub to restore the whitening switching capability. Then we can try locking some cavities.

Details:

  1. First, I checked to make sure the /frames partition wasn't full. It wasn't. yes
  2. Next, I looked into the C0EDCU.ini file.
    • The last date for which frames are available, 23 Aug, coincided with the date when this file was modified.
    • It is a known problem that the daqd_fw service can crash if one of the channels in this file is reporting an unusually large number.
    • Several channels were added to this file - in the end, only 9 new ones were required, 5x "DetectMon" channels for each of the RF demodulation frequencies, and 4 for the new ALS LO and RF signal power monitor channels.
    • It is highly likely that one of the other channels was what caused the daqd_fw service to crash - though I can't say for sure, because I did not exhaustively search through the ~100 un-necessary channels that were in this file to see what values they were reporting.
  3. For good measure, I ran the reboot script, and brought the c1lsc models back online.
    • I want to do the mapping of the BIO channels to the pin-out of the BIO adaptor unit, which requires c1lsc to run.
    • Reboot script ran smoothly.
  4. Then I went into fb and restarted all the daqd services. This time, they all seem to run without crashing, at least in the ~10min window it took me to type out this elog.

controls@fb1:~ 127$ sudo systemctl status  daqd_fw.service
● daqd_fw.service - Advanced LIGO RTS daqd frame writer
   Loaded: loaded (/etc/systemd/system/daqd_fw.service; enabled)
   Active: active (running) since Tue 2019-09-17 21:32:25 PDT; 17min ago
 Main PID: 22040 (daqd_fw)
   CGroup: /daqd.slice/daqd_fw.service
           └─22040 /usr/bin/daqd_fw -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.fw

Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer crc thread - label dqprodcrc pid=22108
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] [Tue Sep 17 21:32:31 2019] Producer thread - label dqproddbg pid=22109Producer crc... permitted
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer crc thread put on CPU 0
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer thread priority error Operation not permitted
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer thread put on CPU 0
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer thread - label dqprod pid=22103
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer thread priority error Operation not permitted
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer thread put on CPU 0
Sep 17 21:32:35 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:35 2019] Minute trender made GPS time correction; gps=1252816371; gps%60=51
Sep 17 21:33:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:33:31 2019] ->3: clear crc

drwxr-xr-x 2 controls controls 569344 Aug 23 05:17 12465
drwxr-xr-x 2 controls controls 565248 Aug 23 05:41 12466
drwxr-xr-x 2 controls controls 557056 Aug 23 05:53 12505
drwxr-xr-x 2 controls controls 262144 Aug 23 18:40 12506
drwxr-xr-x 2 controls controls  12288 Sep 17 21:54 12528
 

Unrelated to this work: c1auxey was keyed.

Quote:

This meant that no frames were being written since Aug 23, which probably coincides with when the c1lsc frontend crashed. Sad 😢 😭 🙁 .

  11854   Mon Dec 7 00:45:28 2015 ericqUpdateCDSdaqd is mad

I glanced at the summary pages and noticed that, since Friday around when we first loaded up the new BLRMS parts, daqd has crashing very frequently (few times per hour). 

I'm going to comment out the c1pem lines from the daqd master file for tonight, and see if that helps. 

  11856   Mon Dec 7 10:45:21 2015 ericqUpdateCDSdaqd is mad

Since removing c1pem from the daqd master file, daqd has not crashed. I suppose we're running into the stability issue that motivated us to disable some of the other models (IOPs, RFM, etc.) during the RCG upgrade. 

A question to Jamie: although the new framebuilder prototype still had the same problem with trend writing, can it handle this higher testpoint/DQ channel load?

  11859   Mon Dec 7 11:25:10 2015 jamieUpdateCDSdaqd is mad
Quote:

A question to Jamie: although the new framebuilder prototype still had the same problem with trend writing, can it handle this higher testpoint/DQ channel load?

The new fb1 daqd was also crashing even without the trend writing enabled.  I'm not sure how much that's affected by the load, though, e.g. it might be able to handle the extra load fine but then die because of some other issue not related to the number of channels being acquired.

We should schedule some time this week to work on fb1 some more.

  5486   Tue Sep 20 17:45:30 2011 kiwamuUpdateCDSdaqd is restarting by hisself ?

[Jenne / Kiwamu]

 Fb was sick. Dataviewer and Fourier Tools didn't work for a while.

After 10 minutes later they became healthy again. No idea what exactly was going on.

One thing we found was that : during the sickness of fb, it looks like daqd was restarting by hisself. Is this normal ??

Here is the bottom sentences of restart.log. Apparently daqd was rebooting although we didn't command to do so.

  daqd_start Tue Sep 20 02:41:17 PDT 2011
  daqd_start Tue Sep 20 13:18:12 PDT 2011
  daqd_start Tue Sep 20 17:33:00 PDT 2011

  1551   Wed May 6 16:56:35 2009 rana, alex, joeConfigurationComputersdaqd log, cront, etc.
While Alex came over, we investigated the log file problems with DAQD and NDS on FB0. There was a lot of
the standard puzzling and mumbling, but eventually we saw that it doesn't create its log file and so it
doesn't write to it. The log file is /usr/controls/main_daqd.log. The other files called daqd.log.DATE
in the logs/ directory are actually not written to. Its awesome.

We also have put in a fix for the overflowing jobs/ directory. It gets a file written to it every time
you make and NDS request and our seisBLRMS has been overloading it. There's now a cron for it in the fb0
crontab which cleans out week-old files at 6:30 AM every day.

We also changed the time of the daily backup from 3:30 AM (when people are still working) to 5:50 AM
(by which time the seismic has ramped up and interferometerists should be asleep). I didn't like the
idea of a bandwidth hog nailing the framebuilder during the peak of interferometer work.

#
# Script to backup via rsync the most recent 40m minute trends and
# any changes to the /cvs/cds filesystem.
#
50 05 * * * /cvs/cds/caltech/scripts/backup/rsync.backup < /dev/null > /cvs/cds/caltech/scripts/\
backup/rsync.backup.log 2>&1

30 06 * * * find /usr/controls/jobs -mtime +7 -exec /bin/rm -f {} \;

seisBLRMS.m restarted on mafalda.
  9530   Tue Jan 7 22:44:45 2014 JenneUpdateCDSdaqd on fb is segfaulting every ~30 seconds

The daqd process is segfaulting and restarting itself every 30 seconds or so.  It's pretty frustrating. 

Just for kicks, I tried an mxstream restart, clearing the testpoints, and restarting the daqd process, but none of things changed anything.  

Manasa found an elog from a year ago (elog 7105 and preceding), but I'm not sure that it's a similar / related problem.  Jamie, please help us!

Here is a screen dump from the "dtail":

Every 1.0s: dmesg | tail -50                                                                                                                         Tue Jan  7 22:43:23 2014

[   33.498691]  [<ffffffff8104a063>] kthread+0x7a/0x82
[   33.498695]  [<ffffffff81003654>] kernel_thread_helper+0x4/0x10
[   33.498698]  [<ffffffff81049fe9>] ? kthread+0x0/0x82
[   33.498701]  [<ffffffff81003650>] ? kernel_thread_helper+0x0/0x10
[   33.498703] ---[ end trace 6236defa99b3e091 ]---
[   33.498705] mx INFO: Board 0: allocated MSI IRQ 67
[   33.498713] mx INFO: CPU0: PAT = 0x7010600070106
[   33.498715] mx INFO: CPU0: new PAT = 0x1010600070106
[   33.498718] mx INFO: Board 0: Using PAT index 6
[   33.499101] eth0: no IPv6 routers present
[   33.531013] mx INFO: Board 0: device 8, rev 0, 1 ports and 2096896 bytes of SRAM available
[   33.531017] mx INFO: Board 0: Bridge is 10de:005d
[   33.531228] mx INFO: Board 0: MAC address = 00:60:dd:46:ea:ec
[   33.535971] mx INFO: Loaded mcp of len 235448
[   34.489244] mx INFO: Starting usermode mapper at /opt/mx/sbin/mx_start_mapper
[   39.148855] mx INFO: mx0: Link0 is UP
[   39.588511] mx INFO: myri0: Will use skbuf frags (4096 bytes, order=0)
[   39.589299] mx INFO: 1 Myrinet board found and initialized
[  287.706367] daqd used greatest stack depth: 3368 bytes left
[86605.907520] daqd[18407]: segfault at 38b08e4c0 ip 00007f11b3942a6c sp 00007f10b1917d50 error 4
[86605.907530] daqd[18424]: segfault at 38b544f90 ip 00007f11b3942a6c sp 00007f10b12c6d30 error 4 in libc-2.10.1.so[7f11b390e000+14c000] in libc-2.10.1.so[7f11b390e000+14c00
0]
[86605.907544]
[86605.919454] daqd[21319] general protection ip:7f11b3942a6c sp:7f10b1814d30 error:0
[86605.919462] daqd[18442] general protection ip:7f11b3942a6c sp:7f10b0bf4d30 error:0
[86605.919615] daqd[18443]: segfault at 38aee3db0 ip 00007f11b3942a6c sp 00007f10b0b73d50 error 4 in libc-2.10.1.so[7f11b390e000+14c000]
[86605.919694] daqd[18412]: segfault at 38aff35d0 ip 00007f11b3942a6c sp 00007f10b1752d30 error 4
[86605.919701] daqd[18417]: segfault at 38b544f70 ip 00007f11b3942a6c sp 00007f10b154dd50 error 4 in libc-2.10.1.so[7f11b390e000+14c000]
[86605.919708] daqd[18445]: segfault at 38aff35b0 ip 00007f11b3942a6c sp 00007f10b0ab1d50 error 4
[86605.919733] daqd[18429]: segfault at 38b42ae90 ip 00007f11b3942a6c sp 00007f10b10c1d50 error 4 in libc-2.10.1.so[7f11b390e000+14c000]
[86605.919741] daqd[18440]: segfault at 38b08e480 ip 00007f11b3942a6c sp 00007f10b0cb6d30 error 4 in libc-2.10.1.so[7f11b390e000+14c000]
[86605.958551]  in libc-2.10.1.so[7f11b390e000+14c000] in libc-2.10.1.so[7f11b390e000+14c000]
[86605.958557]
[86605.958577]  in libc-2.10.1.so[7f11b390e000+14c000]
[86605.958586]  in libc-2.10.1.so[7f11b390e000+14c000]
[86605.959639] daqd used greatest stack depth: 3160 bytes left
[98139.100888] show_signal_msg: 13 callbacks suppressed
[98139.100895] daqd[23753]: segfault at 39c7363b0 ip 00007f5bf253ba6c sp 00007f5b69b48d30 error 4 in libc-2.10.1.so[7f5bf2507000+14c000]
[98687.815120] daqd used greatest stack depth: 2984 bytes left
[208995.594227] daqd[10386] general protection ip:7f3b7c930a6c sp:7f3a79f09d50 error:0 in libc-2.10.1.so[7f3b7c8fc000+14c000]
[353015.067479] daqd used greatest stack depth: 2880 bytes left
[367406.863618] daqd[13078]: segfault at 41 ip 0000000000000041 sp 00007fb1f0ba2cf8 error 14 in daqd[400000+7c000]
[367406.863833] daqd[13104] general protection ip:7fb2f3018a6c sp:7fb1f01c8d30 error:0
[367406.863877] daqd[13086] general protection ip:7fb2f3018a6c sp:7fb1f089ad30 error:0
[367406.877408] daqd[13080]: segfault at 41 ip 0000000000000041 sp 00007fb1f0ae0ca8 error 14 in daqd[400000+7c000]
[367406.877435]  in libc-2.10.1.so[7fb2f2fe4000+14c000]
[367406.877442] daqd[13100]: segfault at 39ba287b0 ip 00007fb2f3018a6c sp 00007fb1f034cd30 error 4 in libc-2.10.1.so[7fb2f2fe4000+14c000]
[367406.878372]  in libc-2.10.1.so[7fb2f2fe4000+14c000]
[399802.887523] daqd[18295] general protection ip:7fb056a71a6c sp:7faf96125f10 error:0 in libc-2.10.1.so[7fb056a3d000+14c000]
[410595.969327] daqd[22057]: segfault at 3a91f27b0 ip 00007f48e96eea6c sp 00007f47e6c26d50 error 4 in libc-2.10.1.so[7f48e96ba000+14c000]
[410595.988926] daqd[22068]: segfault at 3a91f2790 ip 00007f48e96eea6c sp 00007f47e681bd30 error 4 in libc-2.10.1.so[7f48e96ba000+14c000]

  7105   Tue Aug 7 15:04:23 2012 JamieUpdateCDSdaqd problem was root-owned files and directories

Apparently the last problem was because of root-owned frame directories that daqd was trying to write to.  During debugging Alex had run daqd as root, but it's supposed to run as controls.  All the /frame directories are supposed to be owned by controls.  When daqd was run as root, it created new frame directories owned by root, which controls couldn't write to when I restarted daqd the proper way.  Once we chown'd the directories daqd started running again.

Alex also put in a "fix" for the core dump problem.  He touched an empty core file owned by root:

-rw-r--r-- 1 root root 0 Aug  7 14:38 /opt/rtcds/caltech/c1/target/fb/core

This will prevent any dying daqd process owned by controls from dumping it's core at that location.  Personally I think this is a horribly hacky "solution" that doesn't actually fix any of the issues that were causing the segfaults to begin with, but it might prevent some of the network slow down we see when the core does dump.  It's mostly just masking the problem, though, so I'm tempted to remove it so we all feel the pain when daqd starts shitting all over the network again.

ELOG V3.1.3-