ID   Date   Author   Type   Category   Subject
  6473   Fri Mar 30 17:37:09 2012   steve   Update   General   cutting green welding glass for beam dumps

Quote:

Schott green welding glass (shade 14, 3 mm thick) was measured in the MC reflected path with a 1.2 W, S-polarized 1064 nm beam of ~1 mm diameter.

Absorption was 95% and reflection 5% at incidence angles of 25-50 degrees. It looks like the perfect material for a beam trap.

 

 The cuts from the CIT Chemistry Glass Shop, made with a diamond disc blade cutter, turned out to be sloppy.

East coast Precision Glass & Optics offered scribed cuts with a polished side. Their quoted price was high and the lead time long.

The GLASS HOUSE shop in Pasadena (626 / 796-9151, on Walnut) did a good and cheap job. Oscar the cutter will finish the rest of the cutting by next Friday, April 6.

He used a scribed cutting technique, and his 1" x 0.5" pieces are good. Bob will pick them up.

Attachment 1: IMG_0096.JPG
  3770   Sat Oct 23 14:42:01 2010   yuta   Summary   CDS   damped MC suspensions

After replacing the filters, the MC suspensions were damped last night.

Further measurement next time.

Attachment 1: dam.png
  5264   Thu Aug 18 15:54:35 2011   steve   Update   SUS   damped and undamped OSEMs

Damped suspensions are shown in attachment 1 and free-swinging suspensions in attachment 2.

 

Attachment 1: 5susLL.jpg
Attachment 2: 8freeSUSLL.jpg
  3692   Mon Oct 11 22:04:28 2010   yuta   Update   CDS   damping for MCs are basically working

Background:
 Even if we can't use Dataviewer to get the time series of each of the 4 DOF displacements, we can still use StripTool to monitor the ringdowns and see if the damping servo is working.

What I did:
1. Excited vibration by kicking the mirror randomly (by putting in some offsets randomly and turning the filters on and off randomly).

2. Turned all the servos off by clicking the "ShutDown" button.

3. Turned all the servos on by clicking the "Normal" button.

4. Monitored each of the 4 DOF displacements with StripTool to see whether the ringdowns became considerably low-Q after turning on the servos.
 The values I monitored are as follows:
  C1:SUS_MCX_SUSPOS_INMON
  C1:SUS_MCX_SUSPIT_INMON
  C1:SUS_MCX_SUSYAW_INMON
  C1:SUS_MCX_SDSEN_INMON  (X=1,2,3)

All the settings I used for this observation are automatically saved here:
  /cvs/cds/caltech/burt/autoburt/snapshots/2010/Oct/11/21:07/c1mcs.epics

Result:
 Attached are screenshots of the StripTool Graph window for each of the 3 MCs.
 As you can see, the damping for each DOF of each MC is basically working.

Note:
 Do NOT turn off all the damping servos by clicking the "Damp" buttons or setting the SUSXXX_GAIN to 0. It crashes c1mcs.

Next work:
 - check and relate the signal sign with the actual moving direction of the optics
 - fix the data acquisition system
 - measure Q-values when the servo is on and off using DAQ and Dataviewer
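
Once time series are available, the Q measurement listed above could be sketched as follows. This is a minimal sketch and not the procedure used in this entry: it assumes an exponentially decaying ringdown envelope, A(t) = A0 * exp(-pi * f0 * t / Q), and the function name and arguments are hypothetical.

```python
import math


def q_from_ringdown(f0_hz, t1_s, a1, t2_s, a2):
    """Estimate Q from two envelope amplitudes of a ringdown.

    Assumes A(t) = A0 * exp(-pi * f0 * t / Q), so that
    Q = pi * f0 * (t2 - t1) / ln(a1 / a2).
    """
    if t2_s <= t1_s:
        raise ValueError("t2_s must be later than t1_s")
    if a1 <= 0 or a2 <= 0 or a2 >= a1:
        raise ValueError("amplitudes must be positive and decaying")
    return math.pi * f0_hz * (t2_s - t1_s) / math.log(a1 / a2)
```

The two amplitudes would be read off the envelope of the ringdown at two different times, with the resonant frequency f0 taken from the free-swinging spectrum.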

Attachment 1: SUS-MC1.png
Attachment 2: SUS-MC2.png
Attachment 3: SUS-MC3.png
  2939   Mon May 17 10:57:16 2010   steve   Configuration   SUS   damping restored to ETMYs

ETMY-south sus damping was restored

  5534   Sat Sep 24 01:21:11 2011   kiwamu   Update   SUS   damping test

As a suspension test, I am leaving all of the suspensions restored and damped with OSEMs but without oplevs.

  11104   Thu Mar 5 20:44:30 2015   Jenne   Update   SUS   damprestore script updated

I just realized that the "damprestore" script that can be called from the watchdog screen did not have the new oplev names.  I have updated it, and added it to the svn.

  6529   Thu Apr 12 20:56:07 2012   Den   Update   PEM   daq

GUR1 XYZ, GUR2 XYZ, MC_F channels are now recorded at 256 Hz.

EDIT by JCD:  What Den means to say here is that (a) he modified some .ini files, and (b) he restarted the fb.

  7713   Wed Nov 14 21:59:09 2012   Den   Update   CDS   daq errors

I tried to add a test point to the C1MCS model and spent the next two hours rebooting front-ends, restarting models and realigning the MC.

dmesg told me that DAQ channels cannot be allocated as they already exist. Last time we met this problem Jamie emailed Alex about it. Jamie, what is the output? Restarting the IOP model does not help this time.

  6576   Thu Apr 26 20:44:23 2012   Jamie   Update   CDS   daq network failure, c1ioo failing to start models

Den tried adding a SINGLE acquire channel to the c1ioo, which for some reason hung c1ioo and took down the entire DAQ network (at least all communication between the front ends and the fb).  We recovered by restarting c1ioo and restarting mx_stream on all the rest of the front-ends.

After "recovering", though, c1ioo is failing to load models, or at least its IOP. Here is the tail of dmesg when trying to start the IOP:

[ 1751.140283] c1x03: Initializing space for daqLib buffers
[ 1751.140284] c1x03: Initializing Network
[ 1751.140285] c1x03: Found 1 frameBuilders on network
[ 1751.250658] CPU 2 is now offline
[ 1751.250657] c1x03: Sync source = 4
[ 1751.250657] c1x03: Waiting for EPICS BURT Restore = 1
[ 1751.310008] c1x03: Waiting for EPICS BURT 0
[ 1751.310008] c1x03: BURT Restore Complete
[ 1751.310008] c1x03: Initialized servo control parameters.
[ 1751.311699] c1x03: DAQ Ex Min/Max = 1 3
[ 1751.311699] c1x03: DAQ XEx Min/Max = 3 53
[ 1751.311733] c1x03: DAQ Tp Min/Max = 10001 10007
[ 1751.311733] c1x03: DAQ XTp Min/Max = 10007 10507
[ 1751.311737] c1x03: DIRECT MEMORY MODE of size 64
[ 1751.311737] c1x03: daqLib DCU_ID = 33
[ 1751.311737] c1x03: Invalid num daq chans = 0
[ 1751.311737] c1x03: DAQ init failed -- exiting

The chan file for this model (/opt/rtcds/caltech/c1/chans/daq/C1X03.ini) looks totally fine, has two un-acquired channels uncommented, and has otherwise not been touched. The C1:FEC-33_MSGDAQ is also reading: "ERROR reading DAQ file!"

I'm at a loss for what is going on. I've tried restarting every CDS process on the machine, restarting the model multiple times, restarting fb, and even restarting the entire c1ioo machine, all to no effect.
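
A quick sanity check on a channel file, given the "Invalid num daq chans = 0" message above, is to count the channel sections that are not commented out. The sketch below is only an illustration: it assumes the .ini layout is a `[default]` section followed by one `[CHANNEL_NAME]` section per channel, with `#` marking commented-out lines. That layout and the function name are assumptions, not taken from this entry.

```python
import re

# Matches a section header such as "[C1:SOME-CHANNEL]" at the start of a line.
# Lines beginning with "#" do not match, so commented-out channels are skipped.
SECTION_RE = re.compile(r"\s*\[([^\]]+)\]\s*$")


def active_channels(ini_text):
    """Return the names of uncommented channel sections, excluding [default]."""
    channels = []
    for line in ini_text.splitlines():
        match = SECTION_RE.match(line)
        if match and match.group(1).lower() != "default":
            channels.append(match.group(1))
    return channels
```

If this returns the expected channels while the front end still reports zero, the file contents are probably not the problem, which is consistent with the observation above that the file "looks totally fine".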

  7093   Mon Aug 6 19:37:50 2012   Jamie   Update   CDS   daqd and CDS network problems today

For some reason this afternoon we've been experiencing a lot of problems with the framebuilder, and with the CDS network in general.  The framebuilder has been very unresponsive, although the daqd logs seem to indicate that things are ok.  All models will lose contact with fb for very long stretches.  Attempts to kill/restart daqd don't seem to fix the problem.

These problems seem to be associated with the general CDS network issues as well.  The network seems to become very slow, and the workstations all become very slow.  The latter, I assume, is because of the network and because so much of the work we do is on network-mounted filesystems (/opt/rtcds, /ligo, etc.).

My current speculation is that daqd on fb is doing something stupid, like trying to read or write a bunch of stuff from /frames, which is also network mounted, and that clogs up the entire network.  Some serious network debugging is going to be needed to figure out what's going on, though.

I'm afraid daqd is caught in some bad state now, though.  It's not responding to anything, and every attempt to kill it seems to bring it back into the bad state.  Hopefully I can get Alex to help me figure out what's going on tomorrow.   Maybe it will clear up on its own tonight...

  7094   Mon Aug 6 19:54:53 2012   Jamie   Update   CDS   daqd and CDS network problems today

It looks like daqd is indeed caught in some bad state.  It seems to die at some point after making GPS corrections to minute trender:

...
[Mon Aug  6 19:45:13 2012] Minute trender made GPS time correction; gps=1028342727; gps%60=27
tail: `fb/logs/daqd.log' has been replaced;  following end of new file
263596
MX endpoint opened
startup file interpreter thread tid=140334118615312
calling yyparse(5, 6)
[Mon Aug  6 19:50:08 2012] ->5: #set avoid_reconnect
[Mon Aug  6 19:50:08 2012] ->5: set thread_stack_size=102400
[Mon Aug  6 19:50:08 2012] new threads will be created with the stack of size 102400K
[Mon Aug  6 19:50:08 2012] ->5: set allow_tpman_connect_fail
[Mon Aug  6 19:50:08 2012] ->5: #set dcu_status_check=5
[Mon Aug  6 19:50:08 2012] ->5: #set symm_gps_offset=-1
[Mon Aug  6 19:50:08 2012] ->5: #set symm_gps_offset=31535998
[Mon Aug  6 19:50:08 2012] ->5: ##set symm_gps_offset=347155213
[Mon Aug  6 19:50:08 2012] ->5: #set symm_gps_offset=378691215
[Mon Aug  6 19:50:08 2012] ->5: #set symm_gps_offset=378691212
[Mon Aug  6 19:50:08 2012] ->5: #set symm_gps_offset=315964799
[Mon Aug  6 19:50:08 2012] ->5: set symm_gps_offset=315964801
[Mon Aug  6 19:50:08 2012] ->5: set debug=0
[Mon Aug  6 19:50:08 2012] ->5: set log=2
[Mon Aug  6 19:50:08 2012] ->5: set zero_bad_data=0
[Mon Aug  6 19:50:08 2012] ->5: set dcu_status_check=9
[Mon Aug  6 19:50:08 2012] ->5: set controller_dcu=33
[Mon Aug  6 19:50:08 2012] ->5: set master_config="/opt/rtcds/caltech/c1/target/fb/master"
[Mon Aug  6 19:50:10 2012] finished configuring data channels
[Mon Aug  6 19:50:10 2012] ->5: configure channels begin end
Unable to find GDS node 90 system c1x00 in INI files
Unable to find GDS node 92 system c1tst2 in INI files
Unable to find GDS node 95 system c1x10 in INI files
[Mon Aug  6 19:50:10 2012] ->5: tpconfig "/opt/rtcds/caltech/c1/target/gds/param/testpoint.par"
[Mon Aug  6 19:50:10 2012] ->5: set gps_leaps = 820108813
[Mon Aug  6 19:50:10 2012] ->5: set detector_name="CIT"
[Mon Aug  6 19:50:10 2012] ->5: set detector_prefix="C1"
[Mon Aug  6 19:50:10 2012] ->5: set detector_longitude=-90.7742403889
[Mon Aug  6 19:50:10 2012] ->5: set detector_latitude=30.5628943337
[Mon Aug  6 19:50:10 2012] ->5: set detector_elevation=.0
[Mon Aug  6 19:50:10 2012] ->5: set detector_azimuths=1.1,4.7123889804
[Mon Aug  6 19:50:10 2012] ->5: set detector_altitudes=1.0,2.0
[Mon Aug  6 19:50:10 2012] ->5: set detector_midpoints=2000.0, 2000.0
[Mon Aug  6 19:50:10 2012] ->5: set num_dirs = 10
[Mon Aug  6 19:50:10 2012] ->5: set frames_per_dir=225
[Mon Aug  6 19:50:10 2012] ->5: set full_frames_per_file=1
[Mon Aug  6 19:50:10 2012] ->5: set full_frames_blocks_per_frame=16
[Mon Aug  6 19:50:10 2012] ->5: set frame_dir="/frames/full", "C-R-", ".gwf"
[Mon Aug  6 19:50:10 2012] ->5: set trend_num_dirs=10
[Mon Aug  6 19:50:10 2012] ->5: set trend_frames_per_dir=1440
[Mon Aug  6 19:50:10 2012] ->5: set trend_frame_dir= "/frames/trend/second", "C-T-", ".gwf"
[Mon Aug  6 19:50:10 2012] ->5: set raw-minute-trend-dir="/frames/trend/minute_raw"
[Mon Aug  6 19:50:10 2012] ->5: set nds-jobs-dir="/opt/rtcds/caltech/c1/target/fb"
[Mon Aug  6 19:50:10 2012] ->5: set minute-trend-num-dirs=10
[Mon Aug  6 19:50:10 2012] ->5: set minute-trend-frames-per-dir=24
[Mon Aug  6 19:50:10 2012] ->5: set minute-trend-frame-dir="/frames/trend/minute", "C-M-", ".gwf"
[Mon Aug  6 19:50:10 2012] ->5: start main 10
[Mon Aug  6 19:50:12 2012] main started
[Mon Aug  6 19:50:12 2012] ->5: start profiler
[Mon Aug  6 19:50:12 2012] ->5: # comment out this block to stop saving data
[Mon Aug  6 19:50:12 2012] frame saver started
[Mon Aug  6 19:50:12 2012] ->5: start frame-saver
[Mon Aug  6 19:50:13 2012] ->5: sync frame-saver
[Mon Aug  6 19:50:13 2012] ->5: start trender
[Mon Aug  6 19:50:13 2012] trender started
[Mon Aug  6 19:50:13 2012] trend frame saver started
[Mon Aug  6 19:50:13 2012] ->5: start trend-frame-saver
[Mon Aug  6 19:50:14 2012] ->5: sync trend-frame-saver
[Mon Aug  6 19:50:14 2012] minute trend frame saver started
[Mon Aug  6 19:50:14 2012] ->5: start minute-trend-frame-saver
[Mon Aug  6 19:50:14 2012] Done creating ADC structures
[Mon Aug  6 19:50:15 2012] ->5: sync minute-trend-frame-saver
[Mon Aug  6 19:50:15 2012] raw minute trend frame saver started
[Mon Aug  6 19:50:15 2012] ->5: start raw_minute_trend_saver
[Mon Aug  6 19:50:15 2012] ->5: #frame-writer "225.225.225.1" broadcast="131.215.113.0" all
[Mon Aug  6 19:50:15 2012] ->5: #sleep 5
[Mon Aug  6 19:50:15 2012] producer started
[Mon Aug  6 19:50:15 2012] ->5: start producer
[Mon Aug  6 19:50:15 2012] ->5: start epics dcu
[Mon Aug  6 19:50:15 2012] MX receiver thread started
[Mon Aug  6 19:50:15 2012] edcu started
[Mon Aug  6 19:50:15 2012] ->5: start epics server "C0:DAQ-DC0_" "C1:DAQ-DC0_"
[Mon Aug  6 19:50:15 2012] epics server started
[Mon Aug  6 19:50:15 2012] ->5: start listener 8087
[Mon Aug  6 19:50:15 2012] ->5: start listener 8088 1
[Mon Aug  6 19:50:15 2012] ->5: sleep 60
[Mon Aug  6 19:50:15 2012] Epics server started
[Mon Aug  6 19:50:15 2012] EDCU has 2553 channels configured; first=0

[Mon Aug  6 19:50:18 2012] Minute trender made GPS time correction; gps=1028343032; gps%60=32
...

The "tail:..." line indicates that the log was moved and replaced, which indicates a daqd restart.  As far as I know this was not manually triggered.

After the restart the same thing happens again, about once every five minutes.

  7095   Mon Aug 6 20:08:45 2012   Jamie   Update   CDS   daqd and CDS network problems today

When daqd is caught in this state it cannot be killed.  It's in "uninterruptible sleep" ('D' state in the top output below).  This usually indicates that it's waiting for the kernel, usually due to some missing or hung IO.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                      
28038 controls  20   0 4430m 2.0g  19m D    0 27.1   0:15.00 daqd                                                                                         

The memory footprint also seems to be getting big.  It's clearly trying to do something stupid that it can't handle.
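
The same 'D' state can be checked without top by reading the `State:` line of `/proc/<pid>/status`, which Linux provides for every process. The sketch below only parses the text of that file; the function names are hypothetical.

```python
def process_state(status_text):
    """Return the one-letter state code from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tD (disk sleep)"; keep only the letter.
            return line.split(":", 1)[1].split()[0]
    return None


def is_uninterruptible(status_text):
    """True if the process is in uninterruptible sleep ('D')."""
    return process_state(status_text) == "D"
```

On fb this could be fed with the contents of `/proc/28038/status` for the daqd process shown above, for example `is_uninterruptible(open("/proc/28038/status").read())`.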

  3828   Fri Oct 29 18:37:33 2010   yuta   Summary   CDS   daqd and current CDS status

Background:
  Before Joe left (~1 hour ago), fb was working for a while. But after he left, daqd core dumped.
  This may be because we started c1sus and c1rms again for a delay measurement, just before he left.

What I did:
  I restarted IOP(c1x02) and FE models.
  Now it seems OK (we can use dataviewer and diaggui), but daqd reports a bunch of errors like:

CA.Client.Exception...............................................
    Warning: "Identical process variable names on multiple servers"
    Context: "Channel: "C1:SUS-ITMX_TO_COIL_0_3_INMON", Connecting to: 192.168.113.85:42367, Ignored: c1susdaq:42367"
    Source File: ../cac.cpp line 1208
    Current Time: Fri Oct 29 2010 18:07:39.132686519
..................................................................


tp node xx invalid  (xx is 38 to 36)

Current CDS status:

MC damp dataviewer diaggui AWG c1ioo c1sus c1iscex RFM Sim.Plant  ...
                   ... 

   Please add other stuff you need.

Below is an example of how the color code works:

06-25-2009.NN_24ThreatLevel.GJH2L69BK.1.jpg

  9536   Tue Jan 7 23:53:35 2014   Jamie   Update   CDS   daqd can't connect to c1vac1, c1vac2

daqd is logging the following error messages to its log, related to the fact that it can't connect to c1vac1 and c1vac2:

CAC: Unable to connect because "Connection timed out"
CA.Client.Exception...............................................
    Warning: "Virtual circuit disconnect"
    Context: "c1vac2.martian:5064"
    Source File: ../cac.cpp line 1127
    Current Time: Tue Jan 07 2014 23:50:53.355609430
..................................................................
CAC: Unable to connect because "Connection timed out"
CA.Client.Exception...............................................
    Warning: "Virtual circuit disconnect"
    Context: "c1vac1.martian:5064"
    Source File: ../cac.cpp line 1127
    Current Time: Tue Jan 07 2014 23:50:53.356568469
..................................................................

 Not sure if this is related to the full /frames issue that we've been seeing.

  14889   Tue Sep 17 14:01:46 2019   gautam   Update   CDS   daqd fw dead

For some reason, the daqd_fw service was dead on FB. This meant that no frames were being written since Aug 23, which probably coincides with when the c1lsc frontend crashed. Sad 😢 😭 🙁 . Simply restarting the fw service does not work, it crashes again after ~20 seconds. The problem may have to do with the indeterminate state of the c1lsc expansion chassis. However, this is not something that can immediately be fixed, as Chub is still working on the wiring there. So in summary, no frame data will be available until we fix this problem (it is still unclear what exactly the problem is). Team WFS can still work by getting online data.

Why were the CDS overview DC indicators not red???


Unrelated to this work: I had to key the c1psl crate to get the IMC autolocker functioning again. However, I found that the key 🔑 turns continuously - as opposed to having two well defined states, ON and OFF. Be careful while handling this.

  14891   Tue Sep 17 21:34:07 2019   gautam   Update   CDS   daqd fw dead no more

Summary:

  1. Frames seem to be written again. Slowly but surely, we are converging to an operable state...
  2. No frames are available for the period 23 Aug to 17 September 2019
  3. Don't edit the C0EDCU.ini file unless you know what you're doing.
  4. If you make some changes to the RT system/channel list or reboot FEs, please make sure all the dependent systems are back up and running. There shouldn't be a need to willy-nilly reboot things.
  5. Tomorrow I will prepare the map of BIO channels for Chub to restore the whitening switching capability. Then we can try locking some cavities.

Details:

  1. First, I checked to make sure the /frames partition wasn't full. It wasn't.
  2. Next, I looked into the C0EDCU.ini file.
    • The last date for which frames are available, 23 Aug, coincided with the date when this file was modified.
    • It is a known problem that the daqd_fw service can crash if one of the channels in this file is reporting an unusually large number.
    • Several channels were added to this file - in the end, only 9 new ones were required, 5x "DetectMon" channels for each of the RF demodulation frequencies, and 4 for the new ALS LO and RF signal power monitor channels.
    • It is highly likely that one of the other channels was what caused the daqd_fw service to crash - though I can't say for sure, because I did not exhaustively search through the ~100 un-necessary channels that were in this file to see what values they were reporting.
  3. For good measure, I ran the reboot script, and brought the c1lsc models back online.
    • I want to do the mapping of the BIO channels to the pin-out of the BIO adaptor unit, which requires c1lsc to run.
    • Reboot script ran smoothly.
  4. Then I went into fb and restarted all the daqd services. This time, they all seem to run without crashing, at least in the ~10min window it took me to type out this elog.

controls@fb1:~ 127$ sudo systemctl status  daqd_fw.service
● daqd_fw.service - Advanced LIGO RTS daqd frame writer
   Loaded: loaded (/etc/systemd/system/daqd_fw.service; enabled)
   Active: active (running) since Tue 2019-09-17 21:32:25 PDT; 17min ago
 Main PID: 22040 (daqd_fw)
   CGroup: /daqd.slice/daqd_fw.service
           └─22040 /usr/bin/daqd_fw -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.fw

Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer crc thread - label dqprodcrc pid=22108
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] [Tue Sep 17 21:32:31 2019] Producer thread - label dqproddbg pid=22109Producer crc... permitted
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer crc thread put on CPU 0
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer thread priority error Operation not permitted
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer thread put on CPU 0
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer thread - label dqprod pid=22103
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer thread priority error Operation not permitted
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer thread put on CPU 0
Sep 17 21:32:35 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:35 2019] Minute trender made GPS time correction; gps=1252816371; gps%60=51
Sep 17 21:33:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:33:31 2019] ->3: clear crc

drwxr-xr-x 2 controls controls 569344 Aug 23 05:17 12465
drwxr-xr-x 2 controls controls 565248 Aug 23 05:41 12466
drwxr-xr-x 2 controls controls 557056 Aug 23 05:53 12505
drwxr-xr-x 2 controls controls 262144 Aug 23 18:40 12506
drwxr-xr-x 2 controls controls  12288 Sep 17 21:54 12528
 

Unrelated to this work: c1auxey was keyed.

Quote:

This meant that no frames were being written since Aug 23, which probably coincides with when the c1lsc frontend crashed. Sad 😢 😭 🙁 .

Attachment 1: RTFEstatus.png
  11854   Mon Dec 7 00:45:28 2015   ericq   Update   CDS   daqd is mad

I glanced at the summary pages and noticed that, since Friday around when we first loaded up the new BLRMS parts, daqd has been crashing very frequently (a few times per hour). 

I'm going to comment out the c1pem lines from the daqd master file for tonight, and see if that helps. 

  11856   Mon Dec 7 10:45:21 2015   ericq   Update   CDS   daqd is mad

Since removing c1pem from the daqd master file, daqd has not crashed. I suppose we're running into the stability issue that motivated us to disable some of the other models (IOPs, RFM, etc.) during the RCG upgrade. 

A question to Jamie: although the new framebuilder prototype still had the same problem with trend writing, can it handle this higher testpoint/DQ channel load?

  11859   Mon Dec 7 11:25:10 2015   jamie   Update   CDS   daqd is mad
Quote:

A question to Jamie: although the new framebuilder prototype still had the same problem with trend writing, can it handle this higher testpoint/DQ channel load?

The new fb1 daqd was also crashing even without the trend writing enabled.  I'm not sure how much that's affected by the load, though, e.g. it might be able to handle the extra load fine but then die because of some other issue not related to the number of channels being acquired.

We should schedule some time this week to work on fb1 some more.

  5486   Tue Sep 20 17:45:30 2011   kiwamu   Update   CDS   daqd is restarting by hisself ?

[Jenne / Kiwamu]

 Fb was sick. Dataviewer and Fourier Tools didn't work for a while.

About 10 minutes later they became healthy again. No idea what exactly was going on.

One thing we found: while fb was sick, it looks like daqd was restarting by itself. Is this normal?

Here are the last lines of restart.log. Apparently daqd was restarting although we didn't command it to do so.

  daqd_start Tue Sep 20 02:41:17 PDT 2011
  daqd_start Tue Sep 20 13:18:12 PDT 2011
  daqd_start Tue Sep 20 17:33:00 PDT 2011
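
To see how often daqd restarts, the `daqd_start` lines can be turned into intervals. This is a minimal sketch that assumes every line has the form shown above and that all stamps share one time zone; the zone token is dropped rather than parsed, and the function names are hypothetical.

```python
from datetime import datetime


def restart_times(log_text):
    """Parse lines like 'daqd_start Tue Sep 20 02:41:17 PDT 2011'."""
    times = []
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) == 7 and parts[0] == "daqd_start":
            # Drop the time-zone token (parts[5]); it is assumed to be the
            # same on every line, so differences are unaffected.
            stamp = " ".join(parts[1:5] + parts[6:])
            times.append(datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"))
    return times


def restart_intervals_s(log_text):
    """Seconds between consecutive restarts."""
    times = restart_times(log_text)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
```

For the three lines quoted above this gives intervals of 38215 s and 15288 s, i.e. roughly 10.6 and 4.2 hours between restarts.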

  1551   Wed May 6 16:56:35 2009   rana, alex, joe   Configuration   Computers   daqd log, cront, etc.
While Alex came over, we investigated the log file problems with DAQD and NDS on FB0. There was a lot of
the standard puzzling and mumbling, but eventually we saw that it doesn't create its log file and so it
doesn't write to it. The log file is /usr/controls/main_daqd.log. The other files called daqd.log.DATE
in the logs/ directory are actually not written to. It's awesome.

We also have put in a fix for the overflowing jobs/ directory. It gets a file written to it every time
you make an NDS request and our seisBLRMS has been overloading it. There's now a cron for it in the fb0
crontab which cleans out week-old files at 6:30 AM every day.

We also changed the time of the daily backup from 3:30 AM (when people are still working) to 5:50 AM
(by which time the seismic has ramped up and interferometerists should be asleep). I didn't like the
idea of a bandwidth hog nailing the framebuilder during the peak of interferometer work.

#
# Script to backup via rsync the most recent 40m minute trends and
# any changes to the /cvs/cds filesystem.
#
50 05 * * * /cvs/cds/caltech/scripts/backup/rsync.backup < /dev/null > /cvs/cds/caltech/scripts/\
backup/rsync.backup.log 2>&1

30 06 * * * find /usr/controls/jobs -mtime +7 -exec /bin/rm -f {} \;
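
For reference, the cleanup done by the `find ... -mtime +7` line can be sketched in Python as below. This is an illustration, not what runs on fb0: the function name is hypothetical, it only looks at files directly inside the directory (find also descends into subdirectories), and it mirrors find's rule that the age is counted in whole 24-hour periods, so a file must be at least 8 full days old to be removed.

```python
import os
import time


def remove_old_files(directory, max_age_days=7, now=None):
    """Delete regular files whose age in whole days exceeds max_age_days."""
    now = time.time() if now is None else now
    removed = []
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue
        mtime = entry.stat(follow_symlinks=False).st_mtime
        # Whole 24-hour periods since last modification, as find -mtime counts.
        age_days = int((now - mtime) // 86400)
        if age_days > max_age_days:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Returning the list of removed names makes it easy to log what each run cleaned out of the jobs directory.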

seisBLRMS.m restarted on mafalda.
  9530   Tue Jan 7 22:44:45 2014   Jenne   Update   CDS   daqd on fb is segfaulting every ~30 seconds

The daqd process is segfaulting and restarting itself every 30 seconds or so.  It's pretty frustrating. 

Just for kicks, I tried an mxstream restart, clearing the testpoints, and restarting the daqd process, but none of these things changed anything.  

Manasa found an elog from a year ago (elog 7105 and preceding), but I'm not sure that it's a similar / related problem.  Jamie, please help us!

Here is a screen dump from the "dtail":

Every 1.0s: dmesg | tail -50                                                                                                                         Tue Jan  7 22:43:23 2014

[   33.498691]  [<ffffffff8104a063>] kthread+0x7a/0x82
[   33.498695]  [<ffffffff81003654>] kernel_thread_helper+0x4/0x10
[   33.498698]  [<ffffffff81049fe9>] ? kthread+0x0/0x82
[   33.498701]  [<ffffffff81003650>] ? kernel_thread_helper+0x0/0x10
[   33.498703] ---[ end trace 6236defa99b3e091 ]---
[   33.498705] mx INFO: Board 0: allocated MSI IRQ 67
[   33.498713] mx INFO: CPU0: PAT = 0x7010600070106
[   33.498715] mx INFO: CPU0: new PAT = 0x1010600070106
[   33.498718] mx INFO: Board 0: Using PAT index 6
[   33.499101] eth0: no IPv6 routers present
[   33.531013] mx INFO: Board 0: device 8, rev 0, 1 ports and 2096896 bytes of SRAM available
[   33.531017] mx INFO: Board 0: Bridge is 10de:005d
[   33.531228] mx INFO: Board 0: MAC address = 00:60:dd:46:ea:ec
[   33.535971] mx INFO: Loaded mcp of len 235448
[   34.489244] mx INFO: Starting usermode mapper at /opt/mx/sbin/mx_start_mapper
[   39.148855] mx INFO: mx0: Link0 is UP
[   39.588511] mx INFO: myri0: Will use skbuf frags (4096 bytes, order=0)
[   39.589299] mx INFO: 1 Myrinet board found and initialized
[  287.706367] daqd used greatest stack depth: 3368 bytes left
[86605.907520] daqd[18407]: segfault at 38b08e4c0 ip 00007f11b3942a6c sp 00007f10b1917d50 error 4
[86605.907530] daqd[18424]: segfault at 38b544f90 ip 00007f11b3942a6c sp 00007f10b12c6d30 error 4 in libc-2.10.1.so[7f11b390e000+14c000] in libc-2.10.1.so[7f11b390e000+14c000]
[86605.907544]
[86605.919454] daqd[21319] general protection ip:7f11b3942a6c sp:7f10b1814d30 error:0
[86605.919462] daqd[18442] general protection ip:7f11b3942a6c sp:7f10b0bf4d30 error:0
[86605.919615] daqd[18443]: segfault at 38aee3db0 ip 00007f11b3942a6c sp 00007f10b0b73d50 error 4 in libc-2.10.1.so[7f11b390e000+14c000]
[86605.919694] daqd[18412]: segfault at 38aff35d0 ip 00007f11b3942a6c sp 00007f10b1752d30 error 4
[86605.919701] daqd[18417]: segfault at 38b544f70 ip 00007f11b3942a6c sp 00007f10b154dd50 error 4 in libc-2.10.1.so[7f11b390e000+14c000]
[86605.919708] daqd[18445]: segfault at 38aff35b0 ip 00007f11b3942a6c sp 00007f10b0ab1d50 error 4
[86605.919733] daqd[18429]: segfault at 38b42ae90 ip 00007f11b3942a6c sp 00007f10b10c1d50 error 4 in libc-2.10.1.so[7f11b390e000+14c000]
[86605.919741] daqd[18440]: segfault at 38b08e480 ip 00007f11b3942a6c sp 00007f10b0cb6d30 error 4 in libc-2.10.1.so[7f11b390e000+14c000]
[86605.958551]  in libc-2.10.1.so[7f11b390e000+14c000] in libc-2.10.1.so[7f11b390e000+14c000]
[86605.958557]
[86605.958577]  in libc-2.10.1.so[7f11b390e000+14c000]
[86605.958586]  in libc-2.10.1.so[7f11b390e000+14c000]
[86605.959639] daqd used greatest stack depth: 3160 bytes left
[98139.100888] show_signal_msg: 13 callbacks suppressed
[98139.100895] daqd[23753]: segfault at 39c7363b0 ip 00007f5bf253ba6c sp 00007f5b69b48d30 error 4 in libc-2.10.1.so[7f5bf2507000+14c000]
[98687.815120] daqd used greatest stack depth: 2984 bytes left
[208995.594227] daqd[10386] general protection ip:7f3b7c930a6c sp:7f3a79f09d50 error:0 in libc-2.10.1.so[7f3b7c8fc000+14c000]
[353015.067479] daqd used greatest stack depth: 2880 bytes left
[367406.863618] daqd[13078]: segfault at 41 ip 0000000000000041 sp 00007fb1f0ba2cf8 error 14 in daqd[400000+7c000]
[367406.863833] daqd[13104] general protection ip:7fb2f3018a6c sp:7fb1f01c8d30 error:0
[367406.863877] daqd[13086] general protection ip:7fb2f3018a6c sp:7fb1f089ad30 error:0
[367406.877408] daqd[13080]: segfault at 41 ip 0000000000000041 sp 00007fb1f0ae0ca8 error 14 in daqd[400000+7c000]
[367406.877435]  in libc-2.10.1.so[7fb2f2fe4000+14c000]
[367406.877442] daqd[13100]: segfault at 39ba287b0 ip 00007fb2f3018a6c sp 00007fb1f034cd30 error 4 in libc-2.10.1.so[7fb2f2fe4000+14c000]
[367406.878372]  in libc-2.10.1.so[7fb2f2fe4000+14c000]
[399802.887523] daqd[18295] general protection ip:7fb056a71a6c sp:7faf96125f10 error:0 in libc-2.10.1.so[7fb056a3d000+14c000]
[410595.969327] daqd[22057]: segfault at 3a91f27b0 ip 00007f48e96eea6c sp 00007f47e6c26d50 error 4 in libc-2.10.1.so[7f48e96ba000+14c000]
[410595.988926] daqd[22068]: segfault at 3a91f2790 ip 00007f48e96eea6c sp 00007f47e681bd30 error 4 in libc-2.10.1.so[7f48e96ba000+14c000]
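
A dump like this is easier to read once the fault lines are pulled out and grouped. The sketch below is one way to do that, assuming the line format seen above (`[time] name[pid]: segfault ...` or `[time] name[pid] general protection ...`); the function names are hypothetical.

```python
import re
from collections import Counter

# [timestamp] process[pid] followed by the fault kind.
FAULT_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+(\S+?)\[(\d+)\]:?\s+(segfault|general protection)"
)


def parse_faults(dmesg_text):
    """Return (timestamp_s, process, pid, kind) for each fault line."""
    events = []
    for line in dmesg_text.splitlines():
        match = FAULT_RE.match(line)
        if match:
            events.append((float(match.group(1)), match.group(2),
                           int(match.group(3)), match.group(4)))
    return events


def faults_by_kind(dmesg_text):
    """Count fault lines per kind."""
    return Counter(kind for _, _, _, kind in parse_faults(dmesg_text))
```

Grouping the resulting timestamps shows that many threads fault within a few milliseconds of each other, so each burst likely corresponds to a single daqd crash rather than many independent ones.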

  7105   Tue Aug 7 15:04:23 2012   Jamie   Update   CDS   daqd problem was root-owned files and directories

Apparently the last problem was because of root-owned frame directories that daqd was trying to write to.  During debugging Alex had run daqd as root, but it's supposed to run as controls.  All the /frame directories are supposed to be owned by controls.  When daqd was run as root, it created new frame directories owned by root, which controls couldn't write to when I restarted daqd the proper way.  Once we chown'd the directories daqd started running again.

Alex also put in a "fix" for the core dump problem.  He touched an empty core file owned by root:

-rw-r--r-- 1 root root 0 Aug  7 14:38 /opt/rtcds/caltech/c1/target/fb/core

This will prevent any dying daqd process owned by controls from dumping its core at that location.  Personally I think this is a horribly hacky "solution" that doesn't actually fix any of the issues that were causing the segfaults to begin with, but it might prevent some of the network slow down we see when the core does dump.  It's mostly just masking the problem, though, so I'm tempted to remove it so we all feel the pain when daqd starts shitting all over the network again.
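
To catch stray root-owned frame directories like the ones described above before daqd trips over them, a walk over the tree comparing owners is enough. This is a minimal sketch with a hypothetical function name; the uid of the controls user would have to be looked up on fb.

```python
import os


def paths_not_owned_by(root, uid):
    """Return directories and files under root whose owner is not uid."""
    foreign = []
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            # lstat reports the owner of a symlink itself, not of its target.
            if os.lstat(path).st_uid != uid:
                foreign.append(path)
    return sorted(foreign)
```

Running it on the frames tree with the controls uid should return an empty list once the chown described above has been done.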

  6106   Mon Dec 12 13:02:08 2011   kiwamu   Update   CDS   daqd restarted

I have restarted the daqd process at 1:01 PM since I have added some new ALS's daq channels.

  7102   Tue Aug 7 14:17:07 2012   Jamie   Update   CDS   daqd running again; related to c1sup issue

So daqd's problem was apparently the bad/non-running c1sup model.  The c1sup model, which I reported on attempting to get running in 7097, was not running because there were no available CPUs on the c1sus FE machine.  This was due to my stupid undercounting of the number of CPUs.  Anyway, for reasons I don't understand, this was causing daqd to segfault.  Removing c1sup from c1sus "fixed" the problem.

Alex agreed that daqd should definitely not be segfaulting in this circumstance.  It's still unclear exactly what daqd was looking at that was causing it to crash.

I'm going to move c1sup to c1iscex, which has a lot of spare CPUs.

  7096   Mon Aug 6 20:22:50 2012   Jamie   Update   CDS   daqd segfaulting after five minutes

I tried running daqd manually, and sure enough it segfaults after about five minutes (see log below).  I've commented it out of /etc/inittab on fb and I'm leaving it off for now until we can figure out what's going on.

controls@fb /opt/rtcds/caltech/c1/target/fb 0$ /opt/rtcds/caltech/c1/target/fb/daqd -c /opt/rtcds/caltech/c1/target/fb/daqdrc
263596
MX endpoint opened
startup file interpreter thread tid=139790943115536
calling yyparse(5, 6)
[Mon Aug  6 20:15:27 2012] ->5: #set avoid_reconnect
[Mon Aug  6 20:15:27 2012] ->5: set thread_stack_size=102400
[Mon Aug  6 20:15:27 2012] new threads will be created with the stack of size 102400K
[Mon Aug  6 20:15:27 2012] ->5: set allow_tpman_connect_fail
[Mon Aug  6 20:15:27 2012] ->5: #set dcu_status_check=5
[Mon Aug  6 20:15:27 2012] ->5: #set symm_gps_offset=-1
[Mon Aug  6 20:15:27 2012] ->5: #set symm_gps_offset=31535998
[Mon Aug  6 20:15:27 2012] ->5: ##set symm_gps_offset=347155213
[Mon Aug  6 20:15:27 2012] ->5: #set symm_gps_offset=378691215
[Mon Aug  6 20:15:27 2012] ->5: #set symm_gps_offset=378691212
[Mon Aug  6 20:15:27 2012] ->5: #set symm_gps_offset=315964799
[Mon Aug  6 20:15:27 2012] ->5: set symm_gps_offset=315964801
[Mon Aug  6 20:15:27 2012] ->5: set debug=0
[Mon Aug  6 20:15:27 2012] ->5: set log=2
[Mon Aug  6 20:15:27 2012] ->5: set zero_bad_data=0
[Mon Aug  6 20:15:27 2012] ->5: set dcu_status_check=9
[Mon Aug  6 20:15:27 2012] ->5: set controller_dcu=33
[Mon Aug  6 20:15:27 2012] ->5: set master_config="/opt/rtcds/caltech/c1/target/fb/master"
[Mon Aug  6 20:15:30 2012] finished configuring data channels
[Mon Aug  6 20:15:30 2012] ->5: configure channels begin end
GDS server NODE=19 HOST=c1iscex DCUID=19
GDS server NODE=20 HOST=c1sus DCUID=20
GDS server NODE=21 HOST=c1sus DCUID=21
GDS server NODE=22 HOST=c1lsc DCUID=22
GDS server NODE=25 HOST=c1iscex DCUID=61
GDS server NODE=28 HOST=c1ioo DCUID=28
GDS server NODE=33 HOST=c1ioo DCUID=33
GDS server NODE=34 HOST=c1ioo DCUID=34
GDS server NODE=36 HOST=c1sus DCUID=36
GDS server NODE=38 HOST=c1sus DCUID=38
GDS server NODE=39 HOST=c1sus DCUID=39
GDS server NODE=40 HOST=c1lsc DCUID=40
GDS server NODE=42 HOST=c1lsc DCUID=42
GDS server NODE=45 HOST=c1iscex DCUID=45
GDS server NODE=46 HOST=c1iscey DCUID=46
GDS server NODE=47 HOST=c1iscey DCUID=47
GDS server NODE=48 HOST=c1lsc DCUID=48
GDS server NODE=50 HOST=c1lsc DCUID=50
GDS server NODE=51 HOST=c1ioo DCUID=51
GDS server NODE=60 HOST=c1lsc DCUID=60
GDS server NODE=61 HOST=c1iscex DCUID=61
GDS server NODE=62 HOST=c1sus DCUID=62
Unable to find GDS node 90 system c1x00 in INI files
GDS server NODE=91 HOST=c1lsc DCUID=60
Unable to find GDS node 92 system c1tst2 in INI files
Unable to find GDS node 95 system c1x10 in INI files
TP: node = 19, host = c1iscex, dup = 0, prog = 0x31002013, vers = 1
Initialized TP interface node=19, host=c1iscex
TP: node = 20, host = c1sus, dup = 0, prog = 0x31002014, vers = 1
Initialized TP interface node=20, host=c1sus
TP: node = 21, host = c1sus, dup = 0, prog = 0x31002015, vers = 1
Initialized TP interface node=21, host=c1sus
TP: node = 22, host = c1lsc, dup = 0, prog = 0x31002016, vers = 1
Initialized TP interface node=22, host=c1lsc
TP: node = 25, host = c1iscex, dup = 0, prog = 0x31002019, vers = 1
Initialized TP interface node=25, host=c1iscex
TP: node = 28, host = c1ioo, dup = 0, prog = 0x3100201c, vers = 1
Initialized TP interface node=28, host=c1ioo
TP: node = 33, host = c1ioo, dup = 0, prog = 0x31002021, vers = 1
Initialized TP interface node=33, host=c1ioo
TP: node = 34, host = c1ioo, dup = 0, prog = 0x31002022, vers = 1
Initialized TP interface node=34, host=c1ioo
TP: node = 36, host = c1sus, dup = 0, prog = 0x31002024, vers = 1
Initialized TP interface node=36, host=c1sus
TP: node = 38, host = c1sus, dup = 0, prog = 0x31002026, vers = 1
Initialized TP interface node=38, host=c1sus
TP: node = 39, host = c1sus, dup = 0, prog = 0x31002027, vers = 1
Initialized TP interface node=39, host=c1sus
TP: node = 40, host = c1lsc, dup = 0, prog = 0x31002028, vers = 1
Initialized TP interface node=40, host=c1lsc
TP: node = 42, host = c1lsc, dup = 0, prog = 0x3100202a, vers = 1
Initialized TP interface node=42, host=c1lsc
TP: node = 45, host = c1iscex, dup = 0, prog = 0x3100202d, vers = 1
Initialized TP interface node=45, host=c1iscex
TP: node = 46, host = c1iscey, dup = 0, prog = 0x3100202e, vers = 1
Initialized TP interface node=46, host=c1iscey
TP: node = 47, host = c1iscey, dup = 0, prog = 0x3100202f, vers = 1
Initialized TP interface node=47, host=c1iscey
TP: node = 48, host = c1lsc, dup = 0, prog = 0x31002030, vers = 1
Initialized TP interface node=48, host=c1lsc
TP: node = 50, host = c1lsc, dup = 0, prog = 0x31002032, vers = 1
Initialized TP interface node=50, host=c1lsc
TP: node = 51, host = c1ioo, dup = 0, prog = 0x31002033, vers = 1
Initialized TP interface node=51, host=c1ioo
TP: node = 60, host = c1lsc, dup = 0, prog = 0x3100203c, vers = 1
Initialized TP interface node=60, host=c1lsc
TP: node = 61, host = c1iscex, dup = 0, prog = 0x3100203d, vers = 1
Initialized TP interface node=61, host=c1iscex
TP: node = 62, host = c1sus, dup = 0, prog = 0x3100203e, vers = 1
Initialized TP interface node=62, host=c1sus
TP: node = 91, host = c1lsc, dup = 0, prog = 0x3100205b, vers = 1
Initialized TP interface node=91, host=c1lsc
[Mon Aug  6 20:15:30 2012] ->5: tpconfig "/opt/rtcds/caltech/c1/target/gds/param/testpoint.par"
[Mon Aug  6 20:15:30 2012] ->5: set gps_leaps = 820108813
[Mon Aug  6 20:15:30 2012] ->5: set detector_name="CIT"
[Mon Aug  6 20:15:30 2012] ->5: set detector_prefix="C1"
[Mon Aug  6 20:15:30 2012] ->5: set detector_longitude=-90.7742403889
[Mon Aug  6 20:15:30 2012] ->5: set detector_latitude=30.5628943337
[Mon Aug  6 20:15:30 2012] ->5: set detector_elevation=.0
[Mon Aug  6 20:15:30 2012] ->5: set detector_azimuths=1.1,4.7123889804
[Mon Aug  6 20:15:30 2012] ->5: set detector_altitudes=1.0,2.0
[Mon Aug  6 20:15:30 2012] ->5: set detector_midpoints=2000.0, 2000.0
[Mon Aug  6 20:15:30 2012] ->5: set num_dirs = 10
[Mon Aug  6 20:15:30 2012] ->5: set frames_per_dir=225
[Mon Aug  6 20:15:30 2012] ->5: set full_frames_per_file=1
[Mon Aug  6 20:15:30 2012] ->5: set full_frames_blocks_per_frame=16
[Mon Aug  6 20:15:30 2012] ->5: set frame_dir="/frames/full", "C-R-", ".gwf"
[Mon Aug  6 20:15:30 2012] ->5: set trend_num_dirs=10
[Mon Aug  6 20:15:30 2012] ->5: set trend_frames_per_dir=1440
[Mon Aug  6 20:15:30 2012] ->5: set trend_frame_dir= "/frames/trend/second", "C-T-", ".gwf"
[Mon Aug  6 20:15:30 2012] ->5: set raw-minute-trend-dir="/frames/trend/minute_raw"
[Mon Aug  6 20:15:30 2012] ->5: set nds-jobs-dir="/opt/rtcds/caltech/c1/target/fb"
[Mon Aug  6 20:15:30 2012] ->5: set minute-trend-num-dirs=10
[Mon Aug  6 20:15:30 2012] ->5: set minute-trend-frames-per-dir=24
[Mon Aug  6 20:15:30 2012] ->5: set minute-trend-frame-dir="/frames/trend/minute", "C-M-", ".gwf"
[Mon Aug  6 20:15:30 2012] ->5: start main 10
Allocated move buffer size 11616356 bytes
[Mon Aug  6 20:15:32 2012] main started
[Mon Aug  6 20:15:32 2012] ->5: start profiler
[Mon Aug  6 20:15:32 2012] ->5: # comment out this block to stop saving data
[Mon Aug  6 20:15:32 2012] frame saver started
[Mon Aug  6 20:15:32 2012] ->5: start frame-saver
[Mon Aug  6 20:15:33 2012] ->5: sync frame-saver
[Mon Aug  6 20:15:33 2012] ->5: start trender
[Mon Aug  6 20:15:33 2012] trender started
[Mon Aug  6 20:15:33 2012] trend frame saver started
[Mon Aug  6 20:15:33 2012] ->5: start trend-frame-saver
[Mon Aug  6 20:15:34 2012] ->5: sync trend-frame-saver
[Mon Aug  6 20:15:34 2012] minute trend frame saver started
[Mon Aug  6 20:15:34 2012] ->5: start minute-trend-frame-saver
[Mon Aug  6 20:15:34 2012] Done creating ADC structures
[Mon Aug  6 20:15:35 2012] ->5: sync minute-trend-frame-saver
[Mon Aug  6 20:15:35 2012] raw minute trend frame saver started
[Mon Aug  6 20:15:35 2012] ->5: start raw_minute_trend_saver
[Mon Aug  6 20:15:35 2012] ->5: #frame-writer "225.225.225.1" broadcast="131.215.113.0" all
[Mon Aug  6 20:15:35 2012] ->5: #sleep 5
[Mon Aug  6 20:15:35 2012] producer started
[Mon Aug  6 20:15:35 2012] ->5: start producer
[Mon Aug  6 20:15:35 2012] ->5: start epics dcu
[Mon Aug  6 20:15:35 2012] MX receiver thread started
[Mon Aug  6 20:15:35 2012] edcu started
[Mon Aug  6 20:15:35 2012] ->5: start epics server "C0:DAQ-DC0_" "C1:DAQ-DC0_"
[Mon Aug  6 20:15:35 2012] epics server started
[Mon Aug  6 20:15:35 2012] ->5: start listener 8087
[Mon Aug  6 20:15:35 2012] ->5: start listener 8088 1
[Mon Aug  6 20:15:35 2012] ->5: sleep 60
Creating C1:DAQ-DC0_PEM_SLOW_STATUS
Creating C1:DAQ-DC0_PEM_SLOW_CRC_CPS
Creating C1:DAQ-DC0_PEM_SLOW_CRC_SUM
Creating C1:DAQ-DC0_C1X01_STATUS
Creating C1:DAQ-DC0_C1X01_CRC_CPS
Creating C1:DAQ-DC0_C1X01_CRC_SUM
Creating C1:DAQ-DC0_C1X02_STATUS
Creating C1:DAQ-DC0_C1X02_CRC_CPS
Creating C1:DAQ-DC0_C1X02_CRC_SUM
Creating C1:DAQ-DC0_C1SUS_STATUS
Creating C1:DAQ-DC0_C1SUS_CRC_CPS
Creating C1:DAQ-DC0_C1SUS_CRC_SUM
Creating C1:DAQ-DC0_C1OAF_STATUS
Creating C1:DAQ-DC0_C1OAF_CRC_CPS
Creating C1:DAQ-DC0_C1OAF_CRC_SUM
Creating C1:DAQ-DC0_C1ALS_STATUS
Creating C1:DAQ-DC0_C1ALS_CRC_CPS
Creating C1:DAQ-DC0_C1ALS_CRC_SUM
Creating C1:DAQ-DC0_C1X03_STATUS
Creating C1:DAQ-DC0_C1X03_CRC_CPS
Creating C1:DAQ-DC0_C1X03_CRC_SUM
Creating C1:DAQ-DC0_C1IOO_STATUS
Creating C1:DAQ-DC0_C1IOO_CRC_CPS
Creating C1:DAQ-DC0_C1IOO_CRC_SUM
Creating C1:DAQ-DC0_C1MCS_STATUS
Creating C1:DAQ-DC0_C1MCS_CRC_CPS
Creating C1:DAQ-DC0_C1MCS_CRC_SUM
Creating C1:DAQ-DC0_C1RFM_STATUS
Creating C1:DAQ-DC0_C1RFM_CRC_CPS
Creating C1:DAQ-DC0_C1RFM_CRC_SUM
Creating C1:DAQ-DC0_C1PEM_STATUS
Creating C1:DAQ-DC0_C1PEM_CRC_CPS
Creating C1:DAQ-DC0_C1PEM_CRC_SUM
Creating C1:DAQ-DC0_C1X04_STATUS
Creating C1:DAQ-DC0_C1X04_CRC_CPS
Creating C1:DAQ-DC0_C1X04_CRC_SUM
Creating C1:DAQ-DC0_C1LSC_STATUS
Creating C1:DAQ-DC0_C1LSC_CRC_CPS
Creating C1:DAQ-DC0_C1LSC_CRC_SUM
Creating C1:DAQ-DC0_C1SCX_STATUS
Creating C1:DAQ-DC0_C1SCX_CRC_CPS
Creating C1:DAQ-DC0_C1SCX_CRC_SUM
Creating C1:DAQ-DC0_C1X05_STATUS
Creating C1:DAQ-DC0_C1X05_CRC_CPS
Creating C1:DAQ-DC0_C1X05_CRC_SUM
Creating C1:DAQ-DC0_C1SCY_STATUS
Creating C1:DAQ-DC0_C1SCY_CRC_CPS
Creating C1:DAQ-DC0_C1SCY_CRC_SUM
Creating C1:DAQ-DC0_C1ASS_STATUS
Creating C1:DAQ-DC0_C1ASS_CRC_CPS
Creating C1:DAQ-DC0_C1ASS_CRC_SUM
Creating C1:DAQ-DC0_C1CAL_STATUS
Creating C1:DAQ-DC0_C1CAL_CRC_CPS
Creating C1:DAQ-DC0_C1CAL_CRC_SUM
Creating C1:DAQ-DC0_C1MCC_STATUS
Creating C1:DAQ-DC0_C1MCC_CRC_CPS
Creating C1:DAQ-DC0_C1MCC_CRC_SUM
Creating C1:DAQ-DC0_C1MCP_STATUS
Creating C1:DAQ-DC0_C1MCP_CRC_CPS
Creating C1:DAQ-DC0_C1MCP_CRC_SUM
Creating C1:DAQ-DC0_C1LSP_STATUS
Creating C1:DAQ-DC0_C1LSP_CRC_CPS
Creating C1:DAQ-DC0_C1LSP_CRC_SUM
Creating C1:DAQ-DC0_C1SPX_STATUS
Creating C1:DAQ-DC0_C1SPX_CRC_CPS
Creating C1:DAQ-DC0_C1SPX_CRC_SUM
Creating C1:DAQ-DC0_C1SUP_STATUS
Creating C1:DAQ-DC0_C1SUP_CRC_CPS
Creating C1:DAQ-DC0_C1SUP_CRC_SUM
[Mon Aug  6 20:15:35 2012] Epics server started
[Mon Aug  6 20:15:35 2012] EDCU has 2553 channels configured; first=0

Symmetricom status: LOCKED
Starting at gps 1028344552 prev_gps 1028344552 frac 312500000 f 314094022
[Mon Aug  6 20:15:38 2012] Minute trender made GPS time correction; gps=1028344552; gps%60=52
Segmentation fault (core dumped)
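
A minimal sketch for triaging a startup log like the one above: it pulls out the "GDS server" lines whose NODE and DCUID differ, and the "Unable to find GDS node" lines. The function name `scan_daqd_log` and the sample text are illustrative assumptions, and nothing here implies that these lines are what caused the segfault.

```python
import re

# Patterns match the line shapes seen in the daqd startup log above.
GDS_RE = re.compile(r"GDS server NODE=(\d+) HOST=(\S+) DCUID=(\d+)")
MISSING_RE = re.compile(r"Unable to find GDS node (\d+) system (\S+) in INI files")


def scan_daqd_log(text):
    """Return (mismatched, missing) lists extracted from daqd log text.

    mismatched: (node, host, dcuid) where NODE and DCUID differ
    missing:    (node, system) for nodes not found in the INI files
    """
    mismatched = []
    missing = []
    for line in text.splitlines():
        m = GDS_RE.search(line)
        if m:
            node, host, dcuid = int(m.group(1)), m.group(2), int(m.group(3))
            if node != dcuid:
                mismatched.append((node, host, dcuid))
            continue
        m = MISSING_RE.search(line)
        if m:
            missing.append((int(m.group(1)), m.group(2)))
    return mismatched, missing


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
GDS server NODE=22 HOST=c1lsc DCUID=22
GDS server NODE=25 HOST=c1iscex DCUID=61
Unable to find GDS node 90 system c1x00 in INI files
GDS server NODE=91 HOST=c1lsc DCUID=60
Unable to find GDS node 92 system c1tst2 in INI files
"""

mismatched, missing = scan_daqd_log(SAMPLE)
print(mismatched)
print(missing)
```

Whether a NODE/DCUID mismatch is actually a problem for daqd is not established by the log; the scan only makes such lines easy to spot.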

  13145   Wed Jul 26 19:13:07 2017 JamieUpdateCDSdaqd showing same instability as before

I recompiled daqd on the updated fb1, similar to how I had before, and we're seeing the same instability: the process crashes when it tries to write out the second trend (technically it looks like it crashes while it's trying to write out the full frame while the second trend is also being written out).  Jonathan Hanks and I are actively looking into it and I'll provide a further report soon.

  13176   Wed Aug 9 12:05:57 2017 ranaUpdateElectronicsdata archiving

This kind of data fitting and analysis is really useful. We should figure out a way to archive it. Perhaps the data files and fitting stuff can be put into GIT in some smart way? The fit results can be added to the 40m MC electronics DCC tree. Then the links can be added to this elog.

  14723   Wed Jul 3 23:53:38 2019 MilindUpdateCamerasdata for nns

Tried collecting data today. Was unable to keep the camera_server code running for any length of time as it threw segfaults. Will take a shot again tomorrow.

Quote:

The GigE is focused now (judged by eye) and I have closed the lid. I'm attaching a picture of the MC2 beam spot, captured using GigE at an exposure time of 400µs.

What was the solution to resolving the flaky video streaming during the alignment process????

-> I think the issue was with either the poor wireless network connection or the GigE-PoE ethernet cable.

Quote:

Turns out, focusing the GigE is actually a bit tricky. With pylon, every time I change the exposure or the focus, I run into the error I had mentioned earlier in one of my elogs; so I tried using the python scripts to interact with the GigE. But whenever I try to change the focal plane distance by rotating the lens coupler, the ethernet cable connection becomes loose and the camera server needs to be relaunched every now and then. Also, every time we want to change the distance between the lenses, the telescope needs to be dismantled and refocused again. I'll try to come up with a better telescope design for this.

Yesterday, I focused the GigE using a low exposure time and a small iris aperture, to make sure that we are actually seeing a sharp image of the beam spot. I'm attaching a picture of the beam spot that I took while focusing it; unfortunately, I forgot to take a picture after I had focused it completely. I'm also attaching a picture of the final setup for future reference.


Yesterday night, Rana asked me to lock the MC2. I figured that the PSL shutter was closed; I just opened it and was able to see the beam spot on the analog camera screen.

  5184   Thu Aug 11 08:29:28 2011 steveUpdateComputersdataviewer at Rosalba

I'm having this problem with DTV every morning at Rosalba only. It wants to start with a negative GPS time and it can not connect to the frame builder.

Normally, after a few attempts at starting it, it works.

Attachment 1: gpsfmb.png
  4881   Fri Jun 24 22:35:23 2011 ranaConfigurationCDSdataviewer broken on pianosa

When I try to get minute trend, it says "word too long".

  13165   Thu Aug 3 20:15:11 2017 JamieUpdateCDSdataviewer can not raise test points

For some reason dataviewer is not able to raise test points with the new daqd setup, even though dtt can.  If you raise a test point with dtt then dataviewer can show the data fine.

It's unclear to me why this would be the case.  It might be that all the versions of dataviewer on the workstations are too old??  I'll look into it tomorrow to see if I can figure out what's going on.

  5895   Tue Nov 15 15:16:04 2011 kiwamuUpdateCDSdataviewer doesn't run

Dataviewer is somehow not able to access fb.

I restarted daqd on fb but it didn't help.

Also the status screen is showing a blank white form for all the realtime models. Something bad is happening.

blank.png

JAMIEEEE !!!!

  5896   Tue Nov 15 15:56:23 2011 jamieUpdateCDSdataviewer doesn't run

Quote:

Dataviewer is somehow not able to access fb.

I restarted daqd on fb but it didn't help.

Also the status screen is showing a blank white form for all the realtime models. Something bad is happening.

 So something very strange was happening to the framebuilder (fb).  I logged on the fb and found this being spewed to the logs once a second:

[Tue Nov 15 15:28:51 2011] going down on signal 11
sh: /bin/gcore: No such file or directory
[Tue Nov 15 15:28:51 2011] going down on signal 11
sh: /bin/gcore: No such file or directory
[Tue Nov 15 15:28:51 2011] going down on signal 11
sh: /bin/gcore: No such file or directory
[Tue Nov 15 15:28:51 2011] going down on signal 11
sh: /bin/gcore: No such file or directory
[Tue Nov 15 15:28:51 2011] going down on signal 11
sh: /bin/gcore: No such file or directory
[Tue Nov 15 15:28:51 2011] going down on signal 11
sh: /bin/gcore: No such file or directory
[Tue Nov 15 15:28:51 2011] going down on signal 11
sh: /bin/gcore: No such file or directory
[Tue Nov 15 15:28:51 2011] going down on signal 11
sh: /bin/gcore: No such file or directory
[Tue Nov 15 15:28:51 2011] going down on signal 11
sh: /bin/gcore: No such file or directory

Apparently some daqd subprocess or thread was trying to call /bin/gcore, and failing since that file doesn't exist.  This apparently started at around 5:52 AM this morning:

[Tue Nov 15 05:46:52 2011] main profiler warning: 1 empty blocks in the buffer
[Tue Nov 15 05:46:53 2011] main profiler warning: 0 empty blocks in the buffer
[Tue Nov 15 05:46:54 2011] main profiler warning: 0 empty blocks in the buffer
[Tue Nov 15 05:46:55 2011] main profiler warning: 0 empty blocks in the buffer
[Tue Nov 15 05:46:56 2011] main profiler warning: 0 empty blocks in the buffer
...
[Tue Nov 15 05:52:43 2011] main profiler warning: 0 empty blocks in the buffer
[Tue Nov 15 05:52:44 2011] main profiler warning: 0 empty blocks in the buffer
[Tue Nov 15 05:52:45 2011] main profiler warning: 0 empty blocks in the buffer
GPS time jumped from 1005400026 to 1005400379
[Tue Nov 15 05:52:46 2011] going down on signal 11
sh: /bin/gcore: No such file or directory
[Tue Nov 15 05:52:46 2011] going down on signal 11
sh: /bin/gcore: No such file or directory

The gcore I believe it's looking for is a debugging tool that can dump core images of running processes.  I'm guessing that something caused something in the fb to eat crap, and it was stuck trying to debug itself.  I can't tell what exactly happened, though.  I'll ping the CDS guys about it.  The daqd process was continuing to run, but it was not responding to anything, which is why it could not be restarted via the normal means, and maybe why the various FB0_*_STATUS channels were seemingly dead.

I manually killed the daqd process, and monit seemed to bring up a new process with no problem.  I'll keep an eye on it.
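
For reference, the size of the GPS jump reported just before the first "signal 11" line can be pulled straight out of the log text. This is only a sketch: the helper names `gps_jumps` and `count_signal` are made up for illustration, and the sample is a few lines copied from the excerpt above.

```python
import re

JUMP_RE = re.compile(r"GPS time jumped from (\d+) to (\d+)")


def gps_jumps(text):
    """Return (start, end, delta_seconds) for each reported GPS jump."""
    return [(int(a), int(b), int(b) - int(a)) for a, b in JUMP_RE.findall(text)]


def count_signal(text, signum=11):
    """Count log lines reporting that the process is going down on a signal."""
    needle = f"going down on signal {signum}"
    return sum(1 for line in text.splitlines() if needle in line)


# Lines copied from the fb log excerpt above.
SAMPLE = """\
[Tue Nov 15 05:52:45 2011] main profiler warning: 0 empty blocks in the buffer
GPS time jumped from 1005400026 to 1005400379
[Tue Nov 15 05:52:46 2011] going down on signal 11
sh: /bin/gcore: No such file or directory
[Tue Nov 15 05:52:46 2011] going down on signal 11
sh: /bin/gcore: No such file or directory
"""

jumps = gps_jumps(SAMPLE)
n_sig11 = count_signal(SAMPLE)
print(jumps)    # the single jump spans 353 seconds
print(n_sig11)
```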

  7758   Wed Nov 28 21:42:21 2012 ranaFrogsComputer Scripts / Programsdataviewer font error

An error this evening on rossa: dataviewer not working due to some font errors:

controls@rossa:~ 0$ dataviewer
Connecting.... done
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning:
    Name: FilterText
    Class: XmTextField
    Character '\52' not supported in font.  Discarded.

Warning:
    Name: FilterText
    Class: XmTextField
    Character '\56' not supported in font.  Discarded.

Warning:
    Name: FilterText
    Class: XmTextField
    Character '\170' not supported in font.  Discarded.

Warning:

etc.............

  13181   Thu Aug 10 09:10:55 2017 steveUpdateGeneraldataviewer is recovering

It can look back at 7 days of trends now. There are still no vacuum channels. I can bring back the channels through the restore directory, but there is no data.

Attachment 1: 7dm.png
  5049   Wed Jul 27 15:49:13 2011 jamieConfigurationCDSdataviewer now working on pianosa

Not exactly sure what the problem was, but I updated to the head of the SVN and rebuilt and it seems to be working fine now.

  1652   Thu Jun 4 16:54:19 2009 peteUpdateLockingdaytime DD handoff

I played with the DD handoff during the day.  The DRM dark port was flickering like a candle flame in Dracula's castle.  The demod offsets for the handoff signals looked fine.  After MICH handoff, the MICH_CTRL started to get unstable at some low frequency, maybe 3 Hz (I didn't measure).  So I increased the MICH gain from 0.1 to 0.17 and it settled down.  PRC and SRC went fine.  Then the DD_handoff script raised the MICH gain to 0.7, and an instability started to grow in MICH_CTRL (at some higher frequency).  I decreased the MICH gain from 0.7 to 0.5, and it settled down and stayed stable.

  1658   Fri Jun 5 17:22:55 2009 peteUpdateLockingdaytime locking

After fixing the tp problem, I tried locking again.  Grabbing and DD handoff, no problem.  Died earlier than last night, handing off CARM to REFL_DC, around arm power of 4 or so.  Seems to happen after turning off the moving zero, Rob says it might be touchy in daytime.

  2092   Wed Oct 14 16:59:37 2009 robUpdateLockingdaytime locking

The IFO can now be locked during the daytime.  Well, it's locked now.

  2093   Wed Oct 14 23:02:41 2009 ranaUpdateLockingdaytime locking

This is huge.    Five hours of lock only interrupted by intentional break from transfer function abuse.

Attachment 1: a.png
  4606   Tue May 3 05:32:04 2011 kiwamuUpdateLSCdaytime tasks

Daytime tasks :

 - PRM & BS oplev (Steve)

 - LSC binary outputs (Joe/Jamie)

 - installation of the REFL55 RFPD (Suresh/Jamie)

 - Adjustment of demodulation phases (Kiwamu)

 - Bounce-Roll filters on BS and PRM (Suresh/Joe)

 - Suspension diagnostic using the free-swinging spectra (Leo)

 - PMC alignment (Jenne/Koji)

  4607   Tue May 3 10:21:25 2011 KojiUpdateLSCdaytime tasks

I think the installation of the PD DC signals is quite important. What to do:
1) Connect the DC signals to the right top whitening board (be aware that there may be the modification of the whitening circuit).
2) Reconfigure the LSC model such that the DC signal is passed to the right channels (modify the left top part of the model)

Quote:

Daytime tasks :

 - PRM & BS oplev (Steve)

 - LSC binary outputs (Joe/Jamie)

 - installation of the REFL55 RFPD (Suresh/Jamie)

 - Adjustment of demodulation phases (Kiwamu)

 - Bounce-Roll filters on BS and PRM (Suresh/Joe)

 - Suspension diagnostic using the free-swinging spectra (Leo)

 - PMC alignment (Jenne/Koji)

 

  4610   Tue May 3 11:49:03 2011 KojiUpdateLSCdaytime tasks

Done. C1:PSL-PMC_PMCTRANSPD was improved from ~0.75 to 0.87.

Quote:

- PMC alignment (Jenne/Koji)

 

  6412   Wed Mar 14 05:26:39 2012 interferomter tack forceUpdateGeneraldaytime tasks

The following tasks need to be done in the daytime tomorrow.

  • Hook up the DC output of the Y green BBPD on the PSL table to an ADC channel (Jamie / Steve)
  • Install fancy suspension matrices on PRM and ITMX [#6365] (Jenne)
  • Check if the REFL165 RFPD is healthy or not (Suresh / Koji)
    • According to a simulation the REFL165 demod signal should show similar amount of the signal to that of REFL33.
    • But right now it is showing super tiny signals [#6403]
  6416   Wed Mar 14 14:09:01 2012 interferomter tack forceUpdateGeneraldaytime tasks

Quote:

The following tasks need to be done in the daytime tomorrow.

  • Hook up the DC output of the Y green BBPD on the PSL table to an ADC channel (Jamie / Steve)
  • Install fancy suspension matrices on PRM and ITMX [#6365] (Jenne)
  • Check if the REFL165 RFPD is healthy or not (Suresh / Koji)
    • According to a simulation the REFL165 demod signal should show similar amount of the signal to that of REFL33.
    • But right now it is showing super tiny signals [#6403]

 For ITMX, I used the values from the conlog:

2011/08/12,20:10:12 utc 'C1:SUS[-_]ITMX[-_]INMATRIX'
These are the latest values in the conlog that aren't the basic matrices.  Even though we did a round of diagonalization in Sept, and the matrices are saved in a .mat file, it doesn't look like we used the ITMX matrix from that time.

For PRM, I used the matrices that were saved in InputMatricies_16Sept2011.mat, in the peakFit folder, since I couldn't find anything in the Conlog other than the basic matrices.

 

UPDATE:  I didn't actually count the number of oscillations until the optics were damped, so I don't have an actual number for the Q, but I feel good about the damping, after having kicked POS of both ITMX and PRM and watching the sensors.

  4142   Wed Jan 12 02:41:19 2011 kiwamuUpdateGeneraldaytime tasks for tomorrow

[Rana, Kiwamu]

Here is the list for the daytime tasks of tomorrow, Jan. 12th.

A daytime task is work that should basically be finished or abandoned before the sun goes down.

Along with the tasks, we roughly assigned the people who are responsible for them.

The tasks below are basically separated from each other, so we can work in parallel.

 

 

--- 1.  LSC analog interface check (Joe/Koji)

   * check whitening filter

   * demodulation board check

    * check ADC connections

 

--- 2.  MC2 coil Dewhitening (Joe)

    * fix binary outputs.

    * FM9 must be the trigger of the binary outputs instead of FM10.

 

--- 3. 11MHz modulation depth (Kiwamu)

   * investigate why the depth is so low

 

--- 4. PEM DAQ name issue (Jenne)

   * change the name of seismic channels properly so that we can deal with the calibrated stuff

 

--- 5. phase adjustment for MC-PDH locking (Suresh)

 

--- 6. medm screens for C1LSC ()

    * make screens

 

 

 

  14492   Thu Mar 21 18:09:36 2019 KojiUpdateCDSdb file preparation for acromag c1susaux

I have updated the google doc spreadsheet to indicate the required action for the new dbfile generation.

There are three types of actions:

1. COPY - Just duplicate the old EPICS db entry. This is for soft channels, calc channels.
2. DELETE - Delete the entry for some physical channels that will not be implemented on Acromag (oplev, dewhitening mon, AI monitor, etc)
3. REPLACE - For the physical channels, we want to replace the port names.

The blue part of the spreadsheet indicates the action for each channel. If it is a physical channel, the assigned module and the channel are indicated there. What we still want to do is to use this information to generate the port name, which looks like "@asynMask(C1VAC_XT1221A_ADC 1 -16)MODBUS_DATA".
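
A rough sketch of how the three actions and the port-name generation could be scripted. Everything here is an assumption for illustration: the function names `asyn_mask_link` and `apply_action` are hypothetical, the third field (-16) and the trailing MODBUS_DATA are simply copied from the example string above, and the real spreadsheet columns may differ.

```python
def asyn_mask_link(port, address, nbits=-16, drv_info="MODBUS_DATA"):
    """Format a link string in the same shape as the example in the entry.

    The defaults for the third field and the trailing name are copied from
    the example string; they are not checked against any driver docs here.
    """
    return f"@asynMask({port} {address} {nbits}){drv_info}"


def apply_action(action, old_link, port=None, address=None):
    """Return the new link for one spreadsheet row, or None if the record is dropped."""
    if action == "COPY":      # soft/calc channel: keep the old entry as-is
        return old_link
    if action == "DELETE":    # physical channel not implemented on Acromag
        return None
    if action == "REPLACE":   # physical channel: swap in the new port name
        return asyn_mask_link(port, address)
    raise ValueError(f"unknown action: {action!r}")


# The module name and channel number are taken from the example string above.
link = apply_action("REPLACE", "OLD_LINK", port="C1VAC_XT1221A_ADC", address=1)
print(link)
```

Keeping the formatting in one small function means the link syntax only has to be corrected in one place if the assumed field layout turns out to be wrong.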

The links to the spreadsheets can be found on 40m wiki: https://wiki-40m.ligo.caltech.edu/CDS/SlowControls/c1susaux

Attachment 1: Screen_Shot_2019-03-21_at_18.06.53.png
  14462   Fri Feb 15 21:15:42 2019 gautamUpdateVACdd backup of c1vac made
  1. Connected one of the solid-state drives to c1vac. It was /dev/sdb.
  2. Formatted the drive using sudo mkfs -t ext4 /dev/sdb
  3.  Mounted it as /mnt/backup using sudo mount /dev/sdb /mnt/backup
  4. Started a tmux session for the dd, called DDbackup
  5. Started the dd backup using  sudo dd if=/dev/sda of=/dev/sdb bs=64K conv=noerror,sync
  6. Backup completed in 719 seconds: need to test if it works...
controls@c1vac:~$ sudo dd if=/dev/sda of=/dev/sdb bs=64K conv=noerror,sync
[sudo] password for controls: 
^C283422+0 records in
283422+0 records out
18574344192 bytes (19 GB) copied, 719.699 s, 25.8 MB/s
Quote:
 
  • Generate a bootable backup hard drive for c1vac, which could be swapped in on a short time scale after a failure.
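
As a quick consistency check on the dd summary above, the byte count and rate can be recomputed from the record count, the bs=64K block size, and the elapsed time. This is plain arithmetic on the numbers printed by dd; the variable names are only illustrative.

```python
BLOCK_SIZE = 64 * 1024   # bs=64K, in bytes
records = 283422         # "283422+0 records out"
elapsed_s = 719.699      # seconds reported by dd

total_bytes = records * BLOCK_SIZE
rate_mb_s = total_bytes / elapsed_s / 1e6   # dd reports decimal MB/s

print(total_bytes)               # 18574344192
print(f"{rate_mb_s:.1f} MB/s")   # 25.8 MB/s
```

Both values agree with the dd output, so the summary line is internally consistent. It does not say whether the copy covers the whole source disk, which still needs to be tested as noted in step 6.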