Entry  Tue Sep 17 14:01:46 2019, gautam, Update, CDS, daqd fw dead 
    Reply  Tue Sep 17 21:34:07 2019, gautam, Update, CDS, daqd fw dead no more
Message ID: 14891     Entry time: Tue Sep 17 21:34:07 2019     In reply to: 14889
Author: gautam 
Type: Update 
Category: CDS 
Subject: daqd fw dead no more 

Summary:

  1. Frames seem to be written again. Slowly but surely, we are converging to an operable state...
  2. No frames are available for the period 23 Aug to 17 September 2019.
  3. Don't edit the C0EDCU.ini file unless you know what you're doing.
  4. If you make some changes to the RT system/channel list or reboot FEs, please make sure all the dependent systems are back up and running. There shouldn't be a need to willy-nilly reboot things.
  5. Tomorrow I will prepare the map of BIO channels for Chub to restore the whitening switching capability. Then we can try locking some cavities.

Details:

  1. First, I checked to make sure the /frames partition wasn't full. It wasn't.
  2. Next, I looked into the C0EDCU.ini file.
    • The last date for which frames are available, 23 Aug, coincided with the date when this file was modified.
    • It is a known problem that the daqd_fw service can crash if one of the channels in this file is reporting an unusually large number.
    • Several channels were added to this file. In the end, only 9 new ones were required: 5 "DetectMon" channels, one for each of the RF demodulation frequencies, and 4 for the new ALS LO and RF signal power monitor channels.
    • It is highly likely that one of the other channels caused the daqd_fw service to crash, though I can't say for sure, because I did not exhaustively search through the ~100 unnecessary channels that were in this file to see what values they were reporting.
  3. For good measure, I ran the reboot script, and brought the c1lsc models back online.
    • I want to do the mapping of the BIO channels to the pin-out of the BIO adaptor unit, which requires c1lsc to run.
    • Reboot script ran smoothly.
  4. Then I went into fb and restarted all the daqd services. This time, they all seem to run without crashing, at least in the ~10min window it took me to type out this elog.
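
The exhaustive search that was skipped in step 2 could be scripted. What follows is only a sketch, under assumptions the entry does not confirm: that C0EDCU.ini lists each channel as an ini section header alongside a shared [default] section, and that "unusually large" can be approximated by a magnitude limit (the 1e30 default is an arbitrary placeholder, not a known daqd threshold). The channel names in the usage note are made up. Reading the live EPICS values (for instance with caget) is left to the caller.

```python
import configparser
import math


def edcu_channel_names(ini_text):
    """Return channel names, assumed to be the section headers of the ini text."""
    parser = configparser.ConfigParser(
        strict=False,          # tolerate duplicated channel sections
        allow_no_value=True,   # tolerate bare keys inside a section
        interpolation=None,    # do not treat '%' in values specially
    )
    parser.read_string(ini_text)
    # A lowercase [default] section holds shared settings, not a channel.
    return [name for name in parser.sections() if name.lower() != "default"]


def suspect_channels(values, limit=1e30):
    """Return sorted names whose value is non-finite or exceeds limit in magnitude.

    values maps channel name -> float, obtained separately by the caller.
    limit is a hypothetical threshold chosen only for illustration.
    """
    return sorted(
        name for name, value in values.items()
        if not math.isfinite(value) or abs(value) > limit
    )
```

With hypothetical channels, suspect_channels({"C1:EXAMPLE-CHAN_A": 1.5, "C1:EXAMPLE-CHAN_B": float("inf")}) returns only the second name. Running edcu_channel_names on the file first gives the list of names to query.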

controls@fb1:~ 127$ sudo systemctl status  daqd_fw.service
● daqd_fw.service - Advanced LIGO RTS daqd frame writer
   Loaded: loaded (/etc/systemd/system/daqd_fw.service; enabled)
   Active: active (running) since Tue 2019-09-17 21:32:25 PDT; 17min ago
 Main PID: 22040 (daqd_fw)
   CGroup: /daqd.slice/daqd_fw.service
           └─22040 /usr/bin/daqd_fw -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.fw

Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer crc thread - label dqprodcrc pid=22108
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] [Tue Sep 17 21:32:31 2019] Producer thread - label dqproddbg pid=22109Producer crc... permitted
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer crc thread put on CPU 0
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer thread priority error Operation not permitted
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer thread put on CPU 0
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer thread - label dqprod pid=22103
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer thread priority error Operation not permitted
Sep 17 21:32:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:31 2019] Producer thread put on CPU 0
Sep 17 21:32:35 fb1 daqd_fw[22040]: [Tue Sep 17 21:32:35 2019] Minute trender made GPS time correction; gps=1252816371; gps%60=51
Sep 17 21:33:31 fb1 daqd_fw[22040]: [Tue Sep 17 21:33:31 2019] ->3: clear crc
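
As a sanity check on the GPS time in the minute trender line above, the arithmetic can be reproduced in a few lines. This is a sketch that assumes a fixed GPS-UTC offset of 18 leap seconds (correct for dates after 1 Jan 2017). The last function, which takes the leading five digits of the GPS time, is a guess at how the frame directories listed below are named; the entry itself does not state this.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
LEAP_SECONDS = 18  # assumed constant GPS-UTC offset, valid since 2017-01-01


def gps_to_utc(gps_seconds):
    """Convert integer GPS seconds to a timezone-aware UTC datetime."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - LEAP_SECONDS)


def seconds_into_minute(gps_seconds):
    """The gps%60 quantity printed in the log line."""
    return gps_seconds % 60


def frame_dir_prefix(gps_seconds):
    """Leading five digits of the GPS time (assumed directory naming)."""
    return gps_seconds // 100000


gps = 1252816371  # value taken from the log line
print(gps_to_utc(gps).isoformat(), seconds_into_minute(gps), frame_dir_prefix(gps))
```

The converted time is 04:32:33 UTC on 18 Sep 2019, i.e. 21:32:33 PDT on 17 Sep, which agrees with the log timestamp to within a couple of seconds. The remainder is 51 as logged, and the prefix 12528 matches the name of the newest directory in the listing.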

drwxr-xr-x 2 controls controls 569344 Aug 23 05:17 12465
drwxr-xr-x 2 controls controls 565248 Aug 23 05:41 12466
drwxr-xr-x 2 controls controls 557056 Aug 23 05:53 12505
drwxr-xr-x 2 controls controls 262144 Aug 23 18:40 12506
drwxr-xr-x 2 controls controls  12288 Sep 17 21:54 12528
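
A gap like the one visible in this listing (nothing modified between Aug 23 and Sep 17) could be caught earlier by periodically checking the age of the newest entry in the frame directory. The helper below is a minimal sketch; its name is made up for illustration, and the directory path and any alarm threshold are left to the caller since the entry does not give them.

```python
import time
from pathlib import Path


def newest_entry_age(directory, now=None):
    """Seconds since the most recently modified entry in directory.

    Returns None if the directory is empty. `now` defaults to the current
    time and can be overridden, which is mainly useful for testing.
    """
    now = time.time() if now is None else now
    mtimes = [entry.stat().st_mtime for entry in Path(directory).iterdir()]
    return now - max(mtimes) if mtimes else None
```

A cron job or monitoring script could call this on the directory that holds the listing above and raise a flag when the age exceeds some tolerance, for example a few hours.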
 

Unrelated to this work: c1auxey was keyed.

Quote:

This meant that no frames were being written since Aug 23, which probably coincides with when the c1lsc frontend crashed. Sad 😢 😭 🙁 .

Attachment 1: RTFEstatus.png  22 kB  Uploaded Tue Sep 17 22:57:34 2019