40m QIL Cryo_Lab CTN SUS_Lab CAML OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log  Not logged in ELOG logo
Entry  Thu Oct 16 03:18:48 2014, Jenne, Update, CDS, Daqd segfaulting again 
    Reply  Thu Oct 16 12:22:43 2014, ericq, Update, CDS, Daqd segfaulting again 
       Reply  Fri Oct 17 15:17:31 2014, jamie, Update, CDS, Daqd "fixed"? 
          Reply  Fri Oct 17 16:54:11 2014, jamie, Update, CDS, Daqd "fixed"? 
             Reply  Thu Oct 23 01:39:34 2014, Jenne, Update, CDS, Daqd "fixed"? 
Message ID: 10624     Entry time: Fri Oct 17 16:54:11 2014     In reply to: 10623     Reply to this: 10633
Author: jamie 
Type: Update 
Category: CDS 
Subject: Daqd "fixed"? 

Quote:

I very tentatively declare that this particular daqd crapfest is "resolved" after Jenne rebooted fb and daqd has been running for about 40 minutes now without crapping itself.  Wee hoo.

I spent a while yesterday trying to figure out what could have been going on.  I couldn't find anything.  I found an elog that said a previous daqd crapfest was finally only resolved by rebooting fb after a similar situation, i.e. there had been an issue that was resolved, daqd was still crapping itself, we couldn't figure out why so we just rebooted, daqd started working again.

So, in summary, totally unclear what the issue was, or why a reboot solved it, but there you go.

Looks like I spoke too soon.  daqd seems to be crapping itself again:

controls@fb /opt/rtcds/caltech/c1/target/fb 0$ ls -ltr logs/old/ | tail -n 20
-rw-r--r-- 1 4294967294 4294967294    11244 Oct 17 11:34 daqd.log.1413570846
-rw-r--r-- 1 4294967294 4294967294    11086 Oct 17 11:36 daqd.log.1413570988
-rw-r--r-- 1 4294967294 4294967294    11244 Oct 17 11:38 daqd.log.1413571087
-rw-r--r-- 1 4294967294 4294967294    13377 Oct 17 11:43 daqd.log.1413571386
-rw-r--r-- 1 4294967294 4294967294    11481 Oct 17 11:45 daqd.log.1413571519
-rw-r--r-- 1 4294967294 4294967294    11985 Oct 17 11:47 daqd.log.1413571655
-rw-r--r-- 1 4294967294 4294967294    13219 Oct 17 13:00 daqd.log.1413576037
-rw-r--r-- 1 4294967294 4294967294    11150 Oct 17 14:00 daqd.log.1413579614
-rw-r--r-- 1 4294967294 4294967294     5127 Oct 17 14:07 daqd.log.1413580231
-rw-r--r-- 1 4294967294 4294967294    11165 Oct 17 14:13 daqd.log.1413580397
-rw-r--r-- 1 4294967294 4294967294     5440 Oct 17 14:20 daqd.log.1413580845
-rw-r--r-- 1 4294967294 4294967294    11352 Oct 17 14:25 daqd.log.1413581103
-rw-r--r-- 1 4294967294 4294967294    11359 Oct 17 14:28 daqd.log.1413581311
-rw-r--r-- 1 4294967294 4294967294    11195 Oct 17 14:31 daqd.log.1413581470
-rw-r--r-- 1 4294967294 4294967294    10852 Oct 17 15:45 daqd.log.1413585932
-rw-r--r-- 1 4294967294 4294967294    12696 Oct 17 16:00 daqd.log.1413586831
-rw-r--r-- 1 4294967294 4294967294    11086 Oct 17 16:02 daqd.log.1413586924
-rw-r--r-- 1 4294967294 4294967294    11165 Oct 17 16:05 daqd.log.1413587101
-rw-r--r-- 1 4294967294 4294967294    11086 Oct 17 16:21 daqd.log.1413588108
-rw-r--r-- 1 4294967294 4294967294    11097 Oct 17 16:25 daqd.log.1413588301
controls@fb /opt/rtcds/caltech/c1/target/fb 0$

The times all indicate when the daqd log was rotated, which happens everytime the process restarts.  It doesn't seem to be happening so consistently, though.  It's been 30 minutes since the last one.  I wonder if it somehow correlated with actual interaction with the NDS process.  Does some sort of data request cause it to crash?

 

ELOG V3.1.3-