Message ID: 9531     Entry time: Tue Jan 7 23:08:01 2014     In reply to: 9530     Reply to this: 9533
Author: jamie 
Type: Update 
Category: CDS 
Subject: /frames is full, causing daqd to die 

Quote:

The daqd process is segfaulting and restarting itself every 30 seconds or so.  It's pretty frustrating. 

Just for kicks, I tried an mxstream restart, clearing the testpoints, and restarting the daqd process, but none of these things changed anything.

Manasa found an elog from a year ago (elog 7105 and preceding), but I'm not sure that it's a similar/related problem.  Jamie, please help us.

The problem is not exactly the same as what's described in 7105, but the symptoms are so similar that I assumed they must share a source.

And sure enough, /frames is completely full:

controls@fb /opt/rtcds/caltech/c1/target/fb 0$ df -h /frames/
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              13T   13T     0 100% /frames
controls@fb /opt/rtcds/caltech/c1/target/fb 0$
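
To see where the space went and which frame files are the oldest (the normal candidates for deletion), standard coreutils are enough.  This assumes the usual *.gwf frame file naming, which may differ here:

du -sh /frames/*                                                  # space per subdirectory
find /frames -name '*.gwf' -printf '%T@ %p\n' | sort -n | head    # oldest frames first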

So the problem in both cases was that it couldn't write out the frames.  Unfortunately daqd is apparently too stupid to give us a reasonable error message about what's going on.

So why is /frames full?  Apparently the wiper script is either not running or is failing to do its job.  My guess is that this is a side effect of the linux1 RAID failure we had over xmas.
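
For reference, here is a minimal sketch of what a wiper is supposed to do: watch the usage of /frames and delete the oldest frame files when it gets too high.  The *.gwf naming, the threshold, and the oldest-first policy are my assumptions; the actual wiper script here may work differently.

#!/bin/bash
# Sketch of a frame wiper (assumptions: frames are *.gwf files under
# /frames; deleting oldest-first is acceptable; real script may differ).
FRAMES=/frames
THRESHOLD=95                  # start deleting once usage exceeds this percent

usage() {
    # current usage of the /frames filesystem as a bare number, e.g. 100
    df -P "$FRAMES" | awk 'NR==2 {gsub(/%/, ""); print $5}'
}

while [ "$(usage)" -gt "$THRESHOLD" ]; do
    # delete the single oldest frame file, then re-check the usage
    oldest=$(find "$FRAMES" -name '*.gwf' -printf '%T@ %p\n' | sort -n | head -n 1 | cut -d' ' -f2-)
    [ -n "$oldest" ] || break  # nothing left to delete; bail out
    rm -f -- "$oldest"
done

Run from cron, something like this keeps the disk just under the threshold; if it silently stops running, /frames fills up exactly as seen above.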
