Message ID: 9949     Entry time: Tue May 13 17:45:21 2014     In reply to: 9535     Reply to this: 9955
Author: rana 
Type: Update 
Category: CDS 
Subject: /frames space cleared up, daqd stabilized 
Late last night we were having problems with daqd again. It turned out to be /frames filling up again.

I deleted a bunch of old frame files by hand around 3 AM so that we could keep locking, and then also ran the wiper script (target/fb/wiper.pl).
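For reference, the basic job of the wiper is to delete the oldest frame files until usage drops below a target. Here is a minimal Python sketch of that idea (not the actual wiper.pl logic; the path, file pattern, and threshold are assumptions):

import os
import glob
import shutil

FRAMES_DIR = '/frames/full'   # assumed layout of the frames partition
TARGET_FRACTION = 0.93        # stop deleting once usage is below this

def usage_fraction(path):
    # fraction of the filesystem holding 'path' that is in use
    u = shutil.disk_usage(path)
    return u.used / float(u.total)

def frames_oldest_first(directory):
    # frame files (*.gwf), oldest first, so we eat into the lookback
    files = glob.glob(os.path.join(directory, '*', '*.gwf'))
    return sorted(files, key=os.path.getmtime)

for f in frames_oldest_first(FRAMES_DIR):
    if usage_fraction(FRAMES_DIR) < TARGET_FRACTION:
        break
    os.remove(f)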

controls@pianosa|fb> df -h; date
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             440G  9.7G  408G   3% /
none                  7.9G  288K  7.9G   1% /dev
none                  7.9G  464K  7.9G   1% /dev/shm
none                  7.9G  144K  7.9G   1% /var/run
none                  7.9G     0  7.9G   0% /var/lock
none                  7.9G     0  7.9G   0% /lib/init/rw
none                  440G  9.7G  408G   3% /var/lib/ureadahead/debugfs
linux1:/home/cds      1.8T  1.4T  325G  82% /cvs/cds
linux1:/ligo           71G   18G   50G  27% /ligo
linux1:/home/cds/rtcds
                      1.8T  1.4T  325G  82% /opt/rtcds
fb:/frames             13T   12T  559G  96% /frames
linux1:/home/cds/caltech/users
                      1.8T  1.4T  325G  82% /users
Tue May 13 17:35:00 PDT 2014

Looking through the directories by hand, it seems the issue may be due to our FB MXstream instabilities. The wiper script looks at the disk usage and tries to delete just enough files to keep us below 95% full for the next 24 hours. If, however, some front ends are not writing their DAQ channels to frames, the frame files come out smaller than usual and the script underestimates how much space the next day's frames will need. In particular, if it's currently writing small frames and we then restart the mxstream, the per-frame file size goes back up to 80 MB and the disk can fill up.
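To see the size of the misestimate, here is the projection the wiper is implicitly making, sketched in Python (the frame sizes below are illustrative, not measured):

SECONDS_PER_DAY = 86400
FRAME_LENGTH_S = 16    # one frame file per 16 s of data

def projected_daily_gb(frame_size_mb):
    # GB/day written if every frame file were frame_size_mb in size
    frames_per_day = SECONDS_PER_DAY / FRAME_LENGTH_S   # 5400 frames/day
    return frames_per_day * frame_size_mb / 1000.0

print(projected_daily_gb(40))  # mxstream partly down, small frames: 216 GB/day
print(projected_daily_gb(80))  # all front ends writing:             432 GB/day

A wiper run that clears space based on the small-frame rate leaves roughly half the room actually needed once the mxstream comes back.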

For now, I have modified the wiper.pl script to try to stay below 93%. As you can see from the 'df' output above, /frames is already at 96%, and daqd still has files to write until the next run of wiper.pl 7 hours from now, at 6 AM.

If we assume that it's writing a 75 MB file every 16 seconds, then it writes ~405 GB of frames every day. There is 559 GB free right now, so we are OK for the moment. At 405 GB of usage per day, we have a lookback of ~12 TB / 405 GB ≈ 29 days (ignoring the trend files).
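The arithmetic, spelled out in Python so the units are explicit:

frame_mb = 75.0
frame_interval_s = 16.0
frames_per_day = 86400 / frame_interval_s       # 5400 frames/day
daily_gb = frames_per_day * frame_mb / 1000.0   # 405 GB/day
lookback_days = 12000.0 / daily_gb              # 12 TB / (405 GB/day)
print(daily_gb)        # 405.0
print(lookback_days)   # ~29.6 days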
