Late last night we were having problems with DAQD again; it turned out to be /frames filling up again.
Around 3 AM I quickly deleted a bunch of old frame files by hand so that we could keep locking, and then also ran the wiper script (target/fb/wiper.pl).
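For reference, the manual cleanup amounted to removing the oldest full-frame files first. A minimal sketch of that kind of selection, in Python, assuming a /frames/full/<GPS-prefix>/*.gwf layout (the directory layout and file count here are illustrative assumptions, not the exact commands run last night):

#!/usr/bin/env python
# Hedged sketch: list the oldest .gwf files under an assumed /frames/full layout
# and report how much space deleting the oldest N of them would recover.
import os, glob

frame_dir = "/frames/full"          # assumed frame archive location
# GPS-stamped file names sort chronologically, so a plain sort gives oldest-first
files = sorted(glob.glob(os.path.join(frame_dir, "*", "*.gwf")))
oldest = files[:500]                # e.g. the 500 oldest frame files (illustrative)
freed_gb = sum(os.path.getsize(f) for f in oldest) / 1e9
print("Deleting %d oldest files would free ~%.1f GB" % (len(oldest), freed_gb))
# for f in oldest: os.remove(f)     # uncomment only after inspecting the list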
controls@pianosa|fb> df -h; date
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 440G 9.7G 408G 3% /
none 7.9G 288K 7.9G 1% /dev
none 7.9G 464K 7.9G 1% /dev/shm
none 7.9G 144K 7.9G 1% /var/run
none 7.9G 0 7.9G 0% /var/lock
none 7.9G 0 7.9G 0% /lib/init/rw
none 440G 9.7G 408G 3% /var/lib/ureadahead/debugfs
linux1:/home/cds 1.8T 1.4T 325G 82% /cvs/cds
linux1:/ligo 71G 18G 50G 27% /ligo
linux1:/home/cds/rtcds
1.8T 1.4T 325G 82% /opt/rtcds
fb:/frames 13T 12T 559G 96% /frames
linux1:/home/cds/caltech/users
1.8T 1.4T 325G 82% /users
Tue May 13 17:35:00 PDT 2014
Looking through the directories by hand, it seems that the issue may be due to our FB MXstream instabilities. The wiper script looks at the disk usage and tries to delete just enough files to keep us below 95% full for the next 24 hours. If, however, some channels are not being written to frames because some front ends are not sending their DAQ channels, the per-frame file size shrinks and the script underestimates how much disk will be used over the next day. In particular, if it is currently writing small frames and we then restart the mxstream so that the per-frame file size goes back up to ~80 MB, the disk can fill up before the next wiper run.
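To make the failure mode concrete, here is a minimal sketch of the capacity estimate involved. This is an illustrative Python version with assumed numbers, not the actual wiper.pl (which is Perl); the 20 MB "degraded" frame size is an assumption for the example:

# Hedged sketch of the wiper-style projection of daily frame usage.
disk_total_gb = 13000.0           # ~13 TB /frames partition (from df above)
target_fraction = 0.95            # old threshold; now lowered to 0.93
frame_interval_s = 16.0           # one frame file written every 16 seconds

def projected_daily_gb(frame_size_mb):
    # frames per day times size per frame
    return frame_size_mb / 1e3 * 86400.0 / frame_interval_s

for size_mb in (20.0, 80.0):      # small frames (front ends missing) vs. full frames
    print("%.0f MB frames -> %.0f GB/day" % (size_mb, projected_daily_gb(size_mb)))

# If the wiper sizes its deletions using the small-frame rate and the mxstream is
# then restarted, the real ~430 GB/day rate can push /frames past the target
# before the next wiper run.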
For now, I have modified the wiper.pl script to try to stay below 93%. As you can see from the 'df' output above, the disk is already above 96%, and it will keep writing files until the next run of wiper.pl 7 hours from now, at 6 AM.
If we assume that it's writing a 75 MB file every 16 seconds, then it writes ~405 GB of frames per day. There are 559 GB free right now, so we are OK for the moment. At 405 GB of usage per day, we have a lookback of ~12 TB / 405 GB ~ 29 days (ignoring the trend files).
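The same arithmetic written out, using the assumed 75 MB / 16 s rate and the numbers from the df output above:

# Lookback estimate from the assumed write rate.
frame_size_gb = 0.075                                     # 75 MB per frame file
frame_interval_s = 16.0
daily_gb = frame_size_gb * 86400.0 / frame_interval_s     # ~405 GB/day
free_gb = 559.0                                           # free space on /frames (df above)
archive_tb = 12.0                                         # ~12 TB of frames currently stored
print("daily usage  : %.0f GB" % daily_gb)                # ~405 GB
print("days of slack: %.1f" % (free_gb / daily_gb))       # ~1.4 days until completely full
print("lookback     : %.1f days" % (archive_tb * 1e3 / daily_gb))  # ~29.6 days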