I have been watching daqd all day and I don't feel any closer to understanding what the issues are. However, things are
Interestingly, though, the stability appears highly variable at the moment. This morning, daqd was very unstable and was crashing within a couple of minutes of starting. This afternoon, however, things seemed much more stable. As of this moment, daqd has been running for 25 minutes, writing full frames as well as minute and second trends (no minute_raw), without any issues. What has changed?
To reiterate, I have been closely watching disk IO to /frames. I see no indication of any disk contention while daqd is failing. It's still possible, though, that there are disk IO issues affecting daqd at a level that is not readily visible. In dstat, the frame writes are visible, but nothing else.
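To keep a timestamped record of disk activity that can be lined up against daqd crash times, something like the following could run next to dstat. It is only a minimal sketch: it assumes a Linux host exposing /proc/diskstats, the function names (parse_diskstats, io_rates, sample) are made up for illustration, and it is not how dstat itself works. It reports per-device read and write rates plus the fraction of each interval the device was busy, which may reveal contention that raw throughput alone does not show.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written, ms_doing_io)}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or unexpected lines
        # fields: major minor name, then counters; 5 = sectors read,
        # 9 = sectors written, 12 = milliseconds spent doing I/O
        stats[fields[2]] = (int(fields[5]), int(fields[9]), int(fields[12]))
    return stats


def io_rates(before, after, interval_s):
    """Return {device: (read_bytes_per_s, write_bytes_per_s, busy_fraction)}."""
    rates = {}
    for dev, (r1, w1, b1) in after.items():
        if dev not in before:
            continue
        r0, w0, b0 = before[dev]
        rates[dev] = (
            (r1 - r0) * SECTOR_BYTES / interval_s,
            (w1 - w0) * SECTOR_BYTES / interval_s,
            (b1 - b0) / (interval_s * 1000.0),
        )
    return rates


def sample(path="/proc/diskstats"):
    """Read and parse the current counters."""
    with open(path) as f:
        return parse_diskstats(f.read())


if __name__ == "__main__":
    interval = 1.0  # seconds between samples (arbitrary choice)
    prev = sample()
    while True:
        time.sleep(interval)
        cur = sample()
        stamp = time.strftime("%H:%M:%S")
        for dev, (rd, wr, busy) in sorted(io_rates(prev, cur, interval).items()):
            if rd or wr:  # only print devices that did something
                print(f"{stamp} {dev} read={rd / 1e6:.2f}MB/s "
                      f"write={wr / 1e6:.2f}MB/s busy={busy:.0%}")
        prev = cur
```

Redirecting the output to a file gives a log that can be compared after the fact with the time of each daqd crash. Reads on the /frames device while daqd is writing would point at an external consumer of the frames.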
I have made one change that could be positively affecting things right now: I un-exported /frames from NFS. This prevents anything external from reading /frames over the network. In particular, it also shuts off the transfer of frames to LDAS. Since I did this, daqd has appeared to be more stable. It's NOT totally stable, though: the instance that I described above did eventually just die after 43 minutes, as I was writing this.
In any event, as things are currently as stable as I've seen them, I'm leaving it running in this configuration for the moment, with the following relevant daqdrc parameters:
start main 16;
start frame-saver;
sync frame-saver;
start trender 60 60;
start trend-frame-saver;
sync trend-frame-saver;
start minute-trend-frame-saver;
sync minute-trend-frame-saver;
start profiler;
start trend profiler;