Message ID: 11404     Entry time: Mon Jul 13 18:12:50 2015     In reply to: 11402     Reply to this: 11406   11412
Author: Jamie 
Type: Summary 
Category: CDS 
Subject: CDS upgrade: left running in semi-stable configuration 

I have been watching daqd all day and I don't feel much closer to understanding what the issues are.  However, things are

Interestingly, though, the stability appears highly variable at the moment.  This morning, daqd was very unstable and was crashing within a couple of minutes of starting.  However, this afternoon things seemed much more stable.  As of this moment, daqd has been running for 25 minutes, writing full frames as well as minute and second trends (no minute_raw), without any issues.  What has changed?

To reiterate, I have been closely watching disk IO to /frames.  I see no indication that there is any disk contention while daqd is failing.  It's still possible, though, that there are disk IO issues affecting daqd at a level that is not readily visible.  From dstat, the frame writes are visible, but nothing else.
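
One way to look for contention below what dstat shows by default is to compare two snapshots of the Linux /proc/diskstats counters and compute both the write rate and the fraction of time the device was busy.  The following is only a minimal sketch of that idea, not what was actually run here: the function names are made up for illustration, and the device name "sda" is an assumption (the block device backing /frames is not stated in this entry).

```python
import time
from typing import Dict, NamedTuple, Tuple

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


class DiskSample(NamedTuple):
    sectors_written: int  # cumulative sectors written
    io_ms: int            # cumulative milliseconds spent doing I/O


def parse_diskstats(text: str) -> Dict[str, DiskSample]:
    """Map device name -> counters from the text of /proc/diskstats."""
    samples = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or malformed lines
        # fields: major minor name, then the per-device counters;
        # index 9 is sectors written, index 12 is ms spent doing I/O
        samples[fields[2]] = DiskSample(int(fields[9]), int(fields[12]))
    return samples


def disk_load(before: str, after: str, device: str,
              interval_s: float) -> Tuple[float, float]:
    """Return (write rate in MB/s, busy fraction 0..1) between two snapshots."""
    b = parse_diskstats(before)[device]
    a = parse_diskstats(after)[device]
    written_bytes = (a.sectors_written - b.sectors_written) * SECTOR_BYTES
    write_mb_s = written_bytes / 1e6 / interval_s
    busy = (a.io_ms - b.io_ms) / (interval_s * 1000.0)
    return write_mb_s, busy


if __name__ == "__main__":
    DEVICE = "sda"   # assumption: replace with the device holding /frames
    INTERVAL = 10.0  # seconds between snapshots
    with open("/proc/diskstats") as f:
        first = f.read()
    time.sleep(INTERVAL)
    with open("/proc/diskstats") as f:
        second = f.read()
    rate, busy = disk_load(first, second, DEVICE, INTERVAL)
    print(f"{DEVICE}: {rate:.2f} MB/s written, {busy:.0%} busy")
```

A busy fraction that stays near 1 while daqd is failing would point toward contention even when the write rate looks unremarkable; a low busy fraction would be consistent with what dstat shows.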

I have made one change that could be positively affecting things right now: I un-exported /frames from NFS.  This eliminates anything external from reading /frames over the network.  In particular, it also shuts off the transfer of frames to LDAS.  Since I've done this, daqd has appeared to be more stable.  It's NOT totally stable, though, as the instance that I described above did eventually just die after 43 minutes, as I was writing this.
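
For anyone wanting to double-check this state later, the sketch below lists the paths named in an exports file so that the absence of /frames can be confirmed.  It assumes the export is configured through the standard /etc/exports file, which this entry does not actually say, and the function name is hypothetical.  It only inspects the configuration text, not the kernel's live export table.

```python
from typing import Set


def exported_paths(exports_text: str) -> Set[str]:
    """Return the set of paths listed in /etc/exports-style text."""
    paths = set()
    # a trailing backslash continues an entry onto the next line
    joined = exports_text.replace("\\\n", " ")
    for line in joined.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # first field is the exported path (quoted paths are not handled)
        paths.add(line.split()[0])
    return paths


if __name__ == "__main__":
    with open("/etc/exports") as f:
        current = exported_paths(f.read())
    print("/frames exported:", "/frames" in current)
```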

In any event, as things are currently as stable as I've seen them, I'm leaving it running in this configuration for the moment, with the following relevant daqdrc parameters:

start main 16;
start frame-saver;
sync frame-saver;
start trender 60 60;
start trend-frame-saver;
sync trend-frame-saver;
start minute-trend-frame-saver;
sync minute-trend-frame-saver;
start profiler;
start trend profiler;