Entry  Mon Aug 6 19:37:50 2012, Jamie, Update, CDS, daqd and CDS network problems today 
    Reply  Mon Aug 6 19:54:53 2012, Jamie, Update, CDS, daqd and CDS network problems today 
       Reply  Mon Aug 6 20:08:45 2012, Jamie, Update, CDS, daqd and CDS network problems today 
          Reply  Mon Aug 6 20:22:50 2012, Jamie, Update, CDS, daqd segfaulting after five minutes 
             Reply  Tue Aug 7 11:46:24 2012, Jamie, Update, CDS, Alex working on daqd 
                Reply  Tue Aug 7 14:17:07 2012, Jamie, Update, CDS, daqd running again; related to c1sup issue 
                   Reply  Tue Aug 7 14:34:01 2012, Jamie, Update, CDS, jk. daqd still segfaulting 
                      Reply  Tue Aug 7 15:04:23 2012, Jamie, Update, CDS, daqd problem was root-owned files and directories 
Message ID: 7093     Entry time: Mon Aug 6 19:37:50 2012
Author: Jamie 
Type: Update 
Category: CDS 
Subject: daqd and CDS network problems today 

For some reason, this afternoon we've been experiencing a lot of problems with the framebuilder (fb) and with the CDS network in general.  The framebuilder has been very unresponsive, although the daqd logs seem to indicate that things are OK.  All models lose contact with fb for very long stretches.  Attempts to kill/restart daqd don't seem to fix the problem.

These problems seem to be associated with general CDS network issues as well.  The network becomes very slow, and so do all the workstations.  The latter, I assume, is because so much of the work we do is on network-mounted filesystems (/opt/rtcds, /ligo, etc.).

My current speculation is that daqd on fb is doing something stupid, like trying to read or write a bunch of stuff from /frames, which is also network-mounted, and that this clogs up the entire network.  Some serious network debugging is going to be needed to figure out what's going on, though.
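
As a first sanity check on that speculation, here is a minimal sketch (not something that was run for this entry) of how one could confirm which of these paths actually sit on network filesystems on a given Linux host.  It assumes the standard /proc/mounts layout (device, mount point, filesystem type, ...); the helper names and the set of "network" filesystem types are my own choices for illustration, and it does not handle mount points containing escaped spaces.

```python
# Sketch: report which paths live on network filesystems, based on
# /proc/mounts-style text. Helper names and NETWORK_FS_TYPES are
# illustrative assumptions, not part of any CDS tooling.

NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs"}


def parse_mounts(mounts_text):
    """Return a list of (mount_point, fs_type) from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def filesystem_for(path, entries):
    """Return the (mount_point, fs_type) of the deepest mount containing path."""
    best = None
    for mount_point, fs_type in entries:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path.rstrip("/") + "/").startswith(prefix):
            # ">=" lets a later mount on the same mount point win.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


def network_mounted(paths, mounts_text):
    """Map each path to True if its filesystem type looks like a network mount."""
    entries = parse_mounts(mounts_text)
    result = {}
    for path in paths:
        match = filesystem_for(path, entries)
        result[path] = match is not None and match[1] in NETWORK_FS_TYPES
    return result


if __name__ == "__main__":
    # Linux only: read the live mount table of the host this runs on.
    with open("/proc/mounts") as f:
        table = f.read()
    for path, is_net in network_mounted(["/frames", "/opt/rtcds", "/ligo"], table).items():
        print(path, "network-mounted" if is_net else "local or not found")
```

Running this on fb and on a workstation would show whether /frames is a network mount from each host's point of view.  It says nothing about how much traffic daqd actually generates, so it only narrows down the speculation above.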

I'm afraid daqd is caught in some bad state now, though.  It's not responding to anything, and every attempt to kill it seems to just bring it back up in the same bad state.  Hopefully I can get Alex to help me figure out what's going on tomorrow.  Maybe it will clear up on its own tonight...
