40m Log
Entry  Mon Aug 6 19:37:50 2012, Jamie, Update, CDS, daqd and CDS network problems today 
    Reply  Mon Aug 6 19:54:53 2012, Jamie, Update, CDS, daqd and CDS network problems today 
       Reply  Mon Aug 6 20:08:45 2012, Jamie, Update, CDS, daqd and CDS network problems today 
          Reply  Mon Aug 6 20:22:50 2012, Jamie, Update, CDS, daqd segfaulting after five minutes 
             Reply  Tue Aug 7 11:46:24 2012, Jamie, Update, CDS, Alex working on daqd 
                Reply  Tue Aug 7 14:17:07 2012, Jamie, Update, CDS, daqd running again; related to c1sup issue 
                   Reply  Tue Aug 7 14:34:01 2012, Jamie, Update, CDS, jk. daqd still segfaulting 
                      Reply  Tue Aug 7 15:04:23 2012, Jamie, Update, CDS, daqd problem was root-owned files and directories 
Message ID: 7103     Entry time: Tue Aug 7 14:34:01 2012     In reply to: 7102     Reply to this: 7105
Author: Jamie 
Type: Update 
Category: CDS 
Subject: jk. daqd still segfaulting 

Quote:

So daqd's problem was apparently the bad, non-running c1sup model.  The c1sup model, which I reported trying to get running in entry 7097, was not running because there were no available CPUs on the c1sus FE machine.  This was due to my stupidly undercounting the number of CPUs.  Anyway, for reasons I don't understand, this was causing daqd to segfault.  Removing c1sup from c1sus "fixed" the problem.

Alex agreed that daqd should definitely not be segfaulting in this circumstance.  It's still unclear exactly what daqd was looking at that was causing it to crash.

I'm going to move c1sup to c1iscex, which has a lot of spare CPUs.
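The c1sus failure above is a CPU-budget miscount, so here is a minimal Python sketch of that bookkeeping. It assumes, as a simplification and not as a description of how the RCG actually schedules models, that one core is left to the ordinary Linux system and that every real-time model (IOP included) pins exactly one dedicated core. The `FrontEnd` class, its fields, and any core counts you feed it are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FrontEnd:
    """Hypothetical CPU budget for one front-end (FE) machine."""

    name: str
    n_cores: int
    # ASSUMPTION: one core is kept for the ordinary Linux kernel/userspace.
    reserved_cores: int = 1
    # Real-time models already assigned to this host (IOP included).
    models: List[str] = field(default_factory=list)

    def free_cores(self) -> int:
        # ASSUMPTION: every real-time model pins exactly one dedicated core.
        return self.n_cores - self.reserved_cores - len(self.models)

    def add_model(self, model: str) -> None:
        """Assign a model to this host, refusing if no core is left."""
        if self.free_cores() < 1:
            raise RuntimeError(
                f"{self.name}: no free CPU core for {model} "
                f"({len(self.models)} models on {self.n_cores} cores)"
            )
        self.models.append(model)
```

Doing this check before installing a model turns "the model silently never starts" into an explicit error, and makes it obvious when a host like c1iscex has room to spare.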

I spoke too soon.  It's still segfaulting, but at a different point.  Alex and I are looking into it.

But another mystery is solved: the cause of all the network slowness was the daqd core dump.  When daqd segfaults it dumps its core, which is typically larger than 4 GB, to /opt/rtcds/caltech/c1/target/fb/core.  That path is of course an NFS mount from linux1, so every crash pushes more than 4 GB over the network, which, not surprisingly, clogs it.
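One way to stop a crashing daqd from writing gigabytes over NFS is to cap its core-file size. Below is a minimal Python sketch using the standard `resource` and `subprocess` modules. The wrapper functions are hypothetical and not part of the CDS tooling, and it is an assumption that daqd could be started through such a wrapper. On Linux, `/proc/sys/kernel/core_pattern` can alternatively redirect core files to a local disk.

```python
import resource
import subprocess
from typing import Sequence


def limit_core_size(max_bytes: int = 0) -> None:
    """Cap the soft RLIMIT_CORE of the calling process.

    A soft limit of 0 suppresses core files entirely; the hard limit is
    left unchanged so it can be raised again for debugging.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # The soft limit may not exceed a finite hard limit.
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_CORE, (max_bytes, hard))


def launch_without_core(argv: Sequence[str], **popen_kwargs) -> subprocess.Popen:
    """Start argv with core dumps disabled in the child only.

    ASSUMPTION: the daemon is launched through this (hypothetical) wrapper.
    preexec_fn runs in the forked child before exec, so the parent's own
    limit is untouched.
    """
    return subprocess.Popen(list(argv), preexec_fn=limit_core_size, **popen_kwargs)
```

Using a small nonzero limit instead of 0 would still bound the amount written over the network while keeping a (truncated) core for debugging.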
