Entry  Mon Aug 6 19:37:50 2012, Jamie, Update, CDS, daqd and CDS network problems today 
    Reply  Mon Aug 6 19:54:53 2012, Jamie, Update, CDS, daqd and CDS network problems today 
       Reply  Mon Aug 6 20:08:45 2012, Jamie, Update, CDS, daqd and CDS network problems today 
          Reply  Mon Aug 6 20:22:50 2012, Jamie, Update, CDS, daqd segfaulting after five minutes 
             Reply  Tue Aug 7 11:46:24 2012, Jamie, Update, CDS, Alex working on daqd 
                Reply  Tue Aug 7 14:17:07 2012, Jamie, Update, CDS, daqd running again; related to c1sup issue 
                   Reply  Tue Aug 7 14:34:01 2012, Jamie, Update, CDS, jk. daqd still segfaulting 
                      Reply  Tue Aug 7 15:04:23 2012, Jamie, Update, CDS, daqd problem was root-owned files and directories 
Message ID: 7105     Entry time: Tue Aug 7 15:04:23 2012     In reply to: 7103
Author: Jamie 
Type: Update 
Category: CDS 
Subject: daqd problem was root-owned files and directories 

Apparently the last problem was because of root-owned frame directories that daqd was trying to write to.  During debugging Alex had run daqd as root, but it's supposed to run as controls.  All the /frame directories are supposed to be owned by controls.  When daqd was run as root, it created new frame directories owned by root, which controls couldn't write to when I restarted daqd the proper way.  Once we chown'd the directories daqd started running again.
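
For anyone who wants to check for this condition before restarting daqd, here is a minimal sketch of an ownership audit. It only reports paths whose owner is not the expected user; it does not change anything. The function name `find_misowned` and the idea of passing in a numeric uid are my own assumptions for illustration, not part of any existing 40m script.

```python
import os


def find_misowned(root, expected_uid):
    """Return every path under root (root included) not owned by expected_uid.

    Hypothetical helper: walks the tree without following symlinked
    directories and uses lstat so symlinks themselves are not dereferenced.
    """
    misowned = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk yields each directory once as dirpath, so checking
        # dirpath plus its files covers the whole tree.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            if os.lstat(path).st_uid != expected_uid:
                misowned.append(path)
    return misowned
```

Assuming the frames live under /frame and daqd runs as controls, something like `find_misowned("/frame", pwd.getpwnam("controls").pw_uid)` (with `import pwd`) should come back empty on a healthy system. Anything it lists is a candidate for chown.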

Alex also put in a "fix" for the core dump problem.  He touched an empty core file owned by root:

-rw-r--r-- 1 root root 0 Aug  7 14:38 /opt/rtcds/caltech/c1/target/fb/core

This will prevent any dying daqd process owned by controls from dumping its core at that location.  Personally I think this is a horribly hacky "solution" that doesn't actually fix any of the issues that were causing the segfaults to begin with, but it might prevent some of the network slowdown we see when the core does dump.  It's mostly just masking the problem, though, so I'm tempted to remove it so we all feel the pain when daqd starts shitting all over the network again.
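
To make the mechanism concrete: as far as I understand it, Linux refuses to write a core dump into an existing file that is not owned by the user of the crashing process, which is why a root-owned placeholder blocks dumps from controls. The sketch below checks only that one condition. The function name `core_dump_blocked` is made up for illustration, and the real kernel decision also depends on other things (the core size ulimit, directory permissions, the core_pattern setting) that are ignored here.

```python
import os


def core_dump_blocked(core_path, process_uid):
    """Return True if an existing file at core_path belongs to another user.

    Hypothetical, simplified check: models only the ownership test, not
    ulimits, directory permissions, or core_pattern.
    """
    try:
        st = os.stat(core_path)
    except FileNotFoundError:
        # No placeholder file, so nothing is blocking on ownership grounds.
        return False
    return st.st_uid != process_uid
```

With the placeholder above in place, calling this on /opt/rtcds/caltech/c1/target/fb/core with the uid of controls would be expected to return True, and False again once the file is removed.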
