40m Log
Entry  Tue Jan 7 22:44:45 2014, Jenne, Update, CDS, daqd on fb is segfaulting every ~30 seconds 
    Reply  Tue Jan 7 23:08:01 2014, jamie, Update, CDS, /frames is full, causing daqd to die 
       Reply  Tue Jan 7 23:13:47 2014, jamie, Update, CDS, /frames is full, causing daqd to die 
          Reply  Tue Jan 7 23:50:27 2014, jamie, Update, CDS, /frames space cleared up, daqd stabilized 
             Reply  Tue May 13 17:45:21 2014, rana, Update, CDS, /frames space cleared up, daqd stabilized 
                Reply  Thu May 15 01:42:07 2014, rana, Update, CDS, /frames space cleared up, daqd stabilized 
Message ID: 9530     Entry time: Tue Jan 7 22:44:45 2014     Reply to this: 9531
Author: Jenne 
Type: Update 
Category: CDS 
Subject: daqd on fb is segfaulting every ~30 seconds 

The daqd process is segfaulting and restarting itself every 30 seconds or so.  It's pretty frustrating. 

Just for kicks, I tried an mxstream restart, clearing the testpoints, and restarting the daqd process, but none of these things changed anything.  

Manasa found an elog from a year ago (elog 7105 and preceding), but I'm not sure that it's a similar / related problem.  Jamie, please help us!

Here is a screen dump from the "dtail":

Every 1.0s: dmesg | tail -50                                                                                                                         Tue Jan  7 22:43:23 2014

[   33.498691]  [<ffffffff8104a063>] kthread+0x7a/0x82
[   33.498695]  [<ffffffff81003654>] kernel_thread_helper+0x4/0x10
[   33.498698]  [<ffffffff81049fe9>] ? kthread+0x0/0x82
[   33.498701]  [<ffffffff81003650>] ? kernel_thread_helper+0x0/0x10
[   33.498703] ---[ end trace 6236defa99b3e091 ]---
[   33.498705] mx INFO: Board 0: allocated MSI IRQ 67
[   33.498713] mx INFO: CPU0: PAT = 0x7010600070106
[   33.498715] mx INFO: CPU0: new PAT = 0x1010600070106
[   33.498718] mx INFO: Board 0: Using PAT index 6
[   33.499101] eth0: no IPv6 routers present
[   33.531013] mx INFO: Board 0: device 8, rev 0, 1 ports and 2096896 bytes of SRAM available
[   33.531017] mx INFO: Board 0: Bridge is 10de:005d
[   33.531228] mx INFO: Board 0: MAC address = 00:60:dd:46:ea:ec
[   33.535971] mx INFO: Loaded mcp of len 235448
[   34.489244] mx INFO: Starting usermode mapper at /opt/mx/sbin/mx_start_mapper
[   39.148855] mx INFO: mx0: Link0 is UP
[   39.588511] mx INFO: myri0: Will use skbuf frags (4096 bytes, order=0)
[   39.589299] mx INFO: 1 Myrinet board found and initialized
[  287.706367] daqd used greatest stack depth: 3368 bytes left
[86605.907520] daqd[18407]: segfault at 38b08e4c0 ip 00007f11b3942a6c sp 00007f10b1917d50 error 4
[86605.907530] daqd[18424]: segfault at 38b544f90 ip 00007f11b3942a6c sp 00007f10b12c6d30 error 4 in libc-2.10.1.so[7f11b390e000+14c000] in libc-2.10.1.so[7f11b390e000+14c000]
[86605.907544]
[86605.919454] daqd[21319] general protection ip:7f11b3942a6c sp:7f10b1814d30 error:0
[86605.919462] daqd[18442] general protection ip:7f11b3942a6c sp:7f10b0bf4d30 error:0
[86605.919615] daqd[18443]: segfault at 38aee3db0 ip 00007f11b3942a6c sp 00007f10b0b73d50 error 4 in libc-2.10.1.so[7f11b390e000+14c000]
[86605.919694] daqd[18412]: segfault at 38aff35d0 ip 00007f11b3942a6c sp 00007f10b1752d30 error 4
[86605.919701] daqd[18417]: segfault at 38b544f70 ip 00007f11b3942a6c sp 00007f10b154dd50 error 4 in libc-2.10.1.so[7f11b390e000+14c000]
[86605.919708] daqd[18445]: segfault at 38aff35b0 ip 00007f11b3942a6c sp 00007f10b0ab1d50 error 4
[86605.919733] daqd[18429]: segfault at 38b42ae90 ip 00007f11b3942a6c sp 00007f10b10c1d50 error 4 in libc-2.10.1.so[7f11b390e000+14c000]
[86605.919741] daqd[18440]: segfault at 38b08e480 ip 00007f11b3942a6c sp 00007f10b0cb6d30 error 4 in libc-2.10.1.so[7f11b390e000+14c000]
[86605.958551]  in libc-2.10.1.so[7f11b390e000+14c000] in libc-2.10.1.so[7f11b390e000+14c000]
[86605.958557]
[86605.958577]  in libc-2.10.1.so[7f11b390e000+14c000]
[86605.958586]  in libc-2.10.1.so[7f11b390e000+14c000]
[86605.959639] daqd used greatest stack depth: 3160 bytes left
[98139.100888] show_signal_msg: 13 callbacks suppressed
[98139.100895] daqd[23753]: segfault at 39c7363b0 ip 00007f5bf253ba6c sp 00007f5b69b48d30 error 4 in libc-2.10.1.so[7f5bf2507000+14c000]
[98687.815120] daqd used greatest stack depth: 2984 bytes left
[208995.594227] daqd[10386] general protection ip:7f3b7c930a6c sp:7f3a79f09d50 error:0 in libc-2.10.1.so[7f3b7c8fc000+14c000]
[353015.067479] daqd used greatest stack depth: 2880 bytes left
[367406.863618] daqd[13078]: segfault at 41 ip 0000000000000041 sp 00007fb1f0ba2cf8 error 14 in daqd[400000+7c000]
[367406.863833] daqd[13104] general protection ip:7fb2f3018a6c sp:7fb1f01c8d30 error:0
[367406.863877] daqd[13086] general protection ip:7fb2f3018a6c sp:7fb1f089ad30 error:0
[367406.877408] daqd[13080]: segfault at 41 ip 0000000000000041 sp 00007fb1f0ae0ca8 error 14 in daqd[400000+7c000]
[367406.877435]  in libc-2.10.1.so[7fb2f2fe4000+14c000]
[367406.877442] daqd[13100]: segfault at 39ba287b0 ip 00007fb2f3018a6c sp 00007fb1f034cd30 error 4 in libc-2.10.1.so[7fb2f2fe4000+14c000]
[367406.878372]  in libc-2.10.1.so[7fb2f2fe4000+14c000]
[399802.887523] daqd[18295] general protection ip:7fb056a71a6c sp:7faf96125f10 error:0 in libc-2.10.1.so[7fb056a3d000+14c000]
[410595.969327] daqd[22057]: segfault at 3a91f27b0 ip 00007f48e96eea6c sp 00007f47e6c26d50 error 4 in libc-2.10.1.so[7f48e96ba000+14c000]
[410595.988926] daqd[22068]: segfault at 3a91f2790 ip 00007f48e96eea6c sp 00007f47e681bd30 error 4 in libc-2.10.1.so[7f48e96ba000+14c000]
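
One way to read a dump like the one above is to subtract the library's load address from the faulting instruction pointer, which shows whether the crashes land on the same instruction each time. The following is only a minimal sketch: the function name `fault_offsets` and the regular expression are illustrative assumptions, not part of any CDS tooling, and the pattern only handles the two kernel message shapes seen in this dump.

```python
import re

# Matches kernel messages of the two shapes seen in the dump, e.g.
#   daqd[18443]: segfault at 38aee3db0 ip 00007f11b3942a6c sp ... error 4 in libc-2.10.1.so[7f11b390e000+14c000]
#   daqd[10386] general protection ip:7f3b7c930a6c sp:... error:0 in libc-2.10.1.so[7f3b7c8fc000+14c000]
# Lines without a trailing "in <object>[base+size]" part are not matched.
FAULT_RE = re.compile(
    r"daqd\[(?P<pid>\d+)\].*?ip[: ](?P<ip>[0-9a-f]+).*?"
    r"in (?P<obj>[\w.\-]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offsets(lines):
    """Return (pid, object, offset) for faults whose ip lies inside the mapped object.

    fault_offsets is a hypothetical helper name used only for this sketch.
    """
    results = []
    for line in lines:
        m = FAULT_RE.search(line)
        if m is None:
            continue
        ip = int(m["ip"], 16)
        base = int(m["base"], 16)
        size = int(m["size"], 16)
        # Skip entries where ip is outside the mapping (e.g. ip 0x41 in daqd[400000+7c000]).
        if base <= ip < base + size:
            results.append((int(m["pid"]), m["obj"], hex(ip - base)))
    return results


if __name__ == "__main__":
    sample = [
        "[86605.919615] daqd[18443]: segfault at 38aee3db0 ip 00007f11b3942a6c sp 00007f10b0b73d50 error 4 in libc-2.10.1.so[7f11b390e000+14c000]",
        "[208995.594227] daqd[10386] general protection ip:7f3b7c930a6c sp:7f3a79f09d50 error:0 in libc-2.10.1.so[7f3b7c8fc000+14c000]",
        "[367406.863618] daqd[13078]: segfault at 41 ip 0000000000000041 sp 00007fb1f0ba2cf8 error 14 in daqd[400000+7c000]",
    ]
    for pid, obj, offset in fault_offsets(sample):
        print(pid, obj, offset)
```

Applied to the complete lines in this dump, the libc faults all resolve to the same offset, 0x34a6c, even though the library is loaded at a different base address after each restart. The offset alone does not identify the failing libc function; that would need the matching libc-2.10.1 binary and its symbols.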
