Message ID: 3796     Entry time: Wed Oct 27 12:32:53 2010
Author: josephb 
Type: Update 
Category: CDS 
Subject: fb rebooted to try and fix testpoints 

Problem:

Test points were unavailable last night, even after reboots of c1sus and even restarting the daqd process on the frame builder.

Cause:

It's unclear at this time.  My guess is flaky fb and mx_stream code.  At the moment, daqd often needs several restarts, because it segfaults within a minute or two of being started.

What we did (aka treating the symptoms):

We rebooted the frame builder machine.  I also added the daqd and nds processes to the inittab.  Now when these die, they will automatically be restarted.

Steps to add to the inittab on fb

0) If not on fb, ssh -X fb

1) cd /etc/

2) sudo vi inittab or sudo emacs inittab

3) Add a line like: id:runlevels:action:process

id is a unique identifier for the entry, 1-4 characters long (letters and numbers).

runlevels lists the Linux run levels in which the process should run. 345 covers the normal cases.

action is what to do with the process. respawn makes it run at startup and also restarts it every time it dies.

process is the command you want to run.

See "man inittab" for more details

In this case we added

daq:345:respawn:/opt/rtcds/caltech/c1/target/fb/daqd -c /opt/rtcds/caltech/c1/target/fb/daqdrc > /opt/rtcds/caltech/c1/target/fb/daqd.log


nds:345:respawn:/opt/rtcds/caltech/c1/target/fb/nds pipe > /opt/rtcds/caltech/c1/target/fb/nds.log

4) Save.

5) Run "sudo /sbin/telinit q".  This forces init to re-examine the inittab file.

daqd and nds will now automatically restart when they die.
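
The edit in steps 2-4 could also be scripted. What follows is only a minimal sketch, not something that was run on fb: add_inittab_entry is a hypothetical helper name, and the check for an existing id is an added safety measure so that re-running it does not create duplicate entries.

```shell
# Sketch: append an inittab entry only if its id is not already in use.
# add_inittab_entry is a hypothetical helper, not part of any existing tooling.
add_inittab_entry() {
    file=$1            # path to the inittab file, e.g. /etc/inittab
    entry=$2           # full line in id:runlevels:action:process form
    id=${entry%%:*}    # everything before the first colon is the id

    # Refuse to add a second entry with the same id.
    if grep -q "^${id}:" "$file"; then
        echo "id '$id' already present in $file" >&2
        return 1
    fi

    # Append the new entry as its own line.
    printf '%s\n' "$entry" >> "$file"
}
```

On fb this would have to run as root against /etc/inittab, once for the daq line and once for the nds line above, followed by "sudo /sbin/telinit q" as in step 5.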

Continuing issues:

When the frame builder dies, the mx_stream processes on the front ends die as well.  These need to be restarted manually at the moment by using "sudo /etc/restart_streams" while on c1sus.
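
Until that is fixed, the manual restart could be wrapped in a small check. This is a rough sketch under assumptions: check_streams is a hypothetical helper, the process is assumed to show up under the exact name "mx_stream", and pgrep is assumed to be installed on the front end.

```shell
# Sketch: restart the streams only if no mx_stream process is found.
# check_streams is a hypothetical helper; the process name "mx_stream"
# and the availability of pgrep are assumptions.
check_streams() {
    # Command used to restart the streams; defaults to the one from this entry.
    restart_cmd=${1:-"sudo /etc/restart_streams"}

    if pgrep -x mx_stream > /dev/null 2>&1; then
        echo "mx_stream running"
    else
        echo "mx_stream not running, restarting"
        $restart_cmd
    fi
}
```

Calling check_streams with no argument on c1sus would run "sudo /etc/restart_streams" only when no mx_stream process is found.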

The framebuilder code shouldn't be this flaky.
