Entry  Fri Jun 5 02:59:03 2009, pete, alberto, Update, Locking, tdsavg failure in cm_step script 
    Reply  Fri Jun 5 13:51:49 2009, rob, Update, Locking, tdsavg failure in cm_step script 
       Reply  Fri Jun 5 16:45:28 2009, rob, pete, HowTo, Computers, tdsavg failure in cm_step script 
Message ID: 1657     Entry time: Fri Jun 5 16:45:28 2009     In reply to: 1656
Author: rob, pete 
Type: HowTo 
Category: Computers 
Subject: tdsavg failure in cm_step script 



The command

tdsavg 5 C1:LSC-PD4_DC_IN1

was causing grievous woe in the cm_step script.  It turned out to fail intermittently at the command line as well, and so did tdsavg on other LSC channels.  (But non-LSC channels seem to be OK.)  So we power cycled c1lsc, since we couldn't ssh into it.

Then we noticed that the computers were out of sync again (several timing fields read 16383 in the C0DAQ_RFMNETWORK screen).  We restarted c1iscey, c1iscex, c1lsc, c1susvme1, and c1susvme2, and the timing fields went back to 0.  But the tdsavg command still intermittently said "ERROR: LDAQ - SendRequest - bad NDS status: 13".

The channel C1:LSC-SRM_OUT16 seems to work with tdsavg every time.
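Since the failure is intermittent, it may help to count how often tdsavg fails per channel rather than judging from a few tries at the command line. Here is a rough Python sketch. It assumes tdsavg is on the PATH and that a failed call either returns a non-zero exit status or prints the "bad NDS status" text (we have not checked which). The names run_tdsavg and count_failures are made up for this example.

```python
import subprocess

NDS_ERROR = "bad NDS status"  # substring of the error message quoted above


def run_tdsavg(channel, seconds=5):
    """Run `tdsavg <seconds> <channel>`; return (exit status, combined output)."""
    proc = subprocess.run(
        ["tdsavg", str(seconds), channel],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # assumption: the error may land on either stream
        text=True,
    )
    return proc.returncode, proc.stdout


def count_failures(channel, tries=20, run=run_tdsavg):
    """Return how many of `tries` calls failed for `channel`.

    A call counts as failed if the exit status is non-zero or the
    output contains the NDS error text (both are assumptions).
    """
    failures = 0
    for _ in range(tries):
        status, output = run(channel)
        if status != 0 or NDS_ERROR in output:
            failures += 1
    return failures
```

Comparing count_failures("C1:LSC-PD4_DC_IN1") with count_failures("C1:LSC-SRM_OUT16") before and after each attempted fix would show whether the fix actually changed anything. The run argument is only there so the counting logic can be tried without a live NDS.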

Let us know if you know how to fix this. 


 Did you try restarting the framebuilder?


What you type follows the prompt:

op440m> telnet fb40m 8087

daqd> shutdown


Restarting the framebuilder didn't work, but the problem now appears to be fixed.

Upon reflection, we also decided to try killing all open DTT and Dataviewer windows.  This involved liberal use of ps -ef to seek out and destroy all diag, dc3, and framer4 processes, etc.
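The ps -ef hunt could be sketched in Python roughly as follows. This only lists candidates and does not kill anything. It assumes the usual ps -ef layout with the PID in the second column, and it matches a process if any word after the PID has a basename of diag, dc3, or framer4, so it can over-match (for example, an editor opened on a file called diag). The function name find_stale is made up for this example.

```python
import os

STALE_NAMES = ("diag", "dc3", "framer4")  # process names mentioned in this entry


def find_stale(ps_output, names=STALE_NAMES):
    """Return (pid, matched_name) pairs found in the text output of `ps -ef`."""
    found = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header row and anything without a numeric PID in column 2.
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        # Look at every word after the PID; this is deliberately loose because
        # the start-time column can contain a space on some systems.
        for token in fields[2:]:
            base = os.path.basename(token)
            if base in names:
                found.append((int(fields[1]), base))
                break
    return found
```

Feeding it the captured output of ps -ef gives a list to look over before deciding what to kill by hand.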


That may have worked, but we killed the tpman process on fb40m at the same time, so we can't be sure which was the actual fix.


To restart the testpoint manager:

What you type follows the prompt:

rosalba> ssh fb40m

fb40m~> pkill tpman

The tpman is actually immortal, like Voldemort or the Kurgan or the Cylons in the new BG.  Truly slaying it requires special magic, so the pkill tpman command has the effect of restarting it.
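To check that tpman really did come back, one could compare its PID before and after the pkill. Below is a rough Python sketch, assuming passwordless ssh to fb40m and that pgrep is available there (not verified). The names ssh_run and restart_tpman, the wait time, and the retry count are all made up for this example.

```python
import subprocess
import time


def ssh_run(host, *cmd):
    """Run a command on `host` over ssh; return (exit status, stdout)."""
    proc = subprocess.run(["ssh", host, *cmd], capture_output=True, text=True)
    return proc.returncode, proc.stdout


def restart_tpman(run=ssh_run, host="fb40m", wait_s=2.0, tries=5, sleep=time.sleep):
    """pkill tpman on `host`, then poll pgrep until a different tpman PID shows up.

    Returns the list of new PIDs (as strings), or [] if none appeared in time.
    """
    _, before = run(host, "pgrep", "tpman")
    run(host, "pkill", "tpman")
    for _ in range(tries):
        sleep(wait_s)
        status, after = run(host, "pgrep", "tpman")
        pids = after.split()
        if status == 0 and pids and set(pids) != set(before.split()):
            return pids
    return []
```

An empty return value would mean tpman did not reappear within the polling window, which would be worth noting in the log.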


In the future, we should make it a matter of policy to close DTTs and Dataviewers when we're done using them, and to kill any unattended ones that we encounter.

