40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log  Not logged in ELOG logo
Entry  Thu Oct 7 23:24:44 2010, yuta, Update, CDS, checking MC1 suspension damping 
    Reply  Fri Oct 8 12:21:11 2010, josephb, Update, CDS, checking MC1 suspension damping  
Message ID: 3678     Entry time: Fri Oct 8 12:21:11 2010     In reply to: 3675
Author: josephb 
Type: Update 
Category: CDS 
Subject: checking MC1 suspension damping  

Upon investigation, it appears that the c1mcs model was (and still is) timing out after a random amount of time. Or in other words, it at some point it was taking too long to do all the calculations for a single cycle and falling behind. The evidence for this is from the dmesg command when run on c1sus.

There's a bunch of lines like:

[ 8877.438002] c1mcs: cycle 568 time 62; adcWait 0; write1 0; write2 0; longest write2 0

[ 8877.438002] c1mcs: cycle 569 time 62; adcWait 0; write1 0; write2 0; longest write2 0

With a final line like: [ 8877.439003] c1mcs: ADC TIMEOUT 1 2405 37 2277

This last line indicates in fell so far behind it gave up.

However, this doesn't actually seem to be related to the amount of computation going on with the front end. I restarted the c1mcs model this morning by logging into the c1sus machine, and changing to the /opt/rtcds/caltech/c1/target/c1mcs/bin directory and running:

lsmod

sudo rmmod c1mcsfe

sudo insmod c1mcsfe.ko

The first line just lists the running modules. The second removes the c1mcs module, and the third starts it up again.

I proceeded to turn all the filters and and set all the matrix values while keeping an eye on the C1MCS-GDS_TP.adl screen and its timing indicator. It was running fine with all these turned on. Then I ran a dtt session from rosalba (going to /opt/apps/, typing bash, then source gds-env.bash, and finally diaggui) as well as a dataviewer and looked at 6 test point channels. It received data fine.

However, about 2 minutes after I had stopped doing this (although the dataviewer realtime session was still going) the USR timing jumped from about 20 microseconds to 35 microseconds, and the CPU Max timing jumped to the 50-60 microsecond range. At this point dmesg started reporting things like:

[54143.465613] c1mcs: cycle 1076 time 62; adcWait 0; write1 0; write2 0; longest write2 0

[54143.526004] c1mcs: cycle 2064 time 62; adcWait 0; write1 0; write2 0; longest write2 0

About a minute later the code gave up and reported a ADC timeout via dmesg. None of the other front ends seem to be affected.

I had to clear the test points manually after stopping dataviewer and dtt by going to rosalaba,using the sourced gds-env.bash, and running diag -l. I then typed "tp clear 36 *" to clear all the test points on the model with FEC number 36 (corresponding to c1mcs).

I removed and restarted c1mcs again, trying to turn on a few things at a time and letting it run for a few minutes to see if I could narrow down if its one particular filter perhaps reaching an underflow and starting to bog down the computations. However, after about 45 minutes of this, the model is still running and I've turned all the filters on and have been running about 8 test points with no problem, so the problem is clearly intermittent.

Quote:

Notes:
 c1mcs crashed many times during the investigation, and I had to kill and restart it again and again.
 It seemed to be easily crashed when filters are on, and so I couldn't check whether the damping servo is working correctly or not today.

Next work:

  - fix c1mcs (and maybe others)
  - check the damping servo by comparing the displacements of each 4 degrees of freedom when servo in off and on.

 

ELOG V3.1.3-