Entry  Wed Jul 19 08:37:21 2017, Jamie, Update, CDS, Update on front-end/DAQ rebuild  
    Reply  Wed Jul 19 14:26:50 2017, Jamie, Update, CDS, Update on front-end/DAQ rebuild  
    Reply  Fri Jul 21 18:03:17 2017, Jamie, Update, CDS, Update on front-end/DAQ rebuild  
       Reply  Sun Jul 23 22:16:55 2017, Jamie, gautam, Update, CDS, front-end now running with new OS, RCG 2017-07-23-210810_1394x488_scrot.png 2017-07-23-211812_387x488_scrot.png
          Reply  Mon Jul 24 10:45:23 2017, gautam, Update, CDS, c1iscex models died c1iscexFailure.png
             Reply  Mon Jul 24 10:59:08 2017, Jamie, Update, CDS, c1iscex models died 
          Reply  Mon Jul 24 19:28:55 2017, Jamie, Update, CDS, front end MX stream network working, glitches in c1ioo fixed 48.png
             Reply  Mon Jul 24 19:57:54 2017, gautam, Update, CDS, IMC locked, Autolocker re-enabled 
             Reply  Wed Jul 26 19:13:07 2017, Jamie, Update, CDS, daqd showing same instability as before 
                Reply  Fri Jul 28 20:22:41 2017, Jamie, Update, CDS, possible stable daqd configuration with separate DC and FW 
                   Reply  Mon Jul 31 15:13:24 2017, gautam, Update, CDS, FB ---> FB1 
                   Reply  Mon Jul 31 18:44:40 2017, Jamie, Update, CDS, CDS system essentially fully recovered 02.png
                      Reply  Thu Aug 3 19:46:27 2017, Jamie, Update, CDS, new daqd restart procedure 
                      Reply  Fri Aug 4 09:07:28 2017, rana, Update, CDS, CDS system essentially NOT fully recovered 
                         Reply  Thu Aug 10 14:25:52 2017, gautam, Update, CDS, Slow EPICS channels -> Frames re-enabled 
                            Reply  Fri Aug 11 00:10:03 2017, gautam, Update, CDS, Slow EPICS channels -> Frames re-enabled 
                               Reply  Fri Aug 11 11:14:24 2017, gautam, Update, CDS, Slow EPICS channels -> Frames re-enabled 
                               Reply  Fri Aug 11 18:53:35 2017, gautam, Update, CDS, Slow EPICS channels -> Frames re-enabled 
                      Reply  Fri Aug 11 19:34:49 2017, Jamie, Update, CDS, CDS final bits status update 
Message ID: 13133     Entry time: Sun Jul 23 22:16:55 2017     In reply to: 13130     Reply to this: 13135   13138
Author: Jamie, gautam 
Type: Update 
Category: CDS 
Subject: front-end now running with new OS, RCG 

All front ends and models are (mostly) running now.

All suspensions are damped (see the attached screenshots).

It should be possible at this point to do more recovery, like locking the MC.

Some details on the restore process:

  • all models were recompiled with the new RCG version 3.0.3
  • the new RCG does stricter Simulink drawing checks and was complaining about unterminated outputs in some of the SUS models.  We terminated all the outputs it flagged and saved the models.
  • RCG 3.0 requires a new directory for doing better filter module diagnostics: /opt/rtcds/caltech/c1/chans/tmp
  • had to reset the slow machines c1susaux, c1auxex, c1auxey
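The new diagnostics directory must exist before the models are rebuilt. Below is a minimal Python sketch of that prerequisite check. It assumes the path given above, and that it is acceptable to create the directory if it is missing. The helper name `ensure_chans_tmp` is hypothetical and is not part of the RCG tooling.

```python
import os

# Directory RCG 3.0 needs for filter module diagnostics (path from this entry).
CHANS_TMP = "/opt/rtcds/caltech/c1/chans/tmp"


def ensure_chans_tmp(path: str = CHANS_TMP) -> bool:
    """Create the diagnostics directory if it does not exist yet.

    Hypothetical helper, not an RCG command.
    Returns True if the directory was created, False if it was already there.
    """
    if os.path.isdir(path):
        return False
    os.makedirs(path, exist_ok=True)
    return True
```

Calling `ensure_chans_tmp()` once before recompiling is harmless on a machine where the directory already exists, since it then does nothing.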

The daqd is not yet running.  This is the next task.

I have been taking copious notes and will fully document the restore process once complete.

c1ioo issues

c1ioo has been giving us a little bit of trouble.  The c1ioo model kept crashing and taking down the whole c1ioo host.  We found a red light on one of the ADCs (ADC1).  We pulled the card and replaced it with a spare from the CDS cabinet.  That seemed to fix the problem and c1ioo became more stable.

We've still been seeing a lot of glitching in c1ioo, though, with CPU cycle times frequently (every couple of seconds) exceeding the threshold for all models, reaching up to 200 µs.  I tried unloading every kernel module I could and shutting down every non-critical process, but nothing seemed to help.

We eventually tried stopping the c1ioo model altogether, and that seemed to help quite a bit, reducing the long cycles to roughly one every 30 seconds.  Not sure what that means.  We should look at the BIOS settings again to see whether something there is interacting with the newer kernel.

So currently the c1ioo model is not running (which is why it's all white in the CDS overview snapshot above).  The fact that c1ioo is not running and the remaining models are still occasionally glitching is also causing various IPC errors on auxiliary models (see c1mcs, c1rfm, c1ass, c1asx).
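The glitch rates described above are easier to compare as a single number. With c1ioo running, long cycles came every couple of seconds; with it stopped, roughly one every 30 seconds. Below is a minimal Python sketch that computes that number, under a few assumptions:

* The per-cycle CPU times have already been exported as (time in s, cycle time in µs) pairs.
* The record format and function names are hypothetical and not part of the CDS tools.
* The threshold is left to the caller. For a model running at 16384 Hz, the cycle period is about 61 µs.

```python
from typing import Iterable, List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in s, cycle time in µs)


def long_cycle_times(samples: Iterable[Sample], threshold_us: float) -> List[float]:
    """Return the timestamps of cycles whose CPU time exceeded threshold_us."""
    return [t for t, cycle_us in samples if cycle_us > threshold_us]


def mean_glitch_interval(samples: Iterable[Sample],
                         threshold_us: float) -> Optional[float]:
    """Mean time in seconds between successive long cycles.

    Returns None when fewer than two long cycles are present,
    since no interval can be formed.
    """
    times = sorted(long_cycle_times(samples, threshold_us))
    if len(times) < 2:
        return None
    # The mean of consecutive gaps equals the total span divided by the number of gaps.
    return (times[-1] - times[0]) / (len(times) - 1)
```

Running this on data taken with and without c1ioo loaded would turn the qualitative observation above into a number that can be tracked while the BIOS settings are investigated.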

RCG compile warnings

The new RCG tries to do more checks on custom C code, but it seems to be having trouble finding our custom "ccodeio.h" files that live alongside the C definitions in USERAPPS/*/common/src/.  It's unclear why yet.  This is causing the RCG to spit out warnings like the following:

Cannot verify the number of ins/outs for C function BLRMS.
    File is /opt/rtcds/userapps/release/cds/c1/src/BLRMSFILTER.c
    Please add file and function to CDS_SRC or CDS_IFO_SRC ccodeio.h file.

These are just warnings and will not prevent the model from compiling or running.  We'll figure out what the problem is to make them go away, but they can be ignored for the time being.
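Until the ccodeio.h lookup is fixed, it can help to pull the affected functions out of a long compile log, so they can be checked against the ccodeio.h files by hand. Below is a minimal Python sketch that matches the warning text quoted above. It assumes each warning keeps exactly that shape: "Cannot verify ... C function NAME." followed by "File is PATH". The function name `unverified_c_functions` is hypothetical.

```python
import re
from typing import List, Tuple

# Matches the RCG warning format quoted in this entry, e.g.
#   Cannot verify the number of ins/outs for C function BLRMS.
#       File is /opt/rtcds/userapps/release/cds/c1/src/BLRMSFILTER.c
_WARNING = re.compile(
    r"Cannot verify the number of ins/outs for C function (\w+)\.\s+File is (\S+)"
)


def unverified_c_functions(log_text: str) -> List[Tuple[str, str]]:
    """Return unique (function name, source file) pairs in first-seen order."""
    seen = set()
    result = []
    for name, path in _WARNING.findall(log_text):
        if (name, path) not in seen:
            seen.add((name, path))
            result.append((name, path))
    return result
```

Each returned pair can then be checked against the ccodeio.h files under USERAPPS/*/common/src/ to see whether the function is listed there.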

model unload instability

Probably the worst problem we're facing right now is an instability that will occasionally, but not always, cause the entire front-end host to freeze up upon unloading an RTS kernel module.  This is a known issue with the newer Linux kernels (we're using kernel version 3.2.35) and is being looked into.

This is particularly annoying with the machines on the Dolphin network, since if one of the Dolphin hosts goes down it manages to crash all the models reading from the Dolphin network.  Since half the time they can't be cleanly restarted, this tends to cause a boot fest with c1sus, c1lsc, and c1ioo.  If this happens, just restart those machines, wait until they've all fully booted, then restart all the models on all hosts with "rtcds start all".
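The recovery order described above can be sketched in Python: restart every Dolphin host, wait until all of them are fully up, and only then start the models. This illustrates the ordering only. Note the following:

* `restart`, `is_up` and `run` are hypothetical callables supplied by the caller (for example, thin wrappers around ssh).
* The function names and the timeout values are assumptions.
* Only the `rtcds start all` command comes from this entry.
* Since the entry says to start the models on all hosts, `model_hosts` would normally list every front end, not just the Dolphin ones.

```python
import time
from typing import Callable, Iterable, Optional, Sequence

# Dolphin-network front ends named in this entry.
DOLPHIN_HOSTS = ("c1sus", "c1lsc", "c1ioo")


def wait_until_all_up(hosts: Iterable[str],
                      is_up: Callable[[str], bool],
                      timeout_s: float = 600.0,
                      poll_s: float = 5.0) -> bool:
    """Poll is_up() until every host reports up; return False on timeout."""
    pending = set(hosts)
    deadline = time.monotonic() + timeout_s
    while True:
        # Drop hosts that have finished booting; keep polling the rest.
        pending = {h for h in pending if not is_up(h)}
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def recover_dolphin_hosts(restart: Callable[[str], None],
                          is_up: Callable[[str], bool],
                          run: Callable[[str, str], None],
                          hosts: Sequence[str] = DOLPHIN_HOSTS,
                          model_hosts: Optional[Sequence[str]] = None,
                          timeout_s: float = 600.0,
                          poll_s: float = 5.0) -> bool:
    """Restart the Dolphin hosts, wait for all of them, then start the models.

    Returns False, and starts no models, if any host fails to come back in time.
    """
    for host in hosts:
        restart(host)
    # Start models only once every Dolphin host is up: per this entry, a
    # Dolphin host going down crashes the models reading from that network.
    if not wait_until_all_up(hosts, is_up, timeout_s, poll_s):
        return False
    for host in (model_hosts if model_hosts is not None else hosts):
        run(host, "rtcds start all")
    return True
```

Keeping the wait as a separate step makes the key rule explicit: no `rtcds start all` is issued while any Dolphin host is still booting.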
