40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log  Not logged in ELOG logo
Entry  Tue Aug 14 11:39:13 2012, Jamie, Rolf, Alex, Update, CDS, Debugging of c1sus machine and c1rfm models 
    Reply  Tue Aug 14 11:49:15 2012, Den, Update, CDS, Debugging of c1sus machine and c1rfm models 
    Reply  Tue Aug 14 17:47:44 2012, Jamie, Update, CDS, c1sus machine replaced 
Message ID: 7182     Entry time: Tue Aug 14 17:47:44 2012     In reply to: 7174
Author: Jamie 
Type: Update 
Category: CDS 
Subject: c1sus machine replaced 

Rolf and Alex came back over with a replacement machine for c1sus.   We removed the old machine, removed it's timing, dolphin, and PCIe extension cards and put them in the new machine.  We then installed the new machine and booted it and it came up fine.  The BIOS in this machine is slightly different, and it wasn't having the same failure-to-boot-with-no-COM issue that the previous one was.  The COM ports are turned off on this machine (as is the USB interface).

Unfortunately the problem we were experiencing with the old machine, that unloading certain models was causing others to twitch and that dolphin IPC writes were being dropped, is still there.  So the problem doesn't seem to have anything to do with hardware settings...

After some playing, Rolf and Alex determined that for some reason the c1rfm model is coming up in a strange state when started during boot.  It runs faster, but the IPC errors are there.  If instead all models are stopped, the c1rfm model is started first, and then the rest of the models are started, the c1rfm model runs ok.  They don't have an explanation for this, and I'm not sure how we can work around it other than knowing the problem is there and do manual restarts after boot.  I'll try to think of something more robust.

A better "fix" to the problems is to clean up all of our IPC routing, a bunch of which we're currently doing very inefficient right now.  We're routing things through c1rfm that don't need to be, which is introducing delays.  It particular, things that can communicate directly over RFM or dolphin should just do so.  We should also figure out if we can put the c1oaf and c1pem models on the same machine, so that they can communicate directly over shared memory (SHMEM).  That should cut down on overhead quite a bit.  I'll start to look at a plan to do that.


ELOG V3.1.3-