40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log  Not logged in ELOG logo
Message ID: 4270     Entry time: Thu Feb 10 14:07:18 2011
Author: josephb 
Type: Update 
Category: CDS 
Subject: Updating dolphin drivers to eliminate timeouts when one dolphin card is shutdown 

[Joe,Alex]

Alex came over and we installed the new Dolphin drivers so that the front ends using the Dolphin PCIe RFM network don't pause for a long time when one of the other nodes in the network go down.  Generally this pause would cause the code to time out and quit.  Now you can take c1lsc or c1sus down without having the other have problems.

We did note on reboot however, that the Dolphin_wait script sometimes (not always) seems to hang.  Since this is run at boot up, to ensure the dolphin card has had enough to allocate memory space for data to be written/read from by the IOP process, it means nothing else in the startup script gets run if it does happen.  In this case, running "pkill dolphin_wait" may be necessary.

Note that you may still have problems if you hit the power button to force a shutdown (i.e. holding it for 4 seconds for immediate power off), but as long as you do a "reboot" or "shutdown -r now" type command, it should come down gracefully. 

 

What was done:

Alex grabbed the code from his server, and put it /home/controls/DIS/ on fb.

He ran the following commands in that directory to build the code.

./configure  '--with-adapter=DX' '--prefix=/opt/DIS'

make

sudo make install

He proceeded to modify the /diskless/root/etc/rc.local to have the line:

insmod /lib/modules/2.6.34.1/kernel/drivers/dis/dis_kosf.ko

In that same file he commented out

cd /root

and

exec /bin/bash/

He then modified the run levels in /diskless/root/etc/inittab. Level 0, level 3, and level 6 were changed:

l0:0:wait/etc/rc.halt

l3:3:wait:etc/rc.level3

l6:6:wait:/etc/rc.reboot

Then he created the scripts he was refering to:

rc.level3 is just:

exec /bin/bash

rc.halt is:

/opt/DIS/sbin/dxtool prepare-shutdown 0
sleep 3
halt -p

rc.reboot is:

reboot

 

Basically rc.halt calls a special code which prepares the Dolphin RFM card to shutdown nicely.  This is why just hitting the power button for 4 seconds will cause problems for the rest of the dolphin network.

We then checked out of svn the latest dolphin.c in  /opt/rtcds/caltech/c1/core/advLigoRTS/src/fe

 

The Dolphin RFM cards have a new numbering scheme.  4 is reserved for special  broadcasts to everyone, so the Dolphin node IDs now start at 8.  So we needed to change the c1lsc and c1sus Dolphin node IDs.

To change them we went to /etc/dis/dishosts.conf on the fb machine, and changed the following lines:

 

HOSTNAME: c1sus
ADAPTER:  c1sus_a0 4 0 4
HOSTNAME: c1lsc
ADAPTER:  c1lsc_a0 8 0 4

to

HOSTNAME: c1sus
ADAPTER:  c1sus_a0 8 0 4
HOSTNAME: c1lsc
ADAPTER:  c1lsc_a0 12 0 4

The FE models for the c1lsc and c1sus machines were recompiled and then the computers were rebooted.  After having them come back up, we tested that there was no time out by shutting down c1lsc and watching c1sus. We then reveresed and shutdown c1sus while watching c1lsc.  No problems occured.  Currently they are up and communicating fine.

 

ELOG V3.1.3-