[Joe,Alex]
Alex came over and we installed the new Dolphin drivers so that the front ends using the Dolphin PCIe RFM network no longer pause for a long time when one of the other nodes in the network goes down. Previously this pause would generally cause the front-end code to time out and quit. Now you can take c1lsc or c1sus down without the other having problems.
We did note on reboot, however, that the dolphin_wait script sometimes (not always) seems to hang. This script is run at boot to ensure the Dolphin card has had enough time to allocate the memory space that the IOP process reads from and writes to, so if it hangs, nothing else in the startup script gets run. In this case, running "pkill dolphin_wait" may be necessary.
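One possible way to keep a hung dolphin_wait from blocking the rest of the startup script is to run it under a time limit. The sketch below is not what is installed; it assumes the coreutils "timeout" command is available on the front ends, and the function name run_with_timeout and the 60 s limit in the example are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of the installed startup script.
# Usage: run_with_timeout SECONDS CMD [ARGS...]
# Assumes coreutils `timeout` exists; it exits with status 124 when the
# time limit is hit and the command had to be killed.
run_with_timeout() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "warning: '$1' did not finish within ${limit}s; continuing startup" >&2
    fi
    return "$status"
}

# Example call (limit is a placeholder; assumes dolphin_wait is on PATH):
# run_with_timeout 60 dolphin_wait
```

Whether it is safe to continue booting after a timeout is a separate question: if the card really has not finished allocating memory, the IOP may fail anyway, so the warning is the minimum one would want in the log.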
Note that you may still have problems if you hit the power button to force a shutdown (i.e. holding it for 4 seconds for immediate power off), but as long as you do a "reboot" or "shutdown -r now" type command, it should come down gracefully.
What was done:
Alex grabbed the code from his server and put it in /home/controls/DIS/ on fb.
He ran the following commands in that directory to build the code.
./configure '--with-adapter=DX' '--prefix=/opt/DIS'
make
sudo make install
He proceeded to modify the /diskless/root/etc/rc.local to have the line:
insmod /lib/modules/2.6.34.1/kernel/drivers/dis/dis_kosf.ko
In that same file he commented out
cd /root
and
exec /bin/bash
He then modified the run levels in /diskless/root/etc/inittab. Level 0, level 3, and level 6 were changed:
l0:0:wait:/etc/rc.halt
l3:3:wait:/etc/rc.level3
l6:6:wait:/etc/rc.reboot
Then he created the scripts referred to in those lines:
rc.level3 is just:
exec /bin/bash
rc.halt is:
/opt/DIS/sbin/dxtool prepare-shutdown 0
sleep 3
halt -p
rc.reboot is:
reboot
Basically, rc.halt calls a special command (dxtool prepare-shutdown) which prepares the Dolphin RFM card to shut down nicely. A forced power-off never runs this script, which is why just holding the power button for 4 seconds will cause problems for the rest of the Dolphin network.
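For reference, the prepare-then-wait step in rc.halt could be wrapped with a guard so that a missing dxtool is reported rather than silently skipped. This is only a sketch: the function name prepare_dolphin_shutdown and the DXTOOL and DOLPHIN_SETTLE variables are invented here, while the dxtool path, the "prepare-shutdown 0" arguments, and the 3 second sleep are taken from the rc.halt above.

```shell
#!/bin/sh
# Hypothetical wrapper around the Dolphin step of rc.halt.
# DXTOOL and DOLPHIN_SETTLE are illustrative overrides, not existing settings.
prepare_dolphin_shutdown() {
    dxtool=${DXTOOL:-/opt/DIS/sbin/dxtool}
    if [ -x "$dxtool" ]; then
        # Tell the Dolphin adapter that this node is about to go away.
        "$dxtool" prepare-shutdown 0
        # Give the card a moment before power is cut (3 s as in rc.halt).
        sleep "${DOLPHIN_SETTLE:-3}"
    else
        echo "warning: $dxtool not found; Dolphin card not prepared" >&2
        return 1
    fi
}

# In rc.halt this would be followed by:  halt -p
```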
We then checked out of svn the latest dolphin.c in /opt/rtcds/caltech/c1/core/advLigoRTS/src/fe
The Dolphin RFM cards have a new numbering scheme. 4 is reserved for special broadcasts to everyone, so the Dolphin node IDs now start at 8. So we needed to change the c1lsc and c1sus Dolphin node IDs.
To change them we went to /etc/dis/dishosts.conf on the fb machine, and changed the following lines:
HOSTNAME: c1sus
ADAPTER: c1sus_a0 4 0 4
HOSTNAME: c1lsc
ADAPTER: c1lsc_a0 8 0 4
to
HOSTNAME: c1sus
ADAPTER: c1sus_a0 8 0 4
HOSTNAME: c1lsc
ADAPTER: c1lsc_a0 12 0 4
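A quick sanity check of the node IDs after editing dishosts.conf might look like the sketch below. It assumes the field after the adapter name on each ADAPTER line is the node ID (as in the lines above) and only checks the two things noted in this entry: IDs start at 8 and should not collide. The function name check_dishosts is made up for illustration.

```shell
#!/bin/sh
# Hypothetical checker for Dolphin node IDs in a dishosts.conf-style file.
# Assumed line layout:  ADAPTER: <adapter name> <node id> <other fields...>
check_dishosts() {
    awk '
        BEGIN { bad = 0 }
        $1 == "ADAPTER:" {
            name = $2
            id = $3 + 0
            if (id < 8) {
                printf "BAD %s: node ID %d is below 8\n", name, id
                bad = 1
            }
            if (id in seen) {
                printf "BAD %s: node ID %d already used by %s\n", name, id, seen[id]
                bad = 1
            } else {
                seen[id] = name
            }
        }
        END { exit bad }
    ' "$1"
}

# Example: check_dishosts /etc/dis/dishosts.conf && echo "node IDs look OK"
```

The function prints one line per problem and returns non-zero if any were found, so it can be chained in front of the model rebuild.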
The FE models for the c1lsc and c1sus machines were recompiled and then the computers were rebooted. After they came back up, we tested that there was no timeout by shutting down c1lsc and watching c1sus. We then reversed the roles and shut down c1sus while watching c1lsc. No problems occurred. Currently they are up and communicating fine.