40m Log
Message ID: 476     Entry time: Wed May 14 13:14:19 2008
Author: Andrey 
Type: Summary 
Category: Computers 
Subject: Reflective Memory Network is restored 

The Reflective Memory Network is restored; all watchdogs and oplevs have been returned to the "enabled" state.

In order to revive the computers, several things were done.

1) Following Mr. Adhikari's elog entry #353, I walked around the interferometer room and switched off the power keys on all crates containing computers whose names appear on the MEDM Reflective Memory screen, including the rack with the framebuilder. By the way, it was nontrivial to find the switch on the 1Y4 crate that powers the processors "c1susvme1" and "c1susvme2" off and on: it turned out to be located on the rear side of the crate, and it is a button, not a key.

2) I tried to follow the wiki-40 computer restart procedures, but every time I tried to run "startup.cmd" from the corresponding target subdirectory, I got the error message "Device or resource busy".
By the way, one more thing was learned: if you first open burtgooey in a terminal, select the snap file, then reboot the processor, and then try to burt-restore it, you will get the message "Status Not OK". To actually burt-restore a processor that was recently rebooted, you need to close the terminal running burtgooey and start burtgooey in a new terminal window opened after the processor was rebooted.

Seeing that following the wiki-40 procedures was not reviving the computers, I invited Alex Ivanov to help.

3) Alex tried to touch the memory card in "c1iovme" in rack 1Y2, because this card had failed once before and caused network problems, but this did not help.

4) We shut off and restarted (by pressing the power button) the black Linux machine "c1dcuepics", located at the very bottom of the rack, below the framebuilder. Alex says that this machine is responsible for all EPICS. It had not been restarted for 182 days, and probably some process on it had gone wrong.

After restarting "c1dcuepics", we were able to follow the wiki-40 procedures for restarting all the other computers (those whose names are on the MEDM RFM network screen). We ran the corresponding "startup.cmd" files and burt-restored the processors without error messages.

All the computers now work and communicate properly.

Mr. Joseph Betzwiezer helped me with all these activities (we decided that this was more important than the cameras for now); thanks to him. But our joint skills turned out to be insufficient, so Alex Ivanov's contribution was the most important.