Entry  Fri Dec 19 18:08:46 2014, Jenne, Update, CDS, SOS!!! HELP!! EPICS freeze 45min+ so far! 
    Reply  Fri Dec 19 19:21:04 2014, diego, Update, CDS, SOS!!! HELP!! EPICS freeze 45min+ so far! 
       Reply  Fri Dec 19 20:32:11 2014, diego, Update, CDS, SOS!!! HELP!! EPICS freeze 45min+ so far! 
Message ID: 10822     Entry time: Fri Dec 19 19:21:04 2014     In reply to: 10821     Reply to this: 10823
Author: diego 
Type: Update 
Category: CDS 
Subject: SOS!!! HELP!! EPICS freeze 45min+ so far! 

Quote:

[Jenne, Diego]

The EPICS freeze that we had noticed a few weeks ago (and several times since) has happened again, but this time it has not come back on its own.  It has been down for almost an hour so far. 

So far, we have reset the Martian network's switch that is in the rack by the printer.  We have also power cycled the NAT router.  We have moved the NAT router from the old GC network switch to the new, faster switch, and reset the Martian network's switch again after that.

We have reset the network switch that is in 1X6.

We have reset what we think is the DAQ network switch at the very top of 1X7.

So far, nothing is working.  EPICS is still frozen, we can't ping any computers from the control room, and new terminal windows don't give a prompt (so perhaps we can't mount the NFS share, which is required for the .bashrc).

We need help please!

[EricQ]

 

EricQ suggested it may be an NFS-related issue: if some machine (perhaps one of the control room computers) is putting too heavy a load on chiara, every other machine that accesses chiara will slow down, and this could escalate into the Big Bad Freeze. Indeed, chiara's dmesg showed its eth0 interface being brought up over and over, as if something kept making it go down. In any case, we shut down all the computers in the control room and then rebooted chiara, megatron and the fb.
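To put a number on the eth0 flapping seen in chiara's dmesg, a small script could count link up/down messages for one interface. This is a minimal sketch, not something that was run during this incident: `count_link_events` is a hypothetical helper, and it assumes the kernel prints lines containing the interface name and the phrase "Link is Up" / "Link is Down". The exact wording depends on the NIC driver, so the pattern may need adjusting for chiara's hardware.

```python
import re
from collections import Counter

# Assumption: link-state kernel messages contain the interface name followed
# later on the line by "link is up" or "link is down" (case varies by driver).
# Messages such as "eth0: link is not ready" are intentionally not counted.
LINK_RE = re.compile(
    r"\b(?P<iface>eth\d+)\b.*?\blink is (?P<state>up|down)\b",
    re.IGNORECASE,
)


def count_link_events(dmesg_text, iface="eth0"):
    """Return a Counter of 'up'/'down' link events for one interface."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = LINK_RE.search(line)
        if m and m.group("iface") == iface:
            counts[m.group("state").lower()] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample in the style of the e1000e driver; not chiara's log.
    sample = (
        "[  100.1] e1000e: eth0 NIC Link is Down\n"
        "[  101.2] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex\n"
        "[  140.7] e1000e: eth0 NIC Link is Down\n"
        "[  141.9] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex\n"
        "[  142.0] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready\n"
    )
    events = count_link_events(sample)
    print("eth0 up:", events["up"], "down:", events["down"])
```

On the host itself, the text could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`; note that reading the kernel ring buffer may require root on systems with `dmesg_restrict` enabled. A steadily growing "down" count over a few minutes would support the idea that the interface, rather than NFS load alone, is the problem.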

 

[Diego]

Then I rebooted pianosa, and most of the issues seem to be gone so far. I had to run "mxstream restart" on all the frontends from medm, and every one of them except c1scy seems to be behaving properly. I will now bring the other machines back up and see what happens next.
