40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log  Not logged in ELOG logo
Entry  Tue Oct 17 23:07:52 2017, gautam, Update, CDS, FEs unresponsive 
    Reply  Wed Oct 18 01:41:32 2017, jamie, Update, CDS, FEs unresponsive 
       Reply  Wed Oct 18 02:09:32 2017, gautam, Update, CDS, FEs unresponsive 
          Reply  Wed Oct 18 09:21:22 2017, jamie, Update, CDS, FEs unresponsive 
             Reply  Wed Oct 18 23:11:53 2017, gautam, Update, CDS, FEs unresponsive 
Message ID: 13394     Entry time: Wed Oct 18 23:11:53 2017     In reply to: 13388
Author: gautam 
Type: Update 
Category: CDS 
Subject: FEs unresponsive 

This happened again just now - it was roughly this time when this happened last night as well.

There was certainly an EPICS freeze of the kind we were used to seeing prior to replacing the martian wireless router sometime in late 2015 (or early 2016?). I was trying to run the dither alignment servos on the Y-arm at this time, and all the StripTool traces flatlined.

I took the opportunity to try accessing testpoints from the iscey ADCs - specifically C1:SUS-TRY_OUT, and it seemed to work just fine. However, I couldn't ssh into c1iscey.

Looking at the dmesg once I was able to ssh in eventually (~2 minutes deadtime tonight, I feel like it was longer yesterday but can't quantify), I see the following: not sure if there are any clues in here, or whether this is the correct log to check. But there are many instances of the nfs server related message in the log. Note that the system time-stamp corresponds to when this freeze happened.

[5461308.784018] nfs: server 192.168.113.201 not responding, still trying
[5461412.936284] nfs: server 192.168.113.201 OK
[5461412.937130] systemd[1]: Starting Journal Service...
[5461412.947947] systemd-journald[20281]: Received SIGTERM from PID 1 (systemd).
[5461412.996063] systemd[1]: Unit systemd-journald.service entered failed state.
[5461413.002627] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
[5461413.008983] systemd[1]: Stopping Journal Service...
[5461413.014664] systemd[1]: Starting Journal Service...
[5461413.044262] systemd[1]: Started Journal Service.
[5461413.694838] systemd-journald[400]: Received request to flush runtime journal from PID 1

 

ELOG V3.1.3-