Message ID: 13117     Entry time: Fri Jul 14 17:47:03 2017     In reply to: 13115
Author: gautam 
Type: Update 
Category: General 
Subject: Disks from LLO have arrived 

[jamie, gautam]

The disks from LLO arrived this morning. Jamie and I have been trying to get things back up and running, but have not had much success today. Here is a summary of what we tried.

Keith Thorne sent us two disks: one has the daqd code and the second is the boot disk for the FE machines. Since Jamie managed to successfully compile the daqd code on FB1 yesterday, we decided to try the following: mount the boot disk KT sent us (using a SATA/USB adapter) on /mnt on FB1, get the FEs booted up, and restart the RT models. 

Quote:

I just want to mention that the situation is actually much more dire than we originally thought.  The diskless NFS root filesystem for all the front-ends was on that fb disk.  If we can't recover it we'll have to rebuild the front end OS as well.

As of right now none of the front ends are accessible, since obviously their root filesystem has disappeared.

While working on FB1, Jamie realized that he actually had a copy of the /diskless/root directory, which is the NFS root filesystem for the FEs, on that machine. So we decided to try to boot some of the FEs with this copy (instead of starting from scratch with the disks KT sent us). The way things were set up, the FEs were querying the FB machine as the DHCP server. Today, we followed the instructions here to have the FEs get their IP addresses from chiara instead. We also added the line

/diskless/root *(sync,rw,no_root_squash,no_all_squash,no_subtree_check)

to /etc/exports on FB1 and then ran exportfs -ra, at which point the FE machine we were testing (c1lsc) was able to boot up.
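For reference, the export step can be sketched as a small shell helper that appends the line only if it is not already present, so that rerunning it does not duplicate the entry. This is a minimal sketch, not what we actually typed: the function name add_export is made up for illustration, and the file path is passed in as an argument so the helper can be tried on a scratch file first.

```shell
# Hypothetical helper: add the diskless-root export line to an exports file
# unless an identical line is already there.
add_export() {
    exports_file="$1"
    line='/diskless/root *(sync,rw,no_root_squash,no_all_squash,no_subtree_check)'
    # grep -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -Fxq "$line" "$exports_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$exports_file"
    fi
}

# Intended use on FB1 (as root), left commented out here:
#   add_export /etc/exports
#   exportfs -ra    # re-export everything listed in /etc/exports
#   exportfs -v     # list the active exports and their options
```

Running exportfs -v afterwards is a quick way to confirm that /diskless/root shows up with the expected options.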

However, it looks like the NFS filesystem isn't being mounted correctly, for reasons unknown. We commented out some of the rtcds-related lines in /etc/rc.local because they were causing a whole bunch of errors at boot (the lines that were touched have been tagged with today's date).


So in summary, the status as of now is:

  1. Front-end machines are able to boot
  2. There seems to be some problem during the boot process, leading to the NFS file system not being correctly mounted. The closest related thing I could find from an elog search is this entry, but I think we are facing a different problem.
  3. We wanted to see if we could start the realtime models (but without daqd for now), but we weren't even able to get that far today.
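
One way to check item 2 on a booted FE is to look for the expected mount point in /proc/mounts and confirm that its filesystem type is NFS. The helper below is only a sketch under assumptions: the name is_nfs_mounted is made up, and which mount points should be checked on the FEs is not established in this entry, so the paths in the usage comments are placeholders.

```shell
# Hypothetical helper: succeed if the given mount point appears in a
# /proc/mounts-style file with a filesystem type starting with "nfs".
is_nfs_mounted() {
    mount_point="$1"
    mounts_file="${2:-/proc/mounts}"
    # Fields in /proc/mounts: device, mount point, fstype, options, dump, pass
    awk -v mp="$mount_point" \
        '$2 == mp && $3 ~ /^nfs/ { found = 1 } END { exit !found }' \
        "$mounts_file"
}

# Intended use on an FE such as c1lsc (paths are placeholders):
#   is_nfs_mounted / && echo "root is on NFS"
#   is_nfs_mounted /some/expected/mount || echo "not mounted over NFS"
```

The second argument lets the check be run against a saved copy of /proc/mounts from a machine that cannot be logged into interactively.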

We will resume recovery efforts on Monday.
