Entry  Mon Jul 10 09:49:02 2017, gautam, Update, General, All FEs down CDS_down_10Jul2017.png
    Reply  Mon Jul 10 11:20:20 2017, gautam, Update, General, All FEs down 
       Reply  Mon Jul 10 17:46:26 2017, gautam, Update, General, All FEs down 
          Reply  Mon Jul 10 19:15:21 2017, gautam, Update, General, All FEs down 
             Reply  Mon Jul 10 21:03:48 2017, jamie, Update, General, All FEs down 
                Reply  Mon Jul 10 22:07:35 2017, Koji, Update, General, All FEs down 
                   Reply  Tue Jul 11 15:03:55 2017, gautam, Update, General, All FEs down 
                      Reply  Tue Jul 11 15:12:57 2017, Koji, Update, General, All FEs down 
                         Reply  Wed Jul 12 10:21:07 2017, gautam, Update, General, All FEs down 
                            Reply  Wed Jul 12 14:46:09 2017, gautam, Update, General, All FEs down 
                               Reply  Wed Jul 12 14:52:32 2017, jamie, Update, General, All FEs down 
                                  Reply  Fri Jul 14 17:47:03 2017, gautam, Update, General, Disks from LLO have arrived 
Message ID: 13106     Entry time: Mon Jul 10 17:46:26 2017     In reply to: 13104     Reply to this: 13107
Author: gautam 
Type: Update 
Category: General 
Subject: All FEs down 

A bit more digging on the diagnostics page of the RAID array reveals that the two power supplies actually failed on Jun 2 2017 at 10:21:00. Not surprisingly, this was the date and approximate time of the last major power glitch we experienced. Apart from this, the only other error listed on the diagnostics page is "Reading Error" on "IDE CHANNEL 2", but these errors precede the power supply failure.

Perhaps the power supplies are not really damaged and are just in some funky state since the power glitch. After discussing with Jamie, I think it should be safe to power cycle the Jetstor RAID array once the FB machine has been powered down. Perhaps this will bring back one or both of the faulty power supplies. If not, we may have to get new ones.

The problem with FB may or may not be related to the state of the Jetstor RAID array. It is unclear to me at what point during the boot process we are getting stuck. It may be that because the RAID disk is in some funky state, the boot process is getting disrupted.

Quote:

I am unable to get FB to reboot to a working state. A hard reboot throws it into a loop of "Media Test Failure. Check Cable".

The Jetstor RAID array is complaining about some power issues. The LCD display on the front reads "H/W Monitor", with the lower line cycling through "Power#1 Failed", "Power#2 Failed", and "UPS error". Going to 192.168.113.119 in a browser on a martian machine and looking at the "Hardware information" confirms that System Power #1 and #2 are "Failed", and that the UPS status is "AC power loss". So far I've been unable to find anything on the elog about how to handle this problem. I'll keep looking.


In fact, it looks like this sort of problem has happened in the past. It seems one power supply failed back then, but now somehow two are down (there is a third, which is why the unit functions at all). The linked elog thread strongly advises against any sort of power cycling.

 
