40m Log
Entry  Sat Jan 19 15:05:37 2013, Jenne, Update, Computers, All front ends but c1lsc are down upside_down_cat-t2.jpg
    Reply  Sat Jan 19 18:23:31 2013, rana, Update, Computers, All front ends but c1lsc are down FE.png
       Reply  Wed Jan 30 13:50:27 2013, Jenne, Update, Computers, c1iscex still down 
          Reply  Thu Jan 31 10:23:39 2013, Jamie, Update, Computers, c1iscex still down 
             Reply  Fri Feb 15 15:21:07 2013, Jamie, Update, Computers, c1iscex IO-chassis dead 
                Reply  Tue Feb 19 15:10:02 2013, Jamie, Update, CDS, c1iscex alive again 
                   Reply  Thu Feb 21 12:56:38 2013, Jenne, Update, CDS, c1iscex dead again 
                      Reply  Thu Feb 21 14:32:02 2013, Jamie, Update, CDS, c1iscex models restarted 
Message ID: 7920     Entry time: Sat Jan 19 15:05:37 2013     Reply to this: 7922
Author: Jenne 
Type: Update 
Category: Computers 
Subject: All front ends but c1lsc are down 

Message I get from dmesg of c1sus's IOP:

[   44.372986] c1x02: Triggered the ADC
[   68.200063] c1x02: Channel Hopping Detected on one or more ADC modules !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[   68.200064] c1x02: Check GDSTP screen ADC status bits to id affected ADC modules
[   68.200065] c1x02: Code is exiting ..............
[   68.200066] c1x02: exiting from fe_code()

Right now, c1x02's max CPU indicator reads 73,000 usec.  c1x05 is 4,300 usec, and c1x01 seems totally fine, except that it has the 0x2bad.

c1x02 has 0xbad (not 0x2bad).  All other models on c1sus, c1ioo, c1iscex and c1iscey have 0x2bad.

Also, no models on those computers have 'heartbeats'.

c1x02 has "NO SYNC", but all other IOPs are fine.

I've tried rebooting c1sus and restarting the daqd process on fb, all to no avail.  I can ssh to and ping all of the computers, but I cannot get the models running.  Restarting the models also doesn't help.

upside_down_cat-t2.jpg

c1iscex's IOP dmesg:

[   38.626001] c1x01: Triggered the ADC
[   39.626001] c1x01: timeout 0 1000000
[   39.626001] c1x01: exiting from fe_code()

c1ioo's IOP has the same ADC channel hopping error as c1sus's.
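
To compare IOPs quickly, the two failure signatures above (ADC channel hopping on c1x02, ADC timeout on c1x01) can be picked out of saved dmesg text with a few regular expressions. This is only a rough sketch: the function name `classify` and the signature labels are made up for illustration, and the patterns are copied from the excerpts in this entry rather than from the front-end code, so other kernel/RCG versions may word these messages differently.

```python
import re

# Patterns taken from the dmesg excerpts in this entry; the labels are
# hypothetical names chosen for this sketch, not RCG terminology.
SIGNATURES = {
    "adc_channel_hopping": re.compile(r"Channel Hopping Detected"),
    "adc_timeout": re.compile(r"timeout \d+ \d+"),
    "fe_code_exit": re.compile(r"exiting from fe_code\(\)"),
}

# Matches lines such as "[   44.372986] c1x02: Triggered the ADC"
LINE_RE = re.compile(r"^\[\s*(?P<t>\d+\.\d+)\]\s+(?P<model>\w+):\s+(?P<msg>.*)$")


def classify(dmesg_text):
    """Return {model name: sorted list of signature labels seen for it}."""
    found = {}
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a "[time] model: message" line
        for label, pattern in SIGNATURES.items():
            if pattern.search(m.group("msg")):
                found.setdefault(m.group("model"), set()).add(label)
    return {model: sorted(labels) for model, labels in found.items()}


if __name__ == "__main__":
    sample = """\
[   38.626001] c1x01: Triggered the ADC
[   39.626001] c1x01: timeout 0 1000000
[   39.626001] c1x01: exiting from fe_code()
"""
    print(classify(sample))
```

Feeding it the output of dmesg from each front end gives a per-model list of which signatures showed up, which makes it easier to see that c1sus and c1ioo fail one way and c1iscex another.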

 
