40m Log
Entry  Tue May 8 11:41:16 2018, gautam, Update, General, IFO maintenance ITMX_stuck.png
    Reply  Wed May 9 17:30:04 2018, gautam, Update, General, Input beam misaligned InputBeamGone.png TTpointing.png
       Reply  Wed May 9 19:51:07 2018, gautam, Update, General, Input beam misaligned DACweirdness.png DACerror.png
          Reply  Sun May 13 15:15:18 2018, gautam, Update, General, CDS crash vertexFEs_crashed.png
             Reply  Sun May 13 17:31:51 2018, gautam, Update, General, CDS crash CDS_overview_20180513.png AS_1210293643.jpeg
                Reply  Sun May 13 20:48:38 2018, johannes, Update, General, CDS crash 
                Reply  Thu May 17 11:56:37 2018, gautam, Update, General, EPICS process died on c1ioo 
                   Reply  Thu May 24 10:16:29 2018, gautam, Update, General, All models on c1lsc frontend crashed c1lsc_crashed.png
    Reply  Thu May 10 08:45:16 2018, Steve, Update, General, 4.5M eq. Cabazon, CA Cabazon4.5m79m.png 4.5Meq.png
       Reply  Thu May 10 11:38:19 2018, gautam, Update, General, ITMY UL ITMY_UL.pdf
       Reply  Wed Aug 22 08:44:09 2018, Steve, Update, General, earth quake yesterday_EQs.png
          Reply  Fri Aug 24 08:04:37 2018, Steve, Update, General, small earth quake small_EQ.png
             Reply  Mon Aug 27 09:14:45 2018, Steve, Update, PEM, small earth quakes  small_EQs_vs_SUSs.png
                Reply  Tue Nov 27 10:50:20 2018, Steve, Update, PEM, earth quake Mexico 5.5M.Mexico.png
                   Reply  Thu Nov 29 08:13:33 2018, Steve, Update, PEM, EQ 3.9m So CA 3.9mSoCA.png Vac_as_today.png as_is.png
          Reply  Wed Aug 29 09:20:27 2018, Steve, Update, SUS, local 4.4M earth quake 4.4_La_Verne.png 3.4_&_4.4M_EQ.png
             Reply  Wed Aug 29 11:46:27 2018, Jon, Update, SUS, local 4.4M earth quake 
             Reply  Thu Sep 20 08:17:14 2018, Steve, Update, SUS, local 3.4M earth quake local_3.4M.png
Message ID: 13837     Entry time: Sun May 13 15:15:18 2018     In reply to: 13828     Reply to this: 13838
Author: gautam 
Type: Update 
Category: General 
Subject: CDS crash 

I found the c1lsc machine to be completely unresponsive today. The trend of the state word shows that this happened sometime yesterday (Saturday). The usual reboot procedure did not work: I am unable to bring back any of the models on any of the machines, as they all fail during the restart procedure. The log file reads as follows (this is for the c1ioo front end, but they all behave the same way):

[  309.783460] c1x03: Initializing space for daqLib buffers
[  309.887357] CPU 2 is now offline
[  309.887422] c1x03: Sync source = 4
[  309.887425] c1x03: Waiting for EPICS BURT Restore = 2
[  309.946320] c1x03: Waiting for EPICS BURT 0
[  309.946320] c1x03: BURT Restore Complete
[  309.946320] c1x03: Corrupted Epics data:  module=0 filter=1 filterType=0 filtSections=134610112
[  309.946320] c1x03: Filter module init failed, exiting
[  363.229086] c1x03: Setting stop_working_threads to 1
[  364.232148] DXH Adapter 0 : BROADCAST - dx_user_mcast_unbind - mcgroupid=0x3
[  364.233689] Will bring back CPU 2
[  365.236674] Booting Node 1 Processor 2 APIC 0x2
[  365.236771] smpboot cpu 2: start_ip = 9a000
[  309.946320] Calibrating delay loop (skipped) already calibrated this CPU
[  365.251060] NMI watchdog enabled, takes one hw-pmu counter.
[  365.252135] Brought the CPU back up
[  365.252138] c1x03: Just before returning from cleanup_module for c1x03
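
Since every model fails at the same point, a small parser can pull the offending record out of each front end's log so the values can be compared across machines. A minimal Python sketch, assuming the message format shown above; the limit of 10 sections per filter module (FM1..FM10) is an assumption about CDS filter modules, used only to flag obviously garbage values such as filtSections=134610112:

```python
import re

# Matches the RCG start-up message seen in the c1x03 log above.
CORRUPT_RE = re.compile(
    r"(?P<model>\w+): Corrupted Epics data:\s+"
    r"module=(?P<module>\d+) filter=(?P<filter>\d+) "
    r"filterType=(?P<filterType>\d+) filtSections=(?P<filtSections>\d+)"
)

# Assumption (not from the log): a CDS filter module has at most 10 sections.
MAX_SECTIONS = 10


def find_corrupt_filters(log_text):
    """Return one dict per 'Corrupted Epics data' line in a front-end log."""
    hits = []
    for line in log_text.splitlines():
        m = CORRUPT_RE.search(line)
        if m is None:
            continue
        # Keep the model name as text, convert the numeric fields to int.
        rec = {k: (v if k == "model" else int(v)) for k, v in m.groupdict().items()}
        # Flag section counts outside the assumed plausible range.
        rec["implausible"] = not 0 <= rec["filtSections"] <= MAX_SECTIONS
        hits.append(rec)
    return hits


log = """\
[  309.946320] c1x03: BURT Restore Complete
[  309.946320] c1x03: Corrupted Epics data:  module=0 filter=1 filterType=0 filtSections=134610112
[  309.946320] c1x03: Filter module init failed, exiting
"""
for rec in find_corrupt_filters(log):
    print(rec)
```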

I'm not sure what is going on here, or what "Corrupted Epics data" is supposed to mean. Suspecting that something went wrong the last time the model was compiled, I tried recompiling the IOP model, but I can't even get it to compile; the build fails with the error message:

make[1]: Leaving directory '/opt/rtcds/caltech/c1/rtbuild/3.4'
make[1]: /cvs/cds/rtapps/epics-3.14.12.2_long/modules/seq/bin/linux-x86_64/snc: Command not found
make[1]: *** [build/c1x03epics/c1x03.c] Error 127
Makefile:28: recipe for target 'c1x03' failed
make: *** [c1x03] Error 1
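
The "Error 127" on the make line is the conventional exit status for a command that could not be found (POSIX shells return 127 in that case), which agrees with the "Command not found" line: snc is simply not present at that path. A minimal Python check of that convention, run against a made-up path that should not exist on any machine:

```python
import subprocess

# Made-up path, chosen so that it should not exist anywhere.
MISSING = "/nonexistent/modules/seq/bin/linux-x86_64/snc"

# Ask a POSIX shell to run it; a command that cannot be found gives status 127.
result = subprocess.run(["/bin/sh", "-c", MISSING], capture_output=True, text=True)
print("exit status:", result.returncode)
print("stderr:", result.stderr.strip())
```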

I suspect this is some kind of path problem: on the FEs, the EPICS_BASE bash environment variable is set to /cvs/cds/rtapps/epics-3.14.12.2_long/base, but /cvs isn't even mounted on the FEs (nor do I think it should be). The snc binary that make tries to run lives under that same /cvs/cds/rtapps/epics-3.14.12.2_long tree, so it cannot be found. I think the correct path should be /opt/rtapps/epics-3.14.12.2_long/base. Why would this have changed?
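
To see which tree a given EPICS_BASE implies, and what the suspected /opt/rtapps location would look like, here is a minimal Python sketch. It assumes the directory layout visible in the make error (EPICS_BASE is <root>/base and snc is <root>/modules/seq/bin/linux-x86_64/snc); the helper names snc_path and remap are hypothetical, and the existence check is only meaningful when run on a front end:

```python
import os
from pathlib import PurePosixPath

# Paths as they appear in this entry.
OLD_EPICS_BASE = "/cvs/cds/rtapps/epics-3.14.12.2_long/base"
OLD_PREFIX = "/cvs/cds/rtapps"
NEW_PREFIX = "/opt/rtapps"  # the location this entry suspects is correct


def snc_path(epics_base, arch="linux-x86_64"):
    """Hypothetical helper: where snc would sit, given EPICS_BASE.

    Assumes the layout seen in the make error:
    EPICS_BASE = <root>/base, snc = <root>/modules/seq/bin/<arch>/snc.
    """
    root = PurePosixPath(epics_base).parent
    return str(root / "modules" / "seq" / "bin" / arch / "snc")


def remap(path, old=OLD_PREFIX, new=NEW_PREFIX):
    """Hypothetical helper: swap the old prefix for the new one, else return path unchanged."""
    p, o = PurePosixPath(path), PurePosixPath(old)
    if p == o or o in p.parents:
        return str(PurePosixPath(new) / p.relative_to(o))
    return path


for base in (OLD_EPICS_BASE, remap(OLD_EPICS_BASE)):
    snc = snc_path(base)
    print(f"{base}: snc at {snc}, exists={os.path.exists(snc)}")
```

On a front end, os.environ.get("EPICS_BASE") gives the value actually in effect in the current environment and could be passed to snc_path in place of the hard-coded string.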

I've shut down all watchdogs until this is resolved.

Attachment 1: vertexFEs_crashed.png  18 kB  Uploaded Sun May 13 16:16:47 2018
ELOG V3.1.3-