I found the c1lsc machine completely unresponsive today. Judging from the trend of the state word, it went down sometime yesterday (Saturday). The usual reboot procedure did not work: I am unable to bring back any of the models on any of the machines, and they all fail during the restart procedure. The logfile reads (for the c1ioo front end, but they all behave the same):
[ 309.783460] c1x03: Initializing space for daqLib buffers
[ 309.887357] CPU 2 is now offline
[ 309.887422] c1x03: Sync source = 4
[ 309.887425] c1x03: Waiting for EPICS BURT Restore = 2
[ 309.946320] c1x03: Waiting for EPICS BURT 0
[ 309.946320] c1x03: BURT Restore Complete
[ 309.946320] c1x03: Corrupted Epics data: module=0 filter=1 filterType=0 filtSections=134610112
[ 309.946320] c1x03: Filter module init failed, exiting
[ 363.229086] c1x03: Setting stop_working_threads to 1
[ 364.232148] DXH Adapter 0 : BROADCAST - dx_user_mcast_unbind - mcgroupid=0x3
[ 364.233689] Will bring back CPU 2
[ 365.236674] Booting Node 1 Processor 2 APIC 0x2
[ 365.236771] smpboot cpu 2: start_ip = 9a000
[ 309.946320] Calibrating delay loop (skipped) already calibrated this CPU
[ 365.251060] NMI watchdog enabled, takes one hw-pmu counter.
[ 365.252135] Brought the CPU back up
[ 365.252138] c1x03: Just before returning from cleanup_module for c1x03
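Since all the front ends reportedly fail the same way, it may help to pull the failure line out of each machine's log and compare the reported values side by side. Below is a minimal sketch, assuming the log text has already been saved to a string; the function name `find_corrupted_epics` and the regular expression are hypothetical and only match the line format shown above.

```python
import re

# Matches lines of the form seen in the log above, e.g.
# "[ 309.946320] c1x03: Corrupted Epics data: module=0 filter=1 ..."
CORRUPT_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+(?P<model>\w+): "
    r"Corrupted Epics data:\s+(?P<fields>.*)"
)


def find_corrupted_epics(log_text):
    """Return one dict per 'Corrupted Epics data' line found in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = CORRUPT_RE.search(line)
        if match is None:
            continue
        # Split "key=value" pairs; all values in the observed line are integers.
        fields = {}
        for pair in match.group("fields").split():
            key, value = pair.split("=", 1)
            fields[key] = int(value)
        hits.append(
            {"time": float(match.group("time")),
             "model": match.group("model"),
             **fields}
        )
    return hits
```

Running this over the logs from each front end would show whether the same module and filter indices are flagged everywhere, or whether the reported values differ from machine to machine.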
Not sure what is going on here, or what "Corrupted Epics data" is supposed to mean. Thinking that something was messed up the last time the model was compiled, I tried recompiling the IOP model. But I can't even compile the model; it fails with the error message:
make[1]: Leaving directory '/opt/rtcds/caltech/c1/rtbuild/3.4'
make[1]: /cvs/cds/rtapps/epics-3.14.12.2_long/modules/seq/bin/linux-x86_64/snc: Command not found
make[1]: *** [build/c1x03epics/c1x03.c] Error 127
Makefile:28: recipe for target 'c1x03' failed
make: *** [c1x03] Error 1
I suspect this is some kind of path problem - the EPICS_BASE bash variable is set to /cvs/cds/rtapps/epics-3.14.12.2_long/base on the FEs, while /cvs isn't even mounted on the FEs (nor do I think it should be). I think the correct path should be /opt/rtapps/epics-3.14.12.2_long/base. Why should this have changed?
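A quick way to test the path hypothesis on a front end is to check whether EPICS_BASE points at a directory that exists and whether snc is present next to it. This is only a sketch: the helper name `check_epics_paths` is hypothetical, and the location of snc relative to EPICS_BASE is inferred from the make error above (both sit under the same epics-3.14.12.2_long directory), not taken from the build system itself.

```python
import os


def check_epics_paths(env=None):
    """Report whether EPICS_BASE and the snc binary beside it exist."""
    env = os.environ if env is None else env
    base = env.get("EPICS_BASE")
    report = {
        "EPICS_BASE": base,
        "base_exists": False,
        "snc": None,
        "snc_executable": False,
    }
    if not base:
        return report
    report["base_exists"] = os.path.isdir(base)
    # Assumption: snc lives at <root>/modules/seq/bin/linux-x86_64/snc,
    # where <root> is the parent of EPICS_BASE (as in the make error).
    root = os.path.dirname(base.rstrip("/"))
    snc = os.path.join(root, "modules", "seq", "bin", "linux-x86_64", "snc")
    report["snc"] = snc
    report["snc_executable"] = os.path.isfile(snc) and os.access(snc, os.X_OK)
    return report


if __name__ == "__main__":
    for key, value in check_epics_paths().items():
        print(f"{key}: {value}")
```

If `base_exists` comes back False with the /cvs path and True after pointing EPICS_BASE at the /opt/rtapps location, that would support the path explanation for the compile failure, though not necessarily for the filter module init failure.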
I've shut down all watchdogs until this is resolved.