Today I finished implementing loopback monitors of the up/down state of the slow controls machines. They are visible on a new MEDM screen accessible from Sitemap > CDS > Slow Machine Status (pictured in attachment 1). Each monitor is a single EPICS binary channel hosted by the slow machine, which toggles its state at 1 Hz (an alive "blinker"). For each machine, the monitor is defined by a separate database file named c1[machine]_state.db located in the target directory.
Sitemap > CDS > Slow Machine Status
This is implemented for all upgraded machines, which includes every slow machine except for c1auxey. This is the next and final one slated for replacement.
The blinkers are currently implemented as soft channels, but I'd like to ultimately convert them to hard channels using two sinking/sourcing BIO units. This will require new wiring inside each Acromag chassis, however. For now, as soft channels, the monitors are sensitive to a failure of the host machine or a failure of the EPICS IOC. As hard channels, they will additionally be sensitive to a failure of the secondary network interface, as has been known to happen.
Each slow machine's IOC had to be restarted this afternoon to pick up the new channels. The IOCs were restarted according to the following procedure.
The intial recovery of c1susaux did not succeed. Most visibly, the alignment state of the IFO was not restored. After some debugging, we found that the restart of the modbus service was partially failing at the final burt-restore stage. The latest snapshot file /opt/rtcds/caltech/c1/burt/autoburt/latest/c1susaux.snap was not found. I manually restored a known good snapshot from earlier in the day (15:19) and we were able to relock the IMC and XARM. GV and I were just talking earlier today about eliminating these burt-restores from the systemd files. I think we should.