At 09:34 PST I noted a glitch in the controls room as the machines went down except for c1ioo. Briefly, the video feeds disappeared from the screens, though the screens themselves didn't lose power. At first I though this was some kind of power glitch, but upon checking with Jordan, it most likely was related to some system crash. Coming back to the controls room, I could see the MC reflection beam swinging, but unfortunately all the FE models came down. I noticed that the DAQ status channels were blank.
I ssh into c1ioo no problem and ran "rtcds stop c1ioo c1als c1omc", then "rtcds restart c1x03" to do a soft restart. This worked, but the DAQ status was still blank. I then tried to ssh into c1sus and c1lsc without success, similarly c1iscex and c1iscey were unreachable. I went and did a hard restart on c1iscex by switching it off, then its extension chassis, then unplugging the power cords, then inverting these steps, and could ssh into it from rossa. I ran "rtcds start c1x01" and saw the same blank DAQ status. I noticed the elog was also down... so nodus was also affected?
Anchal got on zoom to offer some assistance. We discovered that the fb1 and nodus were subject to some kind of system reboot at precisely 09:34. The "systemctl --failed" command on fb1 displayed both the daqd_dc.service and rc-local.service as loaded but failed (inactive). Is it a good idea to try and reboot the fb1 machine? ... Anchal was able to bring elog back up from nodus (ergo, this post).
Although it probably needs the DAQ service from the fb1 machine to be up and running, I tried running the scripts/cds/rebootC1LSC.sh script. This didn't work. I tried running sudo systemctl restart daqd_dc from the fb1 machine without success. Running systemctl reset-failed "worked" for daqd_dc and rc-local services on fb1 in the sense that they were no longer output from systemctl --failed, but they remained inactive (dead) when running systemctl status on them. Following from 15303 I succeeded in restarting the daqd services. Turned out I needed to manually start the open-mx and mx services in fb1. I rerun the restartC1LSC script without success. The script fails because some machines need to be rebooted by hand.