TL;DR
MC1 LL Sensor showed signs of fluctuating large offsets. We tried to find the issue in the box but couldn't find any. On power cycling, the sensor got back to normal. But in putting back the box, we bumped something and c1susaux slow channels froze. We tried to reboot it, but it didn't work and the channels do not exist anymore.
Today morning we came to find that IMC struggled to lock all night (See attachment 1). We kind of had an indication yesterday evening that MC1 LL Sensor PD had a higher variance than usual and Paco had to reset WFS offsets because they had integrated the noise from this sensor. Something similar happened last night, that a false offset and its fluctuation overwhelmed WFS and MC1 got misaligned making it impossible for IMC to get lock.
In the morning, Paco again reset the WFS offsets but not we were sure that the PD variance from MC1 LL osem was very high. See attachment 2 to see how only 1 OSEM is showing higher noise in comparison to the other 4 OSEMs. This behavior is similar to what we saw earlier in 16138 but for UL sensor. Koji and I fixed it in 16139 and we tested all other channels too.
So, Paco and I, went ahead and took out the MC1 satellite amplifier box S2100029 D1002812, opened the top, and checked all the PD channel testpoints with no input current. We didn't find anything odd. Next we checked the LED dirver circuit testpoints with LED OUT and GND shorted. We got 4.997V on all LED MON testpoints which indicate normal functioning.
We just hooked back everything on the MC1 satellit box and checked the sensor channels again on medm screens. To our surprise, it started functioning normally. So maybe, just a power cycling was required but we still don't know what caused this issue.
BUT when I (Anchal) was plugging back the power cables and D25 connectors on the back side in 1X4 after moving the box back into the rack, we found that the slow channels stopped updating. They just froze!
We got worried for some time as the negative power supply indicator LEDs on the acromag chassis (which is just below the MC1 satellite box) were not ON. We checked the power cables and had to open the side panel of the 1X4 rack to check how the power cables are connected. We found that there is no third wire in the power cables and the acromag chassis only takes in single rail supply. We confirmed this by looking at another acromag chassis on Xend. We pasted a note on the acromag chassis for future reference that it uses only positive rails and negative LED monitors are not usually ON.
Back to solving the frozen acromag issue, we conjectured that maybe the ethernet connection is broken. The DB25 cables for the satellite box are bit short and pull around other cables with it when connected. We checked all the ethernet cabling, it looked fine. On c1susaux computer, we saw that the monitor LED for ethernet port 2 which is connected to acromag chassis is solid ON while the other one (which is probably connection to the switch) is blinking.
We tried doing telnet to the computer, it didn't work. The host refused connection from pianosa workstation. We tried pinging the c1susaux computer, and that worked. So we concluded that most probably, the epics modbus server hosting the slow channels on c1susaux is unable to communicate with acromag chassis and hence the solid LED light on that ethernet port instead of a blinking one. We checked computer restart procedure page for SLOW computers on wiki and found that it said if telnet is not working, we can hard reboot the computer.
We hard reboot the computer by long pressing the power button and then presssing it back on. We did this process 3 times with the same result. The ethernet port 2 LED (Acromag chassis) would blink but the ethernet port 1 LED (connected to switch) would not turn ON. We now can not even ping the machine now, let alone telnet into it. All SUS slow monitor channels are not present now ofcourse. We also tried once pressing the reset button (which the manual said would reboot the machine), but we got the same outcome.
Now, we decided to stop poking around until someone with more experience can help us on this.
Bottomline: We don't know what caused the LL sensor issue and hence it has not been fixed. It can happen again. We lost all C1SUSAUX slow channels which are the OSEM and COIL slow monitor channels for PRM, BS, ITMX, ITMY, MC1, MC2 and MC3. |