[chub, gautam]
Sumary:
One of the XT1111 units (XT1111a) in the new vacuum system has malfunctioned. So all valves are closed, PSL shutter is also closed, until this is resolved.
Details:
- Chub alerted me he had changed the main N2 line pressure, but this did not show up in the trend data. In fact, the trend data suggested that all 3 N2 gauges had stopped logging data (they just held the previous value) since sometime on Monday, see Attachment #1.
- We verified that the gauges were being powered, and that the analog voltage output of the gauges made sense in the drill press room ---> So this suggested something was wrong at the Vacuum rack electronics rack.
- Went to the vacuum rack, saw no obvious indicator lights signalling a fault.
- So I restarted the modbus process on c1vac using sudo systemctl restart modbusIOC.service. The way Jon has this setup, this service controls all the sub-processes talking to gauges and TPs, so resatrting this master process should have brought everything back.
- This tripped the interlock, and all valves got closed.
- Once the modbus service restarted, most things came back normally. However, V1, V3, V4 and V5 readbacks were listed as "UNDEF".
- The way the interlock code works, it checks a valve state change request against the monitor channel, so all these valves could not be opened.
- We confirmed that the valves themselves were operational, by bypassing the itnerlock logic and directly actuating on the valve - but this is not a safe way of running overnight so we decided to shut everything down.
- We also confirmed that the problem is with one particular Acromag unit - switching the readback Dsub connector to another channel (e.g. V1 --> VM2) showed the expected readback.
- As a further check - I connected a windows laptop with the Acromag software installed, to the suspected XT1111 - it reported an error message saying "USB device may be damaged". Plugging into another XT111 in the crate, I was able to access the unit in the normal way.
- The phoenix connector architecture of the Acromags makes it possible to replace this single unit (we have spare XT1111 units) without disturbing the whole system - so barring objections, we plan to do this at 9am tomorrow. The replacement plan is summarized in Attachment #2.
Pressure of the main volume seems to have stabilized - see Attachment #3, so it should be fine to leave the IFO in this state overnight.
Questions:
- What caused the original failure of the writing to the ADC channels hooked up to the N2 gauges? There isn't any logging setup from the modbus processes afaik.
- What caused the failure of the XT1111? What is the failure mode even? Because some other channels on the same XT1111 are working...
- Was it user error? The only operation carried out by me was restarting the modbus services - how did this damage the readback channels for just four valves? I think Chub also re-arranged some wires at the end, but unplugging/re-connecting some cables shouldn't produce this kind of response...
The whole point of the upgrade was to move to a more reliable system - but seems quite flaky already. |