40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log  Not logged in ELOG logo
Entry  Thu Sep 2 17:53:15 2021, Paco, Summary, Computers, chiara down, vac interlock tripped 
    Reply  Thu Sep 2 21:21:14 2021, Koji, Summary, Computers, Vacuum recovery 2 Screenshot_2021-09-02_21-20-24.pngScreenshot_2021-09-02_21-20-48.png
    Reply  Thu Sep 2 21:49:03 2021, Paco, Summary, Computers, chiara down, vac interlock tripped 2021-09-02_21-51-15.png
       Reply  Fri Sep 3 02:03:15 2021, Tega, Summary, Computers, Strip down large error files 
Message ID: 16307     Entry time: Thu Sep 2 17:53:15 2021     Reply to this: 16312   16313
Author: Paco 
Type: Summary 
Category: Computers 
Subject: chiara down, vac interlock tripped 

[paco, koji, tega, ian]

Today in the morning the name server / network file system running in chiara failed. This resulted in donatella/pianosa/rossa shell prompts to hang forever. It also made sitemap crash and even dropping into a bash shell and just listing files from some directory in the file system froze the computer. Remote ssh sessions on nodus also had the same symptoms.

A little after 1 pm, we started debugging this issue with help from Koji. He suggested we hook a monitor, keyboard, and mouse onto chiara as it should still work locally even if something with the NFS (network file system) failed. We did this and then we tried for a while to unmount the /dev/sdc1/ from /home/cds/ (main file system) and mount /dev/sdb1/ from /media/40mBackup (backup copy) such that they swap places. We had no trouble unmounting the backup drive, but only succeeded in unmounting the main drive with the "lazy" unmount, or running "umount -l". Running "df" we could see that the disk space was 100% used, with only ~ 1 GB of free space which may have been the cause for the issue. After swapping these disks by editing the /etc/fstab file to implement the aforementioned swapping, we rebooted chiara and we recovered the shell prompts in all workstations, sitemap, etc... due to the backup drive mounting. We then started investigating what caused the main drive to fill up that quickly, and noted that weirdly now the capacity was at 85% or about 500GB less than before (after reboot and remount), so some large file was probably dumped into chiara that froze the NFS causing the issue.

At this point we tried opening the PSL shutter to recover the IMC. The shutter would not open and we suspected the vacuum interlock was still tripped... and indeed there was an uncleared error in the VAC screen. So with Koji's guidance we walked to the c1vac near the HV station and did the following at ~ 5:13 PM -->

  1. Open V4; apart from a brief pressure spike in PTP2, everything looked ok so we proceeded to
  2. Open V1; P2 spiked briefly and then started to drop. Then, Koji suggested that we could
  3. Close V4; but we saw P2 increasing by a factor of~ 10 in a few seconds, so we
  4. Reopened V4;

We made sure that P1a (main vacuum pressure) was dropping and before continuing we decided to look back to see what the nominal vacuum state was that we should try to restore.

We are currently searching the two systems for diffrences to see if we can narrow down the culprit of the failure.

 

ELOG V3.1.3-