40m Log, Page 264 of 344
ID   Date   Author   Type   Category   Subject
  15164   Tue Jan 28 15:39:04 2020   gautam   Configuration   Computers   Sluggish megatron?

There were a bunch of medm processes stalled on megatron (connected with screenshot taking). To see if they were interfering with the other scripts, I killed all of the medm processes, and commented out the line in the crontab that runs the screenshots every 10 mins. Let's see if this improves stability.

  15167   Tue Jan 28 17:36:45 2020   gautam   Configuration   Computers   Local EPICS7.0 installed on megatron

[Jon, gautam]

We found that the caput commands were taking much longer to execute on megatron than on pianosa (for example). Suspecting that this had something to do with the fact that megatron was using EPICS binaries from the shared NFS drive which were compiled for a much older OS, I installed the latest stable release of EPICS on megatron. The new caput commands execute much faster. I also added the local EPICS directory to the head of the $PATH variable used by the MC autolocker and FSS Slow scripts, so that they use the new caput command. But mcup is still slow - maybe my new path definition isn't picked up and it is still using the NFS binaries? To be looked into...
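One quick way to test the "is my new path picked up?" question is to ask which caput binary a given PATH string actually resolves to. The snippet below is only a sketch using the Python standard library; the helper name resolve_command is made up for illustration, and it would have to be run inside the same environment as the autolocker script to be meaningful.

```python
import shutil


def resolve_command(name, path=None):
    """Return the full path of the first executable called `name` on `path`.

    `path` is a PATH-style string (directories separated by os.pathsep);
    None means the current process's $PATH. Returns None if nothing matches.
    """
    return shutil.which(name, path=path)


# Usage (hypothetical): print(resolve_command("caput")) from inside the
# script's environment shows whether the local or the NFS binary wins.
```

If the result still points at the NFS drive, the directory order in $PATH seen by the script is not what was intended.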



  15246   Wed Mar 4 11:10:47 2020   Yehonathan   Update   Computers   Allegra revival

Allegra had no network cable and no mouse. We found Allegra's network cable (black) and connected it.

I found a dirty old school mouse and connected it.

I wiped Allegra and am now installing Debian 10 on it, following Jon's elog.

04/01 update: I forgot to mention that I tried installing the cds software by following Jamie's instructions: I added the line "deb http://software.ligo.org/lscsoft/debian/ stretch contrib" to /etc/apt/sources.list.d/lscsoft.list. But this is the only thing I managed to do. The next command in the instructions failed.

  15276   Fri Mar 13 20:00:50 2020   Jon   Update   Computers   Loopback monitoring for slow machines


Today I finished implementing loopback monitors of the up/down state of the slow controls machines. They are visible on a new MEDM screen accessible from Sitemap > CDS > Slow Machine Status (pictured in attachment 1). Each monitor is a single EPICS binary channel hosted by the slow machine, which toggles its state at 1 Hz (an alive "blinker"). For each machine, the monitor is defined by a separate database file named c1[machine]_state.db located in the target directory.
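As a rough illustration of how a client could use such a blinker, the sketch below declares a machine alive if the channel value changes at least once within a short window. Everything here is an assumption for illustration: is_alive and read_value are hypothetical names, and read_value stands in for whatever EPICS read is actually used. The default sampling interval is chosen shorter than the blinker period so that a toggle cannot be missed by aliasing.

```python
import time


def is_alive(read_value, n_samples=8, interval=0.3, sleep=time.sleep):
    """Return True if the heartbeat value changes at least once.

    read_value: zero-argument callable returning the current channel value
                (hypothetical; e.g. a thin wrapper around an EPICS read).
    n_samples:  how many readings to take in total.
    interval:   seconds between readings.
    sleep:      injectable sleep function, so the logic can be tested quickly.
    """
    previous = read_value()
    for _ in range(n_samples - 1):
        sleep(interval)
        current = read_value()
        if current != previous:
            return True  # the blinker toggled, so host and IOC are up
        previous = current
    return False  # value frozen over the whole window
```

A frozen value cannot distinguish a dead host from a dead IOC, which matches the sensitivity of the soft-channel implementation described above.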

This is implemented for all upgraded machines, which includes every slow machine except for c1auxey. This is the next and final one slated for replacement.


The blinkers are currently implemented as soft channels, but I'd like to ultimately convert them to hard channels using two sinking/sourcing BIO units. This will require new wiring inside each Acromag chassis, however. For now, as soft channels, the monitors are sensitive to a failure of the host machine or a failure of the EPICS IOC. As hard channels, they will additionally be sensitive to a failure of the secondary network interface, as has been known to happen.

Each slow machine's IOC had to be restarted this afternoon to pick up the new channels. The IOCs were restarted according to the following procedure.


  1. Disabled OPLEV servos on ETMX
  2. Zeroed slow biases
  3. Disabled watchdog
  4. Restarted IOC
  5. Reverted 1-3


  1. Closed V1, VM1
  2. Restarted IOC
  3. Returned valves to original state


  1. Disabled IMC autolocker
  2. Closed PSL shutter
  3. Restarted IOC
  4. Reverted 1-2


  1. Restarted IOC


  1. Disabled IMC autolocker
  2. Closed shutter
  3. Disabled OPLEV servos on: MC1, MC2, MC3, BS, ITMX, ITMX, PRM, SRM
  4. Zeroed slow biases
  5. Disabled watchdogs
  6. Restarted IOC
  7. Reverted 1-5

The initial recovery of c1susaux did not succeed. Most visibly, the alignment state of the IFO was not restored. After some debugging, we found that the restart of the modbus service was partially failing at the final burt-restore stage: the latest snapshot file /opt/rtcds/caltech/c1/burt/autoburt/latest/c1susaux.snap was not found. I manually restored a known good snapshot from earlier in the day (15:19) and we were able to relock the IMC and XARM. GV and I were just talking earlier today about eliminating these burt-restores from the systemd files. I think we should.

Attachment 1: Screen_Shot_2020-03-13_at_7.59.55_PM.png
  15447   Wed Jul 1 18:16:09 2020   gautam   Update   Computers   rossa re-re-revival

In an effort to make a second usable workstation, I did the following (remotely) on rossa today (not necessarily in this order, I wasn't maintaining a live log so I forgot):

  1. Fixed /etc/resolv.conf, so that the other martian machines can be found.
  2. Copied over .bashrc file, and the appropriate lines from /etc/fstab from pianosa to rossa.
  3. Ran sudo apt install nfs-common. Then ran sudo mount -a to get /cvs/cds mounted.
  4. Made symlinks for /users, /opt/rtcds, and /ligo. All of these are used by various environment-setting scripts and I chose to preserve the structure, though why we need so many symlinks, I don't know...
  5. Set up the shell variable $NDSSERVER using export NDSSERVER=fb:8088. I'm not sure how, but I believe DTT, awggui etc use this on startup to get the channel list (any
  6. Followed instructions from Erik von Reis at LHO to install the cds workstation packages and dependencies. Worked like a charm 🎃
  7. As a test, I plotted the accelerometer spectra in DTT, see Attachment #1. I also launched foton from inside awggui, and confirmed that the sample rate is inherited and I could designate a filter. But I haven't yet run the noise injection to test it, I'll do that the next time I'm in the lab.
  8. Also checked that medm, StripTool and ndscope, and anaconda python all seem to work 👍🏾.

So, in summary, rossa is now all set up for use during lock acquisition. However, until this machine has undergone a few months of testing, we should freeze the pianosa config and not mess with it.

Note that this version of the "crtools" is rather new. Please use them, and if there is an issue, report the errors! I am going to occasionally try lock acquisition using rossa.


wiped and install Debian 10 on rossa today

still to be done: config it as CDS workstation

please don't try to "fix" it in the meantime

Attachment 1: MCacc.pdf
  15449   Sun Jul 5 16:14:41 2020   rana   Update   Computers   rossa re-re-revival

maybe we should make a "dd" copy of pianosa in case rossa has issues and someone destroys pianosa by accidentally spilling coffee on it.


  15451   Sun Jul 5 18:39:57 2020   rana   Update   Computers   rossa: printer

I did

sudo usermod -a -G lpadmin controls

and then was able to add Grazia to the list of printers for Rossa by following the instructions on the 40m Wiki.

I installed color syntax highlighting on Rossa using the internet (https://superuser.com/questions/71588/how-to-syntax-highlight-via-less). Now if you do 'less genius_code.py', it will be highlighting the python syntax.

when I try 'sitemap' on rossa I get:

medm: error while loading shared libraries: libreadline.so.6: cannot open shared object file: No such file or directory

  15452   Mon Jul 6 00:37:28 2020   gautam   Update   Computers   rossa: lib symlink

This is strange - I was definitely able to launch medm when I was working on this machine remotely on Friday. But now, there does seem to be a problem with this shared library being missing.

First of all, I installed mlocate to find where the shared library files are installed. Then I made the symlink, and now sitemap seems to work again.

Weirdly, my changes to /etc/resolv.conf got overwritten somehow. Was this machine rebooted? Uptime suggests it's only been running for ~6 hours at the time of writing of this elog.

sudo apt install mlocate
sudo updatedb
sudo ln -s /usr/lib/x86_64-linux-gnu/libreadline.so.7 /usr/lib/x86_64-linux-gnu/libreadline.so.6


  15454   Mon Jul 6 12:43:02 2020   rana   Update   Computers   rossa: lib symlink

yes, I rebooted yesterday to fix the 'streaking white lines' problem in the video/display

maybe we're supposed to edit something besides resolv.conf, since that gets overwritten on boot on some Linux OSes

  15455   Mon Jul 6 12:51:41 2020   gautam   Update   Computers   rossa: resolvconf installed

Indeed, this is now fixed by following instructions from here. I rebooted rossa at ~1250 PDT and confirmed that resolv.conf didn't get overwritten. The resolv.conf file also now has the following useful lines at the head:

~>cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)


  15460   Wed Jul 8 22:50:33 2020   gautam   Update   Computers   rossa: more symlinks

I wanted to try using rossa as my locking workstation today. However, a few problems became quickly evident. Basically, any of our scripts that rely on the cdsutils package (there are MANY) will not work on rossa, because of some library error. This machine is running Debian 10, while the cdsutils package is being loaded from a pre-compiled install on the shared drive, so perhaps this isn't surprising?

Digging a little more, I found that a version of cdsutils that works with python3 is shipped with the standard cds-workstation meta-package. This is great news, and we should try to use it where possible, I guess. Deferring further debugging for daytime work.

Anyway, I added a symlink: sudo ln -s /usr/lib/x86_64-linux-gnu/libncurses.so.6 /usr/lib/x86_64-linux-gnu/libncurses.so.5, and installed wmctrl using sudo apt install wmctrl

  15463   Thu Jul 9 16:16:20 2020   gautam   Update   Computers   rossa: graphics driver issues?

I noticed these streaky lines again today (but they were not a problem last night). It is annoying if we have to reboot this machine all the time. I wonder if this has something to do with missing drivers. When I ran sudo apt update && sudo apt upgrade, I got several lines like (this isn't the whole stack trace)

W: Possible missing firmware /lib/firmware/nvidia/gp108/acr/ucode_unload.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gp108/acr/ucode_load.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gp108/acr/unload_bl.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gp108/acr/bl.bin for module nouveau

Is this indicative of the graphics drivers being installed incorrectly? I am hesitant to mess with this because I think in the past, it was always trying to update some graphics driver that crashed the whole machine into some weird state where we have to wipe the drive and do a fresh re-install of the OS.

Should we just follow these instructions? The graphics card is apparently Quadro P400, which is one of the supported ones according to the list of supported devices.

Or just swap donatella and rossa monitors and defer the problem for later?



  15469   Sat Jul 11 00:10:22 2020   gautam   Update   Computers   rossa: more developmental work

After some consultation with Erik von Reis at LHO, this workstation is progressing towards being usable for most commissioning tasks. DTT, awggui, foton, and MEDM are all now working well. The main limitation now comes from the fact that many of our python scripts are written for python2, and rossa doesn't have many dependencies installed for python2. I see no reason to build these dependencies on rossa for python2, we should not have to work with an unsupported language. But at the same time, I don't want to completely wipe all our python2 scripts, and make them python3, because this would involve a lot of tedious testing that I'm not prepared to undertake at the moment (the problem is compounded by the fact that pianosa does not have many dependencies installed for python3).

So what I have done in the interim is make python3 versions of the most important scripts I need to get the PRFPMI locking working. They are in the scripts directory and have the same names as their python2 counterparts, but with a 3 appended. So when working on rossa, these are the scripts that are called. Eventually, after a lot more testing, we can deprecate the old scripts. Currently, where applicable, the MEDM screens allow for either the python2 or python3 version of the script to be called.

Please, for the time being, do not try and install any new packages on rossa unless you are prepared to debug any problems caused and return the machine to a workable state. If you find some issue with a missing package on rossa, (i) make a note of it on the elog, and (ii) if possible, set up your own conda environment for testing and install dependencies to that environment only.

  15473   Mon Jul 13 11:33:18 2020   rana   Update   Computers   rossa: more developmental work

I too would prefer py3 for everything, but aren't all the cdsutils / guardian things still python2?

Is it possible to just make a python2 conda environment on rossa? I would guess that it's simple and won't interfere with the regular operation of that machine.

  15475   Mon Jul 13 12:37:05 2020   gautam   Update   Computers   rossa: more developmental work

In fact, all these utilities are now available in python3. There may be some bugs (e.g. this), but I've checked basic functionality and things look usable enough for development to proceed. While we can have a python2 env on rossa, I think it's unnecessary.



  15921   Mon Mar 15 20:40:01 2021   rana   Configuration   Computers   installed QTgrace on donatella for dataviewer

I installed QTgrace using yum on donatella. Both Grace and XMgrace are broken due to some boring fight between the Fedora package maintainers and the (non existent) Grace support team. So I have symlinked it:

controls@donatella|bin> sudo mv xmgrace xmgrace_bak
controls@donatella|bin> sudo ln -s qtgrace xmgrace
controls@donatella|bin> pwd

I checked that dataviewer works now for realtime and playback, although the middle click paste on the mouse doesn't work yet.

Attachment 1: cutiegrace.png
  15928   Wed Mar 17 09:05:01 2021   Paco, Anchal   Configuration   Computers   40m Control Room Changes
  • Switched positions of allegra and donatella.
  • While doing so, the HDMI cable previously used by donatella snapped. We replaced it with an unused cable that we found connected at only one end, to rossa. We should get more HDMI cables if that cable was in use for some other purpose.
  • Paco bought a bluetooth speaker/mic that is placed in front of allegra, and its USB adapter is connected to the iMac's keyboard at the bottom. With the new camera installed, the 40m video call environment is now complete.
  • Again, we have put allegra's monitor in place as a placeholder, but it is not working, and we will need new monitors for allegra whenever it is going to be used.
  15945   Fri Mar 19 15:26:19 2021   Aidan   Update   Computers   Activated MATLAB license on megatron

Activated MATLAB license on megatron

  15946   Fri Mar 19 15:31:56 2021   Aidan   Update   Computers   Activated MATLAB license on donatella

Activated MATLAB license on donatella

  15955   Tue Mar 23 09:16:42 2021   Paco, Anchal   Update   Computers   Power cycled C1PSL; restored C1PSL

So actually, it was the C1PSL channels that had died. We did the following to get them back:

  • We went to this page and tried the telnet procedure. But it was unable to find the host.
  • So we followed the next advice. We went to the 1X1 rack and manually hard shut off C1PSL computer by holding down the power button until the LEDs went off.
  • We waited for 5-7 seconds and switched it back on.
  • By the time we were back in control room, the C1PSL channels were back online.
  • The mode cleaner however was struggling to keep the lock. It was going in and out of lock.
  • So we followed the next advice and did a burt restore, which ran the following command:
    burtwb -f /opt/rtcds/caltech/c1/burt/autoburt/snapshots/2021/Mar/22/17:19/c1psl.snap -l /tmp/controls_1210323_085130_0.write.log -o /tmp/controls_1210323_085130_0.nowrite.snap -v 
  • Now the mode cleaner was locked, but we found that the input switches of the C1IOO-WFS1_PIT and C1IOO-WFS2_PIT filter banks were off, which meant that only the YAW sensors were in the loop.
  • We went back in dataviewer and checked when these channels were shut down. See attachments for time series.
  • It seems this happened yesterday, March 22nd near 1:00 pm (20:00:00 UTC). We can't find any mention of anyone else doing it on elog and we left by 12:15pm.
  • So we shut down the PSL shutter (C1:PSL-PSL_ShutterRqst) and switched off MC autolocker (C1:IOO-MC_LOCK_ENABLE).
  • Switched on C1:IOO-WFS1_PIT_SW1 and C1:IOO-WFS2_PIT_SW1.
  • Turned back on PSL shutter (C1:PSL-PSL_ShutterRqst) and MC autolocker (C1:IOO-MC_LOCK_ENABLE).
  • Mode cleaner locked back easily and now is keeping lock consistently. Everything looks normal.
Attachment 1: MCWFS1and2PITYAW.pdf
Attachment 2: MCWFS1and2PITYAW_Zoomed.pdf
  16027   Wed Apr 14 13:16:20 2021   Anchal   Configuration   Computers   40m Control Room Changes
  • I have confirmed that the backlighting of the two old monitors is not working. One can see the impression of the display without any brightness on them. Both old monitors are on the shelf behind.
  • Today we got a monitor and mouse from Mike. I had to change GRUB_GFXMODE in /etc/default/grub to 1920x1200@30 on allegra for it to work with the (any) monitor.
  • Allegra is Debian 10 with latest cds-workstation installed on it. It is a good test station to migrate our existing scripts to start using updated cds-workstation configuration.


  16249   Fri Jul 16 16:26:50 2021   gautam   Update   Computers   Docker installed on nodus

I wanted to try hosting some docker images on a "private" server, so I installed Docker on nodus following the instructions here. The install seems to have succeeded, and as far as I can tell, none of the functionality of nodus has been disturbed (I can ssh in, access shared drive, elog seems to work fine etc). But if you find a problem, maybe this action is responsible. Note that nodus is running Scientific Linux 7.3 (Nitrogen).

  16287   Mon Aug 23 10:17:21 2021   Paco   Summary   Computers   system reboot glitch


At 09:34 PST I noted a glitch in the control room as the machines went down, except for c1ioo. Briefly, the video feeds disappeared from the screens, though the screens themselves didn't lose power. At first I thought this was some kind of power glitch, but upon checking with Jordan, it was most likely related to some system crash. Coming back to the control room, I could see the MC reflection beam swinging, but unfortunately all the FE models had come down. I noticed that the DAQ status channels were blank.

I ssh'd into c1ioo with no problem and ran "rtcds stop c1ioo c1als c1omc", then "rtcds restart c1x03" to do a soft restart. This worked, but the DAQ status was still blank. I then tried to ssh into c1sus and c1lsc without success; similarly, c1iscex and c1iscey were unreachable. I did a hard restart on c1iscex by switching it off, then its extension chassis, then unplugging the power cords, then reversing these steps, after which I could ssh into it from rossa. I ran "rtcds start c1x01" and saw the same blank DAQ status. I noticed the elog was also down... so nodus was also affected?

[paco, anchal]

Anchal got on zoom to offer some assistance. We discovered that the fb1 and nodus were subject to some kind of system reboot at precisely 09:34. The "systemctl --failed" command on fb1 displayed both the daqd_dc.service and rc-local.service as loaded but failed (inactive). Is it a good idea to try and reboot the fb1 machine? ... Anchal was able to bring elog back up from nodus (ergo, this post).


Although it probably needs the DAQ service from the fb1 machine to be up and running, I tried running the scripts/cds/rebootC1LSC.sh script. This didn't work. I tried running sudo systemctl restart daqd_dc from the fb1 machine without success. Running systemctl reset-failed "worked" for the daqd_dc and rc-local services on fb1 in the sense that they were no longer listed by systemctl --failed, but they remained inactive (dead) when running systemctl status on them. Following elog 15303, I succeeded in restarting the daqd services. It turned out I needed to manually start the open-mx and mx services on fb1. I reran the restartC1LSC script without success. The script fails because some machines need to be rebooted by hand.

  16307   Thu Sep 2 17:53:15 2021   Paco   Summary   Computers   chiara down, vac interlock tripped

[paco, koji, tega, ian]

This morning the name server / network file system running on chiara failed. This caused the donatella/pianosa/rossa shell prompts to hang forever. It also made sitemap crash, and even dropping into a bash shell and just listing files from some directory in the file system froze the computer. Remote ssh sessions on nodus had the same symptoms.

A little after 1 pm, we started debugging this issue with help from Koji. He suggested we hook a monitor, keyboard, and mouse up to chiara, as it should still work locally even if something with the NFS (network file system) failed. We did this and then tried for a while to unmount /dev/sdc1/ from /home/cds/ (main file system) and /dev/sdb1/ from /media/40mBackup (backup copy) so that they could swap places. We had no trouble unmounting the backup drive, but only succeeded in unmounting the main drive with the "lazy" unmount, i.e. by running "umount -l". Running "df", we could see that the disk was 100% used, with only ~1 GB of free space, which may have been the cause of the issue. After swapping the disks by editing the /etc/fstab file, we rebooted chiara and, with the backup drive now mounted as the main file system, recovered the shell prompts on all workstations, sitemap, etc. We then started investigating what caused the main drive to fill up so quickly, and noted that, weirdly, the usage was now at 85%, or about 500 GB less than before (after reboot and remount), so some large file was probably dumped onto chiara, which froze the NFS and caused the issue.
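Since a full disk is the suspected trigger here, a periodic fill check could have flagged it early. The following is a minimal sketch using only the Python standard library; the function names and the 95% threshold are arbitrary choices for illustration, and the number may differ slightly from what df reports because of reserved blocks.

```python
import shutil


def used_fraction(path):
    """Return the fraction (0..1) of the filesystem containing `path` in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total


def is_nearly_full(path, threshold=0.95):
    """True if the filesystem holding `path` is at or above `threshold`."""
    return used_fraction(path) >= threshold
```

For example, is_nearly_full("/home/cds") run from a cron job on chiara could send a warning before NFS clients start to hang.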

At this point we tried opening the PSL shutter to recover the IMC. The shutter would not open and we suspected the vacuum interlock was still tripped... and indeed there was an uncleared error in the VAC screen. So with Koji's guidance we walked to the c1vac near the HV station and did the following at ~ 5:13 PM -->

  1. Open V4; apart from a brief pressure spike in PTP2, everything looked ok so we proceeded to
  2. Open V1; P2 spiked briefly and then started to drop. Then, Koji suggested that we could
  3. Close V4; but we saw P2 increasing by a factor of ~10 in a few seconds, so we
  4. Reopened V4;

We made sure that P1a (main vacuum pressure) was dropping and before continuing we decided to look back to see what the nominal vacuum state was that we should try to restore.

We are currently searching the two systems for differences to see if we can narrow down the culprit of the failure.


  16312   Thu Sep 2 21:21:14 2021   Koji   Summary   Computers   Vacuum recovery 2

Attachment 1:
We are pumping the main volume with TP2. Once P1a reached the pressure ~2.2mtorr, we could open the PSL shutter. The TP2 voltage went up once but came down to ~20V. It's close to nominal now.
We wondered if we should use TP3 or not. I checked the vacuum pressure trends and found that the annulus pressures were going up. So we decided to open the annulus valves.

Attachment 2:
The current vacuum status is as shown in the MEDM screenshot.

There is no trend data of the valve status (sad)

Attachment 1: Screenshot_2021-09-02_21-20-24.png
Attachment 2: Screenshot_2021-09-02_21-20-48.png
  16313   Thu Sep 2 21:49:03 2021   Paco   Summary   Computers   chiara down, vac interlock tripped

[tega, paco]

We found the files that took up the excess space in the chiara filesystem (see Attachment 1). They were error files from the summary pages, ~50 GB in size or so, located under /home/cds/caltech/users/public_html/detcharsummary/logs/. We manually removed them and then copied the rest of the summary page contents into the main file system drive (this is to preserve the backup before it gets deleted by the cron job at the end of today), and checked carefully to identify why these files were so large in the first place.

We then copied the /detcharsummary directory from /media/40mBackup into /home/cds to match the two disks.
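The hunt for oversized files described in this entry can be sketched in a few lines of standard-library Python. This is only an illustration, not the procedure we actually used; find_large_files and its arguments are hypothetical names.

```python
import os


def find_large_files(root, min_bytes):
    """Return (size, path) pairs for files under `root` of at least
    `min_bytes` bytes, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size >= min_bytes:
                found.append((size, path))
    return sorted(found, reverse=True)
```

Calling it on the logs directory with min_bytes set to, say, 10**9 would list every file above 1 GB.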

Attachment 1: 2021-09-02_21-51-15.png
  16314   Fri Sep 3 02:03:15 2021   Tega   Summary   Computers   Strip down large error files

Also deleted the ~50GB error files from ldas to prevent rsync from copying them to nodus again. With the new update to GWsumm, there are new error messages that initially didn't seem to affect the summary pages' functionality, but in the extreme case they can populate the error files with repeated warnings of the form "Loading: FrSerData", "Loading: FrSerData::n4294967295", "Loading: FrSummary", "Loading: FrSerDataLoading: FrSerData" and many more combinations, until we get file sizes of order ~50GB. So I have updated the checkstatus script to parse the error files and strip out the majority of these error messages. Work is ongoing to get them all.
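To give an idea of the kind of filtering involved, here is a minimal sketch (not the actual checkstatus code) that drops lines made up purely of the "Loading: Fr..." messages quoted above. The regular expression is my guess at a pattern covering those four examples; other combinations may need additions. Reading line by line keeps memory use flat even for very large files.

```python
import re

# One or more back-to-back "Loading: Fr<Name>" tokens, each optionally
# followed by a "::<suffix>", and nothing else on the line (assumed pattern).
NOISE = re.compile(r"^(?:Loading: Fr\w+(?:::\w+)?\s*)+$")


def strip_noise(lines):
    """Yield only the lines that are not pure 'Loading: Fr...' chatter."""
    for line in lines:
        if not NOISE.match(line.strip()):
            yield line


def clean_file(src_path, dst_path):
    """Stream src_path to dst_path, leaving out the noise lines."""
    with open(src_path, errors="replace") as src, open(dst_path, "w") as dst:
        dst.writelines(strip_noise(src))
```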

In light of these large files, I decided to look in the summary pages folder to see if there are other large files that we need to keep track of, and it turns out there is indeed a collection of files in the archive folder that bloats the summary pages on ldas to ~1TB. Luckily these are not synced to nodus, so no problem here. However, since the beginning of the year, the archive folders that hold the data used for each day's computation have not been cleared. We have a script for doing this, but it has not been run for a while now, and it only deletes archive files for a specific month, which is hardcoded to two months from the date the script is run. I have modified the code to allow archive deletion for a range of months so we can clear data from Jan to July.
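The range-of-months part of that change boils down to expanding a start and end month into a list that the deletion step can loop over. A minimal sketch is below; it is not the actual script, the function name is hypothetical, and the archive directory layout is deliberately left out since it is not described here.

```python
def months_in_range(start, end):
    """Return (year, month) tuples from `start` to `end`, inclusive.

    `start` and `end` are themselves (year, month) tuples. An empty list
    is returned if `end` is earlier than `start`.
    """
    (y, m), (end_y, end_m) = start, end
    months = []
    while (y, m) <= (end_y, end_m):
        months.append((y, m))
        m += 1
        if m > 12:  # roll over into the next year
            y, m = y + 1, 1
    return months
```

months_in_range((2021, 1), (2021, 7)) gives the seven months from Jan to July 2021.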




  16346   Mon Sep 20 15:23:08 2021   Yehonathan   Update   Computers   Wifi internet fixed

Over the weekend and today, the wifi was acting bad with frequent disconnections and no internet access. I tried to log into the web interface of the ASUS wifi but with no success.

I pushed the reset button for several seconds to restore factory settings. After that, I was able to log in. I did the automatic setup and defined the wifi passwords to be what they used to be.

Internet access was restored. I also unplugged and plugged back all the wifi extenders in the lab and moved the extender from the vertex inner wall to the outer wall of the lab close to the 1X3.

Now, there seems to be wifi reception both in X and Y arms (according to my android phone).


  16348   Mon Sep 20 15:42:44 2021   Ian MacMillan   Summary   Computers   Quantization Code Summary

This post serves as a summary and description of the code used to test the impact of quantization noise on a state-space implementation of the suspension model.

Purpose: We want to use a state-space model in our suspension plant code. Before we can do this, we want to test whether the state-space model is prone to problems with quantization noise. We will build two models, one for a standard direct form II filter and one for the state-space model, and then compare the noise from both.

Signal Generation:

First I built a basic signal generator that can produce a sine wave for a specified amount of time then can produce a zero signal for a specified amount of time. This will let the model ring up with the sine wave then decay away with the zero signal. This input signal is generated at a sample rate of 2^16 samples per second then stored in a numpy array. I later feed this back into both models and record their results.
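A minimal sketch of such a generator is shown below. It is not the linked code: the function name and arguments are made up for illustration, and it uses plain Python lists so that it runs without numpy (the result can be wrapped in a numpy array afterwards).

```python
import math

FS = 2 ** 16  # sample rate in samples per second, as stated above


def ringup_ringdown(freq, t_on, t_off, amplitude=1.0, fs=FS):
    """Return a sine at `freq` Hz lasting `t_on` seconds, followed by
    `t_off` seconds of zeros, sampled at `fs` samples per second."""
    n_on = int(round(t_on * fs))
    n_off = int(round(t_off * fs))
    drive = [amplitude * math.sin(2 * math.pi * freq * n / fs)
             for n in range(n_on)]
    return drive + [0.0] * n_off
```

For example, ringup_ringdown(1.0, 10.0, 20.0) drives at 1 Hz for 10 s and then lets the model ring down for 20 s.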

State-space Model:

The code can be seen here

The state-space model takes in the list of excitation values and feeds them through a loop that calculates the next value in the output.

Given that the state-space model follows the form

  \dot{x}(t)=\textbf{A}x(t)+ \textbf{B}u(t)   and  y(t)=\textbf{C}x(t)+ \textbf{D}u(t) ,

the model has three parts: the first equation, an integration, and the second equation.

  1. The first equation takes the state x and the excitation u and generates the x dot vector shown on the left-hand side of the first state-space equation.
  2. The second part must integrate x dot to obtain the x that is used in the next equation. This uses the velocity and acceleration to integrate to the next x that will be plugged into the second equation.
  3. The second equation in the state-space representation takes the x vector we just calculated and multiplies it by the sensing matrix C. We don't have a D matrix, so this gives us the next output of our system.

This system is the coded form of the block diagram of the state space representation shown in attachment 1
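The three parts can be sketched as a short loop. This is an illustration only, not the linked code: the integration scheme is not specified above, so forward Euler is assumed here, and the function name and the single-input, single-output restriction are choices made for brevity.

```python
def simulate_state_space(A, B, C, u, dt, x0=None):
    """Step xdot = A x + B u, y = C x through the input samples `u`.

    A is an n-by-n list of lists, B and C are length-n lists (single input,
    single output), dt is the time step in seconds, x0 the initial state
    (zeros if omitted). Forward Euler is an assumption of this sketch.
    """
    n = len(A)
    x = list(x0) if x0 is not None else [0.0] * n
    y = []
    for u_k in u:
        # Part 1: the state equation gives the derivative of the state.
        xdot = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u_k
                for i in range(n)]
        # Part 2: integrate one time step to get the next state.
        x = [x[i] + dt * xdot[i] for i in range(n)]
        # Part 3: output equation with the sensing matrix C (no D term).
        y.append(sum(C[j] * x[j] for j in range(n)))
    return y
```

For a damped oscillator with x = (position, velocity), one would use A = [[0, 1], [-w0**2, -w0/Q]], B = [0, 1], C = [1, 0], where w0 and Q are whatever the suspension model specifies.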

Direct-II Model:

The direct form II filter works in a much simpler way. Because it involves no integration and follows the block diagram shown in Attachment 2, we can use a difference equation to find the next output. The only complication is that we also have to keep track of the intermediate signal w[n] seen in the middle of the block diagram. We use these two equations to calculate the output value:

y[n]=b_0 w[n]+b_1 w[n-1]+b_2 w[n-2],  where  w[n]=x[n]-a_1 w[n-1]-a_2 w[n-2]
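The two difference equations translate almost line for line into code. The sketch below is an illustration rather than the linked implementation; the function name is made up, and the coefficients are assumed to be normalised so that a_0 = 1.

```python
def direct_form_2(x, b, a):
    """Filter the samples `x` with a direct form II biquad.

    b = (b0, b1, b2) are the feed-forward coefficients and a = (a1, a2)
    the feedback coefficients, with a0 normalised to 1.
    """
    b0, b1, b2 = b
    a1, a2 = a
    w1 = w2 = 0.0  # delayed intermediate values w[n-1] and w[n-2]
    y = []
    for x_n in x:
        w0 = x_n - a1 * w1 - a2 * w2               # w[n]
        y.append(b0 * w0 + b1 * w1 + b2 * w2)      # y[n]
        w2, w1 = w1, w0                            # shift the delay line
    return y
```

Only the two delayed values of w need to be stored between samples, which is what makes this form compact.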

Bit length Control:

To control the bit length of each model, I typecast all the inputs using a numpy float type of the bit length that I want (np.float followed by the bit length). This simulates the computer using only the specified bit length. I still have to go through the code and force it to use 128 bit by default. Currently the default is 64 bit, so at the moment I am limited to 64 bit as the highest bit length. I also need to examine how numpy truncates floats to make sure it isn't doing anything unexpected.
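To illustrate what the typecast does to a single number, here is a small standard-library sketch that rounds a value to 32-bit or 16-bit precision and back. It stands in for the numpy casts described above (it is not the code used for the plots), and the helper names are made up. Python's built-in float is already 64-bit, which is why 64 bits is the natural ceiling without extra work.

```python
import struct


def to_float32(value):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def to_float16(value):
    """Round a 64-bit Python float to the nearest 16-bit float."""
    return struct.unpack("<e", struct.pack("<e", value))[0]
```

Values such as 0.5 survive the round trip unchanged, while 0.1 picks up a small error that grows as the bit length shrinks. That error is the quantization being studied.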

Bode Plot: 

The bode plot at the bottom shows the transfer function for both the IIR model and the state-space model. I generated about 100 seconds of white noise then computed the transfer function as 

G(f) = \frac{P_{csd}(f)}{P_{psd}(f)}

which is the cross-spectral density of the input and output divided by the power spectral density of the input. We can see that they match pretty closely at 64 bits. The IIR Direct Form II model seems to have more noise on the surface, but we are going to examine that in the next elog.
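A sketch of this transfer function estimate using only numpy, with Hann-windowed, 50% overlapping segments averaged as in Welch's method. The two-tap test system and the segment length are assumptions chosen so the result can be compared with a known answer; they are not the models from this entry.

```python
import numpy as np

def tf_estimate(x, y, nperseg):
    """Estimate G(f) = P_csd(f) / P_psd(f) from input x and output y."""
    window = np.hanning(nperseg)
    p_psd = np.zeros(nperseg // 2 + 1)
    p_csd = np.zeros(nperseg // 2 + 1, dtype=complex)
    for start in range(0, len(x) - nperseg + 1, nperseg // 2):  # 50% overlap
        X = np.fft.rfft(window * x[start:start + nperseg])
        Y = np.fft.rfft(window * y[start:start + nperseg])
        p_psd += np.abs(X) ** 2     # power spectral density of the input
        p_csd += np.conj(X) * Y     # cross-spectral density, input to output
    return p_csd / p_psd            # common normalization factors cancel

rng = np.random.default_rng(0)
x = rng.standard_normal(2 ** 14)    # white-noise input

# Hypothetical test system: y[n] = 0.5 x[n] + 0.5 x[n-1]
y = 0.5 * x
y[1:] += 0.5 * x[:-1]

nperseg = 256
G = tf_estimate(x, y, nperseg)
omega = 2 * np.pi * np.arange(nperseg // 2 + 1) / nperseg
G_exact = 0.5 * (1 + np.exp(-1j * omega))
print(np.max(np.abs(G - G_exact)))  # small compared with the gain of order 1
```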


Attachment 1: 472px-Typical_State_Space_model.svg.png
Attachment 2: Biquad_filter_DF-IIx.svg.png
Attachment 3: SS-IIR-TF.pdf
  16350   Mon Sep 20 21:56:07 2021 KojiUpdateComputersWifi internet fixed

Ugh, factory resets... Caltech IMSS announced that network service would be intermittent due to maintenance between Sept 19 and 20, and there seems to have been some aftermath of it. Check out "Caltech IMSS".


  16355   Wed Sep 22 14:22:35 2021 Ian MacMillanSummaryComputersQuantization Noise Calculation Summary

Now that we have a model of how the SS and IIR filters work, we can get to the problem of how to measure the quantization noise in each of the systems. Den Martynov's thesis talks a little about this. From my understanding, he measured quantization noise by running two filters whose variables have different numbers of bits, one with many more bits than the other. He fed the same input signal to both filters, then recorded their outputs x_1 and x_2, where x_2 had the higher number of bits. He then took the difference x_1-x_2. Since the CDS system uses double format, he assumes that quantization noise scales with mantissa length. He can therefore extrapolate the quantization noise for any mantissa length.

Here is the code that follows the procedure below (as of today at least).

This problem is a little harder than I had originally thought. I took Rana's advice and asked Aaron about how he had tackled a similar problem. We came up with a procedure explained below (though any mistakes are my own):

  1. Feed different white noise data into three copies of the same filter. This should yield the following equation: \textbf{S}_i^2 =\textbf{S}_{ni}^2+\textbf{S}_x^2, where \textbf{S}_i^2 is the power spectrum of the output of the ith filter, \textbf{S}_{ni}^2 is the noise filtered through an "ideal" filter with no quantization noise, and \textbf{S}_x^2 is the power spectrum of the quantization noise. Since we are feeding random noise into the input, the power of the quantization noise should be the same for all three of our runs.
  2. Next, we have our three outputs \textbf{S}_1^2, \textbf{S}_2^2, and \textbf{S}_3^2, which follow the equations:

\textbf{S}_1^2 =\textbf{S}_{n1}^2+\textbf{S}_x^2

\textbf{S}_2^2 =\textbf{S}_{n2}^2+\textbf{S}_x^2

\textbf{S}_3^2 =\textbf{S}_{n3}^2+\textbf{S}_x^2

From these three equations, we calculate the three quantities \textbf{S}_{12}^2, \textbf{S}_{23}^2, and \textbf{S}_{13}^2, which are given by:

\textbf{S}_{12}^2 =\textbf{S}_{1}^2-\textbf{S}_2^2\approx \textbf{S}_{n1}^2 -\textbf{S}_{n2}^2

\textbf{S}_{23}^2 =\textbf{S}_{2}^2-\textbf{S}_3^2\approx \textbf{S}_{n2}^2 -\textbf{S}_{n3}^2

\textbf{S}_{13}^2 =\textbf{S}_{1}^2-\textbf{S}_3^2\approx \textbf{S}_{n1}^2 -\textbf{S}_{n3}^2

From these quantities, we can calculate three values: \bar{\textbf{S}}_{n1}^2, \bar{\textbf{S}}_{n2}^2, and \bar{\textbf{S}}_{n3}^2. Since these are just estimates, we put a bar on top. These are calculated using:




Using these estimates we can then estimate \textbf{S}_{x}^2 using the formula:

\textbf{S}_{x}^2 \approx \textbf{S}_{1}^2 - \bar{\textbf{S}}_{n1}^2 \approx \textbf{S}_{2}^2 - \bar{\textbf{S}}_{n2}^2 \approx \textbf{S}_{3}^2 - \bar{\textbf{S}}_{n3}^2

We can average the three estimates of \textbf{S}_{x}^2 to come up with one estimate.

This procedure should give us a good estimate of the quantization noise. However, the graph in the attachment below shows that the estimated noise follows the transfer function of the model. I would not expect this to be true, so I believe there is an error either in the above procedure or in my code, which I am working on finding. I may have to rework this three-corner hat approach.

I would expect the quantization noise to be flatter and not follow the shape of the transfer function of the model. Instead, we have what looks like just the result of random noise being filtered through the model. 

Next steps:

The first real step is being able to quantify the quantization noise, but after I fix the issues in my code I will be able to start looking at optimal model design for both the state-space model and the Direct Form II model. I have been looking through the book "Quantization Noise" by Bernard Widrow and Istvan Kollar, which offers some good insights on how to minimize quantization noise.

Attachment 1: IIR64-bitnoisespectrum.pdf
  16360   Mon Sep 27 12:12:15 2021 Ian MacMillanSummaryComputersQuantization Noise Calculation Summary

I have not been able to figure out a way to make the procedure that Aaron and I talked about work. I'm not even sure it is possible to extract the quantization noise from the data I have in this way. Even the book uses a comparison to a high-precision filter as a way to calculate the quantization noise:

"Quantization noise in digital filters can be studied in simulation by comparing the behavior of the actual quantized digital filter with that of a reference digital filter having the same structure but whose numerical calculations are done extremely accurately."
-Quantization Noise by Bernard Widrow and Istvan Kollar (pg. 416)

Thus I will use a technique closer to that used in Den Martynov's thesis (see appendix B starting on page 171). A summary of my understanding of his method is given here:

A filter is given raw, unfiltered Gaussian data f(t) and produces the filtered data x(t): f(t)\rightarrow x(t)=x_N(t)+x_q(t), where x_N(t) is the raw noise filtered through an ideal filter and x_q(t) is the difference, which in this case is the quantization noise. I will therefore input about 100-1000 seconds of the same white noise into a 32-bit and a 64-bit filter (hopefully I can increase the more precise one to 128 bit in the future), then record their outputs and subtract them from each other. This should give us the quantization error e(t):
e(t)=x_{32}(t)-x_{64}(t)=x_{N_{32}}(t)+x_{q_{32}}(t) - x_{N_{64}}(t)-x_{q_{64}}(t)
and since x_{N_{32}}(t)=x_{N_{64}}(t), because both are the same input passed through an ideal filter:
e(t)=x_{N}(t)+x_{q_{32}}(t) - x_{N}(t)-x_{q_{64}}(t)
e(t)=x_{q_{32}}(t) -x_{q_{64}}(t)
and since in this case we are assuming that the higher-precision process is essentially noiseless, we get the quantization noise x_{q_{32}}(t).
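The subtraction above can be sketched with numpy by running one Direct Form II biquad twice, once with every operation in float32 and once in float64. The coefficients and record length are hypothetical. Note that in this sketch the input and the coefficients are also rounded to float32, so e(t) contains those effects in addition to the rounding of the arithmetic.

```python
import numpy as np

def df2_biquad(x, b, a, dtype):
    """Run a Direct Form II biquad with all arithmetic done in `dtype`."""
    b0, b1, b2 = (dtype(c) for c in b)
    a1, a2 = (dtype(c) for c in a)
    w1 = w2 = dtype(0)
    y = np.empty(len(x), dtype=dtype)
    for n, x_n in enumerate(x.astype(dtype)):
        w0 = x_n - a1 * w1 - a2 * w2
        y[n] = b0 * w0 + b1 * w1 + b2 * w2
        w2, w1 = w1, w0
    return y

rng = np.random.default_rng(0)
f = rng.standard_normal(4096)            # the same white noise f(t) for both filters

b, a = (0.05, 0.10, 0.05), (-1.6, 0.8)   # hypothetical low-pass coefficients
x32 = df2_biquad(f, b, a, np.float32)
x64 = df2_biquad(f, b, a, np.float64)

e = x32.astype(np.float64) - x64         # e(t) = x_32(t) - x_64(t)
print(np.std(e), np.std(x64))
```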

If we make some assumptions, then we can actually calculate a more precise version of the quantization noise:

"Since aLIGO CDS system uses double precision format, quantization noise is extrapolated assuming that it scales with mantissa length"
-Denis Martynov's Thesis (pg. 173)

From this assumption, we can say that the noise difference between the 32-bit and 64-bit filter outputs, x_{q_{32}}(t)-x_{q_{64}}(t), is proportional to the difference between their mantissa lengths. By averaging over many different bit lengths, we can estimate a better quantization noise number.

I am building the code to do this in this file.
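As a rough worked example of what such a scaling would imply (this is one reading of the quote, not necessarily the exact model used in the thesis): if the rounding error of each operation is proportional to 2^{-p} for a p-bit significand, then for IEEE 754 single precision (p=24) and double precision (p=53) the two terms compare as

\frac{x_{q_{32}}}{x_{q_{64}}} \sim \frac{2^{-24}}{2^{-53}} = 2^{29} \approx 5.4\times 10^{8}

so under this assumption e(t) is dominated by x_{q_{32}}(t), and the 64-bit term changes it by only about two parts per billion.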

  16361   Mon Sep 27 16:03:15 2021 Ian MacMillanSummaryComputersQuantization Noise Calculation Summary

I have coded up the procedure in the previous post. The result does not look like what I would expect.

As shown in Attachment 1, I have plotted the power spectrum of the 32-bit output and of the 64-bit output, the power spectrum of the difference of the two time series, and the difference of the two power spectra. Unfortunately, all of them follow the same general shape as the raw output of the filter.

I would not expect quantization noise to follow the shape of the filter. I would instead expect it to be more uniform. If anything I would expect the quantization noise to increase with frequency. If a high-frequency signal is run through a filter that has high quantization noise then it will severely degrade: i.e. higher quantization noise. 

This is one reason why I am confused by what I am seeing here. In all cases, including feeding the same and different white noise into both filters, I have found that the calculated quantization noise is proportional to the response of the filter. This seems wrong to me, so I will continue to play around with it to see if I can gain any intuition about what might be happening.

Attachment 1: DeltaNoiseSpectrum.pdf
  16362   Mon Sep 27 17:04:43 2021 ranaSummaryComputersQuantization Noise Calculation Summary

I suggest that you

  1. change the corner frequency to 10 Hz as I suggested last week. This filter, as it is, is going to give you trouble.
  2. Put in a sine wave at 3.4283 Hz with an amplitude of 1, rather than white noise. In this way, it's not necessary to do any subtraction. Just make a PSD of the output of each filter.
  3. Be careful about window length and window function. If you don't do this carefully, your PSD will be polluted by window bleeding.
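To illustrate the third point, here is a sketch of window bleeding for the suggested 3.4283 Hz sine, using numpy only. The sample rate and record length are assumptions made for this example. It compares how much power leaks into a bin far from the line with no window (rectangular) and with a Hann window.

```python
import numpy as np

fs = 128.0                 # sample rate in Hz (assumed for this sketch)
n = 8192                   # 64 s of data
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 3.4283 * t)   # amplitude-1 sine, not centered on an FFT bin

def spectrum_db(x, window):
    """Windowed power spectrum in dB relative to its own peak."""
    p = np.abs(np.fft.rfft(window * x)) ** 2
    return 10 * np.log10(p / p.max())

freqs = np.fft.rfftfreq(n, d=1 / fs)
far = np.argmin(np.abs(freqs - 40.0))      # a bin far away from the 3.4283 Hz line

leak_rect = spectrum_db(x, np.ones(n))[far]
leak_hann = spectrum_db(x, np.hanning(n))[far]
print(leak_rect, leak_hann)   # the Hann value should be far lower
```

With the rectangular window the leakage sets a floor that can hide a small quantization noise level; the Hann window pushes that floor down at the cost of a wider main lobe.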


  16366   Thu Sep 30 11:46:33 2021 Ian MacMillanSummaryComputersQuantization Noise Calculation Summary

First and foremost I have the updated bode plot with the mode moved to 10 Hz. See Attachment 1. Note that the comparison measurement is a % difference whereas in the previous bode plot it was just the difference. I also wrapped the phase so that jumps from -180 to 180 are moved down. This eliminates massive jumps in the % difference. 

Next, I have two comparison plots: 32 bit and 64bit. As mentioned above I moved the mode to 10 Hz and just excited both systems at 3.4283Hz with an amplitude of 1. As we can see on the plot the two models are practically the same when using 64bits. With the 32bit system, we can see that the noise in the IIR filter is much greater than in the State-space model at frequencies greater than our mode.

Note about windowing and averaging: I used a Hanning window with averaging over 4 neighbor points. I came to this number after looking at the results with less averaging and more averaging. In the code, this can be seen as nperseg=num_samples/4, which is then fed into signal.welch, so each segment is a quarter of the record length.

Attachment 1: SS-IIR-Bode.pdf
Attachment 2: PSD_32bit.pdf
Attachment 3: PSD_64bit.pdf
  16481   Wed Nov 24 11:02:23 2021 Ian MacMillanSummaryComputersQuantization Noise Calculation Summary

I added mpmath to the quantization noise code. mpmath allows me to specify the precision that I am using in calculations. I added this to both the IIR filters and the State-space models although I am only looking at the IIR filters here. I hope to look at the state-space model soon. 

Notebook Summary:

I also added a new notebook which you can find HERE. This notebook creates a signal by summing two sine waves and windowing them.

Then that signal is passed through our filter that has been limited to a specific precision. In our case, we pass the same signal through a number of filters at different precisions.

Next, we take the output from the filter with the highest precision, because this one should have the lowest quantization noise by a significant margin, and we subtract the outputs of the lower precision filters from it. In summary, we are subtracting a clean signal from a noisy signal; because the underlying signal is the same, when we subtract them the only thing that should be left is noise. Since this system is purely digital and theoretical, the limiting noise should be quantization noise.

Now we have a time series of the noise for each precision level (except for our highest precision level but that is because we are defining it as noiseless). From here we take a power spectrum of the result and plot it.

After this, we can calculate a frequency-dependent SNR and plot it. I also calculated values for the SNR at the frequencies of our two inputs. 

This is the procedure taken in the notebook and the results are shown below.
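The procedure can be sketched without mpmath. As a stand-in (an assumption of this example, not what the notebook does), precision is reduced by rounding the filter state and output to a given number of significand bits, and ordinary float64 plays the role of the "noiseless" reference. The signal frequencies, filter coefficients and bit lengths are hypothetical, and the SNR here is a single number over the whole time series rather than frequency dependent.

```python
import math

def round_to_bits(value, bits):
    """Round value to a significand of `bits` bits."""
    if value == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(value)      # value = mantissa * 2**exponent
    return math.ldexp(round(mantissa * 2**bits), exponent - bits)

def df2_quantized(x, b, a, bits=None):
    """Direct Form II biquad; if bits is given, round the state and output each sample."""
    q = (lambda v: round_to_bits(v, bits)) if bits else (lambda v: v)
    w1 = w2 = 0.0
    y = []
    for x_n in x:
        w0 = q(x_n - a[0] * w1 - a[1] * w2)
        y.append(q(b[0] * w0 + b[1] * w1 + b[2] * w2))
        w2, w1 = w1, w0
    return y

# Two summed sine waves, Hann windowed.
n, fs = 4096, 1024.0
x = [(math.sin(2 * math.pi * 5.0 * k / fs) + math.sin(2 * math.pi * 50.0 * k / fs))
     * 0.5 * (1 - math.cos(2 * math.pi * k / (n - 1)))
     for k in range(n)]
b, a = (0.05, 0.10, 0.05), (-1.6, 0.8)

reference = df2_quantized(x, b, a)       # full float64, treated as noiseless
snr = {}
for bits in (12, 16, 24):
    noisy = df2_quantized(x, b, a, bits)
    noise = [r - v for r, v in zip(reference, noisy)]
    snr[bits] = sum(r * r for r in reference) / sum(e * e for e in noise)
    print(bits, snr[bits])
```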

Analysis of Results:

The first thing we can see is that the precision levels 256 and 128 bits are not shown on our graph. The 256-bit signal was our clean signal, so it was defined to have no noise and can't be plotted. The 128-bit signal should have some quantization noise, but I checked the output array and it contained all zeros. After further investigation, I found that the quantization noise was so small that when the result was handed over from mpmath to the general Python code, those numbers were rounded to zero. To overcome this issue I would have to keep the array as an mpmath object the entire time. I don't think this is useful because matplotlib probably couldn't handle it and it would be easier to just rewrite the code in C.

The next thing to notice is a sanity check. In general, low-precision filters yield higher noise than high-precision ones. However, this does not hold true at the low end: we can see that 16-bit actually has the highest noise for most of the range. Chris pointed out that at low precisions quantization noise can become so large that it is no longer a linearly coupled noise source. He also noted that this is prone to happen for low-precision coefficients with features far below the Nyquist frequency, like I have here. This explanation seems to fit the data, especially because the ambiguity appears at 16-bit and lower, as he points out.

Another thing that I must mention, even if it is just a note to future readers, is that quantization noise is input dependent. By changing the input signal I see different degrees of quantization noise.

Analysis of SNR:

One of the things we hoped to accomplish in the original plan was to play around with the input and see how the results changed. I mainly looked at how the amplitude of the input signal scaled the SNR of the output. Below I include a table of the results. These results were taken from the SNR calculated at the first peak (see the last code block in the notebook), with the amplitude of the input sine wave given at the top of each column. This amplitude was given to both of the two sine waves even though only the first one is reported. To see an example, the notebook is currently set up for a measurement with input amplitude 10.

Input amplitude:    0.1         1           100         1000
4-bit SNR           5.06e5      5.07e5      5.07e5      5.07e5
8-bit SNR           5.08e5      5.08e5      5.08e5      5.08e5
16-bit SNR          2.57e6      8.39e6      3.94e6      1.27e6
32-bit SNR          7.20e17     6.31e17     1.311e18    1.86e18
64-bit SNR          6.0e32      1.28e32     1.06e32     2.42e32
128-bit SNR         unknown     unknown     unknown     unknown

As we can see from the table above, the SNR does not seem to be related to the amplitude of the input. In multiple instances, the SNR dips or peaks in the middle of our amplitude range.


Attachment 1: PSD_IIR_all.pdf
  16482   Wed Nov 24 13:44:19 2021 ranaSummaryComputersQuantization Noise Calculation Summary

This looks great. I think what we want to see mainly is just the noise in the 32 bit IIR filtering subtracted from the 64 bit one.

It would be good if Tega can look through your code to make sure there's NO sneaky places where python is doing some funny casting of the numbers. I didn't see anything obvious, but as Chris points out, these things can be really sneaky so you have to be next level paranoid to really be sure. Fox Mulder level paranoia.

And, we want to see a comparison between what you get and what Denis Martynov put in an appendix of his thesis when comparing the Direct Form II with the low-noise form (also some slides from Matt Evans on this from a ~decade ago). You should be able to reproduce his results. He used matlab + C, so I am curious to see if it can be done all in python, or if we really need to do it in C.

And then...we can make this a part of the IFOtest suite, so that we point it at any filter module anywhere in LIGO, and it downloads the data and gives us an estimate of the digital noise being generated.

  16492   Tue Dec 7 10:55:25 2021 Ian MacMillanSummaryComputersQuantization Noise Calculation Summary

[Ian, Tega]

Tega and I have gone through the IIR Filter code and optimized it to make sure there aren't any areas that force high precision to be down-converted to low precision.

For the new biquad filter we have run into the issue that the gain of the filter is much higher than it should be. Looking at Attachments 1 and 2, which are time series comparisons of the inputs and outputs of the different filters, we see that the output of the Direct Form II filter, shown in Attachment 1 on the right, is on the order of 10^-5, whereas the response of the biquad filter is on the order of 10^2. Other than this gain, the responses look to be the same.

I am not entirely sure how this gain came into the system, because we copied the C code that actually runs on the CDS system into Python. There is a gain that affects the input of the biquad filter, as shown on this slide of Matt Evans' slides. This gain, shown below as g, could decrease the input signal and thus fix the gain. However, I have not found any way to calculate this g.



With this gain problem we are left with the quantization noise shown in Attachment 4.

State Space:

I have controlled the state space filter to act with a given precision level. However, my code is not optimized. It works by putting the input state through the first state-space equation then integrating the result, which finally gets fed through the second state-space equation. 

This is not optimized and gives us the resulting quantization noise shown in attachment 5.

However, the state-space filter also has a gain problem, where its output is about 85 times the amplitude of the DF2 filter. Also, since the state-space code is not operating in the most efficient way possible, I decided to port the code Chris made to run the state-space model to Python. This code has a problem where it seems to be unstable. I will see if I can fix it.



Attachment 1: DF2_TS.pdf
Attachment 2: BIQ_TS.pdf
Attachment 4: PSD_COMP_BIQ_DF2.pdf
Attachment 5: PSD_COMP_SS_DF2.pdf
  16498   Fri Dec 10 13:02:47 2021 Ian MacMillanSummaryComputersQuantization Noise Calculation Summary

I am trying to replicate the simulation done by Matt Evans in his presentation  (see Attachment 1 for the slide in particular). 

He defines his input as x_{\mathrm{in}}=\sin(2\pi t)+10^{-9} \sin(2\pi t f_s/4), so he has two inputs: one of amplitude 1 at 1 Hz and one of amplitude 10^-9 at one quarter of the sampling frequency, which in this case is 4096 Hz.
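A sketch of this input in plain Python, assuming a sampling frequency of 16384 Hz (inferred from f_s/4 = 4096 Hz). It also prints the small tone on its own, which at f_s/4 simply repeats a four-sample pattern.

```python
import math

fs = 16384.0            # sampling frequency in Hz, so that fs / 4 = 4096 Hz
n_samples = int(fs)     # one second of data

def x_in(n):
    """Sample n of sin(2*pi*t) + 1e-9 * sin(2*pi*t*fs/4), with t = n / fs."""
    t = n / fs
    return math.sin(2 * math.pi * t) + 1e-9 * math.sin(2 * math.pi * t * fs / 4)

x = [x_in(n) for n in range(n_samples)]

# The small tone alone: at fs/4 it cycles through 0, +A, 0, -A every four samples.
small = [1e-9 * math.sin(2 * math.pi * (n / fs) * fs / 4) for n in range(8)]
print(small)
```

The 1 Hz component is nine orders of magnitude larger than the fs/4 tone, which is what makes this input a demanding test of filter precision.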

For his filter, he uses a fourth-order notch filter. To build this filter I cascaded two second-order notch filters (signal.iirnotch), both located at 1 Hz, with quality factors of 1 and 1e6, as specified in slide 13 of his presentation.

I used the same procedure outlined here. My results are posted below in attachment 2.

Analysis of results:

As we can see from the results posted below, they don't match. There are a few problems that I noticed that may give us some idea of what went wrong.

First, there is a peak in the noise around 35 Hz. This peak is not shown at all in Matt's results and may indicate that something is inconsistent.

The second thing is that there is no peak at 4096 Hz. This is clearly shown in Matt's slides and it is shown in the input spectrum, so it is strange that it does not appear in the output.

My first thought was that the 4kHz signal was showing up at about 35Hz, but even when the 4kHz signal is removed from the input the ~35Hz peak is still there. The spectrum of the input shown in Attachment 3 shows no features at ~35Hz.

The input filter, shown in Attachment 4, also has no features at ~35Hz. Since neither the input nor the filter has features at ~35Hz, there must be either some sort of quantization noise feature there or, more likely, some sort of sampling effect or artifact of the calculation.

To figure out what is causing this I will continue to change things in the model until I find what is controlling it. 

I have included a Zip file that includes all the necessary files to recreate these plots and results.

Attachment 1: G0900928-v1_(dragged).pdf
Attachment 2: PSD_COMP_BIQ_DF2.pdf
Attachment 3: Input_PSD.pdf
Attachment 4: Input_Filter.pdf
Attachment 5: QuantizationN.zip
  16507   Wed Dec 15 13:57:59 2021 PacoUpdateComputersupgraded ubuntu on zita


Upgraded zita's ubuntu and restarted the striptool script.

  16650   Mon Feb 7 16:14:37 2022 TegaUpdateComputersrealtime system reboot problem

I was looking into plotting the temperature sensor data trend and into why no frame data has been written to file (on /frames) since Friday, and noticed that the FE models were not running. I spoke to Anchal about it and he mentioned that we are currently unable to ssh into the FE machines, and have therefore been unable to start the models. I recalled that the last time we encountered this problem Koji resolved it on Chiara, so I searched the elog for Koji's fix and found it here: https://nodus.ligo.caltech.edu:8081/40m/16310. I followed the procedure and restarted the c1sus and c1lsc machines, and we are now able to ssh into them. I also restarted the remaining FE machines and confirmed that we can ssh into them. Then, to start the models, I ssh'd into each FE machine (c1lsc, c1sus, c1ioo, c1iscex, c1iscey, c1sus2) and ran the command

rtcds start --all

to start all models on the FE machine. This procedure worked for all the FE machines but failed for c1lsc. For some reason, after starting the first models (the IOP model c1x04, then c1lsc and c1ass), the ssh connection to the machine drops. When we try to ssh into c1lsc after this event, we get the following error: "ssh: connect to host c1lsc port 22: No route to host". I reset the c1lsc machine and decided to start only the IOP model for now. I'll wait for Anchal or Paco to resolve this issue.

[Anchal, Tega]

I informed Anchal of the problem and asked if he could take a look. It turns out that 9 FE models across 3 FE machines (c1lsc, c1sus, c1ioo) have a certain interdependence that requires careful consideration when starting the FE models. In a nutshell, we need to first start the IOP models on all three FE machines before we start the other models on these machines. So we turned off all the models and shut down the FE machines, mainly because of a DAQ issue: the DC (data concentrator) indicator was not initialised. Anchal looked around on fb1 to figure out why this was happening and eventually discovered that it was the same as the ms_stream issue encountered earlier on the fb1 clone (https://nodus.ligo.caltech.edu:8081/40m/16372). So we restarted fb1 to see if things would clear up, given that the chiara dhcp server is now working fine. Upon restart of fb1, we used the info in a previous elog that shows whether the DAQ network is working or not, i.e. we ran the command

$ /opt/mx/bin/mx_info
MX:fb1:mx_init:querying driver:error 5(errno=2):No MX device entry in /dev.

The output shows that the MX device was not initialised during the reboot, as can also be seen below.

$ sudo systemctl status daqd_dc -l

● daqd_dc.service - Advanced LIGO RTS daqd data concentrator
   Loaded: loaded (/etc/systemd/system/daqd_dc.service; enabled)
   Active: failed (Result: exit-code) since Mon 2022-02-07 18:02:02 PST; 12min ago
  Process: 606 ExecStart=/usr/bin/daqd_dc_mx -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.dc (code=exited, status=1/FAILURE)
 Main PID: 606 (code=exited, status=1/FAILURE)

Feb 07 18:01:56 fb1 systemd[1]: Starting Advanced LIGO RTS daqd data concentrator...
Feb 07 18:01:56 fb1 systemd[1]: Started Advanced LIGO RTS daqd data concentrator.
Feb 07 18:02:00 fb1 daqd_dc_mx[606]: [Mon Feb  7 18:01:57 2022] Unable to set to nice = -20 -error Unknown error -1
Feb 07 18:02:00 fb1 daqd_dc_mx[606]: Failed to do mx_get_info: MX not initialized.
Feb 07 18:02:00 fb1 daqd_dc_mx[606]: 263596
Feb 07 18:02:02 fb1 systemd[1]: daqd_dc.service: main process exited, code=exited, status=1/FAILURE
Feb 07 18:02:02 fb1 systemd[1]: Unit daqd_dc.service entered failed state.

NOTE: We commented out the line


in the file "/etc/systemd/system/daqd_dc.service" in order to see the error, BUT MUST UNDO THIS AFTER THE PROBLEM IS FIXED!
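The start-up order described in this entry (IOP models on c1lsc, c1sus and c1ioo first, then everything else) can be sketched as a small helper that only builds the list of commands. This assumes that rtcds start accepts a single model name and that the commands are issued over ssh. Only the c1lsc model names come from this entry; the other names are placeholders to be filled in.

```python
# Hosts and models. Only the c1lsc names (c1x04, c1lsc, c1ass) come from this entry;
# the IOP names for c1sus and c1ioo are placeholders, and their user models are omitted.
models = {
    "c1lsc": {"iop": "c1x04", "user": ["c1lsc", "c1ass"]},
    "c1sus": {"iop": "IOP_MODEL_ON_C1SUS", "user": []},   # placeholder
    "c1ioo": {"iop": "IOP_MODEL_ON_C1IOO", "user": []},   # placeholder
}

def start_commands(models):
    """Return start commands with every IOP model ahead of any user model."""
    iop_first = [f"ssh {host} rtcds start {m['iop']}" for host, m in models.items()]
    users = [f"ssh {host} rtcds start {name}"
             for host, m in models.items() for name in m["user"]]
    return iop_first + users

commands = start_commands(models)
for command in commands:
    print(command)
```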

  16656   Thu Feb 10 14:39:31 2022 KojiSummaryComputersNetwork security issue resolved

[Mike P / Koji / Tega / Anchal]

IMSS/LIGO IT notified us that "ILOM ports" of one of our hosts on the "114" network are open. We tried to shut down obvious machines but could not identify the host in question. So we decided to do a bit more systematic search of the host.

[@Network Rack]
- First of all, we disconnected the optical cables coming to the GC router while a ping was running from a laptop connected to AIRLIGO (i.e. outside of the 40m network). This made the ping stop, which means that the issue was definitely in the 40m.
- Secondly, we started to disconnect (and reconnect) the ethernet cables from the GC router one by one. We found that the ping response stops when the cable named "NODUS" was disconnected.

[@40m IFO lab]
- So we tracked the cable down in the 40m lab. After a while, we identified that the cable was really connected to nodus.

- Nodus was supposed to have one network connection to the martian network since the introduction of the bidirectional NAT router (rather than the old configuration with a single direction NAT router).

- In fact, the cable was connected to a "non-networking" port of nodus (Attachment 1). I guess the cable had been connected like this for a long time, but somehow the ILOM (IPMI) port was activated by the recent power cycling.

- The cable was disconnected at nodus too. (Attachment 2) And a tape was attached to the port so that we don't connect anything to the port anymore.

Attachment 1: PXL_20220210_220816955.jpg
Attachment 2: PXL_20220210_220827167.jpg
  16836   Mon May 9 15:32:14 2022 Ian MacMillanSummaryComputersQuantization Noise Calculation Summary

I made the first pass at a tool to measure the quantization noise of specific filters in the 40m system; the code can be found here. It takes the input to the filter bank and the filter coefficients for all of the filters in the filter bank. It then runs the input through all the filters and measures the quantization noise at each instance. It does this by subtracting the 64-bit output from the 32-bit output. Note: the actual system is 64 bit, so I need to update it to subtract the 64-bit output from the 128-bit output using the long double format. This means that it must be run on a computer that supports the long double format; I checked, and Rossa does. The code outputs a number of plots that look like the one in Attachment 1. Koji suggested formatting an automatically generated page for each of the filters that shows the filter and the results as well as an SNR for the noise source. The code is formatted as a class so that it can be easily added to the IFOtest repo when it is ready.
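A quick way to check what the long double format provides on a given machine, assuming numpy is available there. On many x86 Linux machines np.longdouble is the 80-bit extended format stored in 16 bytes rather than a true 128-bit quad format, so it is worth looking at the significand length directly.

```python
import numpy as np

info = np.finfo(np.longdouble)
print("significand bits (excluding the implicit bit):", info.nmant)
print("machine epsilon:", info.eps)

# If long double is no wider than double, a long double reference adds nothing.
extended = info.nmant > np.finfo(np.float64).nmant
print("long double is wider than float64:", extended)
```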

I tracked down a filter that I thought may have lower quantization noise than the one that is currently used. The specifics of this will be in the DCC document version 2 that I am updating, but a diagram of it is found in Attachment 2. Preliminary calculations seemed to show that it had lower quantization noise than the current filter realization. I added this filter realization to the C code and ran a simple comparison between all of them. The results in Attachment 3 are not as good as I had hoped. The input was a two-tone sine wave. The low-level broadband signal between 10Hz and 4kHz is the quantization noise. The blue trace shows the current filter realization and the remaining trace shows the generic and most basic Direct Form II. The orange one is the new filter, which I personally call the Aircraft Biquad because I found it in this paper by the Hughes Aircraft Company. See fig 2 in paper. They call it the "modified canonic form realization" but there are about 20 filters in the paper that also share that name. In the DCC doc I have just given them numbers because it is easier.

Whats next:

1) I need to review the qnoisetool code to make it compute the correct 64-bit noise.

        a) I also want to add the new filter to the simulation to see how it does

2) Make the output into a summary page the way Koji suggested. 

3) Complete the updated DCC document. I need to reconcile the differences between the calculation I made and the actual result of the simulation.

Attachment 1: SUS-ETMX_SUSYAW3_0.0.pdf
Attachment 2: LowNoiseBiquad2.pdf
Attachment 3: quant_noise_floor.pdf
  16881   Fri May 27 17:46:48 2022 PacoSummaryComputersCDS upgrade visit, downfall and rise of c1lsc models

[Paco, Anchal-remote, Yuta, JC]

Sometime around noon today, right after the CDS upgrade planning tour, the c1lsc FE fell. We thought this was ok because c1sus was still up anyway, but somehow the IFO alignment was compromised (this is in fact how we first noticed the loss). Yuta couldn't see REFL on the camera, nor on the AP table (!!), so somehow any or all of TT1, TT2, PRM got affected by this model stopping. We even tried kicking PRM slightly to see if the beam was nearby, with no success.

We decided to restart the models. To do this we first ssh'd into c1lsc, c1ioo and c1sus and stopped all models. During this step, c1ioo and c1sus dropped their connections, so we had to physically restart them. We then noticed a DC 0x4000 error in c1x04 (the c1lsc IOP) and, after checking, found that the gpstimes differed by 1 second. We then stopped the model again and, from fb1, restarted all daqd_* services and ran modprobe -r gpstime, modprobe gpstime, restarted c1lsc and started the c1x04 model. This fixed the issue, so we finished restarting all FE models and burt-restored all the relevant snap files to today 02:19 AM PDT.

This made the IFO recover its nominal alignment, minus the usual drift.

* The OAF model failed to start but we left it like so for now.

  16991   Tue Jul 12 13:59:12 2022 ranaSummaryComputersprocess monitoring: Monit

I've installed Monit on megatron and nodus just now, and will set it up to monitor some of our common processes. I'm hoping that it can give us a nice web view of what's running where in the Martian network.

  17058   Thu Aug 4 19:01:59 2022 TegaUpdateComputersFront-end machine in supermicro boxes

Koji and JC looked around the lab today and found some supermicro boxes which I was told to look into to see if they have any useful computers.


Boxes next to Y-arm cabinets (3 boxes: one empty)

We were expecting to see a smaller machine in the first box, like the top machine in Attachment 1, but it turns out to actually contain the front-end we need; see the bottom machine in Attachment 1. This is the same machine as c1bhd, currently on the teststand. Attachment 2 is an image of the machine in the second box (maybe a new machine for the framebuilder?). The third box is empty.


Boxes next to X-arm cabinets (3 boxes)

Attachment 3 shows the 3 boxes, each of which contains the same FE machine we saw earlier at the bottom of Attachment 1. The middle box contains the note shown in Attachment 4.


Box opposite Y-arm cabinets (1 empty box)


In summary, it looks like we have 3 new front-ends, 1 new front-end with a networking issue and 1 new tower machine (possibly a framebuilder replacement).

Attachment 1: IMG_20220804_184444473.jpg
Attachment 2: IMG_20220804_191658206.jpg
Attachment 3: IMG_20220804_185336240.jpg
Attachment 4: IMG_20220804_185023002.jpg
  17066   Mon Aug 8 17:16:51 2022 TegaUpdateComputersFront-end machine setup

Added 3 FE machines - c1ioo, c1lsc, c1sus -  to the teststand following the instructions in elog15947. Note that we also updated /etc/hosts on chiara by adding the names and ip of the new FE since we wish to ssh from there given that chiara is where we land when we connect to c1teststand.

Two of the FE machines - c1lsc & c1ioo - have the 6-core X5680 @ 3.3GHz processor and the BIOS were already mostly configured because they came from LLO I believe. The third machine - c1sus - has the 6-core X5650 @ 2.67GHz processor and required a complete BIOS config according to the doc.

Next Step:  I think the next step is to get the latest RTS working on the new fb1 (tower machine), then boot the frontends from there.

KVM switch note:

All current front-ends have PS/2 keyboard and mouse connectors except for fb1, which only has USB ports. So we may not be able to connect to fb1 using a PS/2 KVM switch that works for all the current front-ends. The new tower machine does have a PS/2 connector, so if we decide to use that as the bootserver and framebuilder, then we should be fine.

Attachment 1: IMG_20220808_170349717.jpg
  17074   Wed Aug 10 20:51:14 2022 TegaUpdateComputersCDS upgrade Front-end machine setup

Here is a summary of what needs doing following the chat with Jamie today.


Jamie brought over the KVM switch shown in the attachment and I tested all 16 ports and 7 cables and can confirm that they all work as expected.



1. Do a rack space budget to get a clear picture of how many front-ends we can fit into the new rack

2. Look into what needs doing and how much effort would be needed to clear rack 1X7 and use that instead of the new rack. The power down on Friday would present a good opportunity to do this work on Monday, so get the info ready before then. 

3. Start mounting front-ends, KVM and dolphin network switch

4. Add the BOX rack layout to the CDS upgrade page.
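The rack space budget in item 1 above is simple arithmetic and could be sketched as follows. Every number here is an assumption for illustration: the rack height, the unit heights of the switches, KVM and front-ends, and the item counts would all need to be replaced with measured values.

```python
def rack_budget(rack_units, items):
    """Return (used, free) rack units.

    rack_units: total height of the rack in U.
    items:      dict mapping item name -> (count, height_in_U).
    """
    used = sum(count * height for count, height in items.values())
    return used, rack_units - used


def max_additional(free_units, height):
    """How many more machines of the given height (in U) still fit."""
    return max(free_units, 0) // height


if __name__ == "__main__":
    # Hypothetical inventory: heights and counts are placeholders, not measurements.
    inventory = {
        "network switches": (3, 1),
        "KVM switch": (1, 1),
        "front-ends": (4, 2),
    }
    used, free = rack_budget(42, inventory)  # 42U rack is an assumption
    print(f"used {used}U, free {free}U, room for {max_additional(free, 2)} more 2U machines")
```

A real budget would also need to reserve space for cable management and airflow, which this sketch ignores.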

Attachment 1: IMG_20220810_171002928.jpg
Attachment 2: IMG_20220810_171019633.jpg
  17083   Tue Aug 16 18:22:59 2022 TegaUpdateComputersc1teststand rack mounting for CDS upgrade

[Tega, Yuta]

I keep getting confused about the purpose of the teststand. The view I am adopting going forward is its use as a platform for testing the compatibility of new hardware upgrades, instead of thinking of it as an independent system that works with old hardware.

The initial idea of clearing 1X7 cannot be done for now, because I missed the deadline for providing a detailed enough plan before Monday's power-up of the lab, so we are just going to go ahead and use the new rack as was initially intended and get the latest hardware and software tested here.

We mounted the DAQ, subnet and dolphin IX switches, see attachment 1. The mounting ears that came with the dolphin switch did not fit and so could not be used for mounting. We looked around the lab and decided to use one of the NavePoint mounting brackets which we found next to the teststand, see attachment 2.

We plan to move the new rack to the current location of the teststand and use the power connection from there. It is also closer to 1X7, so moving the front-ends and switches to 1X7 should be straightforward after we complete all CDS upgrade testing.

Attachment 1: IMG_20220816_180157132.jpg
Attachment 2: IMG_20220816_175125874.jpg
  17088   Wed Aug 17 11:10:51 2022 ranaUpdateComputersc1teststand rack mounting for CDS upgrade

We want to be able to run SimPlant on the teststand, test our new controls algorithms, test watchdogs, and any other software upgrades. Ideally, in the steady state it will run some plants with suspensions and cavities, and we will also develop our measurement scripts there (e.g. IFOtest).


[Tega, Yuta]

I keep getting confused about the purpose of the teststand. The view I am adopting going forward is its use as a platform for testing the compatibility of new hardware upgrades, instead of thinking of it as an independent system that works with old hardware.
