40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
 40m Log, Page 98 of 348 Not logged in
ID Date Author Type Category Subject
14812   Thu Jul 25 14:28:03 2019 gautamConfigurationComputersfirewalld disabled for EPICS CA

I think rana did some more changes to this workstation to make it useful for commissioning activities - but the MEDM screens were still white blanks. The problem was that the firewalld wasn't disabled (last two steps of the KThorne setup wiki). I disabled it. Now donatella can run MEDM, ndscope and StripTool. DTT doesn't work to get online data because of a "Synchronization Error", I'm not bothering with this for now. I think Kruthi successfully demonstrated the fetching of offline data with DTT.

Attachment 1: donatellaCommissioning.png
14813   Thu Jul 25 20:08:36 2019 gautamUpdateComputersSolidworks machine

I brought one CPU (Dell T3500) and one 28" monitor from Mike Pedraza's office in Downs to the 40m. It is on Steve's desk right now, pending setup. The machine already has Solidworks and Altium installed on it, so we can set it up at our leisure. The login credentials are pasted on the CPU with a post-it should anyone wish to set it up.

14820   Wed Jul 31 14:44:11 2019 gautamUpdateComputersSupermicro inventory

Chub brought the replacement Supermicro we ordered to the 40m today. I stored it at the SW entrance to the VEA, along with the other Supermicro. At the time of writing, we have, in hand, two (unused) Supermicro machines. One is meant for EY and the other is meant for c1psl/c1iool0. DDR3 RAM and 120 GB SSD drives have also been ordered, but have not yet arrived (I think, Chub, please correct me if I'm wrong).

Update 20190802: The DDR3 RAM and 120 GB SSD drives arrived, and are stored in the FE hardware cabinet along the east arm. So at the time of writing, we have 2 sets of (Supermicro + 120GB HD + 4GB RAM).

 Quote: We should ask Chub to reorder several more SuperMicro rackmount machines, SSD drives, and DRAM cards. Gautam has the list of parts from Johannes' last order.
14829   Mon Aug 5 17:23:26 2019 gautamSummaryComputersWiFi Settings on asia

The VEA laptop asia was configured to be able to connect to too many WiFi networks - it was getting conflicted in its default position at the vertex and trying to hop between networks, for some reason trying to connect to networks that had poor signal strength. I deleted all options from the known networks except 40MARS. Now the network connection seems much more stable and reliable.

14831   Tue Aug 6 14:12:02 2019 yehonathanUpdateComputersmaking rossa great again

cdsutils is not working on rossa.

Import cdsutils produces this error:

In [2]: import cdsutils
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-2-949babce8459> in <module>()
----> 1 import cdsutils

/ligo/apps/linux-x86_64/cdsutils-480/lib/python2.7/site-packages/cdsutils/__init__.py in <module>()
53
54 try:
---> 55     import awg
56 except ImportError:
57     pass

/ligo/apps/linux-x86_64/cdsutils-480/lib/python2.7/site-packages/cdsutils/awg.py in <module>()
30 """
31
---> 32 import sys, numpy, awgbase
33 from time import sleep

/ligo/apps/linux-x86_64/cdsutils-480/lib/python2.7/site-packages/cdsutils/awgbase.py in <module>()
17 libawg = CDLL('libawg.so')
18 libtestpoint = CDLL('libtestpoint.so')
---> 19 libSIStr = CDLL('libSIStr.so')
20
21 ####

/ligo/apps/anaconda/lib/python2.7/ctypes/__init__.pyc in __init__(self, name, mode, handle, use_errno, use_last_error)
364
365         if handle is None:
--> 366             self._handle = _dlopen(self._name, mode)
367         else:
368             self._handle = handle

OSError: libSIStr.so: cannot open shared object file: No such file or directory

14864   Fri Sep 6 18:08:29 2019 ranaUpdateComputersAlarm noise from smart-ups machine under workstation?

please no one touch the UPS: last time it destroyed ROSSA. Please ask Chub to order the replacement batteries so we can do this in a controlled way (fully shutting down ALL workstations first). Last time we wasted 8 hours on ROSSA rebuilding.

 Quote: There was an alarm sound from the Smart-UPS 2200 sitting under the workstation. I see that the 'replace battery' light is red, and this elog tells me that these batteries are replaced every ~1-4 years; the last replacement was march 2016. Holding down the 'test' button for 2-3 seconds results in the alarm sound and does not clear the replace battery indicator.
14873   Thu Sep 12 09:49:07 2019 gautamUpdateComputerscontrol rm wkstns shutdown

Chub wanted to get the correct part number for the replacement UPS batteries which necessitated opening up the UPS. To be cautious, all the workstations were shutdown at ~9:30am while the unit is pulled out and inspected. While looking at the UPS, we found that the insulation on the main power cord is damaged at both ends. Chub will post photos.

However, despite these precautions, rossa reports some error on boot up (not the same xdisp junk that happened before). pianosa and donatella came back up just fine. It is remotely accessible (ssh-able) though so maybe we can recover it...

 Quote: please no one touch the UPS: last time it destroyed ROSSA. Please ask Chub to order the replacement batteries so we can do this in a controlled way (fully shutting down ALL workstations first). Last time we wasted 8 hours on ROSSA rebuilding
Attachment 1: IMG_7943.JPG
14913   Mon Sep 30 11:42:36 2019 aaronUpdateComputerscontrol rm wkstns shutdown

I booted Rossa in rescue mode; though I see no errors on bootup, I still see the same error ("a problem has occurred") after boot, and a prompt to logout. I powered rossa off/on (single short press of power button), no change.

Booting in debug mode, I see that the error occurs when mounting /cvs/cds, with the error

[FAILED] Failed to mount /cvs/cds.
See systemctl status cvs-cds.mount for details.
[DEPEND] Dependency failed for Remote File System

Which is odd, because when I boot in recovery mode, is mounts /cvs/cds successfully.

I booted in emergency mode by adding to the boot command

systemd.unit=emergency.target

but didn't have the appropriate root password to troubleshoot further (the usual two didn't work).

14925   Wed Oct 2 20:45:18 2019 ranaUpdateComputersrossa revival

Formatted and re-installing OS on rossa for the 3rd or 4th time this year. I suggest that whoever is installing software and adjusting video settings please stop.

If you feel you need to tinker deeply, use ottavia or zita and then be prepared to show up and fix it.

While I was moving the UPS around, the network lights went out for Rossa, so I may have damaged the network interface or cable. Debugging continues.

14935   Thu Oct 3 21:50:22 2019 ranaUpdateComputersrossa revival

Got the network to work again just by unplugging the power cord and letting it sit for awhile. But corrupted OS by trying to install Nvidia drivers.

14994   Mon Oct 28 18:55:06 2019 ranaUpdateComputersrossa revival

back on new Rossa from Xi computing

1. switched to using Display Port for video; this works. The DVi, HDMI, VGA ports are connected to the motherboard rather than the video card, so they are not active.
2. runs super slow w/ SL 7.6; maybe some service is running after startup?
3. install repos and update according to LLO CDS wiki
4. add controls user and group according to LLO wiki
5. remove gstreamer ugly because it breaks yum update
6. run 'yum update --skip-broken' because GDS doesn't work
7. turn off old selinux stuff
8. modify fstab to get NFS

Next:

1. finish mounting
2. xfce
3. figure out why the LLO install instructions can't install any CDS software (e.g. root, DTT, etc)

Update: Sun Nov 3 18:08:48 2019

1. moved the SL7 fresh install repos back into etc/yum.repos.d/. The LLO instructions has me remove them, but the LLO supplied repos are no good for standard apps. After putting these back was able to install standard apps (terminator, root, diaggui)
2. copied over /etc/fstab lines from pianosa sothat the NFS mounts work correctly
3. added symlinks so that the NFS dirs mount in the right dirs
4. symlink libsasl2.so.3 -> libsasl2.so.2 and now DTT runs and can get data now and in the past
5. install XFCE
6. sitemap / MEDM works
7.  Did "sudo ln -s /usr/lib64/libXm.so.4 /usr/lib64/libXm.so.3" to enable StripTool.

Update: Fri Nov 15 00:00:26 2019:

1. random hanging of machine while doing various window moving or workspace switching
2. turned off power management in XFCE
3. turned off power management on monitor
4. disabled SELINUX
6. installed most, pdftk, htop, glances, qtgrace, lesstif
7. dataviewer now works and QTgrace is much nicer than XMGrace
15031   Fri Nov 15 18:59:08 2019 ranaUpdateComputersZITA: started upgrade from Ubuntu 14 LTS to 18 LTS

and so it begins...until this is finished I have turned off the projector and moved the striptools to the big TV (time to look for Black Friday deals to replace the projector with a 120 inch LED TV)

15033   Mon Nov 18 16:32:15 2019 gautamUpdateComputersZITA: started upgrade from Ubuntu 14 LTS to 18 LTS

the upgrade seems to have been successfully executed - the machine was restarted at ~430pm local time. Projector remains off and diagnostic striptools are on the samsung.

 Quote: and so it begins...until this is finished I have turned off the projector and moved the striptools to the big TV (time to look for Black Friday deals to replace the projector with a 120 inch LED TV)
15084   Sun Dec 8 20:27:11 2019 ranaUpdateComputersViviana upgrade to Ubuntu 16

The IBM laptop at EX was running Ubuntu 14, so I allowed it to start upgrading itself to Ubuntu16 as it desired. After it is done, I will upgrade it to 18.04 LTS. We should have them all run LTS.

15085   Sun Dec 8 20:48:29 2019 ranaConfigurationComputersMegatron: starts up grade

I noticed recently that Megatron was running Ubuntu 12, so I've started its OS upgrade.

1. Unlocked the IMC + disabled the autolocker from the LockMC screen + closed the PSL shutter (IMC REFL shutter doesn't seem to do anythin)
2. Disabled the "FSS" slow servo on the FSS screen
3. did sudo apt-get update, sudo apt-get upgrade, and then sudo apt-get do-release-upgrade which starts the actual thing
4. According to the internet, the LTS upgrades will go in series rather than up to 18 in one shot, so its now doing 12 -> 14 (Trusty Tapir)

Megatron and IMC autolocking will be down for awhile, so we should use a different 'script' computer this week.

Mon Dec 9 14:52:58 2019

15095   Wed Dec 11 22:01:24 2019 ranaConfigurationComputersMegatron: starts up grade

Megatron is now running Ubuntu 18.04 LTS.

We should probably be able to load all the LSC software on there by adding the appropriate Debian repos.

I have re-enabled the cron jobs in the crontab.

The MC Autolocker and the PSL NPRO Slow/Temperature control are run using 'initctl', so I'll leave that up to Shruti to run/test.

15141   Wed Jan 22 16:38:01 2020 ranaUpdateComputersrossa revival

wiped and install Debian 10 on rossa today

still to be done: config it as CDS workstation

please don't try to "fix" it in the meantime

15142   Wed Jan 22 19:17:20 2020 gautamConfigurationComputersMegatron: starts up grade

cronjob testing wasn't one by one 😢

burt snapshots were gone

i brought them back home 🏠

 Quote: Megatron is now running Ubuntu 18.04 LTS.
15145   Thu Jan 23 15:32:42 2020 gautamConfigurationComputersMegatron: starts up grade

The burt snapshotting is still not so reliable - for whatever reason, the number of snapshot files that actually get written looks random. For example, the 14:19 backup today got all the snaps, but 15:19 did not. There are no obvious red flags in either the cron job logs or the autoburt log files. I also don't see any clues when I run the script in a shell. It'll be good if someone can take a look at this.

15159   Mon Jan 27 18:16:30 2020 gautamConfigurationComputersSluggish megatron?

I've also been noticing that the IMC Autolocker scripts are running rather sluggishly on Megatron recently. Some evidence - on Feb 11 2019, the time between the mcup script starting and finishing is ~10 seconds (I don't post the raw log output here to keep the elog short). However, post upgrade, the mean time is more like ~45-50 seconds. Rana mentioned he didn't install any of the modern LIGO software tools post upgrade, so maybe we are using some ancient EPICS binaries. I suspect the cron job for the burt snapshot is also just timing out due to the high latency in channel access. Rana is doing the software install on the new rossa, and once he verifies things are working, we will try implementing the same solution on megatron. The machine is an old Sun Microsystems one, but the system diagnostics don't signal any CPU timeouts or memory overflows, so I'm thinking the problem is software related...

 Quote: The burt snapshotting is still not so reliable - for whatever reason, the number of snapshot files that actually get written looks random. For example, the 14:19 backup today got all the snaps, but 15:19 did not. There are no obvious red flags in either the cron job logs or the autoburt log files. I also don't see any clues when I run the script in a shell. It'll be good if someone can take a look at this.
15164   Tue Jan 28 15:39:04 2020 gautamConfigurationComputersSluggish megatron?

There were a bunch of medm processes stalled on megatron (connected with screenshot taking). To see if they were interfering with the other scripts, I killed all of the medm processes, and commented out the line in the crontab that runs the screenshots every 10 mins. Let's see if this improves stability.

15167   Tue Jan 28 17:36:45 2020 gautamConfigurationComputersLocal EPICS7.0 installed on megatron

[Jon, gautam]

We found that the caput commands were taking much longer to execute on megatron than on pianosa (for example). Suspecting that this had something to do with the fact that megatron was using EPICS binaries from the shared NFS drive which were compiled for a much older OS, I installed the latest stable release of EPICS on megatron. The new caput commands execute much faster. I also added the local EPICS directory to the head of the PATH variable used by the MC autolocker and FSS Slow scripts, so that they use the new caput command. But mcup is still slow - maybe my new path definition isn't picked up and it is still using the NFS binaries? To be looked into...  Quote: There were a bunch of medm processes stalled on megatron (connected with screenshot taking). To see if they were interfering with the other scripts, I killed all of the medm processes, and commented out the line in the crontab that runs the screenshots every 10 mins. Let's see if this improves stability. 15246 Wed Mar 4 11:10:47 2020 YehonathanUpdateComputersAllegra revival Allegra had no network cable and no mouse. We found Allegra'snetwork cable (black) and connected it. I found a dirty old school mouse and connected it. I wiped Allegra and now I'm currently installing debian 10 on allegra following Jon's elog. 04/01 update: I forgot to mention that I tried installing cds software by following Jamie's instruction: I added the line in /etc/apt/sources.list.d/lscsoft.list: "deb http://software.ligo.org/lscsoft/debian/ stretch contrib". But this the only thing I managed to do. The next command in the instructions failed. 15276 Fri Mar 13 20:00:50 2020 JonUpdateComputersLoopback monitoring for slow machines Summary Today I finished implementing loopback monitors of the up/down state of the slow controls machines. They are visible on a new MEDM screen accessible from Sitemap > CDS > Slow Machine Status (pictured in attachment 1). Each monitor is a single EPICS binary channel hosted by the slow machine, which toggles its state at 1 Hz (an alive "blinker"). For each machine, the monitor is defined by a separate database file named c1[machine]_state.db located in the target directory. This is implemented for all upgraded machines, which includes every slow machine except for c1auxey. This is the next and final one slated for replacement. Implementation The blinkers are currently implemented as soft channels, but I'd like to ultimately convert them to hard channels using two sinking/sourcing BIO units. This will require new wiring inside each Acromag chassis, however. For now, as soft channels, the monitors are sensitive to a failure of the host machine or a failure of the EPICS IOC. As hard channels, they will additionally be sensitive to a failure of the secondary network interface, as has been known to happen. Each slow machine's IOC had to be restarted this afternoon to pick up the new channels. The IOCs were restarted according to the following procedure. c1auxex • Disabled OPLEV servos on ETMX • Zeroed slow biases • Disabled watchdog • Restarted IOC • Reverted 1-3 ​c1vac • Closed V1, VM1 • Restarted IOC • Returned valves to original state c1psl • Disabled IMC autolocker • Closed PSL shutter • Restarted IOC • Reverted 1-2 c1iscaux • ​Restarted IOC c1susaux • Disabled IMC autolocker • Closed shutter • Disabled OPLEV servos on: MC1, MC2, MC3, BS, ITMX, ITMX, PRM, SRM • Zeroed slow biases • Disabled watchdogs • Restarted IOC • Reverted 1-5 The intial recovery of c1susaux did not succeed. Most visibly, the alignment state of the IFO was not restored. After some debugging, we found that the restart of the modbus service was partially failing at the final burt-restore stage. The latest snapshot file /opt/rtcds/caltech/c1/burt/autoburt/latest/c1susaux.snap was not found. I manually restored a known good snapshot from earlier in the day (15:19) and we were able to relock the IMC and XARM. GV and I were just talking earlier today about eliminating these burt-restores from the systemd files. I think we should. Attachment 1: Screen_Shot_2020-03-13_at_7.59.55_PM.png 15447 Wed Jul 1 18:16:09 2020 gautamUpdateComputersrossa re-re-revival In an effort to make a second usable workstation, I did the following (remotely) on rossa today (not necessarily in this order, I wasn't maintaining a live log so I forgot): 1. Fixed /etc/resolv.conf, so that the other martian machines can be found. 2. Copied over .bashrc file, and the appropriate lines from /etc/fstab from pianosa to rossa. 3. Ran sudo apt install nfs-common. Then ran sudo mount -a to get /cvs/cds mounted. 4. Made symlinks for /users and /opt/rtcds , and /ligo. All of these are used by various environment-setting scripts and I chose to preserve the structure, though why we need so many symlinks, I don't know... 5. Set up the shell variableNDSSERVER using export NDSSERVER=fb:8088. I'm not sure how, but I believe DTT, awggui etc use this on startup to get the channel list (any
6. Followed instructions from Erik von Reis at LHO to install the cds workstation packages and dependencies. Worked like a charm 🎃
7. As a test, I plotted the accelerometer spectra in DTT, see Attachment #1. I also launched foton from inside awggui, and confirmed that the sample rate is inherited and I could designate a filter. But I haven't yet run the noise injection to test it, I'll do that the next time I'm in the lab.
8. Also checked that medm, StripTool and ndscope, and anaconda python all seem to work 👍🏾.

So, in summary, rossa is now all set up for use during lock acquisition. However, until this machine has undergone a few months of testing, we should freeze the pianosa config and not mess with it.

Note that this version of the "crtools" is rather new. Please, use them and if there is an issue, report the errors! I am going to occassionally try lock acquisition using rossa.

 Quote: wiped and install Debian 10 on rossa today still to be done: config it as CDS workstation please don't try to "fix" it in the meantime
Attachment 1: MCacc.pdf
15449   Sun Jul 5 16:14:41 2020 ranaUpdateComputersrossa re-re-revival

maybe we should make a "dd" copy of pianosa in case rossa has issues and someone destroys pianosa by accidentally spilling coffee on it.

 So, in summary, rossa is now all set up for use during lock acquisition. However, until this machine has undergone a few months of testing, we should freeze the pianosa config and not mess with it.
15451   Sun Jul 5 18:39:57 2020 ranaUpdateComputersrossa: printer

I did

sudo usermod -a -G lpadmin controls

and then was able to add Grazia to the list of printers for Rossa by following the instructions on the 40m Wiki.

I installed color syntax highlighting on Rossa using the internet (https://superuser.com/questions/71588/how-to-syntax-highlight-via-less). Now if you do 'less genius_code.py', it will be highlighting the python syntax.

when I try 'sitemap' on rossa I get:

medm: error while loading shared libraries: libreadline.so.6: cannot open shared object file: No such file or directory

15452   Mon Jul 6 00:37:28 2020 gautamUpdateComputersrossa: lib symlink

This is strange - I was definitely able to launch medm when I was working on this machine remotely on Friday. But now, there does seem to be a problem with this shared library being missing.

First of all, I installed mlocate to find where the shared library files are installed. Then I made the symlink, and now sitemap seems to work again.

Weirdly, my changes to /etc/resolv.conf got overwritten somehow. Was this machine rebooted? Uptime suggests it's only been running for ~6 hours at the time of writing of this elog.

sudo apt install mlocate sudo updatedb sudo ln -s /usr/lib/x86_64-linux-gnu/libreadline.so.7 /usr/lib/x86_64-linux-gnu/libreadline.so.6
 Quote: when I try 'sitemap' on rossa I get: medm: error while loading shared libraries: libreadline.so.6: cannot open shared object file: No such file or directory
15454   Mon Jul 6 12:43:02 2020 ranaUpdateComputersrossa: lib symlink

yes, I rebooted yesterday to fix the 'steaking white lines' problem in the video/display

maybe we're supposed to edit something besides resolv.conf since that gets over-written on boot for some linux OS

15455   Mon Jul 6 12:51:41 2020 gautamUpdateComputersrossa: resolvconf installed

Indeed, this is now fixed by following instructions from here. I rebooted rossa at ~1250 PDT and confirmed that resolv.conf didn't get overwritten. The resolv.conf file also now has the following useful lines at the head:

~>cat /etc/resolv.conf # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8) #     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
 Quote: yes, I rebooted yesterday to fix the 'steaking white lines' problem in the video/display maybe we're supposed to edit something besides resolv.conf since that gets over-written on boot for some linux OS
15460   Wed Jul 8 22:50:33 2020 gautamUpdateComputersrossa: more symlinks

I wanted to try using rossa as my locking workstation today. However, a few problems became quickly evident. Basically, any of our scripts that rely on the cdsutils package (there are MANY) will not work on rossa, because of some library error. This machine is running Debian 10, while the cdsutils package is being loaded from a pre-compiled install on the shared drive, so perhaps this isn't surprising?

Digging a little more, I found that actually, a version of cdsutils that actually works with python3 is actually shipped with the standard cds-workstation meta-package. This is great news, and we should try and use this where possible I guess. Deferring further debugging for daytime work.

Anyway, I added a symlink: sudo ln -s /usr/lib/x86_64-linux-gnu/libncurses.so.6 /usr/lib/x86_64-linux-gnu/libncurses.so.5, and installed wmctrl using sudo apt install wmctrl

15463   Thu Jul 9 16:16:20 2020 gautamUpdateComputersrossa: graphics driver issues?

I noticed these streaky lines again today (but they were not a problem last night). It is annoying if we have to reboot this machine all the time. I wonder if this has something to do with missing drivers. When I ran sudo apt update && sudo apt upgrade, I got several lines like (this isn't the whole stack trace)

W: Possible missing firmware /lib/firmware/nvidia/gp108/acr/ucode_unload.bin for module nouveau W: Possible missing firmware /lib/firmware/nvidia/gp108/acr/ucode_load.bin for module nouveau W: Possible missing firmware /lib/firmware/nvidia/gp108/acr/unload_bl.bin for module nouveau W: Possible missing firmware /lib/firmware/nvidia/gp108/acr/bl.bin for module nouveau

Is this indicative of the graphics drivers being installed incorrectly? I am hesitant to mess with this because I think in the past, it was always trying to update some graphics driver that crashed the whole machine into some weird state where we have to wipe the drive and do a fresh re-install of the OS.

Should we just follow these instructions? The graphics card is apparently Quadro P400, which is one of the supported ones according to the list of supported devices.

Or just swap donatella and rossa monitors and defer the problem for later?

 Quote: yes, I rebooted yesterday to fix the 'steaking white lines' problem in the video/display
15469   Sat Jul 11 00:10:22 2020 gautamUpdateComputersrossa: more developmental work

After some consultation with Erik von Reis at LHO, this workstation is progressing towards being usable for most commissioning tasks. DTT, awggui, foton, and MEDM are all now working well. The main limitation now comes from the fact that many of our python scripts are written for python2, and rossa doesn't have many dependencies installed for python2. I see no reason to build these dependencies on rossa for python2, we should not have to work with an unsupported language. But at the same time, I don't want to completely wipe all our python2 scripts, and make them python3, because this would involve a lot of tedious testing that I'm not prepared to undertake at the moment (the problem is compounded by the fact that pianosa does not have many dependencies installed for python3).

So what I have done in the interim is make python3 versions of the most important scripts I need to get the PRFPMI locking working - they are in the scripts directory and have the same names as their python2 counterparts, but have a 3 appended to their names. So when working on rossa, these are the scripts that are called. Eventually, after a lot more testing, we can depracate the old scripts. Currently, where applicable, the MEDM screens allow for either the python2 or python3 version of the script to be called.

Please, for the time being, do not try and install any new packages on rossa unless you are prepared to debug any problems caused and return the machine to a workable state. If you find some issue with a missing package on rossa, (i) make a note of it on the elog, and (ii) if possible, set up your own conda environment for testing and install dependencies to that environment only.

15473   Mon Jul 13 11:33:18 2020 ranaUpdateComputersrossa: more developmental work

I too, would prefer py3 for everything, but aren't all the cdsutils / gaurdian things still python2?

Is it possible to just make a python2 conda environment on rossa? I would guess that its simple and won't interfere with the regular operation of that machine.

15475   Mon Jul 13 12:37:05 2020 gautamUpdateComputersrossa: more developmental work

In fact, all these utilities are now available in python3. There may be some bugs (e.g. this), but I've checked basic functionality and things look usable enough for development to proceed. While we can have a python2 env on rossa, I think it's unnecessary.

 Quote: I too, would prefer py3 for everything, but aren't all the cdsutils / gaurdian things still python2?
15921   Mon Mar 15 20:40:01 2021 ranaConfigurationComputersinstalled QTgrace on donatella for dataviewer

I installed QTgrace using yum on donatella. Both Grace and XMgrace are broken due to some boring fight between the Fedora package maintainers and the (non existent) Grace support team. So I have symlinked it:

controls@donatella|bin> sudo mv xmgrace xmgrace_bak
controls@donatella|bin> sudo ln -s qtgrace xmgrace
controls@donatella|bin> pwd
/usr/bin


I checked that dataviewer works now for realtime and playback. Although the middle click paste on the mouse doesn't work yet.

Attachment 1: cutiegrace.png
15928   Wed Mar 17 09:05:01 2021 Paco, AnchalConfigurationComputers40m Control Room Changes
• Switched positions of allegra and donatella.
• While doing so, the hdmi cable previously used by donatella snapped. We replaced this cable by another unused cable we found connected only on one end to rossa. We should get more HDMI cables if that cable was in use for some other purpose.
• Paco bought a bluetooth speaker/mic that is placed infront of allegra and it's usb adapter is connected to iMac's keyboard in the bottom. With the new camera installed, the 40m video call environment is now complete.
• Again, we have placed allegra's monitor for place holder but it is not working and we need new monitors for it in future whenever it is going to be used.
15945   Fri Mar 19 15:26:19 2021 AidanUpdateComputersActivated MATLAB license on megatron

15946   Fri Mar 19 15:31:56 2021 AidanUpdateComputersActivated MATLAB license on donatella

15955   Tue Mar 23 09:16:42 2021 Paco, AnchalUpdateComputersPower cycled C1PSL; restored C1PSL

So actually, it was the C1PSL channels that had died. We did the following to get them back:

• We went to this page and tried the telnet procedure. But it was unable to find the host.
• So we followed the next advice. We went to the 1X1 rack and manually hard shut off C1PSL computer by holding down the power button until the LEDs went off.
• We wait for 5-7 seconds and switched it back on.
• By the time we were back in control room, the C1PSL channels were back online.
• The mode cleaner however was struggling to keep the lock. It was going in and out of lock.
• So we followed the next advice and did burt restore which ran following command:
burtwb -f /opt/rtcds/caltech/c1/burt/autoburt/snapshots/2021/Mar/22/17:19/c1psl.snap -l /tmp/controls_1210323_085130_0.write.log -o /tmp/controls_1210323_085130_0.nowrite.snap -v
• Now the mode cleaner was locked but we found that the input switch of C1IOO-WFS1_PIT and C1IOO-WFS2_PIT filter banks were off. Which meant that only YAW sensors were in loop in the lock.
• We went back in dataviewer and checked when these channels were shut down. See attachments for time series.
• It seems this happened yesterday, March 22nd near 1:00 pm (20:00:00 UTC). We can't find any mention of anyone else doing it on elog and we left by 12:15pm.
• So we shut down the PSL shutter (C1:PSL-PSL_ShutterRqst) and switched off MC autolocker (C1:IOO-MC_LOCK_ENABLE).
• Switched on C1:IOO-WFS1_PIT_SW1 and C1:IOO-WFS2_PIT_SW1.
• Turned back on PSL shutter (C1:PSL-PSL_ShutterRqst) and MC autolocker (C1:IOO-MC_LOCK_ENABLE).
• Mode cleaner locked back easily and now is keeping lock consistently. Everything looks normal.
Attachment 1: MCWFS1and2PITYAW.pdf
Attachment 2: MCWFS1and2PITYAW_Zoomed.pdf
16027   Wed Apr 14 13:16:20 2021 AnchalConfigurationComputers40m Control Room Changes
• I have confirmed that the old two monitors' backlighting is not working. One can see the impression of the display without any brightness on them. Both old monitors are on the shelf behind.
• Today we got a monitor and mouse from Mike. I had to change /etc/default/grub GRUB_GFXMODE to 1920x1200@30 on allegra for it to work with the(any) monitor.
• Allegra is Debian 10 with latest cds-workstation installed on it. It is a good test station to migrate our existing scripts to start using updated cds-workstation configuration.
 Quote: Again, we have placed allegra's monitor for place holder but it is not working and we need new monitors for it in future whenever it is going to be used.

16249   Fri Jul 16 16:26:50 2021 gautamUpdateComputersDocker installed on nodus

I wanted to try hosting some docker images on a "private" server, so I installed Docker on nodus following the instructions here. The install seems to have succeeded, and as far as I can tell, none of the functionality of nodus has been disturbed (I can ssh in, access shared drive, elog seems to work fine etc). But if you find a problem, maybe this action is responsible. Note that nodus is running Scientific Linux 7.3 (Nitrogen).

16287   Mon Aug 23 10:17:21 2021 PacoSummaryComputerssystem reboot glitch

[paco]

At 09:34 PST I noted a glitch in the controls room as the machines went down except for c1ioo. Briefly, the video feeds disappeared from the screens, though the screens themselves didn't lose power. At first I though this was some kind of power glitch, but upon checking with Jordan, it most likely was related to some system crash. Coming back to the controls room, I could see the MC reflection beam swinging, but unfortunately all the FE models came down. I noticed that the DAQ status channels were blank.

I ssh into c1ioo no problem and ran "rtcds stop c1ioo c1als c1omc", then "rtcds restart c1x03" to do a soft restart. This worked, but the DAQ status was still blank. I then tried to ssh into c1sus and c1lsc without success, similarly c1iscex and c1iscey were unreachable. I went and did a hard restart on c1iscex by switching it off, then its extension chassis, then unplugging the power cords, then inverting these steps, and could ssh into it from rossa. I ran "rtcds start c1x01" and saw the same blank DAQ status. I noticed the elog was also down... so nodus was also affected?

[paco, anchal]

Anchal got on zoom to offer some assistance. We discovered that the fb1 and nodus were subject to some kind of system reboot at precisely 09:34. The "systemctl --failed" command on fb1 displayed both the daqd_dc.service and rc-local.service as loaded but failed (inactive). Is it a good idea to try and reboot the fb1 machine? ... Anchal was able to bring elog back up from nodus (ergo, this post).

[paco]

Although it probably needs the DAQ service from the fb1 machine to be up and running, I tried running the scripts/cds/rebootC1LSC.sh script. This didn't work. I tried running sudo systemctl restart daqd_dc from the fb1 machine without success. Running systemctl reset-failed "worked" for daqd_dc and rc-local services on fb1 in the sense that they were no longer output from systemctl --failed, but they remained inactive (dead) when running systemctl status on them. Following from  15303   I succeeded in restarting the daqd services. Turned out I needed to manually start the open-mx and mx services in fb1. I rerun the restartC1LSC script without success. The script fails because some machines need to be rebooted by hand.

16307   Thu Sep 2 17:53:15 2021 PacoSummaryComputerschiara down, vac interlock tripped

[paco, koji, tega, ian]

Today in the morning the name server / network file system running in chiara failed. This resulted in donatella/pianosa/rossa shell prompts to hang forever. It also made sitemap crash and even dropping into a bash shell and just listing files from some directory in the file system froze the computer. Remote ssh sessions on nodus also had the same symptoms.

A little after 1 pm, we started debugging this issue with help from Koji. He suggested we hook a monitor, keyboard, and mouse onto chiara as it should still work locally even if something with the NFS (network file system) failed. We did this and then we tried for a while to unmount the /dev/sdc1/ from /home/cds/ (main file system) and mount /dev/sdb1/ from /media/40mBackup (backup copy) such that they swap places. We had no trouble unmounting the backup drive, but only succeeded in unmounting the main drive with the "lazy" unmount, or running "umount -l". Running "df" we could see that the disk space was 100% used, with only ~ 1 GB of free space which may have been the cause for the issue. After swapping these disks by editing the /etc/fstab file to implement the aforementioned swapping, we rebooted chiara and we recovered the shell prompts in all workstations, sitemap, etc... due to the backup drive mounting. We then started investigating what caused the main drive to fill up that quickly, and noted that weirdly now the capacity was at 85% or about 500GB less than before (after reboot and remount), so some large file was probably dumped into chiara that froze the NFS causing the issue.

At this point we tried opening the PSL shutter to recover the IMC. The shutter would not open and we suspected the vacuum interlock was still tripped... and indeed there was an uncleared error in the VAC screen. So with Koji's guidance we walked to the c1vac near the HV station and did the following at ~ 5:13 PM -->

1. Open V4; apart from a brief pressure spike in PTP2, everything looked ok so we proceeded to
2. Open V1; P2 spiked briefly and then started to drop. Then, Koji suggested that we could
3. Close V4; but we saw P2 increasing by a factor of~ 10 in a few seconds, so we
4. Reopened V4;

We made sure that P1a (main vacuum pressure) was dropping and before continuing we decided to look back to see what the nominal vacuum state was that we should try to restore.

We are currently searching the two systems for diffrences to see if we can narrow down the culprit of the failure.

16312   Thu Sep 2 21:21:14 2021 KojiSummaryComputersVacuum recovery 2

Attachment 1:
We are pumping the main volume with TP2. Once P1a reached the pressure ~2.2mtorr, we could open the PSL shutter. The TP2 voltage went up once but came down to ~20V. It's close to nominal now.
We wondered if we should use TP3 or not. I checked the vacuum pressure trends and found that the annulus pressures were going up. So we decided to open the annulus valves.

Attachment 2:
The current vacuum status is as shown in the MEDM screenshot.

There is no trend data of the valve status (sad)

Attachment 1: Screenshot_2021-09-02_21-20-24.png
Attachment 2: Screenshot_2021-09-02_21-20-48.png
16313   Thu Sep 2 21:49:03 2021 PacoSummaryComputerschiara down, vac interlock tripped

[tega, paco]

We found the files that took excess space in the chiara filesystem (see Attachment 1). They were error files from the summary pages that were ~ 50 GB in size or so located under /home/cds/caltech/users/public_html/detcharsummary/logs/. We manually removed them and then copied the rest of the summary page contents into the main file system drive (this is to preserve the information backup before it gets deleted by the cron job at the end of today) and checked carefully to identify the actual issue for why these files were as large in the first place.

We then copied the /detcharsummary directory from /media/40mBackup into /home/cds to match the two disks.

Attachment 1: 2021-09-02_21-51-15.png
16314   Fri Sep 3 02:03:15 2021 TegaSummaryComputersStrip down large error files

Also deleted the ~50GB error files from ldas to prevent rsync from copying them to nodus again. With the new update to GWsumm, there are new error messages that initially didn't seem to affect the summary pages functionality, but in the extreme case can populated the error files the repeated warnings on the form "Loading: FrSerData", "Loading: FrSerData::n4294967295", "Loading: FrSummary","Loading: FrSerDataLoading: FrSerData" and many more combinations until we get file sizes of the order of ~50GB. So I have updated the checkstatus script to parse the error files and strip out the majority of these error messages. Work is ongoing to get them all.

In light of these large files generation, I decided to look in the summary pages folder to see if there are other large files that we need to keep track of and it turns there are indeed a collection of files in the archive folder that bloats the summary pages on ldas to ~1TB. Luckily these are not synced to nodus so no problem here. However, since the beginning of the year, the archive folders that hold data used for each day's computation have not been cleared. We have a script for doing this but it has not been run for a while now and it only delete archive files for a specific month which is hardcoded to two months from the date the file is run. I have modified the code to allow archive deletion for a range of months so we can clear data from Jan to July.

 Quote: [tega, paco] We found the files that took excess space in the chiara filesystem (see Attachment 1). They were error files from the summary pages that were ~ 50 GB in size or so located under /home/cds/caltech/users/public_html/detcharsummary/logs/. We manually removed them and then copied the rest of the summary page contents into the main file system drive (this is to preserve the information backup before it gets deleted by the cron job at the end of today) and checked carefully to identify the actual issue for why these files were as large in the first place. We then copied the /detcharsummary directory from /media/40mBackup into /home/cds to match the two disks.

16346   Mon Sep 20 15:23:08 2021 YehonathanUpdateComputersWifi internet fixed

Over the weekend and today, the wifi was acting bad with frequent disconnections and no internet access. I tried to log into the web interface of the ASUS wifi but with no success.

I pushed the reset button for several seconds to restore factory settings. After that, I was able to log in. I did the automatic setup and defined the wifi passwords to be what they used to be.

Internet access was restored. I also unplugged and plugged back all the wifi extenders in the lab and moved the extender from the vertex inner wall to the outer wall of the lab close to the 1X3.

Now, there seems to be wifi reception both in X and Y arms (according to my android phone).

16348   Mon Sep 20 15:42:44 2021 Ian MacMillanSummaryComputersQuantization Code Summary

This post serves as a summary and description of code to run to test the impacts of quantization noise on a state-space implementation of the suspension model.

Purpose: We want to use a state-space model in our suspension plant code. Before we can do this we want to test to see if the state-space model is prone to problems with quantization noise. We will compare two models one for a standard direct-ii filter and one with a state-space model and then compare the noise from both.

Signal Generation:

First I built a basic signal generator that can produce a sine wave for a specified amount of time then can produce a zero signal for a specified amount of time. This will let the model ring up with the sine wave then decay away with the zero signal. This input signal is generated at a sample rate of 2^16 samples per second then stored in a numpy array. I later feed this back into both models and record their results.

State-space Model:

The code can be seen here

The state-space model takes in the list of excitation values and feeds them through a loop that calculates the next value in the output.

Given that the state-space model follows the form

$\dot{x}(t)=\textbf{A}x(t)+ \textbf{B}u(t)$   and  $y(t)=\textbf{C}x(t)+ \textbf{D}u(t)$ ,

the model has three parts the first equation, an integration, and the second equation.

1. The first equation takes the input x and the excitation u and generates the x dot vector shown on the left-hand side of the first state-space equation.
2. The second part must integrate x to obtain the x that is found in the next equation. This uses the velocity and acceleration to integrate to the next x that will be plugged into the second equation
3. The second equation in the state space representation takes the x vector we just calculated and then multiplies it with the sensing matrix C. we don't have a D matrix so this gives us the next output in our system

This system is the coded form of the block diagram of the state space representation shown in attachment 1

Direct-II Model:

The direct form 2 filter works in a much simpler way. because it involves no integration and follows the block diagram shown in Attachment 2, we can use a single difference equation to find the next output. However, the only complication that comes into play is that we also have to keep track of the w(n) seen in the middle of the block diagram. We use these two equations to calculate the output value

$y[n]=b_0 \omega [n]+b_1 \omega [n-1] +b_2 \omega [n-2]$,  where w[n] is  $\omega[n]=x[n] - a_1 \omega [n-1] -a_2 \omega[n-2]$

Bit length Control:

To control the bit length of each of the models I typecast all the inputs using np.float then the bit length that I want. This simulates the computer using only a specified bit length. I have to go through the code and force the code to use 128 bit by default. Currently, the default is 64 bit which so at the moment I am limited to 64 bit for highest bit length. I also need to go in and examine how numpy truncates floats to make sure it isn't doing anything unexpected.

Bode Plot:

The bode plot at the bottom shows the transfer function for both the IIR model and the state-space model. I generated about 100 seconds of white noise then computed the transfer function as

$G(f) = \frac{P_{csd}(f)}{P_{psd}(f)}$

which is the cross-spectral density divided by the power spectral density. We can see that they match pretty closely at 64 bits. The IIR direct II model seems to have more noise on the surface but we are going to examine that in the next elog

Attachment 1: 472px-Typical_State_Space_model.svg.png