ID | Date | Author | Type | Category | Subject
  13132 | Sun Jul 23 15:00:28 2017 | Jamie | Omnistructure | VAC | strange sound around X arm vacuum pumps

While walking down to the X end to reset c1iscex I heard what I would call a "rhythmic squnching" sound coming from under the turbo pump.  I would have said the sound was coming from a roughing pump, but none of them are on (as far as I can tell).

Steve, maybe look into this?

  13136 | Mon Jul 24 10:59:08 2017 | Jamie | Update | CDS | c1iscex models died
Quote:

This morning, all the c1iscex models were dead. Attachment #1 shows the state of the cds overview screen when I came in. The machine itself was ssh-able, so I just restarted all the models and they came back online without fuss.

This was me.  I had rebooted that machine and hadn't restarted the models.  Sorry for the confusion.

  13138 | Mon Jul 24 19:28:55 2017 | Jamie | Update | CDS | front end MX stream network working, glitches in c1ioo fixed

MX/OpenMX network running

Today I got the mx/open-mx networking working for the front ends.  This required some tweaking to the network interface configuration for the diskless front ends, and recompiling mx and open-mx for the newer kernel.  Again, this will all be documented.

controls@fb1:~ 0$ /opt/mx/bin/mx_info
MX Version: 1.2.16
MX Build: root@fb1:/opt/src/mx-1.2.16 Mon Jul 24 11:33:57 PDT 2017
1 Myrinet board installed.
The MX driver is configured to support a maximum of:
    8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
===================================================================
Instance #0:  364.4 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
    Status:        Running, P0: Link Up
    Network:    Ethernet 10G

    MAC Address:    00:60:dd:43:74:62
    Product code:    10G-PCIE-8B-S
    Part number:    09-04228
    Serial number:    485052
    Mapper:        00:60:dd:43:74:62, version = 0x00000000, configured
    Mapped hosts:    6

                                                        ROUTE COUNT
INDEX    MAC ADDRESS     HOST NAME                        P0
-----    -----------     ---------                        ---
   0) 00:60:dd:43:74:62 fb1:0                             1,0
   1) 00:30:48:be:11:5d c1iscex:0                         1,0
   2) 00:30:48:bf:69:4f c1lsc:0                           1,0
   3) 00:25:90:0d:75:bb c1sus:0                           1,0
   4) 00:30:48:d6:11:17 c1iscey:0                         1,0
   5) 00:14:4f:40:64:25 c1ioo:0                           1,0
controls@fb1:~ 0$

c1ioo timing glitches fixed

I also checked the BIOS on c1ioo and found that the serial port was enabled, which is known to cause timing glitches.  I turned off the serial port (and some power management stuff), and rebooted, and all the c1ioo timing glitches seem to have gone away.

It's unclear why this is a problem that's just showing up now.  Serial ports have always been a problem, so it seems unlikely this is just a problem with the newer kernel.  Could the BIOS have somehow been reset during the power glitch?

In any event, all the front ends are now booting cleanly, with all dolphin and mx networking coming up automatically, and all models running stably:

Now for daqd...

  13145 | Wed Jul 26 19:13:07 2017 | Jamie | Update | CDS | daqd showing same instability as before

I recompiled daqd on the updated fb1, similar to how I had before, and we're seeing the same instability: the process crashes when it tries to write out the second trend (technically it looks like it crashes while it's trying to write out the full frame while the second trend is also being written out).  Jonathan Hanks and I are actively looking into it and I'll provide a further report soon.

  13149 | Fri Jul 28 20:22:41 2017 | Jamie | Update | CDS | possible stable daqd configuration with separate DC and FW

This week Jonathan Hanks and I have been trying to diagnose why the daqd has been unstable in the configuration used by the 40m, with data concentrator (dc) and frame writer (fw) in the same process (referred to generically as 'fb').  Jonathan has been digging into the core dumps and source to try to figure out what's going on, but he hasn't come up with anything concrete yet.

As an alternative, we've started experimenting with a daqd configuration with the dc and fw components running in separate processes, with communication over the local loopback interface.  The separate dc/fw process model more closely matches the configuration at the sites, although the sites put the dc and fw processes on different physical machines.  Our experimentation thus far seems to indicate that this configuration is stable, although we haven't yet tested it with the full configuration, which is what I'm attempting to do now.

Unfortunately I'm having trouble with the mx_stream communication between the front ends and the dc process.  The dc does not appear to be receiving the streams from the front ends and is producing a '0xbad' status message for each.  I'm investigating.

  13153 | Mon Jul 31 18:44:40 2017 | Jamie | Update | CDS | CDS system essentially fully recovered

The CDS system is essentially fully recovered at this point.  The mx_streams are all flowing from all front ends, and from all models, and the daqd processes are receiving them and writing the data to frames:

Remaining unresolved issues:

  • IFO needs to be fully locked to make sure ALL components of all models are working.
  • The remaining red status lights are from the "FB NET" diagnostics, which are reflecting a missing status bit from the front end processes due to the fact that they were compiled with an earlier RCG version (3.0.3) than the mx_streams were (3.3+/trunk).  There will be a new release of the RTS soon, at which point we'll compile everything from the same version, which should get us all green again.
  • The entire system has been fully modernized, to the target CDS reference OS (Debian jessie) and more recent RCG versions.  The management of the various RTS components, both on the front ends and on fb, have as much as possible been updated to use the modern management tools (e.g. systemd, udev, etc.).  These changes need to be documented.  In particular...
  • The fb daqd process has been split into three separate components, a configuration that mirrors what is done at the sites and appears to be more stable:
    • daqd_dc: data concentrator (receives data from front ends)
    • daqd_fw: receives frames from dc and writes out full frames and second/minute trends
    • daqd_rcv: NDS1 server (raises test points and receives archive data from frames from 'nds' process)
    The "target" directory for all of these new components is:
    • /opt/rtcds/caltech/c1/target/daqd
    All of these processes are now managed under systemd supervision on fb, meaning the daqd restart procedure has changed.  This needs to be simplified and clarified.
  • Second trend frames are being written, but for some reason they're not accessible over NDS.
  • Have not had a chance to verify minute trend and raw minute trend writing yet.  Needs to be confirmed.
  • Get wiper script working on new fb.
  • Front end RTS kernel will occasionally crash when the RTS modules are unloaded.  Keith Thorne apparently has a kernel version with a different set of patches from Gerrit Kuhn that does not have this problem.  Keith's kernel needs to be packaged and installed in the front end diskless root.
  • The models accessing the dolphin shared memory will ALL crash when one of the front end hosts on the dolphin network goes away.  This results in a boot fest of all the dolphin-enabled hosts.  Need to figure out what's going on there.
  • The RCG settings snapshotting has changed significantly in later RCG versions.  We need to make sure that all burt backup type stuff is still working correctly.
  • Restoration of /frames from old fb SCSI RAID?
  • Backup of entirety of fb1, including fb1 root (/) and front end diskless root (/diskless)
  • Full documentation of rebuild procedure from Jamie's notes.
  13164 | Thu Aug 3 19:46:27 2017 | Jamie | Update | CDS | new daqd restart procedure

This is the daqd restart procedure:

$ ssh fb1 sudo systemctl restart daqd_*

That will restart all of the daqd services (daqd_dc, daqd_fw, daqd_rcv).

The front end mx_stream processes should all auto-restart after the daqd_dc comes back up.  If they don't (models show "0x2bad" on DC0_*_STATUS) then you can execute the following to restart the mx_stream process on the front end:

$ ssh c1<host> sudo systemctl restart mx_stream
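
For convenience, a quick status sweep could look something like this (a rough sketch, not a maintained script; the model list is just an example and pyepics is assumed to be installed on the workstation):

#!/usr/bin/env python
# Sketch: flag models whose DAQ data-concentrator status suggests a dead mx_stream.
from epics import caget

MODELS = ['C1X01', 'C1SCX', 'C1X05', 'C1SCY', 'C1SUS', 'C1LSC', 'C1IOO']  # example list, not exhaustive

for model in MODELS:
    status = caget('C1:DAQ-DC0_%s_STATUS' % model)
    if status is None:
        print('%s: no EPICS response' % model)
    elif int(status) == 0x2bad:
        print('%s: 0x2bad -- restart mx_stream on its front end' % model)
    elif int(status) != 0:
        print('%s: status 0x%x' % (model, int(status)))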

 

 

  13165 | Thu Aug 3 20:15:11 2017 | Jamie | Update | CDS | dataviewer can not raise test points

For some reason dataviewer is not able to raise test points with the new daqd setup, even though dtt can.  If you raise a test point with dtt then dataviewer can show the data fine.

It's unclear to me why this would be the case.  It might be that all the versions of dataviewer on the workstations are too old??  I'll look into it tomorrow to see if I can figure out what's going on.

  13198 | Fri Aug 11 19:34:49 2017 | Jamie | Update | CDS | CDS final bits status update

So it appears we now have full frames and second, minute, and minute_raw trends.

We are still not able to raise test points with daqd_rcv (e.g. the NDS1 server), which is why dataviewer and nds2-client can't get test points on their own.

We were not able to add the EDCU (EPICS client) channels without daqd_fw crashing.

We have a new kernel image that's supposed to solve the module unload instability issue.  In order to try it we'll need to restart the entire system, though, so I'll do that on Monday morning.

I've got the CDS guys investigating the test point and EDCU issues, but we won't get any action on that until next week.

Quote:

Remaining unresolved issues:

  • IFO needs to be fully locked to make sure ALL components of all models are working.
  • The remaining red status lights are from the "FB NET" diagnostics, which are reflecting a missing status bit from the front end processes due to the fact that they were compiled with an earlier RCG version (3.0.3) than the mx_streams were (3.3+/trunk).  There will be a new release of the RTS soon, at which point we'll compile everything from the same version, which should get us all green again.
  • The entire system has been fully modernized, to the target CDS reference OS (Debian jessie) and more recent RCG versions.  The management of the various RTS components, both on the front ends and on fb, have as much as possible been updated to use the modern management tools (e.g. systemd, udev, etc.).  These changes need to be documented.  In particular...
  • The fb daqd process has been split into three separate components, a configuration that mirrors what is done at the sites and appears to be more stable:
    • daqd_dc: data concentrator (receives data from front ends)
    • daqd_fw: receives frames from dc and writes out full frames and second/minute trends
    • daqd_rcv: NDS1 server (raises test points and receives archive data from frames from 'nds' process)
    The "target" directory for all of these new components is:
    • /opt/rtcds/caltech/c1/target/daqd
    All of these processes are now managed under systemd supervision on fb, meaning the daqd restart procedure has changed.  This needs to be simplified and clarified.
  • Second trend frames are being written, but for some reason they're not accessible over NDS.
  • Have not had a chance to verify minute trend and raw minute trend writing yet.  Needs to be confirmed.
  • Get wiper script working on new fb.
  • Front end RTS kernel will occasionally crash when the RTS modules are unloaded.  Keith Thorne apparently has a kernel version with a different set of patches from Gerrit Kuhn that does not have this problem.  Keith's kernel needs to be packaged and installed in the front end diskless root.
  • The models accessing the dolphin shared memory will ALL crash when one of the front end hosts on the dolphin network goes away.  This results in a boot fest of all the dolphin-enabled hosts.  Need to figure out what's going on there.
  • The RCG settings snapshotting has changed significantly in later RCG versions.  We need to make sure that all burt backup type stuff is still working correctly.
  • Restoration of /frames from old fb SCSI RAID?
  • Backup of entirety of fb1, including fb1 root (/) and front end diskless root (/diskless)
  • Full documentation of rebuild procedure from Jamie's notes.
  13205 | Mon Aug 14 19:41:46 2017 | Jamie | Update | CDS | front-end/DAQ network down for kernel upgrade, and timing errors

I'm upgrading the linux kernel for all the front ends to one that is supposedly more stable and won't freeze when we unload RTS models (linux-image-3.2.88-csp).  Since it's a different kernel version it requires rebuilds of all kernel-related support stuff (mbuf, symmetricom, mx, open-mx, dolphin) and all the front end models.  All the support stuff has been upgraded, but we're now waiting on the front end rebuilds, which takes a while.

Initial testing indicates that the kernel is more stable; we're mostly able to unload/reload RTS modules without the kernel freezing.  However, the c1iscey host seems to be oddly problematic and has frozen twice so far on module unloads.  None of the other hosts have frozen on unload (yet), though, so the picture is still not clear.

We're now seeing some timing errors between the front ends and daqd, resulting in a "0x4000" status message in the 'C1:DAQ-DC0_*_STATUS' channels.  Part of the problem was an issue with the IRIG-B/GPS receiver timing unit, which I'll log in a separate post.  Another part of the problem was a bug in the symmetricom driver, which has been resolved.  That wasn't the whole problem, though, since we're still seeing timing errors.  Working with Jonathan to resolve.

  13215 | Wed Aug 16 17:05:53 2017 | Jamie | Update | CDS | front-end/DAQ network down for kernel upgrade, and timing errors

The CDS system has now been moved to a supposedly more stable real-time-patched linux kernel (3.2.88-csp) and RCG r4447 (roughly the head of trunk, intended to become release 3.4).  With one major and one minor exception, everything seems to be working.

The remaining issues are:

  • RFM network down.  The IOP models on all hosts on the RFM network are not detecting their RFM cards.  Keith Thorne thinks that this is because of changes in trunk to support the new long-range PCIe that will be used at the sites, and that we just need to add a new parameter to the cdsParameters block in models that use RFM.  He and Rolf are looking into it for us.
  • The 3.2.88-csp kernel is still not totally stable.  On most hosts (c1sus, c1ioo, c1iscex) it seems totally fine and we're able to load/unload models without issue.  c1iscey is definitely problematic, frequently freezing on module unload.  There must be a hardware/bios issue involved here.  c1lsc has also shown some problems.  A better kernel is supposedly in the works.
  • NDS clients other than DTT are still unable to raise test points.  This appears to be an issue with the daqd_rcv component (i.e. NDS server) not properly resolving the front ends in the GDS network.  Still looking into this with Keith, Rolf, and Jonathan.

Issues that have been fixed:

  • "EDCU" channels, i.e. non-front-end EPICS channels, are now being acquired properly by the DAQ.  The front-ends now send all slow channels to the daq over the MX network stream.  This means that front end channels should no longer be specified in the EDCU ini file.  There were a couple in there that I removed, and that seemed to fix that issue.
  • Data should now be recorded in all formats: full frames, as well as second, minute, and raw_minute trends
  • All FE and DAQD diagnostics are green (other than the ones indicating the problems with the RFM network).  This was fixed by getting the front ends models, mx_stream processes, and daqd processes all compiled against the same version of the advLigoRTS, and adding the appropriate command line parameters to the mx_stream processes.
  13217 | Wed Aug 16 18:01:28 2017 | Jamie | Update | CDS | front-end/DAQ network down for kernel upgrade, and timing errors
Quote:

What's the current backup situation?

Good question.  We need to figure something out.  fb1 root is on a RAID1, so there is one layer of safety.  But we absolutely need a full backup of the fb1 root filesystem.  I don't have any great suggestions, other than getting an external disk, 1T or so, and copying all of root (minus NFS mounts).

  13219 | Wed Aug 16 18:50:58 2017 | Jamie | Update | CDS | front-end/DAQ network down for kernel upgrade, and timing errors
Quote:

The remaining issues are:

  • RFM network down.  The IOP models on all hosts on the RFM network are not detecting their RFM cards.  Keith Thorne thinks that this is because of changes in trunk to support the new long-range PCIe that will be used at the sites, and that we just need to add a new parameter to the cdsParameters block in models that use RFM.  He and Rolf are looking into it for us.

RFM network is back!  Everything green again.

Use of RFM has been turned off in advLigoRTS trunk in favor of the new long-range PCIe networking being developed for the sites.  Rolf provided a single-line patch that re-enables it:

controls@c1sus:/opt/rtcds/rtscore/trunk 0$ svn diff
Index: src/epics/util/feCodeGen.pl
===================================================================
--- src/epics/util/feCodeGen.pl    (revision 4447)
+++ src/epics/util/feCodeGen.pl    (working copy)
@@ -122,7 +122,7 @@
 $diagTest = -1;
 $flipSignals = 0;
 $virtualiop = 0;
-$rfm_via_pcie = 1;
+$rfm_via_pcie = 0;
 $edcu = 0;
 $casdf = 0;
 $globalsdf = 0;
controls@c1sus:/opt/rtcds/rtscore/trunk 0$

This patch was applied to the RTS source checkout we're using for the FE builds (/opt/rtcds/rtscore/trunk, which is r4447 and is linked to /opt/rtcds/rtscore/release).  The following models that use RFM were re-compiled, re-installed, and re-started:

  • c1x02
  • c1rfm
  • c1x03
  • c1als
  • c1x01
  • c1scx
  • c1asx
  • c1x05
  • c1scy
  • c1tst

The re-compiled models now see the RFM cards (dmesg log from c1ioo):

[24052.203469] c1x03: Total of 4 I/O modules found and mapped
[24052.203471] c1x03: ***************************************************************************
[24052.203473] c1x03: 1 RFM cards found
[24052.203474] c1x03:     RFM 0 is a VMIC_5565 module with Node ID 180
[24052.203476] c1x03: address is 0xffffc90021000000
[24052.203478] c1x03: ***************************************************************************

This cleared up all RFM transmission error messages.

CDS upstream are working to make this RFM usage switchable in a reasonable way.

  13258 | Mon Aug 28 08:47:32 2017 | Jamie | Summary | LSC | First cavity length reconstruction with a neural network
Quote:

Phenomenal!

truly.

  15742 | Mon Dec 21 09:28:50 2020 | Jamie | Configuration | CDS | Updated CDS upgrade plan
Quote:

Attached is the layout for the "intermediate" CDS upgrade option, as was discussed on Wednesday. Under this plan:

  • Existing FEs stay where they are (they are not moved to a single rack)

  • Dolphin IPC remains PCIe Gen 1

  • RFM network is entirely replaced with Dolphin IPC

Please send me any omissions or corrections to the layout.

I just want to point out that if you move all the FEs to the same rack they can all be connected to the Dolphin switch via copper, and you would only have to string a single fiber to every IO rack, rather than the multiple fibers we run now (for network, dolphin, timing, etc.).

  16299 | Wed Aug 25 18:20:21 2021 | Jamie | Update | CDS | GPS time on fb1 fixed, daqd writing correct frames again

I have no idea what happened to the GPS timing on fb1, but it seems like the issue was coincident with the power glitch on Monday.

As was noted by Koji above, the GPS time kernel interface was off by a year, which was causing the frame builder to write out files with the wrong names.  fb1 was using DAQD components from the advligorts 3.3 release, which used the old "symmetricom" kernel module for the GPS time.  This old module was also known to have issues with time offsets.  This issue is reminiscent of previous timing issues with the DAQ on fb1.

I noted that a newer version of the advligorts, version 3.4, was available for Debian jessie, the system running on fb1.  advligorts 3.4 includes a newer version of the GPS time module, renamed gpstime.  I checked with Jonathan Hanks that the interfaces did not change between 3.3 and 3.4, and that 3.4 was mostly a bug fix and packaging release, so I decided to upgrade the DAQ to get the new components.  I therefore did the following:

  • updated the archive info in /etc/apt/sources.list.d/cdssoft.list, and added the "jessie-restricted" archive which includes the mx packages: https://git.ligo.org/cds-packaging/docs/-/wikis/home

  • removed the symmetricom module from the kernel

    sudo rmmod symmetricom

  • upgraded the advligorts-daqd components (NOTE I did not upgrade the rest of the system, although there are outstanding security upgrades needed):

    sudo apt install advligorts-daqd advligorts-daqd-dc-mx

  • loaded the new gpstime module and checked that the GPS time was correct:

    sudo modprobe gpstime

  • restarted all the daqd processes

    sudo systemctl restart daqd_*

Everything came up fine at that point, and I checked that the correct frames were being written out.

  16302 | Thu Aug 26 10:30:14 2021 | Jamie | Configuration | CDS | front end time synchronization fixed?

I've been looking at why the front end NTP time synchronization did not seem to be working.  I think it might not have been working because the NTP server the front ends were pointing to, fb1, was not actually responding to synchronization requests.

I cleaned up some things on fb1 and the front ends, which I think unstuck things.

On fb1:

  • stopped/disabled the default client (systemd-timesyncd), and properly installed the full NTP server (ntp)
  • the ntp server package for debian jessie is old-style sysVinit, not systemd.  In order to make it better integrated I copied the auto-generated service file to /etc/systemd/system/ntp.service, and added an "[Install]" section that specifies that it should be available during the default "multi-user.target".
  • "enabled" the new service to auto-start at boot ("sudo systemctl enable ntp.service") 
  • made sure ntp was configured to serve the front end network ('broadcast 192.168.123.255') and then restarted the server ("sudo systemctl restart ntp.service")

For the front ends:

  • on fb1 I chroot'd into the front-end diskless root (/diskless/root) and manually specified that systemd-timesyncd should start on boot by creating a symlink to the timesyncd service in the multi-user.target directory:
$ sudo chroot /diskless/root
$ cd /etc/systemd/system/multi-user.target.wants
$ ln -s /lib/systemd/system/systemd-timesyncd.service
  • on the front end itself (c1iscex as a test) I did a "systemctl daemon-reload" to force it to reload the systemd config, and then restarted the client ("systemctl restart systemd-timesyncd")
  • checked the NTP synchronization with timedatectl:
controls@c1iscex:~ 0$ timedatectl 
      Local time: Thu 2021-08-26 11:35:10 PDT
  Universal time: Thu 2021-08-26 18:35:10 UTC
        RTC time: Thu 2021-08-26 18:35:10
       Time zone: America/Los_Angeles (PDT, -0700)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no
      DST active: yes
 Last DST change: DST began at
                  Sun 2021-03-14 01:59:59 PST
                  Sun 2021-03-14 03:00:00 PDT
 Next DST change: DST ends (the clock jumps one hour backwards) at
                  Sun 2021-11-07 01:59:59 PDT
                  Sun 2021-11-07 01:00:00 PST
controls@c1iscex:~ 0$ 

Note that it is now reporting "NTP enabled: yes" (the service is enabled to start at boot) and "NTP synchronized: yes" (synchronization is happening), neither of which it was reporting previously.  I also note that the systemd-timesyncd client service is now loaded and enabled, is no longer reporting that it is in an "Idle" state and is in fact reporting that it synchronized to the proper server, and it is logging updates:

controls@c1iscex:~ 0$ sudo systemctl status systemd-timesyncd
● systemd-timesyncd.service - Network Time Synchronization
   Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled)
   Active: active (running) since Thu 2021-08-26 10:20:11 PDT; 1h 22min ago
     Docs: man:systemd-timesyncd.service(8)
 Main PID: 2918 (systemd-timesyn)
   Status: "Using Time Server 192.168.113.201:123 (ntpserver)."
   CGroup: /system.slice/systemd-timesyncd.service
           └─2918 /lib/systemd/systemd-timesyncd

Aug 26 10:20:11 c1iscex systemd[1]: Started Network Time Synchronization.
Aug 26 10:20:11 c1iscex systemd-timesyncd[2918]: Using NTP server 192.168.113.201:123 (ntpserver).
Aug 26 10:20:11 c1iscex systemd-timesyncd[2918]: interval/delta/delay/jitter/drift 64s/+0.000s/0.000s/0.000s/+26ppm
Aug 26 10:21:15 c1iscex systemd-timesyncd[2918]: interval/delta/delay/jitter/drift 128s/-0.000s/0.000s/0.000s/+25ppm
Aug 26 10:23:23 c1iscex systemd-timesyncd[2918]: interval/delta/delay/jitter/drift 256s/+0.001s/0.000s/0.000s/+26ppm
Aug 26 10:27:40 c1iscex systemd-timesyncd[2918]: interval/delta/delay/jitter/drift 512s/+0.003s/0.000s/0.001s/+29ppm
Aug 26 10:36:12 c1iscex systemd-timesyncd[2918]: interval/delta/delay/jitter/drift 1024s/+0.008s/0.000s/0.003s/+33ppm
Aug 26 10:53:16 c1iscex systemd-timesyncd[2918]: interval/delta/delay/jitter/drift 2048s/-0.026s/0.000s/0.010s/+27ppm
Aug 26 11:27:24 c1iscex systemd-timesyncd[2918]: interval/delta/delay/jitter/drift 2048s/+0.009s/0.000s/0.011s/+29ppm
controls@c1iscex:~ 0$ 

So I think this means everything is working.

I then went ahead and reloaded and restarted the timesyncd services on the rest of the front ends.
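
To spot-check that the clients on all the front ends really are synchronizing, something like this can be run from fb1 or a workstation (a sketch; the host list is assumed, and it assumes passwordless ssh as controls):

#!/usr/bin/env python
# Sketch: report the "NTP synchronized" line from timedatectl on each front end.
import subprocess

FRONT_ENDS = ['c1sus', 'c1ioo', 'c1lsc', 'c1iscex', 'c1iscey']  # assumed host list

for host in FRONT_ENDS:
    out = subprocess.check_output(['ssh', host, 'timedatectl']).decode()
    lines = [l.strip() for l in out.splitlines() if 'NTP synchronized' in l]
    print('%-10s %s' % (host, lines[0] if lines else 'no NTP info reported'))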

We still need to confirm that everything comes up properly the next time we have an opportunity to reboot fb1 and the front ends (or the opportunity is forced upon us).

There was speculation that the NTP clients on the front ends (systemd-timesyncd) would not work on a read-only filesystem, but this doesn't seem to be true.  You can't trust everything you read on the internet.

  17109 | Sun Aug 28 23:14:22 2022 | Jamie | Update | Computers | rack reshuffle proposal for CDS upgrade

@tega This looks great, thank you for putting this together.  The rack drawing in particular is great.  Two notes:

  1. In "1X6 - proposed" I would move the "PEM AA + ADC Adapter" lower in the rack, maybe where "Old FB + JetStor" are, after removing those units since they're no longer needed.  That would keep all the timing stuff together at the top without any other random stuff in between.  If we can't yet remove Old FB and the JetStor then I would move the VME GPS/Timing chassis up a couple units to make room for the PEM module between the VME chassis and FB1.
  2. We'll eventually want to move FB1 and Megatron into 1X7, since it seems like there will be room there.  That will put all the computers into one rack, which will be very nice.  FB1 should also be on the KVM switch.

I think most of this work can be done with very little downtime.

  17379 | Tue Jan 3 12:10:46 2023 | Jamie | Update | CDS | Yearly DAQD fix 2023, did not work :(

This whole procedure is no longer needed after the CDS upgrade.  We don't do things in the old hacky way anymore.  The gpstime kernel module does not need to be hacked anymore, and it *should* keep up with leap seconds properly...

That said, there was an issue that @christopher.wipf id'd properly:

  • The front end models were not getting the correct GPS time because the offset on their timing cards was off by a year.  This is because the timing cards do not get the year info from IRIG-B, so the offset they had when the models were loaded in 2022 was no longer valid as soon as the year rolled over to 2023.
  • This meant that the data sent from the front ends to the DAQ was a year old, and the DAQ receiver process (cps_recv) was silently dropping the packets because the time stamps were too far off.

So to resolve the issue the /usr/share/advligorts/calc_gps_offset.py program needed to be run on the front ends, and all the models needed to be restarted (or all the front ends need to be restarted).  Once that happens the data should start flowing properly.
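
A sketch of that recovery loop (not a tested script; the host list is assumed, the exact calc_gps_offset.py invocation was not checked here, and it assumes passwordless ssh/sudo):

#!/usr/bin/env python
# Sketch: recompute the timing-card GPS offset on each front end, then restart its models.
import subprocess

FRONT_ENDS = ['c1sus', 'c1ioo', 'c1lsc', 'c1iscex', 'c1iscey']  # assumed host list

for host in FRONT_ENDS:
    # recompute the GPS offset used by the timing card (per the entry above)
    subprocess.check_call(['ssh', host, 'sudo', '/usr/share/advligorts/calc_gps_offset.py'])
    # restart all RTS models on this host so they pick up the new offset
    subprocess.check_call(['ssh', host, 'rtcds', 'restart', 'all'])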

NOTE the DAQD no longer uses the gpstime kernel module, so the fact that the gpstime module was reporting the wrong time on fb1 did not affect anything.

Anchel restarted the front ends which fixed the issue.

We got new timing hardware from Keith at LLO which will hopefully resolve the year issue (if it's ever installed) so that we won't have to go through these shenanigans again when the year rolls over.  The sites don't have to deal with this anymore.

Issues reported upstream:

  •  better logging from the cps_recv process to report why there is no data coming through
  •  fix packaging for advligorts-daqd to not depend on the gpstime kernel module

Also note that the code in /opt/rtcds/rtscore is completely obsolete and is not used by anything; that whole directory should be purged.

  7173 | Tue Aug 14 11:33:14 2012 | Jamie, Alex, Den | Update | CDS | AI and AA filters

When signals are transmitted between the models running at different rates, no AI or AA filters are automatically applied. We need to fix our models.

[Attachment: ai.png]

  10188 | Fri Jul 11 22:02:52 2014 | Jamie, Chris | Omnistructure | CDS | cdsutils: multifarious upgrades

To make the latest cdsutils available in the control room, we've done the following:

Upgrade pianosa to Ubuntu 12 (cdsutils depends on python2.7, not found in the previous release)

  • Enable distribution upgrades in the Ubuntu Software Center prefs
  • Check for updates in the Update Manager and click the big "Upgrade" button

Note that rossa remains on Ubuntu 10 for now.

Upgrade cdsutils to r260

  • Instructions here
  • cdsutils-238 was left as the default pointed to by the cdsutils symlink, for rossa's sake

Built and installed the nds2-client (a cdsutils dependency)

  • Checked out the source tree from svn into /ligo/svncommon/nds2
  • Built tags/nds2_client_0_10_5 (install instructions are here; build dependencies were installed by apt-get on chiara)
  • ./configure --prefix=/ligo/apps/ubuntu12/nds2-client-0.10.5; make; make install
  • In /ligo/apps/ubuntu12: ln -s nds2-client-0.10.5 nds2-client

nds2-client was apparently installed locally as a deb in the past, but the version in lscsoft seems broken currently (unknown symbols?). We should revisit this.

Built and installed pyepics (a cdsutils dependency)

  • Download link to ~/src on chiara
  • python setup.py build; python setup.py install --prefix=/ligo/apps/ubuntu12/pyepics-3.2.3
  • In /ligo/apps/ubuntu12: ln -s pyepics-3.2.3 pyepics

pyepics was also installed as deb before; should revisit when Jamie gets back.

Added the gqrx ppa and installed gnuradio (dependency of the waterfall plotter)

Added a test in /ligo/apps/ligoapps-user-env.sh to load the new cdsutils only on Ubuntu 12.

The end result:

controls@chiara|~ > z
usage: cdsutils  

Advanced LIGO Control Room Utilites

Available commands:

  read         read EPICS channel value
  write        write EPICS channel value
  switch       switch buttons in standard LIGO filter module
  avg          average one or more NDS channels for a specified amount of seconds
  servo        servos channel with a simple integrator (pole at zero)
  trigservo    servos channel with a simple integrator (pole at zero)
  audio        Play channel as audio stream.
  dv           Plot time series of channels from NDS.
  water        Live waterfall plotter for LIGO data

  version      print version info and exit
  help         this help

Add '-h' after individual commands for command help.
 
 
  13207 | Mon Aug 14 20:12:09 2017 | Jamie, Gautum | Update | CDS | Weird problem with GPS receiver

Today we saw a weird issue with the GPS receiver (EndRun Technologies Tempus LX).  GPS timing on fb1 (which is handled via IRIG-B connection to the receiver with a spectracom card) was off by +18 seconds.  We tried resetting the GPS receiver and it still came up with the +18 second offset.  To be clear, the GPS receiver unit itself was showing a time on its front panel that looked close enough to 24-hour UTC, but was off by +18 s.  The time also said "GPS" vertically to the right of the time.

We started exploring the settings on the GPS receiver and found this menu item:

Clock -> "Time Mode" -> "UTC"/"GPS"/"Local"

The setting when we found it was "GPS", which seems logical enough.  However, when we switched it to "UTC" the time as shown on the front panel was correct, now with "UTC" vertically to the right of the time, and fb1 was then showing the correct GPS time.

From the manual:

Time Mode
Time mode defines the time format used for the front-panel time display and, if installed, the optional
time code or Serial Time output. The time mode does not affect the NTP output, which is always
UTC. Possible values for the time mode are GPS, UTC, and local time. GPS time is derived from
the GPS satellite system. UTC is GPS time minus the current leap second correction. Local time is
UTC plus local offset and Daylight Savings Time. The local offset and daylight savings time displays
are described below.

The fact that moving to "UTC" fixed the problem, even though that is supposed to remove the leap second correction, might indicate that there's another bug in the symmetricom driver...
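
For reference, the +18 s offset we saw is exactly the current GPS-UTC leap second count, which is easy to sanity-check (a minimal sketch; the leap second count is hard-coded and correct as of 2017):

#!/usr/bin/env python
# Sketch: illustrate the GPS vs UTC relationship described in the manual excerpt above.
from datetime import datetime, timedelta, timezone

GPS_UTC_OFFSET = 18   # leap seconds accumulated since the GPS epoch, as of 2017

utc_now = datetime.now(timezone.utc)
gps_display = utc_now + timedelta(seconds=GPS_UTC_OFFSET)   # what a "GPS" time display shows

print('UTC display:', utc_now.strftime('%H:%M:%S'))
print('GPS display:', gps_display.strftime('%H:%M:%S'), '(+18 s, as seen on the front panel)')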

  13211 | Tue Aug 15 16:32:42 2017 | Jamie, Gautum | Update | CDS | GPS receiver apparently set to correct mode as "UTC"
Quote:

The setting when we found it was "GPS", which seems logical enough.  However, when we switched it to "UTC" the time as shown on the front panel was correct, now with "UTC" vertically to the right of the time, and fb1 was then showing the correct GPS time.

From Keith Thorne:

In the GPS receiver, you are trying to match the IRIG-B output format that is created by the aLIGO IRIG-B Fanout.  Since we have to prep the aLIGO IRIG-B Fanout every time there is a leap second coming, I would suspect that we are sending UTC to the IRIG-B receivers.  Thus, the GPS receiver needs to be set to that mode.

Soooo, "UTC" is the correct mode for the GPS receiver.

  4993 | Tue Jul 19 23:39:11 2011 | Jamie, Jenne | Update | LSC | Major overhaul of LSC rack; binary switching of whitening filters now working

Yesterday we started going through the LSC binary whitening switching to make sure the new switching control in the LSC model was working.  Jenne and I hooked up a fancy home-brew white noise generator [0] into all of the LSC whitening filter inputs and started switching the whitening filters to see what would happen.  We found that some of the channels were switching, but the majority were not, or worse yet switching the wrong channels.  Upon closer inspection, and after finally finding the LSC wiring schematic, we found that the LSC rack cross-connect/back-plane cabling was pretty much a complete mess, and didn't at all correspond to the channel layout in Suresh's diagram.

Given that the back-plane wiring had to be almost completely redone, we decided to completely redo the LSC electronics layout, to be a little more self-consistent and to use the given space more efficiently.  We'll post an updated electronics drawing soon.  The LSC model was also updated to reflect the new layout.

We then went through and verified that all of the whitening switching was working with the new layout.  As described previously, the first filter in the PD input filter bank is used to control the switching.  We did indeed verify that all the switching is working, but we noticed that the switching logic was inverted, such that the whitening filter was engaged when the filter was turned off.  This was fixed in the model and all the switching logic was verified to be working as expected.

Everything has now been hooked back up, but we need to verify that we're getting all of the PD demodulated RF and DC outputs as expected.  We need to check the RF phases, as some of the RF cable lengths have changed.

[0] a 50k resistor


 

  4790 | Mon Jun 6 18:29:01 2011 | Jamie, Joe | Update | CDS | COMPLETE FRONT-END REBUILD (WITH PROBLEMS (fixed))

Today Joe and I undertook a FULL rebuild of all front end systems with the head of the 2.1 branch of the RCG.  Here is the full report of what we did:

  1. checked out advLigoRTS/branches/branch-2.1, r2457 into core/branches/branch-2.1
  2. linked core/release to branches/branch-2.1
  3. linked in models to core/release/src/epics/simLink using Joe's new script (userapps/release/cds/c1/scripts/link_userapps)
  4. removed unused/out-of-date models:
     • c1dafi.mdl
     • c1lsp.mdl
     • c1gpv.mdl
     • c1sup_vertex_plant_shmem.mdl
  5. modified core/release/Makefile so that it can find the models:
     --- Makefile    (revision 2451)
     +++ Makefile (working copy)
     @@ -346,7 +346,7 @@
      #MDL_MODELS = x1cdst1 x1isiham x1isiitmx x1iss x1lsc x1omc1 x1psl x1susetmx x1susetmy x1susitmx x1susitmy x1susquad1 x1susquad2 x1susquad3 x1susquad4 x1x12 x1x13 x1x14 x1x15 x1x16 x1x20 x1x21 x1x22 x1x23

      #MDL_MODELS = $(wildcard src/epics/simLink/l1*.mdl)
     -MDL_MODELS = $(shell cd src/epics/simLink; ls m1*.mdl | sed 's/.mdl//')
     +MDL_MODELS = $(shell cd src/epics/simLink; ls c1*.mdl | sed 's/.mdl//')

      World: $(MDL_MODELS)
      showWorld:
  6. removed channel files for models that we know will be renumbered
     • For this rebuild, we are also building modified sus models that now use libraries, so the channel numbering is changing.
  7. make World
     • this makes all the models
  8. make installWorld
     • this installs all the models
  9. ran the activateDQ.py script to activate all the relevant channels
     • this script was modified to handle the new "_DQ" channels
  10. made/installed the new awgtpman:
      cd src/gds
      make
      cp awgtpman /opt/rtcds/caltech/c1/target/gds/bin
  11. turned off all watchdogs
  12. test-restarted one front end: c1iscex
  13. BIG PROBLEM

      The c1iscex models (c1x01 and c1scx) did not come back up.  c1x01 was running long on every cycle, until the model crashed and brought down the computer.  After many hours, and with Alex's help, we managed to track down the issue to a patch from Rolf at r2361.  The code included in that patch should have been wrapped in an "#ifndef RFM_DIRECT_READ".  This was fixed and committed to branches/branch-2.1 at r2460 and to trunk at r2461.

  14. updated core/branches/branch-2.1 to r2460
  15. make World && make installWorld with the new fixed code
  16. restarted all computers
  17. restarted the frame builder
  18. burt restored to 8am this morning
  19. turned on all watchdogs

Everything is now green, and things seem to be working.  Mode cleaner is locked.  X arm locked.

 

  4809 | Mon Jun 13 15:33:55 2011 | Jamie, Joe | Update | CDS | Dolphin fiber between 1Y3 and 1X4 appears to be dead

The fiber that connects the Dolphin card in the c1lsc machine (in the 1Y3 rack) to the Dolphin switch in the 1X4 rack appears to have died spontaneously this morning.  This was indicated by loss of Dolphin communication between c1lsc and c1sus.

We definitively tracked it down to the fiber by moving the c1lsc machine over to 1X4 and testing the connection with a short cable.  This worked fine.  Moving it back to using the fiber again failed.

Unfortunately, we have no replacement Dolphin fiber.  As a work-around, we are stealing a long computer->IO chassis cable from Downs and moving the c1lsc machine to 1X4.

This will be a permanent reconfiguration.  The original plan was to have the c1lsc machine also live in 1X4.  The new setup will put the computer farther from the RF electronics, and more closely mimic the configuration at the sites, both of which are good things.

  4811 | Mon Jun 13 18:40:08 2011 | Jamie, Joe | Update | CDS | Snags in the repair of LSC CDS

We've run into a problem with our attempts to get the LSC control back up and running.

As reported previously, the Dolphin fiber connection between c1lsc and c1sus appears to be dead.  Since we have no replacement fiber, the solution was to move the c1lsc machine into the 1X4 rack, which would allow us to use one of the many available short Dolphin cables, and then use a long fiber PCIe extension cable to connect c1lsc to its IO chassis.  However, the long PCIe extension cable we got from Downs does not appear to be working with our setup.  We tested the cable with c1sus, and it does not seem to work.

We've run out of options today.  Tomorrow we're going to head back to Downs to see if we can find a cable that at least works with the test-stand setup they have there.

  4812 | Mon Jun 13 19:26:42 2011 | Jamie, Joe | Configuration | CDS | SUS binary IO chassis 2 and 3 moved from 1X5 to 1X4

While prepping 1X4 for installation of c1lsc, we removed some old VME crates that were no longer in use.  This freed up lots of space in 1X4.  We then moved the SUS binary IO chassis 2 and 3, which plug into the 1X4 cross-connect, from 1X5 into the newly freed space in 1X4.  This makes the cable run from these modules to the cross-connect much cleaner.

  4816 | Tue Jun 14 12:23:44 2011 | Jamie, Joe | Update | CDS | WE ARE ALL GREEN! LSC back up and running in new configuration.

After moving the c1lsc computer to 1X4, then connecting c1lsc to its IO chassis in 1Y3 by a fiber PCIe extension cable, everything is back up and running and the status screen is all green.  c1lsc is now directly connected to c1sus via a short copper Dolphin cable.

After lunch we will do some more extensive testing of the system to make sure everything is working as expected.

  4733 | Tue May 17 18:09:13 2011 | Jamie, Kiwamu | Configuration | CDS | c1sus and c1auxey crashed, rebooted

c1sus and c1auxey crashed, required hard reboot

For some reason, we found that c1sus and c1auxey were completely unresponsive.  We went out and gave them a hard reset, which brought them back up with no problems.

This appears to be related to a very similar problem report by Kiwamu just a couple of days ago, where c1lsc crashed after editing the C1LSC.ini and restarting the daqd process, which is exactly what I just did (see my previous log).  What could be causing this?

  4818 | Tue Jun 14 18:12:34 2011 | Jamie, Kiwamu | Update | LSC | LSC seems to be fully recovered

We are now locking the arms reliably, with reasonable transmitted power.  We zeroed the LSC offsets with a script, since they were apparently not being reset with either the overall burt restore or the arm restore scripts.

We have lost a bit of power through the mode cleaner.  However, we have opted not to tweak it up just yet, so that we don't have to realign to the arms.

  5012 | Thu Jul 21 12:19:29 2011 | Jamie, Kiwamu | Update | IOO | MC Trans QPD working, now locking

It turns out that the MC_TRANS_SUM signal was being derived from the SUS-MC2_OL_SUM_INMON channel in the ioo.db file.  However, this channel name was recently changed to SUS-MC2_OLSUM_INMON (no underscore between OL and SUM) when I added the new OL_SUM epics channel to the sus_single_control library model (I forgot to mention it in my previous log on this change, apologies).  This is why there appeared to be no signal.  This was also what was preventing the mode cleaner from locking, since the MC_TRANS_SUM signal is used as a trigger in the MC autolocker script.

We modified the ioo.db file at /cvs/cds/caltech/target/c1iool0/ioo.db [0,1] to change the name of the channel that the C1:IOO-MC_TRANS_SUM signal is derived from.  The diff on the ioo.db file is:

--- /cvs/cds/caltech/target/c1iool0/ioo.db	2011-07-21 11:43:44.000000000 -0700
+++ /cvs/cds/caltech/target/c1iool0/ioo.db.2011Jul21	2011-07-21 11:43:36.000000000 -0700
@@ -303,7 +303,7 @@
 {
         field(DESC,"MC2 Trans QPD Sum")
         field(PREC,"1")
-        field(INPA, "C1:SUS-MC2_OLSUM_INMON")
+        field(INPA, "C1:SUS-MC2_OL_SUM_INMON")
         field(SCAN, ".1 second")
         field(CALC, "A+0.001")
 }

We then rebooted the c1iool0 machine, and when it came back up the MC_TRANS_SUM channel was showing the correct values.
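
A quick way to check that the calc record is healthy (a sketch using pyepics from a workstation; channel names taken from the diff above):

#!/usr/bin/env python
# Sketch: confirm that MC_TRANS_SUM tracks the renamed OL sum channel plus the 0.001 offset.
from epics import caget

olsum = caget('C1:SUS-MC2_OLSUM_INMON')   # renamed OL sum channel (no underscore)
trans = caget('C1:IOO-MC_TRANS_SUM')      # output of the ioo.db calc record (A + 0.001)

print('OL sum:      ', olsum)
print('MC_TRANS_SUM:', trans)
print('difference:  ', trans - olsum, '(should be ~0.001 if the record is reading its input)')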

We then found that the MC autolocker was not running, presumably because it had crashed after the channel rename?

In any event, we logged in to op340m and restarted the autolockerMCmain40m script.

The mode cleaner is now locked.

[0] Rana's log where this was initially defined

[1] All of the slow channel stuff is still in the old /cvs/cds/caltech path.  This needs to be fixed.

 

  4808 | Mon Jun 13 12:34:21 2011 | Jamie, Koji | Update | LSC | Updated LSC model installed

After a couple of hickups, I was able to compile and install Koji's rework of the LSC model.

The main changes are that the model now uses an RF_PD library part, and the channel names were tweaked to be more in line with what we expect.

I found a couple of small bugs in the model that were preventing it from compiling.  Those were fixed and it compiled with no further problems.

There was also some rearrangement of signal inputs to the PD_DOF matrix.  The matrix screen was updated to reflect the proper inputs.  However, this also meant that the burt restore scripts for the IFO configurations were setting the wrong elements in the matrix.  I fixed the settings for X and Y arm locking, and updated the burt snapshots using the burt/c1ifoconfigure/C1save{X,Y}arm scripts.  NOTE: burt settings will need to be updated for the MICH, PRM, DRM, and FULL IFO configurations as well.

During the build/install process, Joe and I also found a bug in the feCodeGen that was causing the filter screens to be created with the wrong names.  Joe sent out a patch that will hopefully be merged soon.  Building the model with Joe's patch fixed the screen names, so the screens are currently named correctly.

  8598 | Fri May 17 18:58:58 2013 | Jamie, Koji | Summary | CDS | Weird DAC bit flipping at half integer output values

While looking at the DAC anti-imaging filters, Koji noticed an odd feature of the DAC output:

[Attachment: sweep.pdf]

What you see here is 16kHz double data from a model right before the DAC part ('C1:SUS-PRM_ULCOIL_OUT', blue), and the full 64kHz int data sent to the physical DAC as reported by the IOP ('C1:X02-MDAC0_TP_CH0', green).  The balls are the actual sample values (as expected there are four green balls for every blue).  The output data is being ramped continuously between 0 and 1.

As the output data crosses the half-count level, the integer DAC output oscillates continuously at every 64kHz sample between the bounding integer values (in this case 0 and 1).

Here's the data as we hold the output continuously at the half-count level; the integer DAC output just oscillates continuously:

[Attachment: const.pdf]

After some probing I found that the oscillation happens within [-0.003, +0.004] counts of the half-count level.

The result of this is a fairly strong 32 kHz line in the DAC analog output.

We looked in the controller.c and couldn't identify anything that would be doing this.  This is the output procedure as far as I can see (controller.c line numbers in parentheses):

  1. The double from the model is passed to the IOP
  2. The IOP applies a sample-and-hold or zero-pad if the model is running at a slower speed than the IOP (1799)
  3. The data is then anti-image filtered (1801)
  4. A half-integer is added/subtracted before casting such that the cast is a round instead of a floor (1817)
  5. The data double is cast to an int (1819)
  6. The data is written to the DAC (1873)

There's nothing there that would indicate this sort of bit flipping.
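
To make the rounding step concrete, here is a minimal sketch (not the actual controller.c code) of steps 4-5 above.  It assumes the perturbation comes from sub-count ripple on the held value (e.g. from the anti-image filter); with the output parked at the half-count level, any ripple comparable to the observed [-0.003, +0.004] window flips the cast between the bounding integers on alternate 64 kHz samples, i.e. a 32 kHz line:

#!/usr/bin/env python
# Sketch of the output path in steps 4-5: add half a count so the float->int cast
# rounds instead of floors, then cast.
def dac_cast(x):
    return int(x + 0.5) if x >= 0 else int(x - 0.5)

held = 0.5          # model output parked at the half-count level
ripple = 0.003      # assumed sub-count perturbation, comparable to the observed window

samples = [held + ripple * (-1) ** n for n in range(8)]
print([dac_cast(s) for s in samples])   # -> [1, 0, 1, 0, 1, 0, 1, 0]: alternation at half the 64 kHz rate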

  4493 | Wed Apr 6 18:55:49 2011 | Jamie, Larisa | Configuration | LSC | major AP table cleanup

We ripped out all of the old AS, PLL, and REFL paths, green, orange, and cyan respectively on the old AP table layout photo:

  • AS (green): had already been re-purposed by putting a ThorLabs diode right after the first steering mirror.   Everything downstream of that has been removed.
  • PLL (orange): everything removed.
  • REFL (cyan): CCD was left in place, so everything upstream of that was not touched.  Everything else was removed, including all of the REFL detectors.
  • OMCT (purple): previously removed
  • OMCR (blue): left in place, but the diode and CCD are not connected (found that way).
  • MCT (magenta): previously removed.
  • IMRC (red): untouched

All optics and components were moved to the very south end of the SP table.

We also removed all spurious cables from the table top, and from underneath, as well as pulled out no-longer-needed power supplies.

  8545 | Tue May 7 20:09:10 2013 | Jamie, Rana | Update | PSL | PMC problem was FSS slow actuator

Rana showed up and diagnosed the problem as a railed FSS SLOW output.  The SLOW monitor was showing ~6 V, which is apparently a bad mode-hoppy place for the NPRO.  Reducing the SLOW output brought things back into a good range, which allowed the PMC to lock again.

In attempting to diagnose the problem I noticed that there is -100 mV DC coming out of the PMC RFPD RF output.  This is not good, probably indicating a problem, and was what I thought was the PMC lock issue for a while.  Need to look at the PMC RFPD RF output.

  4868 | Thu Jun 23 21:35:46 2011 | Jamie, Rana, Kiwamu | Update | SUS | Fix calibration for sus sensors

We have fixed the counts-to-micron (cts2um) calibration for the suspension sensor filters. Each suspension sensor filter bank (e.g. ULSEN) has a "cts2um" calibration filter. These have now been set with the following flat gains:

   40 V       10^3 um         um
 -------- *  --------  = .36  --
 2^16 cts     1.7 V           ct

The INMTRX was also fixed with proper element values:

        UL      UR      LR      LL      SIDE
POS     0.25    0.25    0.25    0.25    0
PIT     1.666   1.666   -1.666  -1.666  0
YAW     1.666   -1.666  -1.666  1.666   0
SIDE    0       0       0       0       1

This was done for all core optic suspensions (BS, PRM, SRM, ITMX, ITMY, ETMX, ETMY).
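
As a worked example of what these settings do (a sketch, not production code; the sensor counts are made up), the cts2um gain and input matrix take the five OSEM readbacks into the POS/PIT/YAW/SIDE basis like so:

#!/usr/bin/env python
# Sketch: apply the cts2um calibration and INMTRX from this entry to example sensor values.
import numpy as np

CTS2UM = 0.36   # um per count, from the gain worked out above

# rows: POS, PIT, YAW, SIDE; columns: UL, UR, LR, LL, SIDE
INMTRX = np.array([[0.25,   0.25,   0.25,   0.25,  0],
                   [1.666,  1.666, -1.666, -1.666, 0],
                   [1.666, -1.666, -1.666,  1.666, 0],
                   [0,      0,      0,      0,     1]])

sensors_cts = np.array([100.0, 120.0, 80.0, 90.0, 50.0])   # example UL, UR, LR, LL, SIDE counts

pos, pit, yaw, side = INMTRX.dot(sensors_cts * CTS2UM)
print('POS %.2f, PIT %.2f, YAW %.2f, SIDE %.2f (calibrated units)' % (pos, pit, yaw, side))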

 

  7174 | Tue Aug 14 11:39:13 2012 | Jamie, Rolf, Alex | Update | CDS | Debugging of c1sus machine and c1rfm models

Rolf and Alex came over this morning to see if they could help debug some issues we have been seeing with IPC transmission between the c1sus and c1lsc machines.

c1oaf, which runs on c1lsc, sees a lot of transmission errors on its dolphin receivers from c1rfm, which runs on c1sus.  Their speculation is that c1rfm is trying to process too many channels, and it's not able to read off all the RFM channels and retransmit them over dolphin to c1lsc before the end of the cycle.  To test this they turned off all RFM reads on c1rfm and the dolphin receiver errors on c1lsc all went away.  We ran into other problems before I had a chance to pester them about what the take-away is here.  We might just need to reduce the load on c1rfm, maybe by introducing a c1rfm2?

We then tried to debug an issue in the c1sus machine where models would occasionally run slow for a cycle, or run slow when a different model on the machine was loaded or unloaded.  The suspect was BIOS settings.  Unfortunately, we ran into trouble when we tried to tweak the BIOS settings on c1sus.  We found that all the serial/COM ports were on, which is usually a big no-no for the RTS (the interrupts cause many cycle delays).  However, turning off the COM ports prevented the machine from booting at all.  This was a big mystery.  The machine seemed to be acting flaky in general as well, since the boot (pre-kernel) would hang in various places after different reboots.  Alex went to grab us a spare machine that we're going to try swapping in this afternoon.

  4962 | Tue Jul 12 11:52:54 2011 | Jamie, Suresh | Update | LSC | LSC model updates

The LSC model has been updated:

Binary outputs to control whitening filter switching

We now take the filter state bit from the first filter bank in all RF PD I/Q filter banks (AS55_I, REFL11_Q, etc) as the controls for the binary analog whitening switching on the RF PD I/Q inputs. The RF_PD part was also modified to output this control bit. The bits from the individual PDs are then combined into the various words that are written to the Contec BO part.
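
Conceptually the combination is just bit packing, something like the sketch below (the actual per-PD bit assignments in the model are not reproduced here):

#!/usr/bin/env python
# Sketch: pack per-PD whitening-switch control bits into a single word for a binary output.
def pack_word(bits):
    # bits: iterable of booleans, one per PD channel, least-significant bit first
    word = 0
    for i, b in enumerate(bits):
        if b:
            word |= 1 << i
    return word

# example: whitening engaged on the first four channels, off on the rest
print(hex(pack_word([True, True, True, True, False, False, False, False])))   # -> 0xf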

Channel mapping updated/fixed to reflect wiring specification

Yesterday Suresh posted an updated LSC wiring diagram, with correct channel assignments for the RF PD I/Q and DC inputs.  Upon inspection of the physical hardware we found that some of the LSC wiring was incorrect, with I/Q channels swapped, and some of the PDs in the wrong channels.  We went through and fixed the physical wiring to reflect the diagram.  This almost certainly will affect the EPICS settings for some of the input channels, such as offsets and RD rotations.  We should therefore go through all of the RF inputs and make sure everything is kosher.

I also fixed all of the wiring in the LSC model to also reflect the diagram.

Once this was all done, I rebuilt and restarted the LSC model, and confirmed that the anti-whitening filter banks in the PD input filter modules were indeed switching the correct bits.  I'll next put together a script to confirm that the LSC PD whitening is switching as it should.

 

  8225 | Mon Mar 4 19:52:03 2013 | Jamie, Yuta | Update | General | input pointing mirror (TT1/TT2) control improved

We improved the active tip-tilt (TT) controllers such that they now have filter banks at the PIT/YAW inputs, and at the coil outputs:

[Attachment: TT.png]

This allowed us to do a couple of things:

  • normalize the matrix to unity, and move overall gains into the filters (we moved x100 gain into PIT/YAW)
  • slider control is now PIT/YAW OFFSET
  • potentially do coil balancing
  • allow for input excitations
  • automatically record EPICS values

These are all big improvements.  The TT MEDM screens were appropriately updated.

We had to rebuild/restart c1ass, which reset the TT pointing.  We recorded all the values beforehand and were able to recover the pointing easily.  Interestingly, there did appear to be hysteresis in pitch, which is maybe not entirely unexpected, but still worth noting.

 

  13133 | Sun Jul 23 22:16:55 2017 | Jamie, gautam | Update | CDS | front-end now running with new OS, RCG

All front ends and models are (mostly) running now

All suspensions are damped:

It should be possible at this point to do more recovery, like locking the MC.

Some details on the restore process:

  • all models were recompiled with the new RCG version 3.0.3
  • the new RCG does stricter simulink drawing checks, and was complaining about unterminated outputs in some of the SUS models.  Terminated all outputs it was concerned about and saved.
  • RCG 3.0 requires a new directory for doing better filter module diagnostics: /opt/rtcds/caltech/c1/chans/tmp
  • had to reset the slow machines c1susaux, c1auxex, c1auxey

The daqd is not yet running.  This is the next task.

I have been taking copious notes and will fully document the restore process once complete.

c1ioo issues

c1ioo has been giving us a little bit of trouble.  The c1ioo model kept crashing and taking down the whole c1ioo host.  We found a red light on one of the ADCs (ADC1).  We pulled the card and replaced it with a spare from the CDS cabinet.  That seemed to fix the problem and c1ioo became more stable.

We've still been seeing a lot of glitching in c1ioo, though, with CPU cycle times frequently (every couple of seconds) running above threshold for all models, up to 200 us.  I tried unloading every kernel module I could and shutting down every non-critical process, but nothing seemed to help.

We eventually tried stopping the c1ioo model altogether and that seemed to help quite a bit, dropping the long cycle rate down to something like one every 30 seconds or so.  Not sure what that means.  We should look into the BIOS again, to see if there could be something interacting with the newer kernel.

So currently the c1ioo model is not running (which is why it's all white in the CDS overview snapshot above).  The fact that c1ioo is not running and the remaining models are still occasionally glitching is also causing various IPC errors on auxiliary models (see c1mcs, c1rfm, c1ass, c1asx).

RCG compile warnings

the new RCG tries to do more checks on custom c code, but it seems to be having trouble finding our custom "ccodeio.h" files that live with the c definitions in USERAPPS/*/common/src/.  Unclear why yet.  This is causing the RCG to spit out warnings like the following:

Cannot verify the number of ins/outs for C function BLRMS.
    File is /opt/rtcds/userapps/release/cds/c1/src/BLRMSFILTER.c
    Please add file and function to CDS_SRC or CDS_IFO_SRC ccodeio.h file.

These are just warnings and will not prevent the models from compiling or running.  We'll figure out what the problem is to make these go away, but they can be ignored for the time being.

model unload instability

Probably the worst problem we're facing right now is an instability that will occasionally, but not always, cause the entire front end host to freeze up upon unloading an RTS kernel module.  This is a known issue with the newer linux kernels (we're using kernel version 3.2.35), and is being looked into.

This is particularly annoying with the machines on the dolphin network, since if one of the dolphin hosts goes down it manages to crash all the models reading from the dolphin network.  Since half the time they can't be cleanly restarted, this tends to cause a boot fest with c1sus, c1lsc, and c1ioo.  If this happens, just restart those machines, wait till they've all fully booted, then restart all the models on all hosts with "rtcds start all".
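A minimal sketch of that recovery sequence as a script (assumes passwordless ssh as controls and uses a crude fixed wait in place of actually checking boot status; "rtcds start all" is the same command quoted above):

    # Sketch of the dolphin boot-fest recovery described above. Assumes
    # passwordless ssh as controls to the front ends; the fixed sleep is a
    # stand-in for checking that each host has finished booting.
    import subprocess
    import time

    HOSTS = ["c1sus", "c1lsc", "c1ioo"]

    # 1. reboot the wedged dolphin hosts
    for host in HOSTS:
        subprocess.run(["ssh", f"controls@{host}", "sudo reboot"], check=False)

    # 2. wait until they have all fully booted
    time.sleep(5 * 60)

    # 3. restart all models on each host
    for host in HOSTS:
        subprocess.run(["ssh", f"controls@{host}", "rtcds start all"], check=False)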

  9135   Tue Sep 17 17:55:42 2013 Jamie.ConfigurationComputer Scripts / Programspyepics configured

Quote:
 controls@rosalba:~ 0$ cdsutils
Traceback (most recent call last):
  File "/ligo/apps/cdsutils/lib/cdsutils/__main__.py", line 7, in <module>
    from cdsutils import CMDS
  File "/ligo/apps/cdsutils/lib/cdsutils/__init__.py", line 4, in <module>
    from servo import servo
  File "/ligo/apps/cdsutils/lib/cdsutils/servo.py", line 1, in <module>
    from epics import PV
ImportError: No module named epics
controls@rosalba:~ 1$
Mon Sep 16 19:40:32 2013 

 I properly installed the python-pyepics package on all the workstations, so this should be working now.

And for posterity, the pyepics source is at:

 pianosa:/home/controls/src/pyepics

From this, the Debian packages were built:

 controls@pianosa:~/src/pyepics 0$ debuild -uc -us

The .deb was then moved into the /ligo/apps/debs NFS directory:

 controls@pianosa:~/src 0$ cp python-pyepics_*_all.deb /ligo/apps/debs/pyepics/

It was then installed on the various workstations:

 controls@rosalba:~ 0$ sudo dpkg -i /ligo/apps/debs/pyepics/python-pyepics*.deb

This will probably need to be repeated any time we upgrade the EPICS install.
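A quick way to sanity-check the install on a workstation (the channel name here is just an example; any known EPICS channel will do):

    # Quick check that python-pyepics is importable and can reach EPICS.
    # The channel name is only an example; substitute any known channel.
    from epics import PV

    pv = PV("C1:PSL-FSS_RMTEMP")  # example channel
    print(pv.pvname, "=", pv.get())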

  9357   Wed Nov 6 17:21:58 2013 JamieUpdateCDSFB not talking to LSC?

Quote:

Something funny is going on with the framebuilder's communication with the LSC machine. 

This is a different failure mode / error than I have seen before.  It's not the type of problem that is solved by restarting the mx streams (that failure mode would be indicated by the two stacked blocks, currently green on the lsc machine, turning red), although I did try that before looking closer and realizing that wasn't the problem.

ssh-ing to c1lsc and doing an "rtcds restart all" seems to be fixing the problem.  Both c1oaf and c1cal needed another round of restarting, because they needed their BURT buttons pressed manually.  All of the models on the lsc machine are running fine now, though.

Here's a screenshot of the CDS overview screen, with the error lights:

Screenshot-Untitled_Window-1.png

This definitely looks like a timing problem on the c1lsc front end computer.  The red lights on the left mean that the timing synchronization is lost at the user model.  I'm perplexed why it looks like the IOP is not seeing the same error, though, since it should originate at the ADC.  The red lights to the right just mean the timing synchronization is lost with the DAQ, which is to be expected given a timing loss at the front end.

We'll have to take a closer look when this happens again.
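For the next time this happens: the manual BURT button presses could probably be scripted as well. A hedged sketch, assuming the RCG-standard record name C1:FEC-<DCUID>_BURT_RESTORE (the DCUID values below are placeholders; read the real ones off the models' GDS_TP screens):

    # Hedged sketch: press a model's BURT restore button via EPICS rather than
    # clicking on the MEDM screen. Assumes the record name
    # C1:FEC-<DCUID>_BURT_RESTORE; the DCUIDs below are placeholders.
    from epics import caput

    MODEL_DCUIDS = {"c1oaf": 22, "c1cal": 50}  # placeholder DCUIDs

    for model, dcuid in MODEL_DCUIDS.items():
        channel = f"C1:FEC-{dcuid}_BURT_RESTORE"
        print(f"pressing BURT button for {model}: {channel}")
        caput(channel, 1)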

  7199   Wed Aug 15 20:15:51 2012 JanUpdateIOORingdown measurements

Quote:

While I thought that the bumps observed at the end of the ringdown might be because of the cavity trying to lock itself, Jan commented that they have always existed in these measurements and their source is not known yet.

What I meant to say was that in all ringdown measurements that we observed today, the bumps were consistently part of the ringdown, and that I have no explanation for the bumps. It should also be mentioned that when fitting the bumpy part of the ringdown instead (we used the clean first 10 us), the ringdown time comes out about twice as high. In either case, the ringdown time is significantly smaller than we have seen in documents about previous measurements.

We (basically I) also made one error when producing the plots. The y-axis label of the semi-logarithmic plot should be log(...), not log10(...).
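For reference, a minimal sketch of the kind of exponential fit described above (assumes the scope trace has been exported as a two-column time/power text file; this is an illustration, not the exact script we used):

    # Minimal sketch: fit a cavity ringdown with an exponential decay,
    # P(t) = A * exp(-t / tau) + C, over a selectable window (e.g. the clean
    # first 10 us). Assumes a two-column (time, power) text file.
    import numpy as np
    from scipy.optimize import curve_fit

    def ringdown(t, A, tau, C):
        return A * np.exp(-t / tau) + C

    t, p = np.loadtxt("ringdown_trace.txt", unpack=True)  # hypothetical file

    # restrict the fit to the clean initial part of the decay
    mask = t <= 10e-6
    popt, pcov = curve_fit(ringdown, t[mask], p[mask],
                           p0=[p[mask][0], 3e-6, p[mask][-1]])
    A, tau, C = popt
    print(f"ringdown time tau = {tau * 1e6:.2f} us")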

  16769   Mon Apr 11 11:00:30 2022 JancarloUpdateVACC1VAC Reboot and Nitrogen tanks

[Paco, JC, Ian, Jordan, Chub]

Checking in this morning, I walked over to the Nitrogen tanks to check the levels. I noticed one tank was empty, so I swapped it out. Chub came over to check the levels and to take note of how many tanks were left available for use (none), and then put in a work order for a set of full Nitrogen tanks. We should be set on Nitrogen until Thursday of this week (4/14/22).

As for C1VAC: this morning, Paco and I attempted to open the PSL shutter, but the interlock system was tripped, so we didn't get any light into the IFO. We traced the issue down to C1VAC being unresponsive. We discussed whether the interlock may have tripped as a result of the Nitrogen tanks running out, but we do not believe that was the issue, since we would have received an email. We tried troubleshooting as much as possible while avoiding a reboot, but were unable to solve the issue. In response, we ran the idea of a reboot by Jordan and Ian; everyone was in agreement, and the reboot fixed the system. Restarting c1vac seems to have closed V4, but this didn't cause any issues with the current state of the vacuum system.

After opening the PSL shutter again, we could see the beam going down the IFO, so we resumed alignment work.

  16784   Mon Apr 18 15:17:31 2022 JancarloUpdateGeneralTool box and Work Station Organization

I cleaned up around the 40 m lab. All the Laser Safety Glasses have been picked up and placed on the rack at the entrance.

Some miscellaneous BNC Connector cables have been arranged and organized along the wall parallel to the Y-Tunnel.

Nitrogen tanks have been swapped out. Current tank is at 1200 psi and the other is at 1850 psi.

The tool box has been organized with each tool in its specified area.

  10879   Thu Jan 8 19:02:42 2015 JaxSummaryElectronicsMC demod modifications

Here's a summary of the changes made to the D990511 serial 115 (formerly known as REFL 33), as well as a short procedure. It needed tuning to 29.5 MHz and also had some other issues that we found along the way.

So here's a picture of it as built:

The changes made are:

1. U11 and U12 changed from 5MHz LP to 10 MHz LP filters.

2. Resistors R8 and R9 moved from their PCB locations to between pins 1 (signal) and 3 (ground) of U11 and U12, respectively. These were put in the wrong place for proper termination so it made sense to shift them while I was already replacing the filters.

Also, please note- whoever labeled the voltages on this board needed an extra cup of coffee that day. There are two separate 15V power supplies, one converted from 24V, one directly supplied. The directly supplied one is labeled 15A. This does NOT mean 15 AMPS.

Transfer functions:

Equipment: 4395A, Signal generator (29.5 MHz), two splitters, one mixer

You can't take the TF from PD in to I/Q out directly. Since this is a demod board, there's a demodulating (downconverting) mixer in the I and Q PD in paths. Negligible signal will get through without some signal applied to the L input of the mixer. In theory, this signal could be at DC, but there are blocking capacitors in the LO in paths. Therefore, you have to upconvert the signal you're using to probe the board's behavior before it hits the board.  Using the 4395A as a network analyzer, split the RF out. RFout1 goes to input R, RFout2 goes to the IF port of the mixer. Split the signal generator (SG). SG1 goes to LO in, SG2 goes to the L port of the mixer. The RF port of the mixer (your upconverted RFout2) goes to PD in, and the I/Q out goes back to the A/B port of the 4395A - at the same frequency as the input, thanks to the board's internal downconversion. 
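Written out, the frequency bookkeeping is (f is the 4395A probe frequency, f_LO = 29.5 MHz):

    \cos(2\pi f t)\,\cos(2\pi f_{LO} t) = \tfrac{1}{2}\Big[\cos\big(2\pi(f_{LO}+f)t\big) + \cos\big(2\pi(f_{LO}-f)t\big)\Big]

so the external mixer hands the board sidebands at f_LO ± f on PD in. The board's internal mixer multiplies by the same LO once more, giving terms back at f (which reach the I/Q outputs) and at 2 f_LO ± f (which the low-pass filters U11/U12 remove), which is why the analyzer sees its probe return at the original frequency.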

Phase measurement:

Equipment: Signal generator (29.5 MHz), signal generator (29.501 MHz), oscilloscope

Much simpler: 29.5 MHz to the LO input (0 dBm), 29.501 MHz to the PD input (0 dBm), compare the phases of the I/Q outputs on the oscilloscope. There are four variable capacitors in the circuit that are not on the DCC revision of the board - C28-31. On the LO path, C28 tunes the I phase and C30 tunes the Q phase. On the PD path, C29 and C31 appear to be purely decorative; both are in parallel with each other on the PD in Q path, and I'm guessing C29 was supposed to be on the PD in I path. Fortunately, C28 and C30 had enough dynamic range to tune the I/Q phase difference to 90 degrees.
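A minimal sketch of how the 90-degree check could be done numerically from saved scope traces instead of by eye (assumes the two channels are exported to a CSV with columns time, I, Q; this is an illustration, not what was actually done):

    # Minimal sketch: estimate the I/Q phase difference at the 1 kHz beat note
    # from saved oscilloscope traces. Assumes a CSV with columns time, I, Q.
    import numpy as np

    f_beat = 1e3  # 29.501 MHz - 29.5 MHz

    t, i_out, q_out = np.loadtxt("scope_traces.csv", delimiter=",", unpack=True)

    # digital lock-in at the beat frequency, then compare complex phases
    ref = np.exp(-2j * np.pi * f_beat * t)
    phi_i = np.angle(np.sum(i_out * ref))
    phi_q = np.angle(np.sum(q_out * ref))

    diff_deg = np.degrees((phi_i - phi_q + np.pi) % (2 * np.pi) - np.pi)
    print(f"I/Q phase difference: {diff_deg:.1f} deg (target: 90)")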

Before tuning:

After tuning:

 

  16802   Fri Apr 22 07:01:58 2022 JcUpdateCoil DriversAdding Resistors and Reinstalling

[Koji, JC]

Coil Drivers LO2, SR2, AS4, and AS1 have been updated and reinstalled into the system.

LO2 Coil Driver 1 (UL/LL/UR) now has R = 100 // 1.2k ~ 92 Ohm for CH1/2/3    Unit: S2100008

LO2 Coil Driver 2 (LR/SD) now has R = 100 // 1.2k ~ 92 Ohm for CH3    Unit: S2100530

SR2 Coil Driver 1 (UL/LL/UR) now has R = 100 // 1.2k ~ 92 Ohm for CH1/2/3    Unit: S2100614

SR2 Coil Driver 2 (LR/SD) now has R = 100 // 1.2k ~ 92 Ohm for CH3    Unit: S2100615

AS1 Coil Driver 1 (UL/LL/UR) now has R = 100 // 1.2k ~ 92 Ohm for CH1/2/3    Unit: S2100610

AS1 Coil Driver 2 (LR/SD) now has R = 100 // 1.2k ~ 92 Ohm for CH3    Unit: S2100611

AS4 Coil Driver 1 (UL/LL/UR) now has R = 100 // 1.2k ~ 92 Ohm for CH1/2/3    Unit: S2100612

AS4 Coil Driver 2 (LR/SD) now has R = 100 // 1.2k ~ 92 Ohm for CH3    Unit: S2100613
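For reference, the parallel combination quoted above works out to

    R = \frac{R_1 R_2}{R_1 + R_2} = \frac{100 \times 1200}{100 + 1200}\,\Omega \approx 92.3\,\Omega

consistent with the ~92 Ohm value listed for each channel.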

Attachment 1: IMG_0536.jpeg
Attachment 2: IMG_0539.jpeg
Attachment 3: IMG_0542.jpeg
Attachment 4: IMG_0545.jpeg
Attachment 5: IMG_0548.jpeg
Attachment 6: IMG_0551.jpeg
Attachment 7: IMG_0554.jpeg
Attachment 8: IMG_0557.jpeg
  16846   Thu May 12 13:46:59 2022 JcUpdateAlignmentPOP Beam

[Tega, JC]

Tega and I went in to adjust the POP beam on the ITMX table. The beam entered the table high, so we corrected this by adding mirrors (the mirrors highlighted in turquoise adjust the pitch of the beam). All the mirrors are set and we are now in the process of adjusting the PD.

Attachment 1: IMG_0777.jpeg
  17561   Tue Apr 25 15:51:21 2023 JcUpdateIMCIMC has been tripping

The IMC has tripped on its own multiple times today. Yehonathan and I have had to come back and manually relock it each time.


Wed Apr 26 10:24:07 2023 [EDIT]

[Paco] I aligned the MC by hand, let it run locked for 30 minutes without angular controls, and then switched on the WFS loops yesterday at ~ 6 PM. IMC has been locked ever since.

Attachment 1: Screenshot_2023-04-25_15-55-33.png