ID   Date   Author   Type   Category   Subject
  11266   Fri May 1 16:42:42 2015   rana   Update   DAQ   PEM Slow channels added to saved frames

Still processing, but I think it should work fine once we have a day of data. Until then, here's the summary pages so far, including Vac channels:

http://www.ligo.caltech.edu/~misi/summary/day/20150501/pem/

  11627   Mon Sep 21 15:22:19 2015   jamie   Update   DAQ   working on new fb replacement

I've been putting together a new machine that Rolf got for us as a replacement for fb.

I've installed and configured the OS, and compiled daqd and the necessary supporting software.  I want to try acquiring data with it.  This will require removing the current/old fb from the DAQ network and adding the new machine.  It should be possible to do this relatively non-invasively, such that none of the front end configuration needs to be adjusted, and the old fb can be put back in place easily.

If the test is successful, then I'll push ahead with the rest of the replacement (such as either moving or copying the /frames RAID to the new machine).

I will do this work in the early AM tomorrow, September 22, 2015.

  11636   Tue Sep 22 17:30:55 2015   jamie   Update   DAQ   attempts at getting new fb working

Today I've been trying to get the new frame builder, tentatively named 'fb1', to work.  It's not fully working yet, so I'm about to revert the system back to using 'fb'.  The switch-over process is annoying, since our one Myrinet card has to be moved between the hosts.

A brief update on the process so far:

I'm being a little bold with this system by trying to build daqd against more system libraries, instead of the manually installed stuff that's nominally required.  Here's some of the relevant info about the fb1 system:

  • Debian 7 (wheezy)
  • lscsoft ldas-tools-framecpp-dev 2.4.1-1+deb7u0
  • lscsoft gds-dev 2.17.2-2+deb7u0
  • lscsoft libmetaio-dev 8.4.0-1+deb7u0
  • lscsoft libframe-dev 8.20-1+deb7u0
  • /opt/rtapps/epics-1.4.12.2_long
  • /opt/mx-1.2.16
  • advLigoRTS trunk

I finally managed to get daqd to build against the advLigoRTS trunk (post 2.9 branch).  I'll post a detailed build log once I work out all the kinks.  It runs ok, including writing out full frames, as well as second and minute trends and raw minute trends, but there are a couple of show-stopper problems:

  • daqd segfaults if the C1EDCU.ini is specified.  If I comment out that one file from the 'master' channel ini file list then it runs without segfaulting.
  • Something is going on with the mx_streams from the front ends:
    • They appear to look ok from the daqd side, but the FEC-<ID>_FB_NET_STATUS indicators remain red.  The "DAQ" bit in the STATE_WORD is also red.  Again, this is even though data seems to be flowing.
    • The mx_stream processes on the front ends are dying (and restarting via monit) about every 2 minutes.  It's unclear what exactly is happening, but they all die around the same time, so it was possibly initiated by a daqd problem.  Around the time of the mx_stream failures, we see this in the daqd log:
[Tue Sep 22 17:24:07 2015] GPS MISS dcu 91 (TST); dcu_gps=1127003062 gps=1127003063

Aborted 1 send requests due to remote peer Aborted 1 send requests due to remote peer 00:25:90:0d:75:bb (c1sus:0) disconnected
mx_wait failed in rcvr eid=004, reqn=11; wait did not complete; status code is Remote endpoint is closed
00:30:48:d6:11:17 (c1iscey:0) disconnected
mx_wait failed in rcvr eid=002, reqn=235; wait did not complete; status code is Remote endpoint is closed
disconnected from the sender on endpoint 002
mx_wait failed in rcvr eid=005, reqn=253; wait did not complete; status code is Bad session (missing mx_connect?)
disconnected from the sender on endpoint 005
disconnected from the sender on endpoint 004
[Tue Sep 22 17:24:13 2015] GPS MISS dcu 39 (PEM); dcu_gps=1127003062 gps=1127003069
  • Occasionally the daqd process dies when the front end mx_stream processes die.

I'll keep investigating, hopefully with some feedback from Keith and Rolf tomorrow.

  11645   Fri Sep 25 17:51:11 2015   jamie   Update   DAQ   fb replacement work update

Brief update about the fb replacement status.

The new hardware for fb is in the rack, temporarily sitting on top of megatron, and on the CDS network with the name 'fb1'.  I've installed an OS on it and have re-built daqd.

Earlier this week I swapped it into the network and tried to get it to acquire data from the front ends.  I was ultimately unsuccessful.  The problem seemed to be the mx_stream communication from the front ends to the new host.

The swap is sort of a pain because we only have one Myrinet fiber network adapter card that has to be moved between machines, which of course requires shutting down both machines and opening up their chassis.  I instructed Steve to order us a new Myrinet card for the new machine, which will allow us to swap daqd machines by just moving the fiber connection.  Once that's in place (early next week) I'll go back to trying to figure out what the issue is with the mx_streams.

If all else fails I'll take the repulsive last resort of either swapping or cloning the disk from the old fb.

  11653   Wed Sep 30 13:59:49 2015   jamie   Update   DAQ   attempts at getting new fb working

I got Steve to get us a new Myrinet fiber network adapter card for fb1:

  • Myrinet 10G-PCIE-8B-S

I just finished installing the card in fb1, and it came up fine.  We happened to have a spare fiber, and a spare fiber jack in the DAQ switch, so I went ahead and plugged it in in parallel to the old fb:

controls@fb1:~/rtbuild/trunk 130$ /opt/mx/bin/mx_info
MX Version: 1.2.16
MX Build: controls@fb1:/opt/src/mx-1.2.16 Fri Sep 18 18:32:59 PDT 2015
1 Myrinet board installed.
The MX driver is configured to support a maximum of:
    8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
===================================================================
Instance #0:  364.4 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
    Status:         Running, P0: Link Up
    Network:        Ethernet 10G

    MAC Address:    00:60:dd:43:74:62
    Product code:   10G-PCIE-8B-S
    Part number:    09-04228
    Serial number:  485052
    Mapper:         00:60:dd:46:ea:ec, version = 0x00000000, configured
    Mapped hosts:   7

                                                        ROUTE COUNT
INDEX    MAC ADDRESS     HOST NAME                        P0
-----    -----------     ---------                        ---
   0) 00:60:dd:43:74:62 fb1:0                             1,0
   1) 00:25:90:0d:75:bb c1sus:0                           1,0
   2) 00:30:48:be:11:5d c1iscex:0                         1,0
   3) 00:30:48:d6:11:17 c1iscey:0                         1,0
   4) 00:30:48:bf:69:4f c1lsc:0                           1,0
   5) 00:14:4f:40:64:25 c1ioo:0                           1,0
   6) 00:60:dd:46:ea:ec fb:0                              1,0

We can now work on fb1 while fb continues to run and collect data from the front ends.

I'm still not getting the mx_stream connections to the new fb1 daq to work.  I'm leaving everything running as is on fb for the moment.

  11655   Thu Oct 1 19:49:52 2015   jamie   Update   DAQ   more failed attempts at getting new fb working

Summary

I've not really been able to make additional progress with the new 'fb1' DAQ.  It's still flaky as hell.  Therefore we're still using old 'fb'.

Issues

mx_stream

The mx_stream processes on the front ends initially run fine, connecting to the daqd and transferring data, with both DAQ-..._STATUS and FE-..._FB_NET_STATUS indicators green.  Then after about two minutes all the mx_stream processes on all the front ends die.  Monit eventually restarts them all, at which point they come up green for a while until they crash again ~2 minutes later.  This is essentially the same situation as reported previously.

In the daqd logs when the mx_streams die:

Aborted 2 send requests due to remote peer 00:30:48:be:11:5d (c1iscex:0) disconnected
Aborted 2 send requests due to remote peer 00:14:4f:40:64:25 (c1ioo:0) disconnected
Aborted 2 send requests due to remote peer 00:30:48:d6:11:17 (c1iscey:0) disconnected
Aborted 2 send requests due to remote peer 00:25:90:0d:75:bb (c1sus:0) disconnected
Aborted 1 send requests due to remote peer 00:30:48:bf:69:4f (c1lsc:0) disconnected
mx_wait failed in rcvr eid=000, reqn=176; wait did not complete; status code is Remote endpoint is closed
disconnected from the sender on endpoint 000
mx_wait failed in rcvr eid=000, reqn=177; wait did not complete; status code is Connectivity is broken between the source and the destination
disconnected from the sender on endpoint 000
mx_wait failed in rcvr eid=000, reqn=178; wait did not complete; status code is Connectivity is broken between the source and the destination
disconnected from the sender on endpoint 000
mx_wait failed in rcvr eid=000, reqn=179; wait did not complete; status code is Connectivity is broken between the source and the destination
disconnected from the sender on endpoint 000
mx_wait failed in rcvr eid=000, reqn=180; wait did not complete; status code is Connectivity is broken between the source and the destination
disconnected from the sender on endpoint 000
[Thu Oct  1 19:00:09 2015] GPS MISS dcu 39 (PEM); dcu_gps=1127786407 gps=1127786425

[Thu Oct  1 19:00:09 2015] GPS MISS dcu 39 (PEM); dcu_gps=1127786408 gps=1127786426

[Thu Oct  1 19:00:09 2015] GPS MISS dcu 39 (PEM); dcu_gps=1127786408 gps=1127786426

In the mx_stream logs:

controls@c1iscey ~ 0$ /opt/rtcds/caltech/c1/target/fb/mx_stream -r 0 -W 0 -w 0 -s 'c1x05 c1scy c1tst' -d fb1:0
mmapped address is 0x7f0df23a6000
mmapped address is 0x7f0dee3a6000
mmapped address is 0x7f0dea3a6000
send len = 263596
Connection Made
isendxxx failed with status Remote Endpoint Unreachable
disconnected from the sender

daqd

While the mx_stream processes are running daqd seems to write out data just fine.  At least for the full frames.  I manually verified that there is indeed data in the frames that are written.

Eventually, though, daqd itself crashes with the same error that we've been seeing:

main profiler warning: 0 empty blocks in the buffer

I'm not exactly sure what triggers the crashes, but they look coincident with the writing out of the minute and/or second trend files.  It's unclear how this is related to the mx_stream crashes, if at all.  The mx_stream crashes happen every couple of minutes, whereas daqd itself crashes much less frequently.

The new daqd can't handle EDCU files.  If an EDCU file is specified (e.g. C0EDCU.ini in our case), the daqd will segfault very soon after startup.  This was an issue with the current daqd on fb, but was "fixed" by moving where the EDCU file was specified in the master file.

Conclusion

There are a number of differences between the fb1 and fb configurations:

  • newer OS (Debian 7 vs. ancient gentoo)
  • newer advLigoRTS (trunk vs. 2.9.4)
  • newer framecpp library installed from LSCSoft Debian repo (2.4.1-1+deb7u0 vs. 1.19.32-p1)

It's possible those differences could account for the problems (/opt/rtapps/epics incompatible with this Debian install, for instance).  Somehow I doubt it.  I wonder if all the weird network issues we've been seeing are somehow involved.  If the NFS mount of chiara is problematic for some reason that would affect everything that mounts it, which includes all the front ends and fb/fb1.

There are two things to try:

  • Fix the weird network problem.  Try removing EVERYTHING from the network except for chiara, fb/fb1, and the front ends and see if that helps.
  • Rebuild fb1 with Ubuntu and daqd as prescribed by Keith Thorne.
  11656   Thu Oct 1 20:24:02 2015   jamie   Update   DAQ   more failed attempts at getting new fb working

I just realized that when running fb1, if a single mx_stream dies they all die.

  11657   Thu Oct 1 20:26:21 2015   jamie   Update   DAQ   Swapping between fb and fb1

Swapping between fb and fb1 as DAQ is very straightforward, now that they are both on the DAQ network:

  • stop daqd on fb
  • on fb sudoedit /diskless/root/etc/init.d/mx_stream and set: endpoint=fb1:0
  • start daqd on fb1.  The "new" daqd binary on fb1 is at: ~controls/rtbuild/trunk/build/mx-localtime/daqd

Once daqd starts, the front end mx_stream processes will be restarted by their monits, and be pointing to the new location.

Moving back is just reversing those steps.

  11664   Sun Oct 4 14:28:03 2015   jamie   Update   DAQ   more failed attempts at getting new fb working

I tried to look at fb1 again today, but still haven't made any progress.

The one thing I did notice, though, is that every hour on the hour the fb1 daqd process dies in an identical manner to how the fb daqd dies, with these:

[Sun Oct  4 12:02:56 2015] main profiler warning: 0 empty blocks in the buffer

errors right as/after it tries to write out the minute trend frames.

This makes me think that this new hardware isn't actually going to fix the problem we've been seeing with the fb daqd, even if we do get daqd "working" on fb1 as well as it's currently working on fb.

  12714   Fri Jan 13 21:32:49 2017   rana   HowTo   DAQ   Get 40m data using NDS2 and Python

The attached file is a python notebook that you can use to get data. Minimal syntax.
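
For reference, the core of the notebook boils down to a few lines with the nds2-client Python bindings (a minimal sketch; the server host/port and the GPS times are assumptions, and the channel name is just one from our usual seismic list):

import nds2
conn = nds2.connection('nds40.ligo.caltech.edu', 31200)     # assumed 40m NDS2 server/port
gps_start, gps_stop = 1167559216, 1167559276                # placeholder GPS times (60 s)
bufs = conn.fetch(gps_start, gps_stop, ['C1:PEM-SEIS_BS_X_OUT_DQ'])
data = bufs[0].data                                         # numpy array of samples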

Attachment 1: get40mData.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get some 40m data using NDS"
   ]
  },
  {
... 137 more lines ...
  12717   Sat Jan 14 00:53:05 2017   rana   HowTo   DAQ   Get 40m data using NDS2 and Python

Minute trend data seems not to be available using the NDS2 server. It's super slow using dataviewer from the control room.

Did some digging into the NDS2 config on megatron. It hasn't been updated in 2 years.

All of the stuff is run by the user 'nds2mgr'. The CronTab for this user was running all the channel name updates and server restarts at 3 AM each day; I've moved it to 5:05 AM. I don't know the password for this user, so I just did 'sudo su nds2mgr' to become him.

On megatron, in /home/nds2mgr/nds2-megatron/ there is a list of channels and configs. The file for the minute trend (C-M-ChanList.txt) hasn't been updated since Nov 2015. ???
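
For the record, a minute-trend request through the Python NDS2 client looks roughly like this (a sketch; the CHANNEL.stat,m-trend naming convention and the server host/port are assumptions):

import nds2
conn = nds2.connection('nds40.ligo.caltech.edu', 31200)     # assumed 40m NDS2 server/port
# one hour of minute-trend means for a single channel (placeholder GPS times)
bufs = conn.fetch(1167559216, 1167562816, ['C1:PEM-SEIS_BS_X_OUT_DQ.mean,m-trend'])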

  12718   Sat Jan 14 12:12:03 2017   rana   Update   DAQ   minute trends missing

Did we turn off minute trend writing in one of the recent FrameBuilder debug sessions? Seems we only have second trends in 2016. Maybe this explains why it's so slow to get minute trends? Dataviewer has to rebuild them from the second trends.

controls@nodus|frames > l
total 64
drwx------   2 root     root     16384 Jun  8  2009 lost+found/
drwxr-xr-x   2 controls controls  4096 Jul 14  2015 tmp/
-rw-r--r--   1 controls controls     0 Jul 14  2015 test-file
drwxr-xr-x   5 controls controls  4096 Apr  7  2016 trend/
drwxr-xr-x   4 root     root      4096 Apr 11  2016 archive/
drwxr-xr-x 789 controls controls 36864 Jan 13 19:34 full/
controls@nodus|frames > cd trend
controls@nodus|trend > l
total 3340
drwxr-xr-x 258 controls controls 3342336 Jul  6  2015 minute_raw/
drwxr-xr-x 387 controls controls   36864 Nov  5  2015 minute/
drwxr-xr-x 969 controls controls   36864 Jan 13 19:49 second/

  12719   Sat Jan 14 12:36:57 2017   ericq   Update   DAQ   minute trends missing

Yes, writing minute trends causes hourly FB crashes in the current state of things. The "raw" minute trending is turned on, but I think that these are unknown to nds.

  12829   Wed Feb 15 00:26:44 2017   Johannes   Update   DAQ   panels and pcbs

I finished designing the PCBs for the VME crate back sides (see attached). The project files now live on the DCC at https://dcc.ligo.org/LIGO-D1700058. I ordered a prototype quantity (9) of the PCBs and bought the corresponding connectors; all will arrive within the next two weeks. Also attached are the front panels for the Acromag DAQ chassis and Lydia's RF amplifier unit (the lone +24V slot confuses me: I don't see a ground connector?). On the Acromag panel, six (3x2) of the DB37 connectors are reserved for VME hardware, two are spares, and I filled the remaining space with general purpose BNC connectors for whatever comes up.

Attachment 1: acromag_chassis_panel.pdf
Attachment 2: vme_backplane_panel.pdf
Attachment 3: rfAmp.pdf
  12830   Wed Feb 15 09:06:13 2017   ericq   Update   DAQ   panels and pcbs

The amplifier unit should use the three pin dsub connectors (3w3?) that we use on many of the other units for DC power, and preferably go through the back panel. You can leave out the negative pin, since you just need +24 and ground.

  12832   Wed Feb 15 22:21:12 2017   Lydia   Update   DAQ   panels and pcbs

This is already how it's hooked up. The hole on the front that says +24 V is for an indicator light.

Quote:

The amplifier unit should use the three pin dsub connectors (3w3?) that we use on many of the other units for DC power, and preferably go through the back panel. You can leave out the negative pin, since you just need +24 and ground.

 

  12942   Thu Apr 13 19:54:07 2017   rana   Update   DAQ   checkup on minute trends

Our minute trends are still not available through NDS2 from the outside world due to the bad config of the DAQ, but I can confirm that we still have the minute-raw capability. This is 111 days of Seismic BLRMS.

However, it seems we're only able to get ~1 week of lookback on our second trends, and that is a low-down dirty shame. We used to have over a month of second trend lookback before the last decade of 'upgrades'.

Attachment 1: BRLMS-trend.png
  13478   Thu Dec 14 23:27:46 2017   johannes   Update   DAQ   aux chassis design

Made a front and back panel and slot panels for DSub and IDC breakouts. I want to send this out soon, are there any comments? Preferences for color schemes?

Attachment 1: auxdaq_40m_4U_front.pdf
Attachment 2: auxdaq_40m_4U_rear.pdf
Attachment 3: auxdaq_40m_4U_DSub37x2.pdf
Attachment 4: auxdaq_40m_4U_IDC50.pdf
  13517   Tue Jan 9 00:07:03 2018   johannes   Update   DAQ   etmx slow daq chassis

All parts have been received and assembly is near complete. One small problem: the two DSub connectors are too close together for two cables to fit at the same time. Gautam and I will make some additional slot panels tomorrow using a waterjet cutter, so we can spread the breakout boards out and remedy this.

Fast binary channels need to be spliced into DSub connectors. Aaron is on this. All other, slow connections are already wired from before and have been tested for correct pins on the backplane DIN connectors.

 

The Acromag modules require only a positive supply voltage between +12 and +30 VDC and draw a maximum of 2.8W at that. This raises the question of whether we want this supply rail to be regulated or to take the raw power from the Sorensens. Consulting with Ben Abbott: the Acromags in LIGO do not operate with regulated power. We could easily accommodate the standard regulator boards D1000217 in the chassis, which is probably a good idea if we want to place any custom electronics inside the chassis in the future, for example for whitening or active lowpass filtering.

  13529   Wed Jan 10 22:24:28 2018   johannes   Update   DAQ   etmx slow daq chassis

This evening I transitioned the slow controls to c1auxex2.

  1. Disconnected satellite box
  2. Turned off c1auxex
  3. Disconnected DIN cables from backplane connectors
  4. Attached purple adapter boards
  5. Labeled DSub cables for use
  6. Connected DSub cables to adapter boards and chassis
  7. Initiated modbus IOC on c1auxex2

Gautam and I then proceeded to test basic functionality

  1. Pitch bias sliders move pitch, yaw sliders move yaw. ✓
  2. Coil enable and monitoring channels work. ✓
  3. Watchdog seems to work. ✓ We set the threshold for tripping low, excited the optic, and the watchdog didn't disappoint and triggered.
  4. All channels initialize with "0" upon machine/server script restart. This means the watchdog happens to be OFF, which is good. ✓ It would be great if we could also initialize PIT and YAW to retain their values from before, to avoid kicking the optic. This is not straightforward with EPICS records, but there must be a way.
  5. We got the local damping going. ✓
  6. There is some problem with the routing of the fast BIO channels through the new chassis: the ANALOG de-whitening filter seems to always be engaged, despite our toggling the software BIO bits. ✗ Something must be wrongly wired, as we confirmed by returning only the FAST BIO wiring to the pre-acromag state (while everything else is now controlled by acromag), after which the problem went away. Or some electrical connection is not made (I had to use gender changers on these connectors due to lack of proper cabling).
  7. The switches for the QPD gain stages did not work. ✗ I suspect a wiring problem, since the switching of the coil enables did work.

Arms are locked, and have been for ~1 hour with no hiccups. We will leave it like this overnight to observe, and debug further tomorrow.

  13530   Thu Jan 11 09:57:17 2018   Steve   Update   DAQ   acromag at ETMX

Good going Johannes!

Quote:

This evening I transitioned the slow controls to c1auxex2.

  1. Disconnected satellite box
  2. Turned off c1auxex
  3. Disconnected DIN cables from backplane connectors
  4. Attached purple adapter boards
  5. Labeled DSub cables for use
  6. Connected DSub cables to adapter boards and chassis
  7. Initiated modbus IOC on c1auxex2

Gautam and I then proceeded to test basic functionality

  1. Pitch bias sliders move pitch, yaw sliders move yaw. ✓
  2. Coil enable and monitoring channels work. ✓
  3. Watchdog seems to work. ✓ We set the threshold for tripping low, excited the optic, and the watchdog didn't disappoint and triggered.
  4. All channels initialize with "0" upon machine/server script restart. This means the watchdog happens to be OFF, which is good. ✓ It would be great if we could also initialize PIT and YAW to retain their values from before, to avoid kicking the optic. This is not straightforward with EPICS records, but there must be a way.
  5. We got the local damping going. ✓
  6. There is some problem with the routing of the fast BIO channels through the new chassis: the ANALOG de-whitening filter seems to always be engaged, despite our toggling the software BIO bits. ✗ Something must be wrongly wired, as we confirmed by returning only the FAST BIO wiring to the pre-acromag state (while everything else is now controlled by acromag), after which the problem went away. Or some electrical connection is not made (I had to use gender changers on these connectors due to lack of proper cabling).
  7. The switches for the QPD gain stages did not work. ✗ I suspect a wiring problem, since the switching of the coil enables did work.

Arms are locked, and have been for ~1 hour with no hiccups. We will leave it like this overnight to observe, and debug further tomorrow.

 

Attachment 1: Acromg_in_action.png
  13535   Thu Jan 11 20:59:41 2018   gautam   Update   DAQ   etmx slow daq chassis

Some suggestions of checks to run, based on the rightmost column in the wiring diagram here - I guess some of these have been done already, just noting them here so that results can be posted.

  1. Oplev quadrant slow readouts should match their fast DAQ counterparts.
  2. Confirm that EX Transmon QPD whitening/gain switching are working as expected, and that quadrant spectra have the correct shape.
  3. Watchdog tripping under different conditions.
  4. Coil driver slow readbacks make sense - we should also confirm which of the slow readbacks we are monitoring (there are multiple on the SOS coil driver board) and update the MEDM screen accordingly.
  5. Confirm that shadow sensor PD whitening is working by looking at spectra.
  6. Confirm de-whitening switching capability - both to engage and disengage - maybe the procedure here can be repeated.
  7. Monitor DC alignment of ETMX - we've seen the optic wander around (as judged by the Oplev QPD spot position) while sitting in the control room, would be useful to rule out that this is because of the DC bias voltage stability (it probably isn't).
  8. Confirm that burt snapshot recording is working as expected - this is not just for c1auxex, but for all channels, since, as Johannes pointed out, the 2018 directory was totally missing and hence no snapshots were being made.
  9. Confirm that systemd restarts IOC processes when the machine currently called c1auxex2 gets restarted for whatever reason.

 

  13537   Fri Jan 12 10:02:05 2018   johannes   Update   DAQ   etmx slow daq chassis
Quote:

There is some problem with the routing of the fast BIO channels through the new chassis: the ANALOG de-whitening filter seems to always be engaged, despite our toggling the software BIO bits. ✗ Something must be wrongly wired, as we confirmed by returning only the FAST BIO wiring to the pre-acromag state (while everything else is now controlled by acromag), after which the problem went away. Or some electrical connection is not made (I had to use gender changers on these connectors due to lack of proper cabling).

The switches for the QPD gain stages did not work. ✗ I suspect a wiring problem, since the switching of the coil enables did work.

Both issues have been fixed. In each case there were two separate causes that prevented things from working.

The QPD gain stage switch software channels were assigned to the wrong physical pins of the Acromag, and additionally their DSub cable was swapped with a different one.

The BIO switching had its signal and ground wires swapped on ALL connections, and part of it was also suffering from the cable mix-up.

With both issues fixed, all backplane signals are now routed through the Acromag chassis.

 

Gautam and I did notice that occasionally ETMX alignment will start drifting as evident from the OpLev. I want to set up a diagnostic channel to see if the DAC voltages coming from the Acromag are responsible for this.

  13543   Fri Jan 12 19:15:34 2018   johannes   Update   DAQ   etmx slow daq chassis

Steve and I removed c1auxex from 1X9 today to make space for the DAQ chassis. Steve installed rails for mounting. To install the box I had to remove all cabling, for which I used the usual precautions (disconnect satellite box etc.)

On reconnect, c1auxex2 didn't initialize the physical EPICS channels (the 'actual' Acromag channels); apparently it had trouble communicating. A reboot fixed this. It's possible that this is because of the direct cable connection (without a network switch) that exists between the Acromags and c1auxex2. The EPICS server was started automatically on reboot.

Currently the channel defaults need to be loaded manually with burt after every EPICS server script restart. I'm looking for a good way to automate this, but the only compiled burt binaries for x86 (that we could in principle run on c1auxex2 itself) on the martian network are from EPICS version 3.14.10 and throw a missing shared object error. It could be that simply some path variable is missing.

The burt binaries are not distributed by the lscsoft or cdssoft packages, so alternatively we would need to compile it ourselves for x86 or get it working with the older epics version.

  13553   Wed Jan 17 14:32:51 2018   gautam   Update   DAQ   Acromag checks
  1. I take back what I said about the OSEM PD mon at the meeting - there does seem to be some overall calibration factor (Attachment #1) that has scaled the OSEM PD readback channels, by a factor of (20000/2^15), which Johannes informs me is some strange feature of the ADC, which he will explain in a subsequent post.
  2. The coil readback fields on the MEDM screen have a "30Hz HPF" text field below them - I believe this is misleading. Judging by the schematic, what we are monitoring on the backplane (which is what these channels read back from) is the voltage to the coil with a gain of 0.5. We can reconfirm by checking the ETMX coil driver board, after which we should remove the misleading label on the MEDM screens.
Quote:

Some suggestions of checks to run, based on the rightmost column in the wiring diagram here - I guess some of these have been done already, just noting them here so that results can be posted.

  1. Oplev quadrant slow readouts should match their fast DAQ counterparts.
  2. Confirm that EX Transmon QPD whitening/gain switching are working as expected, and that quadrant spectra have the correct shape.
  3. Watchdog tripping under different conditions.
  4. Coil driver slow readbacks make sense - we should also confirm which of the slow readbacks we are monitoring (there are multiple on the SOS coil driver board) and update the MEDM screen accordingly.
  5. Confirm that shadow sensor PD whitening is working by looking at spectra.
  6. Confirm de-whitening switching capability - both to engage and disengage - maybe the procedure here can be repeated.
  7. Monitor DC alignment of ETMX - we've seen the optic wander around (as judged by the Oplev QPD spot position) while sitting in the control room, would be useful to rule out that this is because of the DC bias voltage stability (it probably isn't).
  8. Confirm that burt snapshot recording is working as expected - this is not just for c1auxex, but for all channels, since, as Johannes pointed out, the 2018 directory was totally missing and hence no snapshots were being made.
  9. Confirm that systemd restarts IOC processes when the machine currently called c1auxex2 gets restarted for whatever reason.

 

 

 

Attachment 1: OSEMPDmon_Acro.png
  13554   Wed Jan 17 22:44:14 2018   johannes   Update   DAQ   Acromag checks

This happened because there are multiple ways to scale the raw value of an EPICS channel to the desired output range. In the CryoLab I was using one way, but the EPICS records I copied from c1auxex were doing it differently. The relevant fields are basically these:

DTYP  Data type
LINR  Linearization ("NO CONVERSION" vs "LINEAR")
RVAL  Raw value
EGUF  Engineering units full scale
EGUL  Engineering units low
ASLO  Manual scaling factor
AOFF  Manual offset
VAL   Value

If the "LINR" field is set to "LINEAR", the fields EGUF and EGUL are used to convert the raw value to the channel value VAL. To use them, one needs to enter the voltages that return the maximum and minimum values expected for the given data type. It used to be +10V and -10V, respectively, and was copied that way but that doesn't work with the data type required for the Acromag units. For -some- reason, while the the ADC range is -10V to +10V, this corresponds to values -20000 to +20000, while for the DAC channels it's -30000 to +30000. I had observed this before when setting up the DAQ in the CryoLab, but there we were using "NO CONVERSION", which skips the EGUF and EGUL fields, and used the ASLO and AOFF for manual scaling to get it right. When I mixed the records from there with the old ones from c1auxex this got lost in translation. Gautam and I confirmed by eye that this indeed explains the different levels well. This means that the VMon channelsfor the coils are also showing the wrong voltages, which will be corrected, but the readback still definitely works and confirms that the enable switches do their job.

Quote:
  1. I take back what I said about the OSEM PD mon at the meeting - there does seem to be to be some overall calibration factor (Attachment #1) that has scaled the OSEM PD readback channels, by a factor of (20000/2^15), which Johannes informs me is some strange feature of the ADC, which he will explain in a subsequent post.

 

  13565   Sun Jan 21 13:11:25 2018   johannes   Update   DAQ   Acromag checks

After some research: -the- reason for the reduced +/- 20,000 swing in raw values is a default setting that enables support for legacy devices when using the Acromag proprietary i2o peer-to-peer protocol. So this is doubly unnecessary, because a) we don't have any legacy devices at all and b) we're using pure modbus/TCP and no i2o. To change the setting I have to connect via the USB configuration utility. In addition, I want to understand the averaging feature of the Acromag units better, which is also configured via USB and lets one set a fixed number of samples to be averaged before the read-register value is updated. The documentation says that the 8 channels are multiplexed into a single ADC and that new input data is available after 10 ms for each channel, suggesting a sampling rate of 100 Hz per channel (with the multiplexing happening faster), but it is not super clear about this, so I want to test it in the cryo lab first before unleashing it onto c1auxex2.

Furthermore, the standard timing options for updating EPICS records are 10s, 5s, 2s, 1s, 0.5s, 0.2s, and 0.1s. On the previous c1auxex, the monitoring channels were set to 0.1s, but that clashes with the 16 Hz global EPICS rate, resulting in partial double-sampling. One can manually provide the option 0.0625s for a 16 Hz update rate. I will test this, and how it deals with the averaging, in the cryolab too.
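
A quick sanity check on those numbers (assuming the 10 ms figure really is the per-channel refresh after multiplexing):

print(1.0 / 16)      # 0.0625 s, the manual SCAN period for a true 16 Hz update
print(10 * 10e-3)    # 0.1 s: 10-sample averaging at ~100 Hz refreshes no slower than a 0.1 s SCAN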

  13576   Wed Jan 24 18:12:31 2018   johannes   Update   DAQ   ETMX auxiliary DAQ work

I replaced the two remaining D-Sub M/M cables that still had gender-changers with M/F cables today, completing the mechanical and wiring work on the ETMX rack. All backplane adapter boards were secured to a cross-strut of the crate using zip ties. This was necessary because the adapter boards don't fit the crate with their panels attached (the ETMX eurocrate is the only one with slightly different dimensions from all the others), and so we can't mount them to the strut using the panels. This won't be an issue on any of the other crates.

   

   

In other news:

I disabled the legacy support in the three Acromag ADC units and set the input averaging to 10 samples via the USB configuration utility. The documentation is unfortunately a little sparse about what this actually means. The manual states that "fresh input data is available to the network every 10ms", so the sampling rate is at least 100 Hz. Since the IOC updates its channels every 0.1 seconds, I assume that an averaging value of 10 to reduce the input noise is safe. The maximum value the configuration tool permits is 200. I tried this using the CryoLab DAQ: I set all input channels to 200 and used StripTool to look at the time series of a slow oscillation (0.1 Hz) with a large amplitude (16 Vpp), looking for missed data points that would indicate overly long wait times for channel updates. There was no such qualitative difference between 1 sample, 10 samples, and 200 samples, so even pushing the averaging value to the max seemed okay. I went with the conservative value of 10 for the ETMX DAQ, but we can likely increase this if noise on the slow inputs becomes an issue.

The input scaling of the ADC channels has been corrected. I changed the conversion method in the EPICS records from manual scaling using the ASLO and AOFF fields to using engineering units via EGUF and EGUL. This required a little attention. The Acromags scale the dynamic input range of +/- 10V to +/- 30,000 raw value, but the EPICS IOC interprets the data type as ranging from -32768 to +32767, so the EGUF and EGUL fields must be set to +10.923 and -10.923 to achieve proper scaling. This has been done for all ADC and DAC channel records. I also changed the SCAN field on all ADC channels to 0.1 seconds.
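
The +/-10.923 figure is just the +/-10 V range stretched over the full 16-bit span, since the Acromag only uses +/-30,000 of it (a quick check, assuming the IOC maps the full signed 16-bit range onto EGUL..EGUF):

print(10.0 * 32768 / 30000)   # -> 10.9227, the EGUF/EGUL magnitude quoted above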

  13578   Wed Jan 24 19:17:06 2018   johannes   Update   DAQ   c1auxex2 startup behavior

I compiled the burt binaries on c1auxex2, which took a little fiddling with dependencies and paths but nothing too major. The complete local epics folder (/opt/epics/), which contains the base epics binaries, modbus, and burt for 32-bit linux, has been copied to the shared drive at /opt/rtapps/epics-3.15.5. They belong to the most recent stable release. This was done so we can now automatically call burt after the IOC initialization on c1auxex2 to restore the backed-up channel values.

I also copied the database definition and modbus instruction files to /cvs/cds/caltech/target/c1auxex2, from where they are now being read upon IOC initialization. This is an excerpt of the service file:

#ExecStart=/usr/bin/procServ -f -L /home/controls/modbusIOC/modbusIOC.log -p /run/modbusioc.pid 8008 /opt/epics/modules/modbus/bin/linux-x86/modbusApp /cvs/cds/caltech/target/c1auxex2/ETMXaux2.cmd   <-- Contains logging to file, see note 1)
ExecStart=/usr/bin/procServ -f -p /run/modbusioc.pid 8008 /opt/epics/modules/modbus/bin/linux-x86/modbusApp /cvs/cds/caltech/target/c1auxex2/ETMXaux2.cmd <-- Initializes the EPICS IOC with Modbus support
ExecStop=/bin/kill -9 ` cat /run/modbusioc.pid` <-- Kills the detached process by its process ID
ExecStartPost=/bin/bash -c "/opt/epics/extensions/bin/linux-x86/burtwb -f /opt/rtcds/caltech/c1/burt/autoburt/latest/c1auxex.snap" <-- Restores general channel values
ExecStartPost=/bin/bash -c "/opt/epics/extensions/bin/linux-x86/burtwb -f /opt/rtcds/caltech/c1/medm/MISC/ifoalign/burt/ETMX.snap" <-- Restores PIT and YAW values from align MEDM screen
ExecStartPost=/bin/bash -c ". /home/controls/modbusIOC/ETMXaux2.sh" <-- Enables writing to PIT and YAW DAC channels, see note 2)

Note 1) I removed the logging to file for now because I noticed that if there are Acromag communication issues the logfile tends to grow in size VERY fast. In the cryo lab it had gotten to over 70GB just over the winter break. I don't think it's absolutely necessary to have it, and if diagnostics are needed we can easily uncomment it temporarily.

Note 2) I modified the static EPICS records of the four OSEM bias adjust channels so they won't start updating as soon as the IOC starts up (and before the channel defaults are restored by burt). This was done by setting the OMSL (output mode select) field from "closed_loop" to "supervisory". Sample record:

record(ao,"C1:SUS-ETMX_ULBiasAdj")
{
        field(DESC,"Bias Adjust for ETMX UL Coil Output")
        field(DTYP,"asynInt32")
        field(OUT, "@asynMask(C1AUXEX_XT1541A_DAC, 0, -16)MODBUS_DATA")
        field(SCAN,".1 second")
        field(OMSL,"supervisory")  <-- Used to be "closed_loop"
        field(DOL, "C1:SUS-ETMX_ULBiasSet  PP")
        field(PREC,"3")
        field(EGUF,"10.923")
        field(EGUL,"-10.923")
        field(EGU, "Volts")
        field(LINR,"LINEAR")
        field(DRVH,"10")
        field(DRVL,"-10")
        field(HOPR,"10")
        field(LOPR,"-10")
}

Now, on reboot/IOC re-initialization the physical DAC channels perform a one-time readback of the last stored value in the Acromag's register, then idle until the last StartPost statement executes the script ETMXaux2.sh, which changes their OMSL field back to "closed_loop". This causes them to start updating their output from the calc records defined in their DOL field (which have by then recovered their default values courtesy of burt). The result is a smooth transition from idling to the controlled state with no sudden or large offset changes. ✓
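
The ExecStartPost script presumably just flips those OMSL fields back; the equivalent in Python with pyepics would look something like this (a sketch; the four BiasAdj channel names are an assumption based on the UL record shown above):

from epics import caput   # pyepics
for coil in ('UL', 'UR', 'LL', 'LR'):
    # resume following the DOL link once burt has restored the calc records
    caput('C1:SUS-ETMX_%sBiasAdj.OMSL' % coil, 'closed_loop')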

  13580   Wed Jan 24 23:13:30 2018   johannes   Update   DAQ   c1auxex2 startup behavior
Quote:

The result is a smooth transition from idling to the controlled state with no sudden or large offset changes. ✓

[Gautam, Johannes]

While checking how smooth the transition is, we still noticed significant motion of ETMX by looking at the locked green laser and OpLevs. We found that this motion was not caused by interruption of the slow offset adjust, but rather by the Watchdog being re-initialized to its OFF state, which cuts the fast channels OFF. This is observed on other optics too, but not as severely. The cause is a rather large offset on the LR coil coming from the fast DAQ, which was reported as 50mV by the slow readback channel (while other readback values are <10mV). It is present even when turning the output of the CDS model OFF, but vanishes when the watchdog is triggered. This helped us trace it to an offset of the DAC output itself: it is present at the output of the AI board but vanishes when the DAC is disconnected. The actual offset is ~40mV, as opposed to other channels on the same board, which have offsets in the range 3-7mV.

While we can compensate for this offset in software, it made us wonder if the DAC channel is somehow busted, and if that's what's causing the 'wandering' of ETMX that we have been observing recently. There are two free DAC channels on the AI chassis that carries the side coil and the green temperature control signals. We could re-route the LR signal through a different DAC channel to fix this.

gautam: the 40mV offset at the AI board output gets multiplied by 3 in the dewhitening board, so there is a 120mV DC offset going to the coil (measured at the dewhitening board output with a DMM). The offset itself isn't hurting us, but the fact that it is several times larger than on the other channels led us to wonder if it could be drifting around as well. From my SOS pitch balancing forays, in my head I have the number 30mrad as being the full range of the OSEM actuation - so if the offset swings by 120mV, that's ~150urad of motion, which is quite large, and is of the order of magnitude I'm used to seeing ETMX move around by.
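
For the record, the arithmetic behind that estimate (assuming the 30 mrad corresponds to the full +/-10 V, i.e. 20 V, coil drive swing):

offset_coil  = 0.040 * 3            # 40 mV at the AI output, x3 in the dewhitening board -> 0.12 V
rad_per_volt = 30e-3 / 20.0         # assumed: 30 mrad over the full 20 V swing
print(offset_coil * rad_per_volt)   # ~1.8e-4 rad, the same order as the quoted ~150 urad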

  13590   Wed Jan 31 15:29:44 2018   johannes   Update   DAQ   PSL acromag server moved from megatron to c1auxex2

I moved the epics IOC server process for the single Acromag ADC that monitors the PSL signals from megatron to c1auxex2.

First, I disabled the legacy support on all channels as explained in elog 13565. Then I copied the files npro_config.cmd and NPRO.db from /opt/rtcds/caltech/c1/scripts/Acromag to /cvs/cds/caltech/target/c1psl2/ following the pattern of the old Motorola machines and the new c1auxex2. I had to make some edits for correct paths and expanded the epics records to the standard we're using for ETMX.

I then added a service to systemd on c1auxex2 that runs the epics IOC for the Acromag PSL channels: /etc/systemd/system/modbusPSL.service. No more tmux on megatron.

Running two IOCs on a single machine at the same time did not produce any errors and seems fine so far.

  13742   Mon Apr 9 23:28:49 2018   johannes   Configuration   DAQ   c1psl channel list

I made a list of all the physical c1psl channels to get a better idea of how many Acromags we need to eventually replace it. The 3123 unit is the one whose failure had prevented c1psl from booting, which is why it was unplugged (elog post 12852), and its channels have been inactive since. Are the 126MOPA channels used for the current Mephisto? 126 tells me it's for an old Lightwave laser, but I checked a few and found that they have non-zero, changing values, so they may have been rewired.

It also hosts some virtual channels for the ISS with root C1:PSL-ISS_ defined in iss.db and dc.db, the PSL particle counter with root C1:PEM- defined in PCount.db, and a whole lot of PSL status channels defined in pslstatus.db. Transferring these virtual channels to a different machine is almost trivial, but the serial readout of the particle counter would have to find a new home.

Long story short - we need:

Function   Type     # Channels   # Channels (no MOPA)   # Units   # Units (no MOPA)
ADC        XT1221   34           21                     5         3
DAC        XT1541   17           14                     3         2
BIO        XT1111   19           10                     2         1
 



3113 - ADC

C1:PSL-126MOPA_126PWR
C1:PSL-126MOPA_DTMP
C1:PSL-126MOPA_LTMP
C1:PSL-126MOPA_DMON
C1:PSL-126MOPA_LMON
C1:PSL-126MOPA_CURMON
C1:PSL-126MOPA_DTEC
C1:PSL-126MOPA_LTEC
C1:PSL-126MOPA_CURMON2
C1:PSL-126MOPA_HTEMP
C1:PSL-126MOPA_HTEMPSET
C1:PSL-FSS_RFPDDC
C1:PSL-FSS_LODET
C1:PSL-FSS_FAST
C1:PSL-FSS_PCDRIVE
C1:PSL-FSS_MODET
C1:PSL-FSS_VCODETPWR
C1:PSL-FSS_TIDALOUT
C1:PSL-PMC_RFPDDC
C1:PSL-PMC_LODET
C1:PSL-PMC_PZT
C1:PSL-PMC_MODET


3123 - ADC (failed)

C1:PSL-126MOPA_AMPMON
C1:PSL-126MOPA_126MON
C1:PSL-FSS_RCTRANSPD
C1:PSL-FSS_MINCOMEAS
C1:PSL-FSS_RMTEMP
C1:PSL-FSS_RCTEMP
C1:PSL-FSS_MIXERM
C1:PSL-FSS_SLOWM
C1:PSL-FSS_TIDALINPUT
C1:PSL-PMC_PMCTRANSPD
C1:PSL-PMC_PMCERR
C1:PSL-PPKTP_TEMP


4116 - DAC

C1:PSL-126MOPA_126CURADJ
C1:PSL-126MOPA_DCAMP
C1:PSL-126MOPA_DCAMP-
C1:PSL-FSS_INOFFSET
C1:PSL-FSS_MGAIN
C1:PSL-FSS_FASTGAIN
C1:PSL-FSS_PHCON
C1:PSL-FSS_RFADJ
C1:PSL-FSS_SLOWDC
C1:PSL-FSS_VCOMODLEVEL
C1:PSL-FSS_TIDAL
C1:PSL-FSS_TIDALSET
C1:PSL-PMC_GAIN
C1:PSL-PMC_INOFFSET
C1:PSL-PMC_PHCON
C1:PSL-PMC_RFADJ
C1:PSL-PMC_RAMP


XVME-210 - Binary Input

C1:PSL-126MOPA_FAULT
C1:PSL-126MOPA_INTERLOCK
C1:PSL-126MOPA_SHUTTER
C1:PSL-126MOPA_126LASE
C1:PSL-126MOPA_AMPON


XVME-220 - Binary Output

C1:PSL-126MOPA_126NE
C1:PSL-126MOPA_126STANDBY
C1:PSL-126MOPA_SHUTOPENEX
C1:PSL-126MOPA_STANDBY
C1:PSL-FSS_SW1
C1:PSL-FSS_SW2
C1:PSL-FSS_FASTSWEEP
C1:PSL-FSS_PHFLIP
C1:PSL-FSS_VCOTESTSW
C1:PSL-FSS_VCOWIDESW
C1:PSL-PMC_SW1
C1:PSL-PMC_SW2
C1:PSL-PMC_PHFLIP
C1:PSL-PMC_BLANK

  14141   Mon Aug 6 20:41:10 2018   aaron   Update   DAQ   New DAC for the OMC

Gautam and I tested out the DAC that he installed in the latter half of last week. We confirmed that at least one of the channels can successfully drive a sine wave (ch10, 1-indexed). We had to measure the output directly on the SCSI connector (breakout in the FE hard drive cabinet along the Y arm), since the SCSI breakout box (D080303) seems not to be working (wiring diagram in Gautam's elog from his SURF years).

I added some DAC channels to our c1omc model:
PZT1_PIT
PZT1_YAW
PZT2_PIT
PZT2_YAW
 
And determined that when we go to use the ADC, we will initially want the following channels (even these are probably unnecessary for the very first scans):
TRANS_PD1
TRANS_PD2
REFL_PD
DVMDC (drive voltage monitor, DC level)
DVMAC ("", AC level, only needed if we dither the length)
 
I attach a screenshot of the model, and a picture of where the whitening/dewhitening boards should go in the rack.
Attachment 1: OMCDACmdl.png
  14172   Tue Aug 21 03:09:59 2018   johannes   Omnistructure   DAQ   Panels for Acromag DAQ chassis

I expanded the previous panels to 6U height for the new DAQ chassis we're buying for the upgrade. I figure it's best if we stick to the modular design, so I'm showing a panel for 8 BNC connectors as an example. The front panel has 12 slots, the back has 10 plus power connectors, switches, and the ethernet plug.

I moved the power switch to the rear because it's a waste of space to put it in the front, and it's not like we're power cycling this thing all the time. Note that the unit only requires +24V (for general operation, +20V also does the trick, as is the situation for ETMX) and +15V (excitation field for the binary I/O modules). While these could fit into a single CONEC power connector, it's probably for the better if we don't make a version that supplies a large positive voltage where negative is expected, so I put in two CONEC plugs for +/- 15 and +/- 24.

I want to order 5-6 of these as soon as possible, so if anyone wants anything changed or sees a problem, please do tell!

Attachment 1: auxdaq_40m_6U_front.pdf
Attachment 2: auxdaq_40m_6U_rear.pdf
Attachment 3: auxdaq_40m_6U_BNC.pdf
  14295   Wed Nov 14 18:58:35 2018   aaron   Update   DAQ   New DAC for the OMC

I began moving the AA and AI chassis over to 1X1/1X2 as outlined in the elog.

The chassis were mostly filled with unused cables. There was one cable attached to the output of a QPD interface board, but there was nothing attached to the input, so it was clearly not in use and I disconnected it.

I also attach a picture of some of the SMA connectors I had to rotate to accommodate the chassis in their new locations.

Update:

The chassis are installed, and the anti-imaging chassis can be seen second from the top; the anti-aliasing chassis can be seen 7th from the top.

I need to break out the SCSI on the back of the AA chassis, because the ADC breakout board only has a DB36 adapter available; the other cables are occupied by the signals from the WFS dewhitening outputs.

Attachment 1: 6D079592-1350-4099-864B-1F61539A623F.jpeg
Attachment 2: 5868D030-0B97-43A1-BF70-B6A7F4569DFA.jpeg
  15067   Tue Dec 3 20:32:37 2019   rana   Omnistructure   DAQ   NDS2 situation

Recently, according to Gautam, the NDS2 server has been dying on Megatron ~daily or weekly. The prescription is to restart the server.

  1. I could find no instructions (that work) in the elog or wiki. We must remove the misleading entries from the wiki and update it with whatever works as of today.
  2. There is a line (which has been commented out) in the Megatron crontab which is close to the right command, but it has the wrong path.
  3. Running the command from the CRON (/home/nds2mgr/nds2-megatron/test_restart) gives several errors.
  4. when I run the init.d command which is in the script, it seems to run fine
  5. the server then takes several minutes to get itself together; i.e. just because it is running doesn't mean that you can get data. I recommend waiting 5-10 minutes.

Also, megatron is running Ubuntu 12 !! Let's decide on a day to upgrade it to a Debian 18ish....word from Rolf is that Scientific Linux is fading out everywhere, so Debian is the new operating system for all conformists.

Attachment 1: getData.py
#!/usr/bin/env python
# this function gets some data (from the 40m) and saves it as
# a .mat file for the matlabs
# Ex. python -O getData.py


from scipy.io import savemat,loadmat
import scipy.signal as sig
from astropy.time import Time
import nds2
... 116 more lines ...
Attachment 2: chanlist.txt
PEM-SEIS_BS_X_OUT_DQ
PEM-SEIS_BS_Y_OUT_DQ
PEM-SEIS_BS_Z_OUT_DQ
PEM-SEIS_EX_X_OUT_DQ
PEM-SEIS_EX_Y_OUT_DQ
PEM-SEIS_EX_Z_OUT_DQ
PEM-SEIS_EY_X_OUT_DQ
PEM-SEIS_EY_Y_OUT_DQ
PEM-SEIS_EY_Z_OUT_DQ
  15302   Mon Apr 13 16:51:49 2020   rana   Summary   DAQ   NODUS: rsyncd daemon / service set up

I just now modified the /etc/rsyncd.conf file as per Dan Kozak's instructions. The old conf file is still there with the file name appended with today's date.

I then enabled the rsync daemon to run on boot using 'enable'. I'll ask Dan to start the file transfers again and see if this works.

controls@nodus|etc> sudo systemctl start rsyncd.service
controls@nodus|etc> sudo systemctl enable rsyncd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/rsyncd.service to /usr/lib/systemd/system/rsyncd.service.
controls@nodus|etc> sudo systemctl status rsyncd.service
● rsyncd.service - fast remote file copy program daemon
   Loaded: loaded (/usr/lib/systemd/system/rsyncd.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-04-13 16:49:12 PDT; 1min 28s ago
 Main PID: 4950 (rsync)
   CGroup: /system.slice/rsyncd.service
           └─4950 /usr/bin/rsync --daemon --no-detach

Apr 13 16:49:12 nodus.martian.113.168.192.in-addr.arpa systemd[1]: Started fast remote file copy program daemon.
Apr 13 16:49:12 nodus.martian.113.168.192.in-addr.arpa systemd[1]: Starting fast remote file copy program daemon...

  15560   Sun Sep 6 13:15:44 2020   Jon   Update   DAQ   UPS for framebuilder

Now that the old APC Smart-UPS 2200 is no longer in use by the vacuum system, I looked into whether it can be repurposed for the framebuilder machine. Yes, it can. The max power consumption of the framebuilder (a SunFire X4600) is 1.137kW. With fresh batteries, I estimate this UPS can power the framebuilder for >10 min. and possibly as long as 30 min., depending on the exact load.

@Chub/Jordan, this UPS is ready to be moved to rack 1X6/1X7. It just has to be disconnected from the wall outlet. All of the equipment it was previously powering has been moved to the new UPS. I have ordered a replacement battery (APC #RBC43) which is scheduled to arrive 9/09-11.

  16853   Sat May 14 08:36:03 2022   Chris   Update   DAQ   DAQ troubleshooting

I heard a rumor about a DAQ problem at the 40m.

To investigate, I tried retrieving data from some channels under C1:SUS-AS1 on the c1sus2 front end. DQ channels worked fine, testpoint channels did not. This pointed to an issue involving the communication with awgtpman. However, AWG excitations did work. So the issue seemed to be specific to the communication between daqd and awgtpman.

daqd logs were complaining of an error in the tpRequest function: error code -3/couldn't create test point handle. (Confusingly, part of the error message was buffered somewhere, and would only print after a subsequent connection to daqd was made.) This message signifies some kind of failure in setting up the RPC connection to awgtpman. A further error string is available from the system to explain the cause of the failure, but daqd does not provide it. So we have to guess...

One of the reasons an RPC connection can fail is if the server name cannot be resolved. Indeed, address lookup for c1sus2 from fb1 was broken:

$ host c1sus2
Host c1sus2 not found: 3(NXDOMAIN)

In /etc/resolv.conf on fb1 there was the following line:

search martian.113.168.192.in-addr.arpa

Changing this to search martian got address lookup on fb1 working:

$ host c1sus2
c1sus2.martian has address 192.168.113.87

But testpoints still could not be retrieved from c1sus2, even after a daqd restart.

In /etc/hosts on fb1 I found the following:

192.168.113.92  c1sus2

Changing the hardcoded address to the value returned by the nameserver (192.168.113.87) fixed the problem.

It might be even better to remove the hardcoded addresses of front ends from the hosts file, letting DNS function as the sole source of truth. But a full system restart should be performed after such a change, to ensure nothing else is broken by it. I leave that for another time.
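
For future debugging, a quick way to compare what the full resolver stack and the nameserver each return (a sketch; socket.gethostbyname consults /etc/hosts first via NSS, while the host call goes straight to DNS):

import socket, subprocess
print(socket.gethostbyname('c1sus2'))    # NSS path: /etc/hosts, then DNS
print(subprocess.run(['host', 'c1sus2'], capture_output=True, text=True).stdout)   # DNS only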

  16854   Mon May 16 10:49:01 2022   Anchal   Update   DAQ   DAQ troubleshooting

[Anchal, Paco, JC]

Thanks, Chris, for the fix. We are able to access the testpoints now, but we started facing another issue this morning; we're not sure how it is related to what you did.

  • The C1:LSC-TRX_OUT and C1:LSC-TRY_OUT channels are stuck to zero value.
  • These were the channels we used until last friday to align the interferometer.
  • These channels are routed through the c1rfm FE model (Reflected Memory model is the name, I think). These channels carry the IR transmission photodiode monitors at the two ends of the interferometer, where they are first logged into the local FEs as C1:SUS-ETMX_TRX and C1:SUS-ETMY_TRY .
  • These channels are then fed to C1:SCX-RFM_TRX -> C1:RFM_TRX -> C1:RFM-LSC_TRX -> C1:LSC-TRX and similar for Y side.
  • We are able to see channels in the end FE filtermodule testpoints (C1:SUS-ETMX_TRX_OUT & C1:SUS-ETMY_TRY_OUT)
  • However, we are unable to see the same signal in c1rfm filter module testpoints like C1:RFM_TRX_IN1, C1:RFM_TRY_IN1 etc
  • There is an IPC error shown in CDS FE status screen for c1rfm in c1sus. But we remember seeing this red for a long time and have been ignoring it so far as everything was working regardless.

The steps we have tried to fix this are:

  • Restart all the FE models in c1lsc, c1sus, and c1ioo (without restarting the computers themselves), and then burt restore.
  • Restart all the FE models in c1iscex and c1iscey (only the c1iscey computer was restarted), and then burt restore.

The above steps did not fix the issue. Since we have the testpoints (C1:SUS-ETMX_TRX_OUT & C1:SUS-ETMY_TRY_OUT) for now to monitor the transmission levels, we are going ahead with our upgrade work without resolving this issue. Please let us know if you have any insights.
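
To localize where the signal drops out along that chain, one option is to walk the filter-module EPICS readbacks with pyepics (a sketch; the _OUTMON/_INMON suffixes and the exact channel names are assumptions based on the list above):

from epics import caget
chain = ['C1:SUS-ETMX_TRX_OUTMON', 'C1:SCX-RFM_TRX_OUTMON',
         'C1:RFM_TRX_INMON', 'C1:RFM-LSC_TRX_OUTMON', 'C1:LSC-TRX_OUTMON']
for ch in chain:
    print(ch, caget(ch, timeout=2))   # a None or a stuck 0 shows where the chain breaks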

  16855   Mon May 16 12:59:27 2022   Chris   Update   DAQ   DAQ troubleshooting

It looks like the RFM problem started a little after 2am on Saturday morning (attachment 1). It’s subsequent to what I did, but during a time of no apparent activity, either by me or others.

The pattern of errors on c1rfm (attachment 2) looks very much like this one previously reported by Gautam (errors on all IRFM0 ipcs). Maybe the fix described in Koji’s followup will work again (involving hard reboots).

Attachment 1: timeseries.png
Attachment 2: err.png
  32   Tue Oct 30 19:32:13 2007   tobin   Problem Fixed   Computers   conlogger restarted
I noticed that the conlogger wasn't running. It looks like it hasn't been running since October 11th. I modified the restart_conlogger script to insist that it run on op340m instead of op440m, and then ran it on op340m.
  46   Thu Nov 1 16:34:47 2007   Andrey Rodionov   Summary   Computers   Limitation on attachment size of E-LOG

I discovered yesterday when I was attaching photos that it is NOT possible to attach files whose size is 10Mb or more. Therefore, 10Mb or something very close to that value is the limit.
  71   Tue Nov 6 16:48:54 2007   tobin   Configuration   Computers   scopes on the net
I configured our two 100 MHz Tektronix 3014B scopes with IP addresses: 131.215.113.24 (scope0) and 113.215.113.25 (scope1). Let the scripting commence!

There appears to be a Matlab Instrument Control Toolbox driver for this scope.
  72   Tue Nov 6 18:18:15 2007   tobin   Configuration   Computers   I broke (and fixed) conlogger
It turns out that not only restart_conlogger, but also conlogger itself checks to see that it is running on the right machine. I had changed the restart_conlogger script to run on op340m, but it would actually silently fail (because we cleverly redirect conlogger's output to /dev/null). Anyway, it's fixed now: I edited the conlogger source code where the hostname is hardcoded (blech!) and recompiled.

On another note, Andrey fixed the "su" command on op440m. It turns out that the GNU version, in /usr/local/bin, doesn't work, and was masking the (working) sun version in /bin. Andrey renamed the offending version as "su.backup".
  73   Tue Nov 6 23:45:38 2007   tobin   Configuration   Computers   tektronix scripts!
I cooked up a little script to fetch the data from the networked Tektronix scope. Example usage:

linux2:scripts>tektronix/tek-dump scope0 ch1 foo.csv

"scope0" is the hostname of the scope, "ch1" is the channel you want to dump, and "foo.csv" is the file you want to dump it to. The script is written in Python since Python's libhttp gave me less trouble than Perl's HTTP::Lite.
  77   Wed Nov 7 10:55:21 2007   ajw   Configuration   Computers   backup script restarted
Following the reboot of computers on 10/31/07, the backup script required restart (which unfortunately "can't" be automated because a password needs to be typed in). I restarted, following the instructions in /cvs/cds/caltech/scripts/backup/000README.txt and verified that it more-or-less worked last night (the rsync sometimes times out; it gets through after a couple of days of trying.)
  92   Sun Nov 11 21:21:04 2007   rana   HowTo   Computers   New DV
To use the new ligoDV (previously GEO DV) to look at 40m data, open up a matlab, set up for mDV as usual,
and then from the /cvs/cds/caltech/apps/ligoDV/ directory, type 'ligoDV'.

Then select which NDS server you want to look at and then start clicking to get some plots.
Attachment 1: Screenshot-1.png
  106   Thu Nov 15 18:06:06 2007   tobin   Update   Computers   alex: linux1 root file system hard disk's dying
I just noticed that Alex made an entry in the old ilog yesterday, saying: "Looks like linux1 root filesystem hard drive is about to die. The system log is full of drive seek errors. We should get a replacement IDE drive as soon as possible or else the unthinkable could happen. 40 Gb IDE hard drive will be sufficient."
  107   Thu Nov 15 18:23:55 2007   John   HowTo   Computers   Swap CAPS and CTRL on a Windows 2000/XP machine
I've swapped ctrl and caps on the four control room Windows machines. Right ctrl is unchanged.



Start menu->Run "regedit"

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Keyboard Layout

Click on the KeyboardLayout entry.

Edit->New Binary Value, and name it Scancode Map.

Then select the new Scancode Map entry.

Edit menu->Modify Binary Data.

In the dialog box enter the following data:

0000: 00 00 00 00 00 00 00 00
0008: 03 00 00 00 3A 00 1D 00
0010: 1D 00 3A 00 00 00 00 00

Exit the Registry Editor. You need to log off and then back on in XP (and restart in Windows 2000) for the changes to take effect.