40m Log, Page 59 of 335
ID   Date   Author   Type   Category   Subject
  16311   Thu Sep 2 20:47:19 2021   Koji   Update   CDS   Chiara DHCP restarted

[Paco, Tega, Koji]

Once chiara's DHCP was back, things got much more straightforward.
c1iscex and c1iscey were rebooted and the IOPs were launched without any hesitation.

Paco ran rebootC1LSC.sh, and for the first time this year the processes launched without any issue.

  16321   Mon Sep 13 14:32:25 2021   Yehonathan   Update   CDS   c1auxey assembly

So we agreed that the RTN points on the c1auxex Acromag chassis should just be grounded to the local Acromag ground, as they just need a stable reference. Normally, the RTNs are not connected to any ground, so there should be no danger of forming ground loops by doing that. It is probably best to use the common wire from the 15V power supplies since it also powers the VME crate. I took the spectra of the ETMX OSEMs (attachment) for reference and am proceeding with the grounding work.

 

Attachment 1: ETMX_OSEMS_Noise.png
  16325   Tue Sep 14 15:57:05 2021   jamie   Frogs   CDS   fb1 /var full after reboot, caused all sorts of problems

/var on fb1 filled up today, which caused all sorts of CDS issues.  I found out about the problem by reading the logs of the services that were having trouble running, in which they complained about not being able to write to disk.  I looked at the filesystem status with 'df' and noticed that /var was full.  /var is where applications write logs and temporary data, so a full /var will always cause problems.

I tracked the issue down to multiple multi-gigabyte log files: /var/log/messages and /var/log/messages.1.  They were full of lines like this one:

Aug 29 06:25:21 fb1 kernel: l called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1 [the same "gpstime iotcl called cmd = 1" message repeats several hundred more times on this single line]

Seems like something related to the gpstime kernel module?

Anyway, I deleted the log files for now, which cleared up the space on /var.  Things should be back to normal now, until the logs fill up again...
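
One quick way to spot this kind of runaway log before the partition fills is to list the largest files under /var/log. The following is a minimal sketch, not what was actually run here; the function name largest_files and its defaults are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list the N largest regular files under a directory,
# biggest first, as "<size in bytes><TAB><path>".
largest_files() {
    dir="${1:-/var/log}"   # directory to scan (default is an assumption)
    count="${2:-5}"        # how many entries to print
    find "$dir" -type f 2>/dev/null | while IFS= read -r f; do
        # wc -c reports the size in bytes; unreadable files are counted as 0
        size=$( { wc -c < "$f"; } 2>/dev/null ) || size=0
        printf '%s\t%s\n' "$size" "$f"
    done | sort -rn | head -n "$count"
}
```

For example, `df -h /var` followed by `largest_files /var/log 5` would show how full the partition is and which files are responsible. File names containing newlines are not handled by this sketch.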

  16327   Tue Sep 14 16:44:54 2021   jamie   Frogs   CDS   fb1 /var full after reboot, caused all sorts of problems

Jonathan Hanks pointed me to this fix to the gpstime kernel module that was unfortunately put in after the 3.4 release that we're currently using:

https://git.ligo.org/cds/advligorts/-/commit/6f6d6e2eb1d3355d0cbfe9fe31ea3b59af1e7348

I hacked the source in place (/usr/src/gpstime-3.4/drv/gpstime/gpstime.c) to get the fix, and then rebuilt the kernel module with dkms:

sudo dkms uninstall gpstime/3.4
sudo dkms install gpstime/3.4

I then stopped daqd_dc, unloaded gpstime, reloaded it, restarted daqd_dc.  The messages are no longer showing up in /var/log/messages, so I think we're ok for the moment.
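
The reload sequence described above could be scripted roughly as follows. This is a sketch under assumptions: the entry does not say which commands were used to unload and reload the module, so rmmod and modprobe are guesses, and the run helper and DRY_RUN variable are invented here so the steps can be previewed without touching the system.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN is set, else execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumed sequence for picking up a rebuilt gpstime module.
reload_gpstime() {
    run sudo systemctl stop daqd_dc &&   # daqd_dc is stopped before the unload
    run sudo rmmod gpstime &&            # unload the old module (assumed command)
    run sudo modprobe gpstime &&         # load the rebuilt module (assumed command)
    run sudo systemctl start daqd_dc
}
```

Running `DRY_RUN=1 reload_gpstime` only prints the four steps; without DRY_RUN the chain stops at the first command that fails.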

NOTE: the fix will be undone if we for some reason reinstall the advligorts-gpstime-dkms package.  There shouldn't be a need to do that, but we should be aware.  I'm discussing with Jonathan if we want to try to push out a new debian package to fix this issue...

  16330   Tue Sep 14 17:22:21 2021   Anchal   Update   CDS   Added temp sensor channels to DAQ list

[Tega, Paco, Anchal]

We attempted to reboot fb1 daqd today to get the new temperature sensor channels recording. However, the FE models got stuck, apparently for the reasons explained in 40m/16325. Jamie cleared the logs in /var/log on fb1 so that the FEs could reboot. After this work we were able to reboot the FE machines successfully and get the models running too. During the day, the FE machines were shut down and brought back up manually, a couple of times in the case of the c1iscex machine. The only changes on fb1 are in /opt/rtcds/caltech/c1/chans/daq/C0EDCU.ini, where the new channels were added, and some hacking done by Jamie in the gpstime module (see 40m/16327).

  16332   Wed Sep 15 11:27:50 2021   Yehonathan   Update   CDS   c1auxey assembly

{Yehonathan, Paco}

We turned off the ETMX watchdogs and OpLevs. We went to the X end and shut down the Acromag chassis. We labeled the chassis feedthroughs and disconnected all the cables from it.

We took it out and tied the common wire of the power supplies (the commons of the 20V and 15V power supplies were shorted so there is no difference which we connect) to the RTNs of the analog inputs.

The chassis was put back in place. All the cables were reconnected and the power was turned back on.

We rebooted c1auxex and the channels went back online. We turned on the watchdogs and watched the ETMX motion get damped. We turned on the OpLev. We waited until the beam position got centered on the ETMX.

The attachments show a comparison between the OSEM spectra before and after the grounding work. There seems to be no change.

We were able to lock the arms with no issues.

 

Attachment 1: c1auxex_Grounding_OSEM_comparison1.pdf
Attachment 2: c1auxex_Grounding_OSEM_comparison2.pdf
  16351   Tue Sep 21 11:09:34 2021   Anchal   Summary   CDS   XARM YARM UGF Servo and Oscillators added

I've updated the c1LSC simulink model to add the so-called UGF servos in the XARM and YARM single arm loops as well. These were previously present in the DARM, CARM, MICH and PRCL loops only. The UGF servos themselves serve a larger purpose, but we won't be using that. What we have access to now is the ability to add an oscillator in the single arm loop and get the realtime demodulated signal before and after the addition of the oscillator. This would allow us to get the open loop transfer function and its uncertainty at particular frequencies (set by the oscillator) and would allow us to create a noise budget on the calibration error of these transfer functions.
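
To make the measurement concrete, here is a sketch of how the two demodulated signals give the loop gain. It assumes the excitation x is summed into the loop at a single point and that G denotes the complete open-loop gain with its sign included; the symbols a, b, x, A and B are introduced here for illustration and are not names from the model.

```latex
b = a + x, \qquad a = G\,b
\quad\Longrightarrow\quad
G(f_0) = \frac{A(f_0)}{B(f_0)}
```

Here a and b are the signals just before and just after the summing point, and A(f_0), B(f_0) are their complex (I + iQ) demodulated amplitudes at the oscillator frequency f_0. If the loop gain is instead defined with an explicit negative-feedback sign, a = -G b, the ratio simply changes sign.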

 

The new model has been committed locally in the 40m/RTCDSmodels git repo. I do not have rights to push to the remote in git.ligo. The model builds, installs and starts correctly.

  16354   Wed Sep 22 12:40:04 2021   Anchal   Summary   CDS   XARM YARM UGF Servo and Oscillators shifted to OAF

To reduce the burden on c1lsc, I've shifted the added UGF block to the c1oaf model. c1lsc had to be modified to allow the addition of an oscillator in the XARM and YARM control loops and to send the test points before and after the addition to c1oaf through shared memory IPC, so that the realtime demodulation is done in the c1oaf model.

The new models built and installed successfully and I've been able to recover both single arm locks after restarting the computers.

 

  16365   Wed Sep 29 17:10:09 2021   Anchal   Summary   CDS   c1teststand problems summary

[Anchal, Ian]

We went and collected some information for the overlords to fix the c1teststand DAQ network issue.


  • From c1teststand, the c1bhd and c1sus2 computers were not accessible through ssh (No route to host), so we restarted both computers (the I/O chassis were ON).
  • After the computers restarted, we were able to ssh into c1bhd and c1sus2, and we ran rtcds start c1x06 and rtcds start c1x07.
  • The first page in attachment shows the screenshot of GDS_TP screens of the IOP models after this step.
  • Then we started the user models by running rtcds start c1bhd and rtcds start c1su2.
  • The second page shows the screenshot of GDS_TP screens. You can notice that DAQ status is red in all the screens and the DC statuses are blank.
  • So we checked whether the daqd_ services were running on the fb computer. They were not. So we started them all by sudo systemctl start daqd_*.
  • The third page shows the status of all services after this step. The daqd_dc.service remained in a failed state.
  • open-mx_stream.service was not even loaded on fb. We started it by running sudo systemctl start open-mx_stream.service.
  • The fourth page shows the status of this service. It started without any errors.
  • However, when we went to check the status of mx_stream.service on c1bhd and c1sus2, they were not loaded, and when we tried to start them, they showed a failed state and kept trying to start every 3 seconds without success (see pages 5 and 6).
  • Finally, we also took a screenshot of timedatectl command output on the three computers fb, c1bhd, and c1sus2 to show that their times were not synced at all.
  • The ntp service is running on fb but it probably does not have access to any of the servers it is following.
  • The timesyncd on c1bhd and c1sus2 (FE machines) is also running but showing status 'Idle', which suggests they are unable to find the ntp signal from fb.
  • I believe this issue is similar to what Jamie fixed on fb1 on the martian network in 40m/16302. Since the fb on the c1teststand network was cloned before this fix, it might have this dysfunctional ntp as well.

We will try to get internet access to c1teststand soon. Meanwhile, someone with more experience and knowledge should look into this situation and try to fix it. We need to test the c1teststand within a few weeks.

Attachment 1: c1teststand_issues_summary.pdf
  16367   Thu Sep 30 14:09:37 2021   Anchal   Summary   CDS   New way to ssh into c1teststand

Late elog, original time Wed Sep 29 14:09:59 2021

We opened a new port (22220) in the router to the martian subnetwork which is forwarded to port 22 on c1teststand (192.168.113.245) allowing direct ssh access to c1teststand computer from the outside world using:

                                                                       

                                                                                    
 

Check out this wiki page for unredacted info.

  16372   Mon Oct 4 11:05:44 2021   Anchal   Summary   CDS   c1teststand problems summary

[Anchal, Paco]

We tried to fix the ntp synchronization in c1teststand today by repeating the steps listed in 40m/16302. Even though the cloned fb1 now has the exact same package version, conf & service files, and status, the FE machines (c1bhd and c1sus2) fail to sync to the time. timedatectl shows the same status 'Idle'. We also dug a bit deeper into the error messages of daqd_dc on the cloned fb1 and mx_stream on the FE machines and have some error messages to report here.


Attempt on fixing the ntp

  • We copied the ntp package version 1:4.2.6 deb file from /var/cache/apt/archives/ntp_1%3a4.2.6.p5+dfsg-7+deb8u3_amd64.deb on the martian fb1 to the cloned fb1 and ran:
    controls@fb1:~ 0$ sudo dpkg -i ntp_1%3a4.2.6.p5+dfsg-7+deb8u3_amd64.deb
  • We got error messages about the missing dependencies libopts25 and libssl1.1. We downloaded oldoldstable jessie versions of these packages from here and here. We ensured that these versions are higher than the versions required by ntp. We installed them with:
    controls@fb1:~ 0$ sudo dpkg -i libopts25_5.18.12-3_amd64.deb
    controls@fb1:~ 0$ sudo dpkg -i libssl1.1_1.1.0l-1~deb9u4_amd64.deb
  • Then we installed the ntp package as described above. It asked us if we want to keep the configuration file, we pressed Y.
  • However, we decided to make the configuration and service files on the cloned fb1 exactly the same as on the martian fb1. We copied /etc/ntp.conf and /etc/systemd/system/ntp.service from the martian fb1 to the same locations on the cloned fb1. Then we enabled ntp, reloaded the daemon, and restarted the ntp service:
    controls@fb1:~ 0$ sudo systemctl enable ntp
    controls@fb1:~ 0$ sudo systemctl daemon-reload
    controls@fb1:~ 0$ sudo systemctl restart ntp
  • But of course, since fb1 doesn't have internet access, we got some errors in the status of ntp.service:
    controls@fb1:~ 0$ sudo systemctl status ntp
    ● ntp.service - NTP daemon (custom service)
       Loaded: loaded (/etc/systemd/system/ntp.service; enabled)
       Active: active (running) since Mon 2021-10-04 17:12:58 UTC; 1h 15min ago
     Main PID: 26807 (code=exited, status=0/SUCCESS)
       CGroup: /system.slice/ntp.service
               ├─30408 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 105:107
               └─30525 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 105:107
    
    Oct 04 17:48:42 fb1 ntpd_intres[30525]: host name not found: 2.debian.pool.ntp.org
    Oct 04 17:48:52 fb1 ntpd_intres[30525]: host name not found: 3.debian.pool.ntp.org
    Oct 04 18:05:05 fb1 ntpd_intres[30525]: host name not found: 0.debian.pool.ntp.org
    Oct 04 18:05:15 fb1 ntpd_intres[30525]: host name not found: 1.debian.pool.ntp.org
    Oct 04 18:05:25 fb1 ntpd_intres[30525]: host name not found: 2.debian.pool.ntp.org
    Oct 04 18:05:35 fb1 ntpd_intres[30525]: host name not found: 3.debian.pool.ntp.org
    Oct 04 18:21:48 fb1 ntpd_intres[30525]: host name not found: 0.debian.pool.ntp.org
    Oct 04 18:21:58 fb1 ntpd_intres[30525]: host name not found: 1.debian.pool.ntp.org
    Oct 04 18:22:08 fb1 ntpd_intres[30525]: host name not found: 2.debian.pool.ntp.org
    Oct 04 18:22:18 fb1 ntpd_intres[30525]: host name not found: 3.debian.pool.ntp.org
  • But the ntpq command gives the same output as the ntpq command on the martian fb1 (except for the source servers), indicating that the broadcasting is happening in the same manner:
    controls@fb1:~ 0$ ntpq -p
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
     192.168.123.255 .BCST.          16 u    -   64    0    0.000    0.000   0.000
    
  • On the FE machines side though, the systemd-timesyncd are still unable to read the time signal from fb1 and show the status as idle:
    controls@c1bhd:~ 3$ timedatectl
          Local time: Mon 2021-10-04 18:34:38 UTC
      Universal time: Mon 2021-10-04 18:34:38 UTC
            RTC time: Mon 2021-10-04 18:34:38
           Time zone: Etc/UTC (UTC, +0000)
         NTP enabled: yes
    NTP synchronized: no
     RTC in local TZ: no
          DST active: n/a
    controls@c1bhd:~ 0$ systemctl status systemd-timesyncd -l
    ● systemd-timesyncd.service - Network Time Synchronization
       Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled)
       Active: active (running) since Mon 2021-10-04 17:21:29 UTC; 1h 13min ago
         Docs: man:systemd-timesyncd.service(8)
     Main PID: 244 (systemd-timesyn)
       Status: "Idle."
       CGroup: /system.slice/systemd-timesyncd.service
               └─244 /lib/systemd/systemd-timesyncd
  • So the time synchronization is still not working. We expected the FE machines to just synchronize to fb1 even though it doesn't have any upstream ntp server to synchronize to. But that didn't happen.
  • I'm (Anchal) working on getting internet access to c1teststand computers.
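
The manual check of the timedatectl output above can be turned into a small test that is easy to run on each FE machine. This is a minimal sketch; the function name clock_is_synced is invented here. It accepts both the "NTP synchronized" wording shown above and the "System clock synchronized" wording printed by newer systemd versions.

```shell
#!/bin/sh
# Hypothetical helper: read `timedatectl` output on stdin and succeed only
# if it reports the clock as synchronized.
# Older systemd prints "NTP synchronized: yes"; newer versions print
# "System clock synchronized: yes". Both spellings are accepted.
clock_is_synced() {
    grep -Eq '^[[:space:]]*(NTP|System clock) synchronized:[[:space:]]*yes[[:space:]]*$'
}
```

Typical use would be `timedatectl | clock_is_synced || echo "clock NOT synchronized"`, for example looped over c1bhd and c1sus2 through ssh.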

Digging into mx_stream/daqd_dc errors:

  • We changed the Restart field in /etc/systemd/system/daqd_dc.service on the cloned fb1 to 2. This allows the service to fail and stop restarting after two attempts, which lets us see the real error message instead of the systemd message that the service is restarting too often. We got the following:
    controls@fb1:~ 3$ sudo systemctl status daqd_dc -l
    ● daqd_dc.service - Advanced LIGO RTS daqd data concentrator
       Loaded: loaded (/etc/systemd/system/daqd_dc.service; enabled)
       Active: failed (Result: exit-code) since Mon 2021-10-04 17:50:25 UTC; 22s ago
      Process: 715 ExecStart=/usr/bin/daqd_dc_mx -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.dc (code=exited, status=1/FAILURE)
     Main PID: 715 (code=exited, status=1/FAILURE)
    
    Oct 04 17:50:24 fb1 systemd[1]: Started Advanced LIGO RTS daqd data concentrator.
    Oct 04 17:50:25 fb1 daqd_dc_mx[715]: [Mon Oct  4 17:50:25 2021] Unable to set to nice = -20 -error Unknown error -1
    Oct 04 17:50:25 fb1 daqd_dc_mx[715]: Failed to do mx_get_info: MX not initialized.
    Oct 04 17:50:25 fb1 daqd_dc_mx[715]: 263596
    Oct 04 17:50:25 fb1 systemd[1]: daqd_dc.service: main process exited, code=exited, status=1/FAILURE
    Oct 04 17:50:25 fb1 systemd[1]: Unit daqd_dc.service entered failed state.
    
  • It seemed like the only thing the daqd_dc process doesn't like is that the mx_stream services are in a failed state on the FE computers. So we did the same process on the FE machines to get the real error messages:
    controls@fb1:~ 0$ sudo chroot /diskless/root
    fb1:/ 0#
    fb1:/ 0# sudo nano /etc/systemd/system/mx_stream.service
    fb1:/ 0#
    fb1:/ 0# exit
  • Then I ssh'ed into c1bhd to see the error message on mx_stream service properly.
    controls@c1bhd:~ 0$ sudo systemctl daemon-reload
    controls@c1bhd:~ 0$ sudo systemctl restart mx_stream
    controls@c1bhd:~ 0$ sudo systemctl status mx_stream -l
    ● mx_stream.service - Advanced LIGO RTS front end mx stream
       Loaded: loaded (/etc/systemd/system/mx_stream.service; enabled)
       Active: failed (Result: exit-code) since Mon 2021-10-04 17:57:20 UTC; 24s ago
      Process: 11832 ExecStart=/etc/mx_stream_exec (code=exited, status=1/FAILURE)
     Main PID: 11832 (code=exited, status=1/FAILURE)
    
    Oct 04 17:57:20 c1bhd systemd[1]: Starting Advanced LIGO RTS front end mx stream...
    Oct 04 17:57:20 c1bhd systemd[1]: Started Advanced LIGO RTS front end mx stream.
    Oct 04 17:57:20 c1bhd mx_stream_exec[11832]: send len = 263596
    Oct 04 17:57:20 c1bhd mx_stream_exec[11832]: OMX: Failed to find peer index of board 00:00:00:00:00:00 (Peer Not Found in the Table)
    Oct 04 17:57:20 c1bhd mx_stream_exec[11832]: mx_connect failed Nic ID not Found in Peer Table
    Oct 04 17:57:20 c1bhd mx_stream_exec[11832]: c1x06_daq mmapped address is 0x7f516a97a000
    Oct 04 17:57:20 c1bhd mx_stream_exec[11832]: c1bhd_daq mmapped address is 0x7f516697a000
    Oct 04 17:57:20 c1bhd systemd[1]: mx_stream.service: main process exited, code=exited, status=1/FAILURE
    Oct 04 17:57:20 c1bhd systemd[1]: Unit mx_stream.service entered failed state.
    
  • c1sus2 shows the same error. I'm not sure I understand these errors at all. But they seem to have nothing to do with timing issues, surprise!

As usual, some help would be helpful

  16376   Mon Oct 4 18:00:16 2021   Koji   Summary   CDS   c1teststand problems summary

I don't know anything about mx/open-mx, but you also need open-mx, don't you?


controls@c1ioo:~ 0$ systemctl status *mx*
● open-mx.service - LSB: starts Open-MX driver
   Loaded: loaded (/etc/init.d/open-mx)
   Active: active (running) since Wed 2021-09-22 11:54:39 PDT; 1 weeks 5 days ago
  Process: 470 ExecStart=/etc/init.d/open-mx start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/open-mx.service
           └─620 /opt/3.2.88-csp/open-mx-1.5.4/bin/fma -d

● mx_stream.service - Advanced LIGO RTS front end mx stream
   Loaded: loaded (/etc/systemd/system/mx_stream.service; enabled)
   Active: active (running) since Wed 2021-09-22 12:08:00 PDT; 1 weeks 5 days ago
 Main PID: 5785 (mx_stream)
   CGroup: /system.slice/mx_stream.service
           └─5785 /usr/bin/mx_stream -e 0 -r 0 -w 0 -W 0 -s c1x03 c1ioo c1als c1omc -d fb1:0

 

  16381   Tue Oct 5 17:58:52 2021   Anchal   Summary   CDS   c1teststand problems summary

The open-mx service is running successfully on fb1 (clone), c1bhd and c1sus2.

Quote:

I don't know anything about mx/open-mx, but you also need open-mx, don't you?


  16382   Tue Oct 5 18:00:53 2021   Anchal   Summary   CDS   c1teststand time synchronization working now

Today I got a new router that I used to connect the c1teststand, fb1 and chiara. I was able to see internet access in c1teststand and fb1, but not in chiara. I'm not sure why that is the case.

The good news is that the ntp server on fb1 (clone) is working fine now, and both FE computers, c1bhd and c1sus2, are successfully synchronized to the fb1 (clone) ntp server. This rules out timing problems in this DAQ network.

On running the IOP and user models, however, I see the same errors as mentioned in 40m/16372. Something to do with:

Oct 06 00:47:56 c1sus2 mx_stream_exec[21796]: OMX: Failed to find peer index of board 00:00:00:00:00:00 (Peer Not Found in the Table)
Oct 06 00:47:56 c1sus2 mx_stream_exec[21796]: mx_connect failed Nic ID not Found in Peer Table
Oct 06 00:47:56 c1sus2 mx_stream_exec[21796]: c1x07_daq mmapped address is 0x7fa4819cc000
Oct 06 00:47:56 c1sus2 mx_stream_exec[21796]: c1su2_daq mmapped address is 0x7fa47d9cc000


Thu Oct 7 17:04:31 2021

I fixed the issue of chiara not getting internet. Now c1teststand, fb1 and chiara all have internet connections. The problem was with the default gateway, the interface, and finding the DNS. I have found the correct settings now.

  16391   Mon Oct 11 17:31:25 2021   Anchal   Summary   CDS   Fixed mounting of mx devices in fb. daqd_dc is running now.
 
 

However, lspci | grep 'Myri' shows the following output on both computers:

controls@fb1:/dev 0$ lspci | grep 'Myri'
02:00.0 Ethernet controller: MYRICOM Inc. Myri-10G Dual-Protocol NIC (rev 01)

This means that the computer detects the card in the PCIe slot.

 

I tried to add this to /etc/rc.local to run this script at every boot, but it did not work. So for now, I'll just do this step manually every time. Once the devices are loaded, we get:

controls@fb1:/etc 0$ ls /dev/*mx*
/dev/mx0  /dev/mx4  /dev/mxctl   /dev/mxp2  /dev/mxp6         /dev/ptmx
/dev/mx1  /dev/mx5  /dev/mxctlp  /dev/mxp3  /dev/mxp7
/dev/mx2  /dev/mx6  /dev/mxp0    /dev/mxp4  /dev/open-mx
/dev/mx3  /dev/mx7  /dev/mxp1    /dev/mxp5  /dev/open-mx-raw
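
Because the devices have to be loaded by hand after each boot, a quick check that the nodes exist can save a failed daqd_dc start. The following is a sketch only: the function name mx_devices_present is invented, and the choice of mx0, mxctl and open-mx as the nodes to test is an assumption based on the listing above, not a statement of what daqd_dc actually requires.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the mx device nodes are present.
# The optional argument is the device directory (default /dev), which also
# makes the check easy to try against a scratch directory.
mx_devices_present() {
    devdir="${1:-/dev}"
    # Nodes checked here are an assumption taken from the `ls /dev/*mx*` output.
    for node in mx0 mxctl open-mx; do
        [ -e "$devdir/$node" ] || { echo "missing: $devdir/$node" >&2; return 1; }
    done
}
```

For example, `mx_devices_present || echo "load the mx devices before starting daqd"` could be run right after a reboot of fb1.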

Then, after restarting all daqd_ processes, I found that daqd_dc was running successfully. Here is the status:

controls@fb1:/etc 0$ sudo systemctl status daqd_* -l
● daqd_dc.service - Advanced LIGO RTS daqd data concentrator
   Loaded: loaded (/etc/systemd/system/daqd_dc.service; enabled)
   Active: active (running) since Mon 2021-10-11 17:48:00 PDT; 23min ago
 Main PID: 2308 (daqd_dc_mx)
   CGroup: /daqd.slice/daqd_dc.service
           ├─2308 /usr/bin/daqd_dc_mx -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.dc
           └─2370 caRepeater

Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: mx receiver 006 thread priority error Operation not permitted[Mon Oct 11 17:48:06 2021]
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: mx receiver 005 thread put on CPU 0
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:48:06 2021] [Mon Oct 11 17:48:06 2021] mx receiver 006 thread put on CPU 0
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: mx receiver 007 thread put on CPU 0
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:48:06 2021] mx receiver 003 thread - label dqmx003 pid=2362
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:48:06 2021] mx receiver 003 thread priority error Operation not permitted
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:48:06 2021] mx receiver 003 thread put on CPU 0
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: warning:regcache incompatible with malloc
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:48:06 2021] EDCU has 410 channels configured; first=0
Oct 11 17:49:06 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:49:06 2021] ->4: clear crc

● daqd_fw.service - Advanced LIGO RTS daqd frame writer
   Loaded: loaded (/etc/systemd/system/daqd_fw.service; enabled)
   Active: active (running) since Mon 2021-10-11 17:48:01 PDT; 23min ago
 Main PID: 2318 (daqd_fw)
   CGroup: /daqd.slice/daqd_fw.service
           └─2318 /usr/bin/daqd_fw -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.fw

Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] [Mon Oct 11 17:48:09 2021] Producer thread - label dqproddbg pid=2440
Oct 11 17:48:09 fb1 daqd_fw[2318]: Producer crc thread priority error Operation not permitted
Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] [Mon Oct 11 17:48:09 2021] Producer crc thread put on CPU 0
Oct 11 17:48:09 fb1 daqd_fw[2318]: Producer thread priority error Operation not permitted
Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] Producer thread put on CPU 0
Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] Producer thread - label dqprod pid=2434
Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] Producer thread priority error Operation not permitted
Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] Producer thread put on CPU 0
Oct 11 17:48:10 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:10 2021] Minute trender made GPS time correction; gps=1318034906; gps%60=26
Oct 11 17:49:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:49:09 2021] ->3: clear crc

● daqd_rcv.service - Advanced LIGO RTS daqd testpoint receiver
   Loaded: loaded (/etc/systemd/system/daqd_rcv.service; enabled)
   Active: active (running) since Mon 2021-10-11 17:48:00 PDT; 23min ago
 Main PID: 2311 (daqd_rcv)
   CGroup: /daqd.slice/daqd_rcv.service
           └─2311 /usr/bin/daqd_rcv -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.rcv

Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1X07_CRC_SUM
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1BHD_STATUS
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1BHD_CRC_CPS
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1BHD_CRC_SUM
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1SU2_STATUS
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1SU2_CRC_CPS
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1SU2_CRC_SUM
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1OM[Mon Oct 11 17:50:21 2021] Epics server started
Oct 11 17:50:24 fb1 daqd_rcv[2311]: [Mon Oct 11 17:50:24 2021] Minute trender made GPS time correction; gps=1318035040; gps%120=40
Oct 11 17:51:21 fb1 daqd_rcv[2311]: [Mon Oct 11 17:51:21 2021] ->3: clear crc

Now, even before starting the FE models, I see the DC status as 0x2bad in the CDS screens of the IOP and user models. The mx_stream service remains in a failed state on the FE machines and stays that way even after restarting the service.

controls@c1sus2:~ 0$ sudo systemctl status mx_stream -l
● mx_stream.service - Advanced LIGO RTS front end mx stream
   Loaded: loaded (/etc/systemd/system/mx_stream.service; enabled)
   Active: failed (Result: exit-code) since Mon 2021-10-11 17:50:26 PDT; 15min ago
  Process: 382 ExecStart=/etc/mx_stream_exec (code=exited, status=1/FAILURE)
 Main PID: 382 (code=exited, status=1/FAILURE)

Oct 11 17:50:25 c1sus2 systemd[1]: Starting Advanced LIGO RTS front end mx stream...
Oct 11 17:50:25 c1sus2 systemd[1]: Started Advanced LIGO RTS front end mx stream.
Oct 11 17:50:25 c1sus2 mx_stream_exec[382]: Failed to open endpoint Not initialized
Oct 11 17:50:26 c1sus2 systemd[1]: mx_stream.service: main process exited, code=exited, status=1/FAILURE
Oct 11 17:50:26 c1sus2 systemd[1]: Unit mx_stream.service entered failed state.

But if I restart the mx_stream service before starting the rtcds models, the mx_stream service starts successfully:

controls@c1sus2:~ 0$ sudo systemctl restart mx_stream
controls@c1sus2:~ 0$ sudo systemctl status mx_stream -l
● mx_stream.service - Advanced LIGO RTS front end mx stream
   Loaded: loaded (/etc/systemd/system/mx_stream.service; enabled)
   Active: active (running) since Mon 2021-10-11 18:14:13 PDT; 25s ago
 Main PID: 1337 (mx_stream)
   CGroup: /system.slice/mx_stream.service
           └─1337 /usr/bin/mx_stream -e 0 -r 0 -w 0 -W 0 -s c1x07 c1su2 -d fb1:0

Oct 11 18:14:13 c1sus2 systemd[1]: Starting Advanced LIGO RTS front end mx stream...
Oct 11 18:14:13 c1sus2 systemd[1]: Started Advanced LIGO RTS front end mx stream.
Oct 11 18:14:13 c1sus2 mx_stream_exec[1337]: send len = 263596
Oct 11 18:14:13 c1sus2 mx_stream_exec[1337]: Connection Made

However, the DC status on the CDS screens still shows 0x2bad. As soon as I start the rtcds model c1x07 (the IOP model for c1sus2), the mx_stream service fails:

controls@c1sus2:~ 0$ sudo systemctl status mx_stream -l
● mx_stream.service - Advanced LIGO RTS front end mx stream
   Loaded: loaded (/etc/systemd/system/mx_stream.service; enabled)
   Active: failed (Result: exit-code) since Mon 2021-10-11 18:18:03 PDT; 27s ago
  Process: 1337 ExecStart=/etc/mx_stream_exec (code=exited, status=1/FAILURE)
 Main PID: 1337 (code=exited, status=1/FAILURE)

Oct 11 18:14:13 c1sus2 systemd[1]: Starting Advanced LIGO RTS front end mx stream...
Oct 11 18:14:13 c1sus2 systemd[1]: Started Advanced LIGO RTS front end mx stream.
Oct 11 18:14:13 c1sus2 mx_stream_exec[1337]: send len = 263596
Oct 11 18:14:13 c1sus2 mx_stream_exec[1337]: Connection Made
Oct 11 18:18:03 c1sus2 mx_stream_exec[1337]: isendxxx failed with status Remote Endpoint Unreachable
Oct 11 18:18:03 c1sus2 mx_stream_exec[1337]: disconnected from the sender
Oct 11 18:18:03 c1sus2 mx_stream_exec[1337]: c1x07_daq mmapped address is 0x7fe3620c3000
Oct 11 18:18:03 c1sus2 mx_stream_exec[1337]: c1su2_daq mmapped address is 0x7fe35e0c3000
Oct 11 18:18:03 c1sus2 systemd[1]: mx_stream.service: main process exited, code=exited, status=1/FAILURE
Oct 11 18:18:03 c1sus2 systemd[1]: Unit mx_stream.service entered failed state.

This shows that starting the rtcds model causes mx_stream to fail, possibly because it cannot find the endpoint on fb1. I've again reached the edge of my knowledge here. Maybe the fiber optic connection between fb1 and the network switch that connects to the FEs is bad, or the connection between the switch and the FEs is bad.
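
The two failure messages quoted above can be picked out of the journal text automatically. The following is only a minimal sketch: the message strings are copied from the logs in this entry, while the "likely cause" labels and the function name are my own interpretation, not anything mx_stream documents.

```python
import re

# Message substrings copied from the journal output quoted in this entry.
# The "cause" text is an interpretation (assumption), not official wording.
KNOWN_FAILURES = {
    "Failed to open endpoint Not initialized":
        "local endpoint not ready",
    "isendxxx failed with status Remote Endpoint Unreachable":
        "receiver on fb1 not reachable",
}

# Journal lines start with a stamp such as "Oct 11 18:18:03".
STAMP = re.compile(r"^(\w{3}\s+\d+\s+[\d:]{8})")


def diagnose_mx_stream(journal_text):
    """Return (timestamp, cause) pairs for known failure lines."""
    findings = []
    for line in journal_text.splitlines():
        for needle, cause in KNOWN_FAILURES.items():
            if needle in line:
                match = STAMP.match(line)
                stamp = match.group(1) if match else "unknown time"
                findings.append((stamp, cause))
    return findings


sample = """\
Oct 11 18:14:13 c1sus2 mx_stream_exec[1337]: Connection Made
Oct 11 18:18:03 c1sus2 mx_stream_exec[1337]: isendxxx failed with status Remote Endpoint Unreachable
"""
print(diagnose_mx_stream(sample))
```

The input would typically be the text printed by `systemctl status mx_stream -l`, pasted or captured into a string.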

But we are just one step away from making this work.

  16392   Mon Oct 11 18:29:35 2021 AnchalSummaryCDSMoving forward?

The teststand has some non-trivial issue with the Myrinet card (either software or hardware) which even the experts say they don't remember how to fix. CDS with mx was in use more than a decade ago, so it is hard to find support for issues with it now, and that will remain the case in the future. We need to wrap up this test procedure one way or another now, so I see the following two options moving forward:


Direct integration with main CDS and testing

  • We can just connect the c1sus2 and c1bhd FE computers to martian network directly.
  • We'll have to connect c1sus2 and c1bhd to the optical fiber subnetwork as well.
  • On booting, they would be booted through the existing fb1 boot server, which seems to work fine for the other 5 FE machines.
  • We can update the DHCP in chiara and reload it so that we can ssh into these FEs with host names.
  • Hopefully, the presence of these computers won't tank the existing CDS even if they themselves have issues, as they have no shared memory with other models.
  • If this works, we can do the loop back testing of I/O chassis using the main DAQ network and move on with our upgrade.
  • If this does not work and causes any harm to the existing CDS network, we can disconnect these computers and go back to the existing CDS. Recently, our confidence in rebooting the CDS has increased with its robust performance, as some legacy issues were fixed.
  • However, we'll continue to use a CDS which is no longer supported by the current LIGO CDS group.

Testing CDS upgrade on teststand

  • From what I could gather, most of the hardware in the I/O chassis is still used in the CDS of LLO and LHO, with their recent tests and documents using the same cards and PCBs.
  • There might be some difference in the DAQ network setup that I need to confirm.
  • I've summarised the current c1teststand hardware on this wiki page.
  • If the latest CDS is backwards compatible with our hardware, we can test the new CDS in the c1teststand setup without disrupting our main CDS. We'll have ample help and support for this upgrade from the current LIGO CDS group.
  • We can do the loop back testing of the I/O chassis as well.
  • If the upgrade is successful in the teststand without many hardware changes, we can upgrade the main CDS of 40m as well, as it has the same hardware as our teststand.
  • The biggest plus point would be that our CDS will be up-to-date and we will be able to get help from the CDS group if any trouble occurs.

So these are the two options we have. We should discuss which one to take in the mattermost chat or in the upcoming meeting.

  16395   Tue Oct 12 17:10:56 2021 AnchalSummaryCDSSome more information

Chris pointed out some information-displaying scripts that show whether the DAQ network is working. I thought it would be nice to log this information here as well.

controls@fb1:/opt/mx/bin 0$ ./mx_info
MX Version: 1.2.16
MX Build: controls@fb1:/opt/src/mx-1.2.16 Mon Aug 14 11:06:09 PDT 2017
1 Myrinet board installed.
The MX driver is configured to support a maximum of:
    8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
===================================================================
Instance #0:  364.4 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
    Status:        Running, P0: Link Up
    Network:    Ethernet 10G

    MAC Address:    00:60:dd:45:37:86
    Product code:    10G-PCIE-8B-S
    Part number:    09-04228
    Serial number:    423340
    Mapper:        00:60:dd:45:37:86, version = 0x00000000, configured
    Mapped hosts:    3

                                                        ROUTE COUNT
INDEX    MAC ADDRESS     HOST NAME                        P0
-----    -----------     ---------                        ---
   0) 00:60:dd:45:37:86 fb1:0                             1,0
   1) 00:25:90:05:ab:47 c1bhd:0                           1,0
   2) 00:25:90:06:69:c3 c1sus2:0                          1,0

 

controls@c1bhd:~ 1$ /opt/open-mx/bin/omx_info
Open-MX version 1.5.4
 build: root@fb1:/opt/src/open-mx-1.5.4 Tue Aug 15 23:48:03 UTC 2017

Found 1 boards (32 max) supporting 32 endpoints each:
 c1bhd:0 (board #0 name eth1 addr 00:25:90:05:ab:47)
   managed by driver 'igb'

Peer table is ready, mapper is 00:60:dd:45:37:86
================================================
  0) 00:25:90:05:ab:47 c1bhd:0
  1) 00:60:dd:45:37:86 fb1:0
  2) 00:25:90:06:69:c3 c1sus2:0

 

controls@c1sus2:~ 0$ /opt/open-mx/bin/omx_info
Open-MX version 1.5.4
 build: root@fb1:/opt/src/open-mx-1.5.4 Tue Aug 15 23:48:03 UTC 2017

Found 1 boards (32 max) supporting 32 endpoints each:
 c1sus2:0 (board #0 name eth1 addr 00:25:90:06:69:c3)
   managed by driver 'igb'

Peer table is ready, mapper is 00:60:dd:45:37:86
================================================
  0) 00:25:90:06:69:c3 c1sus2:0
  1) 00:60:dd:45:37:86 fb1:0
  2) 00:25:90:05:ab:47 c1bhd:0

These outputs prove that the framebuilder and the FEs are able to see each other on the DAQ network.
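
The comparison of the three peer tables can also be done mechanically. Below is a rough sketch that parses the `INDEX) MAC HOST` lines out of pasted mx_info / omx_info output and checks that every machine reports the same peers. The function names are hypothetical, and the parser assumes the output layout is exactly the one shown above.

```python
import re

# Matches peer-table rows such as "   1) 00:25:90:05:ab:47 c1bhd:0   1,0"
PEER_LINE = re.compile(r"^\s*\d+\)\s+([0-9a-f:]{17})\s+(\S+)", re.IGNORECASE)


def parse_peers(info_text):
    """Map host name -> MAC address from a pasted peer table."""
    peers = {}
    for line in info_text.splitlines():
        match = PEER_LINE.match(line)
        if match:
            mac, host = match.groups()
            peers[host] = mac.lower()
    return peers


def all_agree(tables):
    """True if every pasted table lists the same hosts with the same MACs."""
    parsed = [parse_peers(text) for text in tables]
    if not parsed or not parsed[0]:
        return False
    return all(p == parsed[0] for p in parsed)


fb1_table = """\
   0) 00:60:dd:45:37:86 fb1:0                             1,0
   1) 00:25:90:05:ab:47 c1bhd:0                           1,0
   2) 00:25:90:06:69:c3 c1sus2:0                          1,0
"""
c1bhd_table = """\
  0) 00:25:90:05:ab:47 c1bhd:0
  1) 00:60:dd:45:37:86 fb1:0
  2) 00:25:90:06:69:c3 c1sus2:0
"""
print(all_agree([fb1_table, c1bhd_table]))
```

Agreement of the tables only says the hosts are mapped on the network; as the rest of this entry shows, it does not rule out the mx_stream failure.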


Further, the error that we see when the IOP model is started, which crashes the mx_stream service on the FE machines (see 40m/16391), is:

isendxxx failed with status Remote Endpoint Unreachable

This was seen earlier, when Jamie was troubleshooting the current fb1 on the martian network in 40m/11655 in Oct 2015. Unfortunately, I could not find what Jamie did over a year to fix this issue.

  16396   Tue Oct 12 17:20:12 2021 AnchalSummaryCDSConnected c1sus2 to martian network

I connected c1sus2 to the martian network by splitting the c1sim connection with a 5-way switch. I also ran another ethernet cable from the second port of c1sus2 to the DAQ network switch on 1X7.

Then I logged into chiara and added the following in chiara:/etc/dhcp/dhcpd.conf :

host c1sus2 {
  hardware ethernet 00:25:90:06:69:C2;
  fixed-address 192.168.113.92;
}

And the following line in chiara:/var/lib/bind/martian.hosts :

c1sus2          A    192.168.113.92

Note that entries for c1bhd were already present in these files, probably added during some earlier testing by Gautam or Jon. Then I ran the following to restart the DHCP server and nameserver:

~> sudo service bind9 reload
[sudo] password for controls:
 * Reloading domain name service... bind9                                                 [ OK ]
~> sudo service isc-dhcp-server restart
isc-dhcp-server stop/waiting
isc-dhcp-server start/running, process 25764
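
For the next FE that needs to be added, the two entries above can be generated rather than typed by hand. This is just a sketch: the helper names are made up, and the column spacing of the bind record is chosen to resemble the existing line rather than taken from any formal requirement.

```python
import ipaddress
import re

MAC_RE = re.compile(r"^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$")


def dhcp_host_stanza(host, mac, ip):
    """Build a dhcpd.conf host block like the one added for c1sus2."""
    if not MAC_RE.match(mac):
        raise ValueError(f"bad MAC address: {mac}")
    ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    return (
        f"host {host} {{\n"
        f"  hardware ethernet {mac.upper()};\n"
        f"  fixed-address {ip};\n"
        f"}}\n"
    )


def bind_a_record(host, ip):
    """Build the matching A record line for martian.hosts."""
    ipaddress.ip_address(ip)
    return f"{host:<16}A    {ip}"


print(dhcp_host_stanza("c1sus2", "00:25:90:06:69:c2", "192.168.113.92"))
print(bind_a_record("c1sus2", "192.168.113.92"))
```

The output still has to be pasted into the two files on chiara and the services reloaded as shown above.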

Now, when I switched on c1sus2 from the front panel, it booted over the network from fb1 like the other FE machines, and I was able to log in to it by first logging in to fb1 and then sshing to c1sus2.

Next, I copied the simulink models and the medm screens of c1x06, c1x07, c1bhd, c1sus2 from the paths mentioned on this wiki page. I also copied the medm screens from chiara(clone):/opt/rtcds/caltech/c1/medm to the martian network chiara in the appropriate places. I have placed the file /opt/rtcds/caltech/c1/medm/teststand_sitemap.adl, which can be used to open the sitemap for the c1bhd and c1sus2 IOP and user models.

Then I logged into c1sus2 (via fb1) and did the make, install, start procedure:

controls@c1sus2:~ 0$ rtcds make c1x07
buildd: /opt/rtcds/caltech/c1/rtbuild/release
### building c1x07...
Cleaning c1x07...
Done
Parsing the model c1x07...
Done
Building EPICS sequencers...
Done
Building front-end Linux kernel module c1x07...
Done
RCG source code directory:
/opt/rtcds/rtscore/branches/branch-3.4
The following files were used for this build:
/opt/rtcds/userapps/release/cds/c1/models/c1x07.mdl

Successfully compiled c1x07
***********************************************
Compile Warnings, found in c1x07_warnings.log:
***********************************************
***********************************************
controls@c1sus2:~ 0$ rtcds install c1x07
buildd: /opt/rtcds/caltech/c1/rtbuild/release
### installing c1x07...
Installing system=c1x07 site=caltech ifo=C1,c1
Installing /opt/rtcds/caltech/c1/chans/C1X07.txt
Installing /opt/rtcds/caltech/c1/target/c1x07/c1x07epics
Installing /opt/rtcds/caltech/c1/target/c1x07
Installing start and stop scripts
/opt/rtcds/caltech/c1/scripts/killc1x07
/opt/rtcds/caltech/c1/scripts/startc1x07
sudo: unable to resolve host c1sus2
Performing install-daq
Updating testpoint.par config file
/opt/rtcds/caltech/c1/target/gds/param/testpoint.par
/opt/rtcds/rtscore/branches/branch-3.4/src/epics/util/updateTestpointPar.pl -par_file=/opt/rtcds/caltech/c1/target/gds/param/archive/testpoint_211012_174226.par -gds_node=24 -site_letter=C -system=c1x07 -host=c1sus2
Installing GDS node 24 configuration file
/opt/rtcds/caltech/c1/target/gds/param/tpchn_c1x07.par
Installing auto-generated DAQ configuration file
/opt/rtcds/caltech/c1/chans/daq/C1X07.ini
Installing Epics MEDM screens
Running post-build script

safe.snap exists 
controls@c1sus2:~ 0$ rtcds start c1x07
Cannot start/stop model 'c1x07' on host c1sus2.
controls@c1sus2:~ 4$ rtcds list

controls@c1sus2:~ 0$ 

One can see that even after making and installing, the model c1x07 is not listed among the available models in rtcds list. The same is the case for c1sus2 as well. So I could not proceed with testing.

The good news is that nothing that I did affected the current CDS functioning. So we can probably do this testing safely from the main CDS setup.

  16397   Tue Oct 12 23:42:56 2021 KojiSummaryCDSConnected c1sus2 to martian network

Don't you need to add the new hosts to /diskless/root/etc/rtsystab at fb1? --> There seem to be many elogs talking about editing "rtsystab".

controls@fb1:/diskless/root/etc 0$ cat rtsystab
#
# host    list of control systems to run, starting with IOP
#
c1iscex  c1x01  c1scx c1asx
c1sus     c1x02  c1sus c1mcs c1rfm c1pem
c1ioo     c1x03  c1ioo c1als c1omc
c1lsc    c1x04  c1lsc c1ass c1oaf c1cal c1dnn c1daf
c1iscey  c1x05 c1scy c1asy
#c1test   c1x10  c1tst2
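
To check quickly whether a host is listed, the rtsystab text can be parsed into a host -> models table. This is a minimal sketch that assumes only what the file's own header says (host first, then the models starting with the IOP, `#` for comments); the function name is hypothetical.

```python
def parse_rtsystab(text):
    """Return {host: [iop_model, user_model, ...]} from rtsystab text."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        host, *models = line.split()
        table[host] = models
    return table


rtsystab = """\
#
# host    list of control systems to run, starting with IOP
#
c1iscex  c1x01  c1scx c1asx
c1sus     c1x02  c1sus c1mcs c1rfm c1pem
c1ioo     c1x03  c1ioo c1als c1omc
c1lsc    c1x04  c1lsc c1ass c1oaf c1cal c1dnn c1daf
c1iscey  c1x05 c1scy c1asy
#c1test   c1x10  c1tst2
"""
table = parse_rtsystab(rtsystab)
print("c1sus2" in table)  # c1sus2 is not in the file shown above
```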

 

  16398   Wed Oct 13 11:25:14 2021 AnchalSummaryCDSRan c1sus2 models in martian CDS. All good!

Three extra steps (when adding new models, new FE):

  • Chris pointed out that the sudo command on c1sus2 gives the error
    sudo: unable to resolve host c1sus2
    
    This error occurs when the computer cannot figure out its own hostname. Since FEs are network booted off fb1, we need to update /etc/hosts in /diskless/root every time we add a new FE.
    controls@fb1:~ 0$ sudo chroot /diskless/root
    fb1:/ 0# sudo nano /etc/hosts
    fb1:/ 0# exit
    
    I added the following line in /etc/hosts file above:
    192.168.113.92  c1sus2 c1sus2.martian
    
    This resolved the sudo error. Now, the rtcds make and install steps show no errors in their outputs.
  • Another thing that needs to be done, as Koji pointed out, is to add the host and models in /etc/rtsystab in /diskless/root of fb:
    controls@fb1:~ 0$ sudo chroot /diskless/root
    fb1:/ 0# sudo nano /etc/rtsystab
    fb1:/ 0# exit
    
    I added the following line in /etc/rtsystab file above:
    c1sus2   c1x07  c1su2
    
    This told rtcds what models would be available on c1sus2. Now rtcds list is displaying the right models:
    controls@c1sus2:~ 0$ rtcds list
    c1x07
    c1su2
  • The above steps are still not sufficient for the daqd_ processes to know about the new models. This part is supposed to happen automatically, but apparently does not happen in our CDS. So every time there is a new model, we need to edit the file /opt/rtcds/caltech/c1/target/daqd/master and add the following lines to it:
    # Fast Data Channel lists
    # c1sus2
    /opt/rtcds/caltech/c1/chans/daq/C1X07.ini
    /opt/rtcds/caltech/c1/chans/daq/C1SU2.ini
    
    # test point lists
    # c1sus2
    /opt/rtcds/caltech/c1/target/gds/param/tpchn_c1x07.par
    /opt/rtcds/caltech/c1/target/gds/param/tpchn_c1su2.par
    
    I needed to restart the daqd_ processes on fb1 for them to notice these changes:
    controls@fb1:~ 0$ sudo systemctl restart daqd_*
    
    This finally lit up the status channels of DC in C1X07_GDS_TP.adl and C1SU2_GDS_TP.adl. However, the channels C1:DAQ-DC0_C1X07_STATUS and C1:DAQ-DC0_C1SU2_STATUS both had the value 0x2bad. This persisted on restarting the models. I then simply restarted the mx_stream on c1sus2 and boom, it worked! (see attached all green screen, never seen before!)
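
The lines that go into the daqd master file follow a fixed pattern, so they can be derived from the model names. The sketch below assumes the path pattern is exactly the one seen in the lines added above (upper-case .ini name, lower-case tpchn_ name); the helper names are hypothetical and nothing here touches the real file.

```python
def daqd_master_entries(models, root="/opt/rtcds/caltech/c1"):
    """Return (ini_paths, testpoint_paths) for the given model names."""
    ini = [f"{root}/chans/daq/{m.upper()}.ini" for m in models]
    par = [f"{root}/target/gds/param/tpchn_{m.lower()}.par" for m in models]
    return ini, par


def missing_from_master(master_text, models):
    """List the expected entries that do not appear in the master file text."""
    present = {line.strip() for line in master_text.splitlines()}
    ini, par = daqd_master_entries(models)
    return [path for path in ini + par if path not in present]


ini, par = daqd_master_entries(["c1x07", "c1su2"])
print("\n".join(ini + par))
```

Running `missing_from_master` on the contents of the master file before restarting the daqd_ processes gives a quick check that no model was forgotten.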

So now Ian can work on testing the I/O chassis and we would be good to move c1sus2 FE and I/O chassis to 1Y3 after that. I've also made the following extra changes:

  • Updated CDS_FE_STATUS medm screen to show the new c1sus2 host.
  • Updated global diag reset script to act on c1x07 and c1su2 as well.
  • Updated mxstream restart script to act on c1sus2 as well.
Attachment 1: CDS_screens_running.png
CDS_screens_running.png
  16414   Tue Oct 19 18:20:33 2021 Ian MacMillanSummaryCDSc1sus2 DAC to ADC test

I ran a DAC to ADC test on c1sus2 channels where I hooked up the outputs on the DAC to the input channels on the ADC. We used different combinations of ADCs and DACs to make sure that there were no errors that cancel each other out in the end. I took a transfer function across these channel combinations to reproduce figure 1 in T2000188.

As seen in the two attached PDFs, the channels seem to be working properly: they have a flat response with a gain of 0.5 (-6 dB). This is the expected response and results from the DAC signal being sent as a single-ended signal while the ADC receives it as a differential input. This should result in a recorded signal with 0.5 times the amplitude of the actual output signal.
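
As a quick check of the numbers, an amplitude ratio of 0.5 converts to dB with the usual 20 log10 rule. A minimal sketch (the function name is only for illustration):

```python
import math


def amplitude_ratio_to_db(ratio):
    """Convert an amplitude (not power) ratio to decibels."""
    return 20.0 * math.log10(ratio)


# Single-ended DAC output read by a differential ADC input: ratio of 0.5
print(round(amplitude_ratio_to_db(0.5), 2))  # -6.02
```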

The drop-off on the high frequency end is the result of the anti-aliasing filter and the anti-imaging filter. Both of these are 8-pole elliptical filters, so when combined we should get a drop-off of 320 dB per decade. I measured the slope on the last few points of each filter and the averaged value was around 347 dB per decade. This is slightly steeper than expected, but since its purpose is to cut off higher frequencies, it shouldn't affect the operation of the system. It is also close to the expected value.
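
The expected and measured slopes can be compared with a few lines of arithmetic. This sketch uses the rule of thumb from this entry (20 dB per decade per pole, 8 + 8 poles) and the measured 347 dB per decade; the two-point slope function is a generic helper, not the exact procedure used for the plots.

```python
import math


def expected_rolloff_db_per_decade(total_poles, db_per_pole=20.0):
    """Rule-of-thumb roll-off: each pole contributes 20 dB per decade."""
    return total_poles * db_per_pole


def slope_db_per_decade(f1, db1, f2, db2):
    """Slope between two transfer function points, in dB per decade."""
    return (db2 - db1) / math.log10(f2 / f1)


expected = expected_rolloff_db_per_decade(8 + 8)  # AA + AI filters
measured = 347.0                                  # value quoted in this entry
print(expected, (measured - expected) / expected)
```

With these numbers the measured slope is about 8% steeper than the rule-of-thumb value.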

The ripples seen before the drop off are also an effect of the elliptical filters and are seen in T2000188.

Note: the transfer function that doesn't seem to match the others is the heartbeat timing signal.

Attachment 1: data3_Plots.pdf
data3_Plots.pdf
Attachment 2: data2_Plots.pdf
data2_Plots.pdf
  16415   Tue Oct 19 23:43:09 2021 KojiSummaryCDSc1sus2 DAC to ADC test

(Because of a totally unrelated reason) I was checking the electronics units for the upgrade. And I realized that the electronics units at the test stand have not been properly powered.

I found that the AA/AI stack at the test stand (Attachment 1) has an unusual powering configuration (Attachment 2).
- Only the positive power supply was used / - The supply voltage is only +15V / - The GND reference is not connected to anywhere.

For confirmation, I checked the voltage across the DC power strip (Attachments 3/4). The positive was +5.3V and the negative was -9.4V. This is subject to change depending on the earth potential.

This is not a good condition at all. The asymmetric powering of the circuit may damage the opamps. So I turned off the switches of the units.

The power configuration should be immediately corrected.

  1. Use both positive and negative supply (2 power supply channels) to produce the positive and the negative voltage potentials. Connect the reference potential to the earth post of the power supply.
    https://www.youtube.com/watch?v=9_6ecyf6K40   [Dual Power Supply Connection / Serial plus minus electronics laboratory PS with center tap]
  2. These units have a DC power regulator which produces +/-15V out of +/-18V. So the DC power supplies should be set to +/-18V.

 

Attachment 1: P_20211019_224433.jpg
P_20211019_224433.jpg
Attachment 2: P_20211019_224122.jpg
P_20211019_224122.jpg
Attachment 3: P_20211019_224400.jpg
P_20211019_224400.jpg
Attachment 4: P_20211019_224411.jpg
P_20211019_224411.jpg
  16417   Wed Oct 20 11:48:27 2021 AnchalSummaryCDSPower supply configured correctly.

This was horrible! That's my bad, I should have checked the configuration before assuming that it is right.

I fixed the power supply configuration. Now the strip has two rails of +/- 18V and the GND is referenced to power supply earth GND.

Ian should redo the tests.

  16430   Tue Oct 26 18:24:00 2021 Ian MacMillanSummaryCDSc1sus2 DAC to ADC test

[Ian, Anchal, Paco]

After Koji found that there was a problem with the power source, Anchal and I fixed the power and then reran the measurement. The only change this time around is that I increased the excitation amplitude to 100. In the first run the excitation amplitude was 1, which seemed to come out noise free but is too low to give a reliable value.

link to previous results

The new plots are attached.

Attachment 1: data2_Plots.pdf
data2_Plots.pdf
Attachment 2: data3_Plots.pdf
data3_Plots.pdf
  16495   Thu Dec 9 00:32:56 2021 TegaUpdateCDSNew SUS medm screen update

The new SUS screen can be reached via sitemap -> IFO SUS button -> NEW ETMX dropdown menu link. Please use and provide feedback. Not sure exactly if we need/want the display screens after the IOP model on the right of the medm screen. I have not been able to locate the corresponding channels but did not want to remove them until I was sure that we don't plan to add these features to our screens. When all bugs have been ironed out, we can use appropriate macro substitution for the other optics.

The next feature to add is the BLRMS to the coil and PD channels. I plan to combine the PEM BLRMS medm implementation with the sus_single_BLRMS model block (located in  /opt/rtcds/userapps/release/cds/c1/models). This way we use the latest BLRMS block in "/opt/rtcds/userapps/release/cds/common/models/BLRMS.mdl" whilst also leveraging the previous work done on the sus_single_BLRMS model, which neatly fits into our current SUS model.

Attachment 1: Screen_Shot_2021-12-09_at_12.29.30_AM.png
Screen_Shot_2021-12-09_at_12.29.30_AM.png
Attachment 2: Screen_Shot_2021-12-09_at_12.42.35_AM.png
Screen_Shot_2021-12-09_at_12.42.35_AM.png
  16496   Thu Dec 9 18:22:36 2021 TegaUpdateCDSNew SUS medm screen update

Work on the medm screen for SUS RMS monitor is ongoing. The next step would be to incorporate this into the SUS medm screen, add the BLRMS model to the SUS controller model, recompile, check that the channels are being correctly addressed, then load the appropriate bandpass and lowpass filters.  

Attachment 1: Screen_Shot_2021-12-09_at_6.21.09_PM.png
Screen_Shot_2021-12-09_at_6.21.09_PM.png
  16500   Fri Dec 10 18:55:58 2021 TegaUpdateCDSNew SUS medm screen update

Turns out the BLRMS monitoring channels for MC1, MC2, MC3, ITMY and SRM already exist in c1pem. So I modified the new SUS screen to display the BLRMS info for the aforementioned optics. Next step is to add the BLRMS monitor for PRM, ITMX, ETMX and ETMY. This would require extending the number of inputs for the "SUS" block in c1pem to accomodate the additional inputs from the remaining optics.

Attachment 1: BLRMS_ITMY_screenshot.png
BLRMS_ITMY_screenshot.png
  16533   Wed Dec 22 17:40:22 2021 AnchalSummaryCDSc1su2 model updated with SUS damping blocks for 7 SOSs

[Anchal, Koji]

I've updated the c1su2 model today with model suspension blocks for the 7 new SOSs (LO1, LO2, AS1, AS4, SR2, PR2 and PR3). The model is running properly now but we had some difficulty in getting it to run.

Initially, we were getting a 0x2000 error on the c1su2 model CDS screen. The issue was probably the high data transmission rate required for all 7 SOSs in this model. Koji dug up a script /opt/rtcds/caltech/c1/userapps/trunk/cds/c1/scripts/activateDQ.py that has been used historically for updating the data rate on some of the DQ channels in the suspension block. However, this script was not working properly for Koji, so he created a new script at /opt/rtcds/caltech/c1/chans/daq/activateSUS2DQ.py.

[Ed by KA: I could not make this modified script replace the input file (i.e. C1SU2.ini) in place. So the output file is named C1SU2.ini.NEW, and the original file needs to be replaced with it manually.]

With this, Koji was able to reduce the acquisition rate of SUSPOS_IN1_DQ, SUSPIT_IN1_DQ, SUSYAW_IN1_DQ, SUSSIDE_IN1_DQ, SENSOR_UL, SENSOR_UR, SENSOR_LL, SENSOR_LR, SENSOR_SIDE, OPLEV_PERROR, OPLEV_YERROR, and OPLEV_SUM to 2048 Sa/s. The script modifies the /opt/rtcds/caltech/c1/chans/daq/C1SU2.ini file, which would get rewritten if the c1su2 model is remade and reinstalled. After this modification, the 0x2000 error stopped appearing and the model is running fine.
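
To illustrate the kind of edit involved (this is not the activateSUS2DQ.py script itself), here is a rough sketch that rewrites the datarate of selected channels in .ini text. It assumes a layout with one `[CHANNEL]` section per channel and a `datarate=` line inside it; the real auto-generated file may differ, for example in how unacquired channels are commented out. The function name and the sample channel names are made up for the example.

```python
import re

SECTION = re.compile(r"^\[(?P<name>[^\]]+)\]\s*$")


def set_datarate(ini_text, suffixes, rate):
    """Return ini text with datarate=<rate> in sections ending in suffixes."""
    out = []
    in_target = False
    for line in ini_text.splitlines():
        match = SECTION.match(line)
        if match:
            # A new section starts: decide whether it is one we want to edit.
            in_target = match.group("name").endswith(tuple(suffixes))
        elif in_target and line.startswith("datarate="):
            line = f"datarate={rate}"
        out.append(line)
    return "\n".join(out) + "\n"


sample_ini = """\
[default]
datarate=16384
[C1:SUS-LO1_SUSPOS_IN1_DQ]
acquire=1
datarate=16384
[C1:SUS-LO1_ULCOIL_OUT_DQ]
acquire=1
datarate=16384
"""
print(set_datarate(sample_ini, ["SUSPOS_IN1_DQ"], 2048))
```

As in Koji's note, writing the result to a separate .NEW file and swapping it in by hand keeps the original .ini recoverable.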


Should we change the library model part for sus_single_control.mdl

We notice that all our suspension models need to go through this weird python script modifying auto-generated .ini files to reduce the data rate. In principle, there is a simpler solution: adding the datarate 2048 in the '#DAQ Channels' block in the model library part /cvs/cds/rtcds/userapps/trunk/sus/c1/models/lib/sus_single_control.mdl, which is the root model in all the suspensions. With this change, the .ini files will automatically be written with the correct datarate and there will be no need for using the activateDQ script. But we couldn't find why this simple solution was not implemented in the past, so we want to know if there is more going on here than we know. Changing the library model would obviously change every suspension model and we don't want a broken CDS system on our hands at the beginning of the holidays, so we'll leave this delicate task for the near future.

  16537   Wed Dec 29 20:09:40 2021 ranaSummaryCDSc1su2 model updated with SUS damping blocks for 7 SOSs

We want to maintain the 16 kHz sample rate for the COIL DAQ channels, but there is nothing wrong with reducing the others.

I would suggest setting the DQ sample rates to 256 Hz for the SUS DAMP channels and 1024 Hz for the OPLEV channels (for noise diagnostics).

Maybe you can put these numbers into a new library part and we can have the best of all worlds?
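
The suggested rates could be captured as a small lookup so that any script or library part uses the same numbers. This is only a sketch: the channel names in the example are illustrative, "16 kHz" is taken to mean 16384 Sa/s, and the name matching rules are my assumption about how the channel classes are distinguished.

```python
# Rates suggested above; 16 kHz is assumed to mean 16384 Sa/s.
COIL_RATE = 16384
OPLEV_RATE = 1024
DAMP_RATE = 256

DAMP_PREFIXES = ("SUSPOS", "SUSPIT", "SUSYAW", "SUSSIDE")


def suggested_rate(channel):
    """Return the suggested DQ rate for a channel, or None if unclassified."""
    # Strip the "C1:SUS-<optic>_" part, e.g. "C1:SUS-LO1_SUSPOS_IN1_DQ".
    name = channel.split("_", 1)[1] if "_" in channel else channel
    if "COIL" in name:
        return COIL_RATE
    if name.startswith("OPLEV"):
        return OPLEV_RATE
    if name.startswith(DAMP_PREFIXES):
        return DAMP_RATE
    return None


for chan in ("C1:SUS-LO1_SUSPOS_IN1_DQ", "C1:SUS-LO1_OPLEV_PERROR",
             "C1:SUS-LO1_ULCOIL_OUT_DQ", "C1:SUS-LO1_SENSOR_UL"):
    print(chan, suggested_rate(chan))
```

Channels that return None (here the SENSOR channels) are left for a separate decision, since no rate was suggested for them above.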

Quote:
 

Should we change the library model part for sus_single_control.mdl

We notice that all our suspension models need to go through this weird python script modifying auto-generated .ini files to reduce the data rate. In principle, there is a simpler solution: adding the datarate 2048 in the '#DAQ Channels' block in the model library part /cvs/cds/rtcds/userapps/trunk/sus/c1/models/lib/sus_single_control.mdl, which is the root model in all the suspensions. With this change, the .ini files will automatically be written with the correct datarate and there will be no need for using the activateDQ script. But we couldn't find why this simple solution was not implemented in the past, so we want to know if there is more going on here than we know. Changing the library model would obviously change every suspension model and we don't want a broken CDS system on our hands at the beginning of the holidays, so we'll leave this delicate task for the near future.

 

  16546   Thu Jan 6 12:52:49 2022 AnchalUpdateCDSYearly DAQD fix 2022!

Just as predicted, all realtime models reported the "0x4000" error. Read the parent post for more details. I fixed this by following the instructions. I added the following lines to the file /opt/rtcds/rtscore/release/src/include/drv/spectracomGPS.c in fb1:

/* 2020 had 366 days and no leap second */
       pHardware->gpsOffset += 31622400;
/* 2021 had no leap seconds or leap days, so adjust for that */
       pHardware->gpsOffset += 31536000;

Then I made the package and reloaded it after stopping the daqd services. This brought back all the fast models except the C1SUS2 models, which are in red due to some other reason that I'll investigate further.
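
The two offsets added above are just the number of seconds in each year. A short sketch to reproduce them for future years (the function name is made up, and any leap second has to be supplied by hand since it cannot be computed):

```python
import calendar

SECONDS_PER_DAY = 86400


def year_offset_seconds(year, leap_seconds=0):
    """Seconds in a calendar year, plus any leap seconds added that year."""
    days = 366 if calendar.isleap(year) else 365
    return days * SECONDS_PER_DAY + leap_seconds


print(year_offset_seconds(2020))  # 31622400
print(year_offset_seconds(2021))  # 31536000
```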

 

  16547   Thu Jan 6 13:54:28 2022 KojiUpdateCDSYearly DAQD fix 2022!

Just restarting all the c1sus2 models fixed the issue. (Attachment 1)

SUS2 ADC1 CH21 is saturated. I'm not yet sure if this is an electronics issue or an ADC issue.
SUS2 ADC1 CH10 also has a large offset. This should also be investigated.

Attachment 1: Screenshot_2022-01-06_13-57-40.png
Screenshot_2022-01-06_13-57-40.png
  16548   Thu Jan 6 14:08:14 2022 KojiUpdateCDSMore BHD SUS screens added to sitemap

More BHD SUS screens added to sitemap (Attachment 1)

Attachment 1: Screenshot_2022-01-06_14-06-15.png
Screenshot_2022-01-06_14-06-15.png
  16553   Thu Jan 6 22:18:47 2022 KojiUpdateCDSSUS screen debugging

Indicated by the red arrow:
Even when the side damping servo is off, the number appears at the input of the output matrix

Indicated by the green arrows:
The face magnets and the side magnets use different ADCs. How about opening a custom ADC panel that accommodates all ADCs at once? Same for the DAC.

Indicated by the blue arrows:
This button opens a custom FM window. When the pitch gain was modified with a ramping time, the pitch and yaw gains grew at the same time even though only the pitch gain was modified.

Indicated by the orange circle:
The numbers are not indicated here, but they are input-related numbers (for watchdogging) rather than output-related numbers. It is confusing to place them here.

Attachment 1: Screen_Shot_2022-01-06_at_18.03.24.png
Screen_Shot_2022-01-06_at_18.03.24.png
  16570   Tue Jan 11 10:46:07 2022 TegaUpdateCDSSUS screen debugging

Seen. Thanks.

Red Arrow: The channel was labeled incorrectly as INMON instead of OUTPUT

Green Arrow: OK, I will create a custom medm screen for this.

Blue arrow: Hmm, OK I will look into this. Doing this work remotely is a pain as the medm response is quite slow for poking around.

Orange circle: OK, I'll move this to the left side of the line.

Note to self: I also noticed another error on the side (LPYS blue box just before the sum). The channel is pointing to YAW instead of the side, so it needs to be fixed as well.

Quote:

Indicated by the red arrow:
Even when the side damping servo is off, the number appears at the input of the output matrix

Indicated by the green arrows:
The face magnets and the side magnets use different ADCs. How about opening a custom ADC panel that accommodates all ADCs at once? Same for the DAC.

Indicated by the blue arrows:
This button opens a custom FM window. When the pitch gain was modified with a ramping time, the pitch and yaw gain grows at the same time even though only the pitch gain was modified.

Indicated by the orange circle:
The numbers are not indicated here, but they are input-related numbers (for watchdogging) rather than output-related numbers. It is confusing to place them here.

 

  16611   Fri Jan 21 12:46:31 2022 TegaUpdateCDSSUS screen debugging

All done (almost)! I still have not sorted out the issue of the pitch and yaw gains growing together when modified using a ramping time. An image of the custom ADC and DAC panel is attached.

 


 

 

Attachment 1: Custom_ADC_DAC_monitors.png
Custom_ADC_DAC_monitors.png
  16662   Thu Feb 10 21:16:27 2022 KojiSummaryCDSchiara resolv.conf wierdo

During the videomux debug, I noticed that host name resolution on chiara was not working properly. Basically, I could not log in to anything from chiara using host names.

I found that there was no /etc/resolv.conf. Instead, there is an /etc/resolvconf directory.

According to my research, the live resolv.conf is located at /run/resolvconf/resolv.conf .

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 192.168.113.20
nameserver 131.215.125.1
nameserver 8.8.8.8

This 192.168.113.20 points to the old "linux1" machine, which is long obsolete. I modified this file as follows:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 192.168.113.104
nameserver 131.215.125.1
nameserver 8.8.8.8
search martian

Then name resolution worked properly. However, on a reboot, service restart, etc., the resolvconf -u command is executed and /run/resolvconf/resolv.conf is overwritten, as the warning in the file indicates.

I have modified /etc/resolvconf/resolv.conf.d/base to include 192.168.113.104 and search martian . The latter appeared in the generated file but the former did not.

Finally I figured out that, after resolv.conf is constructed from the base and head files in /etc/resolvconf/resolv.conf.d/ , NetworkManager overrides the nameserver addresses.
The configuration was found in /etc/NetworkManager/system-connections/Wired\ connection\ 1 .

Here is the modified setting (the dns entry was changed):

>sudo cat /etc/NetworkManager/system-connections/Wired\ connection\ 1
[sudo] password for controls:
[802-3-ethernet]
duplex=full
mac-address=68:05:CA:36:4E:B4

[connection]
id=Wired connection 1
uuid=ed177e70-d10e-42be-8165-3bf59f8f199d
type=802-3-ethernet
timestamp=1438810765

[ipv6]
method=auto

[ipv4]
method=manual
dns=192.168.113.104;131.215.125.1;8.8.8.8;
addresses1=192.168.113.104;24;192.168.113.2;

And

>cat /etc/resolvconf/resolv.conf.d/base
search martian
# See Also /etc/NetworkManager/system-connections/Wired\ connection\ 1

So complicated...
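
Since the live file gets regenerated behind our back, a quick check after a reboot may be handy. Below is a minimal Python sketch, not something that is installed on chiara: it parses resolv.conf-style text and reports whether the first nameserver and the search domain are what we expect. The function names and the way the check is called are my own assumptions.

```python
def parse_resolv_conf(text):
    """Return (nameservers, search_domains) from resolv.conf-style text."""
    nameservers, search = [], []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            nameservers.append(fields[1])
        elif fields[0] == "search":
            search = fields[1:]
    return nameservers, search


def check_resolv_conf(text, expected_first, expected_search):
    """Return a list of human-readable problems (empty if all is well)."""
    nameservers, search = parse_resolv_conf(text)
    problems = []
    if not nameservers or nameservers[0] != expected_first:
        problems.append(
            "first nameserver is %r, expected %r"
            % (nameservers[0] if nameservers else None, expected_first)
        )
    if expected_search not in search:
        problems.append("search domain %r is missing" % expected_search)
    return problems


# Example with the file contents quoted in this entry.
OLD = "nameserver 192.168.113.20\nnameserver 131.215.125.1\nnameserver 8.8.8.8\n"
for problem in check_resolv_conf(OLD, "192.168.113.104", "martian"):
    print(problem)
```

To use it on the real machine one would pass the contents of the live resolv.conf as the text argument.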

  16663   Thu Feb 10 21:51:02 2022 KojiUpdateCDS[Solved] Huge random numbers flowing into ETMX/ETMY ASC PIT/YAW

Huge random numbers are flowing into ETMX/ETMY ASC PIT/YAW. Because of this, I could not damp the ETMX/ETMY suspensions at the start of the recovery after rebooting. (Attachment 1)
By turning off the output of the ASC filters, the mirrors were successfully damped.

Looking at the FE model view of the end RTSs, there were two possibilities: (Attachment 2)

- They are coming from the RFM connection
- They are coming from ASX/ASY

ASX/ASY are not active and I could not see anything producing these numbers. Burtrestore didn't help.

The remaining possibility was something on the other side of the RFM, or corruption of the RFM signal.

- Looking at the RFM model (Attachment 3), the ASC signals come from ASS and IOO. The ASS path has a filter module (C1:RFM-ETMX_PIT etc.). This FM is quiet and not guilty.

- Why do we have the RFM from IOO? I went to IOO and found that the new ASC (WFS) model is there. I had not been aware of this model. In fact, the ASC screen showed that these random numbers were flowing into the end SUSs.
So I did a burtrestore of c1iooepics, and the random numbers were gone.

Now I can go home.

Attachment 1: Screenshot_2022-02-10_21-46-02.png
Screenshot_2022-02-10_21-46-02.png
Attachment 2: Screen_Shot_2022-02-10_at_21.54.21.png
Screen_Shot_2022-02-10_at_21.54.21.png
Attachment 3: Screen_Shot_2022-02-10_at_22.14.23.png
Screen_Shot_2022-02-10_at_22.14.23.png
  16664   Fri Feb 11 10:56:38 2022 AnchalUpdateCDS[Solved] Huge random numbers flowing into ETMX/ETMY ASC PIT/YAW

Yeah, this is actually a known issue. We go to the ASC screen and manually switch off all the outputs after every reboot. We haven't been able to find a way to set a default so that these outputs remain switched off when the model comes online. We should find a way to do this.

 

  16666   Fri Feb 11 12:22:19 2022 ranaUpdateCDS[Solved] Huge random numbers flowing into ETMX/ETMY ASC PIT/YAW

You can hand-edit the autoBurt file which the FE uses to set the values after boot-up. Just make a Python script that sets all of the OFF or ZERO values that are needed to make things safe. This would be the autoBurt snap used on boot-up only, and not the hourly snaps.
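
A rough sketch of what such a script could look like is below. It assumes the snapshot is plain text with a header delimited by "--- Start BURT header" / "--- End BURT header" lines, followed by lines of the form "<channel> <count> <value>"; that layout, the function name and the channel pattern are assumptions to be checked against a real snap file before use.

```python
import re


def zero_channels(snap_text, pattern, safe_value="0"):
    """Set every scalar channel whose name matches `pattern` to `safe_value`.

    Returns (new_text, list_of_changed_channel_names).
    Header lines and non-matching lines are passed through untouched.
    """
    regex = re.compile(pattern)
    out, changed = [], []
    in_header = False
    for line in snap_text.splitlines():
        if line.startswith("--- Start"):
            in_header = True
        if in_header:
            out.append(line)
            if line.startswith("--- End"):
                in_header = False
            continue
        fields = line.split()
        # Only touch scalar entries: "<channel> 1 <value>".
        if len(fields) >= 3 and fields[1] == "1" and regex.fullmatch(fields[0]):
            out.append("%s 1 %s" % (fields[0], safe_value))
            changed.append(fields[0])
        else:
            out.append(line)
    return "\n".join(out) + "\n", changed


# Example on a made-up two-channel snapshot.
SNAP = (
    "--- Start BURT header\n"
    "--- End BURT header\n"
    "C1:ASC-ETMX_PIT_SW2 1 4\n"
    "C1:ASC-ETMX_PIT_GAIN 1 1.5\n"
)
new_text, changed = zero_channels(SNAP, r"C1:ASC-\w+_(PIT|YAW)_SW2")
print(changed)
```

The function works on text rather than on a file path so that the result can be diffed against the original before anything is written back.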

 


 

  16668   Fri Feb 11 17:07:19 2022 AnchalUpdateCDS[Solved] Huge random numbers flowing into ETMX/ETMY ASC PIT/YAW

The autoBurt file for the FE already has C1:ASC-ETMX_PIT_SW2 (and the corresponding channels for ETMY, ITMX, ITMY, BS and for YAW) present, and I checked the last snapshot file from Feb 7th, 2022, which has 0 for these channels. So I'm not sure why the FE does not follow the switch configuration when it boots up. For safety, I changed all the gains of these filter modules, named like C1:ASC-XXXX_YYY_GAIN (where XXXX is ETMX, ETMY, ITMX, ITMY, or BS, and YYY is PIT or YAW), to 0.0. Now, even if the FE loads with the switches in the ON configuration, nothing should happen. In the future, if we use this model for anything, we can change the gain values; a zero gain should not be hard to track down as the reason why no signal moves forward. Note, the BS connections from this model to the BS suspension model do not work.
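
For reference, the ten gain channels follow directly from the naming pattern above. The snippet below is only a sketch that builds the list of names and the 0.0 settings; it does not write anything to EPICS, and the helper names are made up for illustration.

```python
from itertools import product

OPTICS = ["ETMX", "ETMY", "ITMX", "ITMY", "BS"]
DOFS = ["PIT", "YAW"]


def asc_channels(suffix):
    """Build C1:ASC-<optic>_<dof>_<suffix> for every optic and DOF."""
    return ["C1:ASC-%s_%s_%s" % (optic, dof, suffix)
            for optic, dof in product(OPTICS, DOFS)]


def safe_gain_settings():
    """Map every ASC gain channel to the safe value 0.0."""
    return {name: 0.0 for name in asc_channels("GAIN")}


for name, value in safe_gain_settings().items():
    print(name, value)
```

The same helper with the suffix "SW2" gives the list of switch channels mentioned at the top of this entry.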


 

 

  16697   Thu Mar 3 15:37:40 2022 AnchalSummaryCDSc1teststand restructured

c1teststand has been restructured. There is no port computer called 'c1teststand' anymore. When you ssh into the c1teststand network using ssh c1teststand from inside martian, or from an outside network using the method mentioned in this wiki page , you land on the chiara (clone) computer, and you can navigate to any teststand network computer from there.

I'll be repurposing the 1U c1teststand computer into the new c1susaux2 slow machine now. All files from the home directory and from the /etc directory of the former c1teststand have been zipped and stored in /home/controls of chiara (clone). As an aside, the network configuration of the teststand can be done from inside the teststand network, by opening a browser on either fb1 (clone) or chiara (clone) and going to the address 10.0.1.1. The login and password are the same as our usual workstation username and password.

  16700   Fri Mar 4 11:04:34 2022 AnchalSummaryCDSc1susaux2 system setup and running

I took the c1teststand computer from the teststand and converted it into c1susaux2. To do so, I installed a fresh copy of Debian 10 on it and followed the steps on this wiki page. I did some parts slightly differently though. The directory /cvs/cds/caltech/c1susaux2 is a repository and contains the service unit file modbusIOC.service as well. A symbolic link is created at /etc/systemd/system to use this service file for creating the modbusIOC service. All db files are generated by parsing the acromag chassis wiring file using this python script.

The service is running without any errors now and all channels are available. The leftmost bench in the EEshop at 40m is now ready for LO1 slow controls and monitor testing. If someone gets time today, they can hook up an unused coil driver to the chassis and verify ENABLE switching and monitoring through the optical isolators. We can also drive some voltage on the PD monitors and verify the functioning of our ADCs. Once this test passes, it is straightforward to finish the wiring for the remaining 6 SOS and we would be good to install the chassis.

Attaching the wiring diagram of the c1susaux2 acromag chassis. Any comments/modification suggestions should come soon as we'll go ahead and wire it soon.

Note: While accessing channels using caget on c1susaux2, you might get a warning "Identical process variable names on multiple servers". You can safely ignore it. It just means that the channel is accessible on that particular computer via two different network interfaces (martian network eno1 and acromag subnetwork eno2) and it will just pick one of them.

Attachment 1: 40mBHD_C1SUSAUX2_Acromag_Chassis.pdf
40mBHD_C1SUSAUX2_Acromag_Chassis.pdf
  16702   Sat Mar 5 01:58:49 2022 KojiSummaryCDSpaola rescue

The ETMY end ThinkPad "paola" could not reboot due to a "Fan Error". It seemed to be a failure of the CPU fan. Since I really needed a functional laptop at the end for the electronics work, I decided to open the chassis. By removing the marked screws on the bottom lid, the keyboard could be lifted. I found that the CPU fan was stuck because of accumulated dust. Once the fan was cleaned, the laptop started up as before.

Attachment 1: PXL_20220305_035255834.jpg
PXL_20220305_035255834.jpg
Attachment 2: PXL_20220305_034649120.MP.jpg
PXL_20220305_034649120.MP.jpg
  16712   Mon Mar 7 19:38:47 2022 AnchalSummaryCDSc1susaux2 slow controls issues

I tried to perform a simple enabling test of the coils using the c1susaux2 modbus channels but failed. I'm able to enable the coils using the Windows GUI of the acromag card, but I cannot do it when the cards are connected to the computer subnetwork. The issue is two-fold:

  • The enable channels such as C1:SUS-LO1_UL_ENABLE do not change value when their DOL changes value. In this case, I created a calc channel C1:SUS-LO1_ALL_CALC which takes the AND of all the coils' individual CALC channels, which are normally used as the DOL for the ENABLE channels. But even though the changes are reflected properly in C1:SUS-LO1_ALL_CALC, they do not affect C1:SUS-LO1_UL_ENABLE. See the db files here for more info.
  • I tried to directly change the value of C1:SUS-LO1_UL_ENABLE using caput, and even though the soft value of the channel changes, the change does not propagate to the output of the Acromag card. So my suspicion is that something might be off with the settings of the Acromag card or the c1susaux2.cmd file. I followed the instructions on this wiki page, but if anyone can find an error, it would be useful.

There's also an issue in reading back the ENABLE_MON channels. Here we suspect that one of the optical isolator boxes that we have been using might have a short in one of its output channels. I'll investigate this more tomorrow. Again, the issue is two-fold: the EPICS channel values do not really change either, so there is clearly some issue in communicating with the acromag cards.

  16724   Mon Mar 14 12:20:05 2022 AnchalSummaryCDSc1susaux2 slow controls acromag chassis installed

[Anchal, Yehonathan, Ian]

We installed the c1susaux2 acromag chassis in 1Y0 with the c1susaux2 computer. We connected PD monitors, Binary inputs, Binary outputs, and Run/Acquire RTS signals for 6 of the 7 suspensions. We ran out of DB9 cables to connect PR3. Of the ones that were connected, LO2, AS1, AS4, SR2, and PR2 are showing no issues in functionality from the chassis. For LO1, everything is working except for the UR EnableMon channel. The enable monitor does not show an ON state for the coil even though the coil driver chassis shows that it is ON via the LED lights. A possible reason could be that a wire got disconnected when we closed the chassis (there are a lot of wires pushing against each other). Another reason could be that the optical isolator ISO10 has developed a bad channel on channel 2. The circuit was tested before closing the chassis, so we are not sure what went wrong after closing it.

PR2 is showing a non-acromag chassis related issue. As soon as we close the loop by enabling the coils, the watchdog triggers because the loop is unstable. Not sure what has changed for PR2, but someone should take a look at it.

For the issue with LO1, I suggest we keep a note that the C1:SUS-LO1_UR_ENABLEMon channel is faulty and don't take its value seriously. We should diagnose and fix this issue once we have more reasons to disconnect the chassis and open it.

 

Attachment 1: BHD_WatchDogs.png
BHD_WatchDogs.png
Attachment 2: 40mBHD_C1SUSAUX2_Acromag_Chassis.pdf
40mBHD_C1SUSAUX2_Acromag_Chassis.pdf
  16726   Tue Mar 15 11:52:34 2022 AnchalSummaryCDSc1su2 model updated for sending Run/Acquire Binary Output to Binary Interface card

I routed the XXX_COIL_DW signals from the 7 SOS blocks in c1su2.mdl (located at /cvs/cds/rtcds/userapps/trunk/sus/c1/models/c1su2.mdl) to the binary outputs from the FE model. The routing is done such that when these binary outputs are routed through the binary interface card mounted on 1Y0, they go to the acromag chassis just installed and from there they go to the binary inputs of the coil drivers together with the acromag controlled coil outputs.

I have not restarted the rtcds models yet. This needs more care, and the instructions from 40m/16533 need to be followed. I will do that sometime later, or Koji can follow up on this work.

Attachment 1: c1su2.pdf
c1su2.pdf
  16728   Tue Mar 15 14:10:41 2022 AnchalSummaryCDSc1su2 model remade, reinstalled, restarted after the update

I have restarted the c1su2 model with the connections of the Run/Acquire switch to the analog filters on the coil drivers. The following steps were taken:

First ssh to c1sus2 and then:

controls@c1sus2:~ 0$ rtcds make c1su2
buildd: /opt/rtcds/caltech/c1/rtbuild/release
### building c1su2...
Cleaning c1su2...
Done
Parsing the model c1su2...
Done
Building EPICS sequencers...
Done
Building front-end Linux kernel module c1su2...
Done
RCG source code directory:
/opt/rtcds/rtscore/branches/branch-3.4
The following files were used for this build:
/opt/rtcds/userapps/release/cds/common/models/lockin.mdl
/opt/rtcds/userapps/release/cds/common/models/rtbitget.mdl
/opt/rtcds/userapps/release/cds/common/models/rtdemod.mdl
/opt/rtcds/userapps/release/isc/common/models/QPD.mdl
/opt/rtcds/userapps/release/sus/c1/models/c1su2.mdl
/opt/rtcds/userapps/release/sus/c1/models/lib/sus_single_control.mdl

Successfully compiled c1su2
***********************************************
Compile Warnings, found in c1su2_warnings.log:
***********************************************
WARNING  *********** No connection to subsystem output named  SUS_DAC1_12  
WARNING  *********** No connection to subsystem output named  SUS_DAC1_13  
WARNING  *********** No connection to subsystem output named  SUS_DAC1_14  
WARNING  *********** No connection to subsystem output named  SUS_DAC1_15  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_7  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_8  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_9  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_10  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_11  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_12  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_13  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_14  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_15  
***********************************************
controls@c1sus2:~ 0$ rtcds install c1su2
buildd: /opt/rtcds/caltech/c1/rtbuild/release
### installing c1su2...
Installing system=c1su2 site=caltech ifo=C1,c1
Installing /opt/rtcds/caltech/c1/chans/C1SU2.txt
Installing /opt/rtcds/caltech/c1/target/c1su2/c1su2epics
Installing /opt/rtcds/caltech/c1/target/c1su2
Installing start and stop scripts
/opt/rtcds/caltech/c1/scripts/killc1su2
/opt/rtcds/caltech/c1/scripts/startc1su2
Performing install-daq
Updating testpoint.par config file
/opt/rtcds/caltech/c1/target/gds/param/testpoint.par
/opt/rtcds/rtscore/branches/branch-3.4/src/epics/util/updateTestpointPar.pl -par_file=/opt/rtcds/caltech/c1/target/gds/param/archive/testpoint_220315_135808.par -gds_node=26 -site_letter=C -system=c1su2 -host=c1sus2
Installing GDS node 26 configuration file
/opt/rtcds/caltech/c1/target/gds/param/tpchn_c1su2.par
Installing auto-generated DAQ configuration file
/opt/rtcds/caltech/c1/chans/daq/C1SU2.ini
Installing Epics MEDM screens
Running post-build script

/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-AS1_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS1_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-AS1_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS1_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-AS1_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS1_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-AS4_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS4_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-AS4_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS4_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-AS4_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS4_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-LO1_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO1_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-LO1_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO1_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-LO1_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO1_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-LO2_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO2_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-LO2_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO2_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-LO2_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO2_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-PR2_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR2_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-PR2_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR2_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-PR2_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR2_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-PR3_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR3_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-PR3_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR3_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-PR3_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR3_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-SR2_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_SR2_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-SR2_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_SR2_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-SR2_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_SR2_TO_COIL_KB.adl
safe.snap exists
controls@c1sus2:~ 0$
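
The compile warnings above are easier to read when grouped by DAC card. The following is just a sketch that assumes the warning lines look exactly like the ones printed above (as in c1su2_warnings.log); the function name is made up.

```python
import re
from collections import defaultdict

WARNING_RE = re.compile(r"No connection to subsystem output named\s+(\S+)")


def unconnected_outputs(log_text):
    """Group unconnected outputs such as SUS_DAC1_12 by card -> channel list."""
    grouped = defaultdict(list)
    for match in WARNING_RE.finditer(log_text):
        name = match.group(1)
        card, _, channel = name.rpartition("_")
        if card and channel.isdigit():
            grouped[card].append(int(channel))
        else:
            # Name does not end in a channel number; keep it as its own key.
            grouped[name]
    return {card: sorted(channels) for card, channels in grouped.items()}


LOG = (
    "WARNING  *********** No connection to subsystem output named  SUS_DAC1_12  \n"
    "WARNING  *********** No connection to subsystem output named  SUS_DAC1_13  \n"
    "WARNING  *********** No connection to subsystem output named  SUS_DAC2_7  \n"
)
print(unconnected_outputs(LOG))
```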

Then on rossa, run activateSUS2DQ.py which creates a file C1SU2.ini.NEW. Remove old backup file C1SU2.ini.bak, rename C1SU2.ini to C1SU2.ini.bak and rename C1SU2.ini.NEW to C1SU2.ini:

~> cd /opt/rtcds/caltech/c1/chans/daq/
daq>python2 activateSUS2DQ.py 
/opt/rtcds/caltech/c1/chans/daq/C1SU2.ini
daq>rm C1SU2.ini.bak
daq>mv C1SU2.ini C1SU2.ini.bak
daq>mv C1SU2.ini.NEW C1SU2.ini
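
The three file operations above could also be wrapped in a small Python helper so that the rotation is refused when the .NEW file is missing. This is only a sketch; the function name and its arguments are assumptions, and it is not part of activateSUS2DQ.py.

```python
from pathlib import Path


def rotate_ini(daq_dir, name="C1SU2.ini"):
    """Move <name> to <name>.bak and <name>.NEW to <name>.

    An existing .bak file is overwritten, which matches the
    'rm ...bak; mv ... .bak; mv ....NEW ...' sequence done by hand.
    """
    daq_dir = Path(daq_dir)
    current = daq_dir / name
    new = daq_dir / (name + ".NEW")
    backup = daq_dir / (name + ".bak")
    if not new.exists():
        raise FileNotFoundError("%s not found; run the activation script first" % new)
    if current.exists():
        current.replace(backup)  # replaces any old backup in one step
    new.replace(current)
    return current, backup
```

Called as rotate_ini("/opt/rtcds/caltech/c1/chans/daq") it would do the same as the commands above, leaving the previous ini as the backup.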

Then ssh back to c1sus2 and restart the rtcds model:

controls@c1sus2:~ 0$ rtcds restart c1su2
### stopping c1su2...
### starting c1su2...
c1su2epics: no process found
Number of ADC cards on bus = 2
Number of DAC16 cards on bus = 3
Number of DAC18 cards on bus = 0
Number of DAC20 cards on bus = 0
Specified filename iocC1.log does not exist.
c1su2epics C1 IOC Server started
c1su2 RT ready in 4
awg_server Version $Id$
channel_client Version $Id$
testpoint_server Version $Id$
/opt/rtcds/caltech/c1/target/gds/bin/awgtpman -s c1su2 -l /opt/rtcds/caltech/c1/target/gds/awgtpman_logs/c1su2.log started on host c1sus2 hostid ffffffffa8c05771 
awgtpman Version $Id$
controls@c1sus2:~ 0$

Then restart the daqd services from rossa and burtrestore to the latest snap of c1su2epics.snap:

daq>telnet fb 8083
Trying 192.168.113.201...
Connected to fb.martian.
Escape character is '^]'.
daqd> shutdown
OK
Connection closed by foreign host.
daq>burtgooey
>burtwb -f /opt/rtcds/caltech/c1/burt/autoburt/latest/c1su2epics.snap -l /tmp/controls_1220315_140755_0.write.log -o /tmp/controls_1220315_140755_0.nowrite.snap -v <
daq>

All suspensions are back online and everything is the same as before now. I will test the Run/Acquire switch functionality later.

  16734   Thu Mar 17 19:12:44 2022 AnchalSummaryCDSc1auxey1 slow controls acromag chassis installed, not powered

[Anchal, Tega]

We installed the c1auxey1 computer and the acromag chassis in 1Y4. The computer has been configured properly for the nfs mounts to happen, and we have initialized a git repo for the /cvs/cds/caltech/target/c1auxey1 directory, which stores all files for running the modbusIOC service on this computer. We connected the 18V power source but have not connected the 24V power yet, as we need to make a new connector for it. Following what Koji recommended, we'll connect the 24V power input to the 18V strip as well, since the acromags can run on that voltage too.

  16736   Fri Mar 18 18:39:13 2022 YehonathanSummaryCDSc1auxey1 slow controls acromag chassis installed, powered

{Yehonathan, Anchal}

We connected the c1auxey1 chassis to the different boxes (coil drivers, SAT amp, etc.) using DB9 cables, labeling them in the process. We ran out of 2.5 foot DB9 cables so we used 5 foot ones as a temporary solution.

The chassis was powered, but two issues arose:

1. The Acromags didn't turn on.

2. When connecting the green laser shutter BNC cable, the power supply overloaded.

We took the chassis back to the bench. The wire that powers the Acromags was disconnected. We made a new, longer wire and made sure it is not connected flimsily.

The issue with the BNC turned out to be a much deeper problem: the GND and EXC wires on the DIN rail connector were swapped! This put the shield of the BNC at a high voltage compared to the shield of the green shutter, causing excess current to flow when the BNC was connected.

We swapped the EXC and GND wires back. Since we could not trust the digital I/O tests that were done before because of this mistake, we tested some of the I/Os using a spare coil driver. We tested both the inputs and the outputs and they all seemed to work.

Finally, we also noticed that the 2 RTS DB9s were wrongly of the female type, so we switched them to males. We closed the lid on the chassis and installed it back in the rack. We connected the cables and saw that the green shutter BNC cable was no longer shorting the power supply.

  16737   Fri Mar 18 19:10:51 2022 AnchalSummaryCDSc1auxey1 slow controls issues

I started the modbusIOC service on c1auxey1 and added PD variance channels for UR and SD as well.  There are unfortunately two issues here:

  • The enable monitors are reading the logical NOT of what they should read. The optical isolator circuit might need to be changed.
  • ETMY is not damping now. This is strange, and it was also seen with the other acromag chassis, where AS4 and PR2 are unable to damp. This is weird since the acromag chassis are not part of the damping loop; maybe it is a coincidence. Next time we should check if we still have this issue when the acromag chassis is disconnected from ETMY.

 
