40m Log

Thread:
Wed Sep 29 17:10:09 2021, Anchal, Summary, CDS, c1teststand problems summary
Thu Sep 30 14:09:37 2021, Anchal, Summary, CDS, New way to ssh into c1teststand
Thu Mar 3 15:37:40 2022, Anchal, Summary, CDS, c1teststand restructured
Mon Oct 4 11:05:44 2021, Anchal, Summary, CDS, c1teststand problems summary
Mon Oct 4 18:00:16 2021, Koji, Summary, CDS, c1teststand problems summary
Tue Oct 5 17:58:52 2021, Anchal, Summary, CDS, c1teststand problems summary
Tue Oct 5 18:00:53 2021, Anchal, Summary, CDS, c1teststand time synchronization working now
Mon Oct 11 17:31:25 2021, Anchal, Summary, CDS, Fixed mounting of mx devices in fb. daqd_dc is running now.
Mon Oct 11 18:29:35 2021, Anchal, Summary, CDS, Moving forward?
Tue Oct 12 17:20:12 2021, Anchal, Summary, CDS, Connected c1sus2 to martian network
Tue Oct 12 23:42:56 2021, Koji, Summary, CDS, Connected c1sus2 to martian network
Wed Oct 13 11:25:14 2021, Anchal, Summary, CDS, Ran c1sus2 models in martian CDS. All good!
Tue Oct 19 18:20:33 2021, Ian MacMillan, Summary, CDS, c1sus2 DAC to ADC test
Tue Oct 19 23:43:09 2021, Koji, Summary, CDS, c1sus2 DAC to ADC test
Wed Oct 20 11:48:27 2021, Anchal, Summary, CDS, Power supple configured correctly.
Tue Oct 26 18:24:00 2021, Ian MacMillan, Summary, CDS, c1sus2 DAC to ADC test
Wed Dec 22 17:40:22 2021, Anchal, Summary, CDS, c1su2 model updated with SUS damping blocks for 7 SOSs
Wed Dec 29 20:09:40 2021, rana, Summary, CDS, c1su2 model updated with SUS damping blocks for 7 SOSs
Fri Mar 4 11:04:34 2022, Anchal, Summary, CDS, c1susaux2 system setup and running
Mon Mar 7 19:38:47 2022, Anchal, Summary, CDS, c1susaux2 slow controls issues
Mon Mar 14 12:20:05 2022, Anchal, Summary, CDS, c1susaux2 slow controls acromag chassis installed
Thu Mar 17 19:12:44 2022, Anchal, Summary, CDS, c1auxey1 slow controls acromag chassis installed, not powered
Fri Mar 18 18:39:13 2022, Yehonathan, Summary, CDS, c1auxey1 slow controls acromag chassis installed, powered
Fri Mar 18 19:10:51 2022, Anchal, Summary, CDS, c1auxey1 slow controls issues
Mon Mar 21 18:42:06 2022, Anchal, Summary, CDS, c1auxey1 slow controls issues
Mon Apr 4 17:03:47 2022, Anchal, Summary, CDS, c1susaux2 slow controls acromag chassis fixed and installed
Wed Jul 6 22:40:03 2022, Tega, Summary, CDS, Use osem variance to turn off SUS damping instead of coil outputs
Thu Jul 7 21:25:48 2022, Tega, Summary, CDS, Use osem variance to turn off SUS damping instead of coil outputs
Tue Mar 15 11:52:34 2022, Anchal, Summary, CDS, c1su2 model updated for sending Run/Acquire Binary Output to Binary Interface card
Tue Mar 15 14:10:41 2022, Anchal, Summary, CDS, c1su2 model remade, reinstalled, restarted after the update
Tue Oct 12 17:10:56 2021, Anchal, Summary, CDS, Some more information

Message ID: 16372
Entry time: Mon Oct 4 11:05:44 2021
In reply to: 16365
Reply to this: 16376, 16382
Author: Anchal
Type: Summary
Category: CDS
Subject: c1teststand problems summary

[Anchal, Paco]
We tried to fix the ntp synchronization in c1teststand today by repeating the steps listed in 40m/16302. The cloned fb1 now has exactly the same package version, conf and service files, and status as the martian fb1. Even so, the FE machines (c1bhd and c1sus2) still fail to synchronize their time, and systemd-timesyncd still reports the status 'Idle'. We also dug a bit deeper into the error messages of daqd_dc on the cloned fb1 and of mx_stream on the FE machines, and report them here.
Attempt at fixing ntp
- We copied the deb file for ntp package version 1:4.2.6 (/var/cache/apt/archives/ntp_1%3a4.2.6.p5+dfsg-7+deb8u3_amd64.deb) from the martian fb1 to the cloned fb1 and ran:
controls@fb1:~ 0$ sudo dpkg -i ntp_1%3a4.2.6.p5+dfsg-7+deb8u3_amd64.deb
- We got error messages about the missing dependencies libopts25 and libssl1.1. We downloaded the oldoldstable versions of these packages from here and here, and checked that these versions are newer than the versions ntp requires. We installed them with:
controls@fb1:~ 0$ sudo dpkg -i libopts25_5.18.12-3_amd64.deb
controls@fb1:~ 0$ sudo dpkg -i libssl1.1_1.1.0l-1~deb9u4_amd64.deb
- Then we installed the ntp package as described above. When it asked whether to keep the existing configuration file, we pressed Y.
- However, we decided to make the configuration and service files on the cloned fb1 exactly the same as on the martian fb1. We copied /etc/ntp.conf and /etc/systemd/system/ntp.service from the martian fb1 to the same locations on the cloned fb1. Then we enabled ntp, reloaded the systemd daemon, and restarted the ntp service:
controls@fb1:~ 0$ sudo systemctl enable ntp
controls@fb1:~ 0$ sudo systemctl daemon-reload
controls@fb1:~ 0$ sudo systemctl restart ntp
- But of course, since fb1 doesn't have internet access, the status of ntp.service shows errors:
controls@fb1:~ 0$ sudo systemctl status ntp
● ntp.service - NTP daemon (custom service)
Loaded: loaded (/etc/systemd/system/ntp.service; enabled)
Active: active (running) since Mon 2021-10-04 17:12:58 UTC; 1h 15min ago
Main PID: 26807 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/ntp.service
├─30408 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 105:107
└─30525 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 105:107
Oct 04 17:48:42 fb1 ntpd_intres[30525]: host name not found: 2.debian.pool.ntp.org
Oct 04 17:48:52 fb1 ntpd_intres[30525]: host name not found: 3.debian.pool.ntp.org
Oct 04 18:05:05 fb1 ntpd_intres[30525]: host name not found: 0.debian.pool.ntp.org
Oct 04 18:05:15 fb1 ntpd_intres[30525]: host name not found: 1.debian.pool.ntp.org
Oct 04 18:05:25 fb1 ntpd_intres[30525]: host name not found: 2.debian.pool.ntp.org
Oct 04 18:05:35 fb1 ntpd_intres[30525]: host name not found: 3.debian.pool.ntp.org
Oct 04 18:21:48 fb1 ntpd_intres[30525]: host name not found: 0.debian.pool.ntp.org
Oct 04 18:21:58 fb1 ntpd_intres[30525]: host name not found: 1.debian.pool.ntp.org
Oct 04 18:22:08 fb1 ntpd_intres[30525]: host name not found: 2.debian.pool.ntp.org
Oct 04 18:22:18 fb1 ntpd_intres[30525]: host name not found: 3.debian.pool.ntp.org
- However, ntpq gives the same output as on the martian fb1 (except for the source servers), showing that broadcasting happens in the same manner:
controls@fb1:~ 0$ ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
192.168.123.255 .BCST. 16 u - 64 0 0.000 0.000 0.000
- On the FE machines, though, systemd-timesyncd is still unable to read the time signal from fb1 and shows the status 'Idle':
controls@c1bhd:~ 3$ timedatectl
Local time: Mon 2021-10-04 18:34:38 UTC
Universal time: Mon 2021-10-04 18:34:38 UTC
RTC time: Mon 2021-10-04 18:34:38
Time zone: Etc/UTC (UTC, +0000)
NTP enabled: yes
NTP synchronized: no
RTC in local TZ: no
DST active: n/a
controls@c1bhd:~ 0$ systemctl status systemd-timesyncd -l
● systemd-timesyncd.service - Network Time Synchronization
Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled)
Active: active (running) since Mon 2021-10-04 17:21:29 UTC; 1h 13min ago
Docs: man:systemd-timesyncd.service(8)
Main PID: 244 (systemd-timesyn)
Status: "Idle."
CGroup: /system.slice/systemd-timesyncd.service
└─244 /lib/systemd/systemd-timesyncd
- So time synchronization is still not working. We expected the FE machines to synchronize to fb1 even though fb1 doesn't have any upstream ntp server to synchronize to, but that didn't happen.
- I (Anchal) am working on getting internet access for the c1teststand computers.
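The dependency check in the steps above ("these versions are newer than the versions ntp requires") uses Debian's version ordering, which is neither a plain string comparison nor a plain numeric one. An epoch such as `1:` outranks everything after it, and `~` sorts before even the end of a string, so `1.1.0l-1~deb9u4` is older than `1.1.0l-1`. On a Debian machine, `dpkg --compare-versions A ge B` performs this check directly. The Python below is a minimal sketch that re-implements dpkg's comparison rules for illustration only. The helper names (`compare_versions`, `satisfies`) are hypothetical, and any minimum versions passed to them are placeholders, not the actual `Depends:` of the ntp package.

```python
import string

DIGITS = set(string.digits)        # a set, so "" (end of string) is never a digit
LETTERS = set(string.ascii_letters)


def _at(s: str, k: int) -> str:
    """Character at index k, or "" past the end (plays the role of C's NUL)."""
    return s[k] if k < len(s) else ""


def _is_nondigit(c: str) -> bool:
    return c != "" and c not in DIGITS


def _order(c: str) -> int:
    """Weight of one character in a non-digit run, following dpkg's rules."""
    if c == "" or c in DIGITS:
        return 0                   # end of string or start of a digit run
    if c in LETTERS:
        return ord(c)              # letters sort by ASCII value
    if c == "~":
        return -1                  # '~' sorts before everything, even the end
    return ord(c) + 256            # other symbols sort after all letters


def _verrevcmp(a: str, b: str) -> int:
    """Compare upstream (or revision) strings as alternating non-digit/digit runs."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run: compare character weights one at a time.
        while _is_nondigit(_at(a, i)) or _is_nondigit(_at(b, j)):
            ac, bc = _order(_at(a, i)), _order(_at(b, j))
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: skip leading zeros, then compare as numbers.
        while _at(a, i) == "0":
            i += 1
        while _at(b, j) == "0":
            j += 1
        first_diff = 0
        while _at(a, i) in DIGITS and _at(b, j) in DIGITS:
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if _at(a, i) in DIGITS:
            return 1               # a's number has more digits, so it is larger
        if _at(b, j) in DIGITS:
            return -1
        if first_diff:
            return first_diff
    return 0


def parse_version(version: str) -> tuple:
    """Split '[epoch:]upstream[-revision]' into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Negative if a < b, zero if equal, positive if a > b (dpkg order)."""
    ea, ua, ra = parse_version(a)
    eb, ub, rb = parse_version(b)
    if ea != eb:
        return ea - eb
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)


def satisfies(installed: str, minimum: str) -> bool:
    """True if `installed` meets a '>= minimum' dependency."""
    return compare_versions(installed, minimum) >= 0
```

For example, `satisfies("1.1.0l-1~deb9u4", "1.1.0")` is true for a hypothetical minimum of `1.1.0`, while `compare_versions("1.1.0l-1~deb9u4", "1.1.0l-1")` is negative because of the `~`.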
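To check the sync state of c1bhd or c1sus2 without reading the full printout, one can parse the `Key: value` lines that `timedatectl` prints, as in the c1bhd output above. The Python below is a minimal sketch that assumes the output format shown in this entry. The function names are hypothetical. The older systemd on these machines labels the flag `NTP synchronized`, whereas newer systemd versions print `System clock synchronized` instead, so the sketch accepts either label.

```python
import subprocess


def parse_timedatectl(output: str) -> dict:
    """Turn the 'Key: value' lines of timedatectl output into a dict."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, so times like 18:34:38 stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def is_synchronized(info: dict) -> bool:
    """Read the sync flag under either its old or its newer systemd label."""
    flag = info.get("NTP synchronized", info.get("System clock synchronized", ""))
    return flag.lower() == "yes"


def check_host() -> bool:
    """Run timedatectl on the local machine and report whether its clock is synced."""
    result = subprocess.run(["timedatectl"], capture_output=True, text=True, check=True)
    return is_synchronized(parse_timedatectl(result.stdout))
```

Fed the c1bhd printout above, `is_synchronized(parse_timedatectl(...))` returns `False`, which matches `NTP synchronized: no`.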
Digging into mx_stream/daqd_dc errors:
- We changed the Restart field in /etc/systemd/system/daqd_dc.service on the cloned fb1 to 2. This lets the service fail and stop restarting after two attempts, so we see the real error message instead of systemd's message that the service is restarting too often. We got the following:
controls@fb1:~ 3$ sudo systemctl status daqd_dc -l
● daqd_dc.service - Advanced LIGO RTS daqd data concentrator
Loaded: loaded (/etc/systemd/system/daqd_dc.service; enabled)
Active: failed (Result: exit-code) since Mon 2021-10-04 17:50:25 UTC; 22s ago
Process: 715 ExecStart=/usr/bin/daqd_dc_mx -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.dc (code=exited, status=1/FAILURE)
Main PID: 715 (code=exited, status=1/FAILURE)
Oct 04 17:50:24 fb1 systemd[1]: Started Advanced LIGO RTS daqd data concentrator.
Oct 04 17:50:25 fb1 daqd_dc_mx[715]: [Mon Oct 4 17:50:25 2021] Unable to set to nice = -20 -error Unknown error -1
Oct 04 17:50:25 fb1 daqd_dc_mx[715]: Failed to do mx_get_info: MX not initialized.
Oct 04 17:50:25 fb1 daqd_dc_mx[715]: 263596
Oct 04 17:50:25 fb1 systemd[1]: daqd_dc.service: main process exited, code=exited, status=1/FAILURE
Oct 04 17:50:25 fb1 systemd[1]: Unit daqd_dc.service entered failed state.
- It seemed that the only thing the daqd_dc process objects to is that the mx_stream services are in a failed state on the FE computers. So we made the same change to mx_stream.service in the FE machines' diskless root on fb1 to get the real error messages:
controls@fb1:~ 0$ sudo chroot /diskless/root
fb1:/ 0#
fb1:/ 0# sudo nano /etc/systemd/system/mx_stream.service
fb1:/ 0#
fb1:/ 0# exit
- Then I ssh'ed into c1bhd to see the mx_stream error message properly:
controls@c1bhd:~ 0$ sudo systemctl daemon-reload
controls@c1bhd:~ 0$ sudo systemctl restart mx_stream
controls@c1bhd:~ 0$ sudo systemctl status mx_stream -l
● mx_stream.service - Advanced LIGO RTS front end mx stream
Loaded: loaded (/etc/systemd/system/mx_stream.service; enabled)
Active: failed (Result: exit-code) since Mon 2021-10-04 17:57:20 UTC; 24s ago
Process: 11832 ExecStart=/etc/mx_stream_exec (code=exited, status=1/FAILURE)
Main PID: 11832 (code=exited, status=1/FAILURE)
Oct 04 17:57:20 c1bhd systemd[1]: Starting Advanced LIGO RTS front end mx stream...
Oct 04 17:57:20 c1bhd systemd[1]: Started Advanced LIGO RTS front end mx stream.
Oct 04 17:57:20 c1bhd mx_stream_exec[11832]: send len = 263596
Oct 04 17:57:20 c1bhd mx_stream_exec[11832]: OMX: Failed to find peer index of board 00:00:00:00:00:00 (Peer Not Found in the Table)
Oct 04 17:57:20 c1bhd mx_stream_exec[11832]: mx_connect failed Nic ID not Found in Peer Table
Oct 04 17:57:20 c1bhd mx_stream_exec[11832]: c1x06_daq mmapped address is 0x7f516a97a000
Oct 04 17:57:20 c1bhd mx_stream_exec[11832]: c1bhd_daq mmapped address is 0x7f516697a000
Oct 04 17:57:20 c1bhd systemd[1]: mx_stream.service: main process exited, code=exited, status=1/FAILURE
Oct 04 17:57:20 c1bhd systemd[1]: Unit mx_stream.service entered failed state.
- c1sus2 shows the same error. I don't understand these errors at all, but they seem to have nothing to do with the timing issues!
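In both failures above, the useful information sits in the last few journal lines of `systemctl status -l`. It can therefore help to extract only the messages written by the program itself (`daqd_dc_mx`, `mx_stream_exec`) and skip the `systemd[1]` bookkeeping lines. The Python below is a minimal sketch that does this with a regular expression. It assumes the `Mon DD HH:MM:SS host process[pid]: message` line format shown above. The function names are hypothetical, and the keyword list used to flag suspicious lines is an arbitrary heuristic that the CDS software does not define.

```python
import re

# Matches journal lines such as
# "Oct 04 17:57:20 c1bhd mx_stream_exec[11832]: send len = 263596".
LOG_LINE = re.compile(
    r"^(?P<time>[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<proc>[^\s\[]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)

# Heuristic keywords (a hypothetical choice) that mark a message as worth reading.
ERROR_HINTS = ("fail", "error", "not initialized", "not found")


def program_messages(status_output: str, skip=("systemd",)) -> list:
    """Return (host, process, message) for journal lines not written by `skip`."""
    found = []
    for line in status_output.splitlines():
        m = LOG_LINE.match(line.strip())
        if m and m.group("proc") not in skip:
            found.append((m.group("host"), m.group("proc"), m.group("msg")))
    return found


def suspicious(messages: list) -> list:
    """Keep only the messages that contain one of the ERROR_HINTS keywords."""
    return [
        entry for entry in messages
        if any(hint in entry[2].lower() for hint in ERROR_HINTS)
    ]
```

Applied to the c1bhd output above, `suspicious(program_messages(...))` keeps just the `OMX: Failed to find peer index...` and `mx_connect failed...` lines. On the fb1 output, it keeps the `nice` warning and `MX not initialized`.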
As usual, any help would be appreciated.