40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log  Not logged in ELOG logo
Entry  Wed Sep 29 17:10:09 2021, Anchal, Summary, CDS, c1teststand problems summary c1teststand_issues_summary.pdf
    Reply  Thu Sep 30 14:09:37 2021, Anchal, Summary, CDS, New way to ssh into c1teststand 
    Reply  Mon Oct 4 11:05:44 2021, Anchal, Summary, CDS, c1teststand problems summary 
       Reply  Mon Oct 4 18:00:16 2021, Koji, Summary, CDS, c1teststand problems summary 
          Reply  Tue Oct 5 17:58:52 2021, Anchal, Summary, CDS, c1teststand problems summary 
       Reply  Tue Oct 5 18:00:53 2021, Anchal, Summary, CDS, c1teststand time synchronization working now 
          Reply  Mon Oct 11 17:31:25 2021, Anchal, Summary, CDS, Fixed mounting of mx devices in fb. daqd_dc is running now. 
             Reply  Mon Oct 11 18:29:35 2021, Anchal, Summary, CDS, Moving forward? 
                Reply  Tue Oct 12 17:20:12 2021, Anchal, Summary, CDS, Connected c1sus2 to martian network 
                   Reply  Tue Oct 12 23:42:56 2021, Koji, Summary, CDS, Connected c1sus2 to martian network 
                      Reply  Wed Oct 13 11:25:14 2021, Anchal, Summary, CDS, Ran c1sus2 models in martian CDS. All good! CDS_screens_running.png
                         Reply  Tue Oct 19 18:20:33 2021, Ian MacMillan, Summary, CDS, c1sus2 DAC to ADC test data3_Plots.pdfdata2_Plots.pdf
                            Reply  Tue Oct 19 23:43:09 2021, Koji, Summary, CDS, c1sus2 DAC to ADC test P_20211019_224433.jpgP_20211019_224122.jpgP_20211019_224400.jpgP_20211019_224411.jpg
                               Reply  Wed Oct 20 11:48:27 2021, Anchal, Summary, CDS, Power supple configured correctly. 
                               Reply  Tue Oct 26 18:24:00 2021, Ian MacMillan, Summary, CDS, c1sus2 DAC to ADC test data2_Plots.pdfdata3_Plots.pdf
             Reply  Tue Oct 12 17:10:56 2021, Anchal, Summary, CDS, Some more information 
Message ID: 16391     Entry time: Mon Oct 11 17:31:25 2021     In reply to: 16382     Reply to this: 16392   16395
Author: Anchal 
Type: Summary 
Category: CDS 
Subject: Fixed mounting of mx devices in fb. daqd_dc is running now. 
 
 

However, lspci | grep 'Myri' shows following output on both computers:

controls@fb1:/dev 0$ lspci | grep 'Myri'
02:00.0 Ethernet controller: MYRICOM Inc. Myri-10G Dual-Protocol NIC (rev 01)

Which means that the computer detects the card on PCie slot.

 

I tried to add this to /etc/rc.local to run this script at every boot, but it did not work. So for now, I'll just manually do this step everytime. Once the devices are loaded, we get:

controls@fb1:/etc 0$ ls /dev/*mx*
/dev/mx0  /dev/mx4  /dev/mxctl   /dev/mxp2  /dev/mxp6         /dev/ptmx
/dev/mx1  /dev/mx5  /dev/mxctlp  /dev/mxp3  /dev/mxp7
/dev/mx2  /dev/mx6  /dev/mxp0    /dev/mxp4  /dev/open-mx
/dev/mx3  /dev/mx7  /dev/mxp1    /dev/mxp5  /dev/open-mx-raw

The, restarting all daqd_ processes, I found that daqd_dc was running succesfully now. Here is the status:

controls@fb1:/etc 0$ sudo systemctl status daqd_* -l
● daqd_dc.service - Advanced LIGO RTS daqd data concentrator
   Loaded: loaded (/etc/systemd/system/daqd_dc.service; enabled)
   Active: active (running) since Mon 2021-10-11 17:48:00 PDT; 23min ago
 Main PID: 2308 (daqd_dc_mx)
   CGroup: /daqd.slice/daqd_dc.service
           ├─2308 /usr/bin/daqd_dc_mx -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.dc
           └─2370 caRepeater

Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: mx receiver 006 thread priority error Operation not permitted[Mon Oct 11 17:48:06 2021]
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: mx receiver 005 thread put on CPU 0
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:48:06 2021] [Mon Oct 11 17:48:06 2021] mx receiver 006 thread put on CPU 0
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: mx receiver 007 thread put on CPU 0
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:48:06 2021] mx receiver 003 thread - label dqmx003 pid=2362
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:48:06 2021] mx receiver 003 thread priority error Operation not permitted
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:48:06 2021] mx receiver 003 thread put on CPU 0
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: warning:regcache incompatible with malloc
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:48:06 2021] EDCU has 410 channels configured; first=0
Oct 11 17:49:06 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:49:06 2021] ->4: clear crc

● daqd_fw.service - Advanced LIGO RTS daqd frame writer
   Loaded: loaded (/etc/systemd/system/daqd_fw.service; enabled)
   Active: active (running) since Mon 2021-10-11 17:48:01 PDT; 23min ago
 Main PID: 2318 (daqd_fw)
   CGroup: /daqd.slice/daqd_fw.service
           └─2318 /usr/bin/daqd_fw -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.fw

Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] [Mon Oct 11 17:48:09 2021] Producer thread - label dqproddbg pid=2440
Oct 11 17:48:09 fb1 daqd_fw[2318]: Producer crc thread priority error Operation not permitted
Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] [Mon Oct 11 17:48:09 2021] Producer crc thread put on CPU 0
Oct 11 17:48:09 fb1 daqd_fw[2318]: Producer thread priority error Operation not permitted
Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] Producer thread put on CPU 0
Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] Producer thread - label dqprod pid=2434
Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] Producer thread priority error Operation not permitted
Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] Producer thread put on CPU 0
Oct 11 17:48:10 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:10 2021] Minute trender made GPS time correction; gps=1318034906; gps%60=26
Oct 11 17:49:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:49:09 2021] ->3: clear crc

● daqd_rcv.service - Advanced LIGO RTS daqd testpoint receiver
   Loaded: loaded (/etc/systemd/system/daqd_rcv.service; enabled)
   Active: active (running) since Mon 2021-10-11 17:48:00 PDT; 23min ago
 Main PID: 2311 (daqd_rcv)
   CGroup: /daqd.slice/daqd_rcv.service
           └─2311 /usr/bin/daqd_rcv -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.rcv

Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1X07_CRC_SUM
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1BHD_STATUS
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1BHD_CRC_CPS
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1BHD_CRC_SUM
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1SU2_STATUS
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1SU2_CRC_CPS
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1SU2_CRC_SUM
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1OM[Mon Oct 11 17:50:21 2021] Epics server started
Oct 11 17:50:24 fb1 daqd_rcv[2311]: [Mon Oct 11 17:50:24 2021] Minute trender made GPS time correction; gps=1318035040; gps%120=40
Oct 11 17:51:21 fb1 daqd_rcv[2311]: [Mon Oct 11 17:51:21 2021] ->3: clear crc

Now, even before starting teh FE models, I see DC status as ox2bad in the CDS screens of the IOP and user models. The mx_stream service remains in a failed state at teh FE machines and remain the same even after restarting the service.

controls@c1sus2:~ 0$ sudo systemctl status mx_stream -l
● mx_stream.service - Advanced LIGO RTS front end mx stream
   Loaded: loaded (/etc/systemd/system/mx_stream.service; enabled)
   Active: failed (Result: exit-code) since Mon 2021-10-11 17:50:26 PDT; 15min ago
  Process: 382 ExecStart=/etc/mx_stream_exec (code=exited, status=1/FAILURE)
 Main PID: 382 (code=exited, status=1/FAILURE)

Oct 11 17:50:25 c1sus2 systemd[1]: Starting Advanced LIGO RTS front end mx stream...
Oct 11 17:50:25 c1sus2 systemd[1]: Started Advanced LIGO RTS front end mx stream.
Oct 11 17:50:25 c1sus2 mx_stream_exec[382]: Failed to open endpoint Not initialized
Oct 11 17:50:26 c1sus2 systemd[1]: mx_stream.service: main process exited, code=exited, status=1/FAILURE
Oct 11 17:50:26 c1sus2 systemd[1]: Unit mx_stream.service entered failed state.

But  if I restart the mx_stream service before starting the rtcds models, the mx-stream service starts succesfully:

controls@c1sus2:~ 0$ sudo systemctl restart mx_stream
controls@c1sus2:~ 0$ sudo systemctl status mx_stream -l
● mx_stream.service - Advanced LIGO RTS front end mx stream
   Loaded: loaded (/etc/systemd/system/mx_stream.service; enabled)
   Active: active (running) since Mon 2021-10-11 18:14:13 PDT; 25s ago
 Main PID: 1337 (mx_stream)
   CGroup: /system.slice/mx_stream.service
           └─1337 /usr/bin/mx_stream -e 0 -r 0 -w 0 -W 0 -s c1x07 c1su2 -d fb1:0

Oct 11 18:14:13 c1sus2 systemd[1]: Starting Advanced LIGO RTS front end mx stream...
Oct 11 18:14:13 c1sus2 systemd[1]: Started Advanced LIGO RTS front end mx stream.
Oct 11 18:14:13 c1sus2 mx_stream_exec[1337]: send len = 263596
Oct 11 18:14:13 c1sus2 mx_stream_exec[1337]: Connection Made

However, the DC status on CDS screens still show 0x2bad. As soon as I start the rtcds model c1x07 (the IOP model for c1sus2), the mx_stream service fails:

controls@c1sus2:~ 0$ sudo systemctl status mx_stream -l
● mx_stream.service - Advanced LIGO RTS front end mx stream
   Loaded: loaded (/etc/systemd/system/mx_stream.service; enabled)
   Active: failed (Result: exit-code) since Mon 2021-10-11 18:18:03 PDT; 27s ago
  Process: 1337 ExecStart=/etc/mx_stream_exec (code=exited, status=1/FAILURE)
 Main PID: 1337 (code=exited, status=1/FAILURE)

Oct 11 18:14:13 c1sus2 systemd[1]: Starting Advanced LIGO RTS front end mx stream...
Oct 11 18:14:13 c1sus2 systemd[1]: Started Advanced LIGO RTS front end mx stream.
Oct 11 18:14:13 c1sus2 mx_stream_exec[1337]: send len = 263596
Oct 11 18:14:13 c1sus2 mx_stream_exec[1337]: Connection Made
Oct 11 18:18:03 c1sus2 mx_stream_exec[1337]: isendxxx failed with status Remote Endpoint Unreachable
Oct 11 18:18:03 c1sus2 mx_stream_exec[1337]: disconnected from the sender
Oct 11 18:18:03 c1sus2 mx_stream_exec[1337]: c1x07_daq mmapped address is 0x7fe3620c3000
Oct 11 18:18:03 c1sus2 mx_stream_exec[1337]: c1su2_daq mmapped address is 0x7fe35e0c3000
Oct 11 18:18:03 c1sus2 systemd[1]: mx_stream.service: main process exited, code=exited, status=1/FAILURE
Oct 11 18:18:03 c1sus2 systemd[1]: Unit mx_stream.service entered failed state.

This shows that the start of rtcds model, causes the fail in mx_stream, possibly due to inability of finding the endpoint on fb1. I've again reached to the edge of my knowledge here. Maybe the fiber optic connection between fb and the network switch that connects to FE is bad, or the connection between switch and FEs is bad.

But we are just one step away from making this work.

 

 

ELOG V3.1.3-