Chris pointed out some scripts that display whether the DAQ network is working or not. I thought it would be nice to log their output here as well.
controls@fb1:/opt/mx/bin 0$ ./mx_info
MX Version: 1.2.16
MX Build: controls@fb1:/opt/src/mx-1.2.16 Mon Aug 14 11:06:09 PDT 2017
1 Myrinet board installed.
The MX driver is configured to support a maximum of:
8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
===================================================================
Instance #0: 364.4 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
Status: Running, P0: Link Up
Network: Ethernet 10G
MAC Address: 00:60:dd:45:37:86
Product code: 10G-PCIE-8B-S
Part number: 09-04228
Serial number: 423340
Mapper: 00:60:dd:45:37:86, version = 0x00000000, configured
Mapped hosts: 3
                                          ROUTE COUNT
INDEX  MAC ADDRESS        HOST NAME       P0
-----  -----------        ---------       ---
   0)  00:60:dd:45:37:86  fb1:0           1,0
   1)  00:25:90:05:ab:47  c1bhd:0         1,0
   2)  00:25:90:06:69:c3  c1sus2:0        1,0
controls@c1bhd:~ 1$ /opt/open-mx/bin/omx_info
Open-MX version 1.5.4
build: root@fb1:/opt/src/open-mx-1.5.4 Tue Aug 15 23:48:03 UTC 2017
Found 1 boards (32 max) supporting 32 endpoints each:
c1bhd:0 (board #0 name eth1 addr 00:25:90:05:ab:47)
managed by driver 'igb'
Peer table is ready, mapper is 00:60:dd:45:37:86
================================================
0) 00:25:90:05:ab:47 c1bhd:0
1) 00:60:dd:45:37:86 fb1:0
2) 00:25:90:06:69:c3 c1sus2:0
controls@c1sus2:~ 0$ /opt/open-mx/bin/omx_info
Open-MX version 1.5.4
build: root@fb1:/opt/src/open-mx-1.5.4 Tue Aug 15 23:48:03 UTC 2017
Found 1 boards (32 max) supporting 32 endpoints each:
c1sus2:0 (board #0 name eth1 addr 00:25:90:06:69:c3)
managed by driver 'igb'
Peer table is ready, mapper is 00:60:dd:45:37:86
================================================
0) 00:25:90:06:69:c3 c1sus2:0
1) 00:60:dd:45:37:86 fb1:0
2) 00:25:90:05:ab:47 c1bhd:0
These outputs show that the framebuilder (fb1) and the FEs (c1bhd, c1sus2) all list each other in their peer tables, i.e. they are able to see each other in the DAQ network.
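The comparison of the peer tables above was done by eye. A minimal sketch of how it could be automated is given below, assuming the mx_info / omx_info output has already been captured as text. The function names parse_peers and tables_agree and the sample strings are made up for illustration and are not part of MX or Open-MX; the sketch only relies on the "index) MAC host" row format seen in the outputs above.

```python
import re

# Matches peer-table rows such as "0) 00:60:dd:45:37:86 fb1:0 1,0".
# Lines like "Mapper: 00:60:dd:..." are skipped because they lack the "N)" index.
PEER_ROW = re.compile(
    r"^\s*\d+\)\s+((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\s+(\S+)", re.IGNORECASE
)


def parse_peers(info_text):
    """Return {mac: hostname} for every peer-table row in the captured text."""
    peers = {}
    for line in info_text.splitlines():
        match = PEER_ROW.match(line)
        if match:
            peers[match.group(1).lower()] = match.group(2)
    return peers


def tables_agree(outputs):
    """outputs maps a host label to its captured text.

    True only if every host has a non-empty peer table and all tables
    contain the same MAC -> hostname entries (row order is ignored).
    """
    tables = [parse_peers(text) for text in outputs.values()]
    return bool(tables) and all(tables) and all(t == tables[0] for t in tables)


# Peer-table rows copied from the outputs logged above (hypothetical variable names).
FB1_SAMPLE = """Mapper: 00:60:dd:45:37:86, version = 0x00000000, configured
0) 00:60:dd:45:37:86 fb1:0 1,0
1) 00:25:90:05:ab:47 c1bhd:0 1,0
2) 00:25:90:06:69:c3 c1sus2:0 1,0"""

C1BHD_SAMPLE = """Peer table is ready, mapper is 00:60:dd:45:37:86
0) 00:25:90:05:ab:47 c1bhd:0
1) 00:60:dd:45:37:86 fb1:0
2) 00:25:90:06:69:c3 c1sus2:0"""

if __name__ == "__main__":
    print(tables_agree({"fb1": FB1_SAMPLE, "c1bhd": C1BHD_SAMPLE}))
```

Note that agreement of the peer tables only says the hosts have mapped each other; it says nothing about whether an endpoint is actually reachable.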
Further, the error that we see when the IOP model is started, which crashes the mx_stream service on the FE machines (see 40m/16391), is:
isendxxx failed with status Remote Endpoint Unreachable
This has been seen earlier, when Jamie was troubleshooting the current fb1 in the martian network in 40m/11655 in Oct 2015. Unfortunately, I could not find what Jamie did over a year to fix this issue.