MX/OpenMX network running
Today I got the mx/open-mx networking working for the front ends. This required some tweaking to the network interface configuration for the diskless front ends, and recompiling mx and open-mx for the newer kernel. Again, this will all be documented.
controls@fb1:~ 0$ /opt/mx/bin/mx_info
MX Version: 1.2.16
MX Build: root@fb1:/opt/src/mx-1.2.16 Mon Jul 24 11:33:57 PDT 2017
1 Myrinet board installed.
The MX driver is configured to support a maximum of:
8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
===================================================================
Instance #0: 364.4 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
Status: Running, P0: Link Up
Network: Ethernet 10G
MAC Address: 00:60:dd:43:74:62
Product code: 10G-PCIE-8B-S
Part number: 09-04228
Serial number: 485052
Mapper: 00:60:dd:43:74:62, version = 0x00000000, configured
Mapped hosts: 6
ROUTE COUNT
INDEX MAC ADDRESS HOST NAME P0
----- ----------- --------- ---
0) 00:60:dd:43:74:62 fb1:0 1,0
1) 00:30:48:be:11:5d c1iscex:0 1,0
2) 00:30:48:bf:69:4f c1lsc:0 1,0
3) 00:25:90:0d:75:bb c1sus:0 1,0
4) 00:30:48:d6:11:17 c1iscey:0 1,0
5) 00:14:4f:40:64:25 c1ioo:0 1,0
controls@fb1:~ 0$
c1ioo timing glitches fixed
I also checked the BIOS on c1ioo and found that the serial port was enabled, which is known to cause timing glitches. I turned off the serial port (and some power management stuff), and rebooted, and all the c1ioo timing glitches seem to have gone away.
It's unclear why this is a problem that's just showing up now. Serial ports have always been a problem, so it seems unlikely this is just a problem with the newer kernel. Could the BIOS have somehow been reset during the power glitch?
In any event, all the front ends are now booting cleanly, with all dolphin and mx networking coming up automatically, and all models running stably:

Now for daqd... |