I'm seeing what appear to be occasional failures of mx_stream on the front end machines. They don't happen frequently, but I've noticed them a couple of times already since the upgrade.
The symptom is that the DC Status goes to "0xbad" (red) and the "FE NET" goes red for all models on a given front end.
The solution seems to be restarting mx_stream on the given front end: sudo /etc/init.d/mx_stream restart
There is nothing informative in the mx_stream log:
controls@c1sus ~ 0$ cat /opt/rtcds/caltech/c1/target/fb/mx_stream_logs/c1sus.log
c1x02
c1sus
c1mcs
c1rfm
c1pem
mmapped address is 0x7f43740ec000
mapped at 0x7f43740ec000
mmapped address is 0x7f43700ec000
mapped at 0x7f43700ec000
mmapped address is 0x7f436c0ec000
mapped at 0x7f436c0ec000
mmapped address is 0x7f43680ec000
mapped at 0x7f43680ec000
mmapped address is 0x7f43640ec000
mapped at 0x7f43640ec000
send len = 263596
Connection Made
However, I do see some suspicious messages in the front end dmesg:
[200341.317912] DXH Adapter 0 : Heartbeat alive-check for node=12 failed (cnt=8387 state=0x1 deb=0 val=0).
[200341.318670] DXH Adapter 0 : Session for node 12 is disabled - Status = 0x5
[200341.319062] Session callback reason=1 status=5 target_node=12
[200341.319069] Session callback reason=3 status=0 target_node=12
[200341.359534] (map_table_check_access:752):my id 1 -> remote id 2 : entry was valid - is now tentatively valid
[200341.859584] DXH Adapter 0 : Probe failure for node=12 - disabling session probeStatus=0x40000f02
[200341.860335] DXH Adapter 0 : Session for node 12 is disabled - Status = 0x3
[200341.860728] Session callback reason=1 status=3 target_node=12
[200374.006111] DXH Adapter 0 : Set reachable remote node list.
[200409.020670] DXH Adapter 0 : Set reachable remote node list.
[200409.021076] DXH Adapter 0 : Session for node 12 is deleted - Status = 0x0
[200409.021468] Session callback reason=5 status=0 target_node=12
[200412.362824] (map_table_insert:648):** successfully inserted **(valid unicast) inst 0 node 1->0 fwd 0 fwd_tp 4 egress 0
[200418.025994] (map_table_check_access:752):my id 1 -> remote id 0 : entry was valid - is now invalid
[200418.025998] (map_table_insert:648):** successfully inserted **(valid unicast) inst 0 node 1->2 fwd 0 fwd_tp 4 egress 0
[200421.743916] Session callback reason=0 status=0 target_node=12
[200422.073776] DXH Adapter 0 : Set reachable remote node list.
[200422.342446] Session callback reason=7 status=0 target_node=12
[200422.342454] DXH Adapter 0 : Session for node 12 is ok.
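To check whether these session drops line up in time with the mx_stream failures, the dmesg output could be reduced to a list of outage intervals per node. The following is only a minimal sketch: the function names session_events and outages are hypothetical, the line format is assumed from the excerpt above rather than from any driver documentation, and it is not yet established that the DXH session drops are related to the mx_stream problem at all.

```python
import re

# Assumed line format, taken only from the dmesg excerpt above, e.g.:
# [200341.318670] DXH Adapter 0 : Session for node 12 is disabled - Status = 0x5
SESSION_RE = re.compile(
    r"^\[\s*(?P<t>\d+\.\d+)\]\s+DXH Adapter (?P<adapter>\d+) : "
    r"Session for node (?P<node>\d+) is (?P<state>\w+)"
)


def session_events(dmesg_text):
    """Return (uptime_seconds, adapter, node, state) for each session message."""
    events = []
    for line in dmesg_text.splitlines():
        m = SESSION_RE.match(line.strip())
        if m:
            events.append(
                (float(m["t"]), int(m["adapter"]), int(m["node"]), m["state"])
            )
    return events


def outages(events):
    """Pair the first 'disabled' message with the next 'ok' for the same node.

    Returns a list of (adapter, node, t_down, t_up) tuples, with times in
    seconds since boot as printed by dmesg.
    """
    down_since = {}
    result = []
    for t, adapter, node, state in events:
        key = (adapter, node)
        if state == "disabled" and key not in down_since:
            down_since[key] = t
        elif state == "ok" and key in down_since:
            result.append((adapter, node, down_since.pop(key), t))
    return result
```

Applied to the excerpt above, this yields a single outage for node 12 on adapter 0, from 200341.318670 to 200422.342454, i.e. about 81 s. The dmesg timestamps are seconds since boot, so they would still need to be converted to wall-clock time before comparing against the moment the DC Status went red.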
I'm awaiting feedback from experts.