Message ID: 6574     Entry time: Thu Apr 26 18:15:59 2012
Author: Jamie 
Type: Update 
Category: CDS 
Subject: possible issue with mx_stream on front ends 

I'm noticing what appear to be occasional failures of mx_stream on the front end machines.  It doesn't happen often, but I've seen it a couple of times already since the upgrade.

The symptom is that the DC Status goes to "0xbad" (red) and the "FE NET" goes red for all models on a given front end.

The solution seems to be restarting mx_stream on the given front end:

 sudo /etc/init.d/mx_stream restart
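
For anyone who wants to script the restart, here is a minimal Python sketch. It assumes it is run directly on the affected front end by a user with sudo rights. The command is the one quoted above; the names `restart_mx_stream` and `dry_run` are hypothetical helpers, not part of the CDS tools.

```python
import subprocess

# Restart command taken verbatim from this entry.
RESTART_CMD = ["sudo", "/etc/init.d/mx_stream", "restart"]


def restart_mx_stream(runner=subprocess.run):
    """Restart mx_stream on the local front end.

    `runner` is injectable so the call can be dry-run or tested
    without touching the real service (hypothetical convenience).
    """
    # check=True makes subprocess.run raise if the init script fails.
    return runner(RESTART_CMD, check=True)


def dry_run(cmd, check=True):
    """Print the command instead of executing it."""
    print("would run:", " ".join(cmd))
    return 0


# Safe by default: only prints the command.
restart_mx_stream(runner=dry_run)
```

Calling `restart_mx_stream()` with no argument would actually execute the restart, so the example above only does a dry run.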

Nothing in the mx_stream log indicates a failure:

 controls@c1sus ~ 0$ cat /opt/rtcds/caltech/c1/target/fb/mx_stream_logs/c1sus.log 
 c1x02
 c1sus
 c1mcs
 c1rfm
 c1pem
 mmapped address is 0x7f43740ec000
 mapped at 0x7f43740ec000
 mmapped address is 0x7f43700ec000
 mapped at 0x7f43700ec000
 mmapped address is 0x7f436c0ec000
 mapped at 0x7f436c0ec000
 mmapped address is 0x7f43680ec000
 mapped at 0x7f43680ec000
 mmapped address is 0x7f43640ec000
 mapped at 0x7f43640ec000
 send len = 263596
 Connection Made

But I do see some funny messages in the front end's dmesg:

 [200341.317912] DXH Adapter 0 : Heartbeat alive-check for node=12 failed (cnt=8387 state=0x1 deb=0 val=0).
 [200341.318670] DXH Adapter 0 : Session for node 12 is disabled - Status = 0x5
 [200341.319062] Session callback reason=1 status=5 target_node=12
 [200341.319069] Session callback reason=3 status=0 target_node=12
 [200341.359534] (map_table_check_access:752):my id 1 ->  remote id 2 : entry was valid - is now tentatively valid
 [200341.859584] DXH Adapter 0 : Probe failure for node=12 - disabling session probeStatus=0x40000f02
 [200341.860335] DXH Adapter 0 : Session for node 12 is disabled - Status = 0x3
 [200341.860728] Session callback reason=1 status=3 target_node=12
 [200374.006111] DXH Adapter 0 : Set reachable remote node list.
 [200409.020670] DXH Adapter 0 : Set reachable remote node list.
 [200409.021076] DXH Adapter 0 : Session for node 12 is deleted - Status = 0x0
 [200409.021468] Session callback reason=5 status=0 target_node=12
 [200412.362824] (map_table_insert:648):** successfully inserted **(valid unicast) inst 0 node 1->0 fwd 0 fwd_tp 4 egress 0
 [200418.025994] (map_table_check_access:752):my id 1 ->  remote id 0 : entry was valid - is now invalid
 [200418.025998] (map_table_insert:648):** successfully inserted **(valid unicast) inst 0 node 1->2 fwd 0 fwd_tp 4 egress 0
 [200421.743916] Session callback reason=0 status=0 target_node=12
 [200422.073776] DXH Adapter 0 : Set reachable remote node list.
 [200422.342446] Session callback reason=7 status=0 target_node=12
 [200422.342454] DXH Adapter 0 : Session for node 12 is ok.
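
To get a rough idea of how long the session to node 12 was down, the dmesg lines can be parsed for the "Session for node N is ..." messages. The following Python sketch assumes the message format is exactly as it appears above. The function names and the pairing rule (first non-ok message to the next "ok") are my own choices for illustration, not anything defined by the DXH driver.

```python
import re

# Matches lines such as:
# [200341.318670] DXH Adapter 0 : Session for node 12 is disabled - Status = 0x5
SESSION_RE = re.compile(
    r"\[\s*(?P<t>\d+\.\d+)\]\s+DXH Adapter \d+ : "
    r"Session for node (?P<node>\d+) is (?P<state>disabled|deleted|ok)"
)


def session_events(dmesg_text):
    """Return (timestamp, node, state) tuples for DXH session messages."""
    events = []
    for line in dmesg_text.splitlines():
        m = SESSION_RE.search(line)
        if m:
            events.append(
                (float(m.group("t")), int(m.group("node")), m.group("state"))
            )
    return events


def outages(events):
    """Pair the first non-ok message per node with the next 'ok'.

    Returns a list of (node, start_time, end_time) tuples.
    """
    start = {}
    result = []
    for t, node, state in events:
        if state == "ok":
            if node in start:
                result.append((node, start.pop(node), t))
        else:
            # Keep only the earliest non-ok timestamp for this node.
            start.setdefault(node, t)
    return result


# A few lines copied from the dmesg output in this entry.
SAMPLE = """\
[200341.318670] DXH Adapter 0 : Session for node 12 is disabled - Status = 0x5
[200341.319062] Session callback reason=1 status=5 target_node=12
[200341.860335] DXH Adapter 0 : Session for node 12 is disabled - Status = 0x3
[200409.021076] DXH Adapter 0 : Session for node 12 is deleted - Status = 0x0
[200422.342454] DXH Adapter 0 : Session for node 12 is ok.
"""

for node, t0, t1 in outages(session_events(SAMPLE)):
    print(f"node {node}: session down for {t1 - t0:.1f} s")
```

With the timestamps above, the session to node 12 was out for about 81 s (200341.318670 to 200422.342454).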

I'm awaiting feedback from experts.

 
