Today I've been trying to get the new frame builder, tentatively named 'fb1', to work. It's not fully working yet, so I'm about to revert the system to using 'fb'. The switch-over process is annoying, since our only Myrinet card has to be moved between the hosts.
A brief update on the process so far:
I'm being a little bold with this system by trying to build daqd against more system libraries, instead of the manually installed packages that are nominally required. Here's some of the relevant info about the fb1 system:
- Debian 7 (wheezy)
- lscsoft ldas-tools-framecpp-dev 2.4.1-1+deb7u0
- lscsoft gds-dev 2.17.2-2+deb7u0
- lscsoft libmetaio-dev 8.4.0-1+deb7u0
- lscsoft libframe-dev 8.20-1+deb7u0
- /opt/rtapps/epics-1.4.12.2_long
- /opt/mx-1.2.16
- advLigoRTS trunk
I finally managed to get daqd to build against the advLigoRTS trunk (post 2.9 branch). I'll post a detailed build log once I work out all the kinks. It runs OK, including writing out full frames, second and minute trends, and raw minute trends, but there are a couple of show-stopper problems:
- daqd segfaults if C1EDCU.ini is specified. If I comment out that one file from the 'master' channel ini file list, it runs without segfaulting.
- Something is going on with the mx_streams from the front ends:
- They look OK from the daqd side, but the FEC-<ID>_FB_NET_STATUS indicators remain red, as does the "DAQ" bit in the STATE_WORD, even though data seems to be flowing.
- The mx_stream processes on the front ends are dying (and being restarted by monit) about every 2 minutes. It's unclear exactly what is happening, but they all die around the same time, so the failure is possibly initiated by a daqd problem. Around the time of the mx_stream failures, we see this in the daqd log:
[Tue Sep 22 17:24:07 2015] GPS MISS dcu 91 (TST); dcu_gps=1127003062 gps=1127003063
Aborted 1 send requests due to remote peer Aborted 1 send requests due to remote peer 00:25:90:0d:75:bb (c1sus:0) disconnected
mx_wait failed in rcvr eid=004, reqn=11; wait did not complete; status code is Remote endpoint is closed
00:30:48:d6:11:17 (c1iscey:0) disconnected
mx_wait failed in rcvr eid=002, reqn=235; wait did not complete; status code is Remote endpoint is closed
disconnected from the sender on endpoint 002
mx_wait failed in rcvr eid=005, reqn=253; wait did not complete; status code is Bad session (missing mx_connect?)
disconnected from the sender on endpoint 005
disconnected from the sender on endpoint 004
[Tue Sep 22 17:24:13 2015] GPS MISS dcu 39 (PEM); dcu_gps=1127003062 gps=1127003069
- Occasionally the daqd process itself dies when the front-end mx_stream processes die.
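To make log excerpts like the one above easier to compare across failures, a small script can pull out how far each DCU's timestamp trails daqd's own GPS time, plus which MX endpoints failed and why. In this excerpt both GPS MISS lines report the same dcu_gps=1127003062 while daqd's gps advances from 1127003063 to 1127003069, i.e. a lag of 1 s for dcu 91 (TST) and 7 s for dcu 39 (PEM). Below is a minimal Python sketch; the function and regex names are hypothetical, and it assumes the log lines keep exactly the formats quoted above.

```python
import re

# Hypothetical patterns, written to match the daqd log line formats quoted above.
GPS_MISS = re.compile(
    r"GPS MISS dcu (\d+) \((\w+)\); dcu_gps=(\d+) gps=(\d+)")
MX_WAIT_FAIL = re.compile(
    r"mx_wait failed in rcvr eid=(\d+), reqn=(\d+); .*status code is (.+)$")


def summarize(log_text):
    """Parse a daqd log excerpt (hypothetical helper).

    Returns (gps_misses, mx_failures):
      gps_misses  -- list of (dcu_id, dcu_name, lag_s), lag_s = gps - dcu_gps
      mx_failures -- list of (endpoint_id, status_text) for failed mx_wait calls
    """
    gps_misses = []
    mx_failures = []
    for line in log_text.splitlines():
        m = GPS_MISS.search(line)
        if m:
            dcu, name, dcu_gps, gps = m.groups()
            gps_misses.append((int(dcu), name, int(gps) - int(dcu_gps)))
            continue
        m = MX_WAIT_FAIL.search(line)
        if m:
            eid, _reqn, status = m.groups()
            mx_failures.append((int(eid), status.strip()))
    return gps_misses, mx_failures


if __name__ == "__main__":
    # A few lines copied from the excerpt above.
    excerpt = """\
[Tue Sep 22 17:24:07 2015] GPS MISS dcu 91 (TST); dcu_gps=1127003062 gps=1127003063
mx_wait failed in rcvr eid=004, reqn=11; wait did not complete; status code is Remote endpoint is closed
mx_wait failed in rcvr eid=002, reqn=235; wait did not complete; status code is Remote endpoint is closed
mx_wait failed in rcvr eid=005, reqn=253; wait did not complete; status code is Bad session (missing mx_connect?)
[Tue Sep 22 17:24:13 2015] GPS MISS dcu 39 (PEM); dcu_gps=1127003062 gps=1127003069
"""
    misses, failures = summarize(excerpt)
    for dcu, name, lag in misses:
        print(f"dcu {dcu} ({name}) behind daqd by {lag} s")
    for eid, status in failures:
        print(f"endpoint {eid:03d}: {status}")
```

Running it over several failure episodes would show whether the same DCUs and endpoints drop out first each time.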
I'll keep investigating, hopefully with some feedback from Keith and Rolf tomorrow.