I would make a detailed post on how the problems were fixed, but unfortunately, most of what we did was not scientific/systematic/repeatable. Instead, I note here some general points (Jamie/Koji can add to/correct me):
This should still work, but the address has changed. The daqd was split up into three separate binaries to get around the issue with the monolithic build that we could never figure out. The address of the data concentrator (DC) (which is the thing that needs to be restarted) is now 8083.
Koji suggested simply restarting the ASS model to see if that fixes the weird errors shown in Attachment #2. This did the trick. But we are now faced with more confusion - during the restart process, the various indicators on the CDS overview MEDM screen froze up, which is usually symptomatic of the machines being unresponsive and requiring a hard reboot. But we waited for a few minutes, and everything mysteriously came back. From repeated observations and from looking at the dmesg of the frontend, the problem seems to be connected with an unresponsive NFS connection. Jamie had noted some time ago that the NFS seems unusually slow. How can we fix this problem? Is it feasible to have a dedicated machine that is not FB1 do the NFS serving for the FEs?
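To make the "looking at the dmesg" step a bit more repeatable, here is a minimal sketch that counts NFS stall and recovery messages per server in a dmesg dump. It assumes the kernel logs the usual "nfs: server <name> not responding" and "nfs: server <name> OK" lines; the exact wording can differ between kernel versions, so the patterns may need adjusting. The function name nfs_stall_summary is made up for this illustration.

```python
import re
from collections import Counter

# Assumed message formats; adjust if the frontend kernel words them differently.
_NOT_RESPONDING = re.compile(r"nfs: server (\S+) not responding")
_RECOVERED = re.compile(r"nfs: server (\S+) OK")


def nfs_stall_summary(dmesg_text):
    """Return {server: (stall_count, recovery_count)} from dmesg output."""
    stalls = Counter()
    recoveries = Counter()
    for line in dmesg_text.splitlines():
        match = _NOT_RESPONDING.search(line)
        if match:
            stalls[match.group(1)] += 1
            continue
        match = _RECOVERED.search(line)
        if match:
            recoveries[match.group(1)] += 1
    servers = set(stalls) | set(recoveries)
    # Counter returns 0 for servers that appear in only one of the two tallies.
    return {server: (stalls[server], recoveries[server]) for server in servers}
```

Feeding it the text captured from dmesg on a frontend right after a freeze should show which NFS server the stalls are attributed to, which is the question raised above.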
I don't think the problem is fb1. The fb1 NFS is mostly used only during front end boot. The mount that sees all the action is the rtcds one, which is served from chiara.
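One way to check which server backs which mount on a given front end is to parse /proc/mounts. The sketch below is only an illustration: the function name nfs_mounts is hypothetical, and it assumes the standard Linux /proc/mounts layout (device, mount point, filesystem type, ...) with NFS devices written as server:/export.

```python
def nfs_mounts(mounts_text):
    """Return a list of (server, export, mountpoint) for NFS entries."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype = fields[:3]
        if fstype not in ("nfs", "nfs4"):
            continue  # only NFS mounts are of interest here
        # NFS devices look like "server:/exported/path"
        server, _, export = device.partition(":")
        found.append((server, export, mountpoint))
    return found
```

On a front end, passing in the contents of /proc/mounts lists each NFS mount with its server, so it is easy to see which mounts come from fb1 and which from chiara.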