Since this morning, the fb's timing has been off. Steve pointed it out to me earlier today, but I didn't have a chance to look at it until now.
This was different from the more common problem of the mx stream needing to be restarted. That problem causes 3 red blocks per core, on all cores of an affected computer, but it doesn't have to be every computer. This time there was only one red block per core on the CDS FE status screen, but it was on every core on every computer.
The error code shown when you click into the details of a single core was 0x4000. I searched the elog for that and found elog 6920, which says that this is a timing issue with the frame builder. Since Jamie had already set up the config on nodus correctly, all I did was re-sync the fb with NTP by restarting its ntp client:
fb$ sudo /etc/init.d/ntp-client restart
As in elog 6920, the daqd stopped, then restarted itself, and cleared the error message. It looks like everything is good again.
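For anyone who wants to script a check for this condition, here is a minimal sketch. It assumes the status word is a bitmask and that 0x4000 is the bit that flags the fb timing problem; I have only seen the bare value 0x4000, so the bitmask interpretation is a guess. The function name has_timing_error is made up for illustration, and reading the status word from the front ends is left out.

```shell
#!/bin/sh
# Hypothetical helper: test whether a status word has the 0x4000 bit set.
# Assumption: the status word is a bitmask and 0x4000 marks the fb timing error.
TIMING_BIT=0x4000

has_timing_error() {
    # $1: status word as shown on the status screen, e.g. 0x4000
    # Returns 0 (true) if the timing bit is set, 1 otherwise.
    [ $(( $1 & $TIMING_BIT )) -ne 0 ]
}
```

Usage would be something like: has_timing_error 0x4000 && echo "fb timing bit set"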
I suspect (without proof) that this may have to do with the campus network being down this morning, so the computers couldn't sync up with the outside world.