[Joe, Alex]
Problem Symptoms:
There were red lights on the status screen indicating RFM errors for the c1scy, c1mcs and c1rfm processes.
The c1iscey and c1sus machines were receiving data sent over the RFM network from the c1ioo computer with bad time stamps, a few cycles too late. The c1iscex computer was receiving data from c1ioo without errors.
Problem:
The c1iscex RFM card had gotten into a bad state and was somehow slowing down or corrupting the data it passed along. It didn't affect its own incoming data, but because the RFM network is a loop, it was disrupting the data seen by the other machines. Basically, the only machine that wasn't throwing an error was the culprit.
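To illustrate why the one quiet machine was the suspect, here is a minimal sketch of a loop network in which one node adds delay when forwarding data. The ring order, the 3-cycle delay, and the function names are all assumptions made for illustration; the actual fiber order and delay are not recorded in this entry.

```python
# Hypothetical ring order, starting at the sender. The real fiber order
# of the RFM loop is not given in this log entry.
RING = ["c1ioo", "c1iscex", "c1iscey", "c1sus"]


def arrival_delays(ring, sender, forward_delay):
    """Return how many cycles late each receiver sees the sender's data.

    forward_delay maps a node name to the extra cycles it adds when it
    retransmits data to its downstream neighbour. A node reads the data
    when it arrives, before its own forwarding delay is applied.
    """
    start = ring.index(sender)
    delays = {}
    accumulated = 0
    for step in range(1, len(ring)):
        node = ring[(start + step) % len(ring)]
        delays[node] = accumulated
        accumulated += forward_delay.get(node, 0)
    return delays


def suspect(ring, sender, delays, tolerance=0):
    """Return the node just upstream of the first late receiver, or None.

    This is only a heuristic: the fault could also be in the link
    between the suspect and the first late receiver.
    """
    start = ring.index(sender)
    previous = sender
    for step in range(1, len(ring)):
        node = ring[(start + step) % len(ring)]
        if delays[node] > tolerance:
            return previous
        previous = node
    return None


# Assumed fault: c1iscex adds 3 cycles when forwarding (made-up number).
delays = arrival_delays(RING, "c1ioo", {"c1iscex": 3})
culprit = suspect(RING, "c1ioo", delays)
```

Under these assumptions, c1iscex sees the data on time while c1iscey and c1sus see it late, and the heuristic points at c1iscex, which matches the pattern of symptoms above.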
Solution:
Hard power cycling the c1iscex computer reset the RFM card and fixed the problem.