Koji was unable to build his c1lst model first thing this morning. It turns out there was a bug in the RCG parser that was introduced on Friday when we did the RCG updates. We talked to Alex, who made a quick fix by commenting out the offending lines. The diff is as follows:
--- Parser3.pm (revision 2328)
+++ Parser3.pm (working copy)
@@ -1124,8 +1124,8 @@
print "Flattening the model\n";
print "Finished flattening the model\n";
- CDS::Tree::do_on_nodes($root, \&remove_tags, 0, $root);
- print "Removed Tags\n";
+ #CDS::Tree::do_on_nodes($root, \&remove_tags, 0, $root);
+ #print "Removed Tags\n";
CDS::Tree::do_on_nodes($root, \&remove_busses, 0, $root);
The commented-out code removed TAGs from the .mdl file, for a reason I do not understand at this time. I will ask tomorrow in person so I can understand the full story.
Koji then rebuilt and started the c1lst process. This is his new test version of the LSC code. We discovered (again) that when you activate too many DAQ channels (simply uncommenting them, not even recording them with activate=1 in the .ini file), the frame builder crashes. In addition, the c1lsc machine, which the code was running on, also crashed hard.
When a channel is added to the .ini file (or uncommented), its data is sent to the frame builder regardless of whether the frame builder records it. Each computer has only about 2 megabytes per second of bandwidth. In this case we were trying to send something like 200 channels * 16384 Hz * 4 bytes = 13 megabytes per second.
The maximum number of 16384 Hz channels is roughly 30, with little to no room for anything else. In addition, test points use the same allocated memory structure, so if you use up all the capacity with channels, you won't be able to use test points on that computer (or that's what Alex has led me to believe).
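The numbers above can be checked with a short Python sketch. It assumes "about 2 megabytes per second" means 2,000,000 bytes per second and that every channel uses 4 bytes per sample. The function and constant names are made up for illustration and are not part of the RCG or daqd code.

```python
# Back-of-the-envelope DAQ bandwidth check (illustrative only).
# Assumptions: 4 bytes per sample, and a per-computer budget of
# 2,000,000 bytes/s standing in for "about 2 megabytes per second".

BYTES_PER_SAMPLE = 4
SAMPLE_RATE_HZ = 16384
BANDWIDTH_BYTES_PER_S = 2_000_000


def data_rate(num_channels, rate_hz=SAMPLE_RATE_HZ,
              bytes_per_sample=BYTES_PER_SAMPLE):
    """Bytes per second sent to the frame builder for num_channels channels."""
    return num_channels * rate_hz * bytes_per_sample


def max_channels(bandwidth=BANDWIDTH_BYTES_PER_S, rate_hz=SAMPLE_RATE_HZ,
                 bytes_per_sample=BYTES_PER_SAMPLE):
    """Largest whole number of channels that fits in the bandwidth budget."""
    return bandwidth // (rate_hz * bytes_per_sample)


if __name__ == "__main__":
    requested = data_rate(200)
    print(f"200 channels need {requested / 1e6:.1f} MB/s")
    print(f"Budget allows about {max_channels()} channels at {SAMPLE_RATE_HZ} Hz")
    print(f"Overshoot factor: {requested / BANDWIDTH_BYTES_PER_S:.1f}x")
```

Under these assumptions, 200 channels come out to about 13.1 MB/s, and the budget fits 30 full-rate channels, in line with the figures quoted above.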
The daqd process then core dumped, causing all sorts of slowdowns on the martian network. At the same time, the c1lsc computer crashed hard, and all of the front end processes on c1sus except for the IOP crashed.
We rebooted c1lsc and restarted the c1sus processes using the startc1SYS scripts. However, the c1susfe.ko kernel module apparently got stuck in a weird state. We were completely unable to damp the optics and were in general ringing them up severely. We tried debugging, including several burt restores and single path checks.
Eventually, after a bit more debugging, we decided to reboot the c1sus machine. After a burt restore following the reboot, everything started to damp and work happily. My best guess is that the kernel module crashed in a bad way and remained in memory when we simply ran the restart scripts.