After the catastrophic fb disk failure last week we lost essentially the entire front end system (not any of the userapp code, but the front end boot server, operating system, and DAQ). The fb disk was entirely unrecoverable, so we've been trying to rebuild everything from the bits and pieces lying around, and some disks that Keith Thorne sent from LLO. We're trying to get the front ends working first, and will work on recovering daqd after.
Luckily, fb1, which was being configured as an fb replacement, is already mostly configured, including having a copy of the front end diskless root image. We set up fb1 as the new boot server, and were able to get front ends booting again. Unfortunately, we've been having trouble running and building models, so something is still amiss. We've been taking a three-pronged approach to getting the front ends running:
- /diskless/root.fb: This involves booting the front ends from the backup of the diskless root from fb. Runs Gentoo kernel 2.6.34.1. This should correspond to the environment that all models were built and running against. But something is missing in the configuration. The front ends were also mounting /opt from fb, which included the dolphin drivers, and we don't have a copy of that, so models aren't loading or recompiling.
- /diskless/root.x1boot: Keith sent a disk image of the entire x1boot server from LLO. It uses Gentoo kernel 3.0.8. This ostensibly includes everything we should need to run the front ends, but it's unfortunately configured with newer versions of some of the software, and it also isn't loading our existing models or building new ones. This also seems to be having issues with the dolphin drivers.
- /diskless/root.jessie: This is an entirely new boot image built from scratch with Debian jessie, using an RTS-patched 3.2 kernel. This would use the latest versions of everything. It's mostly working; we just need to rebuild the dolphin drivers from source.
It seems that in all cases we need to rebuild the dolphin drivers from source.
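To keep track of which of the three root images actually has driver modules installed for its kernel, something like the following rough sketch could be run on the boot server. It assumes the standard Linux layout of `<root>/lib/modules/<kernel version>/`, and the `dis_*.ko` filename pattern for the dolphin modules is a guess on our part that may need adjusting; the function name is just illustrative.

```python
from pathlib import Path


def find_kernel_modules(root, pattern="dis_*.ko"):
    """Map each kernel version under <root>/lib/modules to matching module files.

    The default pattern is an assumed naming scheme for the dolphin
    modules, not something confirmed against the actual driver build.
    """
    modules_dir = Path(root) / "lib" / "modules"
    found = {}
    if not modules_dir.is_dir():
        # Root image missing or has no modules tree: report nothing.
        return found
    for kernel_dir in sorted(p for p in modules_dir.iterdir() if p.is_dir()):
        # Search the whole per-kernel tree, since out-of-tree drivers
        # may be installed in different subdirectories.
        found[kernel_dir.name] = sorted(
            str(m.relative_to(kernel_dir)) for m in kernel_dir.rglob(pattern)
        )
    return found


if __name__ == "__main__":
    for root in ("/diskless/root.fb", "/diskless/root.x1boot", "/diskless/root.jessie"):
        result = find_kernel_modules(root)
        if not result:
            print(f"{root}: no lib/modules tree found")
        for kernel, mods in result.items():
            status = ", ".join(mods) if mods else "no matching modules"
            print(f"{root} [{kernel}]: {status}")
```

An empty list for a kernel version would indicate that the drivers still need to be built and installed for that image.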