I've been preparing to test Gabriele's deep neural network (DNN) MICH/PRCL reconstruction. No changes have been made to the front end yet; this is all just prep/testing work.
Background:
We have been unable to get Gabriele's nn.c code running in kernel space for unknown reasons (see the tests described in the previous post). However, Rolf recently added functionality to the RCG that allows front end models to run in user space, without needing to be loaded into the kernel. Surprisingly, this seems to work very well, and it is much more stable for the overall system: starting or stopping a user-space model will not crash the front end machine. The nn.c code has been running fine on a test machine in this configuration. The RCG version that supports user-space models is not much newer than what the 40m is running now, so we should be able to run user-space models on the existing system without upgrading anything at the 40m. Again, I've tested this on a test machine and it seems to work fine.
The new RCG with user space support compiles and installs both kernel and user-space versions of the model.
Work done:
- Created a 'c1dnn' model for the nn.c code. This will run on the c1lsc front end machine (on core 6, which is currently unused), and will communicate with the c1lsc model via SHMEM IPC. It lives at:
- /opt/rtcds/userapps/release/isc/c1/models/c1dnn.mdl
- Got the latest copy of the nn.c code from Gabriele's git repository and put it at:
- /opt/rtcds/userapps/release/isc/c1/src/nn/
- Checked out the latest version of the RCG (currently SVN trunk r4532):
- /opt/rtcds/rtscore/test/nn-test
- Set up the appropriate build area:
- /opt/rtcds/caltech/c1/rtbuild/test/nn-test
- Built the model in the new nn-test build directory ("make c1dnn")
- Installed the model from the nn-test build directory ("make install-c1dnn")
Test:
I tried manually testing the new user-space model. Since it runs as a user-space process, it should have no effect on the rest of the front end system (and it didn't):
- Manually started the c1dnn EPICS IOC:
- Tried running the model user-space process directly:
Unfortunately, the process died with an "ADC TIMEOUT" error. I'm investigating why.
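The cause of this timeout is not yet known. As a generic illustration only (this is not the RCG's code), one common pattern that produces a timeout of this kind is a consumer waiting for an upstream producer to advance a per-cycle counter, and giving up once a deadline passes. The sketch below shows that pattern; `wait_for_next_cycle`, `AdcTimeout`, and the `read_counter` accessor are all hypothetical names.

```python
import time
from typing import Callable


class AdcTimeout(Exception):
    """Hypothetical error raised when no new cycle arrives before the deadline."""


def wait_for_next_cycle(read_counter: Callable[[], int], last_cycle: int,
                        timeout_s: float, poll_s: float = 1e-4) -> int:
    """Poll a producer's cycle counter until it changes from last_cycle.

    read_counter is a hypothetical accessor for a counter that an upstream
    process increments once per cycle. Returns the new counter value, or
    raises AdcTimeout if the counter has not changed within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        current = read_counter()
        # Compare with != rather than > so a counter wraparound still counts
        # as a new cycle.
        if current != last_cycle:
            return current
        if time.monotonic() >= deadline:
            raise AdcTimeout(f"counter stuck at {last_cycle} for {timeout_s} s")
        time.sleep(poll_s)
```

In this picture, a timeout means the producer side never advanced its counter while the consumer was waiting; whether anything like that is happening with c1dnn is what still needs to be checked.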
Once we confirm the model runs, we'll add the appropriate SHMEM IPC connections to the c1lsc model.
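For reference, the general idea of passing per-cycle values between two processes through shared memory can be sketched as below. This is a minimal Python illustration under stated assumptions, not the RCG's SHMEM IPC implementation: the buffer layout (`LAYOUT`, a cycle counter followed by two doubles standing in for MICH and PRCL values) and the helpers `write_sample`/`read_sample` are hypothetical.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical layout: unsigned 64-bit cycle counter, then two doubles
# (e.g. MICH and PRCL values). Not the RCG's actual IPC format.
LAYOUT = struct.Struct("<Qdd")


def write_sample(buf, cycle: int, mich: float, prcl: float) -> None:
    """Writer side: pack one cycle's values into the shared buffer."""
    LAYOUT.pack_into(buf, 0, cycle, mich, prcl)


def read_sample(buf) -> tuple[int, float, float]:
    """Reader side: unpack the most recently written cycle's values."""
    return LAYOUT.unpack_from(buf, 0)


if __name__ == "__main__":
    # One process plays both roles here; a real reader in another process
    # would attach with shared_memory.SharedMemory(name=shm.name).
    shm = shared_memory.SharedMemory(create=True, size=LAYOUT.size)
    try:
        write_sample(shm.buf, 1, 0.25, -0.5)
        print(read_sample(shm.buf))  # prints (1, 0.25, -0.5)
    finally:
        shm.close()
        shm.unlink()
```

The cycle counter stored alongside the data lets the reader tell whether it is looking at a fresh sample or a stale one from a previous cycle.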