Message ID: 13390     Entry time: Wed Oct 18 12:14:08 2017     In reply to: 13383     Reply to this: 13395
Author: jamie 
Type: Summary 
Category: LSC 
Subject: prep for tests of Gabriele's neural network cavity length reconstruction 
Quote:

I tried a manual test of the new user space model.  Since this is a user space process, running it should have no effect on the rest of the front end system (and it didn't):

  • Manually started the c1dnn EPICS IOC:
    • $ (cd /opt/rtcds/caltech/c1/target/c1dnn/c1dnnepics && ./startupC1)
  • Tried running the model user-space process directly:
    • $ taskset -c 6 /opt/rtcds/caltech/c1/target/c1dnn/bin/c1dnn -m  c1dnn

Unfortunately, the process died with an "ADC TIMEOUT" error.  I'm investigating why.

Once we confirm the model runs, we'll add the appropriate SHMEM IPC connections to connect it to the c1lsc model.

I tried moving the model to c1ioo, where there are plenty of free cores sitting idle, and the model seems to run fine.  I think the problem was just CPU contention on the c1lsc machine, where there were only two free cores and the kernel was using both for all the rest of the normal user space processes.
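
One quick way to check for this kind of contention is to list which processes were last scheduled on the core the model is pinned to. The following is only a sketch: the helper name filter_by_cpu is hypothetical, and it assumes the procps version of ps, whose "psr" column reports the processor a process last ran on.

```shell
# filter_by_cpu CPU
# Reads "PID PSR COMMAND" lines on stdin and prints "PID COMMAND" for every
# process whose PSR (last-used processor) equals CPU.
# Only the first word of COMMAND is printed.
filter_by_cpu() {
    awk -v cpu="$1" '$2 == cpu { print $1, $3 }'
}
```

On the front end this could be fed with `ps -eo pid=,psr=,comm= | filter_by_cpu 6`. If anything other than c1dnn shows up on CPU6, the kernel is still scheduling ordinary user space work there.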

So there are two options:

  • Use cpuset on c1lsc to tell the kernel to remove all other processes from CPU6 and save it just for the c1dnn model.  This should not have any impact on the running of c1lsc, since that's exactly what would be happening if we were running the model in kernel space (i.e. isolating the core for the front end model).  The auxiliary support user space processes (epics seq/ioc, awgtpman) should all run fine on CPU0, since that's what usually happens.  Linux is only using the additional core because it's there.  We don't have much experience with cpuset yet, though, so more offline testing will be required first.
  • Run the model on c1ioo and ship the needed signals to/from c1lsc via PCIe dolphin.  This is potentially a slightly more invasive change, and would put more load on the dolphin network, but the network should be able to handle it.
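
For the first option, a rough sketch of what the cpuset setup might look like is below. This is not a tested procedure. It assumes a cgroup-v1 cpuset hierarchy mounted at /sys/fs/cgroup/cpuset (where the control files are named cpuset.cpus and cpuset.mems), a single memory node 0, and hypothetical set names "rt" (for the model) and "system" (for everything else). By default it only prints the commands it would run.

```shell
# ASSUMPTIONS: cgroup-v1 cpuset mount point, memory node 0, and the set
# names "rt" and "system" are all illustrative, not taken from the front end.
CPUSET_ROOT="${CPUSET_ROOT:-/sys/fs/cgroup/cpuset}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = actually run them

# put VALUE FILE: write VALUE into a cpuset control file (or print the command)
put() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "echo $1 > $2"
    else
        echo "$1" > "$2"
    fi
}

# make_set NAME CPUS: create a cpuset called NAME restricted to CPUS
make_set() {
    dir="$CPUSET_ROOT/$1"
    if [ "$DRY_RUN" = 1 ]; then
        echo "mkdir -p $dir"
    else
        mkdir -p "$dir"
    fi
    put "$2" "$dir/cpuset.cpus"
    put 0 "$dir/cpuset.mems"
}

# plan_isolation MODEL_CPU OTHER_CPUS: one set for the model, one for the rest
plan_isolation() {
    make_set rt "$1"
    make_set system "$2"
}
```

Calling `plan_isolation 6 0` prints the steps for reserving CPU6 and leaving CPU0 for everything else. Processes are then assigned by writing their PIDs into each set's tasks file: existing tasks would go into "system" and the c1dnn PID into "rt". Kernel threads bound to a specific CPU cannot be moved this way, which is one of the things offline testing would need to check.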

I'm going to start testing cpuset offline to figure out exactly what would need to be done.
