ATF eLog
Message ID: 729     Entry time: Mon Apr 19 19:09:42 2010
Author: Frank 
Type: Computing 
Category: DAQ 
Subject: today's DAQ odyssey - part 1 

when we started setting up a second frontend last week, i thought it might be as simple as making another copy of the main disk, changing some config files, and off we go - what alex and i learned is that it is way more complicated, with a touch of impossible. but let's start from the beginning:

the idea was to set up a second RT frontend computer with its own framebuilder, either in the same network or in a different subnetwork. The reason we want separate framebuilders is that with only one, the whole system is down whenever the first frontend (whichever one we define as the first) is down. With more than one model running on different machines, the one we define as the "main" or first one has to be alive at all times; if it is not, the others don't work anymore. 

Setting it up with independent framebuilders in the same network turns out to be impossible: broadcast messages on the network prevent both from working at the same time while still using tools like DTT, which listen to only one broadcaster on the network. The minimum requirement is physically separate networks.

Ok, that's fine with us - we had decided to split our network into separate subnetworks anyway - but then we can't use the existing workstations and the installed tools for more than one network, again because of the broadcasting. Using the same workstations requires logging into the corresponding frontend/framebuilder and starting all tools locally, which is not nice but still works.


Having said this, we decided to set everything up like that, and the next thing we realized is that our current installation mounts the CVS stuff from one central source. But the frontend code we had was not designed for that: important parameters are not set in the matlab model, and the main configuration files are simply overwritten when compiling one of the frontend codes. So we had to add a couple of things in the matlab file, like the gds_node_id. The current cdsParameters are, for example:

site=C3
rate=64K
dcuid=10
gds_node_id=2
shmem_daq=1
specific_cpu=3
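Just as an illustration of how such a key=value block can be checked (this is a hypothetical helper, not actual CDS code), a minimal Python sketch:

```python
# Hypothetical helper: parse a cdsParameters-style key=value block.
# The field names below are the ones from our model; this is an
# illustration, not part of the official RCG/CDS tools.
def parse_cds_params(text):
    """Return a dict built from 'key=value' lines, skipping blanks."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params

block = """
site=C3
rate=64K
dcuid=10
gds_node_id=2
shmem_daq=1
specific_cpu=3
"""

params = parse_cds_params(block)
# gds_node_id must be set explicitly in the model; if it is left at the
# default, two models end up writing the same tpchn file (see below).
assert params["gds_node_id"] == "2"
assert params["site"] == "C3"
```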

We had to hack plenty of things which I can't remember in full, but e.g. we had to add the right node ID in the testpoint.par file in /opt/apps/Linux/gds/param/ as an "L-node" - yes, we have to use LHO, not Caltech, here - e.g.

[L-nodex]  (here x = gds_node_id of the atf model)
hostname=<ip of frontend running the atf model>
system=atf

[L-nodey]  (here y = gds_node_id of the psl model)
hostname=<ip of frontend running the psl model>
system=psl

the testpoints are created and written to a file named tpchn_Cx.par, where x again equals the gds_node_id. So a model with ifo=C1 and node-id=2 creates a file tpchn_C2.par - this C2 does NOT correspond to the IFO set in the model! For example, two different models with IFO=C2 and IFO=C3, running on different frontends (!) but both left at node-id=1 (the default if you don't specify it in the model), overwrite each other's file every time one of them is recompiled. So be careful. Also, a link named tpchn_Lx.par has to point to tpchn_Cx.par (the LHO thing again), and this file has to be added to the list in the fb master file as well...
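To make the collision concrete, here is a small sketch of the observed naming rule (plain Python, not the real GDS code): the file name follows the gds_node_id, not the IFO set in the model.

```python
# Sketch of the observed naming rule (illustration, not the real GDS
# implementation): testpoints go to tpchn_C<node_id>.par, where
# <node_id> is the model's gds_node_id -- the model's IFO is ignored.
def tpchn_filename(gds_node_id):
    return "tpchn_C%d.par" % gds_node_id

# A model with IFO=C1 and gds_node_id=2:
assert tpchn_filename(2) == "tpchn_C2.par"   # note: C2, not C1!

# Two models with IFO=C2 and IFO=C3, both left at the default
# gds_node_id=1, write the SAME file and clobber each other on recompile:
assert tpchn_filename(1) == tpchn_filename(1)
```

The LHO-style link then points at the same file, e.g. tpchn_L2.par -> tpchn_C2.par.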

the gds stuff is configured in diag_Cx.conf; here x is arbitrary, but must be independent for each system. It looks like:

&nds * *  10.0.1.10 8088 *
&chn * *  10.0.1.10 822087685 1
&leap * * 10.0.1.10 822087686 1
&ntp * *  10.0.1.10 * *
&err 0 *  10.0.1.10 5353 *

i.e. it contains the ip address of the corresponding machine for each of the services.
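A quick way to sanity-check such a fragment (a sketch in plain Python, assuming the host address is always the fourth field, as in the lines above) is to verify that every service line points at the same frontend IP:

```python
# Sketch: sanity-check a diag_Cx.conf fragment -- with separate
# frontends, every service line should point at the IP of that
# system's own machine (illustration only, not an official tool).
conf = """
&nds * *  10.0.1.10 8088 *
&chn * *  10.0.1.10 822087685 1
&leap * * 10.0.1.10 822087686 1
&ntp * *  10.0.1.10 * *
&err 0 *  10.0.1.10 5353 *
"""

hosts = set()
for line in conf.splitlines():
    fields = line.split()
    if fields and fields[0].startswith("&"):
        hosts.add(fields[3])   # fourth field is the host address

# All services on one machine -> exactly one address:
assert hosts == {"10.0.1.10"}
```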

The same goes for AWG, whose settings can be found in awg.par (again, use LHO!):

[L1-awg0]
hostname=10.0.1.10

in the end it didn't work anyway, because the second frontend computer - even though it is almost identical (the same model, just a slightly newer version of the mainboard/bios) - is not capable of running the RTcore in realtime. So this is the end of part 1 of this odyssey: we are left with only one computer we can run stuff on, which brought us to the point where we had to use the same machine (fb0) for both models. Read part 2 of this odyssey, to be posted soon, on why it took another four hours to get that (basically) working.....

ELOG V3.1.3-