Message ID: 731     Entry time: Tue Apr 20 18:54:18 2010
Author: Frank 
Type: Computing 
Category: DAQ 
Subject: DAQ odyssey - part 2 

Welcome to part 2: I forgot a couple of things in part 1, which I will post here now before we go on with the installation of a second model on the same frontend.

As explained yesterday, the GDS configuration of the systems sharing the same /cvs is stored in /cvs/cds/caltech/target/gds/param. The testpoints are created and written to a file named tpchn_Cx.par, where x equals the gds_node_id in the model. This filename has to be added to /etc/xinetd.d/chnconf, which looks like this:

service chnconf
{
        disable                 = no
        type                    = RPC
        rpc_version             = 2-3
        socket_type             = stream
        protocol                = tcp
        wait                    = yes
        user                    = root
        server                  = /apps/Linux/gds/bin/chnconfd
        env                     = HOME=/cvs/cds/caltech/target/gds
        server_args             = /cvs/cds/caltech/target/gds/param/tpchn_C2.par
}
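
As a sketch of the naming convention described above (the node id here is an example), the server_args entry has to point at the tpchn file whose suffix matches the model's gds_node_id:

```shell
# Hypothetical helper: derive the testpoint file name from the gds_node_id
# used in the model. A model with gds_node_id = 2 reads tpchn_C2.par, and
# that same path must appear as server_args in /etc/xinetd.d/chnconf.
node_id=2
tpfile="/cvs/cds/caltech/target/gds/param/tpchn_C${node_id}.par"
echo "$tpfile"
```

If the suffix and the node id disagree, chnconfd serves the wrong testpoint list for the model.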

The information (IP address and ports) about the services is configured in /cvs/cds/caltech/target/gds/param/diag_Cx.conf, where x is arbitrary but must be different for each system!

&nds * *  10.0.0.12 8088 *
&chn * *  10.0.0.12 822087685 1
&leap * * 10.0.0.12 822087686 1
&ntp * *  10.0.0.12 * *
&err 0 *  10.0.0.12 5353 *

This file has to be referenced in /etc/xinetd.d/diagconf, which looks like this:

service diagconf
{
        disable                 = no
        port                    = 5355
        socket_type             = dgram
        protocol                = udp
        wait                    = yes
        user                    = root
        passenv                 = PATH
        server                  = /apps/Linux/gds/bin/diagconfd
        env                     = HOME=/cvs/cds/caltech/target/gds
        server_args             = /cvs/cds/caltech/target/gds/param/diag_C3.conf
}
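
After editing these xinetd stanzas, xinetd has to be reloaded before the changes take effect. A quick sanity check (a sketch; the stanza text here just mirrors the one above) is to pull the server_args path back out and confirm it names the param file you intended:

```shell
# After editing /etc/xinetd.d/diagconf, reload xinetd (RHEL-style path is
# an assumption): sudo /sbin/service xinetd reload
# Sanity check: extract the configured param file from the stanza line.
stanza='server_args             = /cvs/cds/caltech/target/gds/param/diag_C3.conf'
path=$(echo "$stanza" | awk -F'=[ ]*' '{print $2}')
echo "$path"
```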


After running the script to start the frontend code, we can check whether everything is running by using the diag -i command, which lists all the processes actually running, such as ntp, nds, awg, the testpoint manager, etc. If they don't show up, something is wrong with the configuration. Starting awg and tpman on the machine manually can give some hints:

sudo /cvs/cds/caltech/target/gds/bin/awgtpman -s psl -4
64 kHz system
Spawn testpoint manager

What we see is that the testpoint manager doesn't start; it hangs waiting for something, and after 10 s or so we get an error message.

The problem is the network configuration. The reason why we had to move to an isolated network in the first place was the broadcasting of the GDS traffic. So we moved our second machine to a new subnetwork, but we still share the same /cvs directory. This is where the testpoint.par file lives, which exists only once for all testpoint managers (its path is hard-coded). This file contains the following in our example:

[L-node0]
hostname=10.0.1.10
system=atf

[L-node1]
hostname=10.0.0.12
system=psl
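
To see at a glance which testpoint manager is expected on which host, the file can be flattened into one line per node. This is just a convenience sketch (the file contents are reproduced inline; the real file lives under /cvs/cds/caltech/target/gds/param):

```shell
# Reproduce the testpoint.par from above and print "node hostname system"
# for each entry, so mismatched or stale hostnames stand out immediately.
cat > /tmp/testpoint.par <<'EOF'
[L-node0]
hostname=10.0.1.10
system=atf

[L-node1]
hostname=10.0.0.12
system=psl
EOF
nodes=$(awk -F'=' '/^\[/ {node=$0} /^hostname/ {host=$2} /^system/ {print node, host, $2}' /tmp/testpoint.par)
echo "$nodes"
```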

The problem now is that we can't start some of the services, due to a timeout when trying to reach the machines configured in that file: we don't see the other IP, so we get a timeout without starting the local one. This can be fixed by adding a second interface which is in that network WITHOUT connecting it(!). This is very important, as we are not allowed to broadcast packets into the other network!
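
One way to define such an unconnected second interface is a static interface config, sketched here assuming RHEL-style network scripts; the device name and address are examples (10.0.1.13 is the address the CA warnings attribute to the second interface on fb2):

```
# /etc/sysconfig/network-scripts/ifcfg-eth1  (example; leave the port unplugged!)
DEVICE=eth1
BOOTPROTO=static
IPADDR=10.0.1.13
NETMASK=255.255.255.0
ONBOOT=yes
```

The interface only has to exist so that the testpoint manager's address lookup for the other subnet succeeds locally; no traffic should actually reach the other network.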

Once we do this we can run everything. But now we get in trouble if we want to stop the frontend code using the kill script. If we do so, we get the following:

[controls@fb2 param]$ killpsl
CA.Client.Exception...............................................
    Warning: "Identical process variable names on multiple servers"
    Context: "Channel: "C3:PSL-GENERIC_DOF2_Name03", Connecting to: fb2:5064, Ignored: 10.0.1.13:5064"
    Source File: ../cac.cpp line 1224
    Current Time: Tue Apr 20 2010 19:15:18.18726000
..................................................................
CA.Client.Exception...............................................
    Warning: "Identical process variable names on multiple servers"
    Context: "Channel: "C3:PSL-GENERIC_DOF2_Name04", Connecting to: fb2:5064, Ignored: 10.0.1.13:5064"
    Source File: ../cac.cpp line 1224
errlog = 354 messages were discarded
CA.Client.Exception...............................................
    Warning: "Identical process variable names on multiple servers"
    Context: "Channel: "C3:PSL-GENERIC_DOF6_Name01", Connecting to: fb2:5064, Ignored: 10.0.1.13:5064"
    Source File: ../cac.cpp line 1224
    Current Time: Tue Apr 20 2010 19:15:18.21537000

The reason is that we now have two interfaces on the same machine, which gives us the "Identical process variable names on multiple servers" error message from the EPICS part. This prevents us from stopping the code; the only way to stop it is to kill the processes manually.

This, however, can be fixed by setting two environment variables manually:

EPICS_CA_ADDR_LIST="10.0.x.x"
EPICS_CA_AUTO_ADDR_LIST="NO"

using the local IP address for EPICS_CA_ADDR_LIST, which tells EPICS that there is only one Channel Access server, namely our local machine, and nothing else.
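
In practice these go into the environment of whichever shell (or startup script) runs the kill script, for example (the address is the local one from this setup):

```shell
# Restrict Channel Access to the local machine only: search no other
# addresses (AUTO_ADDR_LIST=NO) and query only the listed server.
export EPICS_CA_ADDR_LIST="10.0.0.12"
export EPICS_CA_AUTO_ADDR_LIST="NO"
echo "$EPICS_CA_ADDR_LIST $EPICS_CA_AUTO_ADDR_LIST"
```

With AUTO_ADDR_LIST disabled, CA no longer broadcasts on every interface, so the duplicate-PV warnings from the second interface disappear.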

After adding those, everything is fine and the second frontend works as usual. Of course we have to make all the changes for the first frontend too, because it no longer works since we edited the config files for awg, tpman etc.

Part 3 of the odyssey will describe why our second frontend computer doesn't work and how we managed to run a second model on fb0 using an additional expansion chassis connected via fiber to the PSL lab.
