40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 249 of 344  Not logged in ELOG logo
ID Date Authorup Type Category Subject
  3516   Thu Sep 2 17:43:30 2010 josephbUpdateCDSOne working BO output module, others not so much

 Joe and Kiwamu:

We found one bug in the RCG code, where the second input for the CDO32 part (32 binary output) was simply a repeat of the first input, and totally ignored the second input.  This was fixed in the /advLigoRTS/src/epics/util/lib/CDO32.pm file by changing 

$calcExp .= $::fromExp[0];

to

$calcExp .= $::fromExp[1];

This fix has been added to the svn.  Unfortunately, while we have a single working binary output module, the 2nd and later modules do not seem to be responding at all.  We've done the usual swaping parts of the path in both software and hardware and can't find any bad pieces in our model files or the actual hardware.   That leaves me wondering about the c code, specifically if the CDO32Output[1], CDO32Output[2], and so forth array entries in the code are being handled properly.  I'll try to get some thoughts on it from Alex tomorrow.

  3521   Fri Sep 3 11:23:16 2010 josephbConfigurationComputersrossa nvidia driver and dual monitor configuration
At LLO the machines are running Centos 5.5. A quick login confirms this. Specifically the release is 2.6.18-194.3.1.el5.

Quote:

Why are we running CentOS 4.8 instead of 5.5 ?    What runs at LLO?     What runs in Downs?

 

  3531   Tue Sep 7 10:50:53 2010 josephbConfigurationComputersrossa notes

The controls group is user id 500 by default on most new machines. Unfortunately, the user ID used across the already existing machines is 1001. One method of doing this switch is in this elog.  You can also do the change of the controls ID by becoming root and using the graphical command system-config-users.  This will  let you change the user ID and group ID for controls to 1001.  This graphical interface also lets you change the login shell.

Unfortunately, I had some minor difficulty and I ended up removing the old controls and creating a new controls account with the correct values and using tcsh.  The .cshrc file has been recreated to source cshrc.40m.  The controls account now has correct permissions, although some of the preferences such as background will need to be reset.

 

 

Quote:

**** Deleted
apps/emacs
apps/linux64/firefoxold
apps/linux64/comsol      (old v. 3.5)

* running up2date on rossa

* rossa needs to be able move windows between monitors: Xinerama?

* there are permissions problems: controls on rossa can't
make and delete directories made by 'controls' elsewhere.
Some sort of user# or group issue?

 

  3546   Wed Sep 8 14:44:37 2010 josephbUpdateCDSUpdated parse_mdl_to_ipc.py and feCodeGen.pl

I updated parse_mdl_to_ipc.py to correctly work with the 3 new cdsIPCx parts, namely cdsIPCx_PCIE (for dolphin connections), cdsIPCx_RFM (for our traditional reflected memory connections), and cdsIPCx_SHMEM (local shared memory on the computer).  These parts replaced cdsIPCx awhile back (see here ).

The code now correctly counts each type independantly with regards to ipcNum (I.e. you can have ipcNum = 0 for RFM and ipcNum = 0 for SHMEM for example).

 

I also went in and modified a few sections of the feCodeGen.pl (in /opt/rtcds/caltech/c1/core/advLigoRTS/src/epics/util/) so as to properly assign names to matrix adl files, matrix of filter bank adl files, and filter bank adl files.

  3553   Thu Sep 9 14:42:40 2010 josephbUpdateCDSUpdating suspensions model

I've been working on updating the suspensions model, incorporating Koji's refinements as well as trying to simplify the model and making it less cluttered. 

This had the side benefit of making an incorrect connection obvious.  I had incorrectly wired the ASCPIT input to be summed into the yaw path, and the ASCYAW input into the pitch path.  This has been corrected.

I've finished a single optic, and now I am in the process of propagating the changes to all the optics, as well as cleaning up the overall diagram using Rolf's new tags, which make things much less cluttered.  I've attached a screenshot of the PRM optic control model, and will be updating the Matlab web export once I've updated the full model.

Attachment 1: SingleSUS.png
SingleSUS.png
  3564   Mon Sep 13 10:22:48 2010 josephbUpdateCDSRCG bugs/feature request wiki page

I've started a wiki page under the Upgrade 09/New CDS section regarding known bugs and pending feature requests for the Real Time Code Generator.   It can be found at http://lhocds.ligo-wa.caltech.edu:8000/40m/Bugs_and_Pending_Feature_requests_for_the_RCG.  If you have any ideas to improve the RCG or encounter a bug in the code generation process (say a particular part doesn't work inside subsystems for example), please note them here.

Currently there are bugs with excitation points (they don't work inside of a subsystem block) and tags (they don't respect scope and only 1 "from" tag for each "goto" tag connected to the output of a subsystem block).

  3576   Wed Sep 15 14:34:57 2010 josephbSummaryCDSPlan for RFM switch over

Steps for RFM switch over:

1) Ensure the new frame builder code is working properly:

   A) Get Alex to finish compiling the frame builder and test on Megatron.

   B) Test the new frame builder code on fb40m (which is running Solaris) in a reversible way.  Change directory structure away from Data1, Data2, to use actual times.

   C) Confirm new frame builder code still records slow channels (c1dcuepics).

2) Ensure awg, tpman, and diagnostic codes (dtt) are working with the new front end code.

3) Physically move RFM cables from old front ends to the new front ends.  Remove excess connections from the network.

4) Merge the megatron/c1sus/c1iscex/c1ioo network with the main network.

   A) Update all the network settings on the machines as well as Linux1

   B) Remove the network switch separating the networks.

4) Start the new frame builder code on fb40m.

  3583   Fri Sep 17 12:11:42 2010 josephbUpdateCDSDowns update

In doing a re-inventory prior to the IOO chassis installation, I re-discovered we had a missing interface board that goes in an IO chassis.  This board connects the chassis to the computer and lets them talk to each other.  After going to Downs we remembered Alex had taken a possibly broken interface board back to downs for testing. 

Apparently the results of that testing was it was broken.  This was about 2.5 months ago and unfortunately it hadn't been sent back for repairs or a replacement ordered.  Its my fault for not following up on that sooner.

I asked Rolf what the plan for the broken one was.  His response was  they were planning on repairing it, and that he'd have it sent back for repairs today.  My guess the turn around time for that is on the order of 3-4 weeks (based on conversations with Gary), however it could be longer.  This will affect when the last IO chassis (LSC) can be made fully functional.  I did however pickup the 100 foot fiber cable for going between the LSC chassis and the LSC computer (which will be located in 1X3).

As a general piece of information, according to Gary the latest part number for these cards is OSS-SHB-ELB-x4/x8-2.0 and they cost 936 dollars (latest quote).

  3584   Fri Sep 17 14:55:01 2010 josephbUpdateCDSTook 5565 RFM card from IOVME to place in the new IOO chassis

I took the 5565 RFM card out of the IOVME machine to so I could put it in the new IO chassis that will be replacing it.  It is no longer on the RFM network.  This doesn't affect the slow channels associated with the auxilliary crate.

  3588   Mon Sep 20 10:33:21 2010 josephbBureaucracyComputersLarry stopped by - GC machine had conflicting IP

Larry stopped by today and had to disconnect the m25 machine (this is the 1st GC machine on the left as you walk into the control room) because its IP was conflicting with a machine over in Downs.  Do not use 131.215.115.125 as the IP on this machine as this is already assigned to someone else.  They couldn't figure out the root password to change it which is why it is not currently plugged into the network, and is not to be until an appropriate IP is assigned.

They've asked that whoever set the machine up to please contact them (extension 2974).

  3589   Mon Sep 20 11:39:45 2010 josephbUpdateCDSSwitch over

I talked to with Alex this morning, discussing what he needed to do to have a frame builder running that was compatible with the new front ends.

 

1) We need a heavy duty router as a separate network dedicated to data acquisition running between the front ends and the frame builder.  Alex says they have one over at downs, although a new one may need to be ordered to replace that one.

2) The frame builder is a linux machine (basically we stop using the Sun fb40m and start using the linux fb40m2 directly.).

3) He is currently working on the code today.  Depending on progress today, it might be installable tomorrow.

  3590   Mon Sep 20 16:59:26 2010 josephbUpdateCDSMegatron in 1X2 rack, to be come c1ioo

[Rana, Koji, Joe]

We pulled the phase shifters in the 1X2 rack out to make room for megatron.  Megatron will be converted into c1ioo, and the 8 core, 1U computer will be used as c1lsc.  A temporary ethernet cable was run from 1X2 to 1X3 to connect megatron to the same sub-network.

The c1lsc machine was worked on today, setting it up to run the real time code, along with the correct controls accounts, passwords, .cshrc files, etc.  It needs to be moved from 1X1 to 1X4 tomorrow.

  3593   Tue Sep 21 16:05:21 2010 josephbUpdateCDSFirst pass at rack diagram

I've made a first pass at a rack diagram for the 1X1 and 1X2 racks, attached as png.

Gray is old existing boards, power supplies etc.  Blue is new CDS computers and IO chassis, and gold is for the Alberto's new RF electronics.  I still need to double check on whether some of these boards will be coming out (perhaps the 2U FSS ref board?).

Attachment 1: 1X1_1X2_racks.png
1X1_1X2_racks.png
  3594   Wed Sep 22 16:35:45 2010 josephbUpdateCDSFibers pulled, new FB install tomorrow

[Aidan, Tara, Joe]

We pulled out what used to be the LSC/ASC fiber from the 1Y3 arm rack, and then redirected it to the 1X1 rack.  This will be used as the c1ioo 1PPS timing signal.  So c1ioo is using the old c1iovme fiber for RFM communications back to the bypass switch, and the old LSC fiber for 1PPS.

The c1sus machine will be using the former sosvme fiber for communications to the RFM bypass switch.  It already had a 1 PPS timing fiber.

The c1iscex machine had a new timing fiber already put in, and will be using the c1iscey vme crate's RFM for communication.

We still need to pull up the extra blue fiber which was used to connect c1iscex directly to c1sus, and reuse it as the 1PPS signal to the new front end on the Y arm. 

Alex has said he'll come in tomorrow morning to install the new FB code.

 

  3606   Fri Sep 24 22:58:40 2010 josephbUpdateCDSModified front end medm screens

To startup medm screens for the new suspension front end, do the following:

1) From a control room machine, log into megatron

ssh -X megatron

2) Move to the new medm directory, and more specifically the master sub-directory

cd /opt/rtcds/caltech/c1/medm/master/

3) Run the basic sitemap in that directory

medm -x sitemap.adl

 

The new matrix of filters replacing the old ULPOS, URPOS, etc type filters is now on the screens.  This was previously hidden.  I also added the sensor input matrix entry for the side sensor.

Lastly, the C1SUS.txt filter bank was updated to place the old ULPOS type filters into the correct matrix filter bank.

 

The suspension controls still need all the correct values entered into the matrix entries (along with gains for the matrix of filter banks), as well as the filters turned on.  I hope to have some time tomorrow morning to do this, which basically involves looking at the old screens copying the values over.  The watch dogs are still controlled by the old control screens.  This will be fixed on Monday when I finish switching the front ends over from their sub-network to the main network, at which point logging into megatron will no longer be necessary.

  3612   Mon Sep 27 17:35:13 2010 josephbUpdateCDSUpdated Suspension screens/Megatron now c1ioo/Further work on fb

The medm screens have been updated further, with the hidden matrices added in bright colors.  An example screen shot is attached.

Megatron has been renamed c1ioo and moved to martian network.  Similarly, c1sus and c1iscex are also on the martian network.  Medm screens can be run on any of the control machines and they will work.

Currently the suspension controller is running on c1sus.

The frame builder is currently running on the fb machine *however* it is not working well.  Test points and daq channels on the new front ends tended to crash it when Alex started the mx_stream to the fb via our new DAQ network (192.168.114.XXX, accessible through the front ends or fb - has a dedicated 1 gigabit network with up to 10 gigabit for the fb).  So for the moment, we're running without front end data. Alex will be back tomorrow to work on it. 

Alex claimed to have left the frame builder in a state where it should be recording slow data, however, I can't seem to access recent trends (i.e. since we started it today).  The frame builder throws up an error "Couldn't open raw minute trend file '/frames/trend/minute_raw/C1:Vac-P1_pressure', for example.  Realtime seems to work for slow channels however.  Remember to connect to fb, not fb40m. So it seems the fb is still in a mostly non-functional state.

Alex also started a job to convert all the old trends to the correct new data format, which should finish by tomorrow.

RA: Nice screen work. The old screens had a 'slow' slider effect when ramping the bias so that we couldn't whack the optic too hard. Is the new one instantaneous?

Attachment 1: MC1_Example_Screen.png
MC1_Example_Screen.png
  3615   Tue Sep 28 10:07:29 2010 josephbUpdateCDSUpdated Suspension screens/Megatron now c1ioo/Further work on fb

Quote:

RA: Nice screen work. The old screens had a 'slow' slider effect when ramping the bias so that we couldn't whack the optic too hard. Is the new one instantaneous?

 Looking at the sliders, I apparently still need to connect them properly.  There's a mismatch between the medm screen channel name and the model name.  At the moment there is no "slow" slider effect implemented, so they are effectively instantaneous.  Talking with Alex, he suggests writing a little c-code block and adding it to the model.  I can use the c code used in the filter module ramps as a starting point.

  3619   Wed Sep 29 11:18:36 2010 josephbUpdateCDSApps code changes

After asking Alex specifically what he did yesterday after I left, he indicated he copied  a bunch of stuff from Hanford, including the latest gds, fftw, libframe, root.  We also now have the new dtt code as well.  But those apparently were for the Gentoo build   After asking Alex about the ezca tools this morning, he discovered they weren't complied in the gds code he brought over.  We are in the process of getting the source over here and compiling the ezca tools. 

 

Alex is indicating to me that the currently compiled new gds code may not run on the Centos 5.5 since it was compiled Gentoo (which is what our new fb is running and apparently what they're using for the front ends at Hanford).  We may need to recompile the source on our local Centos 5.5 control machines to get some working gds code.  We're in the process of transferring the source code from Hanford.  Apparently this latest code is not in SVN yet, because at some point he needs to merge it with some other work other people have been doing in parallel and he hasn't had the time yet to do the work necessary for the merge.

For the moment, Alex is undoing the soft link changes he did pointing gds at the latest gds code he copied, and pointing back at the original install we had.

  3634   Fri Oct 1 11:53:42 2010 josephbConfigurationCDSAdded RCG simlink files to the 40m svn

I've added a new directory in /opt/rtcds/caltech/c1/core called rts_simlink.  This directory is now in the 40m svn.  Unfortunately, the simlink files used to generate the front end c codes live in a directory controlled by the CDS svn.  So I've copied the .mdl files from /opt/rtcds/caltech/c1/core/advLigoRTS/src/epics/simLink/ into this new directory and added them into the 40m svn.  When making changes to the simlink files, please copy them to this new directory and check them in so we can a useful history of the models.

 

  3636   Fri Oct 1 16:34:06 2010 josephbUpdateCDSc1sus not booting due to fb dhcp server not running

For some reason, the dhcp server running on the fb machine which assigns the IP address to c1sus (since its running a diskless boot) was down.  This was preventing c1sus from coming up properly.  The symptom was an error indicated no DHCP offers were made(when I plugged a keyboard and monitor in).

To check if the dhcp server is running, run ps -ef | grep dhcpd.  If its not, it can be started with "sudo /etc/init.d/dhcpd start"

  3642   Mon Oct 4 11:20:45 2010 josephbUpdateCDSFixed Suspension binary output list and sus model

I've updated the CDS wiki page listing the wiring of the 40m suspensions with the correct binary output channels.  I previously had confused the wiring of the Auxillary crate XY220 (watchdogs) with the SOS coil dewhitening bypasses.  So I had wound up with the wrong channels (the physical cables we plugged in were correct, just what I thought was going on channel X of that cable was wrong).  This has been corrected in the plan now.  The updated channel/cable list is at http://lhocds.ligo-wa.caltech.edu:8000/40m/Upgrade_09/CDS/Suspension_wiring_to_channels

  3644   Mon Oct 4 15:28:10 2010 josephbUpdateCDSTrying to get c1ioo booting as Gentoo.

I modified the dhcpd.conf file in /etc/dhcp on the fb machine.  I added a entry for c1ioo, listing its MAC address and ip number near the bottom of the file.  I then restarted the dhcp server using "sudo /etc/init.d/dhcpd restart" while on the fb machine.

I also modified the rtsystab, which is used to determine which front end codes start on boot up of a machine.  I added a line: c1ioo   c1x03  c1ioo

I am now in the process of getting c1ioo to come up as a Gentoo machine so I can build a model with an RFM connection in it and test the communication between c1sus and c1ioo.  This involves removing the hard drives and checking to make sure the boot priority is correct (i.e. it checks for a network boot).

  3661   Wed Oct 6 15:56:14 2010 josephbHowToCDSHow to load matrices quickly and easily

Awhile ago I wrote several scripts for reading in medm screen matrix settings and then writing them out.  It was meant as kind of a mini-burt just for matrices for switching between a couple of different setups quickly.

Yuta has expressed interest in having this instruction available.

In /cvs/cds/caltech/scripts/matrix/ there are 4 python scripts:

saveMatrix.py, oldSaveMatrix.py, loadMatrix.py, oldLoadMatrix.py

The saveMatrix.py and loadMatrix.py are for use with the current front end codes (which start counting from 1), where as the old*.py files are for the old system.

To use saveMatrix.py you need to specify the number of inputs, outputs, and the base name of the matrix (i.e. C1:LSP-DOF2PD_MTRX is the base of C1:LSP-DOF2PD_MTRX_0_0_GAIN for example), as well as an ouptut file where the values are stored.

So to save the BS in_matrix setting you could do (from /cvs/cds/caltech/scripts/matrix/)

./saveMatrix.py -i 4 -o 5 -n "C1:SUS-BS_TO_COIL_MTRX" -f -d ./to_coil_mtrx.txt

The -i 4 indicates 4 inputs, the -o 5 indicates 5 outputs, -n "blah blah" indicates the base channel name, -f indicates a matrix bank of filters (if its just a normal matrix with no filters, drop the -f flag), and -d ./to_coil_mtrx.txt indicates the destination file.

To write the matrix, you do virtually the same thing:

./loadMatrix.py -n "C1:SUS-PRM_TO_COIL_MTRX" -f -d ./to_coil_mtrx.txt

In this case, you're writing the saved values of the BS, to the PRM.  This method might be faster if you're trying to fill in a bunch of new matrices that are identical rather than typing 1's and -1's 20 times for each matrix.

I'll have Yuta add his how-to of this to the wiki.

  3665   Thu Oct 7 10:37:42 2010 josephbUpdateCDSc1sus with flaky ssh

Currently trying to understand why the ssh connections to c1sus  are flaky.  This morning, every time I tried to make the c1sus model on the c1sus machine, the ssh session would be terminated at a random spot midway through the build process.  Eventually restarting c1sus fixed the problem for the moment.

However, previously in the last 48 hours, the c1sus machine had stopped responding to ssh logins while still appearing to be running the front end code.  The next time this occurs, we should attach a monitor and keyboard and see what kind of state the computer is in.  Its interesting to note we didn't have these problems before we switched over to the Gentoo kernel from the real-time linux Centos 5.5 kernel.

  3678   Fri Oct 8 12:21:11 2010 josephbUpdateCDSchecking MC1 suspension damping

Upon investigation, it appears that the c1mcs model was (and still is) timing out after a random amount of time. Or in other words, it at some point it was taking too long to do all the calculations for a single cycle and falling behind. The evidence for this is from the dmesg command when run on c1sus.

There's a bunch of lines like:

[ 8877.438002] c1mcs: cycle 568 time 62; adcWait 0; write1 0; write2 0; longest write2 0

[ 8877.438002] c1mcs: cycle 569 time 62; adcWait 0; write1 0; write2 0; longest write2 0

With a final line like: [ 8877.439003] c1mcs: ADC TIMEOUT 1 2405 37 2277

This last line indicates in fell so far behind it gave up.

However, this doesn't actually seem to be related to the amount of computation going on with the front end. I restarted the c1mcs model this morning by logging into the c1sus machine, and changing to the /opt/rtcds/caltech/c1/target/c1mcs/bin directory and running:

lsmod

sudo rmmod c1mcsfe

sudo insmod c1mcsfe.ko

The first line just lists the running modules. The second removes the c1mcs module, and the third starts it up again.

I proceeded to turn all the filters and and set all the matrix values while keeping an eye on the C1MCS-GDS_TP.adl screen and its timing indicator. It was running fine with all these turned on. Then I ran a dtt session from rosalba (going to /opt/apps/, typing bash, then source gds-env.bash, and finally diaggui) as well as a dataviewer and looked at 6 test point channels. It received data fine.

However, about 2 minutes after I had stopped doing this (although the dataviewer realtime session was still going) the USR timing jumped from about 20 microseconds to 35 microseconds, and the CPU Max timing jumped to the 50-60 microsecond range. At this point dmesg started reporting things like:

[54143.465613] c1mcs: cycle 1076 time 62; adcWait 0; write1 0; write2 0; longest write2 0

[54143.526004] c1mcs: cycle 2064 time 62; adcWait 0; write1 0; write2 0; longest write2 0

About a minute later the code gave up and reported a ADC timeout via dmesg. None of the other front ends seem to be affected.

I had to clear the test points manually after stopping dataviewer and dtt by going to rosalaba,using the sourced gds-env.bash, and running diag -l. I then typed "tp clear 36 *" to clear all the test points on the model with FEC number 36 (corresponding to c1mcs).

I removed and restarted c1mcs again, trying to turn on a few things at a time and letting it run for a few minutes to see if I could narrow down if its one particular filter perhaps reaching an underflow and starting to bog down the computations. However, after about 45 minutes of this, the model is still running and I've turned all the filters on and have been running about 8 test points with no problem, so the problem is clearly intermittent.

Quote:

Notes:
 c1mcs crashed many times during the investigation, and I had to kill and restart it again and again.
 It seemed to be easily crashed when filters are on, and so I couldn't check whether the damping servo is working correctly or not today.

Next work:

  - fix c1mcs (and maybe others)
  - check the damping servo by comparing the displacements of each 4 degrees of freedom when servo in off and on.

 

  3680   Fri Oct 8 15:57:32 2010 josephbUpdateCDSc1ioo status

I've been trying to figure out why the c1ioo machine crashes when I try to run the c1ioo front end.

I tried removing some RFM components from the c1ioo model, and then the c1gpt model (Kiwamu's green locking model) as an easier test case.  Both cause the machine to lock up once they start running.  Lastly, I tried running the c1x02 and c1sus models on the c1ioo machine instead of the c1sus machine, after first turning off all models on c1sus.  This led to the same lockup. 

Since those models run fine on the c1sus machine, I could only conclude that a recent change in the fe code generator or the Gentoo kernel and the Sun X4600 computer don't work together at the moment.

After talking to Alex, he got the idea to check if the monitor() and mwait() were supported on the c1ioo machine.  These function calls were added relatively recently, and are   used to poll a memory location to see if something has been written there, and then do something when it is.  Apparently the Sun X4600 computers do not support this call.  Alex has modified the code to not use these functions calls, at least for now.  He'd like to add a check to the code so it does use those calls on machines which have them supported.

After this change, the c1ioo and c1gpt front end codes do in fact run.

  3681   Fri Oct 8 17:35:24 2010 josephbUpdateCDSstatus of c1ioo, c1sus and rfm

RFM is still not working.  I can see data on a filter just before it reaches the RFM sending code, but I see only zeros on the receiving side.

c1sus machine and c1x02, c1sus, c1mcs, c1rms are running.  At the moment, the c1mcs model is running at about 42 microseconds for USR time and 56 microseconds for CPU MAX, which is close to the 61 microsecond limit.  This is with MC filters on.  So far it has not been late, but its not clear to me if its going to stay that way.  So far I haven't been able to isolate why it sometimes slows down after a few minutes.  Also, it was running faster earlier in the day (around 30-ish microseconds) and I believe it has slowed down slightly in the last hour or two.

c1ioo machine and c1x03, c1ioo are running. However its not doing very much good as I can't get any data transferred from it to any of the optic suspensions. I need to spend some more time debugging this and then grab Alex I think.

  3687   Mon Oct 11 10:49:03 2010 josephbUpdateCDSc1sus stability

Taking a look at the c1sus machine, it looks as if all of the front end codes its running (c1sus - running BS, ITMX, ITMY, c1mcs - running MC1, MC2, MC3, and c1rms - running PRM and SRM) worked over the weekend.  As I see no

Running dmesg on c1sus reports on a single long cycle on c1x02, where it took 17 microseconds (~15 microseconds i maximum because the c1x02 IOP process is running at 64kHz).

Both the c1sus and c1mcs models are running at around 39-42 microseconds USR time and 44-50 microseconds CPU time.  It would run into problems at 60-62 microseconds.

Looking at the filters that are turned on, it looks as it these models were running with only a single optic's worth of filters turned on via the medm screens.  I.e. the MC2 and ITMY filters were properly set, but not the others.

The c1rms model is running at around 10 microseconds USR time and 14-18 microseconds CPU time.  However it apparently had no filters on.

It looks as if no test points were used this weekend.  We'll turn on the rest of the filters and see if we start seeing crashes of the front end again.

Edit:

The filters for all the suspensions have been turned on, and all matrix elements entered.  The USR and CPU times have not appreciably changed.  No long cycles have been reported through dmesg on c1sus at this time.  I'm going to let it run and see if it runs into problems.

  3714   Thu Oct 14 10:29:33 2010 josephbUpdateComputersMafalda ready for NDS2, updated Rosalba, Rossa repos

At John Zweizig's request, I installed a couple of packages from the lscsoft repository, along with libtools, automake, autoconf, and several kerberos packages, including cyrus-sasl, cyrus-sasl-lib, cyrus-sasl-devel, cyrus-sasl-gssapi, and krb5-libs.  All it needs now is John to come down and install the NDS2 server.

I copied the lscsoft.repo file from /etc/yum.conf.d/ on allegra to mafalda, as well as rosalba and rossa, just for completeness.  I also added the epel repository to rossa and installed the yum-priorities package and set the priorities on rossa's repositories. 

  3715   Thu Oct 14 10:59:10 2010 josephbUpdateComputersUpdated cshrc.40m and Computer Restart Procedures

I started modifying the cshrc.40m file in /cvs/cds/caltech/ so that it starts pointing at the new directories.

# misc aliases
alias target 'cd /opt/rtcds/caltech/c1/target'
alias scripts 'cd /opt/rtcds/caltech/c1/scripts'
alias chans 'cd /opt/rtcds/caltech/c1/chans'
alias c 'cd /opt/rtcds/caltech/c1'
alias s 'cd /opt/rtcds/caltech/c1/scripts'
alias u 'cd /cvs/cds/caltech/users'

I also added "alias screens 'cd /opt/rtcds/caltech/c1/medm'" as a quick way to get to the medm directory.

Once we get multiple compiled versions (i.e. i386, x86_64, Solaris) of the new gds tools from Alex, we'll have to some more serious surgery on this file.

I removed the "New Computer Restart Procedures" and simply moved the changes into the "Computer Restart procedures", found here.  I've removed everything I don't think applies anymore (all the VME FE reboot procedures for example).

  3716   Thu Oct 14 12:45:47 2010 josephbUpdateComputersNDS 2 server installation on mafalda

[Joe, John Zweizig]

John stopped by around noon today to install the NDS 2 server.  He installed it /cvs/cds/caltech/users/jzweizig/nds2-server/.

Once John is done, I will be moving this to a more sensible install location that is not his user directory, but its there for the moment.

We had to install a couple more packages including bzip2, bzip2-devel, gcc-c++, openssl, and openssl-devel. 

We mounted the /frames directory from the fb machine to mafalda by modifying the /etc/fstab file with the line:

fb:/frames              /frames                 nfs     bg,ro           0 0

 

If we change channels recorded by the frame builder, we need to update a channel list file for the NDS 2 server.  There's an excutable located at:

/cvs/cds/caltech/users/jzweizig/nds2-server/bin/buildChannelList

This builds the file list if given a .gwf file.  These are written by the frame builder, and can be found in /frames/full/####, where #### are the first 4 gps digits of the gravity files contained in that directory.

Upon questing about when we get to GPS time 1000000000, he said there's some updates he needs to do so it rather throws away the last 5 digits, rather than keeping the first 4.

An example command run on the fb or mafalda machine is:

/cvs/cds/caltech/users/jzweizig/nds2-server/bin/buildChannelList  /frames/full/9711/C-R-971119728-16.gwf > nds2-mafalda/C-R-ChanList.txt

For a seconds trends file (located in /frames/trends/second/ instead of /frames/full)

/cvs/cds/caltech/users/jzweizig/nds2-server/bin/buildChannelList  /frames/trends/second/9711/ C-T-971106780-60.gwf > nds2-mafalda/C-T-ChanList.txt

For a minute trends file (located in /frames/trends/minute)

/cvs/cds/caltech/users/jzweizig/nds2-server/bin/buildChannelList  /frames/trends/minute/9711/C-T-971106780-60.gwf  > nds2-mafalda/C-M-ChanList.txt

In these cases, John was putting the lists in the /cvs/cds/caltech/users/jzweizig/nds2-mafalda/ directory.

 

Both the  C-raw-cache.txt file and the nds2.conf files need to be configured to point at the correct files in the nds2-mafalda directory.

 

  3720   Thu Oct 14 14:09:30 2010 josephbUpdateCDSNDS 2 server status

[Joe, John]

The nds 2 server is about 50% of the way there.  You can connect to it and get channel lists, but its having issues actually serving the data.  The errors we're getting basically say it can't find the source data for the channel

John had to go get lunch at 2pm, but he said he'd log in remotely later and try to configure it properly.

  3728   Fri Oct 15 15:32:35 2010 josephbUpdateCDSslow DAQ channels added

I added all the _OUT16, _INMON, _EXCMON channels associated with the BS, ITMX, and ITMY channels in SUS_SLOW.ini in the /opt/rtcds/caltech/c1/chans/daq directory.  Similarly, I added the channels associated with MC1, MC2, and MC3 to MCS_SLOW.ini and those associated with PRM and the SRM to RMS_SLOW.ini. 

Lines pointing to these files were then added to the /opt/rtcds/caltech/c1/target/fb/master file and the frame builder restarted.  It took about 4 tries before the frame builder stayed up using ./daqd -c ./daqdrc.

I generated the .ini files with a python script.  Its at /opt/rtcds/caltech/c1/scripts/daqscripts/create_inmon_out16_daq_ini.py. It checks the C0EDCU.ini to find what slow channels already exist, then goes through a medm directory given at the command line looking for all _OUT16, _INMON, and _EXCMON channels, adding them if they aren't already accounted for, and then writing out the new file to the location of your choice.

  3793   Wed Oct 27 10:53:03 2010 josephbConfigurationComputersWhy doesn't DTT work?!?

Test points for the SUS channels should be there.  They have been working previously this week.  Possibly break down points include awgtpman, mx_streams, and the fb itself.  I'll look into that.

As far as other fast channels, there are no other fast front ends running than the suspensions ones we have.  Until additional channels get connected to the front ends and the models updated, those are the channels we have available.  However I am working on getting c1ioo up and running, and we can try connecting in some PEM channels today to the c1sus front end's 4th ADC.

 

Edit:

I tried starting a fresh instance of the frame builder, but when I brought the old copy down, it left a pair of zombie or dead mx_stream processes running on c1sus . Basically c1mcs and c1rms were still running, while c1x02 and c1sus came down.  I tried to kill the processes but this caused the c1sus machine to crash.  In the past I've killed left over mx_stream processes running after the frame builder has gone down, but I've never seen them crash the computer.  I'm unsure why this happened since we haven't done any updates of the code, just updated models and daq configuration files.

Quote:

DTT has only SUS and "X02" channels under C1 in the drop down channel selection menu.  Basically, we can't measure any fast channels with DTT.  I keep getting the error: "Unable to select testpoints."  Sadface.

Similar things are true for DataViewer.  The same limited number of fast channels, and no data found:

Server error 13: no data found
datasrv: DataWriteRealtime failed: daq_send: Illegal seek

Is this a framebuilder problem?  Is this something that the CDS team has on the to-do list?

 

  3794   Wed Oct 27 11:34:40 2010 josephbUpdateCDSModified rc.local for front end machines

What was the problem:

On reboot, c1sus was in a strange state.  The epics IOC was running, along with tpman, but there were no loaded front ends.

People had to manually run sudo insmod c1SYSfe.ko.

What was the cause:

Awhile back Alex had commented out the line in the rc.local file which actually loaded the front end modules (i.e. c1x02fe.ko, c1susfe.ko, etc).  This was while debugging.  He never put it back.

What was the solution:

I uncommented the following line in /diskless/root/etc/rc.local file on the fb machine:

/opt/rtcds/caltech/c1/target/$i/scripts/startupC1rt

Now on reboot, the c1sus machine should start up all the front ends listed in the rtsystab file, located in /diskless/root/etc/ on the fb machine.

 

  3795   Wed Oct 27 11:52:45 2010 josephbUpdateelogElog needed to be restarted

I had to restart the elog on Nodus because it was no longer responding.

  3796   Wed Oct 27 12:32:53 2010 josephbUpdateCDSfb rebooted to try and fix testpoints

Problem:

Test points were unavailable last night, even after reboots of c1sus and even restarting the daqd process on the frame builder.

Cause:

Its unclear at this time.  My guess is flaky fb and mx_stream codes.  At the moment, the daqd often requires several restarts as it segfaults within a minute or two of restarting it.

What we did (aka treating the symptoms):

We rebooted the frame builder machine.  I also added the daqd and nds processes to the inittab.  Now when these die, they will automatically be restarted.

Steps to add to the inittab on fb

0) If not on fb, ssh -X fb

1) cd /etc/

2) sudo vi inittab or sudo emacs init

3) Add a line like: id:runlevels:action:process

The id is a unqiue 2-4 letter and number identifier for the process

Run levels is the run level of linux that it will start at. 345 will cover the normal cases

action is what to do with the process. Respawn makes it run at startup and also restarts it everytime it dies.

process is the command you want to run

See "man inittab" for more details

In this case we added

daq:345:respawn:/opt/rtcds/caltech/c1/target/fb/daqd -c /opt/rtcds/caltech/c1/target/fb/daqdrc > /opt/rtcds/caltech/c1/target/fb/daqd.log


nds:345:respawn:/opt/rtcds/caltech/c1/target/fb/nds pipe > /opt/rtcds/caltech/c1/target/fb/nds.log

4) Save.

5) Run "sudo /sbin/telinit q".  This forces init to rexamine the inittab file

daqd and nds will now automatically restart when they die.

Continuing issues:

When the frame builder dies, the mx_stream processes on the front ends die as well.  These need to be restarted manually at the moment by using "sudo /etc/restart_streams" while on c1sus.

The framebuilder code shouldn't be this flaky.

  3799   Wed Oct 27 17:06:48 2010 josephbUpdateCDSMoved c1iscey chassis and host interface board to c1ioo

Problem:

Need a working IO chassis connected to c1ioo in order to bring the MC_L into the digital realm, and then via RFM transmit to the c1sus machine.

Attempted Solution:

Move the c1iscey IO chassis to c1ioo while the c1ioo chassis is at downs.

The c1iscey chassis however doesn't seem to be talking to the c1ioo computer.  I tried changing the host interface card on the c1ioo chassis.  I took out One Stop Systems HIB2-x4-H interface card with serial number 26638 from the c1ioo computer and put in the One Stop Systems HIB2-x4-H with serial number 35242 in from c1iscey into c1ioo.  Still didn't work.

All the lights are red on the interface card on the actual chassis and its cooling fan isn't spinning. 

Using dmesg on c1ioo shows that it does not see any of the ADC/DAC/BO cards.

Status:

I'm going to  wait until tomorrow morning when Rolf gets a chance to look at the c1ioo chassis over at Downs to determine the next step.  If we fix the c1ioo chassis, I'm move the c1iscey chassis and its host interface board back to the end.

  3811   Thu Oct 28 16:38:54 2010 josephbUpdateCDSFlaky fb, reverted inittab changes on fb

Problem:

Yuta reported many of the signals being displayed by dataviewer "fuzzier" than normal.  And diaggui was not working.

Running "diag -i" reported:

Diagnostics configuration:
awg 21 0 192.168.113.85 822095893 1 192.168.113.85
awg 36 0 192.168.113.85 822095908 1 192.168.113.85
awg 37 0 192.168.113.85 822095909 1 192.168.113.85
tp 21 0 192.168.113.85 822091797 1 192.168.113.85
tp 36 0 192.168.113.85 822091812 1 192.168.113.85
tp 37 0 192.168.113.85 822091813 1 192.168.113.85

This seems to be missing an nds type line between the 3 awgs and the 3 tp lines.

The daqd code (the framebuilder) is being especially flaky today, and I'm starting to see new errors.

[Thu Oct 28 16:13:46 2010] Couldn't open full trend frame file
`/frames/trend/second/9723/C-

T-972342780-60.gwf' for writing; errno 13
epicsThreadOnceOsd epicsMutexLock failed.
epicsThreadOnceOsd epicsMutexLock failed.
epicsThreadOnceOsd epicsMutexLock failed.
epicsThreadOnceOsd epicsMutexLock failed.
epicsThreadOnceOsd epicsMutexLock failed.
Segmentation fault (core dumped)

or

[Thu Oct 28 16:17:06 2010] Couldn't open full frame file
`/frames/full/9723/.C-R-972343024-16.gwf' for writing; errno 13
CA client library tcp receive thread terminating due to a C++ exception
FATAL: exception not rethrown
CA client library tcp receive thread terminating due to a C++ exception
CA client library tcp receive thread terminating due to a C++ exception
FATAL: exception not rethrown
cac: tcp send thread received an unexpected exception - disconnecting
Aborted (core dumped)

What was done today that might have affected it:

A new c1ioo chassis from Downs was connected to c1ioo.  I also connected c1ioo to the DAQ network (192.168.114.xxx) which talks to the frame builder.

I started downloading the necessary files to be able to follow Keith's instructions for a standard control / teststand setup in /opt/apps , /opt/rtapps, etc.  However, it has not actually been installed yet.

Yuta added additional OL channels to the DAQ config for being recorded.

Attempted Fixes:

I reverted the inittab changes I made in this elog.  Didn't help.

I disconnected c1ioo from the DAQ network.  Didn't help.

Rebooted the frame builder machine.  Didn't help.

I've sent an e-mail to Alex describing the problem to see if he has any idea where we went wrong.

Yuta may try restoring the old DAQ channel choices and see if that makes a difference.

Current Status:

daqd framebuilder code still won't stay up.  So no channels at the moment.

  3822   Fri Oct 29 11:29:29 2010 josephbUpdateCDSHow I broke the frame builder yesterday

Problem:

Long before Yuta came along and deleted daqd, I had done something to prevent the framebuilder code from running at all.

Cause:

Alex pointed out via e-mail that the corresponded to the inability to access certain frame files due to their permissions being only for root. 

Turns out when I had run the code under the inittab, I forgot to make it use controls, instead of root (which is the default).  This later on caused problems for the code when it tried to access those files, resulting in the wierd errors we saw.

Solution:

Use chown to change the offending frame files back to controls.

Future:

Write a proper inittab script which uses "su controls" before running the daqd code.

  3827   Fri Oct 29 16:43:25 2010 josephbUpdateCDSc1ioo now talking to c1sus

Problem:

c1ioo was not able to talk to c1sus because of timing issues.  This prevented the mode cleaner length signal (MC_L) from getting to c1sus.

Solution:

The replacement c1ioo chassis from Downs with a more recent revision of the IO backplane works. 

The c1ioo is now talking to c1sus and transmitting a signal.

We connected the cable hanging off the DAQ interface board labeled MC OUT1 to the MC Servo board's output labeled OUT1.

During debugging I modified the c1x02, c1x03, c1mcs and c1ioo codes to print debugging messages.  This was done by modifying the /opt/rtcds/caltech/c1/advLigoRTS/src/fe/commData2.c file.  I have since reverted those changes.

Future:

We still need to check that everything is connected properly and that the correct signal is being sent to the MC2 suspension.

 

  3834   Mon Nov 1 11:34:08 2010 josephbUpdateCDSCA.Client.Exception spam fixed

Problem:

CA.Client.Exception...............................................
    Warning: "Identical process variable names on multiple servers"
    Context: "Channel: "C1:SUS-ITMX_TO_COIL_0_3_INMON", Connecting to: 192.168.113.85:42367, Ignored: c1susdaq:42367"
    Source File: ../cac.cpp line 1208
    Current Time: Fri Oct 29 2010 18:07:39.132686519

The above exception gets thrown for each channel sent to the framebuilder when you start the frame builder.  Its the 192.168.113.xxx (main martian) and 192.168.114.xxx (DAQ) networks sending the same information.  This makes it hard to see real errors when starting the frame builder.

Solution:

Configure EPICS channel access to only use one network.  This is done by modifying the /etc/bash/bashrc and the /diskless/root/etc/bash/bashrc files with the following two lines:

export EPICS_CA_AUTO_ADDR_LIST=NO
export EPICS_CA_ADDR_LIST="192.168.113.255"

The first line tells the computers not to automatically search all attached network devices.  The second tells it to use the 192.168.113.xxx network for EPICS channel broadcasts.

  3837   Mon Nov 1 15:35:41 2010 josephbUpdateCDSFront end USR and CPU times now recorded by DAQ

Problem:

We have no record of how long the CPUs are taking to perform a cycle's worth of computation

Solution:

I added the following channels to the various slow DAQ configuration files in /opt/rtcds/caltech/c1/chans/daq/

IOO_SLOW.ini:[C1:FEC-34_USR_TIME]
IOO_SLOW.ini:[C1:FEC-34_CPU_METER]
IOP_SLOW.ini:[C1:FEC-20_USR_TIME]
IOP_SLOW.ini:[C1:FEC-20_CPU_METER]
IOP_SLOW.ini:[C1:FEC-33_USR_TIME]
IOP_SLOW.ini:[C1:FEC-33_CPU_METER]
MCS_SLOW.ini:[C1:FEC-36_USR_TIME]
MCS_SLOW.ini:[C1:FEC-36_CPU_METER]
RMS_SLOW.ini:[C1:FEC-37_USR_TIME]
RMS_SLOW.ini:[C1:FEC-37_CPU_METER]
SUS_SLOW.ini:[C1:FEC-21_USR_TIME]
SUS_SLOW.ini:[C1:FEC-21_CPU_METER]

 

Notes:

To restart the daqd code, simply kill the running process.  It should restart automatically.  If it appears not to have started, check the /opt/rtcds/caltech/c1/target/fb/restart.log file and the /opt/rtcds/caltech/c1/target/fb/logs/daqd.log.xxxx files.  If you made a mistake in the DAQ channels and its complaining, fix the error and then restart init on the fb machine by running "sudo /sbin/init q"

  3846   Tue Nov 2 15:24:18 2010 josephbUpdateCDSc1ioo and c1mcs only sending MC_L, MC1_PIT, MC1_YAW

In order to have the c1mcs model run, we're running with only 3 RFM channels between c1ioo and c1mcs at the moment.  This leaves the model at around 45 microseconds, and at least lets us damp.

Alex and I still need to track down why the RFM read calls are taking so much time to execute.

  3853   Wed Nov 3 15:13:55 2010 josephbUpdateCDSTemporary RFM slow read work around

Problem:

Each RFM read in the c1mcs model is adding ~7 microseconds to the cycle time.  Adding too many pushes it over the 62 microsecond limit.

RFM writes do not have this problem.

Temporary Solution:

The fastest fix was to create a new front end model, called c1rfm, which does nothing but read in the MC1, MC2, MC3 PIT and YAW signals from the c1ioo machine, and then passes them along to the c1mcs model via shared memory, which is fast.

This means the data being sent is 2 cycles slow, one to go over the RFM, and one to go over shared memory.  It is running at 16384 cycles/second, so it shouldn't have much impact at the frequencies we use those channels for.

MC_L is still being sent directly to the c1mcs front end code via RFM.

Current Status:

The c1mcs model is running at  30-33 microseconds for CPU time.

The c1rfm model is running at 45-47 microseconds for CPU time.

All 7 channels, MC_L, MC1_PIT, MC1_YAW, MC2_PIT, MC2_YAW, MC3_PIT, MC3_YAW are responding.

The c1rfm model was added to the /diskless/root/etc/rtsystab file on the fb machine so that it automatically starts on the reboot of c1sus.

The USR and CPU time channels for c1rfm were added to the MCS_SLOW.ini file in /opt/rtcds/caltech/c1/chans/daq/ so that the framebuilder records them, namely:

[C1:FEC-38_USR_TIME]
[C1:FEC-38_CPU_METER]

The framebuilder was restarted to take these new channels into account.

Future:

Finish implementing and debugging the "round robin" RFM reader so as to not require a seperate model to be doing RFM reads in parallel.

Look into improving read speed by either merging timestamps and data into a single  or reading time stamp once every tenth or hundreth cycle, although this at best provides a factor of 2 improvement.

Check to see if RFM card being on the IO chassis or directly in the computer chassis makes a difference.

Get Alex and Rolf to improve RFM read speed.

  3855   Wed Nov 3 17:01:01 2010 josephbSummaryCDSComparison of RFM read times

Problem:

RFM reads are slow.  Rolf has said it should take 2-3 microseconds per read. 

c1sus is taking about 7 microseconds per read, twice as slow as Rolf's claim.

Hypothesis:

The RFM card is in the IO chassis, and is sharing the PCIe bus with 4 ADC cards, 3 DAC cards, 4 BO cards, and a BIO card.  Its possible this crowded bus is causing the reads to take even longer.

Test Results:

Compare read times between the c1sus computer, which has its RFM card in the IO chassis, to c1ioo, which has its RFM card in the computer.

c1ioo:

No RFM reads: 8 microseconds

3 RFM reads: 20 microseconds (~4 per read)

6 RFM reads: 32 microseconds (~4 per read)

c1sus:

No RFM read: 25 microseconds (bigger model)

1 RFM read: 33 microseconds (~8 per read)

3 RFM read: 45 microseconds (~7 per read)

6 RFM read: Over 62 microseconds, doesn't run.

Conclusion:

It looks like moving the RFM card may help by about a factor of 2 in read speed, although its still not quite what Alex and Rolf claim it should be.

The c1mcs and c1ioo models have been reverted to their normal operations.

 

  3860   Thu Nov 4 15:15:43 2010 josephbUpdateCDSModified feCodeGen.pl, fmseq.pl, and suspension screens

Feature Requested:

Have the CPU_meter change change color at various alarm levels.  These alarm levels have been set at 2/3 maximum for Minor alarm (yellow) and 9/10 maximum for Major alarm (red).

Implementation:

Rather than hand code each EPICS .db file to add the alarm files each time we rebuild the front ends, I decided to modify it at the source (since it strikes me as a generally useful alarm level for all front end codes).

First, I modified the feCodeGen.pl script.

I changed

print EPICS "OUTVARIABLE FEC\_$dcuId\_CPU_METER epicsOutput.cpuMeter int ai 0 field(HOPR,\"$rate\") field(LOPR,\"0\")\n";

to

print EPICS "OUTVARIABLE FEC\_$dcuId\_CPU_METER epicsOutput.cpuMeter int ai 0 field(HOPR,\"$rate\") field(LOPR,\"0\") field(HIGH,\"$two_thirds_rate\") field(HSV,\"MINOR\") field(HIHI,\"$     nine_tenths_rate\") field(HHSV,\"MAJOR\")\n";

I added the following two lines just before it as well:

$two_thirds_rate = int($rate * 2 / 3);

$nine_tenths_rate = int($rate * 9 / 10);

However, only the first four fields were actually added to the database file.  Apparently fmseq.pl, which populated the database, was hard coded to only handle up to 4 fields.

I modified the fmseq.pl script in /opt/rtcds/caltech/c1/core/advLigoRTS/src/epics/util/ so as to be able to handle up to 6 field values when writing EPICS .db files.

This change was accomplished by simply changing the following line

($junk, $v_name, $v_var, $v_type, $ve_type, $v_init, $v_efield1, $v_efield2, $v_efield3, $v_efield4 ) = split(/\s+/, $_);

to 

($junk, $v_name, $v_var, $v_type, $ve_type, $v_init, $v_efield1, $v_efield2, $v_efield3, $v_efield4, $v_efield5, $v_efield6 ) = split(/\s+/, $_);

everywhere it occurred.  There were something like 10 instances of it. Also, I added the two lines

        $vardb .= "    $v_efield5\n";
        $vardb .= "    $v_efield6\n";

after each set of

         $vardb .= "    $v_efield1\n";
         $vardb .= "    $v_efield2\n";
         $vardb .= "    $v_efield3\n";
         $vardb .= "    $v_efield4\n";

Lastly, I modified the CPU_METER bar graph on the C1SUS_DEFAULTNAME.adl screen (located in /opt/rtcds/caltech/c1/medm/master/) to use alarm levels, and then ran generate_master_screens.py.

 

  3900   Thu Nov 11 21:07:49 2010 josephbUpdateCDSPlugged c1iscex into DAQ network - still causes network slowdown

I connected the c1iscex computer to the dedicated DAQ network switch (located in 1X7).

This does not seem to have helped c1iscex stop spewing out "OMX: Failed to find peer index of board 00:00:00:00:00:00 (Peer Not Found in the Table)" at the rate of ~1 Gigabyte per minute.

c1iscex is currently off until a solution can be found.

  3919   Mon Nov 15 11:13:12 2010 josephbUpdateCDSModified rc.local to not start mx_streams automatically

Problem:

c1iscex floods the network with about 1 gigabyte of error messages in a few seconds, writing to a log file in /opt/rtcds/caltech/c1/target/fb/logs/

Temporary change:

I commented out the following line in the rc.local file on the fb machine in the /diskless/root/etc/ directory:

#nice --20 ./mx_stream -s "$SYSTEMS" -d fb:0 >& logs/$HOSTNAME.log&

This disables the automatic start up of the mx_streams code on all the front ends.  This will prevent the network being brought to its knees by c1iscex while we debug the problem.

It also means on a reboot of the front ends, the mx_stream process needs to be started by hand until this change is reverted.

To do this, log into the front end and then change directory to /opt/rtcds/caltech/c1/target/fb

For c1sus, run:

./mx_stream -s c1x02 c1sus c1mcs c1rms c1rfm  -d fb:0

For c1ioo, run:

./mx_stream -s c1x03 c1ioo -d fb:0

 

  3926   Mon Nov 15 16:26:46 2010 josephbUpdateCDSc1iscex is now running and the network hasn't died

Problem:

c1iscex was spamming the network with error messages.

Solution:

Updated the front end codes to current standards (they were on the order of months out of date).  After fixing them up and rebuilding the codes on c1iscex, it no longer had problems connecting to the frame builder.\

Status:

I can look at test points for ETMX.  It is not currently damping however.

To Do:

Move filters for ETMX into the correct files. 

Need to add a Binary output blue and gold box to the end rack, and plug it into the binary output card.  Confirm the binary output logic is correct for the OSEM whitening, coil dewhitening, and QPD whitening boards. 

Get ETMX damped.

Figure out what we're going to do with the aux crate which is currently running y-end code at the new x-end.  Koji suggested simply swapping auxilliary crates - this may be the easiest.  Other option would be to change the IP address, so that when it PXE boots it grabs the x-end code instead of the y-end code.

Current CDS status:

MC damp dataviewer diaggui AWG c1ioo c1sus c1iscex RFM Sim.Plant Frame builder TDS
                     
ELOG V3.1.3-