ID   Date   Author   Type   Category   Subject
  3644   Mon Oct 4 15:28:10 2010   josephb   Update   CDS   Trying to get c1ioo booting as Gentoo

I modified the dhcpd.conf file in /etc/dhcp on the fb machine.  I added an entry for c1ioo near the bottom of the file, listing its MAC address and IP address.  I then restarted the dhcp server using "sudo /etc/init.d/dhcpd restart" while on the fb machine.
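For reference, the entry takes the standard ISC dhcpd host form, something like the following (the MAC and IP here are placeholders, not c1ioo's actual values):

host c1ioo {
    hardware ethernet 00:30:48:xx:xx:xx;   # c1ioo's MAC address (placeholder)
    fixed-address 192.168.113.xx;          # c1ioo's IP address (placeholder)
}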

I also modified the rtsystab, which is used to determine which front end codes start on boot up of a machine.  I added a line: c1ioo   c1x03  c1ioo

I am now in the process of getting c1ioo to come up as a Gentoo machine so I can build a model with an RFM connection in it and test the communication between c1sus and c1ioo.  This involves removing the hard drives and checking to make sure the boot priority is correct (i.e. it checks for a network boot).

  3661   Wed Oct 6 15:56:14 2010   josephb   HowTo   CDS   How to load matrices quickly and easily

A while ago I wrote several scripts for reading in medm screen matrix settings and then writing them out.  It was meant as a kind of mini-burt, just for matrices, for switching between a couple of different setups quickly.

Yuta has expressed interest in having these instructions available.

In /cvs/cds/caltech/scripts/matrix/ there are 4 python scripts:

saveMatrix.py, oldSaveMatrix.py, loadMatrix.py, oldLoadMatrix.py

The saveMatrix.py and loadMatrix.py scripts are for use with the current front end codes (which start counting from 1), whereas the old*.py files are for the old system.

To use saveMatrix.py you need to specify the number of inputs, the number of outputs, and the base name of the matrix (i.e. C1:LSP-DOF2PD_MTRX is the base of C1:LSP-DOF2PD_MTRX_0_0_GAIN, for example), as well as an output file where the values are stored.

So to save the BS in_matrix setting you could do (from /cvs/cds/caltech/scripts/matrix/)

./saveMatrix.py -i 4 -o 5 -n "C1:SUS-BS_TO_COIL_MTRX" -f -d ./to_coil_mtrx.txt

The -i 4 indicates 4 inputs, the -o 5 indicates 5 outputs, -n "blah blah" indicates the base channel name, -f indicates a matrix bank of filters (if it's just a normal matrix with no filters, drop the -f flag), and -d ./to_coil_mtrx.txt indicates the destination file.

To write the matrix, you do virtually the same thing:

./loadMatrix.py -n "C1:SUS-PRM_TO_COIL_MTRX" -f -d ./to_coil_mtrx.txt

In this case, you're writing the saved values of the BS to the PRM.  This method might be faster if you're trying to fill in a bunch of identical new matrices, rather than typing 1's and -1's 20 times for each matrix.
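For reference, the save side amounts to looping over the matrix elements with EPICS reads.  A minimal hypothetical python sketch (not the actual script), assuming the <BASE>_<row>_<col>_GAIN naming described above and the standard caget command-line tool:

import subprocess

def save_matrix(base, n_in, n_out, dest, start=1):
    # start=1 for the current front ends; the old system counted from 0.
    # Which index is the row and which the column is an assumption here.
    with open(dest, "w") as f:
        for out in range(start, n_out + start):
            for inp in range(start, n_in + start):
                chan = "%s_%d_%d_GAIN" % (base, out, inp)
                value = subprocess.check_output(["caget", "-t", chan])
                f.write("%s %s" % (chan, value))   # caget output ends in \n

save_matrix("C1:SUS-BS_TO_COIL_MTRX", 4, 5, "./to_coil_mtrx.txt")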

I'll have Yuta add his how-to of this to the wiki.

  3665   Thu Oct 7 10:37:42 2010   josephb   Update   CDS   c1sus with flaky ssh

Currently trying to understand why the ssh connections to c1sus  are flaky.  This morning, every time I tried to make the c1sus model on the c1sus machine, the ssh session would be terminated at a random spot midway through the build process.  Eventually restarting c1sus fixed the problem for the moment.

However, earlier in the last 48 hours, the c1sus machine had stopped responding to ssh logins while still appearing to be running the front end code.  The next time this occurs, we should attach a monitor and keyboard and see what kind of state the computer is in.  It's interesting to note we didn't have these problems before we switched over to the Gentoo kernel from the real-time linux CentOS 5.5 kernel.

  3678   Fri Oct 8 12:21:11 2010   josephb   Update   CDS   checking MC1 suspension damping

Upon investigation, it appears that the c1mcs model was (and still is) timing out after a random amount of time.  In other words, at some point it was taking too long to do all the calculations for a single cycle and falling behind.  The evidence for this comes from the dmesg command when run on c1sus.

There's a bunch of lines like:

[ 8877.438002] c1mcs: cycle 568 time 62; adcWait 0; write1 0; write2 0; longest write2 0

[ 8877.438002] c1mcs: cycle 569 time 62; adcWait 0; write1 0; write2 0; longest write2 0

With a final line like: [ 8877.439003] c1mcs: ADC TIMEOUT 1 2405 37 2277

This last line indicates it fell so far behind that it gave up.
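For anyone repeating this check, a minimal sketch (not an existing tool) that pulls one model's long-cycle and timeout complaints out of the kernel log:

import subprocess

model = "c1mcs"
log = subprocess.check_output(["dmesg"])
for line in log.splitlines():
    if "%s: cycle" % model in line or "%s: ADC TIMEOUT" % model in line:
        print(line)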

However, this doesn't actually seem to be related to the amount of computation going on with the front end. I restarted the c1mcs model this morning by logging into the c1sus machine, and changing to the /opt/rtcds/caltech/c1/target/c1mcs/bin directory and running:

lsmod

sudo rmmod c1mcsfe

sudo insmod c1mcsfe.ko

The first line just lists the running modules. The second removes the c1mcs module, and the third starts it up again.

I proceeded to turn on all the filters and set all the matrix values while keeping an eye on the C1MCS-GDS_TP.adl screen and its timing indicator.  It was running fine with all of these turned on.  Then I ran a dtt session from rosalba (going to /opt/apps/, typing bash, then source gds-env.bash, and finally diaggui) as well as a dataviewer, and looked at 6 test point channels.  It received data fine.

However, about 2 minutes after I had stopped doing this (although the dataviewer realtime session was still going) the USR timing jumped from about 20 microseconds to 35 microseconds, and the CPU Max timing jumped to the 50-60 microsecond range. At this point dmesg started reporting things like:

[54143.465613] c1mcs: cycle 1076 time 62; adcWait 0; write1 0; write2 0; longest write2 0

[54143.526004] c1mcs: cycle 2064 time 62; adcWait 0; write1 0; write2 0; longest write2 0

About a minute later the code gave up and reported an ADC timeout via dmesg.  None of the other front ends seem to be affected.

I had to clear the test points manually after stopping dataviewer and dtt by going to rosalba, using the sourced gds-env.bash, and running diag -l.  I then typed "tp clear 36 *" to clear all the test points on the model with FEC number 36 (corresponding to c1mcs).

I removed and restarted c1mcs again, trying to turn on a few things at a time and letting it run for a few minutes, to see if I could narrow down whether it's one particular filter, perhaps reaching an underflow and starting to bog down the computations.  However, after about 45 minutes of this, the model is still running; I've turned all the filters on and have been running about 8 test points with no problem, so the problem is clearly intermittent.

Quote:

Notes:
 c1mcs crashed many times during the investigation, and I had to kill and restart it again and again.
 It seemed to be easily crashed when filters are on, and so I couldn't check whether the damping servo is working correctly or not today.

Next work:

  - fix c1mcs (and maybe others)
  - check the damping servo by comparing the displacements of each 4 degrees of freedom when servo in off and on.

 

  3680   Fri Oct 8 15:57:32 2010   josephb   Update   CDS   c1ioo status

I've been trying to figure out why the c1ioo machine crashes when I try to run the c1ioo front end.

I tried removing some RFM components from the c1ioo model, and then the c1gpt model (Kiwamu's green locking model) as an easier test case.  Both cause the machine to lock up once they start running.  Lastly, I tried running the c1x02 and c1sus models on the c1ioo machine instead of the c1sus machine, after first turning off all models on c1sus.  This led to the same lockup. 

Since those models run fine on the c1sus machine, I could only conclude that some recent change in the FE code generator or the Gentoo kernel doesn't work with the Sun X4600 computer at the moment.

After talking to Alex, he got the idea to check whether monitor() and mwait() were supported on the c1ioo machine.  These function calls were added relatively recently, and are used to poll a memory location to see if something has been written there, and then do something when it is.  Apparently the Sun X4600 computers do not support these calls.  Alex has modified the code to not use these function calls, at least for now.  He'd like to add a check to the code so it does use those calls on machines which support them.

After this change, the c1ioo and c1gpt front end codes do in fact run.

  3681   Fri Oct 8 17:35:24 2010   josephb   Update   CDS   status of c1ioo, c1sus and rfm

RFM is still not working.  I can see data on a filter just before it reaches the RFM sending code, but I see only zeros on the receiving side.

The c1sus machine and the c1x02, c1sus, c1mcs, and c1rms models are running.  At the moment, the c1mcs model is running at about 42 microseconds for USR time and 56 microseconds for CPU MAX, which is close to the 61 microsecond limit.  This is with the MC filters on.  So far it has not been late, but it's not clear to me if it's going to stay that way.  So far I haven't been able to isolate why it sometimes slows down after a few minutes.  Also, it was running faster earlier in the day (around 30-ish microseconds) and I believe it has slowed down slightly in the last hour or two.

The c1ioo machine and the c1x03 and c1ioo models are running.  However, it's not doing much good, as I can't get any data transferred from it to any of the optic suspensions.  I need to spend some more time debugging this and then grab Alex, I think.

  3687   Mon Oct 11 10:49:03 2010   josephb   Update   CDS   c1sus stability

Taking a look at the c1sus machine, it looks as if all of the front end codes it is running (c1sus - running BS, ITMX, ITMY; c1mcs - running MC1, MC2, MC3; and c1rms - running PRM and SRM) worked over the weekend, as I see no indication of crashes.

Running dmesg on c1sus reports a single long cycle on c1x02, where it took 17 microseconds (~15 microseconds is the maximum, because the c1x02 IOP process is running at 64 kHz).

Both the c1sus and c1mcs models are running at around 39-42 microseconds USR time and 44-50 microseconds CPU time.  It would run into problems at 60-62 microseconds.

Looking at the filters that are turned on, it looks as if these models were running with only a single optic's worth of filters turned on via the medm screens.  I.e., the MC2 and ITMY filters were properly set, but not the others.

The c1rms model is running at around 10 microseconds USR time and 14-18 microseconds CPU time.  However it apparently had no filters on.

It looks as if no test points were used this weekend.  We'll turn on the rest of the filters and see if we start seeing crashes of the front end again.

Edit:

The filters for all the suspensions have been turned on, and all matrix elements entered.  The USR and CPU times have not appreciably changed.  No long cycles have been reported through dmesg on c1sus at this time.  I'm going to let it run and see if it runs into problems.

  3714   Thu Oct 14 10:29:33 2010   josephb   Update   Computers   Mafalda ready for NDS2, updated Rosalba, Rossa repos

At John Zweizig's request, I installed a couple of packages from the lscsoft repository, along with libtools, automake, autoconf, and several kerberos packages, including cyrus-sasl, cyrus-sasl-lib, cyrus-sasl-devel, cyrus-sasl-gssapi, and krb5-libs.  All it needs now is John to come down and install the NDS2 server.

I copied the lscsoft.repo file from /etc/yum.conf.d/ on allegra to mafalda, as well as rosalba and rossa, just for completeness.  I also added the epel repository to rossa and installed the yum-priorities package and set the priorities on rossa's repositories. 

  3715   Thu Oct 14 10:59:10 2010   josephb   Update   Computers   Updated cshrc.40m and Computer Restart Procedures

I started modifying the cshrc.40m file in /cvs/cds/caltech/ so that it starts pointing at the new directories.

# misc aliases
alias target 'cd /opt/rtcds/caltech/c1/target'
alias scripts 'cd /opt/rtcds/caltech/c1/scripts'
alias chans 'cd /opt/rtcds/caltech/c1/chans'
alias c 'cd /opt/rtcds/caltech/c1'
alias s 'cd /opt/rtcds/caltech/c1/scripts'
alias u 'cd /cvs/cds/caltech/users'

I also added "alias screens 'cd /opt/rtcds/caltech/c1/medm'" as a quick way to get to the medm directory.

Once we get multiple compiled versions (i.e. i386, x86_64, Solaris) of the new gds tools from Alex, we'll have to do some more serious surgery on this file.

I removed the "New Computer Restart Procedures" and simply moved the changes into the "Computer Restart procedures", found here.  I've removed everything I don't think applies anymore (all the VME FE reboot procedures for example).

  3716   Thu Oct 14 12:45:47 2010   josephb   Update   Computers   NDS 2 server installation on mafalda

[Joe, John Zweizig]

John stopped by around noon today to install the NDS 2 server.  He installed it in /cvs/cds/caltech/users/jzweizig/nds2-server/.

Once John is done, I will be moving this to a more sensible install location that is not his user directory, but it's there for the moment.

We had to install a couple more packages including bzip2, bzip2-devel, gcc-c++, openssl, and openssl-devel. 

We mounted the /frames directory from the fb machine to mafalda by modifying the /etc/fstab file with the line:

fb:/frames              /frames                 nfs     bg,ro           0 0

 

If we change the channels recorded by the frame builder, we need to update a channel list file for the NDS 2 server.  There's an executable located at:

/cvs/cds/caltech/users/jzweizig/nds2-server/bin/buildChannelList

This builds the channel list if given a .gwf file.  These are written by the frame builder, and can be found in /frames/full/####, where #### are the first 4 GPS digits of the frame files contained in that directory.

Upon being asked about what happens when we get to GPS time 1000000000, he said there are some updates he needs to do so that it throws away the last 5 digits rather than keeping the first 4.
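The directory convention itself is simple enough to express in a couple of lines; a sketch (the function name is mine, and the trend directories would use kind="trends/second" or kind="trends/minute"):

def frame_dir(gps_time, kind="full"):
    # e.g. /frames/full/9711 holds the frames with GPS times 9711xxxxx
    return "/frames/%s/%s" % (kind, str(gps_time)[:4])

print(frame_dir(971119728))   # -> /frames/full/9711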

An example command run on the fb or mafalda machine is:

/cvs/cds/caltech/users/jzweizig/nds2-server/bin/buildChannelList  /frames/full/9711/C-R-971119728-16.gwf > nds2-mafalda/C-R-ChanList.txt

For a seconds trends file (located in /frames/trends/second/ instead of /frames/full)

/cvs/cds/caltech/users/jzweizig/nds2-server/bin/buildChannelList  /frames/trends/second/9711/C-T-971106780-60.gwf > nds2-mafalda/C-T-ChanList.txt

For a minute trends file (located in /frames/trends/minute)

/cvs/cds/caltech/users/jzweizig/nds2-server/bin/buildChannelList  /frames/trends/minute/9711/C-T-971106780-60.gwf  > nds2-mafalda/C-M-ChanList.txt

In these cases, John was putting the lists in the /cvs/cds/caltech/users/jzweizig/nds2-mafalda/ directory.

 

Both the  C-raw-cache.txt file and the nds2.conf files need to be configured to point at the correct files in the nds2-mafalda directory.

 

  3720   Thu Oct 14 14:09:30 2010   josephb   Update   CDS   NDS 2 server status

[Joe, John]

The nds 2 server is about 50% of the way there.  You can connect to it and get channel lists, but it's having issues actually serving the data.  The errors we're getting basically say it can't find the source data for the channel.

John had to go get lunch at 2pm, but he said he'd log in remotely later and try to configure it properly.

  3728   Fri Oct 15 15:32:35 2010   josephb   Update   CDS   slow DAQ channels added

I added all the _OUT16, _INMON, _EXCMON channels associated with the BS, ITMX, and ITMY channels in SUS_SLOW.ini in the /opt/rtcds/caltech/c1/chans/daq directory.  Similarly, I added the channels associated with MC1, MC2, and MC3 to MCS_SLOW.ini and those associated with PRM and the SRM to RMS_SLOW.ini. 

Lines pointing to these files were then added to the /opt/rtcds/caltech/c1/target/fb/master file and the frame builder was restarted.  It took about 4 tries before the frame builder stayed up using ./daqd -c ./daqdrc.
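The added lines are presumably just the paths to the new .ini files, i.e. of the form:

/opt/rtcds/caltech/c1/chans/daq/SUS_SLOW.ini
/opt/rtcds/caltech/c1/chans/daq/MCS_SLOW.ini
/opt/rtcds/caltech/c1/chans/daq/RMS_SLOW.ini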

I generated the .ini files with a python script.  It's at /opt/rtcds/caltech/c1/scripts/daqscripts/create_inmon_out16_daq_ini.py.  It checks C0EDCU.ini to find which slow channels already exist, then goes through a medm directory given on the command line looking for all _OUT16, _INMON, and _EXCMON channels, adds any that aren't already accounted for, and then writes out the new file to the location of your choice.
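A hypothetical sketch of the same idea (the C0EDCU.ini path and the bracketed-channel .ini format are assumptions; the real script lives at the location above):

import os, re, sys

edcu = open("/opt/rtcds/caltech/c1/chans/daq/C0EDCU.ini").read()
existing = set(re.findall(r"\[(C1:[^\]]+)\]", edcu))

found = set()
for root, dirs, files in os.walk(sys.argv[1]):   # medm directory to scan
    for name in files:
        if name.endswith(".adl"):
            text = open(os.path.join(root, name)).read()
            found |= set(re.findall(r"C1:[\w-]+_(?:OUT16|INMON|EXCMON)", text))

for chan in sorted(found - existing):
    print("[%s]" % chan)   # new channel entries to append to the .ini file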

  3793   Wed Oct 27 10:53:03 2010   josephb   Configuration   Computers   Why doesn't DTT work?!?

Test points for the SUS channels should be there.  They have been working previously this week.  Possible breakdown points include awgtpman, mx_streams, and the fb itself.  I'll look into that.

As far as other fast channels go, there are no fast front ends running other than the suspension ones we have.  Until additional channels get connected to the front ends and the models updated, those are the channels we have available.  However, I am working on getting c1ioo up and running, and we can try connecting some PEM channels today to the c1sus front end's 4th ADC.

 

Edit:

I tried starting a fresh instance of the frame builder, but when I brought the old copy down, it left a pair of zombie or dead mx_stream processes running on c1sus.  Basically c1mcs and c1rms were still running, while c1x02 and c1sus came down.  I tried to kill the processes, but this caused the c1sus machine to crash.  In the past I've killed leftover mx_stream processes running after the frame builder has gone down, but I've never seen them crash the computer.  I'm unsure why this happened, since we haven't done any updates of the code, just updated models and daq configuration files.

Quote:

DTT has only SUS and "X02" channels under C1 in the drop down channel selection menu.  Basically, we can't measure any fast channels with DTT.  I keep getting the error: "Unable to select testpoints."  Sadface.

Similar things are true for DataViewer.  The same limited number of fast channels, and no data found:

Server error 13: no data found
datasrv: DataWriteRealtime failed: daq_send: Illegal seek

Is this a framebuilder problem?  Is this something that the CDS team has on the to-do list?

 

  3794   Wed Oct 27 11:34:40 2010   josephb   Update   CDS   Modified rc.local for front end machines

What was the problem:

On reboot, c1sus was in a strange state.  The epics IOC was running, along with tpman, but there were no loaded front ends.

People had to manually run sudo insmod c1SYSfe.ko.

What was the cause:

A while back, Alex had commented out the line in the rc.local file which actually loads the front end modules (i.e. c1x02fe.ko, c1susfe.ko, etc).  This was while debugging.  He never put it back.

What was the solution:

I uncommented the following line in /diskless/root/etc/rc.local file on the fb machine:

/opt/rtcds/caltech/c1/target/$i/scripts/startupC1rt

Now on reboot, the c1sus machine should start up all the front ends listed in the rtsystab file, located in /diskless/root/etc/ on the fb machine.

 

  3795   Wed Oct 27 11:52:45 2010   josephb   Update   elog   Elog needed to be restarted

I had to restart the elog on Nodus because it was no longer responding.

  3796   Wed Oct 27 12:32:53 2010   josephb   Update   CDS   fb rebooted to try and fix testpoints

Problem:

Test points were unavailable last night, even after reboots of c1sus and even restarting the daqd process on the frame builder.

Cause:

It's unclear at this time.  My guess is flaky fb and mx_stream codes.  At the moment, the daqd often requires several restarts, as it segfaults within a minute or two of being restarted.

What we did (aka treating the symptoms):

We rebooted the frame builder machine.  I also added the daqd and nds processes to the inittab.  Now when these die, they will automatically be restarted.

Steps to add to the inittab on fb

0) If not on fb, ssh -X fb

1) cd /etc/

2) sudo vi inittab or sudo emacs inittab

3) Add a line like: id:runlevels:action:process

The id is a unique 2-4 character (letters and numbers) identifier for the process

runlevels is the set of linux run levels at which the process will start.  345 will cover the normal cases

action is what to do with the process.  respawn makes it run at startup and also restarts it every time it dies.

process is the command you want to run

See "man inittab" for more details

In this case we added

daq:345:respawn:/opt/rtcds/caltech/c1/target/fb/daqd -c /opt/rtcds/caltech/c1/target/fb/daqdrc > /opt/rtcds/caltech/c1/target/fb/daqd.log


nds:345:respawn:/opt/rtcds/caltech/c1/target/fb/nds pipe > /opt/rtcds/caltech/c1/target/fb/nds.log

4) Save.

5) Run "sudo /sbin/telinit q".  This forces init to re-examine the inittab file.

daqd and nds will now automatically restart when they die.

Continuing issues:

When the frame builder dies, the mx_stream processes on the front ends die as well.  These need to be restarted manually at the moment by using "sudo /etc/restart_streams" while on c1sus.

The framebuilder code shouldn't be this flaky.

  3799   Wed Oct 27 17:06:48 2010   josephb   Update   CDS   Moved c1iscey chassis and host interface board to c1ioo

Problem:

Need a working IO chassis connected to c1ioo in order to bring MC_L into the digital realm, and then transmit it via RFM to the c1sus machine.

Attempted Solution:

Move the c1iscey IO chassis to c1ioo while the c1ioo chassis is at Downs.

The c1iscey chassis, however, doesn't seem to be talking to the c1ioo computer.  I tried changing the host interface card on the c1ioo chassis.  I took out the One Stop Systems HIB2-x4-H interface card with serial number 26638 from the c1ioo computer and put in the One Stop Systems HIB2-x4-H with serial number 35242 from c1iscey.  It still didn't work.

All the lights are red on the interface card on the actual chassis and its cooling fan isn't spinning. 

Using dmesg on c1ioo shows that it does not see any of the ADC/DAC/BO cards.

Status:

I'm going to wait until tomorrow morning, when Rolf gets a chance to look at the c1ioo chassis over at Downs, to determine the next step.  If we fix the c1ioo chassis, I'll move the c1iscey chassis and its host interface board back to the end.

  3811   Thu Oct 28 16:38:54 2010   josephb   Update   CDS   Flaky fb, reverted inittab changes on fb

Problem:

Yuta reported that many of the signals being displayed by dataviewer were "fuzzier" than normal.  And diaggui was not working.

Running "diag -i" reported:

Diagnostics configuration:
awg 21 0 192.168.113.85 822095893 1 192.168.113.85
awg 36 0 192.168.113.85 822095908 1 192.168.113.85
awg 37 0 192.168.113.85 822095909 1 192.168.113.85
tp 21 0 192.168.113.85 822091797 1 192.168.113.85
tp 36 0 192.168.113.85 822091812 1 192.168.113.85
tp 37 0 192.168.113.85 822091813 1 192.168.113.85

This seems to be missing an nds type line between the 3 awgs and the 3 tp lines.

The daqd code (the framebuilder) is being especially flaky today, and I'm starting to see new errors.

[Thu Oct 28 16:13:46 2010] Couldn't open full trend frame file
`/frames/trend/second/9723/C-T-972342780-60.gwf' for writing; errno 13
epicsThreadOnceOsd epicsMutexLock failed.
epicsThreadOnceOsd epicsMutexLock failed.
epicsThreadOnceOsd epicsMutexLock failed.
epicsThreadOnceOsd epicsMutexLock failed.
epicsThreadOnceOsd epicsMutexLock failed.
Segmentation fault (core dumped)

or

[Thu Oct 28 16:17:06 2010] Couldn't open full frame file
`/frames/full/9723/.C-R-972343024-16.gwf' for writing; errno 13
CA client library tcp receive thread terminating due to a C++ exception
FATAL: exception not rethrown
CA client library tcp receive thread terminating due to a C++ exception
CA client library tcp receive thread terminating due to a C++ exception
FATAL: exception not rethrown
cac: tcp send thread received an unexpected exception - disconnecting
Aborted (core dumped)

What was done today that might have affected it:

A new c1ioo chassis from Downs was connected to c1ioo.  I also connected c1ioo to the DAQ network (192.168.114.xxx) which talks to the frame builder.

I started downloading the necessary files to be able to follow Keith's instructions for a standard control / teststand setup in /opt/apps, /opt/rtapps, etc.  However, it has not actually been installed yet.

Yuta added additional OL channels to the DAQ config to be recorded.

Attempted Fixes:

I reverted the inittab changes I made in this elog.  Didn't help.

I disconnected c1ioo from the DAQ network.  Didn't help.

Rebooted the frame builder machine.  Didn't help.

I've sent an e-mail to Alex describing the problem to see if he has any idea where we went wrong.

Yuta may try restoring the old DAQ channel choices and see if that makes a difference.

Current Status:

daqd framebuilder code still won't stay up.  So no channels at the moment.

  3822   Fri Oct 29 11:29:29 2010   josephb   Update   CDS   How I broke the frame builder yesterday

Problem:

Long before Yuta came along and deleted daqd, I had done something to prevent the framebuilder code from running at all.

Cause:

Alex pointed out via e-mail that the errors corresponded to an inability to access certain frame files, due to their permissions being set for root only.

Turns out when I had run the code under the inittab, I forgot to make it run as controls instead of root (which is the default).  This later caused problems for the code when it tried to access those files, resulting in the weird errors we saw.

Solution:

Use chown to change the offending frame files back to controls.
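i.e. something along the lines of (the file name is just one from the error messages in the previous entry; repeat or glob over whichever files ended up owned by root):

sudo chown controls /frames/full/9723/C-R-972343024-16.gwf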

Future:

Write a proper inittab script which uses "su controls" before running the daqd code.
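An untested sketch of what that entry might look like; note the redirection moves inside the quoted command, so that the shell started by su handles it:

daq:345:respawn:/bin/su controls -c "/opt/rtcds/caltech/c1/target/fb/daqd -c /opt/rtcds/caltech/c1/target/fb/daqdrc >> /opt/rtcds/caltech/c1/target/fb/daqd.log"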

  3827   Fri Oct 29 16:43:25 2010   josephb   Update   CDS   c1ioo now talking to c1sus

Problem:

c1ioo was not able to talk to c1sus because of timing issues.  This prevented the mode cleaner length signal (MC_L) from getting to c1sus.

Solution:

The replacement c1ioo chassis from Downs, with a more recent revision of the IO backplane, works.

The c1ioo is now talking to c1sus and transmitting a signal.

We connected the cable hanging off the DAQ interface board labeled MC OUT1 to the MC Servo board's output labeled OUT1.

During debugging I modified the c1x02, c1x03, c1mcs and c1ioo codes to print debugging messages.  This was done by modifying the /opt/rtcds/caltech/c1/advLigoRTS/src/fe/commData2.c file.  I have since reverted those changes.

Future:

We still need to check that everything is connected properly and that the correct signal is being sent to the MC2 suspension.

 

  3834   Mon Nov 1 11:34:08 2010   josephb   Update   CDS   CA.Client.Exception spam fixed

Problem:

CA.Client.Exception...............................................
    Warning: "Identical process variable names on multiple servers"
    Context: "Channel: "C1:SUS-ITMX_TO_COIL_0_3_INMON", Connecting to: 192.168.113.85:42367, Ignored: c1susdaq:42367"
    Source File: ../cac.cpp line 1208
    Current Time: Fri Oct 29 2010 18:07:39.132686519

The above exception gets thrown for each channel sent to the framebuilder when you start the frame builder.  It's the 192.168.113.xxx (main martian) and 192.168.114.xxx (DAQ) networks sending the same information.  This makes it hard to see real errors when starting the frame builder.

Solution:

Configure EPICS channel access to only use one network.  This is done by modifying the /etc/bash/bashrc and the /diskless/root/etc/bash/bashrc files with the following two lines:

export EPICS_CA_AUTO_ADDR_LIST=NO
export EPICS_CA_ADDR_LIST="192.168.113.255"

The first line tells the computers not to automatically search all attached network devices.  The second tells it to use the 192.168.113.xxx network for EPICS channel broadcasts.

  3837   Mon Nov 1 15:35:41 2010   josephb   Update   CDS   Front end USR and CPU times now recorded by DAQ

Problem:

We have no record of how long the CPUs are taking to perform a cycle's worth of computation.

Solution:

I added the following channels to the various slow DAQ configuration files in /opt/rtcds/caltech/c1/chans/daq/

IOO_SLOW.ini:[C1:FEC-34_USR_TIME]
IOO_SLOW.ini:[C1:FEC-34_CPU_METER]
IOP_SLOW.ini:[C1:FEC-20_USR_TIME]
IOP_SLOW.ini:[C1:FEC-20_CPU_METER]
IOP_SLOW.ini:[C1:FEC-33_USR_TIME]
IOP_SLOW.ini:[C1:FEC-33_CPU_METER]
MCS_SLOW.ini:[C1:FEC-36_USR_TIME]
MCS_SLOW.ini:[C1:FEC-36_CPU_METER]
RMS_SLOW.ini:[C1:FEC-37_USR_TIME]
RMS_SLOW.ini:[C1:FEC-37_CPU_METER]
SUS_SLOW.ini:[C1:FEC-21_USR_TIME]
SUS_SLOW.ini:[C1:FEC-21_CPU_METER]

 

Notes:

To restart the daqd code, simply kill the running process.  It should restart automatically.  If it appears not to have started, check the /opt/rtcds/caltech/c1/target/fb/restart.log file and the /opt/rtcds/caltech/c1/target/fb/logs/daqd.log.xxxx files.  If you made a mistake in the DAQ channels and it's complaining, fix the error and then restart init on the fb machine by running "sudo /sbin/init q"

  3846   Tue Nov 2 15:24:18 2010   josephb   Update   CDS   c1ioo and c1mcs only sending MC_L, MC1_PIT, MC1_YAW

In order to have the c1mcs model run, we're running with only 3 RFM channels between c1ioo and c1mcs at the moment.  This leaves the model at around 45 microseconds, and at least lets us damp.

Alex and I still need to track down why the RFM read calls are taking so much time to execute.

  3853   Wed Nov 3 15:13:55 2010   josephb   Update   CDS   Temporary RFM slow read work around

Problem:

Each RFM read in the c1mcs model is adding ~7 microseconds to the cycle time.  Adding too many pushes it over the 62 microsecond limit.

RFM writes do not have this problem.

Temporary Solution:

The fastest fix was to create a new front end model, called c1rfm, which does nothing but read in the MC1, MC2, MC3 PIT and YAW signals from the c1ioo machine, and then passes them along to the c1mcs model via shared memory, which is fast.

This means the data being sent is 2 cycles slow, one to go over the RFM, and one to go over shared memory.  It is running at 16384 cycles/second, so it shouldn't have much impact at the frequencies we use those channels for.

MC_L is still being sent directly to the c1mcs front end code via RFM.

Current Status:

The c1mcs model is running at  30-33 microseconds for CPU time.

The c1rfm model is running at 45-47 microseconds for CPU time.

All 7 channels, MC_L, MC1_PIT, MC1_YAW, MC2_PIT, MC2_YAW, MC3_PIT, MC3_YAW are responding.

The c1rfm model was added to the /diskless/root/etc/rtsystab file on the fb machine so that it automatically starts on the reboot of c1sus.

The USR and CPU time channels for c1rfm were added to the MCS_SLOW.ini file in /opt/rtcds/caltech/c1/chans/daq/ so that the framebuilder records them, namely:

[C1:FEC-38_USR_TIME]
[C1:FEC-38_CPU_METER]

The framebuilder was restarted to take these new channels into account.

Future:

Finish implementing and debugging the "round robin" RFM reader so as to not require a separate model doing RFM reads in parallel.

Look into improving read speed by either merging timestamps and data into a single read, or reading the time stamp only once every tenth or hundredth cycle, although this at best provides a factor of 2 improvement.

Check to see if RFM card being on the IO chassis or directly in the computer chassis makes a difference.

Get Alex and Rolf to improve RFM read speed.

  3855   Wed Nov 3 17:01:01 2010   josephb   Summary   CDS   Comparison of RFM read times

Problem:

RFM reads are slow.  Rolf has said it should take 2-3 microseconds per read. 

c1sus is taking about 7 microseconds per read, twice as slow as Rolf's claim.

Hypothesis:

The RFM card is in the IO chassis, and is sharing the PCIe bus with 4 ADC cards, 3 DAC cards, 4 BO cards, and a BIO card.  It's possible this crowded bus is causing the reads to take even longer.

Test Results:

Compare read times between the c1sus computer, which has its RFM card in the IO chassis, and c1ioo, which has its RFM card in the computer.

c1ioo:

No RFM reads: 8 microseconds

3 RFM reads: 20 microseconds (~4 per read)

6 RFM reads: 32 microseconds (~4 per read)

c1sus:

No RFM read: 25 microseconds (bigger model)

1 RFM read: 33 microseconds (~8 per read)

3 RFM read: 45 microseconds (~7 per read)

6 RFM read: Over 62 microseconds, doesn't run.

Conclusion:

It looks like moving the RFM card may help by about a factor of 2 in read speed, although it's still not quite what Alex and Rolf claim it should be.

The c1mcs and c1ioo models have been reverted to their normal operations.

 

  3860   Thu Nov 4 15:15:43 2010   josephb   Update   CDS   Modified feCodeGen.pl, fmseq.pl, and suspension screens

Feature Requested:

Have the CPU_METER change color at various alarm levels.  These alarm levels have been set at 2/3 of maximum for a Minor alarm (yellow) and 9/10 of maximum for a Major alarm (red).

Implementation:

Rather than hand code each EPICS .db file to add the alarm fields each time we rebuild the front ends, I decided to modify it at the source (since it strikes me as a generally useful alarm level for all front end codes).

First, I modified the feCodeGen.pl script.

I changed

print EPICS "OUTVARIABLE FEC\_$dcuId\_CPU_METER epicsOutput.cpuMeter int ai 0 field(HOPR,\"$rate\") field(LOPR,\"0\")\n";

to

print EPICS "OUTVARIABLE FEC\_$dcuId\_CPU_METER epicsOutput.cpuMeter int ai 0 field(HOPR,\"$rate\") field(LOPR,\"0\") field(HIGH,\"$two_thirds_rate\") field(HSV,\"MINOR\") field(HIHI,\"$nine_tenths_rate\") field(HHSV,\"MAJOR\")\n";

I added the following two lines just before it as well:

$two_thirds_rate = int($rate * 2 / 3);

$nine_tenths_rate = int($rate * 9 / 10);

However, only the first four fields were actually added to the database file.  Apparently fmseq.pl, which populates the database, was hard coded to only handle up to 4 fields.

I modified the fmseq.pl script in /opt/rtcds/caltech/c1/core/advLigoRTS/src/epics/util/ so as to be able to handle up to 6 field values when writing EPICS .db files.

This change was accomplished by simply changing the following line

($junk, $v_name, $v_var, $v_type, $ve_type, $v_init, $v_efield1, $v_efield2, $v_efield3, $v_efield4 ) = split(/\s+/, $_);

to 

($junk, $v_name, $v_var, $v_type, $ve_type, $v_init, $v_efield1, $v_efield2, $v_efield3, $v_efield4, $v_efield5, $v_efield6 ) = split(/\s+/, $_);

everywhere it occurred.  There were something like 10 instances of it. Also, I added the two lines

        $vardb .= "    $v_efield5\n";
        $vardb .= "    $v_efield6\n";

after each set of

         $vardb .= "    $v_efield1\n";
         $vardb .= "    $v_efield2\n";
         $vardb .= "    $v_efield3\n";
         $vardb .= "    $v_efield4\n";

Lastly, I modified the CPU_METER bar graph on the C1SUS_DEFAULTNAME.adl screen (located in /opt/rtcds/caltech/c1/medm/master/) to use alarm levels, and then ran generate_master_screens.py.

 

  3900   Thu Nov 11 21:07:49 2010   josephb   Update   CDS   Plugged c1iscex into DAQ network - still causes network slowdown

I connected the c1iscex computer to the dedicated DAQ network switch (located in 1X7).

This does not seem to have helped c1iscex stop spewing out "OMX: Failed to find peer index of board 00:00:00:00:00:00 (Peer Not Found in the Table)" at the rate of ~1 Gigabyte per minute.

c1iscex is currently off until a solution can be found.

  3919   Mon Nov 15 11:13:12 2010   josephb   Update   CDS   Modified rc.local to not start mx_streams automatically

Problem:

c1iscex floods the network with about 1 gigabyte of error messages in a few seconds, writing to a log file in /opt/rtcds/caltech/c1/target/fb/logs/

Temporary change:

I commented out the following line in the rc.local file on the fb machine in the /diskless/root/etc/ directory:

#nice --20 ./mx_stream -s "$SYSTEMS" -d fb:0 >& logs/$HOSTNAME.log&

This disables the automatic start up of the mx_streams code on all the front ends.  This will prevent the network being brought to its knees by c1iscex while we debug the problem.

It also means on a reboot of the front ends, the mx_stream process needs to be started by hand until this change is reverted.

To do this, log into the front end and then change directory to /opt/rtcds/caltech/c1/target/fb

For c1sus, run:

./mx_stream -s c1x02 c1sus c1mcs c1rms c1rfm  -d fb:0

For c1ioo, run:

./mx_stream -s c1x03 c1ioo -d fb:0
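The model list handed to -s is just that host's line from rtsystab, so a hypothetical helper could print the right command for whichever front end it runs on.  A sketch, assuming the host-then-models rtsystab format from entry 3644, and that the file is visible at /etc/rtsystab on the front end:

import socket

host = socket.gethostname()
for line in open("/etc/rtsystab"):        # mounted from fb's /diskless/root/etc
    fields = line.split()
    if fields and fields[0] == host:
        print("./mx_stream -s %s -d fb:0" % " ".join(fields[1:]))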

 

  3926   Mon Nov 15 16:26:46 2010   josephb   Update   CDS   c1iscex is now running and the network hasn't died

Problem:

c1iscex was spamming the network with error messages.

Solution:

Updated the front end codes to current standards (they were on the order of months out of date).  After fixing them up and rebuilding the codes on c1iscex, it no longer had problems connecting to the frame builder.

Status:

I can look at test points for ETMX.  It is not currently damping however.

To Do:

Move filters for ETMX into the correct files. 

Need to add a Binary output blue and gold box to the end rack, and plug it into the binary output card.  Confirm the binary output logic is correct for the OSEM whitening, coil dewhitening, and QPD whitening boards. 

Get ETMX damped.

Figure out what we're going to do with the aux crate which is currently running y-end code at the new x-end.  Koji suggested simply swapping the auxiliary crates - this may be the easiest.  The other option would be to change the IP address, so that when it PXE boots it grabs the x-end code instead of the y-end code.

Current CDS status:

MC damp | dataviewer | diaggui | AWG | c1ioo | c1sus | c1iscex | RFM | Sim.Plant | Frame builder | TDS
[status table: the colored indicator cells from the original elog were not preserved in this text capture]
  3938   Wed Nov 17 10:39:20 2010   josephb   Update   CDS   Screen Time Fix

An improved python code to apply a replacement to all *.adl files in a directory would be:

import re, os

# swap the old time-string channel for the FEC-34 one in every .adl file here
for name in os.listdir("./"):
    if name.endswith(".adl"):
        data = open(name).read()
        o = open(name, "w")
        o.write(re.sub("C0:TIM-PACIFIC_STRING", "C1:FEC-34_TIME_STRING", data))
        o.close()

Of course, this entire python script can be replaced with a single sed command:

sed -i 's/C0:TIM-PACIFIC_STRING/C1:FEC-34_TIME_STRING/g' *

A more complicated script could be written which looks for key identifiers either in the file header or inside the file to determine which front end is appropriate, using a dictionary like:

dcuid_dict = {"BS":21,"PRM":37,"SRM":37,"ITMX":21,"ITMY":21,"MC1":36,"MC2":36,"MC3":36,"ETMX":24,"ETMY":26}

and then using for loops and if statements, along the lines of the sketch below.
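A hedged sketch of that approach, assuming (hypothetically) that the optic name appears in the .adl filename, e.g. something like C1SUS_BS.adl:

import os, re

dcuid_dict = {"BS":21,"PRM":37,"SRM":37,"ITMX":21,"ITMY":21,
              "MC1":36,"MC2":36,"MC3":36,"ETMX":24,"ETMY":26}

for name in os.listdir("./"):
    if not name.endswith(".adl"):
        continue
    for optic, dcuid in dcuid_dict.items():
        if optic in name:                  # filename convention is an assumption
            data = open(name).read()
            data = re.sub("C0:TIM-PACIFIC_STRING",
                          "C1:FEC-%d_TIME_STRING" % dcuid, data)
            open(name, "w").write(data)
            break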

 

  3940   Wed Nov 17 16:02:30 2010   josephb   Update   CDS   Modified feCodeGen.pl to fix filtMuxMatrix name generation

Problem:

Sometime in the last 3 weeks, probably when Alex brought his latest changes from Hanford to the 40m and did an SVN update, the code which generates the names of the filter .adl file links for the overall matrix view broke.

Fix:

I modified the FE code generator to use $basename instead of the name produced by the top name transform (which changes _ to - after the first 3 letters):

@@ -3520,11 +3522,11 @@
 
                  my $tn = top_name_transform($basename);
                  my $basename1 = $usite . ":" . $tn . "_";
-                 my $filtername1 = $usite . $tn;
+                 my $filtername1 = $usite . $basename;

Still having problems:

The filter modules built with the matrix of filter modules run (offsets/gains work), but will not load filter coefficients/filter names.  All the other filter modules outside the matrix seem to load fine.  At this point, doing a rebuild of any of the front end machines may cause the A2L filter banks to be unloadable.

 

  3945   Thu Nov 18 11:06:20 2010   josephb   Update   CDS   c1sus and ADCs

Problem:

ADCs are timing out on c1sus when we have more than 3.

Talked with Rolf:

Alex will be back tomorrow (he took yesterday and today off), so I talked with Rolf.

He said ordering shouldn't make a difference and he's not sure why we would be having a problem.  However, when he loads a chassis, he tends to put all the ADCs on the same PCI bus (the backplane apparently contains multiple buses).  Slot 1 is its own bus, slots 2-9 should be the same bus, and slots 10-17 should be the same bus.

He also mentioned that when you use dmesg and see a line like "ADC TIMEOUT # ##### ######", the first number should be the ADC number, which is useful for determining which one is reporting back slow.

Plan:

Disconnect the c1sus IO chassis completely, pull it out, pull out all the cards, check the connectors, and repopulate it with Rolf's suggestions and this elog in mind.

In regards to the RFM, it looks like one of the fibers had been disconnected from the c1sus chassis RFM card (it's plugged in in the middle of the chassis, so it's hard to see) during all the plugging in and out of cables and cards last night.

  3947   Thu Nov 18 14:19:01 2010   josephb   Update   CDS   Swapped c1auxex and c1auxey codes

Problem:

We had not switched the c1aux crates when we renamed the arms, thus the watchdogs labeled ETMX were really watching ETMY and vice-versa.

Solution:

I used telnet to connect to c1auxey, and then c1auxex.

I used the bootChange command to change the IP address of c1auxey to 192.168.113.59 (c1auxex's IP), and its startup script.  Similarly c1auxex was changed to c1auxey and then both were rebooted.

 

c1auxey > bootChange

'.' = clear field;  '-' = go to previous field;  ^D = quit

boot device          : ei
processor number     : 0
host name            : linux1
file name            : /cvs/cds/vw/mv162-262-16M/vxWorks
inet on ethernet (e) : 192.168.113.60:ffffff00 192.168.113.59:ffffff00
inet on backplane (b):
host inet (h)        : 192.168.113.20
gateway inet (g)     :
user (u)             : controls
ftp password (pw) (blank = use rsh):
flags (f)            : 0x0
target name (tn)     : c1auxey c1auxex
startup script (s)   : /cvs/cds/caltech/target/c1auxey/startup.cmd /cvs/cds/caltech/target/c1auxex/startup.cmd
other (o)            :

value = 0 = 0x0

c1auxex > bootChange

'.' = clear field;  '-' = go to previous field;  ^D = quit

boot device          : ei
processor number     : 0
host name            : linux1
file name            : /cvs/cds/vw/mv162-262-16M/vxWorks
inet on ethernet (e) : 192.168.113.59:ffffff00 192.168.113.60:ffffff00
inet on backplane (b):
host inet (h)        : 192.168.113.20
gateway inet (g)     :
user (u)             : controls
ftp password (pw) (blank = use rsh):
flags (f)            : 0x0
target name (tn)     : c1auxex c1auxey
startup script (s)   : /cvs/cds/caltech/target/c1auxex/startup.cmd /cvs/cds/caltech/target/c1auxey/startup.cmd
other (o)            :

value = 0 = 0x0

  3954   Fri Nov 19 12:53:50 2010   josephb   Update   CDS   Testpoints on c1iscex now working

Problem:

c1iscex did not have test points working last night.

Solution:

The diag -i command indicated that:

awg 19 0 192.168.113.80 822095891 1 192.168.113.80

awg 45 0 192.168.113.80 822095917 1 192.168.113.80

The first number after the awg should be the DCUID number.  The IP address 192.168.113.80 corresponds to c1iscex.  So we had awg and testpoints set up for DCUIDs 19 and 45 on c1iscex.  DCUID 19 is c1x01 (the IOP), but 45 was used for a test a while back.

Turns out that in the testpoint.par file located in /cvs/cds/rtcds/caltech/c1/target/gds/param, there were two entries for c1scx, one with DCUID 24 and also DCUID 45.  The model at the time was running with DCUID 24.

So I changed the model DCUID to 45, deleted the [C-node24] entry in the testpoint.par file, and restarted the machine, and also did a "telnet fb 8088" and "shutdown" to restart the frame builder.

  3962   Mon Nov 22 12:00:18 2010   josephb   Update   CDS   Updated Computer Restart Procedures for FB

I've updated the Computer Restart Procedures page in the wiki with the latest fb restart procedure.

To just restart just the daqd (frame builder) process, do:

1) telnet fb 8088

2) shutdown

The init process will take care of the rest and restart daqd automatically.

 

Quote:

Background:
Plan:
  - Check the wiring after SOS Coil Driver Module and circuit around SDSEN
  - Check whitening and dewhitening filters. We connected a binary output cable, but didn't checked them yet.
  - Make a script for step 2
  - Activate new DAQ channels for ETMX (what is the current new fresh up-to-date latest fb restart procedure?)

 

  3963   Mon Nov 22 13:16:52 2010   josephb   Summary   CDS   CDS Plan for the week

CDS Objectives for the Week:

Monday/Tuesday:

1) Investigate ETMX SD sensor problems

2) Fully check out the ETMX suspension and get that to a "green" state.

3) Look into cleaning up target directories (merge old target directory into the current target directory) and update all the slow machines for the new code location.

4) Clean up GDS apps directory (create link to opt/apps on all front end machines).

5) Get Rana his SENSOR, PERROR, etc channels.

Tuesday/Wednesday:

6) Install the LSC IO chassis and necessary cabling/fibers.

7) Get the LSC computer talking to its remote IO chassis

Wednesday:

8) If time, connect and start debugging the Dolphin connection between the LSC and SUS machines

 

  3964   Mon Nov 22 16:16:04 2010   josephb   Update   CDS   Did an SVN update on the CDS code

Problem:

The CDS oscillator part doesn't work inside subsystems.

Solution:

Rolf checked in an older version of the CDS oscillator which includes an input (which you just connect to a ground).  This makes the parser work properly so you can build with the oscillator in a subsystem.

So I did an SVN checkout and confirmed that the custom changes we have here were not overwritten.

Edit:

Turns out the latest svn version requires new locations for certain codes, such as the EPICS installs.  I reverted back to version 2160, which is just before the new EPICS and other rtapps directory locations, but late enough to pick up the temporary fix to the CDS oscillator part.

  3965   Mon Nov 22 17:48:11 2010   josephb   Update   CDS   c1iscex is not seeing its Binary Output card

Problem:

c1iscex does not even see its 32 channel Binary output card.  This means we have no control over the state of the analog whitening and dewhitening filters.  The ADC, DAC, and the 1616 Binary Input/Output cards are recognized and working.

Things tried:

Tried recreating the IOP code from the known working c1x02 (from the c1sus front end), but that didn't help.

Checked seating of the card, but it seems correctly socketed and tightened down nicely with a screw.

Tomorrow will try moving cards around and see if there's an issue with the first slot, which the Binary Output card is in.

Current Status:

The ETMX is currently damping, including the POS, PIT, YAW and SIDE degrees of freedom.  However, the gds screen is showing a 0x2bad status for the c1scx front end (the IOP seems fine with a 0x0 status).  So for the moment, I can't seem to bring up c1scx testpoints.  I was able to do so earlier when I was testing the status of the binary outputs, so during one of the rebuilds something broke.  I may have to undo the SVN update and/or a change made by Alex today that allows filter bank names longer than 19 characters.

  3974   Tue Nov 23 10:53:20 2010   josephb   Update   CDS   timing issues

Problem:

Front ends seem to be experiencing a timing issue.  I can visibly see a difference in the GPS time ticks between models running on c1ioo and c1sus. 

In addition, the fb is reporting a 0x2bad to all front ends.  The 0x2000 means a mismatch in config files, but the 0xbad indicates an out of sync problem between the front ends and the frame builder.

Plan:

As there are plans to work on the optic tables today and suspension damping is needed, we are holding off on working on the problem until this afternoon/evening, since suspensions are still damping.  It does mean the RFM connections are not available.

At that point I'd like to do a reboot of the front ends and framebuilder and see if they come back up in sync or not.

  3975   Tue Nov 23 11:20:30 2010   josephb   Update   CDS   Cleaning up old target directory

Winter Cleaning:

I cleaned up the /cvs/cds/caltech/target/ directory of all the random models we had built over the last year, in preparation for the move of the old /cvs/cds/caltech/target slow control machine code into the new /opt/rtcds/caltech/c1/target directories.

I basically deleted all the directories generated by the RCG code that were put there, including things like c1tst, c1tstepics, c1x00, c1x00epics, and so forth.  Pre-RCG era code was left untouched.

  3978   Tue Nov 23 16:55:14 2010   josephb   Update   CDS   Updated apps

Updated Apps:

I created a new setup script, called gds-env.csh, for the newest build of the gds tools (DTT, foton, etc), which are located in /opt/apps (a soft link from /cvs/cds/apps).

This script is now sourced by cshrc.40m for linux 64 bit machines.  In addition, the control room machines have a soft link in the /opt directory to the /cvs/cds/apps directory.

So now when you type dtt or foton, it will bring up the Centos compiled code Alex copied over from Hanford last month.

  3994   Tue Nov 30 12:10:44 2010   josephb   Update   elog   Elog restarted again

The elog seemed to be down at around 12:05pm.  I waited a few minutes to see if the browser would connect, but it did not.

I used the existing script in /cvs/cds/caltech/elog/ (as opposed to Zach's new one in elog/elog-2.8.0/), which also seems to have worked fine.

  3995   Tue Nov 30 12:25:08 2010   josephb   Update   CDS   LSC computer to chassis cable dead

Problem:

We seem to have a broken fiber link for use between the LSC machine and its IO chassis.  It is unclear to me when this damage occurred.  The cable had been sitting in a box with styrofoam padding, and the kink is in the middle of the fiber, with no other obvious damage nearby.  The cable, however, may have previously been used by the people in Downs for testing, and was possibly damaged then.  Or, when we were stringing it, we caused the kink.

Tried Solutions:

I talked to Alex yesterday, and he suggested completely unplugging the power on both the computer and the IO chassis, then plugging in the new fiber connector, as he had to do that once with a fiber connection at Hanford.  We tried this this morning; however, still no joy.  At this point I plan to toss the fiber, as I don't know of any way to rehabilitate kinked fibers.

Note this means that I rebooted c1sus and then did a burt restore from the Nov/30/07:07 directory for c1susepics, c1rmsepics, and c1mcsepics.  It looks like all the filters switched on.

Current Plan:

We do, however, have a Dolphin fiber which was originally intended to go between the LSC machine and its IO chassis, before Rolf was told it doesn't work well that way.  However, we were going to connect the LSC machine to the rest of the network via Dolphin.

We can put the LSC machine next to its chassis in the LSC rack, and connect the chassis to the rest of the front ends by the Dolphin fiber.  In that case we just need the usual copper style cable going between the chassis and the computer.

 

  3999   Tue Nov 30 16:02:18 2010   josephb   Update   CDS   status

Issues:

1) Turns out the /opt/rtcds/caltech/c1/target/gds/param/testpoint.par file had been emptied or deleted at one point, and the only entry in it was c1pem.  This had been causing us a lack of test points for the last few days.  It is unclear when or how this happened.  The file has been fixed to include all the front end models again.  (Fixed)

2) Alex and I worked on tracking down why there's a GPS difference between the front ends and the frame builder, which is why we see a 0x4000 error on all the front end GDS screens. This involved several rebuilds of the front end codes and reboots of the machines involved. (Broken)

3) Still working on understanding why the RFM communication is failing, which I think is related to the timing issues we're seeing.  I know the data is being transferred on the card, but it seems to be rejected after being read in, suggesting a time stamp mismatch. (Broken)

4) The c1iscex binary output card still doesn't work.  (Broken)

Plan:

Alex and I will be working on the above issues tomorrow morning.

Status:

Currently, the c1ioo, c1sus and c1iscex computers are running with their front ends.  They all still have the 0x4000 error, but you can still look at channels in dataviewer, for example.  However, there's a possibility of inconsistent timing between computers (although all models on a single computer will be in sync).

All the front ends were burt restored to 07:07 this morning.  I spot checked several optic filter banks and they look to have been turned on.

  4009   Fri Dec 3 15:37:10 2010   josephb   Update   CDS   fb, front ends fixed - tested RFM between c1ioo and c1iscex

Problem:

The front ends and fb computers were unresponsive this morning.

This was due to the fb machine having its ethernet cable plugged into the wrong input.   It should be plugged into the port labeled 0.

Since all the front end machines mount their root partition from fb, this caused them to also hang.

Solution:

The cable has been relabeled "fb" on both ends, and plugged into the correct jack.  All the front ends were rebooted.

 

Testing RFM for green locking:

I tested the RFM connection between c1ioo and c1scx.  Unfortunately, on the first test, it turns out the c1ioo machine had its gps time off by 1 second compared to c1sus and c1iscex.  A second reboot seems to have fixed the issue.

However, it bothers me that the code didn't come up with the correct time on the first boot.

The test was done using the c1gcv model and by modifying the c1scx model.  At the moment, the MC_L channel is being passed to the MC_L input of the ETMX suspension.  In the final configuration, this will be a properly shaped error signal from the green locking.

The MC_L signal is currently not actually driving the optic, as the ETMX POS MATRIX currently has a 0 for the MC_L component.

  4014   Mon Dec 6 11:59:41 2010   josephb   Update   CDS   New c1lsc computer moved to lsc rack

Computer moved:

The c1lsc computer has been moved over to the 1Y3 rack, just above the c1lsc IO chassis. 

It will talk to the c1sus computer via a Dolphin PCIe reflected memory card.  The cards were installed into c1lsc and c1sus this morning.

It will talk to its IO chassis via the usual short IO chassis cable.

 

To Do:

The Dolphin fiber still needs to be strung between c1sus and c1lsc.

The DAQ cable between c1lsc and the DAQ router (which lets the frame builder talk directly with the front ends) also needs to be strung.

c1lsc needs to be configured to use fb as a boot server, and the fb needs to be configured to handle the c1lsc machine.

  4015   Mon Dec 6 16:49:43 2010   josephb   Update   CDS   c1lsc halfway to working

C1LSC Status:

The c1lsc computer is running Gentoo off of the fb server.  It has been connected to the DAQ network and is handling mx_streams properly (so we're not flooding the network with error messages like we used to with c1iscex).  It is using the old c1lsc IP address (192.168.113.62).  It can be ssh'd into.

However, it is not talking properly to the IO chassis.  The IO chassis turns on when the computer turns on, but the host interface board in the IO chassis only has 2 red lights on (as opposed to many green lights on the host interface boards in the c1sus, c1ioo, and c1iscex IO chassis).  The c1lsc IO processor (called c1x04) doesn't see any ADCs, DACs, or Binary cards.  The timing slave is receiving 1PPS and is locked to it, but because the chassis isn't communicating, c1x04 is running off the computer's internal clock, causing it to be several seconds off. 

Need to investigate why the computer and chassis are not talking to each other.

General Status:

The c1sus and c1ioo computers are not talking properly to the frame builder.  A reboot of c1iscex fixed the same problem earlier; however, as Kiwamu and Suresh are working in the vacuum, I'm leaving those computers alone for the moment.  A reboot and burt restore should probably be done later today for c1sus and c1ioo.

 

Current CDS status:

MC damp | dataviewer | diaggui | AWG | c1ioo | c1sus | c1iscex | RFM | Dolphin RFM | Sim.Plant | Frame builder | TDS
[status table: the colored indicator cells from the original elog were not preserved in this text capture]
  4020   Tue Dec 7 16:09:53 2010   josephb   Update   CDS   c1iscex status

I swapped out the IO chassis which could only handle 3 PCIe cards with another chassis which has space for 17, but which previously had timing issues.  A new cable going between the timing slave and the rear board seems to have fixed the timing issues.

I'm hoping to get a replacement PCI extension board which can handle more than 3 cards this week from Rolf and then eventually put it in the Y-end rack.  I'm also still waiting for a repaired Host interface board to come in for that as well.

At this point, RFM is working to c1iscex, but I'm still debugging the binary outputs to the analog filters.  As of this time they are not working properly: turning the digital filters on and off seems to have no effect on the transfer function measured from an excitation in SUSPOS all the way around to IN1 of the sensor inputs (but before the digital filters).  Ideally I should see a difference when I switch the digital filters on and off (since the analog ones should also switch on and off), but I do not.

  4025   Wed Dec 8 12:26:56 2010   josephb   Update   CDS   megatron set up - as a test front end

[josephb, Osamu]

Megatron Setup:

To show Osamu how to set up a front end, as well as provide a test computer for Osamu's use, we used the new megatron (a Sun Fire X4600 with 16 cores and 8 gigabytes of memory) as a front end without an IO chassis.

The steps we followed are in the wiki, here.

The new megatron's IP address is 192.168.113.209.  It is running the c1x99 front end code.

  4028   Wed Dec 8 14:51:09 2010   josephb   Update   CDS   c1pem now recording data

Problem:

The c1pem model was reporting all zeros for all the PEM channels.

Solution:

Two fold.  On the software end, I added ADCs 0, 1, and 2 to the model.  ADC 3 was already present and is the actual ADC taking in PEM information.

There was a problem noted a while back by Alex and Rolf: there's an issue with the way the DACs and ADCs are numbered internally in the code.  Missing ADCs or DACs prior to the one you're actually using can cause problems.

At some point that problem should be fixed by the CDS crew, but for now, always include all ADCs and DACs up to and including the highest number ADC/DAC you need to use for that model.

On the physical end, I checked the AA filter chassis and found the power was not plugged in.  I plugged it in.

Status:

We now have PEM channels being recorded by the FB, which should make Jenne happier.
