40m QIL Cryo_Lab CTN SUS_Lab CAML OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 263 of 357  Not logged in ELOG logo
ID Date Authorup Type Category Subject
  4509   Mon Apr 11 13:30:04 2011 josephb, JamieUpdateCDSNo Wiper script - Frames full over weekend


The daqd process was dying every minute or so when it couldn't write frame.  This was slowing down the network by writing a 2.9G core dump over NFS every minute or so. (In /opt/rtcds/caltech/c1/target/fb/).

The problem was /frames/ was 100% full.

Apparently, when we switched the fb over to Gentoo, we forgot to install crontab and a wiper script.


We will install crontab and get the wiper script installed.

  822   Mon Aug 11 11:36:11 2008 josephb, SteveConfigurationComputersc1susvme1 minor problems
Around 11 am c1susvme1 start having issues. Namely C1:SUS-PRM_FE_SYNC was railing at some large value like 16384 (2^14). I presume this means the computer was running catastophically late.

I turned off the BS and ITM watch dogs (the PRM was already off), tried hitting reset and sshing in, and running startup, but this didn't help. I then turned off the c1susvme2 associated watch dogs (MC1-3, SRM) and went out to do a hard reboot by switching the crate power off. c1susvme2 came back up fine, was restarted and associated watch dogs turned back on. However, c1susvme1 came back up without mounting /cvs/cds/.

As a test, I replaced the ethernet connection with a CAT6 cable to the Prosafe switch in 1Y6, and then ran reboot on c1susvme1. When it came back up, it had mounted properly, and I was able to run the ./startup.cmd file. At this point it seems to be happy. The new cable is in the trays coming in from the top of the 1Y4 and 1Y6 and approriately labeled.

Edit: Apparently ITMX and ITMY became excited after the reboot (perhaps I turned the watchdogs back on too early? Although that was after the DAQ light was listed as green for c1susvme). Steve noticed this when the alarms went off again (I had turned them off after the reboot seemed successful), and he damped them. Interestingly, the BS remained unexcited.
  1673   Mon Jun 15 15:17:33 2009 josephb, SteveConfigurationVACVacuum control and monitor screens

We updated the vacuum control and monitor screens  (C0VAC_MONITOR.adl and C0VAC_CONTROL.adl).  We also updated the /cvs/cds/caltech/target/c1vac1/Vac.db file.

1) We changed the C1:Vac-TP1_lev channel to C1:Vac-TP1_ala channel, since it now is an alarm readback on the new turbo pump rather than an indication of levitation.  The logic on printing the "X" was changed from X is printed on a 1 = ok status) to X is printed on a 0 = problem status.  All references within the Vac.db file to C1:Vac-TP1_lev were changed.  The medm screens also now are labeled Alarm, instead of Levitating.

2) We changed the text displayed by the CP1 channel (C1:Vac-CP1_mon in Vac.db) from "On" and "Off" to "Cold - On" and "Warm - OFF".

3) We restarted the c1vac1 front end as well as the framebuilder after these changes.

  1258   Thu Jan 29 16:50:53 2009 josephb, albertoConfigurationComputersMegatron fixed
The warning light on megatron and the fans at full speed were fixed by not just power cycling, but completely unplugging megatron from power, waiting for a minute, and then reconnecting the power cables.

Apparently, the Sunfire X4600s at Hanford have also had intermittent problems, which required unplugging the machines completely. In their case, they were unable to access the machine normally, nor could they access the the Intergrated Lights Out Manager (ILOM).

Here, we could interact normally with megatron, but were unable to connect to the ILOM. We were able to get BIOS, but unable to change any of the setting for the ILOM connection. Since the ILOM is a seperate processor and effectively always on, even when the power light is off, a normal shutdown won't reset it. Hence the need to completely disconnect the power supply.
  1555   Thu May 7 15:22:19 2009 josephb, albertoConfigurationComputersfb40m


Having determined that Rana (the computer) was having to many issues with testing the new Raid array due to age of the system, we proceeded to test on fb40m.


We brought it down and up several times between 11 and noon.  We  eventually were able to daisy chain the old raid and the new raid so that fb40m sees both.  At this time, the RAID arrays are still daisy chained, but the computer is setup to run on just the original raid, while the full 14 TB array is initialized (16 drives, 1 hot spare, RAID level 5 means 14 TB out of the 16 TB are actually available).  We expect this to take a few hours, at which point we will copy the data from the old RAID to the new RAID (which I also expect to take several hours).  In the meantime, operations should not be affected.  If it is, contact one of us.





This afternoon the alignment script chrashed after returning sysntax errors. We found that the tpman wasn't running on the framebuilder becasue it had probably failed to get restarted in one of the several reboots executed in the morning by Alex and Jo.

Restarting the tpman was then sufficient for the alignment scripts to get back to work.

  1668   Thu Jun 11 14:54:18 2009 josephb, albertoUpdateComputersWireless network

After poking around for a few minutes several facts became clear:

1) At least one GPIB interface has a hard ethernet connection (and does not currently go through the wireless).

2) The wireless on the laptop works fine, since it can connect to the router.

3) The rest of the martian network cannot talk to the router.

This led to me replugging the ethernet cord back into the wireless router, which at some point in the past had been unplugged.  The computers now seem to be happy and can talk to each other.


  1554   Thu May 7 12:21:36 2009 josephb, alexConfigurationComputersfb40m

Having determined that Rana (the computer) was having to many issues with testing the new Raid array due to age of the system, we proceeded to test on fb40m.


We brought it down and up several times between 11 and noon.  We  eventually were able to daisy chain the old raid and the new raid so that fb40m sees both.  At this time, the RAID arrays are still daisy chained, but the computer is setup to run on just the original raid, while the full 14 TB array is initialized (16 drives, 1 hot spare, RAID level 5 means 14 TB out of the 16 TB are actually available).  We expect this to take a few hours, at which point we will copy the data from the old RAID to the new RAID (which I also expect to take several hours).  In the meantime, operations should not be affected.  If it is, contact one of us.



  2215   Mon Nov 9 14:59:34 2009 josephb, alexUpdateComputersThe saga of Megatron continues

Apparently the random file system failure on megatron was unrelated to the RFM card (or at least unrelated to the physical card itself, its possible I did something while installing it, however unlikely).

We installed a new hard drive, with a duplicate copy of RTL and assorted code stolen from another computer.  We still need to get the host name and a variety of little details straightened out, but it boots and can talk to the internet.  For the moment though, megatron thinks its name is scipe11.

You still use ssh megatron.martian to log in though.

We installed the RFM card again, and saw the exact same error as before.  "NMI EVENT!" and "System halted due to fatal NMI".

Alex has hypothesized that the interface card the actual RFM card plugs into, and which provides the PCI-X connection might be the wrong type, so he has gone back to Wilson house to look for a new interface card.  If that doesn't work out, we'll need to acquire a new RFM card at some point

After removing the RFM card, megatron booted up fine, and had no file system errors.  So the previous failure was in fact coincidence.


  2225   Tue Nov 10 10:51:00 2009 josephb, alexUpdateComputersMegatron on, powercycled c1omc, and burt restored from 3am snapshot

Last night around 5pm or so, Alex had remotely logged in and made some fixes to megatron.

First, he changed the local name from scipe11 to megatron.  There were no changes to the network, this was a purely local change.  The name server running on Linux1 is what provides the name to IP conversions.  Scipe11 and Megatron both resolve to distinct IPs. Given c1auxex wasn't reported to have any problems (and I didn't see any problems with it yesterday), this was not a source of conflict.  Its possible that Megatron could get confused while in that state, but it would not have affected anything outside its box.

Just to be extra secure, I've switched megatron's personal router over from a DMZ setup to only forwarding port 22.  I have also disabled the dhcp server on the gateway router (

Second, he turned the mdp and mdc codes on.  This should not have conflicted with c1omc.

This morning I came in and turned megatron back on around 9:30 and began trying to replicate the problems from last night between c1omc and megatron.  I called Alex and we rebooted c1omc while megatron was on, but not running any code, and without any changes to the setup (routers, etc).  We were able to burt restore.  Then we turned the mdp, mdc and framebuilder codes on, and again rebooted c1omc, which appeared to burt restore as well (I restored from 3 am this morning, which looks reasonable to me). 

Finally, I made the changes mentioned above to the router setups in the hope that this will prevent future problems but without being able to replicate the issue I'm not sure.

  2266   Fri Nov 13 10:28:03 2009 josephb, alexUpdateComputersMegatron is back to its old self

I called Alex this morning and explained the problems with megatron.

Turns out when he had been setting up megatron, he thought a startup script file, rc.local was missing in the /etc directory.  So he created it.  However, the rc.local file in the /etc directory is normally just a link to the /etc/rc.d/rc.local file.  So on startup (basically when we rebooted the machine yesterday), it was running an incorrect startup script file.  The real rc.local includes line:

/usr/bin/setup_shmem.rtl mdp mdc&

Hence the errors we were getting with shm_open().  We changed the file into a soft link, and resourced the rc.local script and mdp started right up.  So we're back to where we were 2 nights ago (although we do have an RFM card in hand).

Update:  The tst module wouldn't start, but after talking to Alex again, it seems that I need to add the module tst to the /usr/bin/setup_shmem.rtl mdp mdc& line in order for it to have a shared memory location setup for it.  I have edited the file (/etc/rc.d/rc.local), adding tst at the end of the line.  On reboot and running starttst, the code actually loads, although for the moment, I'm still getting blank white blocks on the medm screens.

  2305   Fri Nov 20 11:01:58 2009 josephb, alexConfigurationComputersWhere to find RFM offsets

Alex checked out the old rts (which he is no longer sure how to compile) from CVS to megatron, to the directory:


In /home/controls/cds/rts/src/include you can find the various h files used.  Similarly, /fe has the c files.

In the h files, you can work out the memory offset by noting the primary offset in iscNetDsc40m.h

A line like suscomms.pCoilDriver.extData[0] determines an offset to look for.

0x108000 (from suscomms )

Then pCoilDriver.extData[#] determines a further offset.

sizeof(extData[0]) = 8240  (for the 40m - you need to watch the ifdefs, we were looking at the wrong structure for awhile, which was much smaller).

DSC_CD_PPY is the structure you need to look in to find the final offset to add to get any particular channel you want to look at.

The number for ETMX is 8, ETMY 9 (this is in extData), so the extData offset from 0x108000 for ETMY should be 9 * 82400.  These numbers (i.e. 8 =ETMX, 9=ETMY) can be found in losLinux.c in /home/controls/cds/rts/src/fe/40m/.  There's a bunch of #ifdef and #endif which define ETMX, ETMY, RMBS, ITM, etc.  You're looking for the offset in those.

So for ETMY LSC channel (which is a double) you add 0x108000 (a hex number) + (9 * 82400 + 24) (not in hex, need to convert) to get the final value of 0x11a160 (in hex).


A useful program to interact with the RFM network can be found on fb40m.  If you log in and go to:


you can then run rfm2g_util, give it a 3, then type help.

You can use this to read data.  Just type help read.  We had played around with some offsets and various channels until we were sure we had the offsets right.  For example, we fixed an offset into the ETMY LSC input, and saw the corresponding memory location change to that value.  This utility may also be useful for when we do the RFM test to check the integrity of the ring, as there are some diagnostic options available inside it.

  2306   Fri Nov 20 11:14:22 2009 josephb, alexConfigurationComputerstest points working on megatron and we may have filters with switch outputs built in

Alex tooked at the channel definitions (can be seen in tpchn_C1.par), and noticed the rmid was 0. 

However, we had set in testpoint.par the tst system to C-node1 instead of C-node0.  The final number inf that and the rmid need to be equal.   We have changed this, and the test points appear to be working now.

However, the confusing part is in the tst model, the gds_node_id is set to 1.  Apparently, the model starts counting at 1, while the code starts counting at 0, so when you edit the testpoint.par file by hand, you have to subtract one from whatever you set in the model.

In other news, Alex pointed me at a CDS_PARTS.mdl, filters, "IIR FM with controls".  Its a light green module with 2 inputs and 2 outputs.  While the 2nd set of input and outputs look like they connect to ground, they should be iterpreted by the RCG to do the right thing (although Alex wasn't positive it works, it worth trying it and seeing if the 2nd output corresponds to a usable filter on/off switch to connect to the binary I/O to control analog DW.  However, I'm not sure it has the sophistication to wait for a zero crossing or anything like that - at the moment, it just looks like a simple on/off switch based on what filters are on/off.

  2487   Fri Jan 8 11:43:22 2010 josephb, alexUpdateComputersRFM and Megatron

Alex came over with a short RFM cable this morning.  We used it to connect the rfm card in c1iscey to the rfm card megatron

Alex renamed startup.cmd in /cvs/cds/caltech/target/c1iscey/ to startup.cmd.sav, so it doesn't come up automatically.  At the end we moved it back.

Alex used the vxworks command d to look at memory locations on c1iscey.  Such as d 0xf0000000, which is the start of the rfm code location.  So to look at 0x11a1c8 (lscPos) in the rfm memory, he typed "d 0xf011a1c8".  After doing some poking around, we look at the raw tst front end code (in /home/controls/cds/advLigo/src/fe/tst), and realized it was trying to read doubles.  The old rts code uses floats, so the code was reading incorrectly.

As a quick fix, we changed the code to floats for that part.  They looked like:

etmy_lsc = filterModuleD(dsp_ptr,dspCoeff,ETMY_LSC,cdsPciModules.pci_rfm[0]? *(\
(double *)(((void *)cdsPciModules.pci_rfm[0]) + 0x11a1c8)) : 0.0,0);

And we simply changed the double to float in each case.  In addition we changed the RCG scripts locally as well (if we do a update at some point, it'll get overwritten).  The file we updated was /home/controls/cds/advLigo/src/epics/util/lib/RfmIO.pm

Line 57 and Line 84 were changed, with double replaced with float.

return "cdsPciModules.pci_rfm[0]? *((float *)(((void *)cdsPciModules.pci
_rfm[$card_num]) + $rfmAddressString)) : 0.0";

. "  *((float *)(((char *)cdsPciModules.pci_rfm[$card_nu
m]) + $rfmAddressString)) = $::fromExp[0];\n"

This fixed our ability to read the RFM card, which now can read the LSC POS channel, for example.

Unfortunately, when we were putting everything the way it was with RFM fibers and so forth, the c1iscey started to get garbage (all the RFM memory locations were reading ffff).  We eventually removed the VME board, removed the RFM card, looked at it, put the RFM card back in a different slot on the board, and returned c1iscey to the rack.  After this it started working properly.  Its possible in all the plugging and unplugging that the card somehow had become loose.

The next step is to add all the channels that need to be read into the .mdl file, as well as testing and adding the channel which need to be written.


  2488   Fri Jan 8 15:40:14 2010 josephb, alexUpdateComputersRFM and RCG

Alex added a new module to the RCG, for generating RFMIO using floats.  This has been commited to CVS.

  2542   Fri Jan 22 12:33:37 2010 josephb, alexUpdateComputersModified CDS_PARTS for Binary output

Turns out the CDSO32 part (representing the Contec BO-32L-PE binary output) rquires two inputs. One for the first 16 bits, and one for the second set of 16 bits.  So Alex added another input to the part in the library.  Its still a bit strange, as it seems the In1 represents the second set of 16 bits, and the In2 represents the first set of 16 bits.

I added two sliders on the CustomAdls/C1TST_ETMY.adl control screen (upper left), along with a bit readout display, which shows the bitwise and of the two slider channels. For the moment, I still can't see any output voltage on any of the DO pins, no matter what output I set.


  2591   Thu Feb 11 18:33:54 2010 josephb, alexUpdateComputersStatus of the IP change over

A few machines have still not been changed over, including a few laptops, mafalda, ottavia, and c0rga.

All the front ends have been changed over.

fb40m died during a reboot and was replaced with a spare Sun blade 1000 that Larry had.  We had to swap in our old hard drive and memory.

All the front ends, belladonna, aldabella, and the control room machines have been switched over. Nodus was changed over after we realized we hosed the elog and svn by switching linux1's IP.

At this point, 90% of the machines seem to be working, although c0daqawg seems to be having some issues with its startup.cmd code.

  2603   Sat Feb 13 18:58:31 2010 josephb, alexUpdateComputersfb40m testpoints fixed

I received an e-mail from Alex indicating he found the testpoint problem and fixed it today:

Quote from Alex: "After we swapped the frame builder computer it has reconfigured all device files and I needed to create some symlinks on /dev/ to make tpman work again. I test the testpoints and they do work now."


  2983   Tue May 25 16:40:27 2010 josephb, alexUpdateCDSFinally tracked down why new models wouldn't talk to each other

The problem with the new models using the new shared memory/dolphin/RFM defined as names in a single .ipc file.

The first is the no_oversampling flag should not be used.  Since we have a single IO processor handling ADCs and DACs at 64k, while the models run at 16k, there is some oversampling occuring.  This was causing problems syncing between the models and the IOP.

It also didn't help I had a typo in two channels which I happened to use as a test case to confirm they were talking.  However, that has been fixed.

  3055   Tue Jun 8 15:58:25 2010 josephb, alexUpdateCDSNew multi-filter matrix part added to RCG (at the 40m at least)

A new webview of the LSP model is available at:


This model include a couple example noise generators as well as the new Matrix of Filter banks (5 inputs x 15 outputs = 75 Filters!).  The attached png shows where these parts can be found in the CDS_PARTS library.  I'm still working on the automatic generation of the matrix and filter bank medm screens for this part.  The plan is to have a matrix screen similar to current ones, except that the value entry points to the gain setting of the associated filter.  In addition, underneath each value, there will be a link to the full filter bank screen.  Ideally, I'd like to have the filter adl files located in a sub-directory of the system, to keep clutter down.

I've cut and past the new Foton file generated by the LSP model below.  The first number following the MTRX is the input the filter is taking data from and the second number is the output its pushing data to.  This means for the script parsing Valera's transfer functions, I need to input which channel corresponds to which number, such as DARM = 0, MICH = 1, etc.  So the next step is to write this script and populate the filter banks in this file.

# Computer generated file: DO NOT EDIT
# MODULES Mirror2DOF_f2x2 Mirror2DOF_f2x3 Mirror2DOF_f2x4 Mirror2DOF_f2x5
# MODULES Mirror2DOF_f2x6 Mirror2DOF_f2x7 DOF2PD_MTRX_0_0 DOF2PD_MTRX_0_1

Attachment 1: CDS_Library.png
  3534   Tue Sep 7 15:31:28 2010 josephb, alexUpdateCDSBinary Output working/ IO chassis fixed

As noted previously, we were having problems with getting multiple 32 channel binary output cards working.  Alex came by and we eventually tracked the problem down to an incorrect counter in the c code.  This has been fixed and checked into the CDS svn repository.  I tested the actual hardware and we are in fact able to turn our test LEDs on with multiple binary output boards.


Alex and I also looked at the non-functional IO chassis (the one which wouldn't sync with the 1PPS signal and wasn't turning on when the computer turned on.  We discovered one corner of the trenton board wasn't screwed down and was in fact slightly warped.  I screwed it down properly, straightening the board out in the process.  After this, the IO chassis worked with a host interface board to the computer and started properly.  We were able to see the boards attached as well with lspci.  So that chassis looks to be in working condition now.

Onwards to the RFM test.

  3600   Thu Sep 23 12:05:20 2010 josephb, alexUpdateCDSfb40m down, new fb in progress

Alex came over this morning and we began work on the frame builder change over.  This required fb40m be brought down and disconnected from the RAID array, so the frame builder is not available.

He brought a Netgear switch which we've installed at the top of the 1X7 rack.  This will eventually be connected, via Cat 6 cable, to all the front ends.  It is connected to the new fb machine via a 10G fiber.

Alex has gone back to Downs to pickup a Symmetricon (sp?) card for getting timing information into the frame builder.  He will also be bringing back a harddrive with the necessary framebuilder software to be copied onto the new fb machine.

He said he'd like to also put a Gentoo boot server on the machine.  This boot server will not affect anything at the moment, but its apparently the style the sites are moving towards.  So you have a single boot server, and diskless front end computers, running Gentoo.  However for the moment we are sticking with our current Centos real time kernel (which is still compatible with the new frame builder code).  However this would make a switch over to the new system possible in the future.

At the moment, the RAID array is doing a file system check, and is going slowly while it checks terabytes of data.  We will continue work after lunch. 

 Punchline: things still don't work.

  3602   Thu Sep 23 21:01:11 2010 josephb, alexUpdateCDSfb40m still down, new fb still in progress
Unfortunately, copying the data to the USB/SATA drive over at downs took longer than expected for Alex. We will be installing the new code on the new fb machine tomorrow and running it. We will be running off of a timer on that machine until Monday. On Monday, a Symmetricom card will be arriving from LLO so that we can connect an IRIG-B timing signal into the frame builder and use a proper time signal. There is no running frame builder for tonight and thus will be no trends until we get the new FB running tomorrow morning.
  3620   Wed Sep 29 12:08:28 2010 josephb, alexSummaryCDSLast burt save of old controls

This is being recorded for posterity so we know where to look for the old controls settings.

The last good burt restore that was saved before turning off scipe25 aka c1dcuepics was on September 29, 11:07.

  3625   Thu Sep 30 11:07:20 2010 josephb, alexUpdateCDStest points starting to work

The centos 5.5 compiled gds code is currently living on rosalba in the /opt/app directory (this is local to Rosalba only).  It has not been fully compiled properly yet.  It is still missing ezcaread/write/ and so forth.  Once we have a fully working code, we'll propagate it to the correct directories on linux1.

So to have a working dtt session with the new front ends, log into rosalba, go to opt/apps/, and source gds-env.bash in /opt/apps (you need to be in bash for this to work, Alex has not made a tcsh environment script yet).  This will let get testpoints and be able to make transfer function measurements, for example

Also, to build the latest awgtpman, got to fb, go to /opt/rtcds/caltech/c1/core/advLigoRTS/src/gds, and type make.  This has been done and mentioned just as reference.

The awgtpman along with the front end models should startup automatically on reboot of c1sus (courtesy of the /etc/rc.local file).

  3628   Thu Sep 30 16:29:35 2010 josephb, alexUpdateCDSfb update

There currently seems to be a timing issue with  the frame builder.  We switched over to using a symmetricom card to get an IRIG-B signal into the fb machine, but the gps time stamp is way off (~80 years Alex said).

If there is a frame buiilder issue, its currently often necessary to kill the associated mx_stream processes, since they don't seem to restart gracefully.  To fix it the following steps should be taken:

Kill frame builder, kill the two mx_stream processes, then /etc/restart_streams/, then restart the frame builder (usual daqd -c ./daqdrc >& ./daqd.log in /opt/rtcds/caltech/c1/target/fb).

To restart (or start after a boot) the nds server, you need to go to /opt/rtcds/caltech/c1/target/fb and type

./nds /opt/rtcds/caltech/c1/target/fb/pipe

At this time, testpoints are kind of working, but timing issues seem to be preventing useful work being done with it.  I'm leaving with Alex working on the code.


  3633   Fri Oct 1 11:33:15 2010 josephb, alexConfigurationCDSChanging gds code to the new working version

Alex is installing the newly compiled gds code (compiled on Centos 5.5 on Rosalba) which does in fact include the ezca type tools. 

At the moment we don't have a solaris compile, although that should be done at somepoint in the future.  It means the gds tools (diaggui, foton, etc) won't work on op440m.  On the bright side, this newer gds code has a foton that doesn't seem to crash all the time on Linux.


  3635   Fri Oct 1 14:13:29 2010 josephb, alexUpdateCDSfb work that still needs to be done

1) Need to check 1 PPS signal alignment

2) Figure out why 1PPS and ADC/DAC testpoints went away from feCodeGen.pl?

3) Fix 1PPS testpoint giving NaN data

4) Figure out why is daqd printing "making gps time correction" twice?

5) Need to investigate why mx_streams are still getting stuck

6) Epics channels should not go out on 114 network (seen messages when doing
burt restore/save).

7) Dataviewer leaves test points hanging, daqd does not deallocate them
(net_Writer.c shutdown_netwriter call)

8) Need to install wiper scripts on fb

9) Need to install newer kernel on fb to avoid loading myrinet firmware
(avoid boot delay)

  3648   Tue Oct 5 13:46:26 2010 josephb, alexUpdateCDSRestarted fb trending

Fb is now once again actually recording trends.

A section of the daqdrc file (located in /opt/rtcds/caltech/c1/target/fb/ directory) had been commented out by Alex and never uncommented.  This section included the commands which actually make the fb record trends.

The section now reads as:

# comment out this block to stop saving data

start frame-saver;
sync frame-saver;
start trender;
start trend-frame-saver;
sync trend-frame-saver;
start minute-trend-frame-saver;
sync minute-trend-frame-saver;
start raw_minute_trend_saver;
#start frame-writer "" broadcast="" all;
#sleep 5;

  3651   Tue Oct 5 14:11:09 2010 josephb, alexUpdateCDSGoing to from rtlinux to Gentoo requires front end code clean out

Apparently when updating front end codes from rtlinux to the patched Gentoo, certain files don't get deleted when running make clean, such as the sysfe.rtl files in the advLigoRTS/src/fe/sys directories.  This fouls the start up scripts by making it think it should be configured for rtlinux rather than the Gentoo kernel module.

  3696   Tue Oct 12 13:05:00 2010 josephb, alexUpdateCDSfb still flaky, models time out fixed

Interesting information from Alex.  We're limited to 2 Megabytes per second per front end model.  Assuming all your channels are running at a 2kHz rate, we can have at most 256 channels being set to the frame builder from the front end (assuming 4 byte data).  We're fine for the moment, but perhaps useful to keep in mind.

I talked to Alex this morning and he said the frame builder is being flaky (it crashed on us twice this morning, but the third time seemed to stay up when requesting data).  I've added a new wiki page called "New Computer Restart Prodecures" under Computers and Scripts, found here. It includes all the codes that need to be running, and also a start order if things seem to be in a particularly bad state.  Unfortunately, there were no fixes done to the frame builder but it is on Alex's list of things to do.

In regards to the timing out of the front ends, Alex came over to the 40m this morning and we sat down debugging.  We tried several things, such as removing all filters from C1MCS.txt file in the chans directory, and watching the timing as we pressed various medm control buttons. We traced it to a filters used by the DAC in the model talking to the IOP front end, which actually sends the data to the physical DAC card.  The filter is used when converting between sample rates, in this case between the 16 kHz of the front end model and the 64 kHz of the IOP.  Sending it raw zeros after having had real data seemed to cause this filter to eat up an usually large amount of CPU time. 

We modified the /opt/rtcds/caltech/c1/core/advLigoRTS/src/include/drv/fm10Gen.c file.

We reverted a change that was done between version 908 and 929, where underflows (really small numbers) were dealt with by adding and then subtracting a very small number.  We left the adding and subtracting, but also restored the hard limits on the history.

So instead of relying on just:

input += 1e-16;
junk = input;
input -= 1e-16;

we also now use

if((new_hist < 1e-20) && (new_hist > -1e-20)) new_hist = new_hist<0 ? -1e-20: 1e-20;

Thus any filter value who's absolute value is less than 1e-20 will be clamped to -1e-20 or 1e-20.  On the bright side, we no longer crash the front ends when we turn something off.


  3735   Mon Oct 18 15:33:00 2010 josephb, alexUpdateCDSPossibly broken timing cards

It looks as though we may have two IO chassis with bad timing cards.

Symptoms are as follows:

We can get our front end models writing data and timestamps out on the RFM network.

However, they get rejected on the receiving end because the timestamps don't match up with the receiving front end's timestamp.  Once started, the system is consistently off by the same amount. Stopping the front end module on c1ioo and restarting it, generated a new consistent offset.  Say off by 29,000 cycles in the first case and on restart we might be 11,000 cycles off.  Essentially, on start up, the IOP isn't using the 1PPS signal to determine when to start counting.

We tried swapping the spare IO chassis (intended for the LSC) in ....

# Joe will finish this in 3 days.

# Basically, in conclusion, in a word, we found that c1ioo IO chassis is wrong.

  3844   Tue Nov 2 11:34:53 2010 josephb, alexUpdateCDSdiagconfd running, excitations back in dtt


Diagnostic test tools was starting with errors.


After the reboot of the frame builder machine yesterday by Alex, the diagconfd daemon was not getting started by xinetd.  There was a sequence error in the startup where xinetd was being called before mounting drives from linux1.

Important Note:

If you do not see the "nds" line you would not have diagnostic tests enabled in the DTT:

[controls@rosalba apps]$ diag -i | grep nds
nds * * 8088 *


Alex changed /etc/xinetd.d/diagconfd file to point to /opt/apps/gds/bin/diagconfd instead of /opt/apps/bin/diagconf.  He also ensured xinetd started after mounting from linux1.

Alex's Suggestion:
My feeling is we should get rid of this feature and have an NDS address
entry box in the "Online" tab in the DTT with the default "nds". I
mentioned this to Jim Batch and he greed with me, so maybe he is going to
implement this. So maybe you guys want to request the same thing too, send
the request to Rolf and Jim, so we can have the last demon exercised.

  3869   Fri Nov 5 15:20:18 2010 josephb, alexSummaryCDS40m computer slow down solved


The 40m computers were responding sluggishly yesterday, to the point of being unusable.


The mx_stream code running on c1iscex (the X end suspension control computer) went crazy for some reason.  It was constantly writing to a log file in /cvs/cds/rtcds/caltech/c1/target/fb/  In the past 24 hours this file had grown to approximately 1 Tb in size. The computer had been turned back on yesterday after having reconnected its IO chassis, which had been moved around last week for testing purposes - specifically plugging the c1ioo IO chassis in to it to confirm it had timing problems.

Current Status:

The mx_stream code was killed on c1iscex and the 1 Tb file removed.

Computers are now more usable.

We still need to investigate exactly what caused the code to start writing to the log file non-stop.

Update Edit:

Alex believes this was due to a missing entry in the /diskless/root/etc/hosts file on the fb machine.  It didn't list the IP and hostname for the c1iscex machine.  I have now added it.  c1iscex had been added to the /etc/dhcp/dhcpd.conf file on fb, which is why it was able to boot at all in the first place.  With the addition of the automatic start up of mx_streams in the past week by Alex, the code started, but without the correct ip address in the hosts file, it was getting confused about where it was running and constantly writing errors.


When adding a new FE machine, add its IP address and its hostname to the /diskless/root/etc/hosts file on the fb machine.

  3870   Fri Nov 5 16:40:22 2010 josephb, alexUpdateCDSc1ioo now has working awgtpman


We couldn't set testpoints on the c1ioo machine.


Awgtpman was getting into some strange race condition.  Alex added an additional sleep statement and shifted boot order slightly in the rc.local file.  This apparently is only a problem on c1ioo, which is a Sun X4600.  It was using up 100% of a single CPU for the awgtpman process.


We now have c1ioo test points working.


Need to examine the startc1ioo script and see if needs a similar modification, as that was tried at several points but yielded a similar state of non-functioning test points. For the moment, reboot of c1ioo is probably the best choice after making modifications to the c1ioo or c1x03 models.

Current CDS status:

MC damp dataviewer diaggui AWG c1ioo c1sus c1iscex RFM Sim.Plant Frame builder  TDS
  4003   Wed Dec 1 12:02:49 2010 josephb, alexUpdateCDSRebuilding frame builder with latest code


The front ends seem to have different gps timestamps on the data than the frame builder has when receiving them.

One theory is we have fairly been doing SVN checkouts of the code for the front ends once a week or every two weeks, but the frame builder has not been rebuilt for about a month. 

Current Action:

Alex is currently rebuilding the frame builder with the latest code changes.

It also suggests I should try rebuilding the frame builder on a semi-regular basis as updates come in.



  4037   Thu Dec 9 12:28:52 2010 josephb, alexUpdateCDSThe Dolphin is in (Reflected memory that is)

Setting the Configurations files:

On the fb machine in /etc/dis/ there are several configurations files that need to be set for our dolphin network.

First, we modify networkmanager.conf.

We set  "-dimensionX 2;" and leave the dimensionY and dimensionZ as 0.  If we had 3 machines on a single router, we'd set X to 3, and so forth.

We then modify dishosts.conf.

We add an entry for each machine that looks like:

#Keyword name nodeid adapter link_width
ADAPTER:  c1sus_a0 4 0 4

The nodeids (the first number after the name)  increment by 4 each time, so c1lsc is:

ADAPTER:  c1lsc_a0 8 0 4

The file cluster.conf is automatically updated by the code by parsing the dishosts.conf and networkmanager.conf files.

Getting the code to automatically start:

We uncommented the following lines in the rc.local file in /diskless/root/etc on the fb machine:

# Initialize Dolphin
sleep 2
# Have to set it first to node 4 with dxconfig or dis_nodemgr fails. Unexplai   ned.
/opt/DIS/sbin/dxconfig -c 1 -a 0 -slw 4 -n 4
/opt/DIS/sbin/dis_nodemgr -basedir /opt/DIS

For the moment we left the following lines commented out:

# Wait for Dolphin to initialize on all nodes
We were unsure of the effect of the dolphin_wait script on the front ends without Dolphin cards.  It looks like the script it calls waits until there are no dead nodes.

In /etc/conf.d/ on the fb machine we modified the local.start file by uncommenting:


This starts the Dolphin network manager on the fb machine.  The fb machine is not using a Dolphin connection, but controls the front end Dolphin connections via ethernet.

The Dolphin network manager can be interacted with by using the dxadmin program (located in /opt/DIS/sbin/ on the fb machine).  This is a GUI program so use ssh -X when logging into the fb before use.

Setting up the front ends models:

Each IOP model (c1x02, c1x04) that runs on a machine using the Dolphin RFM cards needs to have the flag pciRfm=1 set in the configuration box (usually located in the upper left of the model in Simulink).  Similarly, the models actually making use of the Dolphin connections should have it set as well.  Use the PCIE_SignalName parts from IO_PARTS in the CDS_PARTS.mdl file to send and receive communications via the Dolphin RFM.

  4045   Mon Dec 13 11:56:32 2010 josephb, alexUpdateCDSDolphin is working


The dolphin RFM was not sending data between c1lsc and c1sus.


Dig into the controller.c code located in /opt/rtcds/caltech/c1/core/advLigoRTS/src/fe/.  Find this bit of code on line 2173:


2173 #ifdef DOLPHIN_TEST
2174 #ifdef X1X14_CODE
2175         static const target_node = 8; //DIS_TARGET_NODE;
2176 #else
2177         static const target_node = 12; //DIS_TARGET_NODE;
2178 #endif
2179         status = init_dolphin(target_node);

Replace it with this bit of code:

2173 #ifdef DOLPHIN_TEST
2174 #ifdef C1X02_CODE
2175         static const target_node = 8; //DIS_TARGET_NODE;
2176 #else
2177         static const target_node = 4; //DIS_TARGET_NODE;
2178 #endif
2179         status = init_dolphin(target_node);

Basically this was hard coded for use at the site on their test stands.  When starting up, the dolphin adapter would look for a target node to talk to, that could not be itself.  So, all the dolphin adapters would normally try to talk to target_node 12, unless it was the X1X14 front end code, which happened to be the one with dolphin node id 12.  It would try to talk to node 8.

Unfortunately, in our setup, we only had nodes 4 and 8.  Thus, both our codes would try to talk to a nonexistent node 12.  This new code has everyone talk to node 4, except the c1x02 process which talks to node 8 (since it is node 4 and can't talk to itself).

I'm told this stuff is going away in the next revision and shouldn't have this hard coded stuff.

Different Dolphin Problem and Fix:

Apparently, the only models which should have pciRfm=1 are the IOP models which have a dolphin connection.  Front end models that are not IOP models (like c1lsc and c1rfm) should not have this flag set.  Otherwise they include the dolphin drivers and causes them and the IOP to refuse to unload when using rmmod.

So pciRfm=1 only in IOP models using Dolphin, everyone else should not have it or should have pciRfm=-1.


Current CDS status:

MC damp dataviewer diaggui AWG c1ioo c1sus c1iscex RFM Dolphin RFM Sim.Plant Frame builder TDS
  4184   Fri Jan 21 17:59:27 2011 josephb, alexUpdateCDSFixed Dolphin transmission

The orientation of the Dolphin cards seems to be opposite on c1lsc and c1sus.  The wide part is on top on c1lsc and on the bottom on c1sus.  This means, the cable is plugged into the left Dolphin port on c1lsc and into the right Dolphin port on c1sus.  Otherwise you get a wierd state where you receive but not transmit.

  2627   Mon Feb 22 12:48:31 2010 josephb, alex, kojiUpdateComputersFE machines now coming up

Even after bringing up the fb40m, I was unable to get the front ends to come up, as they would error out with an RFM problem.

We proceeded to reboot everything I could get my hands on, although its likely it was daqawg and daqctrl which were the issue, as on the C0DAQ_DETAIL screen their status had been showing as 0xbad, but after the reboot showed up as 0x0.  They had originally come up before the frame builder had been fixed, so this might have been the culprit.  In the course of rebooting, I also found c1omc and c1lsc had been turned off as well, and turned them on.

After this set of reboots, we're now able to bring the front ends up one by one.

  3673   Thu Oct 7 17:19:55 2010 josephb, alex, rolfUpdateCDSc1sus status

As noted by Koji, Alex and Rolf stopped by.

We discussed the feasibility of getting multiple models using the same DAC.  We decided that we infact did need it. (I.e. 8 optics through 3 DACs does not divide nicely), and went about changing the controller.c file so as to gracefully handle that case.  Basically it now writes a 0 to the channel rather than repeating the last output if a particular model goes down that is sharing a DAC.

In a separate issue, we found that when skipping DACs  in a model (say using DACs 1 and 2 only) there was a miscommunication to the IOP, resulting in the wrong DACs getting the data.  the temporary solution is to have all DACs in each model, even if they are not used.  This will eventually be fixed in code.

At this point, we *seem* to be able to control and damp optics.  Look for a elog from Yuta confirming or denying this later tonight (or maybe tomorrow).


  3836   Mon Nov 1 14:53:30 2010 josephb, alex, rolfUpdateCDSAlex updated FB and mx_streams code


The framebuilder was being flaky.  MX_streams would go down, prevent testpoints from working and so forth.


Send Alex up North for a week to fix the code. 

Alex came back and installed updates to the frame builder and the mx_streams code (Myrinet Express over Generic Ethernet Hardware) used by the front ends to talk to the frame builder.  Instead of 1 stream per model, there's now just 1 per front end handling all communications.

Alex did an SVN update and we now have the latest CDS code.

Self restarting codes:

The frame builder code (daqd) and nds pipe have been added to the fb machine's inittab.  Specifically it calls a script called /opt/rtcds/caltech/c1/target/fb/start_daqd.inittab and /opt/rtcds/caltech/c1/target/fb/start_nds.inittab.

The addition to the /etc/inittab file on fb is:



When these codes die they should automatically restart.

Self starting codes at boot up:

The front ends now start the mx_stream script (which lives in /opt/rtcds/caltech/c1/target/fb/ directory) at boot up.  They call it with the approriate command line options for that front end. It can be found in the /etc/rc.local file.

They look like: mx_stream -s "c1x02 c1sus c1mcs c1rms" -d fb:0

As always, the front end codes to be started are defined in the /etc/rtsystab file (or on fb, in the /diskless/root/etc/rtsystab file).

However, if it does go down you would need to restart it manually, although it seems more robust now and doesn't seem to go down every time we restart the frame builder.

All the usual front end IOCs and modules should be started and loaded on boot up as well.


Current CDS status:

MC damp dataviewer diaggui AWG c1ioo c1sus c1iscex RFM Sim.Plant Frame builder  ...
  4004   Wed Dec 1 13:41:21 2010 josephb, alex, rolfUpdateCDSTiming is back


We had timing problems across the front ends.


Noticing that the 1PPS reference was not blinking on the Master Timing Distribution box.  It was supposed to be getting a signal from the c0dcu1 VME crate computer, but this was not happening.

We disconnected the timing signal going into c0dcu1, coming from c0daqctrl, and connected the 1PPS directly from c0daqctrl to the Ref In for the Master Timing distribution box (blue box with lots of fibers coming out of it in 1X5).

We now have agreement in timing between front ends.

After several reboots we now have working RFM again, along with computers who agree on the current GPS time along with the frame builder.


RFM is back and testpoints should be happy.

We still don't have a working binary output for the X end.  I may need to get a replacement backplane with more than 4 slots if the 1st slot of this board has the same problem as the large boards.

I have burt restored the c1ioo, c1mcs, c1rms, c1sus, and c1scx processes, and optics look to be damped.


  4129   Mon Jan 10 16:39:36 2011 josephb, alex, rolfUpdateCDSNew Time server for frame builder and 1PPS

Alex and Rolf came over today with a Tempus LX  GPS network timing server.  This has an IRIG-B output and a 1PPS output.  It can also be setup to act as an NTP server (although we did not set that up).

This was placed at waist height in the 1X7 rack.  We took the cable running to the presumably roof mounted antenna from the VME timing board and connected it to this new timing server.  We also moved the source of the 1PPS signal going to the master timer sequencer (big blue box in 1X7 with fibers going to all the front ends) to this new time server.  This system is currently working, although it took about 5 minutes to actually acquire a timing signal from the GPS satellites.  Alex says this system should be more stable, with no time jumps. 

I asked Rolf about the new timing system for the front ends, he had no idea when that hardware would be available to the 40m.

Currently, all the front ends and the frame builder agree on the time.  Front ends are running so the 1 PPS signal appears to be working as well.

  4130   Mon Jan 10 16:47:08 2011 josephb, alex, rolfUpdateCDSFixed c1lsc dolphin reflected memory

While Alex and Rolf were visiting, I pointed out that the Dolphin card was not sending any data, not even a time stamp, from the c1lsc machine.

After some poking around, we realized the IOP (input/output processor) was coming up before the Dolphin driver had even finished loading. 

We uncommented the line


in the /diskless/root/etc/rc.local file on the frame builder.  This waits until the dolphin module is fully loaded, so it can hand off a correct pointer to the memory location that the Dolphin card reads and writes to.  Previously, the IOP had been receiving a bad pointer since the Dolphin driver had not finished loading.

So now the c1lsc machine can communicate with c1sus via Dolphin and from there the rest of the network via the traditional Ge Fanuc RFM.

  3036   Wed Jun 2 17:34:33 2010 josephb, alex, valeraUpdateCDSCDS updates

From what I understand, Alex rewrote portions of the framebuilder and testpoint codes and then recompiled them in order to get more than 1 testpoint per front end working.   I've tested up to 5 testpoints at once so far, and it worked.

We also have a new noise component added to the RCG code.  This piece of code uses the random number generator from chapter 7.1 of Numerical Recipies Third Edition to generate uniform numbers from 0 to 1.  By placing a filter bank after it should give us sufficient flexibility in generating the necessary noise types.  We did a coherence test between two instances of this noise piece, and they looked pretty incoherent.  Valera will add a picture of it when it finishe 1000 averages to this elog.

I'm in the process of propagating the old suspension control filters to the new RCG filter banks to give us a starting point.  Tomorrow Valera and I are planning to choose a subset of the plant filters  and put them in, and then work out some initial control filters to correspond to the plant.  I also need to think about adding the anti-aliasing filters and whitening/dewhitening filters.


  3251   Tue Jul 20 15:06:48 2010 josephb, bobOmnistructureGeneralNew speakers in ceiling in control room

After the crane training, Bob attached speakers to the ceiling right next to the projector, for use with presentations.

  4510   Mon Apr 11 14:17:22 2011 josephb, jamieUpdateCDSFrame wiper script installed

[Joe, Jamie, Alex]


I asked Alex which cron to use (dcron? frcron?).  He promptly did the following:

emerge dcron

rc-update add dcron default

Copied the wiper.pl script from LLO to /opt/rtcds/caltech/c1/target/fb/

At that point, I modified wiper.pl script to reduce to 95% instead of 99.7%.

I added controls to the cron group on fb:

sudo gpasswd -a controls cron

I then added the wiper.pl to the crontab as the following line using crontab -e.

0 6 * * *       /opt/rtcds/caltech/c1/target/fb/wiper.pl --delete &> /opt/rtcds/caltech/c1/target/fb/wiper.log


Note, placing backups on the /frames raid array will break this script, because it compares the amount in the /frames/full/, /frames/trends/minutes, and /frames/trends/seconds to the total capacity. 

Apparently, we had backups from September 27th, 2010 and March 22nd, 2011.  These would have broken the script in any case. 

We are currently removing these backups, as they are redundant data, and we have rsync'd backups of the frames and trends.  We should now have approximately twice the lookback of full frames.

  4583   Thu Apr 28 16:12:19 2011 josephb, jamieUpdateCDSNew CDS model SVN


We are now using the LIGO CDS SVN for storing our control models.

The SVN is at:


The models are under cds_user_apps, then trunk, then approriate subsystem (ISC for c1lsc for example), c1 (for caltech 40m), then models.

We have checked out /cds_user_apps to /opt/rtcds/.

So to find the c1lsc.mdl model, you would go to /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1lsc.mdl

This SVN is shared by many people LIGO, so please follow good SVN practice.  Remember to update models ("svn update") before doing commits.  Also, after making changes please do an update to the SVN so we have a record of the changes.

New Practices

We are creating soft links in the /opt/rtcds/caltech/c1/core/advLigoRTS/src/epics/simLink/ to the models that you need to build.  So if you want to add a new model, please add it to the cds_users_apps SVN in the correct place and create a soft link to the simLink directory.

lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1sus.mdl -> /opt/rtcds/cds_user_apps/trunk/SUS/c1/models/c1sus.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1sup.mdl -> /opt/rtcds/cds_user_apps/trunk/SUS/c1/models/c1sup.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1spy.mdl -> /opt/rtcds/cds_user_apps/trunk/SUS/c1/models/c1spy.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1spx.mdl -> /opt/rtcds/cds_user_apps/trunk/SUS/c1/models/c1spx.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1scy.mdl -> /opt/rtcds/cds_user_apps/trunk/SUS/c1/models/c1scy.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1scx.mdl -> /opt/rtcds/cds_user_apps/trunk/SUS/c1/models/c1scx.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1mcs.mdl -> /opt/rtcds/cds_user_apps/trunk/SUS/c1/models/c1mcs.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1x05.mdl -> /opt/rtcds/cds_user_apps/trunk/CDS/c1/models/c1x05.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1x04.mdl -> /opt/rtcds/cds_user_apps/trunk/CDS/c1/models/c1x04.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1x03.mdl -> /opt/rtcds/cds_user_apps/trunk/CDS/c1/models/c1x03.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1x02.mdl -> /opt/rtcds/cds_user_apps/trunk/CDS/c1/models/c1x02.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1x01.mdl -> /opt/rtcds/cds_user_apps/trunk/CDS/c1/models/c1x01.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1rfm.mdl -> /opt/rtcds/cds_user_apps/trunk/CDS/c1/models/c1rfm.mdl
lrwxrwxrwx 1 controls controls      55 Apr 28 14:41 c1dafi.mdl -> /opt/rtcds/cds_user_apps/trunk/CDS/c1/models/c1dafi.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1pem.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1pem.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1mcp.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1mcp.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1lsp.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1lsp.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1lsc.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1lsc.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1ioo.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1ioo.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1gpv.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1gpv.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1gfd.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1gfd.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1gcv.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1gcv.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1ass.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1ass.mdl

  2067   Thu Oct 8 11:10:50 2009 josephb, jenneUpdateComputersEPICs Computer troubles

At around 9:45 the RFM/FB network alarm went off, and I found c1asc, c1lsc, and c1iovme not responding. 

I went out to hard restart them, and also c1susvme1 and c1susvme2 after Jenne suggested that.

c1lsc seemed to have a promising come back initially, but not really.  I was able to ssh in and run the start command.  The green light under c1asc on the RFMNETWORK status page lit, but the reset and CPU usage information is still white, as if its not connected.   If I try to load an LSC channel, say like PD5_DC monitor, as a testpoint in DTT it works fine, but the 16 Hz monitor version for EPICs is dead.  The fact that we were able to ssh into it means the network is working at least somewhat.

I had to reboot c1asc multiple times (3 times total), waiting a full minute on the last power cycle, before being able to telnet in.  Once I was able to get in, I restarted the startup.cmd, which did set the DAQ-STATUS to green for c1asc, but its having the same lack of communication as c1lsc with EPICs.

c1iovme was rebooted, was able to telnet in, and started the startup.cmd.  The status light went green, but still no epics updates.

The crate containing c1susvme1 and c1susvme2 was power cycled.  We were able to ssh into c1susvme1 and restart it, and it came back fully.  Status light, cpu load and channels working.  However I c1susvme2 was still having problems, so I power cycled the crate again.  This time c1susvme2 came back, status light lit green, and its channels started updating.

At this point, lacking any better ideas, I'm going to do a full reboot, cycling c1dcuepics and proceeding through the restart procedures.

  3156   Fri Jul 2 11:06:38 2010 josephb, kiwamuUpdateCDSCDS and Green locking thoughts

Kiwamu and I went through and looked at the spare channels available near the PSL table and at the ends.

First, I noticed I need another 4 DB37 ADC adapter box, since there's 3 Pentek ADCs there, which I don't think Jay realized.

PSL Green Locking

Anyways, in the IOO chassis that will put in, for the ADC we have a spare 8 channels which comes in the DB37 format.  So one option, is build a 8 BNC converter, that plugs into that box.

The other option, is build 4-pin Lemo connectors and go in through the Sander box which currently goes to the 110B ADC, which has some spare channels.

For DAC at the PSL, the IOO chassis will have 8 spare channel DAC channels since there's only 1 Pentek DAC.  This would be in a IDC40 cable format, since thats what the blue DAC adapter box takes.  A 8 channel DAC box to 40 pin IDC would need to be built.


End Green Locking

The ends have 8 spare DAC channels, again 40 pin IDC cable.   A box similar to the 8 channel DAC box for the PSL would need to be built.

The ends also have spare 4-pin Lemo capacity.  It looked like there were 10 channels or so still unused.  So lemo connections would need to be made.  There doesn't appear to be any spare 37 DB connectors on the adapter box available, so lemo via the Sander box is the only way.



Joe needs to provide Kiwamu with cabling pin outs.

If Kiwamu makes a couple spares of the 8 BNC to 37DB connector boards, there's a spare 37DB ADC input in the SUS machine we could use up, providing 8 more channels for test use.

ELOG V3.1.3-