  3855   Wed Nov 3 17:01:01 2010   josephb   Summary   CDS   Comparison of RFM read times

Problem:

RFM reads are slow.  Rolf has said it should take 2-3 microseconds per read. 

c1sus is taking about 7 microseconds per read, more than twice as long as Rolf's figure.

Hypothesis:

The RFM card is in the IO chassis, sharing the PCIe bus with 4 ADC cards, 3 DAC cards, 4 BO cards, and a BIO card.  It's possible this crowded bus is causing the reads to take even longer.

Test Results:

Compare read times on the c1sus computer, which has its RFM card in the IO chassis, with those on c1ioo, which has its RFM card in the computer itself.

c1ioo:

No RFM reads: 8 microseconds

3 RFM reads: 20 microseconds (~4 per read)

6 RFM reads: 32 microseconds (~4 per read)

c1sus:

No RFM reads: 25 microseconds (bigger model)

1 RFM read: 33 microseconds (~8 per read)

3 RFM reads: 45 microseconds (~7 per read)

6 RFM reads: over 62 microseconds; the model doesn't run.

Conclusion:

It looks like moving the RFM card into the computer may help by about a factor of 2 in read speed, although it's still not quite what Alex and Rolf claim it should be.

The c1mcs and c1ioo models have been reverted to their normal operations.

 

  3860   Thu Nov 4 15:15:43 2010   josephb   Update   CDS   Modified feCodeGen.pl, fmseq.pl, and suspension screens

Feature Requested:

Have the CPU_METER bar change color at various alarm levels.  These alarm levels have been set at 2/3 of maximum for a Minor alarm (yellow) and 9/10 of maximum for a Major alarm (red).

Implementation:

Rather than hand-edit each EPICS .db file to add the alarm fields every time we rebuild the front ends, I decided to modify things at the source (since this strikes me as a generally useful alarm level for all front end codes).

First, I modified the feCodeGen.pl script.

I changed

print EPICS "OUTVARIABLE FEC\_$dcuId\_CPU_METER epicsOutput.cpuMeter int ai 0 field(HOPR,\"$rate\") field(LOPR,\"0\")\n";

to

print EPICS "OUTVARIABLE FEC\_$dcuId\_CPU_METER epicsOutput.cpuMeter int ai 0 field(HOPR,\"$rate\") field(LOPR,\"0\") field(HIGH,\"$two_thirds_rate\") field(HSV,\"MINOR\") field(HIHI,\"$nine_tenths_rate\") field(HHSV,\"MAJOR\")\n";

I added the following two lines just before it as well:

$two_thirds_rate = int($rate * 2 / 3);

$nine_tenths_rate = int($rate * 9 / 10);

However, only the first four fields were actually added to the database file.  Apparently fmseq.pl, which populates the database, was hard-coded to handle only up to 4 fields.

I modified the fmseq.pl script in /opt/rtcds/caltech/c1/core/advLigoRTS/src/epics/util/ so as to be able to handle up to 6 field values when writing EPICS .db files.

This change was accomplished by simply changing the following line

($junk, $v_name, $v_var, $v_type, $ve_type, $v_init, $v_efield1, $v_efield2, $v_efield3, $v_efield4 ) = split(/\s+/, $_);

to 

($junk, $v_name, $v_var, $v_type, $ve_type, $v_init, $v_efield1, $v_efield2, $v_efield3, $v_efield4, $v_efield5, $v_efield6 ) = split(/\s+/, $_);

everywhere it occurred (there were something like 10 instances).  I also added the two lines

        $vardb .= "    $v_efield5\n";
        $vardb .= "    $v_efield6\n";

after each set of

         $vardb .= "    $v_efield1\n";
         $vardb .= "    $v_efield2\n";
         $vardb .= "    $v_efield3\n";
         $vardb .= "    $v_efield4\n";

Lastly, I modified the CPU_METER bar graph on the C1SUS_DEFAULTNAME.adl screen (located in /opt/rtcds/caltech/c1/medm/master/) to use alarm levels, and then ran generate_master_screens.py.
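
With both changes in place, the CPU_METER record in each generated .db file should come out with alarm fields along these lines (a sketch only: it assumes a model whose full-scale cycle time $rate is 62 microseconds and whose DCUID is 21, and the record syntax is reconstructed rather than pasted from an actual generated file):

grecord(ai,"C1:FEC-21_CPU_METER")
{
        field(HOPR,"62")
        field(LOPR,"0")
        field(HIGH,"41")
        field(HSV,"MINOR")
        field(HIHI,"55")
        field(HHSV,"MAJOR")
}

Since HIGH/HSV and HIHI/HHSV make EPICS itself raise the MINOR and MAJOR alarms, any MEDM widget drawn with alarm coloring (like the CPU_METER bar) picks them up without further per-screen edits.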

 

  3900   Thu Nov 11 21:07:49 2010   josephb   Update   CDS   Plugged c1iscex into DAQ network - still causes network slowdown

I connected the c1iscex computer to the dedicated DAQ network switch (located in 1X7).

This does not seem to have helped c1iscex stop spewing out "OMX: Failed to find peer index of board 00:00:00:00:00:00 (Peer Not Found in the Table)" at the rate of ~1 Gigabyte per minute.

c1iscex is currently off until a solution can be found.

  3919   Mon Nov 15 11:13:12 2010   josephb   Update   CDS   Modified rc.local to not start mx_streams automatically

Problem:

c1iscex floods the network with about 1 gigabyte of error messages in a few seconds, writing to a log file in /opt/rtcds/caltech/c1/target/fb/logs/

Temporary change:

I commented out the following line in the rc.local file on the fb machine in the /diskless/root/etc/ directory:

#nice --20 ./mx_stream -s "$SYSTEMS" -d fb:0 >& logs/$HOSTNAME.log&

This disables the automatic start-up of the mx_stream code on all the front ends, which will prevent the network from being brought to its knees by c1iscex while we debug the problem.

It also means on a reboot of the front ends, the mx_stream process needs to be started by hand until this change is reverted.

To do this, log into the front end and then change directory to /opt/rtcds/caltech/c1/target/fb

For c1sus, run:

./mx_stream -s c1x02 c1sus c1mcs c1rms c1rfm  -d fb:0

For c1ioo, run:

./mx_stream -s c1x03 c1ioo -d fb:0

 

  3926   Mon Nov 15 16:26:46 2010   josephb   Update   CDS   c1iscex is now running and the network hasn't died

Problem:

c1iscex was spamming the network with error messages.

Solution:

Updated the front end codes to current standards (they were on the order of months out of date).  After fixing them up and rebuilding the codes on c1iscex, it no longer had problems connecting to the frame builder.

Status:

I can look at test points for ETMX.  It is not currently damping, however.

To Do:

Move filters for ETMX into the correct files. 

Need to add a Binary output blue and gold box to the end rack, and plug it into the binary output card.  Confirm the binary output logic is correct for the OSEM whitening, coil dewhitening, and QPD whitening boards. 

Get ETMX damped.

Figure out what we're going to do with the aux crate which is currently running y-end code at the new x-end.  Koji suggested simply swapping the auxiliary crates - this may be the easiest option.  The other option would be to change the IP address, so that when it PXE boots it grabs the x-end code instead of the y-end code.

Current CDS status:

[Status table: MC damp | dataviewer | diaggui | AWG | c1ioo | c1sus | c1iscex | RFM | Sim.Plant | Frame builder | TDS - colored status cells not recoverable]
  3938   Wed Nov 17 10:39:20 2010   josephb   Update   CDS   Screen Time Fix

An improved Python script to apply a replacement to all *.adl files in a directory would be:

import re, os

files = os.listdir("./")
for file in files:
    if ".adl" in str(file):
        data = open(file).read()
        o = open(file, "w")
        o.write(re.sub("C0:TIM-PACIFIC_STRING", "C1:FEC-34_TIME_STRING", data))
        o.close()

Of course, this entire python script can be replaced with a single sed command:

sed -i 's/C0:TIM-PACIFIC_STRING/C1:FEC-34_TIME_STRING/g' *

A more complicated script could be written which looks for key identifiers either in the file header or inside the file to determine which front end is appropriate, using a dictionary like:

dcuid_dict = {"BS":21,"PRM":37,"SRM":37,"ITMX":21,"ITMY":21,"MC1":36,"MC2":36,"MC3":36,"ETMX":24,"ETMY":26}

and then using for loops and if statements.
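
A minimal sketch of that approach (hypothetical: it assumes the optic name appears in the .adl filename, e.g. C1SUS_ETMX.adl, and that only the DCUID differs between front ends):

import re, os

dcuid_dict = {"BS":21,"PRM":37,"SRM":37,"ITMX":21,"ITMY":21,"MC1":36,"MC2":36,"MC3":36,"ETMX":24,"ETMY":26}

for file in os.listdir("./"):
    if not file.endswith(".adl"):
        continue
    for optic, dcuid in dcuid_dict.items():
        if optic in file:
            # rewrite the time string to point at the front end serving this optic
            data = open(file).read()
            o = open(file, "w")
            o.write(re.sub("C0:TIM-PACIFIC_STRING",
                           "C1:FEC-%d_TIME_STRING" % dcuid, data))
            o.close()
            break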

 

  3940   Wed Nov 17 16:02:30 2010   josephb   Update   CDS   Modified feCodeGen.pl to fix filtMuxMatrix name generation

Problem:

Sometime in the last 3 weeks, probably when Alex brought his latest changes from Hanford to the 40m and did an SVN update, the code that generates the links to the filter .adl files for the overall matrix view broke.

Fix:

I modified the FE code generator to use $basename instead of the name produced by the top-name transform (which changes _ to - after the first 3 letters):

@@ -3520,11 +3522,11 @@
 
                  my $tn = top_name_transform($basename);
                  my $basename1 = $usite . ":" . $tn . "_";
-                 my $filtername1 = $usite . $tn;
+                 my $filtername1 = $usite . $basename;

Still having problems:

The filter modules built within the matrix of filter modules run (offsets/gains work), but will not load filter coefficients/filter names.  All the other filter modules outside the matrix seem to load fine.  At this point, doing a rebuild of any of the front end machines may cause the A2L filter banks to be unloadable.

 

  3945   Thu Nov 18 11:06:20 2010   josephb   Update   CDS   c1sus and ADCs

Problem:

ADCs are timing out on c1sus when we have more than 3.

Talked with Rolf:

Alex will be back tomorrow (he took yesterday and today off), so I talked with Rolf.

He said ordering shouldn't make a difference, and he's not sure why we would be having a problem.  However, when he loads a chassis, he tends to put all the ADCs on the same PCI bus (the backplane apparently contains several).  Slot 1 is its own bus, slots 2-9 should be one bus, and slots 10-17 should be another.

He also mentioned that when you use dmesg and see a line like "ADC TIMEOUT # ##### ######", the first number should be the ADC number, which is useful for determining which one is responding slowly.

Plan:

Disconnect c1sus IO chassis completely, pull it out, pull out all cards, check connectors, and repopulate with Rolf's suggestions and keeping this elog in mind.

In regards to the RFM, it looks like one of the fibers had been disconnected from the c1sus chassis RFM card (it's plugged in in the middle of the chassis, so it's hard to see) during all the plugging and unplugging of cables and cards last night.

  3947   Thu Nov 18 14:19:01 2010   josephb   Update   CDS   Swapped c1auxex and c1auxey codes

Problem:

We had not switched the c1aux crates when we renamed the arms, thus the watchdogs labeled ETMX were really watching ETMY and vice-versa.

Solution:

I used telnet to connect to c1auxey, and then c1auxex.

I used the bootChange command to change the IP address of c1auxey to 192.168.113.59 (c1auxex's IP) and to change its startup script.  Similarly, c1auxex was changed to c1auxey, and then both were rebooted.

 

c1auxey > bootChange

'.' = clear field;  '-' = go to previous field;  ^D = quit

boot device          : ei
processor number     : 0
host name            : linux1
file name            : /cvs/cds/vw/mv162-262-16M/vxWorks
inet on ethernet (e) : 192.168.113.60:ffffff00 192.168.113.59:ffffff00
inet on backplane (b):
host inet (h)        : 192.168.113.20
gateway inet (g)     :
user (u)             : controls
ftp password (pw) (blank = use rsh):
flags (f)            : 0x0
target name (tn)     : c1auxey c1auxex
startup script (s)   : /cvs/cds/caltech/target/c1auxey/startup.cmd /cvs/cds/caltech/target/c1auxex/startup.cmd
other (o)            :

value = 0 = 0x0

c1auxex > bootChange

'.' = clear field;  '-' = go to previous field;  ^D = quit

boot device          : ei
processor number     : 0
host name            : linux1
file name            : /cvs/cds/vw/mv162-262-16M/vxWorks
inet on ethernet (e) : 192.168.113.59:ffffff00 192.168.113.60:ffffff00
inet on backplane (b):
host inet (h)        : 192.168.113.20
gateway inet (g)     :
user (u)             : controls
ftp password (pw) (blank = use rsh):
flags (f)            : 0x0
target name (tn)     : c1auxex c1auxey
startup script (s)   : /cvs/cds/caltech/target/c1auxex/startup.cmd /cvs/cds/caltech/target/c1auxey/startup.cmd
other (o)            :

value = 0 = 0x0

  3954   Fri Nov 19 12:53:50 2010   josephb   Update   CDS   Testpoints on c1iscex now working

Problem:

c1iscex did not have test points working last night.

Solution:

The diag -i command indicated that :

awg 19 0 192.168.113.80 822095891 1 192.168.113.80

awg 45 0 192.168.113.80 822095917 1 192.168.113.80

The first number after "awg" should be the DCUID number.  The IP address 192.168.113.80 corresponds to c1iscex.  So we had awg and testpoints set up for DCUIDs 19 and 45 on c1iscex.  DCUID 19 is c1x01 (the IOP), but 45 was used for a test a while back.

It turns out that in the testpoint.par file, located in /cvs/cds/rtcds/caltech/c1/target/gds/param, there were two entries for c1scx, one with DCUID 24 and one with DCUID 45.  The model at the time was running with DCUID 24.

So I changed the model DCUID to 45, deleted the [C-node24] entry in the testpoint.par file, restarted the machine, and did a "telnet fb 8088" followed by "shutdown" to restart the frame builder.
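
For reference, entries in testpoint.par are stanzas keyed by DCUID, roughly of this form (field names from memory, so treat as approximate):

[C-node45]
hostname=c1iscex
system=c1scx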

  3962   Mon Nov 22 12:00:18 2010   josephb   Update   CDS   Updated Computer Restart Procedures for FB

I've updated the Computer Restart Procedures page in the wiki with the latest fb restart procedure.

To restart just the daqd (frame builder) process, do:

1) telnet fb 8088

2) shutdown

The init process will take care of the rest and restart daqd automatically.

 

Background:
Plan:
  - Check the wiring after the SOS Coil Driver Module and the circuit around SDSEN
  - Check whitening and dewhitening filters. We connected a binary output cable, but haven't checked them yet.
  - Make a script for step 2
  - Activate new DAQ channels for ETMX (what is the current new fresh up-to-date latest fb restart procedure?)

 

  3963   Mon Nov 22 13:16:52 2010   josephb   Summary   CDS   CDS Plan for the week

CDS Objectives for the Week:

Monday/Tuesday:

1) Investigate ETMX SD sensor problems

2) Fully check out the ETMX suspension and get that to a "green" state.

3) Look into cleaning up target directories (merge old target directory into the current target directory) and update all the slow machines for the new code location.

4) Clean up GDS apps directory (create link to opt/apps on all front end machines).

5) Get Rana his SENSOR, PERROR, etc channels.

Tuesday/Wednesday:

6) Install the LSC IO chassis and necessary cabling/fibers.

7) Get the LSC computer talking to its remote IO chassis.

Wednesday:

8) If there's time, connect and start debugging the Dolphin connection between the LSC and SUS machines.

 

  3964   Mon Nov 22 16:16:04 2010   josephb   Update   CDS   Did an SVN update on the CDS code

Problem:

The CDS oscillator part doesn't work inside subsystems.

Solution:

Rolf checked in an older version of the CDS oscillator which includes an input (which you just connect to a ground).  This makes the parser work properly so you can build with the oscillator in a subsystem.

So I did an SVN checkout and confirmed that the custom changes we have here were not overwritten.

Edit:

Turns out the latest SVN revision requires new locations for certain code, such as the EPICS installs.  I reverted back to revision 2160, which is just before the new EPICS and other rtapps directory locations, but late enough to pick up the temporary fix to the CDS oscillator part.

  3965   Mon Nov 22 17:48:11 2010   josephb   Update   CDS   c1iscex is not seeing its Binary Output card

Problem:

c1iscex does not even see its 32 channel Binary output card.  This means we have no control over the state of the analog whitening and dewhitening filters.  The ADC, DAC, and the 1616 Binary Input/Output cards are recognized and working.

Things tried:

Tried recreating the IOP code from the known working c1x02 (from the c1sus front end), but that didn't help.

Checked seating of the card, but it seems correctly socketed and tightened down nicely with a screw.

Tomorrow I will try moving cards around to see if there's an issue with the first slot, which the Binary Output card is in.

Current Status:

The ETMX is currently damping, including the POS, PIT, YAW, and SIDE degrees of freedom.  However, the GDS screen is showing a 0x2bad status for the c1scx front end (the IOP seems fine, with a 0x0 status), so for the moment I can't bring up c1scx testpoints.  I was able to do so earlier when I was testing the status of the binary outputs, so something broke during one of the rebuilds.  I may have to undo the SVN update and/or a change made by Alex today that allows filter bank names longer than 19 characters.

  3974   Tue Nov 23 10:53:20 2010   josephb   Update   CDS   timing issues

Problem:

Front ends seem to be experiencing a timing issue.  I can visibly see a difference in the GPS time ticks between models running on c1ioo and c1sus. 

In addition, the fb is reporting 0x2bad to all front ends.  The 0x2000 means a mismatch in config files, and the 0xbad indicates an out-of-sync problem between the front ends and the frame builder.

Plan:

As there are plans to work on the optic tables today and suspension damping is needed (the suspensions are still damping), we are holding off on attacking the problem until this afternoon/evening.  It does mean the RFM connections are not available.

At that point I'd like to do a reboot of the front ends and framebuilder and see if they come back up in sync or not.

  3975   Tue Nov 23 11:20:30 2010   josephb   Update   CDS   Cleaning up old target directory

Winter Cleaning:

I cleaned up the /cvs/cds/caltech/target/ directory of all the random models we had built over the last year, in preparation for the move of the old /cvs/cds/caltech/target slow control machine code into the new /opt/rtcds/caltech/c1/target directories.

I basically deleted all the directories generated by the RCG code that were put there, including things like c1tst, c1tstepics, c1x00, c1x00epics, and so forth.  Pre-RCG era code was left untouched.

  3978   Tue Nov 23 16:55:14 2010   josephb   Update   CDS   Updated apps

Updated Apps:

I created a new setup script for the newest build of the GDS tools (DTT, Foton, etc.), located in /opt/apps (which is a soft link from /cvs/cds/apps), called gds-env.csh.

This script is now sourced by cshrc.40m for 64-bit Linux machines.  In addition, the control room machines have a soft link in the /opt directory to the /cvs/cds/apps directory.
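
The hook in cshrc.40m amounts to something like this (a sketch; the actual test it uses to detect a 64-bit Linux machine may differ):

if ( `uname -m` == "x86_64" ) then
    source /opt/apps/gds-env.csh
endif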

So now when you type dtt or foton, it will bring up the CentOS-compiled code Alex copied over from Hanford last month.

  3994   Tue Nov 30 12:10:44 2010   josephb   Update   elog   Elog restarted again

The elog seemed to be down at around 12:05pm.  I waited a few minutes to see if the browser would connect, but it did not.

I used the existing script in /cvs/cds/caltech/elog/ (as opposed to Zach's new one in elog/elog-2.8.0/), which also seems to have worked fine.

  3995   Tue Nov 30 12:25:08 2010   josephb   Update   CDS   LSC computer to chassis cable dead

Problem:

We seem to have a broken fiber link for use between the LSC and its IO chassis.  It is unclear to me when this damage occurred.  The cable had been sitting in a box with styrofoam padding, and the kink is in the middle of the fiber, with no other obvious damage nearby.  The cable may, however, have previously been used by the people in Downs for testing, and possibly been damaged then.  Or we caused the kink while stringing it.

Tried Solutions:

I talked to Alex yesterday, and he suggested completely unplugging the power on both the computer and the IO chassis, then plugging in the new fiber connector, as he had to do that once with a fiber connection at Hanford.  We tried this this morning; however, still no joy.  At this point I plan to toss the fiber, as I don't know of any way to rehabilitate kinked fibers.

Note this means I rebooted c1sus and then did a burt restore from the Nov/30/07:07 directory for c1susepics, c1rmsepics, and c1mcsepics.  It looks like all the filters switched on.

Current Plan:

We do, however, have a Dolphin fiber which was originally intended to go between the LSC and its IO chassis, before Rolf was told it doesn't work well that way.  Instead, we were going to connect the LSC machine to the rest of the network via Dolphin.

We can put the LSC machine next to its chassis in the LSC rack, and connect it to the rest of the front ends with the Dolphin fiber.  In that case we just need the usual copper-style cable going between the chassis and the computer.

 

  3999   Tue Nov 30 16:02:18 2010   josephb   Update   CDS   status

Issues:

1) It turns out the /opt/rtcds/caltech/c1/target/gds/param/testpoint.par file had been emptied or deleted at some point, and the only entry left in it was c1pem.  This had been causing our lack of test points for the last few days.  It is unclear when or how this happened.  The file has been fixed to include all the front end models again.  (Fixed)

2) Alex and I worked on tracking down why there's a GPS time difference between the front ends and the frame builder, which is why we see a 0x4000 error on all the front end GDS screens.  This involved several rebuilds of the front end codes and reboots of the machines involved.  (Broken)

3) Still working on understanding the RFM communication problem, which I think is related to the timing issues we're seeing.  I know the data is being transferred on the card, but it seems to be rejected after being read in, suggesting a time stamp mismatch.  (Broken)

4) The c1iscex binary output card still doesn't work.  (Broken)

Plan:

Alex and I will be working on the above issues tomorrow morning.

Status:

Currently, the c1ioo, c1sus, and c1iscex computers are running with their front ends.  They all still have the 0x4000 error.  However, you can still look at channels in dataviewer, for example.  There is, though, a possibility of inconsistent timing between computers (although all models on a single computer will be in sync).

All the front ends were burt restored to 07:07 this morning.  I spot-checked several optic filter banks and they look to have been turned on.

  4009   Fri Dec 3 15:37:10 2010   josephb   Update   CDS   fb, front ends fixed - tested RFM between c1ioo and c1iscex

Problem:

The front ends and fb computers were unresponsive this morning.

This was due to the fb machine having its ethernet cable plugged into the wrong input.   It should be plugged into the port labeled 0.

Since all the front end machines mount their root partition from fb, this caused them to also hang.

Solution:

The cable has been relabeled "fb" on both ends and plugged into the correct jack.  All the front ends were rebooted.

 

Testing RFM for green locking:

I tested the RFM connection between c1ioo and c1scx.  Unfortunately, on the first test, it turned out the c1ioo machine had its GPS time off by 1 second compared to c1sus and c1iscex.  A second reboot seems to have fixed the issue.

However, it bothers me that the code didn't come up with the correct time on the first boot.

The test was done using the c1gcv model and by modifying the c1scx model.  At the moment, the MC_L channel is being passed to the MC_L input of the ETMX suspension.  In the final configuration, this will be a properly shaped error signal from the green locking.

The MC_L signal is currently not actually driving the optic, as the ETMX POS MATRIX currently has a 0 for the MC_L component.

  4014   Mon Dec 6 11:59:41 2010   josephb   Update   CDS   New c1lsc computer moved to lsc rack

Computer moved:

The c1lsc computer has been moved over to the 1Y3 rack, just above the c1lsc IO chassis. 

It will talking to the c1sus computer via a Dolphin PCIe reflected memory card.  The cards have been installed into c1lsc and c1sus this morning.

It will talk to its IO chassis via the usual short IO chassis cable.

 

To Do:

The Dolphin fiber still needs to be strung between c1sus and c1lsc.

The DAQ cable between c1lsc and the DAQ router (which lets the frame builder talk directly with the front ends) also needs to be strung.

c1lsc needs to be configured to use fb as a boot server, and the fb needs to be configured to handle the c1lsc machine.

  4015   Mon Dec 6 16:49:43 2010   josephb   Update   CDS   c1lsc halfway to working

C1LSC Status:

The c1lsc computer is running Gentoo off of the fb server.  It has been connected to the DAQ network and is handling mx_streams properly (so we're not flooding the network with error messages like we used to with c1iscex).  It is using the old c1lsc IP address (192.168.113.62) and can be ssh'd into.

However, it is not talking properly to the IO chassis.  The IO chassis turns on when the computer turns on, but the host interface board in the IO chassis has only 2 red lights on (as opposed to the many green lights on the host interface boards in the c1sus, c1ioo, and c1iscex IO chassis).  The c1lsc IOP (called c1x04) doesn't see any ADC, DAC, or binary cards.  The timing slave is receiving 1PPS and is locked to it, but because the chassis isn't communicating, c1x04 is running off the computer's internal clock, causing it to be several seconds off.

Need to investigate why the computer and chassis are not talking to each other.

General Status:

The c1sus and c1ioo computers are not talking properly to the frame builder.  A reboot of c1iscex fixed the same problem earlier; however, as Kiwamu and Suresh are working in the vacuum, I'm leaving those computers alone for the moment.  A reboot and burt restore should probably be done later today for c1sus and c1ioo.

 

Current CDS status:

[Status table: MC damp | dataviewer | diaggui | AWG | c1ioo | c1sus | c1iscex | RFM | Dolphin RFM | Sim.Plant | Frame builder | TDS - colored status cells not recoverable]
  4020   Tue Dec 7 16:09:53 2010   josephb   Update   CDS   c1iscex status

I swapped out the IO chassis which could only handle 3 PCIe cards for another chassis which has space for 17, but which previously had timing issues.  A new cable going between the timing slave and the rear board seems to have fixed the timing issues.

I'm hoping to get a replacement PCI extension board (one which can handle more than 3 cards) from Rolf this week, and then eventually put it in the Y-end rack.  I'm also still waiting for a repaired host interface board to come in for that as well.

At this point, RFM is working to c1iscex, but I'm still debugging the binary outputs to the analog filters.  As of this time they are not working properly: turning the digital filters on and off seems to have no effect on the transfer function measured from an excitation in SUSPOS all the way around to IN1 of the sensor inputs (but before the digital filters).  Ideally I should see a difference when I switch the digital filters on and off (since the analog ones should also switch on and off), but I do not.

  4025   Wed Dec 8 12:26:56 2010   josephb   Update   CDS   megatron set up - as a test front end

[josephb, Osamu]

Megatron Setup:

To show Osamu how to set up a front end, as well as to provide a test computer for his use, we used the new megatron (a Sun Fire X4600 with 16 cores and 8 gigabytes of memory) as a front end without an IO chassis.

The steps we followed are in the wiki, here.

The new megatron's IP address is 192.168.113.209.  It is running the c1x99 front end code.

  4028   Wed Dec 8 14:51:09 2010   josephb   Update   CDS   c1pem now recording data

Problem:

c1pem model was reporting all zeros for all the PEM channels.

Solution:

Twofold.  On the software end, I added ADCs 0, 1, and 2 to the model.  ADC 3 was already present and is the actual ADC taking in PEM information.

Alex and Rolf noted a while back that there is a problem with the way the DACs and ADCs are numbered internally in the code: missing ADCs or DACs prior to the one you're actually using can cause problems.

At some point that should be fixed by the CDS crew, but for now, always include all ADCs and DACs up to and including the highest-numbered ADC/DAC you need to use in that model.

On the physical end, I checked the AA filter chassis and found the power was not plugged in.  I plugged it in.

Status:

We now have PEM channels being recorded by the FB, which should make Jenne happier.

  4029   Wed Dec 8 17:05:39 2010   josephb   Update   CDS   Put in dolphin fiber between c1sus and c1lsc

[josephb,Suresh]

We put in the fiber for use with the Dolphin reflected memory between c1sus and c1lsc (rack 1X4 to rack 1Y3).  I still need to set up the Dolphin hub in the 1X4 rack, but once that is done, we should be able to test the Dolphin memory tomorrow.

  4046   Mon Dec 13 17:18:47 2010   josephb   Update   CDS   Burt updates

Problem:

Autoburt wouldn't restore settings for front ends on reboot

What was done:

First I moved the burt directory over to the new directory structure.

This involved moving /cvs/cds/caltech/burt/ to /opt/rtcds/caltech/c1/burt.

Then I updated the burt.cron file in the new location, /opt/rtcds/caltech/c1/burt/autoburt/.  This pointed to the new autoburt.pl script.

I created an autoburt directory in the /opt/rtcds/caltech/c1/scripts directory and placed the autoburt.pl script there.

I modified the autoburt.pl script so that it points to the new snapshot location.  I also modified it to maintain a directory called "latest", located in /opt/rtcds/caltech/c1/burt/autoburt, which holds a set of soft links to the most recent autoburt backup.

Lastly, I edited the crontab on op340m (using crontab -e) to point to the new burt.cron file in the new location.

This was the easiest solution since the start script is just a simple bash script and I couldn't think of a quick and easy way to have it navigate the snapshots directory reliably.

I then modified the Makefile located in /opt/rtcds/caltech/c1/core/advLigoRTS/ which actually generates the start scripts, to point at the "latest" directory when doing restores.  Previously it had been pointing to /tmp/ which didn't really have anything in it.

So in the future, when building code, the start scripts should point to the correct snapshots.  Using sed, I also modified all the existing start scripts to point to the latest directory when grabbing snapshots.
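
That sed pass was a one-liner along these lines (a sketch: it assumes the start scripts sit together in the scripts directory and that /tmp/ was the only stale path in them):

cd /opt/rtcds/caltech/c1/scripts
sed -i 's|/tmp/|/opt/rtcds/caltech/c1/burt/autoburt/latest/|g' startc1*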

Future:

According to Keith's directory documentation (see T1000248), the burt restores should live in the individual target system directories, i.e. /target/c1sus/burt, /target/c1lsc/burt, etc.  This is a distinctly different paradigm from what we've been using in the autoburt script, and would require a fairly extensive rewrite of that script to handle properly.  For the moment I'm keeping the old style, with everything in one directory by date.  It would probably be worth discussing if and how to move over to the new system.

  4053   Tue Dec 14 11:24:35 2010   josephb   Update   CDS   burt restore

I had updated the individual start scripts, but forgotten to update the rc.local file on the front ends to handle burt restores on reboot.

I went to the fb machine and into /diskless/root/etc/ and modified the rc.local file there.

Basically in the loop over systems, I added the following line:

/opt/epics-3.14.9-linux/base/bin/linux-x86/burtwb -f /opt/rtcds/caltech/c1/burt/autoburt/latest/${i}epics.snap  -l /opt/rtcds/caltech/c1/burt/autoburt/logs/${i}epics.log.restore -v

The ${i} gets replaced with the system name in the loop (c1sus, c1mcs, c1rms, etc.).
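
In context, the loop looks roughly like this (a sketch; the other start-up commands inside the loop are omitted and the SYSTEMS variable is assumed to hold the per-host model list):

for i in $SYSTEMS; do
    # ... start the $i front end processes ...
    # then restore the latest autoburt snapshot for its EPICS settings
    /opt/epics-3.14.9-linux/base/bin/linux-x86/burtwb \
        -f /opt/rtcds/caltech/c1/burt/autoburt/latest/${i}epics.snap \
        -l /opt/rtcds/caltech/c1/burt/autoburt/logs/${i}epics.log.restore -v
done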

  4057   Wed Dec 15 13:36:44 2010   josephb   Update   CDS   ETMY IO chassis update

I gave Alex a sob story over lunch about having to go and try to resurrect dead VME crates.  He and Rolf then took pity on me and handed me their last host interface board from their test stand, although I was warned by Rolf that this one (the latest generation board from One Stop) seems to be flakier than previous versions, and may require reboots if it starts in a bad state.

Anyways, with this in hand I'm hoping to get c1iscey damping by tomorrow at the latest.

  4060   Wed Dec 15 17:21:20 2010   josephb   Update   CDS   ETMY controls status

Status:

The c1iscey was converted over to be a diskless Gentoo machine like the other front ends, following the instructions found here.  Its front end model, c1scy, was copied from the c1scx model and appropriately renamed, along with the filter banks.  A new IOP, c1x05, was created and assigned to c1iscey.

The c1iscey IO chassis had the small 4 PCI slot board removed and a large 17 PCI slot board put in.  It was repopulated with an ADC/DAC/BO and RFM card.  The host interface board from Rolf was also put in. 

On start up, the IOP process did not see or recognize any of the cards in the IO chassis.

Four reboots later, the IOP code had seen the ADC/DAC/BO/RFM cards once.  And on that reboot, there was a timeout on the ADC which caused the IOP code to exit.

In addition to not seeing the PCI cards most of the time, several cables still need to be put together for plugging into the adapter boards, and a box needs to be made for the DAC adapter electronics.

 

  4064   Thu Dec 16 10:52:42 2010   josephb   Update   Cameras   New PoE digital cameras

We have two new Basler acA640-100gm cameras.  These are power over ethernet (PoE) and very tiny.

Attachment 1: basler.jpg
  4082   Tue Dec 21 11:52:58 2010   josephb   Update   Computers   RGA scripts fixed, c0rga fixed

c0rga apparently had a full hard drive.  There was a 1 GB log file in the /var/log directory, called Xorg.0.log.old, which I deleted; this freed up about 20% of the hard drive.  That let me modify the crontab file (which previously had been complaining about having no room on disk to make edits).

I updated the crontab to look at the new scripts location, updated the RGA script itself to write to the new log location, and then created a soft link in the /opt directory to /cvs/cds/rtcds on c0rga.

The RGA script should now be running again once a day.

  4097   Fri Dec 24 09:01:33 2010   josephb   Update   CDS   Borrowed ADC

Osamu has borrowed an ADC card from the LSC IO chassis (which currently has a flaky generation 2 Host interface board).  He has used it to get his temporary Dell test stand running daqd successfully as of yesterday.

This is mostly a note to myself so I remember this in the new year, assuming Osamu hasn't replaced the evidence by January 7th.

  4132   Tue Jan 11 11:19:13 2011   josephb   Summary   CDS   Storing FE harddrives down Y arm

Lacking a better place, I've chosen the cabinet down the Y arm which had ethernet cables and various VME cards as a location to store some spare CDS computer equipment, such as hard drives.  I've added (or will add in 5 minutes) a label "FE COMPUTER HARD DRIVES" to this cabinet.

  4135   Tue Jan 11 14:05:11 2011   josephb   Update   Computers   Martian host table updated daily

I created two simple cron jobs, one running on linux1 and one running on nodus, to produce an updated copy of the martian host table linkable from the wiki every day.

The scripts live in /opt/rtcds/caltech/c1/scripts/AutoUpdate/.  One is called updateHostTable.cron and runs on linux1 every day at 4 am; the other is called moveHostTable.cron and runs on nodus every day at 5 am.
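
The crontab entries themselves are just (reconstructed from the schedule above, so treat the exact lines as approximate):

# on linux1
0 4 * * * /opt/rtcds/caltech/c1/scripts/AutoUpdate/updateHostTable.cron

# on nodus
0 5 * * * /opt/rtcds/caltech/c1/scripts/AutoUpdate/moveHostTable.cron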

The new link has been added to the Martian Host table wiki page  here.

 

  4136   Tue Jan 11 16:04:17 2011   josephb   Update   CDS   Script to update web views of models for all installed front ends

I wrote a new script in /opt/rtcds/caltech/c1/scripts/AutoUpdate/ called webview_simlink_update.m.

This m-file, when run in MATLAB, will go to the /opt/rtcds/caltech/c1/target directory and, for each c1 front end, generate the corresponding webview files for that system and place them in the AutoUpdate directory.

Afterwards the files can be moved on Nodus to the /users/public_html/FE/ directory with:

mv /opt/rtcds/caltech/c1/scripts/AutoUpdate/*slwebview* /users/public_html/FE/

This was run today, and the files can be viewed at:

https://nodus.ligo.caltech.edu:30889/FE/

Long term, I'd like to figure out a way of automating this so the screens are updated without having to run it manually.  However, Simulink seems to stubbornly require an X window to work.

  4144   Wed Jan 12 17:50:21 2011   josephb   Update   CDS   Worked on c1lsc, MC2 screens

[josephb, osamu, kiwamu]

We worked over by the 1Y2 rack today, trying to debug why we didn't get any signal to the c1lsc ADC.

We turned off the power to the rack several times while examining cards, including the whitening filter board, AA board, and the REFL 33 demod board.  I will note that I briefly (and incorrectly) turned off power to the 1Y1 rack as well.

We noticed a small wire on the whitening filter board in the channel 5 path.  Rana suggested this was part of a fix for channels 4 and 5 having too much cross talk: a trace was cut and this jumper added to fix that particular problem.

We confirmed we could pass signals through each individual channel on the AA and whitening filter boards.  When we put them back in, we noticed a large offset when the inputs were not terminated.  After terminating all inputs, the values at the ADC were reasonable, measuring from 0 to about -20 counts.  We applied a 1 Hz, 0.1 Vpp signal and confirmed we saw the digital controls respond with the correct sine wave.

We examined the REFL 33 demod board and confirmed it would work for demodulating 11 MHz, although without tuning, the I and Q phases will not be exactly 90 degrees apart.

The REFL 33 I and Q outputs have been connected to the whitening board's inputs 1 and 2, respectively.  Once Kiwamu adds appropriate LO and PD signals to the REFL 33 demod board, he should be able to see the resulting I and Q signals digitally on the PD1 I and Q channels.

 

In an unrelated fix, we examined the suspension screens, specifically the dewhitening lights.  It turns out the lights were still looking at SW2 bit 7 instead of SW2 bit 5.  The actual front end models were using the correct bit (21, which corresponds to the 9th filter bank), so this was purely a display issue.  Tomorrow I'll take a look at the binary outputs and see why the analog filters aren't actually changing.


  4150   Thu Jan 13 14:21:13 2011   josephb   Update   CDS   Webview of front end model files automated

After Rana pointed me to Yoichi's MEDM snapshot script, I learned how to use Xvfb, which is what Yoichi used to write screens without a real display.  With this I wrote a new cron script, which I added to Mafalda's crontab to be run once a day at 6 am.

The script is called webview_update.cron and is in /opt/rtcds/caltech/c1/scripts/AutoUpdate/.

#!/bin/bash
DISPLAY=:6
export DISPLAY

# Check if an Xvfb server is already running on this display
pid=`ps -eaf | grep vfb | grep $DISPLAY | awk '{print $2}'`
if [ -n "$pid" ]; then
        echo "Xvfb already running [pid=${pid}]" >/dev/null
else
        # Start Xvfb and record its pid ($! is only valid right after
        # launching the background process, so it is captured in this branch)
        echo "Starting Xvfb on $DISPLAY"
        Xvfb $DISPLAY -screen 0 1600x1200x24 >&/dev/null &
        pid=$!
        echo $pid > /opt/rtcds/caltech/c1/scripts/AutoUpdate/Xvfb.pid
fi
sleep 3

# Run the matlab process which regenerates the webview screens
/cvs/cds/caltech/apps/linux/matlab/bin/matlab -display :6 -logfile /opt/rtcds/caltech/c1/scripts/AutoUpdate/webview.log -r webview_simlink_update

  4151   Thu Jan 13 16:34:02 2011   josephb   Update   Computers   32 bit matlab updated

There was a problem with running the webview report generator in MATLAB on Mafalda: it complained of not having a spare report generator license, even though the report generator was working before and after on other machines such as Rosalba.  So I moved the old 32-bit MATLAB directory from /cvs/cds/caltech/apps/Linux/matlab to /cvs/cds/caltech/apps/Linux/matlab_old and installed the latest R2010b MATLAB from IMSS in /cvs/cds/caltech/apps/Linux/matlab.  This seems to have made the cron job work on Mafalda.

  4152   Thu Jan 13 16:41:07 2011   josephb   Update   CDS   Channel names for LSC updated

I renamed most of the filter banks in the c1lsc model.  The input filters are now labeled based on the RF photodiode's name plus I or Q.  The last set of filters, in the OM (output matrix) subsystem, have had the TO removed and are now sensibly named ETMX, ETMY, etc.

We also removed the redundant filter banks between LSCMTRX and LSC_OM_MTRX.  There is now only one set: the DARM, CARM, etc. ones.

The webview of the LSC model can be found here.

  4157   Fri Jan 14 17:13:39 2011   josephb   Update   Cameras   Pylon driver for Basler Cameras installed on Megatron

After getting some help from the Basler technical support, I was directed to the following ftp link:

ftp://Pylon4Linux-ro:h50UZgkl@ftp.baslerweb.com

I went to the pylon 2.1.0 directory and downloaded the pylon-2.1.0-1748-bininst-64.tar.bz2 file.  Inside this tar file was another one called pylon-bininst-64.tar.bz2 (along with some sample programs).  I ran tar -jxf on pylon-bininst-64.tar.bz2 and placed the results in the /opt/pylon directory.  It produced a directory of includes, libraries, and binaries there.

After playing around with the makefiles for several of the sample programs they provided, I have finally been able to compile them.  In several places I had to point the makefiles at /opt/pylon/lib64 rather than /opt/pylon/lib.  I'll be testing the camera with these programs on Monday.  I'd also like to see if this particular distribution will work on CentOS machines.  There are some comments in one of the INSTALL help files suggesting packages needed for an install on Fedora 9, which may mean it's possible to get this version working on the CentOS machines.

  4163   Mon Jan 17 15:31:50 2011   josephb   Update   Cameras   Test the Basler acA640-100gm camera

The Basler acA640-100gm is a power over ethernet camera.  It uses a power injector to supply power over an ethernet cable to the camera.  Once I got past some initial IP difficulties, the camera worked fine out of the box.

You need to set some environment variables first, so the code knows where its libraries are located.

setenv PYLON_ROOT /opt/pylon
setenv GENICAM_ROOT_V1_1 /opt/pylon
setenv GENICAM_CACHE /cvs/cds/caltech/users/josephb/xml_cache
setenv LD_LIBRARY_PATH /opt/pylon/lib64:$LD_LIBRARY_PATH

I then run the /opt/pylon/bin/PylonViewerApp

Notes on IP:

Initially, you need to set the computer connecting to the camera to an IP in the 169.254.0.XXX range.  I used 169.254.0.1 on megatron's eth1 ethernet connection.  I also set the MTU to 9000.

You can then run the IpConfigurator in /opt/pylon/bin/ to change the camera's IP as needed.

Attachment 1: PylonViewer.jpg
  4168   Wed Jan 19 10:31:24 2011   josephb   Update   elog   Elog restarted again

Elog wasn't responding at around 10 am this morning.  I killed the elogd process, then used the restart script.

  4175   Thu Jan 20 10:15:50 2011   josephb   Update   CDS   c1scy error

This is caused by an insufficient number of active DAQ channels in the C1SCY.ini file, located in /opt/rtcds/caltech/c1/chans/daq/.  A quick look (grep -v '#' C1SCY.ini) indicates there are no active channels.  Experience tells me you need at least 2 active channels.

Taking a look at the activateDAQ.py script in the daq directory, it looks like the C1SCY.ini file is included, but the loop over optics is missing ETMY.  This caused the file to be improperly updated when the activateDAQ.py script was run.  I have fixed the C1SCY.ini file (I ran a modified version of the activate script on just C1SCY.ini).
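
For reference, an activated channel stanza in these daq .ini files looks roughly like this (channel name and values are illustrative, from memory):

[C1:SUS-ETMY_SUSPOS_IN1_DAQ]
acquire=1
datarate=2048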

I have restarted the c1scy front end using the startc1scy script, and it is currently working.

Quote:
 Here are the error messages in dmesg on c1iscey:
[   39.429002] c1scy: Invalid num daq chans = 0
[   39.429002] c1scy: DAQ init failed -- exiting
 

 

  4179   Thu Jan 20 18:20:55 2011   josephb   Update   CDS   c1iscex computer and c1sus computer swapped

Since the 1U-sized computers don't have enough slots to hold the host interface board, RFM card, and Dolphin card, we had to move the 2U computer from the end station to the middle to replace c1sus.

We're hoping this will reduce the time associated with reads off the RFM card compared to when it sits in the IO chassis.  Previous experience on c1ioo shows this change provides about a factor of 2 improvement, with 8 microseconds per read dropping to 4 microseconds per read, per this elog.

So the Dolphin card was moved into the 2U chassis, as well as the RFM card.  I had to swap the PMC-to-PCI adapter on the RFM card, since the one originally on it required an external power connection, which the computer doesn't provide; I swapped it with one from a DAC card in the c1sus IO chassis.

But then I forgot to hit submit on this elog entry..............

  4183   Fri Jan 21 15:26:15 2011   josephb   Update   CDS   c1sus broken yesterday and now fixed

[Joe, Koji]
Yesterday's CDS swap of c1sus and c1iscex left the interferometer in a bad state due to several issues.

The first was the need to actually power down the IO chassis completely when switching computers (I eventually waited for a green LED to stop glowing and then plugged the power back in).  I also unplugged and replugged the interface cable between the IO chassis and the computer while powered down.  This let the computer actually see the IO chassis (previously the host interface card was glowing just red, with no green lights).

Second, the former c1iscex computer, now the new c1sus computer, has only 6 CPUs, not 8 like most of the other front ends.  Because it was running 6 models (c1sus, c1mcs, c1rms, c1rfm, c1pem, c1x02) and 1 CPU needed to be reserved for the operating system, 2 models were not actually running (the recycling mirrors and PEM).  This meant the recycling mirrors were left swinging uncontrolled.

To fix this I merged the c1rms model into the c1sus model.  The c1sus model now controls BS, ITMX, ITMY, PRM, and SRM.  I merged the filter files in the /chans/ directory and reactivated all the DAQ channels.  All references to c1rms were removed from the fb's master file in the /target/fb directory, and the fb was then restarted via "telnet fb 8088" and then "shutdown".

My final mistake was starting the work late in the day.

So the lesson for Joe is, don't start changes in the afternoon.

Koji has been helping me test the damping and confirm things are really running.  We were having some issues with some of the matrix values.  Unfortunately I had to add them by hand since the previous snapshots no longer work with the models.

  4194   Mon Jan 24 10:39:16 2011   josephb   HowTo   DAQ   DAQ Wiki Failure

Actually, both port 8087 and port 8088 work for talking to the frame builder.  Don't let the lack of a daqd prompt fool you.

 

Here's putting in the commands:

rosalba:~>telnet fb 8088
Trying 192.168.113.202...
Connected to fb.martian (192.168.113.202).
Escape character is '^]'.
shutdown
0000Connection closed by foreign host.

rosalba:~>date
Mon Jan 24 10:30:59 PST 2011

 

Then looking at the last 3 lines of restart.log in /opt/rtcds/caltech/c1/target/fb/

daqd_start Fri Jan 21 15:20:48 PST 2011

daqd_start Fri Jan 21 23:06:38 PST 2011

daqd_start Mon Jan 24 10:30:29 PST 2011

 

So clearly it's talking to the frame builder; it just doesn't have the right formatting for the prompt.  If you type "help" at the prompt, you still get all the frame builder commands listed and can try any of them.

However, I'll edit the DAQ wiki and indicate that 8087 should be used, because of the better formatting of the prompt.


Quote:
Apparently, 8087 is the right port. Various elog entries from Joe and Kiwamu say 8087 or 8088. Not sure what's going on here.

After figuring this out, I activated the C1:GCV-XARM_COARSE_OUT_DAQ and C1:GCV-XARM_FINE_OUT_DAQ and set both of them to be recorded at 2048 Hz. We are loading filters and setting gains into these filter modules such that the OUT signals will be calibrated into Hz (that's why we used the OUT instead of the IN1 as there was last night).

 

  4200   Tue Jan 25 15:20:38 2011   josephb   Update   CDS   Updated c1rfm model plus new naming convention for RFM/Dolphin

After sitting down for 5 minutes and thinking about it, I realized the names I had been using for internal RFM communication were pretty bad: looking at a model didn't tell you where an RFM connection was coming from or going to.  So to correct my previous mistakes, I'm instituting the following naming convention for reflected memory, PCIe reflected memory (Dolphin), and shared memory names.  These names don't actually get used anywhere but the models, and thus don't show up as channel names anywhere else; they are replaced by raw hex memory locations in the actual code through the use of the IPC file (/opt/rtcds/caltech/c1/chans/ipc/C1.ipc).  However, the convention will make the models easier to understand for anyone looking at them or modifying them.

 

The new naming convention for RFM and Dolphin channels is as follows.

SITE:Sending Model-Receiving Model_DESCRIPTION_HERE

The description should be unique to the data being transferred, and reused if it is the same data.  That way, when the data is transferred on to another model, it is easy to identify it as the same information.

The model name should be the .mdl file name, not the subsystem it is part of.  So SCX is used instead of SUS.  This makes it easier to track where data is going.

In the unlikely case of multiple receiving models, the name should be of the form SITE:Sending Model-Receiving Model 1-Receiving Model 2_DESCRIPTION_HERE.  Separate models with dashes and description words with underscores.

Example:

C1:LSC-RFM_ETMX_LSC

This channel goes from the LSC model (on c1lsc) to the RFM model (on c1sus).  It transfers ETMX LSC position feedback.  The second LSC may seem redundant until we look at the next channel in the chain.

C1:RFM-SCX_ETMX_LSC

This channel goes from the RFM model to the SCX model (on c1iscex). It contains the same information as the first channel, i.e. ETMX LSC position feedback.

 

I have updated all the models that had RFM and SHMEM connections, as well as adding all the LSC communication connections to c1rfm.  This includes c1sus, c1rfm, c1mcs, c1ioo, c1gcv, c1lsc, c1scx, and c1scy.  I have not yet built all the models, since I didn't finish the updates until this afternoon.  I will build and test the code tomorrow morning.


  4206   Wed Jan 26 10:58:48 2011   josephb   Update   CDS   Front End multiple crash

Looking at dmesg on c1lsc, it looks like the model is starting, but then eventually times out due to a long ADC wait. 

[  114.778001] c1lsc: cycle 45 time 23368; adcWait 14; write1 0; write2 0; longest write2 0
[  114.779001] c1lsc: ADC TIMEOUT 0 1717 53 181

I'm not sure what caused the timeout, although there were about 20 messages indicating a failed time stamp read from c1sus (it sends TRX information to c1lsc via the Dolphin connection) before the timeout.

Not seeing any other obvious error messages, I killed the dead c1lsc model by typing:

sudo rmmod c1lscfe

I then tried starting just the front end model again by going to the /opt/rtcds/caltech/c1/target/c1lsc/bin/ directory and typing:

sudo insmod c1lscfe.ko

This started up just the FE again (I didn't use the restart script because the EPICS processes were running fine, since we had non-white channels).  At the moment c1lsc is running, and I see green lights and 0x0 for the FB0 status on the C1LSC_GDS_TP screen.

At this point I'm not sure what caused the timeout.  I'll be adding some more troubleshooting steps to the wiki, though.  Also, c1scx and c1scy are probably in need of a restart to get them properly synced to the framebuilder.

I did a quick test on dataviewer and can see LSC channels such as C1:LSC-TRX_IN1, as well other channels on C1SUS such as BS sensors channels.

Quote:

STATUS:

  • Rebooted c1lsc and c1sus. Restarted fb many times.
  • c1sus seems working.
  • All of the suspensions are damped / Xarm is locked by the green
  • Thermal control for the green is working
  • c1lsc is frozen
  • FB status: c1lsc 0x4000, c1scx/c1scy 0x2bad
  • dataviewer not working 

 
