40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 92 of 344  Not logged in ELOG logo
ID Date Authordown Type Category Subject
  3323   Thu Jul 29 17:12:48 2010 josephb, kiwamuUpdateCDSRe:Working out ADC/DAC/BO wiring

We have installed  4 BO boards, 3 DAC boards and 1 ADC board for new C1SUS.

They are on the 1Y5 rack. 

 

 DSC_2302.png

  3638   Fri Oct 1 18:19:24 2010 josephb, kiwamuUpdateCDSc1sus work

The c1sus model was split into 2, so that c1sus controls BS, PRM, SRM, ITMX, ITMY, while c1mcs controls MC1, MC2, MC3.  The c1mcs uses shared memory to tell c1sus what signals to the binary outputs (which control analog whitening/dewhitening filters), since two models can't control a binary output.

This split was done because the CPU time was running above 60 microseconds (the limit allowable since we're trying to run at 16kHz). Apparently the work Alex had done getting testpoints working had put a greater load on the cpu and pushed it over an acceptable maximum.    After removing the MC optics controls, the CPU time dropped to about 47 microseconds from about 67 microseconds.  The c1mcs is taking about 20 microseconds per cycle.

The new model is using the top_names functionality to still call the channels C1SUS-XXX_YYY.  However, the directory to find the actual medm filter modules is /opt/rtcds/caltech/c1/medm/c1mcs, and the gds testpoint screen for that model is called C1MCS-GDS_TP.adl.  I'm currently in the process of updating the medm screens to point to the correct location.

Also, while plugging in the cables from the coil dewhitening boards, we realized I (Joe) had made a mistake in the assignment of channels to the binary output boards.  I need to re-examine Jay's old drawings and fix the simulink model binary outputs.

  3639   Fri Oct 1 18:53:33 2010 josephb, kiwamuUpdateCDSThings needing to be done next week

We realized we cannot build code with the current RCG compiler on c1ioo or c1iscex, since these are not Gentoo machines.  We need either to get a backwards compatible code generator, or change the boot priority (removing the harddrives also probably works) for c1ioo and c1iscex so they do the diskless Gentoo thing.  This would involve adding some MAC address to the framebuilder dhcpd.conf file in /etc/dhcp along with the computer IPs, and then modifying the /diskless/root/etc/rtsystab with the right machine names and models to start.

I also need to bring some of the older, neglected models up to current build standards. I.e. use cdsIPCx_RFM instead of cdsIPCx and so forth. 

Need to fix the binary outputs for c1sus/c1mcs.  Need to actually get the RFM running, since Kiwamu was having some issues with his green RFM test model.  We have the latest checkout from Rolf, but we have no proof that it actually works.

  4012   Mon Dec 6 11:53:20 2010 josephb, kiwamuSummaryall down cond.power outage

The monitors for allegra and rossa's seemed to be in a weird state after the power outage.  I turned allegra and rossa on, but didn't see anything.  However, I was after awhile able to ssh in.  Power cycling the monitors did apparently got them talking with the computers again and displaying.

I had to power cycle the c1sus and c1iscex machines (they probably booted faster than linux1 and the fb machines, and thus didn't see their root and /cvs/cds directories).  All the front ends seem to be working normally and we have damped optics.

The slow crates look to be working, such as c1psl, c1iool0, c1auxex and so forth.

Kiwamu turned the main laser back on.

Quote:

Looks like there was a power outage.

 

  4027   Wed Dec 8 14:46:19 2010 josephb, kiwamuUpdateCDSWhy the ETMX daq channels were not recorded last night

When adding the ETMX DAQ channels using the daqconfig gui (located in /opt/rtcds/caltech/c1/scripts/) on C1SCX.ini, we forgot to set the acquire flag to 1 from 0.

So the frame builder was receiving the data, but not recording it.

We have since then added ETMX and the C1SCX.ini file to Yuta's useful "activateDAQ.py" script in /opt/rtcds/caltech/c1/chans/daq/, so that it now sets the sensor and SUSPOS like channels to be acquired at 2k when run.  You still need to restart the frame builder (telnet fb 8087 and then shutdown) for these changes to take effect.

The script now also properly handles files which already have had channels activated, but not acquired.

  4065   Thu Dec 16 15:10:18 2010 josephb, kiwamuUpdateCDSETMY working at the expense of ETMX

I acquired a second full pair of Host interface board cards (one for the computer and one for the chassis) from Rolf (again, 2nd generation - the bad kind).

However, they exhibited the same symptoms as the first one that I was given. 

Rolf gave a few more suggestions on getting it to work.  Pull the power plugs.  If its got slow flashing green lights, just soft cycle, don't power cycle.  Alex suggested turning the IO chassis on before the computer.

None of it seemed to help in getting the computer talking to the IO chassis.

 

I finally decided to simply take the ETMX IO chassis and place it at the Y end.  So for the moment, ETMY is working, while ETMX is temporarily out of commission. 

We also made the necessary cables (2x 37 d-sub female to 40 pin female and 40 pin female to 40 pin female) .  Kiwamu also did nice work on creating a DAC adapter box, since Jay had given me a spare board, but nothing to put it in.

  4134   Tue Jan 11 13:32:52 2011 josephb, kiwamuUpdateCDSUpdated some DAQ channel names

[Joe, Kiwamu]

We modified the activateDAQ.py script which lives in /opt/rtcds/caltech/c1/chans/daq/ and updates the C1SUS.ini, C1MCS.ini, C1RMS.ini, C1SCX.ini and C1SCY.ini files.  These files contain the DAQ channels for all the optics.

It has been modified so that channels like C1:SUS-ITMX_ULSEN_OUT_DAQ become C1:SUS-ITMX_SENSOR_UL.  Similarly the oplev signals go from C1:SUS-ITMX_OLPIT_OUT to C1:SUS-ITMX_OPLEV_PERROR.

After some debugging, we ran the script successfully and checked the output was correct.  We then restarted the frame builder (telnet fb 8088 and then shutdown) and also hit the DAQ reload button for all the front ends.

I tested in dataviewer that I could go back several years as well as going back just 1 hour in the history and see data for C1:SUS-ITMX_SENSOR_LL as well as C1:SUS-ITMX_OPLEV_YERROR.  I also tested realtime is also working for these channels.

 

The contents of the script are below.

 

inputfiles=["C1SUS.ini","C1RMS.ini","C1MCS.ini","C1SCX.ini","C1SCY.ini"]
prefix="[C1:SUS-"
optics=["BS_","ITMX_","ITMY_","PRM_","SRM_","MC1_","MC1_","MC2_","MC3_","ETMX_"]
#channels=["SUSPOS_IN1","SUSPIT_IN1","SUSYAW_IN1","SUSSIDE_IN1","ULSEN_OUT","URSEN_OUT","LRSEN_OUT","LLSEN_OUT","SDSEN_OUT","OL_SUM_IN1","OLPIT_IN1","OLYAW_IN1"]
channels_dict = {'SUSPOS_IN1':'SUSPOS_IN1_DAQ',
'SUSPIT_IN1':'SUSPIT_IN1_DAQ',
'SUSYAW_IN1':'SUSYAW_IN1_DAQ',
'SUSSIDE_IN1':'SUSSIDE_IN1_DAQ',
'ULSEN_OUT':'SENSOR_UL',
'URSEN_OUT':'SENSOR_UR',
'LRSEN_OUT':'SENSOR_LR',
'LLSEN_OUT':'SENSOR_LL',
'SDSEN_OUT':'SENSOR_SIDE',
'OLPIT_OUT':'OPLEV_PERROR',
'OLYAW_OUT':'OPLEV_YERROR',
'OL_SUM_OUT':'OPLEV_SUM'}

suffix="_DAQ]\n"

## set datarate
datarate=2048

## read the ini files
for inputfile in inputfiles:
    print inputfile
    outputfile=inputfile
    ifile = open(inputfile,'r')
    lines = ifile.readlines()
    ifile.close()

    for k in range(len(lines)):
        for op in optics:
            for ch in channels_dict:
                if (prefix+op+ch+suffix) in lines[k]:
                    lines[k]=prefix + op + channels_dict[ch] + "]\n"
                    lines[k+1]=lines[k+1].lstrip("#").rstrip(lines[k+1].split("=")[1])+"1\n"
                    lines[k+2]=lines[k+2].lstrip("#")
                    lines[k+3]=lines[k+3].lstrip("#").rstrip(lines[k+3].split("=")[1])+str(datarate)+"\n"
                    lines[k+4]=lines[k+4].lstrip("#")
    ofile = open(outputfile,'w')
    for k in range(len(lines)):
        ofile.write(lines[k])
        #print lines[k]
    ofile.close()

  2067   Thu Oct 8 11:10:50 2009 josephb, jenneUpdateComputersEPICs Computer troubles

At around 9:45 the RFM/FB network alarm went off, and I found c1asc, c1lsc, and c1iovme not responding. 

I went out to hard restart them, and also c1susvme1 and c1susvme2 after Jenne suggested that.

c1lsc seemed to have a promising come back initially, but not really.  I was able to ssh in and run the start command.  The green light under c1asc on the RFMNETWORK status page lit, but the reset and CPU usage information is still white, as if its not connected.   If I try to load an LSC channel, say like PD5_DC monitor, as a testpoint in DTT it works fine, but the 16 Hz monitor version for EPICs is dead.  The fact that we were able to ssh into it means the network is working at least somewhat.

I had to reboot c1asc multiple times (3 times total), waiting a full minute on the last power cycle, before being able to telnet in.  Once I was able to get in, I restarted the startup.cmd, which did set the DAQ-STATUS to green for c1asc, but its having the same lack of communication as c1lsc with EPICs.

c1iovme was rebooted, was able to telnet in, and started the startup.cmd.  The status light went green, but still no epics updates.

The crate containing c1susvme1 and c1susvme2 was power cycled.  We were able to ssh into c1susvme1 and restart it, and it came back fully.  Status light, cpu load and channels working.  However I c1susvme2 was still having problems, so I power cycled the crate again.  This time c1susvme2 came back, status light lit green, and its channels started updating.

At this point, lacking any better ideas, I'm going to do a full reboot, cycling c1dcuepics and proceeding through the restart procedures.

  4510   Mon Apr 11 14:17:22 2011 josephb, jamieUpdateCDSFrame wiper script installed

[Joe, Jamie, Alex]

Fixes:

I asked Alex which cron to use (dcron? frcron?).  He promptly did the following:

emerge dcron

rc-update add dcron default

Copied the wiper.pl script from LLO to /opt/rtcds/caltech/c1/target/fb/

At that point, I modified wiper.pl script to reduce to 95% instead of 99.7%.

I added controls to the cron group on fb:

sudo gpasswd -a controls cron

I then added the wiper.pl to the crontab as the following line using crontab -e.

0 6 * * *       /opt/rtcds/caltech/c1/target/fb/wiper.pl --delete &> /opt/rtcds/caltech/c1/target/fb/wiper.log

Notes:

Note, placing backups on the /frames raid array will break this script, because it compares the amount in the /frames/full/, /frames/trends/minutes, and /frames/trends/seconds to the total capacity. 

Apparently, we had backups from September 27th, 2010 and March 22nd, 2011.  These would have broken the script in any case. 

We are currently removing these backups, as they are redundant data, and we have rsync'd backups of the frames and trends.  We should now have approximately twice the lookback of full frames.

  4583   Thu Apr 28 16:12:19 2011 josephb, jamieUpdateCDSNew CDS model SVN

New SVN

We are now using the LIGO CDS SVN for storing our control models.

The SVN is at:

https://redoubt.ligo-wa.caltech.edu/websvn/

The models are under cds_user_apps, then trunk, then approriate subsystem (ISC for c1lsc for example), c1 (for caltech 40m), then models.

We have checked out /cds_user_apps to /opt/rtcds/.

So to find the c1lsc.mdl model, you would go to /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1lsc.mdl

This SVN is shared by many people LIGO, so please follow good SVN practice.  Remember to update models ("svn update") before doing commits.  Also, after making changes please do an update to the SVN so we have a record of the changes.

New Practices

We are creating soft links in the /opt/rtcds/caltech/c1/core/advLigoRTS/src/epics/simLink/ to the models that you need to build.  So if you want to add a new model, please add it to the cds_users_apps SVN in the correct place and create a soft link to the simLink directory.

lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1sus.mdl -> /opt/rtcds/cds_user_apps/trunk/SUS/c1/models/c1sus.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1sup.mdl -> /opt/rtcds/cds_user_apps/trunk/SUS/c1/models/c1sup.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1spy.mdl -> /opt/rtcds/cds_user_apps/trunk/SUS/c1/models/c1spy.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1spx.mdl -> /opt/rtcds/cds_user_apps/trunk/SUS/c1/models/c1spx.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1scy.mdl -> /opt/rtcds/cds_user_apps/trunk/SUS/c1/models/c1scy.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1scx.mdl -> /opt/rtcds/cds_user_apps/trunk/SUS/c1/models/c1scx.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1mcs.mdl -> /opt/rtcds/cds_user_apps/trunk/SUS/c1/models/c1mcs.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1x05.mdl -> /opt/rtcds/cds_user_apps/trunk/CDS/c1/models/c1x05.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1x04.mdl -> /opt/rtcds/cds_user_apps/trunk/CDS/c1/models/c1x04.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1x03.mdl -> /opt/rtcds/cds_user_apps/trunk/CDS/c1/models/c1x03.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1x02.mdl -> /opt/rtcds/cds_user_apps/trunk/CDS/c1/models/c1x02.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1x01.mdl -> /opt/rtcds/cds_user_apps/trunk/CDS/c1/models/c1x01.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1rfm.mdl -> /opt/rtcds/cds_user_apps/trunk/CDS/c1/models/c1rfm.mdl
lrwxrwxrwx 1 controls controls      55 Apr 28 14:41 c1dafi.mdl -> /opt/rtcds/cds_user_apps/trunk/CDS/c1/models/c1dafi.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1pem.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1pem.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1mcp.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1mcp.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1lsp.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1lsp.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1lsc.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1lsc.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1ioo.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1ioo.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1gpv.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1gpv.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1gfd.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1gfd.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1gcv.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1gcv.mdl
lrwxrwxrwx 1 controls controls      54 Apr 28 14:41 c1ass.mdl -> /opt/rtcds/cds_user_apps/trunk/ISC/c1/models/c1ass.mdl

  3251   Tue Jul 20 15:06:48 2010 josephb, bobOmnistructureGeneralNew speakers in ceiling in control room

After the crane training, Bob attached speakers to the ceiling right next to the projector, for use with presentations.

  3036   Wed Jun 2 17:34:33 2010 josephb, alex, valeraUpdateCDSCDS updates

From what I understand, Alex rewrote portions of the framebuilder and testpoint codes and then recompiled them in order to get more than 1 testpoint per front end working.   I've tested up to 5 testpoints at once so far, and it worked.

We also have a new noise component added to the RCG code.  This piece of code uses the random number generator from chapter 7.1 of Numerical Recipies Third Edition to generate uniform numbers from 0 to 1.  By placing a filter bank after it should give us sufficient flexibility in generating the necessary noise types.  We did a coherence test between two instances of this noise piece, and they looked pretty incoherent.  Valera will add a picture of it when it finishe 1000 averages to this elog.

I'm in the process of propagating the old suspension control filters to the new RCG filter banks to give us a starting point.  Tomorrow Valera and I are planning to choose a subset of the plant filters  and put them in, and then work out some initial control filters to correspond to the plant.  I also need to think about adding the anti-aliasing filters and whitening/dewhitening filters.

 

  3673   Thu Oct 7 17:19:55 2010 josephb, alex, rolfUpdateCDSc1sus status

As noted by Koji, Alex and Rolf stopped by.

We discussed the feasibility of getting multiple models using the same DAC.  We decided that we infact did need it. (I.e. 8 optics through 3 DACs does not divide nicely), and went about changing the controller.c file so as to gracefully handle that case.  Basically it now writes a 0 to the channel rather than repeating the last output if a particular model goes down that is sharing a DAC.

In a separate issue, we found that when skipping DACs  in a model (say using DACs 1 and 2 only) there was a miscommunication to the IOP, resulting in the wrong DACs getting the data.  the temporary solution is to have all DACs in each model, even if they are not used.  This will eventually be fixed in code.

At this point, we *seem* to be able to control and damp optics.  Look for a elog from Yuta confirming or denying this later tonight (or maybe tomorrow).

 

  3836   Mon Nov 1 14:53:30 2010 josephb, alex, rolfUpdateCDSAlex updated FB and mx_streams code

Problem:

The framebuilder was being flaky.  MX_streams would go down, prevent testpoints from working and so forth.

Solution:

Send Alex up North for a week to fix the code. 

Alex came back and installed updates to the frame builder and the mx_streams code (Myrinet Express over Generic Ethernet Hardware) used by the front ends to talk to the frame builder.  Instead of 1 stream per model, there's now just 1 per front end handling all communications.

Alex did an SVN update and we now have the latest CDS code.

Self restarting codes:

The frame builder code (daqd) and nds pipe have been added to the fb machine's inittab.  Specifically it calls a script called /opt/rtcds/caltech/c1/target/fb/start_daqd.inittab and /opt/rtcds/caltech/c1/target/fb/start_nds.inittab.

The addition to the /etc/inittab file on fb is:

daq:345:respawn:/opt/rtcds/caltech/c1/target/fb/start_daqd.inittab 

nds:345:respawn:/opt/rtcds/caltech/c1/target/fb/start_nds.inittab

When these codes die they should automatically restart.

Self starting codes at boot up:

The front ends now start the mx_stream script (which lives in /opt/rtcds/caltech/c1/target/fb/ directory) at boot up.  They call it with the approriate command line options for that front end. It can be found in the /etc/rc.local file.

They look like: mx_stream -s "c1x02 c1sus c1mcs c1rms" -d fb:0

As always, the front end codes to be started are defined in the /etc/rtsystab file (or on fb, in the /diskless/root/etc/rtsystab file).

However, if it does go down you would need to restart it manually, although it seems more robust now and doesn't seem to go down every time we restart the frame builder.

All the usual front end IOCs and modules should be started and loaded on boot up as well.

 

Current CDS status:

MC damp dataviewer diaggui AWG c1ioo c1sus c1iscex RFM Sim.Plant Frame builder  ...
                     ... 
  4004   Wed Dec 1 13:41:21 2010 josephb, alex, rolfUpdateCDSTiming is back

Problem:

We had timing problems across the front ends.

Solution:

Noticing that the 1PPS reference was not blinking on the Master Timing Distribution box.  It was supposed to be getting a signal from the c0dcu1 VME crate computer, but this was not happening.

We disconnected the timing signal going into c0dcu1, coming from c0daqctrl, and connected the 1PPS directly from c0daqctrl to the Ref In for the Master Timing distribution box (blue box with lots of fibers coming out of it in 1X5).

We now have agreement in timing between front ends.

After several reboots we now have working RFM again, along with computers who agree on the current GPS time along with the frame builder.

Status:

RFM is back and testpoints should be happy.

We still don't have a working binary output for the X end.  I may need to get a replacement backplane with more than 4 slots if the 1st slot of this board has the same problem as the large boards.

I have burt restored the c1ioo, c1mcs, c1rms, c1sus, and c1scx processes, and optics look to be damped.

 

  4129   Mon Jan 10 16:39:36 2011 josephb, alex, rolfUpdateCDSNew Time server for frame builder and 1PPS

Alex and Rolf came over today with a Tempus LX  GPS network timing server.  This has an IRIG-B output and a 1PPS output.  It can also be setup to act as an NTP server (although we did not set that up).

This was placed at waist height in the 1X7 rack.  We took the cable running to the presumably roof mounted antenna from the VME timing board and connected it to this new timing server.  We also moved the source of the 1PPS signal going to the master timer sequencer (big blue box in 1X7 with fibers going to all the front ends) to this new time server.  This system is currently working, although it took about 5 minutes to actually acquire a timing signal from the GPS satellites.  Alex says this system should be more stable, with no time jumps. 

I asked Rolf about the new timing system for the front ends, he had no idea when that hardware would be available to the 40m.

Currently, all the front ends and the frame builder agree on the time.  Front ends are running so the 1 PPS signal appears to be working as well.

  4130   Mon Jan 10 16:47:08 2011 josephb, alex, rolfUpdateCDSFixed c1lsc dolphin reflected memory

While Alex and Rolf were visiting, I pointed out that the Dolphin card was not sending any data, not even a time stamp, from the c1lsc machine.

After some poking around, we realized the IOP (input/output processor) was coming up before the Dolphin driver had even finished loading. 

We uncommented the line

#/etc/dolphin_wait

in the /diskless/root/etc/rc.local file on the frame builder.  This waits until the dolphin module is fully loaded, so it can hand off a correct pointer to the memory location that the Dolphin card reads and writes to.  Previously, the IOP had been receiving a bad pointer since the Dolphin driver had not finished loading.

So now the c1lsc machine can communicate with c1sus via Dolphin and from there the rest of the network via the traditional Ge Fanuc RFM.

  2627   Mon Feb 22 12:48:31 2010 josephb, alex, kojiUpdateComputersFE machines now coming up

Even after bringing up the fb40m, I was unable to get the front ends to come up, as they would error out with an RFM problem.

We proceeded to reboot everything I could get my hands on, although its likely it was daqawg and daqctrl which were the issue, as on the C0DAQ_DETAIL screen their status had been showing as 0xbad, but after the reboot showed up as 0x0.  They had originally come up before the frame builder had been fixed, so this might have been the culprit.  In the course of rebooting, I also found c1omc and c1lsc had been turned off as well, and turned them on.

After this set of reboots, we're now able to bring the front ends up one by one.

  1554   Thu May 7 12:21:36 2009 josephb, alexConfigurationComputersfb40m

Having determined that Rana (the computer) was having to many issues with testing the new Raid array due to age of the system, we proceeded to test on fb40m.

 

We brought it down and up several times between 11 and noon.  We  eventually were able to daisy chain the old raid and the new raid so that fb40m sees both.  At this time, the RAID arrays are still daisy chained, but the computer is setup to run on just the original raid, while the full 14 TB array is initialized (16 drives, 1 hot spare, RAID level 5 means 14 TB out of the 16 TB are actually available).  We expect this to take a few hours, at which point we will copy the data from the old RAID to the new RAID (which I also expect to take several hours).  In the meantime, operations should not be affected.  If it is, contact one of us.

 

 

  2215   Mon Nov 9 14:59:34 2009 josephb, alexUpdateComputersThe saga of Megatron continues

Apparently the random file system failure on megatron was unrelated to the RFM card (or at least unrelated to the physical card itself, its possible I did something while installing it, however unlikely).

We installed a new hard drive, with a duplicate copy of RTL and assorted code stolen from another computer.  We still need to get the host name and a variety of little details straightened out, but it boots and can talk to the internet.  For the moment though, megatron thinks its name is scipe11.

You still use ssh megatron.martian to log in though.

We installed the RFM card again, and saw the exact same error as before.  "NMI EVENT!" and "System halted due to fatal NMI".

Alex has hypothesized that the interface card the actual RFM card plugs into, and which provides the PCI-X connection might be the wrong type, so he has gone back to Wilson house to look for a new interface card.  If that doesn't work out, we'll need to acquire a new RFM card at some point

After removing the RFM card, megatron booted up fine, and had no file system errors.  So the previous failure was in fact coincidence.

 

  2225   Tue Nov 10 10:51:00 2009 josephb, alexUpdateComputersMegatron on, powercycled c1omc, and burt restored from 3am snapshot

Last night around 5pm or so, Alex had remotely logged in and made some fixes to megatron.

First, he changed the local name from scipe11 to megatron.  There were no changes to the network, this was a purely local change.  The name server running on Linux1 is what provides the name to IP conversions.  Scipe11 and Megatron both resolve to distinct IPs. Given c1auxex wasn't reported to have any problems (and I didn't see any problems with it yesterday), this was not a source of conflict.  Its possible that Megatron could get confused while in that state, but it would not have affected anything outside its box.

Just to be extra secure, I've switched megatron's personal router over from a DMZ setup to only forwarding port 22.  I have also disabled the dhcp server on the gateway router (131.215.113.2).

Second, he turned the mdp and mdc codes on.  This should not have conflicted with c1omc.

This morning I came in and turned megatron back on around 9:30 and began trying to replicate the problems from last night between c1omc and megatron.  I called Alex and we rebooted c1omc while megatron was on, but not running any code, and without any changes to the setup (routers, etc).  We were able to burt restore.  Then we turned the mdp, mdc and framebuilder codes on, and again rebooted c1omc, which appeared to burt restore as well (I restored from 3 am this morning, which looks reasonable to me). 

Finally, I made the changes mentioned above to the router setups in the hope that this will prevent future problems but without being able to replicate the issue I'm not sure.

  2266   Fri Nov 13 10:28:03 2009 josephb, alexUpdateComputersMegatron is back to its old self

I called Alex this morning and explained the problems with megatron.

Turns out when he had been setting up megatron, he thought a startup script file, rc.local was missing in the /etc directory.  So he created it.  However, the rc.local file in the /etc directory is normally just a link to the /etc/rc.d/rc.local file.  So on startup (basically when we rebooted the machine yesterday), it was running an incorrect startup script file.  The real rc.local includes line:

/usr/bin/setup_shmem.rtl mdp mdc&

Hence the errors we were getting with shm_open().  We changed the file into a soft link, and resourced the rc.local script and mdp started right up.  So we're back to where we were 2 nights ago (although we do have an RFM card in hand).

Update:  The tst module wouldn't start, but after talking to Alex again, it seems that I need to add the module tst to the /usr/bin/setup_shmem.rtl mdp mdc& line in order for it to have a shared memory location setup for it.  I have edited the file (/etc/rc.d/rc.local), adding tst at the end of the line.  On reboot and running starttst, the code actually loads, although for the moment, I'm still getting blank white blocks on the medm screens.

  2305   Fri Nov 20 11:01:58 2009 josephb, alexConfigurationComputersWhere to find RFM offsets

Alex checked out the old rts (which he is no longer sure how to compile) from CVS to megatron, to the directory:

/home/controls/cds/rts/

In /home/controls/cds/rts/src/include you can find the various h files used.  Similarly, /fe has the c files.

In the h files, you can work out the memory offset by noting the primary offset in iscNetDsc40m.h

A line like suscomms.pCoilDriver.extData[0] determines an offset to look for.

0x108000 (from suscomms )

Then pCoilDriver.extData[#] determines a further offset.

sizeof(extData[0]) = 8240  (for the 40m - you need to watch the ifdefs, we were looking at the wrong structure for awhile, which was much smaller).

DSC_CD_PPY is the structure you need to look in to find the final offset to add to get any particular channel you want to look at.

The number for ETMX is 8, ETMY 9 (this is in extData), so the extData offset from 0x108000 for ETMY should be 9 * 82400.  These numbers (i.e. 8 =ETMX, 9=ETMY) can be found in losLinux.c in /home/controls/cds/rts/src/fe/40m/.  There's a bunch of #ifdef and #endif which define ETMX, ETMY, RMBS, ITM, etc.  You're looking for the offset in those.

So for ETMY LSC channel (which is a double) you add 0x108000 (a hex number) + (9 * 82400 + 24) (not in hex, need to convert) to get the final value of 0x11a160 (in hex).

-----------

A useful program to interact with the RFM network can be found on fb40m.  If you log in and go to:

/usr/install/rfm2g_solaris/vmipci/sw-rfm2g-abc-005/util/diag

you can then run rfm2g_util, give it a 3, then type help.

You can use this to read data.  Just type help read.  We had played around with some offsets and various channels until we were sure we had the offsets right.  For example, we fixed an offset into the ETMY LSC input, and saw the corresponding memory location change to that value.  This utility may also be useful for when we do the RFM test to check the integrity of the ring, as there are some diagnostic options available inside it.

  2306   Fri Nov 20 11:14:22 2009 josephb, alexConfigurationComputerstest points working on megatron and we may have filters with switch outputs built in

Alex tooked at the channel definitions (can be seen in tpchn_C1.par), and noticed the rmid was 0. 

However, we had set in testpoint.par the tst system to C-node1 instead of C-node0.  The final number inf that and the rmid need to be equal.   We have changed this, and the test points appear to be working now.

However, the confusing part is in the tst model, the gds_node_id is set to 1.  Apparently, the model starts counting at 1, while the code starts counting at 0, so when you edit the testpoint.par file by hand, you have to subtract one from whatever you set in the model.

In other news, Alex pointed me at a CDS_PARTS.mdl, filters, "IIR FM with controls".  Its a light green module with 2 inputs and 2 outputs.  While the 2nd set of input and outputs look like they connect to ground, they should be iterpreted by the RCG to do the right thing (although Alex wasn't positive it works, it worth trying it and seeing if the 2nd output corresponds to a usable filter on/off switch to connect to the binary I/O to control analog DW.  However, I'm not sure it has the sophistication to wait for a zero crossing or anything like that - at the moment, it just looks like a simple on/off switch based on what filters are on/off.

  2487   Fri Jan 8 11:43:22 2010 josephb, alexUpdateComputersRFM and Megatron

Alex came over with a short RFM cable this morning.  We used it to connect the rfm card in c1iscey to the rfm card megatron

Alex renamed startup.cmd in /cvs/cds/caltech/target/c1iscey/ to startup.cmd.sav, so it doesn't come up automatically.  At the end we moved it back.

Alex used the vxworks command d to look at memory locations on c1iscey.  Such as d 0xf0000000, which is the start of the rfm code location.  So to look at 0x11a1c8 (lscPos) in the rfm memory, he typed "d 0xf011a1c8".  After doing some poking around, we look at the raw tst front end code (in /home/controls/cds/advLigo/src/fe/tst), and realized it was trying to read doubles.  The old rts code uses floats, so the code was reading incorrectly.

As a quick fix, we changed the code to floats for that part.  They looked like:

etmy_lsc = filterModuleD(dsp_ptr,dspCoeff,ETMY_LSC,cdsPciModules.pci_rfm[0]? *(\
(double *)(((void *)cdsPciModules.pci_rfm[0]) + 0x11a1c8)) : 0.0,0);

And we simply changed the double to float in each case.  In addition we changed the RCG scripts locally as well (if we do a update at some point, it'll get overwritten).  The file we updated was /home/controls/cds/advLigo/src/epics/util/lib/RfmIO.pm

Line 57 and Line 84 were changed, with double replaced with float.

return "cdsPciModules.pci_rfm[0]? *((float *)(((void *)cdsPciModules.pci
_rfm[$card_num]) + $rfmAddressString)) : 0.0";

. "  *((float *)(((char *)cdsPciModules.pci_rfm[$card_nu
m]) + $rfmAddressString)) = $::fromExp[0];\n"

This fixed our ability to read the RFM card, which now can read the LSC POS channel, for example.

Unfortunately, when we were putting everything the way it was with RFM fibers and so forth, the c1iscey started to get garbage (all the RFM memory locations were reading ffff).  We eventually removed the VME board, removed the RFM card, looked at it, put the RFM card back in a different slot on the board, and returned c1iscey to the rack.  After this it started working properly.  Its possible in all the plugging and unplugging that the card somehow had become loose.

The next step is to add all the channels that need to be read into the .mdl file, as well as testing and adding the channel which need to be written.

 

  2488   Fri Jan 8 15:40:14 2010 josephb, alexUpdateComputersRFM and RCG

Alex added a new module to the RCG, for generating RFMIO using floats.  This has been commited to CVS.

  2542   Fri Jan 22 12:33:37 2010 josephb, alexUpdateComputersModified CDS_PARTS for Binary output

Turns out the CDSO32 part (representing the Contec BO-32L-PE binary output) rquires two inputs. One for the first 16 bits, and one for the second set of 16 bits.  So Alex added another input to the part in the library.  Its still a bit strange, as it seems the In1 represents the second set of 16 bits, and the In2 represents the first set of 16 bits.

I added two sliders on the CustomAdls/C1TST_ETMY.adl control screen (upper left), along with a bit readout display, which shows the bitwise and of the two slider channels. For the moment, I still can't see any output voltage on any of the DO pins, no matter what output I set.

 

  2591   Thu Feb 11 18:33:54 2010 josephb, alexUpdateComputersStatus of the IP change over

A few machines have still not been changed over, including a few laptops, mafalda, ottavia, and c0rga.

All the front ends have been changed over.

fb40m died during a reboot and was replaced with a spare Sun blade 1000 that Larry had.  We had to swap in our old hard drive and memory.

All the front ends, belladonna, aldabella, and the control room machines have been switched over. Nodus was changed over after we realized we hosed the elog and svn by switching linux1's IP.

At this point, 90% of the machines seem to be working, although c0daqawg seems to be having some issues with its startup.cmd code.

  2603   Sat Feb 13 18:58:31 2010 josephb, alexUpdateComputersfb40m testpoints fixed

I received an e-mail from Alex indicating he found the testpoint problem and fixed it today:

Quote from Alex: "After we swapped the frame builder computer it has reconfigured all device files and I needed to create some symlinks on /dev/ to make tpman work again. I test the testpoints and they do work now."

 

  2983   Tue May 25 16:40:27 2010 josephb, alexUpdateCDSFinally tracked down why new models wouldn't talk to each other

The problem with the new models using the new shared memory/dolphin/RFM defined as names in a single .ipc file.

The first is the no_oversampling flag should not be used.  Since we have a single IO processor handling ADCs and DACs at 64k, while the models run at 16k, there is some oversampling occuring.  This was causing problems syncing between the models and the IOP.

It also didn't help I had a typo in two channels which I happened to use as a test case to confirm they were talking.  However, that has been fixed.

  3055   Tue Jun 8 15:58:25 2010 josephb, alexUpdateCDSNew multi-filter matrix part added to RCG (at the 40m at least)

A new webview of the LSP model is available at:

https://nodus.ligo.caltech.edu:30889/FE/lsp_slwebview_files/

This model include a couple example noise generators as well as the new Matrix of Filter banks (5 inputs x 15 outputs = 75 Filters!).  The attached png shows where these parts can be found in the CDS_PARTS library.  I'm still working on the automatic generation of the matrix and filter bank medm screens for this part.  The plan is to have a matrix screen similar to current ones, except that the value entry points to the gain setting of the associated filter.  In addition, underneath each value, there will be a link to the full filter bank screen.  Ideally, I'd like to have the filter adl files located in a sub-directory of the system, to keep clutter down.

I've cut and past the new Foton file generated by the LSP model below.  The first number following the MTRX is the input the filter is taking data from and the second number is the output its pushing data to.  This means for the script parsing Valera's transfer functions, I need to input which channel corresponds to which number, such as DARM = 0, MICH = 1, etc.  So the next step is to write this script and populate the filter banks in this file.

# FILTERS FOR ONLINE SYSTEM
#
# Computer generated file: DO NOT EDIT
#
# MODULES DOF2PD_AS11I DOF2PD_AS11Q DOF2PD_AS55I DOF2PD_AS55Q
# MODULES DOF2PD_ASDC DOF2PD_POP11I DOF2PD_POP11Q DOF2PD_POP55I
# MODULES DOF2PD_POP55Q DOF2PD_POPDC DOF2PD_REFL11I DOF2PD_REFL11Q
# MODULES DOF2PD_REFL55I DOF2PD_REFL55Q DOF2PD_REFLDC Mirror2DOF_f2x1
# MODULES Mirror2DOF_f2x2 Mirror2DOF_f2x3 Mirror2DOF_f2x4 Mirror2DOF_f2x5
# MODULES Mirror2DOF_f2x6 Mirror2DOF_f2x7 DOF2PD_MTRX_0_0 DOF2PD_MTRX_0_1
# MODULES DOF2PD_MTRX_0_2 DOF2PD_MTRX_0_3 DOF2PD_MTRX_0_4 DOF2PD_MTRX_0_5
# MODULES DOF2PD_MTRX_0_6 DOF2PD_MTRX_0_7 DOF2PD_MTRX_0_8 DOF2PD_MTRX_0_9
# MODULES DOF2PD_MTRX_0_10 DOF2PD_MTRX_0_11 DOF2PD_MTRX_0_12 DOF2PD_MTRX_0_13
# MODULES DOF2PD_MTRX_0_14 DOF2PD_MTRX_1_0 DOF2PD_MTRX_1_1 DOF2PD_MTRX_1_2
# MODULES DOF2PD_MTRX_1_3 DOF2PD_MTRX_1_4 DOF2PD_MTRX_1_5 DOF2PD_MTRX_1_6
# MODULES DOF2PD_MTRX_1_7 DOF2PD_MTRX_1_8 DOF2PD_MTRX_1_9 DOF2PD_MTRX_1_10
# MODULES DOF2PD_MTRX_1_11 DOF2PD_MTRX_1_12 DOF2PD_MTRX_1_13 DOF2PD_MTRX_1_14
# MODULES DOF2PD_MTRX_2_0 DOF2PD_MTRX_2_1 DOF2PD_MTRX_2_2 DOF2PD_MTRX_2_3
# MODULES DOF2PD_MTRX_2_4 DOF2PD_MTRX_2_5 DOF2PD_MTRX_2_6 DOF2PD_MTRX_2_7
# MODULES DOF2PD_MTRX_2_8 DOF2PD_MTRX_2_9 DOF2PD_MTRX_2_10 DOF2PD_MTRX_2_11
# MODULES DOF2PD_MTRX_2_12 DOF2PD_MTRX_2_13 DOF2PD_MTRX_2_14 DOF2PD_MTRX_3_0
# MODULES DOF2PD_MTRX_3_1 DOF2PD_MTRX_3_2 DOF2PD_MTRX_3_3 DOF2PD_MTRX_3_4
# MODULES DOF2PD_MTRX_3_5 DOF2PD_MTRX_3_6 DOF2PD_MTRX_3_7 DOF2PD_MTRX_3_8
# MODULES DOF2PD_MTRX_3_9 DOF2PD_MTRX_3_10 DOF2PD_MTRX_3_11 DOF2PD_MTRX_3_12
# MODULES DOF2PD_MTRX_3_13 DOF2PD_MTRX_3_14 DOF2PD_MTRX_4_0 DOF2PD_MTRX_4_1
# MODULES DOF2PD_MTRX_4_2 DOF2PD_MTRX_4_3 DOF2PD_MTRX_4_4 DOF2PD_MTRX_4_5
# MODULES DOF2PD_MTRX_4_6 DOF2PD_MTRX_4_7 DOF2PD_MTRX_4_8 DOF2PD_MTRX_4_9
# MODULES DOF2PD_MTRX_4_10 DOF2PD_MTRX_4_11 DOF2PD_MTRX_4_12 DOF2PD_MTRX_4_13
# MODULES DOF2PD_MTRX_4_14  
# MODULES 

Attachment 1: CDS_Library.png
CDS_Library.png
  3534   Tue Sep 7 15:31:28 2010 josephb, alexUpdateCDSBinary Output working/ IO chassis fixed

As noted previously, we were having problems with getting multiple 32 channel binary output cards working.  Alex came by and we eventually tracked the problem down to an incorrect counter in the c code.  This has been fixed and checked into the CDS svn repository.  I tested the actual hardware and we are in fact able to turn our test LEDs on with multiple binary output boards.

 

Alex and I also looked at the non-functional IO chassis (the one which wouldn't sync with the 1PPS signal and wasn't turning on when the computer turned on.  We discovered one corner of the trenton board wasn't screwed down and was in fact slightly warped.  I screwed it down properly, straightening the board out in the process.  After this, the IO chassis worked with a host interface board to the computer and started properly.  We were able to see the boards attached as well with lspci.  So that chassis looks to be in working condition now.

Onwards to the RFM test.

  3600   Thu Sep 23 12:05:20 2010 josephb, alexUpdateCDSfb40m down, new fb in progress

Alex came over this morning and we began work on the frame builder change over.  This required fb40m be brought down and disconnected from the RAID array, so the frame builder is not available.

He brought a Netgear switch which we've installed at the top of the 1X7 rack.  This will eventually be connected, via Cat 6 cable, to all the front ends.  It is connected to the new fb machine via a 10G fiber.

Alex has gone back to Downs to pickup a Symmetricon (sp?) card for getting timing information into the frame builder.  He will also be bringing back a harddrive with the necessary framebuilder software to be copied onto the new fb machine.

He said he'd like to also put a Gentoo boot server on the machine.  This boot server will not affect anything at the moment, but its apparently the style the sites are moving towards.  So you have a single boot server, and diskless front end computers, running Gentoo.  However for the moment we are sticking with our current Centos real time kernel (which is still compatible with the new frame builder code).  However this would make a switch over to the new system possible in the future.

At the moment, the RAID array is doing a file system check, and is going slowly while it checks terabytes of data.  We will continue work after lunch. 

 Punchline: things still don't work.

  3602   Thu Sep 23 21:01:11 2010 josephb, alexUpdateCDSfb40m still down, new fb still in progress
Unfortunately, copying the data to the USB/SATA drive over at downs took longer than expected for Alex. We will be installing the new code on the new fb machine tomorrow and running it. We will be running off of a timer on that machine until Monday. On Monday, a Symmetricom card will be arriving from LLO so that we can connect an IRIG-B timing signal into the frame builder and use a proper time signal. There is no running frame builder for tonight and thus will be no trends until we get the new FB running tomorrow morning.
  3620   Wed Sep 29 12:08:28 2010 josephb, alexSummaryCDSLast burt save of old controls

This is being recorded for posterity so we know where to look for the old controls settings.

The last good burt restore that was saved before turning off scipe25 aka c1dcuepics was on September 29, 11:07.

  3625   Thu Sep 30 11:07:20 2010 josephb, alexUpdateCDStest points starting to work

The centos 5.5 compiled gds code is currently living on rosalba in the /opt/app directory (this is local to Rosalba only).  It has not been fully compiled properly yet.  It is still missing ezcaread/write/ and so forth.  Once we have a fully working code, we'll propagate it to the correct directories on linux1.

So to have a working dtt session with the new front ends, log into rosalba, go to opt/apps/, and source gds-env.bash in /opt/apps (you need to be in bash for this to work, Alex has not made a tcsh environment script yet).  This will let get testpoints and be able to make transfer function measurements, for example

Also, to build the latest awgtpman, got to fb, go to /opt/rtcds/caltech/c1/core/advLigoRTS/src/gds, and type make.  This has been done and mentioned just as reference.

The awgtpman along with the front end models should startup automatically on reboot of c1sus (courtesy of the /etc/rc.local file).

  3628   Thu Sep 30 16:29:35 2010 josephb, alexUpdateCDSfb update

There currently seems to be a timing issue with  the frame builder.  We switched over to using a symmetricom card to get an IRIG-B signal into the fb machine, but the gps time stamp is way off (~80 years Alex said).

If there is a frame buiilder issue, its currently often necessary to kill the associated mx_stream processes, since they don't seem to restart gracefully.  To fix it the following steps should be taken:

Kill frame builder, kill the two mx_stream processes, then /etc/restart_streams/, then restart the frame builder (usual daqd -c ./daqdrc >& ./daqd.log in /opt/rtcds/caltech/c1/target/fb).

To restart (or start after a boot) the nds server, you need to go to /opt/rtcds/caltech/c1/target/fb and type

./nds /opt/rtcds/caltech/c1/target/fb/pipe

At this time, testpoints are kind of working, but timing issues seem to be preventing useful work being done with it.  I'm leaving with Alex working on the code.

 

  3633   Fri Oct 1 11:33:15 2010 josephb, alexConfigurationCDSChanging gds code to the new working version

Alex is installing the newly compiled gds code (compiled on Centos 5.5 on Rosalba) which does in fact include the ezca type tools. 

At the moment we don't have a solaris compile, although that should be done at somepoint in the future.  It means the gds tools (diaggui, foton, etc) won't work on op440m.  On the bright side, this newer gds code has a foton that doesn't seem to crash all the time on Linux.

 

  3635   Fri Oct 1 14:13:29 2010 josephb, alexUpdateCDSfb work that still needs to be done

1) Need to check 1 PPS signal alignment

2) Figure out why 1PPS and ADC/DAC testpoints went away from feCodeGen.pl?

3) Fix 1PPS testpoint giving NaN data

4) Figure out why is daqd printing "making gps time correction" twice?

5) Need to investigate why mx_streams are still getting stuck

6) Epics channels should not go out on 114 network (seen messages when doing
burt restore/save).

7) Dataviewer leaves test points hanging, daqd does not deallocate them
(net_Writer.c shutdown_netwriter call)

8) Need to install wiper scripts on fb

9) Need to install newer kernel on fb to avoid loading myrinet firmware
(avoid boot delay)

  3648   Tue Oct 5 13:46:26 2010 josephb, alexUpdateCDSRestarted fb trending

Fb is now once again actually recording trends.

A section of the daqdrc file (located in /opt/rtcds/caltech/c1/target/fb/ directory) had been commented out by Alex and never uncommented.  This section included the commands which actually make the fb record trends.

The section now reads as:

# comment out this block to stop saving data

#
start frame-saver;
sync frame-saver;
start trender;
start trend-frame-saver;
sync trend-frame-saver;
start minute-trend-frame-saver;
sync minute-trend-frame-saver;
start raw_minute_trend_saver;
#start frame-writer "225.225.225.1" broadcast="131.215.113.0" all;
#sleep 5;

  3651   Tue Oct 5 14:11:09 2010 josephb, alexUpdateCDSGoing to from rtlinux to Gentoo requires front end code clean out

Apparently when updating front end codes from rtlinux to the patched Gentoo, certain files don't get deleted when running make clean, such as the sysfe.rtl files in the advLigoRTS/src/fe/sys directories.  This fouls the start up scripts by making it think it should be configured for rtlinux rather than the Gentoo kernel module.

  3696   Tue Oct 12 13:05:00 2010 josephb, alexUpdateCDSfb still flaky, models time out fixed

Interesting information from Alex.  We're limited to 2 Megabytes per second per front end model.  Assuming all your channels are running at a 2kHz rate, we can have at most 256 channels being set to the frame builder from the front end (assuming 4 byte data).  We're fine for the moment, but perhaps useful to keep in mind.

I talked to Alex this morning and he said the frame builder is being flaky (it crashed on us twice this morning, but the third time seemed to stay up when requesting data).  I've added a new wiki page called "New Computer Restart Prodecures" under Computers and Scripts, found here. It includes all the codes that need to be running, and also a start order if things seem to be in a particularly bad state.  Unfortunately, there were no fixes done to the frame builder but it is on Alex's list of things to do.

In regards to the timing out of the front ends, Alex came over to the 40m this morning and we sat down debugging.  We tried several things, such as removing all filters from C1MCS.txt file in the chans directory, and watching the timing as we pressed various medm control buttons. We traced it to a filters used by the DAC in the model talking to the IOP front end, which actually sends the data to the physical DAC card.  The filter is used when converting between sample rates, in this case between the 16 kHz of the front end model and the 64 kHz of the IOP.  Sending it raw zeros after having had real data seemed to cause this filter to eat up an usually large amount of CPU time. 

We modified the /opt/rtcds/caltech/c1/core/advLigoRTS/src/include/drv/fm10Gen.c file.

We reverted a change that was done between version 908 and 929, where underflows (really small numbers) were dealt with by adding and then subtracting a very small number.  We left the adding and subtracting, but also restored the hard limits on the history.

So instead of relying on just:

input += 1e-16;
junk = input;
input -= 1e-16;

we also now use

if((new_hist < 1e-20) && (new_hist > -1e-20)) new_hist = new_hist<0 ? -1e-20: 1e-20;

Thus any filter value who's absolute value is less than 1e-20 will be clamped to -1e-20 or 1e-20.  On the bright side, we no longer crash the front ends when we turn something off.

 

  3735   Mon Oct 18 15:33:00 2010 josephb, alexUpdateCDSPossibly broken timing cards

It looks as though we may have two IO chassis with bad timing cards.

Symptoms are as follows:

We can get our front end models writing data and timestamps out on the RFM network.

However, they get rejected on the receiving end because the timestamps don't match up with the receiving front end's timestamp.  Once started, the system is consistently off by the same amount. Stopping the front end module on c1ioo and restarting it, generated a new consistent offset.  Say off by 29,000 cycles in the first case and on restart we might be 11,000 cycles off.  Essentially, on start up, the IOP isn't using the 1PPS signal to determine when to start counting.

We tried swapping the spare IO chassis (intended for the LSC) in ....

# Joe will finish this in 3 days.

# Basically, in conclusion, in a word, we found that c1ioo IO chassis is wrong.

  3844   Tue Nov 2 11:34:53 2010 josephb, alexUpdateCDSdiagconfd running, excitations back in dtt

Problem:

Diagnostic test tools was starting with errors.

Cause:

After the reboot of the frame builder machine yesterday by Alex, the diagconfd daemon was not getting started by xinetd.  There was a sequence error in the startup where xinetd was being called before mounting drives from linux1.

Important Note:

If you do not see the "nds" line you would not have diagnostic tests enabled in the DTT:

[controls@rosalba apps]$ diag -i | grep nds
nds * * 192.168.113.202 8088 * 192.168.113.202

Solution:

Alex changed /etc/xinetd.d/diagconfd file to point to /opt/apps/gds/bin/diagconfd instead of /opt/apps/bin/diagconf.  He also ensured xinetd started after mounting from linux1.

Alex's Suggestion:
My feeling is we should get rid of this feature and have an NDS address
entry box in the "Online" tab in the DTT with the default "nds". I
mentioned this to Jim Batch and he greed with me, so maybe he is going to
implement this. So maybe you guys want to request the same thing too, send
the request to Rolf and Jim, so we can have the last demon exercised.

  3869   Fri Nov 5 15:20:18 2010 josephb, alexSummaryCDS40m computer slow down solved

Problem:

The 40m computers were responding sluggishly yesterday, to the point of being unusable.

Cause:

The mx_stream code running on c1iscex (the X end suspension control computer) went crazy for some reason.  It was constantly writing to a log file in /cvs/cds/rtcds/caltech/c1/target/fb/192.168.113.80.log.  In the past 24 hours this file had grown to approximately 1 Tb in size. The computer had been turned back on yesterday after having reconnected its IO chassis, which had been moved around last week for testing purposes - specifically plugging the c1ioo IO chassis in to it to confirm it had timing problems.

Current Status:

The mx_stream code was killed on c1iscex and the 1 Tb file removed.

Computers are now more usable.

We still need to investigate exactly what caused the code to start writing to the log file non-stop.

Update Edit:

Alex believes this was due to a missing entry in the /diskless/root/etc/hosts file on the fb machine.  It didn't list the IP and hostname for the c1iscex machine.  I have now added it.  c1iscex had been added to the /etc/dhcp/dhcpd.conf file on fb, which is why it was able to boot at all in the first place.  With the addition of the automatic start up of mx_streams in the past week by Alex, the code started, but without the correct ip address in the hosts file, it was getting confused about where it was running and constantly writing errors.

Future:

When adding a new FE machine, add its IP address and its hostname to the /diskless/root/etc/hosts file on the fb machine.

  3870   Fri Nov 5 16:40:22 2010 josephb, alexUpdateCDSc1ioo now has working awgtpman

Problem:

We couldn't set testpoints on the c1ioo machine.

Cause:

Awgtpman was getting into some strange race condition.  Alex added an additional sleep statement and shifted boot order slightly in the rc.local file.  This apparently is only a problem on c1ioo, which is a Sun X4600.  It was using up 100% of a single CPU for the awgtpman process.

Status:

We now have c1ioo test points working.

Future:

Need to examine the startc1ioo script and see if needs a similar modification, as that was tried at several points but yielded a similar state of non-functioning test points. For the moment, reboot of c1ioo is probably the best choice after making modifications to the c1ioo or c1x03 models.

Current CDS status:

MC damp dataviewer diaggui AWG c1ioo c1sus c1iscex RFM Sim.Plant Frame builder  TDS
                     
  4003   Wed Dec 1 12:02:49 2010 josephb, alexUpdateCDSRebuilding frame builder with latest code

Problem:

The front ends seem to have different gps timestamps on the data than the frame builder has when receiving them.

One theory is we have fairly been doing SVN checkouts of the code for the front ends once a week or every two weeks, but the frame builder has not been rebuilt for about a month. 

Current Action:

Alex is currently rebuilding the frame builder with the latest code changes.

It also suggests I should try rebuilding the frame builder on a semi-regular basis as updates come in.

 

 

  4037   Thu Dec 9 12:28:52 2010 josephb, alexUpdateCDSThe Dolphin is in (Reflected memory that is)

Setting the Configurations files:

On the fb machine in /etc/dis/ there are several configurations files that need to be set for our dolphin network.

First, we modify networkmanager.conf.

We set  "-dimensionX 2;" and leave the dimensionY and dimensionZ as 0.  If we had 3 machines on a single router, we'd set X to 3, and so forth.

We then modify dishosts.conf.

We add an entry for each machine that looks like:

#Keyword name nodeid adapter link_width
HOSTNAME: c1sus
ADAPTER:  c1sus_a0 4 0 4

The nodeids (the first number after the name)  increment by 4 each time, so c1lsc is:

HOSTNAME: c1lsc
ADAPTER:  c1lsc_a0 8 0 4

The file cluster.conf is automatically updated by the code by parsing the dishosts.conf and networkmanager.conf files.

Getting the code to automatically start:

We uncommented the following lines in the rc.local file in /diskless/root/etc on the fb machine:

# Initialize Dolphin
sleep 2
# Have to set it first to node 4 with dxconfig or dis_nodemgr fails. Unexplai   ned.
/opt/DIS/sbin/dxconfig -c 1 -a 0 -slw 4 -n 4
/opt/DIS/sbin/dis_nodemgr -basedir /opt/DIS

For the moment we left the following lines commented out:

# Wait for Dolphin to initialize on all nodes
#/etc/dolphin_wait
We were unsure of the effect of the dolphin_wait script on the front ends without Dolphin cards.  It looks like the script it calls waits until there are no dead nodes.

In /etc/conf.d/ on the fb machine we modified the local.start file by uncommenting:

/opt/DIS/sbin/dis_networkmgr&

This starts the Dolphin network manager on the fb machine.  The fb machine is not using a Dolphin connection, but controls the front end Dolphin connections via ethernet.

The Dolphin network manager can be interacted with by using the dxadmin program (located in /opt/DIS/sbin/ on the fb machine).  This is a GUI program so use ssh -X when logging into the fb before use.

Setting up the front ends models:

Each IOP model (c1x02, c1x04) that runs on a machine using the Dolphin RFM cards needs to have the flag pciRfm=1 set in the configuration box (usually located in the upper left of the model in Simulink).  Similarly, the models actually making use of the Dolphin connections should have it set as well.  Use the PCIE_SignalName parts from IO_PARTS in the CDS_PARTS.mdl file to send and receive communications via the Dolphin RFM.

  4045   Mon Dec 13 11:56:32 2010 josephb, alexUpdateCDSDolphin is working

Problem:

The dolphin RFM was not sending data between c1lsc and c1sus.

Solution:

Dig into the controller.c code located in /opt/rtcds/caltech/c1/core/advLigoRTS/src/fe/.  Find this bit of code on line 2173:

 

2173 #ifdef DOLPHIN_TEST
2174 #ifdef X1X14_CODE
2175         static const target_node = 8; //DIS_TARGET_NODE;
2176 #else
2177         static const target_node = 12; //DIS_TARGET_NODE;
2178 #endif
2179         status = init_dolphin(target_node);

Replace it with this bit of code:

2173 #ifdef DOLPHIN_TEST
2174 #ifdef C1X02_CODE
2175         static const target_node = 8; //DIS_TARGET_NODE;
2176 #else
2177         static const target_node = 4; //DIS_TARGET_NODE;
2178 #endif
2179         status = init_dolphin(target_node);

Basically this was hard coded for use at the site on their test stands.  When starting up, the dolphin adapter would look for a target node to talk to, that could not be itself.  So, all the dolphin adapters would normally try to talk to target_node 12, unless it was the X1X14 front end code, which happened to be the one with dolphin node id 12.  It would try to talk to node 8.

Unfortunately, in our setup, we only had nodes 4 and 8.  Thus, both our codes would try to talk to a nonexistent node 12.  This new code has everyone talk to node 4, except the c1x02 process which talks to node 8 (since it is node 4 and can't talk to itself).

I'm told this stuff is going away in the next revision and shouldn't have this hard coded stuff.

 
Different Dolphin Problem and Fix:

Apparently, the only models which should have pciRfm=1 are the IOP models which have a dolphin connection.  Front end models that are not IOP models (like c1lsc and c1rfm) should not have this flag set.  Otherwise they include the dolphin drivers and causes them and the IOP to refuse to unload when using rmmod.

So pciRfm=1 only in IOP models using Dolphin, everyone else should not have it or should have pciRfm=-1.

 

Current CDS status:

MC damp dataviewer diaggui AWG c1ioo c1sus c1iscex RFM Dolphin RFM Sim.Plant Frame builder TDS
                       
  4184   Fri Jan 21 17:59:27 2011 josephb, alexUpdateCDSFixed Dolphin transmission

The orientation of the Dolphin cards seems to be opposite on c1lsc and c1sus.  The wide part is on top on c1lsc and on the bottom on c1sus.  This means, the cable is plugged into the left Dolphin port on c1lsc and into the right Dolphin port on c1sus.  Otherwise you get a wierd state where you receive but not transmit.

ELOG V3.1.3-