40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 248 of 339  Not logged in ELOG logo
ID Date Authorup Type Category Subject
  4679   Tue May 10 16:42:49 2011 josephbUpdateCDSNew upconversion model (c1uct)

[Ryan, Joe]

Ryan added the c1uct (upconversion tester) model to the c1ioo machine.   It is DCU_ID 32, CPU 6.  The relevant wiki page has been updated. It has been added to /diskless/root/etc/rtsystab on the fb machine and automatically comes up when the c1ioo computer is turned on. 

Joe still needs to add its status to the status screen.

It is soft linked from:

/opt/rtcds/caltech/c1/userapps/trunk/CDS/c1/models/c1uct.mdl

Ryan will expand upon its uses later.

  4680   Tue May 10 16:45:19 2011 josephbUpdateCDSc1ass now receiving AS55I from c1lsc

[Valera, Joe]

We added a cdsPCIx_SHMEM connection between the c1lsc and c1ass models.  This connection is called C1:LSC-ASS_AS55I, and sends the normalized AS55I data to Lockin 11 of the c1ass model.

In addition, in order to get the c1ass model to compile, we had to place all the non-IO parts inside a subsystem block, which we called ASS, and gave the top_names tag to.

The c1lsc and c1ass models were rebuilt, the frame builder restarted, and the models restarted.

  4748   Thu May 19 12:09:41 2011 josephbUpdateCDSAA filter box pulled from 1X5, optic suspensions currently off

[Steve, Joe]

Steve pulled the top AA filter box from 1X5 which handled some of the suspensions channels.  We turned off all the watchdogs before pulling it out, as well as recorded which cables were connected to which inputs.

The case  is undergoing a structural modification to have the ADC adapter card which previously was loosely connected via cables, securely attached to the case.

Steve still wants to do some cabling in the rack while the box is out, and will return it this afternoon once he has finished that.

  4770   Tue May 31 11:26:29 2011 josephbUpdateCDSCDS Maintenance

1) Checked in the changes I had made to the c1mcp.mdl model just before leaving for Elba.

2) The c1x01 and c1scx kernel modules had stopped running due to an ADC timeout. 

According to dmesg on c1iscex, they died at 3426838 seconds after starting (which corresponds to ~39 days).  "uptime" indicates c1iscex was up for 46 days, 23 hours. So my guess is about 8 days ago (last Monday or Tuesday),  they both died when the ADCs failed to respond quick enough for an unknown reason.

I used the kill scripts (in /opt/rtcds/caltech/c1/scripts/) to kill c1spx, c1scx, and c1x01.  I then used the start scripts to start c1x01, then c1scx, and then finally c1spx.  They all came up fine.

Status screen is now all green.  I renabled damping on ETMX and it seems to be happy. A small kick of the optic shows the approriately damped response.

  4776   Wed Jun 1 11:31:50 2011 josephbUpdateCDSMC1 LR digital reading close to zero, readback ~0.7 volts

There appears to be a bad cable connection somewhere on the LR sensor path for the MC1 optic.

The channel C1:SUS-MC1_LRPDMon is reading back 0.664 volts, but the digital sensor channel, C1:SUS-MC1_LRSEN_INMON, is reading about -16.  This should be closer to +1000 or so.

We've temporarily turned off the LRSEN filter module output while this is being looked into.

I briefly went out and checked the cables around the whitening and AA boards for the suspension sensors, but even after wiggling and making sure everything was plugged in solidly.  There was one semi-loose connection, but it wasn't on the MC1 board, but I pushed it all the way in anyways.  The monitor point on the AA board looks correct for the LR channels, although ITMX LR struck me as being very low at about -0.05 Volts.

According to data viewer, the MC1 LR sensor channel went bad roughly two weeks ago, around 00:40 on 5/18 UTC, or 17:40 on 5/17 PDT.

 

UPDATE:

It appears the AA board (or possibly the SCSI cable connected to it) is the problem in the chain.

  4800   Thu Jun 9 16:18:03 2011 josephbUpdateCDSSecond trends only go back 12 days

While answering a quick question by Kiwamu, I noticed we only had second trends going back to 99050000 GPS time, May 27th 2011. 

Trends (I thought) were intended to be kept forever, and certainly longer than full data, which currently goes back several months.

Jamie will need to look into this.

  4918   Thu Jun 30 06:54:07 2011 josephbUpdateCDSModified the automated scripts for producing model webviews

Dave Barker pointed out last week that the webview of our simulink model files, generated from the installed models (i.e. in /opt/rtcds/caltech/c1/target/<system name>/simLink/) was not handling libraries properly.  Essentially the web pages generated couldn't see inside library parts.

This was caused by 2 problems.  The first being the userapps not being in the matlab path when the slwebview call was done, so it couldn't even find the libraries.  The second problem is the slwebview code by default doesn't follow libraries and links, and needs a special command to be told to do so.

I added the following lines to the webview_simlink_update.m file:

addpath('/opt/rtcds/caltech/c1/core/trunk/src/epics/simLink/lib')
for sub = {'cds','isc','isi','sus','psl'}
 for spath = {'common/models','c1/models/lib'}
   addpath(['/opt/rtcds/caltech/c1/userapps/release/' sub{1} '/' spath{1}]);
 end
end

I also changed the following:

temp = slwebview(final_files{x},'viewFile',false);

became

temp = slwebview(final_files{x},'viewFile',false,'FollowLinks','on','FollowModelReference','on');

After confirming these changes worked, I have sent a corrected version to Dave and Keith.

The webview results can be found at: https://nodus.ligo.caltech.edu:30889/FE/

 

 

  6051   Wed Nov 30 11:04:26 2011 josephbUpdateCDSFiltering Noise issue tracked down ???

Quote:

For now, I suppose we can just change this number to 1e-40 or so. I don't know how to calculate what the right number should be. Not sure why this underflow is not an issue for the BiQuad, however.

According to the RCG SVN logs, the reason it was removed was a more general change done to the compiled code, not specific to just the biquad.  Basically, the ability to have an underflow number (subnormal) has been turned off completely by having any number that underflows set to zero. I'm not positive, but from a quick search looks that the smallest number before hitting is an underflow as a double is 2.2250738585072014e-308.

Alex's entry from the SVN log for 2663:

Added new fz_daz() function to turn on two bits in the FPU SSE control register.
Bits FZ (flush underflows to zero) and DOZ (denorms are zeros) are set to
avoid runaway code on float/double denorms (really small numbers).
Ref: http://software.intel.com/en-us/articles/how-to-avoid-performance-penalties-for-gradual-underflow-behavior/

SVN log 2664:

Removed +- 1e-20 limiting code, this is taken care of by setting FZ/DOZ bits
in the CPU SEE control register (see mathInline.h)

SVN log 2665:

Kill the underflows and roll down float denorms to zero,
see fz_doz() in mathInline.h.

  1892   Wed Aug 12 13:35:03 2009 josephb, AlexConfigurationComputersTested old Framebuilder 1.5 TB raid array on Linux1

Yesterday, Alex attached the old frame builder 1.5 TB raid array to linux1, and tested to make sure it would work on linux1.

This morning he tried to start a copy of the current /cvs/cds structure, however realized at the rate it was going it would take it roughly 5 hours, so he stopped.

Currently, it is planned to perform this copy on this coming Friday morning.

  3128   Mon Jun 28 13:40:53 2010 josephb, AlexUpdateCDSChanges added to CDS SVN, new checkout, new features, some changes made

Last week Alex merged in the changes I had made to the local 40m copy of the Real Time Code Generator.  These were to add a new part, called FiltMuxMatrix, which is a matrix of filter banks, as well as fixing the filter medm generation code so the filter banks actually have working time stamps.

I checked out a new version of the CDS SVN with these changes merged in.  Changes that will be added in the near future by Rolf and Alex include the addition of "tags".  These are pieces in simulink which act as a bridge between two points, so you can reduce the amount of wire clutter on diagrams.  Otherwise they have no real affect on the generated C code.  Also the ADC/DAC channel selector and in fact the ADC/DAC parts will be changing.  The MIT group has requested the channel selector be freed up for its original purpose in matlab, so Rolf is working on that.

The new checkout includes the new directory scheme Rolf is pushing.  So when you run the code generator and more specifically, install SYS, it places code in /opt/rtcds/caltech/c1/ type directories, like medm, chans, target, scripts.

For the time being, Alex has created a directory /rtcds on Linux1 under /home/cds.  He then created softlinks to that directory on megatron, c1iscex, and allegra in the /opt directory.  This was an easy way to have a shared path.

However, it does mean on each new FE  machine after setting up the mounting of /home/cds from Linux1, we also need to create the /opt/rtcds link to /cvs/cds/rtcds.

After checking out the CDS SVN, we discovered there some files missing that Alex had added to his version, but not the main branch.  Alex came over to the 40m and proceeded to get all those files checked in.  We then checked it out again.  Changes were made to the awg, framebuilder, and nds codes and needed to be rebuilt. 

There's a new naming scheme for models.  You need to include the site before the 3 letter system name.  So lsc.mdl become c1lsc.mdl.

Certain other file name conventions were also changed.  Instead of tpchn_c1.par, tpchn_c2.par, etc, its now tpchn_c1lsc.par, tpchn_c2lsp.par, etc.  The system name is included at the end of the filename, to help make it clearer what file goes with what.

This required an edit of the chnconf file, which has explicit calls to those file names.  Once we edited that file, we had to reload the xinetd service which its apparently a subpart of (this can be accomplished by /etc/init.d/xinetd stop, then /etc/init.d/xinetd start).

/etc/rc.d/rc.local also had to be edited for the new model names (c1lsc, c1lsp, etc).

The daqdrc file (for the framebuilder) now parses which dcu_rate to use from the tpchn_c1lsc.par type files, so the dcu_rate 20 = 16384 lines have been removed.  set gds_server has also been removed, and replaced with tpconfig "/opt/rtcds/caltech/c1/target/gds/param/testpoint.par";  from which it can get the hostname.  This information is now derived from the c1SYS.mdl file.

Hostname needs to be added to the .mdl files, in the cdsParameter block (i.e. host=megatron).

After that Alex informed me the IOP processor needs to be running for the other models to work properly, as well as for the Framebuilder to work.

The models and framebuilder now get their timing signal from the IOP (input/output processor).  This must be running in order for the other models or FB to run properly.  Its generally named c1x00 or c1x01 or similar.  The last two numbers ideally are unique to each FE computer.

Initially there was a problem running on Megatron, because the IOP gets its timing signal from the IO chassis, and there was none connected to megatron.  However, he has since modified the code so that if there's no IO chassis, the IOP processor just uses the system clock.  It has been tested and runs on megatron now.

 

  506   Fri May 30 12:03:08 2008 josephb, AndreyConfigurationCamerasHead to head comparison of cameras
Andrey and myself - Joseph B. - have examined the output of the GC650 (CCD) and GC750 (CMOS) prosilica cameras. We did several live motion tests (i.e. rotate the turning mirror, move and rotate the camera, etc) and also used a microscope slide to try to eliminate back reflections and interference.

Both the GC650 and GC750 produce dark lines in the images, some of which look parallel, while others are in much stranger shapes, such as circles and arcs.

Moving the GC750 camera physically, we have the spot moving around, with the dark lines appearing to be fixed to the camera itself, and remain in the same location on the detector. I.e. coming back to the same spot keeps showing a circle. In reasonably well behaved sections, these lines are about 10% dips in power, and could in principle be subtracted out. Its possible that the camera was damaged with too much light incident in the past, although going back to the pmc_trans images that were taken, similar lines are still visible.

Moving the GC650 camera physically seems to change the position of the lines (if one also rotates the turning mirror to get to the same spot on the CCD). It seems as if a slight change in angle has a large effect on these dark bands, which can either be thin, or very large, bordering on the size of the spot size. My guess is (as the vendor suggested) the light is interacting with the electronics behind the surface layer rather than a surface defect producing these lines. Using a microscope slide in between the turning mirror and the GC650, we were able to produce new fringes, but didn't affect the underlying ones.

Placing a microscope slide in between the last turning mirror and the GC750 does not affect the dark lines (although it does seem to add some), nor does turning the final turning mirror, so it seems unlikely to be caused by back reflection in this case.

So it seems the CMOS may be more consistent, although we need to determine if the current line problems are due to exposure to too much light at some point in the past (i.e. I broke it) or they come that way from the factory.

Attached are the results of image-processing of the images from the two our cameras using Andrey's new Matlab script.
Attachment 1: Waveform_Reconstruction_May30-2008.pdf
Waveform_Reconstruction_May30-2008.pdf Waveform_Reconstruction_May30-2008.pdf Waveform_Reconstruction_May30-2008.pdf Waveform_Reconstruction_May30-2008.pdf Waveform_Reconstruction_May30-2008.pdf Waveform_Reconstruction_May30-2008.pdf
  4406   Fri Mar 11 18:32:45 2011 josephb, Chris, JamieUpdateCDSDebugging simplant damping

The FM1 filter module change for XXSEN was propagated to the ETMX suspension.  This was a change from a 30,100:3 with a DC gain of 1 to a 3:30 which just compensates the hardware filter.

The good gains for the Sim damping were found to be:  100 for ETMX_SUSPOS, 0.1 ETMX_SUSPIT, and 0.1 ETMX_SUSYAW, ETMX_SUSSIDE is -70.  Gains much higher tended to saturate the simulated coils (i.e. hitting 10V limit) and then had to have the histories cleared for the RESPONSE matrix.

These seem to work to damp the real ETMX as well.

  558   Tue Jun 24 17:12:10 2008 josephb, EricConfigurationCamerasGC750 setup, 1X4 Hub connected, ETMX images
The GC750 camera has been setup to look at ETMX. In addition, the new 1X4 rack mounted switch (131.215.113.200) has been connected via new cat6 cable to the control room hub (131.215.113.1?), thanks to Eric. The camera is now plugged into 1X4 rack switch and now has a gigabit connection to the control room computers as well as Mafalda (131.215.113.23).

By using ssh -X mafalda or ssh -X 131.215.113.23, then typing:

target
cd Prosilica/bin-pc/x86/
./Sampleviewer

A viewer will be brought up. By clicking on the 3rd icon from the left (looks like an eye) will bring up a live view.

Closing the view, and then cd ../../40mCode, and then running ./Snap --help will tell you how to use a simple code for taking .tiff images as well as setting things such as exposure length and size of image (in pixels) to send.

When the interferometer was set to an X-arm only configuration, we took two series of 200 images each, with two different exposure lengths.

Attached are three pdf images. The first is just a black and white single image, the second is an average of 100 images, and the third is the standard deviation of the 100 images.
Attachment 1: GC750_ETMX_E30000_single.pdf
GC750_ETMX_E30000_single.pdf
Attachment 2: GC750_ETMX_E30000_avg.pdf
GC750_ETMX_E30000_avg.pdf
Attachment 3: GC750_ETMX_E30000_std.pdf
GC750_ETMX_E30000_std.pdf
  681   Wed Jul 16 15:59:04 2008 josephb, EricConfigurationCamerasPMC trans camera path
In order to reduce saturation, we placed a Y1 plate (spare from the SP table) in transmission just before the GC650 camera looking at the PMC transmision. The reflection (most of the light) was dumped to a convient razor blade dump. We also removed the 0.3 and 0.5 ND filters and placed them in the 24 hour loan ND filter box.

Good exposure values to view are now around 3000 for that camera.
  693   Fri Jul 18 12:24:15 2008 josephb, EricConfigurationCamerasChanged Lenses on GC750 at ETMX
We removed the giant TV zoom lens and replaced it with a much smaller fixed zoom lens. Currently it views the entire optic. We have another (also small) zoom lens which focuses much better on the spot itself. With how far back the camera is currently placed, neither of these fixed zoom lenses can touch or hit the view port or the chamber while still attached to camera and mount, even using all of the mount's motion range. So this should be less of a safety issue.

Ideally, we'd like to get some images of the full optic (including osems and so forth) with the X-arm locked, and then use the higher zoom lens while still locked, to get images we can use to calibrate the x and y length scales.
  767   Wed Jul 30 13:09:40 2008 josephb, EricConfigurationPSLPMC scan experiment
We turned the PSL power down by a factor of 4, blocked one half of the Mach Zehnder and scanned the PMC by applying a ramp signal to PMC PZT. Eric will adding plots later today of those results.

We returned the power to close to original level and removed the block on the Mach Zehnder, and then relocked the PMC.
  899   Fri Aug 29 12:41:26 2008 josephb, EricConfigurationComputersMore front ends moved to new network
Used Cat6 cables to finish moving all the front ends in 1Y4 and 1Y5 over to the new GigE network switches, specifically to the switch in 1Y6. This included the ones labeled c1susvme2, c1sosvme, and c1dscl1epics0.
  932   Fri Sep 5 09:56:14 2008 josephb, EricConfigurationComputersFunny channels, reboots, and ethernet connections
1) Apparently the I00-ICS type channels had gotten into a funny state last night, where they were showing just noise, exactly when Rana changed the accelerometer gains and did major reboots. A power cycle of the c1ioo crate and appropriate restarts fixed this.

2) c1asc looks like it was down all night. When I walked out to look at the terminal, it claimed to be unable to read the input file from the command line I had entered the previous night ( < /cvs/cds/caltech/target/c1asc/startup.cmd). In addition we were unable to telnet in, suggesting an ethernet breakdown and inability to mount the appropriate files. So we have temporarily run a new cat6 cable from the c1asc board to the ITMX prosafe switch (since there's a nice knee high cable tray right there). One last power cycle and we were able to telnet in and get it running.
  922   Thu Sep 4 11:33:25 2008 josephb, Eric, JenneConfigurationComputersAttempt to increase gain for C1:PSL-ISS_INMONPD_F via 110B
We were attempting to increase the gain on the channel C1:PSL-ISS_INMONPD_F in preparation to do a scan of the PMC at very low input power.

We started by adding a line to the C1:IOOF.ini file in /cvs/cds/caltech/chans/daq/ under that channel that said "gain=10.0". Before touching anything, the channel was outputting around 4000 counts.

We hit the reconfig button for c1iovme16k, then rebooted c1iovme (which turned out to do nothing) and then the framebuilder, in a method consistent with the wiki. This turned out to put the channel in an odd state, where it was showing very rapid, random spikes, virtually but still around 4000ish counts. We returned the file back to its original format, hit reconfig, and then rebooted the framebuilder. The channel however, was still behaving in the same broken way.

After poking around the PSL table, looking at some direct outputs, we came back and rebooted c1iovme and the framebuilder again, which fixed the channel, such that it was reading out correctly. Taking this as a sign that maybe we should reboot the framebuilder, then c1iovme to get the channel to load changes, we changed the file again to have "gain=10.0". Upon reboot of the framebuilder, the channel was still reading out fine, but at the same level. So we continued with the reboot of c1iovme. This still had no effect on the channel output.

The ini file has been set back at this point, however since Yoichi is working, I'm holding off doing a reconfig and reboot on the framebuilder until later.
  4509   Mon Apr 11 13:30:04 2011 josephb, JamieUpdateCDSNo Wiper script - Frames full over weekend

Problem:

The daqd process was dying every minute or so when it couldn't write frame.  This was slowing down the network by writing a 2.9G core dump over NFS every minute or so. (In /opt/rtcds/caltech/c1/target/fb/).

The problem was /frames/ was 100% full.

Apparently, when we switched the fb over to Gentoo, we forgot to install crontab and a wiper script.

Solution:

We will install crontab and get the wiper script installed.

  822   Mon Aug 11 11:36:11 2008 josephb, SteveConfigurationComputersc1susvme1 minor problems
Around 11 am c1susvme1 start having issues. Namely C1:SUS-PRM_FE_SYNC was railing at some large value like 16384 (2^14). I presume this means the computer was running catastophically late.

I turned off the BS and ITM watch dogs (the PRM was already off), tried hitting reset and sshing in, and running startup, but this didn't help. I then turned off the c1susvme2 associated watch dogs (MC1-3, SRM) and went out to do a hard reboot by switching the crate power off. c1susvme2 came back up fine, was restarted and associated watch dogs turned back on. However, c1susvme1 came back up without mounting /cvs/cds/.

As a test, I replaced the ethernet connection with a CAT6 cable to the Prosafe switch in 1Y6, and then ran reboot on c1susvme1. When it came back up, it had mounted properly, and I was able to run the ./startup.cmd file. At this point it seems to be happy. The new cable is in the trays coming in from the top of the 1Y4 and 1Y6 and approriately labeled.

Edit: Apparently ITMX and ITMY became excited after the reboot (perhaps I turned the watchdogs back on too early? Although that was after the DAQ light was listed as green for c1susvme). Steve noticed this when the alarms went off again (I had turned them off after the reboot seemed successful), and he damped them. Interestingly, the BS remained unexcited.
  1673   Mon Jun 15 15:17:33 2009 josephb, SteveConfigurationVACVacuum control and monitor screens

We updated the vacuum control and monitor screens  (C0VAC_MONITOR.adl and C0VAC_CONTROL.adl).  We also updated the /cvs/cds/caltech/target/c1vac1/Vac.db file.

1) We changed the C1:Vac-TP1_lev channel to C1:Vac-TP1_ala channel, since it now is an alarm readback on the new turbo pump rather than an indication of levitation.  The logic on printing the "X" was changed from X is printed on a 1 = ok status) to X is printed on a 0 = problem status.  All references within the Vac.db file to C1:Vac-TP1_lev were changed.  The medm screens also now are labeled Alarm, instead of Levitating.

2) We changed the text displayed by the CP1 channel (C1:Vac-CP1_mon in Vac.db) from "On" and "Off" to "Cold - On" and "Warm - OFF".

3) We restarted the c1vac1 front end as well as the framebuilder after these changes.

  1258   Thu Jan 29 16:50:53 2009 josephb, albertoConfigurationComputersMegatron fixed
The warning light on megatron and the fans at full speed were fixed by not just power cycling, but completely unplugging megatron from power, waiting for a minute, and then reconnecting the power cables.

Apparently, the Sunfire X4600s at Hanford have also had intermittent problems, which required unplugging the machines completely. In their case, they were unable to access the machine normally, nor could they access the the Intergrated Lights Out Manager (ILOM).

Here, we could interact normally with megatron, but were unable to connect to the ILOM. We were able to get BIOS, but unable to change any of the setting for the ILOM connection. Since the ILOM is a seperate processor and effectively always on, even when the power light is off, a normal shutdown won't reset it. Hence the need to completely disconnect the power supply.
  1555   Thu May 7 15:22:19 2009 josephb, albertoConfigurationComputersfb40m

Quote:

Having determined that Rana (the computer) was having to many issues with testing the new Raid array due to age of the system, we proceeded to test on fb40m.

 

We brought it down and up several times between 11 and noon.  We  eventually were able to daisy chain the old raid and the new raid so that fb40m sees both.  At this time, the RAID arrays are still daisy chained, but the computer is setup to run on just the original raid, while the full 14 TB array is initialized (16 drives, 1 hot spare, RAID level 5 means 14 TB out of the 16 TB are actually available).  We expect this to take a few hours, at which point we will copy the data from the old RAID to the new RAID (which I also expect to take several hours).  In the meantime, operations should not be affected.  If it is, contact one of us.

 

 

 

 

This afternoon the alignment script chrashed after returning sysntax errors. We found that the tpman wasn't running on the framebuilder becasue it had probably failed to get restarted in one of the several reboots executed in the morning by Alex and Jo.

Restarting the tpman was then sufficient for the alignment scripts to get back to work.

  1668   Thu Jun 11 14:54:18 2009 josephb, albertoUpdateComputersWireless network

After poking around for a few minutes several facts became clear:

1) At least one GPIB interface has a hard ethernet connection (and does not currently go through the wireless).

2) The wireless on the laptop works fine, since it can connect to the router.

3) The rest of the martian network cannot talk to the router.

This led to me replugging the ethernet cord back into the wireless router, which at some point in the past had been unplugged.  The computers now seem to be happy and can talk to each other.

 

  1554   Thu May 7 12:21:36 2009 josephb, alexConfigurationComputersfb40m

Having determined that Rana (the computer) was having to many issues with testing the new Raid array due to age of the system, we proceeded to test on fb40m.

 

We brought it down and up several times between 11 and noon.  We  eventually were able to daisy chain the old raid and the new raid so that fb40m sees both.  At this time, the RAID arrays are still daisy chained, but the computer is setup to run on just the original raid, while the full 14 TB array is initialized (16 drives, 1 hot spare, RAID level 5 means 14 TB out of the 16 TB are actually available).  We expect this to take a few hours, at which point we will copy the data from the old RAID to the new RAID (which I also expect to take several hours).  In the meantime, operations should not be affected.  If it is, contact one of us.

 

 

  2215   Mon Nov 9 14:59:34 2009 josephb, alexUpdateComputersThe saga of Megatron continues

Apparently the random file system failure on megatron was unrelated to the RFM card (or at least unrelated to the physical card itself, its possible I did something while installing it, however unlikely).

We installed a new hard drive, with a duplicate copy of RTL and assorted code stolen from another computer.  We still need to get the host name and a variety of little details straightened out, but it boots and can talk to the internet.  For the moment though, megatron thinks its name is scipe11.

You still use ssh megatron.martian to log in though.

We installed the RFM card again, and saw the exact same error as before.  "NMI EVENT!" and "System halted due to fatal NMI".

Alex has hypothesized that the interface card the actual RFM card plugs into, and which provides the PCI-X connection might be the wrong type, so he has gone back to Wilson house to look for a new interface card.  If that doesn't work out, we'll need to acquire a new RFM card at some point

After removing the RFM card, megatron booted up fine, and had no file system errors.  So the previous failure was in fact coincidence.

 

  2225   Tue Nov 10 10:51:00 2009 josephb, alexUpdateComputersMegatron on, powercycled c1omc, and burt restored from 3am snapshot

Last night around 5pm or so, Alex had remotely logged in and made some fixes to megatron.

First, he changed the local name from scipe11 to megatron.  There were no changes to the network, this was a purely local change.  The name server running on Linux1 is what provides the name to IP conversions.  Scipe11 and Megatron both resolve to distinct IPs. Given c1auxex wasn't reported to have any problems (and I didn't see any problems with it yesterday), this was not a source of conflict.  Its possible that Megatron could get confused while in that state, but it would not have affected anything outside its box.

Just to be extra secure, I've switched megatron's personal router over from a DMZ setup to only forwarding port 22.  I have also disabled the dhcp server on the gateway router (131.215.113.2).

Second, he turned the mdp and mdc codes on.  This should not have conflicted with c1omc.

This morning I came in and turned megatron back on around 9:30 and began trying to replicate the problems from last night between c1omc and megatron.  I called Alex and we rebooted c1omc while megatron was on, but not running any code, and without any changes to the setup (routers, etc).  We were able to burt restore.  Then we turned the mdp, mdc and framebuilder codes on, and again rebooted c1omc, which appeared to burt restore as well (I restored from 3 am this morning, which looks reasonable to me). 

Finally, I made the changes mentioned above to the router setups in the hope that this will prevent future problems but without being able to replicate the issue I'm not sure.

  2266   Fri Nov 13 10:28:03 2009 josephb, alexUpdateComputersMegatron is back to its old self

I called Alex this morning and explained the problems with megatron.

Turns out when he had been setting up megatron, he thought a startup script file, rc.local was missing in the /etc directory.  So he created it.  However, the rc.local file in the /etc directory is normally just a link to the /etc/rc.d/rc.local file.  So on startup (basically when we rebooted the machine yesterday), it was running an incorrect startup script file.  The real rc.local includes line:

/usr/bin/setup_shmem.rtl mdp mdc&

Hence the errors we were getting with shm_open().  We changed the file into a soft link, and resourced the rc.local script and mdp started right up.  So we're back to where we were 2 nights ago (although we do have an RFM card in hand).

Update:  The tst module wouldn't start, but after talking to Alex again, it seems that I need to add the module tst to the /usr/bin/setup_shmem.rtl mdp mdc& line in order for it to have a shared memory location setup for it.  I have edited the file (/etc/rc.d/rc.local), adding tst at the end of the line.  On reboot and running starttst, the code actually loads, although for the moment, I'm still getting blank white blocks on the medm screens.

  2305   Fri Nov 20 11:01:58 2009 josephb, alexConfigurationComputersWhere to find RFM offsets

Alex checked out the old rts (which he is no longer sure how to compile) from CVS to megatron, to the directory:

/home/controls/cds/rts/

In /home/controls/cds/rts/src/include you can find the various h files used.  Similarly, /fe has the c files.

In the h files, you can work out the memory offset by noting the primary offset in iscNetDsc40m.h

A line like suscomms.pCoilDriver.extData[0] determines an offset to look for.

0x108000 (from suscomms )

Then pCoilDriver.extData[#] determines a further offset.

sizeof(extData[0]) = 8240  (for the 40m - you need to watch the ifdefs, we were looking at the wrong structure for awhile, which was much smaller).

DSC_CD_PPY is the structure you need to look in to find the final offset to add to get any particular channel you want to look at.

The number for ETMX is 8, ETMY 9 (this is in extData), so the extData offset from 0x108000 for ETMY should be 9 * 82400.  These numbers (i.e. 8 =ETMX, 9=ETMY) can be found in losLinux.c in /home/controls/cds/rts/src/fe/40m/.  There's a bunch of #ifdef and #endif which define ETMX, ETMY, RMBS, ITM, etc.  You're looking for the offset in those.

So for ETMY LSC channel (which is a double) you add 0x108000 (a hex number) + (9 * 82400 + 24) (not in hex, need to convert) to get the final value of 0x11a160 (in hex).

-----------

A useful program to interact with the RFM network can be found on fb40m.  If you log in and go to:

/usr/install/rfm2g_solaris/vmipci/sw-rfm2g-abc-005/util/diag

you can then run rfm2g_util, give it a 3, then type help.

You can use this to read data.  Just type help read.  We had played around with some offsets and various channels until we were sure we had the offsets right.  For example, we fixed an offset into the ETMY LSC input, and saw the corresponding memory location change to that value.  This utility may also be useful for when we do the RFM test to check the integrity of the ring, as there are some diagnostic options available inside it.

  2306   Fri Nov 20 11:14:22 2009 josephb, alexConfigurationComputerstest points working on megatron and we may have filters with switch outputs built in

Alex tooked at the channel definitions (can be seen in tpchn_C1.par), and noticed the rmid was 0. 

However, we had set in testpoint.par the tst system to C-node1 instead of C-node0.  The final number inf that and the rmid need to be equal.   We have changed this, and the test points appear to be working now.

However, the confusing part is in the tst model, the gds_node_id is set to 1.  Apparently, the model starts counting at 1, while the code starts counting at 0, so when you edit the testpoint.par file by hand, you have to subtract one from whatever you set in the model.

In other news, Alex pointed me at a CDS_PARTS.mdl, filters, "IIR FM with controls".  Its a light green module with 2 inputs and 2 outputs.  While the 2nd set of input and outputs look like they connect to ground, they should be iterpreted by the RCG to do the right thing (although Alex wasn't positive it works, it worth trying it and seeing if the 2nd output corresponds to a usable filter on/off switch to connect to the binary I/O to control analog DW.  However, I'm not sure it has the sophistication to wait for a zero crossing or anything like that - at the moment, it just looks like a simple on/off switch based on what filters are on/off.

  2487   Fri Jan 8 11:43:22 2010 josephb, alexUpdateComputersRFM and Megatron

Alex came over with a short RFM cable this morning.  We used it to connect the rfm card in c1iscey to the rfm card megatron

Alex renamed startup.cmd in /cvs/cds/caltech/target/c1iscey/ to startup.cmd.sav, so it doesn't come up automatically.  At the end we moved it back.

Alex used the vxworks command d to look at memory locations on c1iscey.  Such as d 0xf0000000, which is the start of the rfm code location.  So to look at 0x11a1c8 (lscPos) in the rfm memory, he typed "d 0xf011a1c8".  After doing some poking around, we look at the raw tst front end code (in /home/controls/cds/advLigo/src/fe/tst), and realized it was trying to read doubles.  The old rts code uses floats, so the code was reading incorrectly.

As a quick fix, we changed the code to floats for that part.  They looked like:

etmy_lsc = filterModuleD(dsp_ptr,dspCoeff,ETMY_LSC,cdsPciModules.pci_rfm[0]? *(\
(double *)(((void *)cdsPciModules.pci_rfm[0]) + 0x11a1c8)) : 0.0,0);

And we simply changed the double to float in each case.  In addition we changed the RCG scripts locally as well (if we do a update at some point, it'll get overwritten).  The file we updated was /home/controls/cds/advLigo/src/epics/util/lib/RfmIO.pm

Line 57 and Line 84 were changed, with double replaced with float.

return "cdsPciModules.pci_rfm[0]? *((float *)(((void *)cdsPciModules.pci
_rfm[$card_num]) + $rfmAddressString)) : 0.0";

. "  *((float *)(((char *)cdsPciModules.pci_rfm[$card_nu
m]) + $rfmAddressString)) = $::fromExp[0];\n"

This fixed our ability to read the RFM card, which now can read the LSC POS channel, for example.

Unfortunately, when we were putting everything the way it was with RFM fibers and so forth, the c1iscey started to get garbage (all the RFM memory locations were reading ffff).  We eventually removed the VME board, removed the RFM card, looked at it, put the RFM card back in a different slot on the board, and returned c1iscey to the rack.  After this it started working properly.  Its possible in all the plugging and unplugging that the card somehow had become loose.

The next step is to add all the channels that need to be read into the .mdl file, as well as testing and adding the channel which need to be written.

 

  2488   Fri Jan 8 15:40:14 2010 josephb, alexUpdateComputersRFM and RCG

Alex added a new module to the RCG, for generating RFMIO using floats.  This has been commited to CVS.

  2542   Fri Jan 22 12:33:37 2010 josephb, alexUpdateComputersModified CDS_PARTS for Binary output

Turns out the CDSO32 part (representing the Contec BO-32L-PE binary output) rquires two inputs. One for the first 16 bits, and one for the second set of 16 bits.  So Alex added another input to the part in the library.  Its still a bit strange, as it seems the In1 represents the second set of 16 bits, and the In2 represents the first set of 16 bits.

I added two sliders on the CustomAdls/C1TST_ETMY.adl control screen (upper left), along with a bit readout display, which shows the bitwise and of the two slider channels. For the moment, I still can't see any output voltage on any of the DO pins, no matter what output I set.

 

  2591   Thu Feb 11 18:33:54 2010 josephb, alexUpdateComputersStatus of the IP change over

A few machines have still not been changed over, including a few laptops, mafalda, ottavia, and c0rga.

All the front ends have been changed over.

fb40m died during a reboot and was replaced with a spare Sun blade 1000 that Larry had.  We had to swap in our old hard drive and memory.

All the front ends, belladonna, aldabella, and the control room machines have been switched over. Nodus was changed over after we realized we hosed the elog and svn by switching linux1's IP.

At this point, 90% of the machines seem to be working, although c0daqawg seems to be having some issues with its startup.cmd code.

  2603   Sat Feb 13 18:58:31 2010 josephb, alexUpdateComputersfb40m testpoints fixed

I received an e-mail from Alex indicating he found the testpoint problem and fixed it today:

Quote from Alex: "After we swapped the frame builder computer it has reconfigured all device files and I needed to create some symlinks on /dev/ to make tpman work again. I test the testpoints and they do work now."

 

  2983   Tue May 25 16:40:27 2010 josephb, alexUpdateCDSFinally tracked down why new models wouldn't talk to each other

The problem with the new models using the new shared memory/dolphin/RFM defined as names in a single .ipc file.

The first is the no_oversampling flag should not be used.  Since we have a single IO processor handling ADCs and DACs at 64k, while the models run at 16k, there is some oversampling occuring.  This was causing problems syncing between the models and the IOP.

It also didn't help I had a typo in two channels which I happened to use as a test case to confirm they were talking.  However, that has been fixed.

  3055   Tue Jun 8 15:58:25 2010 josephb, alexUpdateCDSNew multi-filter matrix part added to RCG (at the 40m at least)

A new webview of the LSP model is available at:

https://nodus.ligo.caltech.edu:30889/FE/lsp_slwebview_files/

This model include a couple example noise generators as well as the new Matrix of Filter banks (5 inputs x 15 outputs = 75 Filters!).  The attached png shows where these parts can be found in the CDS_PARTS library.  I'm still working on the automatic generation of the matrix and filter bank medm screens for this part.  The plan is to have a matrix screen similar to current ones, except that the value entry points to the gain setting of the associated filter.  In addition, underneath each value, there will be a link to the full filter bank screen.  Ideally, I'd like to have the filter adl files located in a sub-directory of the system, to keep clutter down.

I've cut and past the new Foton file generated by the LSP model below.  The first number following the MTRX is the input the filter is taking data from and the second number is the output its pushing data to.  This means for the script parsing Valera's transfer functions, I need to input which channel corresponds to which number, such as DARM = 0, MICH = 1, etc.  So the next step is to write this script and populate the filter banks in this file.

# FILTERS FOR ONLINE SYSTEM
#
# Computer generated file: DO NOT EDIT
#
# MODULES DOF2PD_AS11I DOF2PD_AS11Q DOF2PD_AS55I DOF2PD_AS55Q
# MODULES DOF2PD_ASDC DOF2PD_POP11I DOF2PD_POP11Q DOF2PD_POP55I
# MODULES DOF2PD_POP55Q DOF2PD_POPDC DOF2PD_REFL11I DOF2PD_REFL11Q
# MODULES DOF2PD_REFL55I DOF2PD_REFL55Q DOF2PD_REFLDC Mirror2DOF_f2x1
# MODULES Mirror2DOF_f2x2 Mirror2DOF_f2x3 Mirror2DOF_f2x4 Mirror2DOF_f2x5
# MODULES Mirror2DOF_f2x6 Mirror2DOF_f2x7 DOF2PD_MTRX_0_0 DOF2PD_MTRX_0_1
# MODULES DOF2PD_MTRX_0_2 DOF2PD_MTRX_0_3 DOF2PD_MTRX_0_4 DOF2PD_MTRX_0_5
# MODULES DOF2PD_MTRX_0_6 DOF2PD_MTRX_0_7 DOF2PD_MTRX_0_8 DOF2PD_MTRX_0_9
# MODULES DOF2PD_MTRX_0_10 DOF2PD_MTRX_0_11 DOF2PD_MTRX_0_12 DOF2PD_MTRX_0_13
# MODULES DOF2PD_MTRX_0_14 DOF2PD_MTRX_1_0 DOF2PD_MTRX_1_1 DOF2PD_MTRX_1_2
# MODULES DOF2PD_MTRX_1_3 DOF2PD_MTRX_1_4 DOF2PD_MTRX_1_5 DOF2PD_MTRX_1_6
# MODULES DOF2PD_MTRX_1_7 DOF2PD_MTRX_1_8 DOF2PD_MTRX_1_9 DOF2PD_MTRX_1_10
# MODULES DOF2PD_MTRX_1_11 DOF2PD_MTRX_1_12 DOF2PD_MTRX_1_13 DOF2PD_MTRX_1_14
# MODULES DOF2PD_MTRX_2_0 DOF2PD_MTRX_2_1 DOF2PD_MTRX_2_2 DOF2PD_MTRX_2_3
# MODULES DOF2PD_MTRX_2_4 DOF2PD_MTRX_2_5 DOF2PD_MTRX_2_6 DOF2PD_MTRX_2_7
# MODULES DOF2PD_MTRX_2_8 DOF2PD_MTRX_2_9 DOF2PD_MTRX_2_10 DOF2PD_MTRX_2_11
# MODULES DOF2PD_MTRX_2_12 DOF2PD_MTRX_2_13 DOF2PD_MTRX_2_14 DOF2PD_MTRX_3_0
# MODULES DOF2PD_MTRX_3_1 DOF2PD_MTRX_3_2 DOF2PD_MTRX_3_3 DOF2PD_MTRX_3_4
# MODULES DOF2PD_MTRX_3_5 DOF2PD_MTRX_3_6 DOF2PD_MTRX_3_7 DOF2PD_MTRX_3_8
# MODULES DOF2PD_MTRX_3_9 DOF2PD_MTRX_3_10 DOF2PD_MTRX_3_11 DOF2PD_MTRX_3_12
# MODULES DOF2PD_MTRX_3_13 DOF2PD_MTRX_3_14 DOF2PD_MTRX_4_0 DOF2PD_MTRX_4_1
# MODULES DOF2PD_MTRX_4_2 DOF2PD_MTRX_4_3 DOF2PD_MTRX_4_4 DOF2PD_MTRX_4_5
# MODULES DOF2PD_MTRX_4_6 DOF2PD_MTRX_4_7 DOF2PD_MTRX_4_8 DOF2PD_MTRX_4_9
# MODULES DOF2PD_MTRX_4_10 DOF2PD_MTRX_4_11 DOF2PD_MTRX_4_12 DOF2PD_MTRX_4_13
# MODULES DOF2PD_MTRX_4_14  
# MODULES 

Attachment 1: CDS_Library.png
CDS_Library.png
  3534   Tue Sep 7 15:31:28 2010 josephb, alexUpdateCDSBinary Output working/ IO chassis fixed

As noted previously, we were having problems with getting multiple 32 channel binary output cards working.  Alex came by and we eventually tracked the problem down to an incorrect counter in the c code.  This has been fixed and checked into the CDS svn repository.  I tested the actual hardware and we are in fact able to turn our test LEDs on with multiple binary output boards.

 

Alex and I also looked at the non-functional IO chassis (the one which wouldn't sync with the 1PPS signal and wasn't turning on when the computer turned on.  We discovered one corner of the trenton board wasn't screwed down and was in fact slightly warped.  I screwed it down properly, straightening the board out in the process.  After this, the IO chassis worked with a host interface board to the computer and started properly.  We were able to see the boards attached as well with lspci.  So that chassis looks to be in working condition now.

Onwards to the RFM test.

  3600   Thu Sep 23 12:05:20 2010 josephb, alexUpdateCDSfb40m down, new fb in progress

Alex came over this morning and we began work on the frame builder change over.  This required fb40m be brought down and disconnected from the RAID array, so the frame builder is not available.

He brought a Netgear switch which we've installed at the top of the 1X7 rack.  This will eventually be connected, via Cat 6 cable, to all the front ends.  It is connected to the new fb machine via a 10G fiber.

Alex has gone back to Downs to pickup a Symmetricon (sp?) card for getting timing information into the frame builder.  He will also be bringing back a harddrive with the necessary framebuilder software to be copied onto the new fb machine.

He said he'd like to also put a Gentoo boot server on the machine.  This boot server will not affect anything at the moment, but its apparently the style the sites are moving towards.  So you have a single boot server, and diskless front end computers, running Gentoo.  However for the moment we are sticking with our current Centos real time kernel (which is still compatible with the new frame builder code).  However this would make a switch over to the new system possible in the future.

At the moment, the RAID array is doing a file system check, and is going slowly while it checks terabytes of data.  We will continue work after lunch. 

 Punchline: things still don't work.

  3602   Thu Sep 23 21:01:11 2010 josephb, alexUpdateCDSfb40m still down, new fb still in progress
Unfortunately, copying the data to the USB/SATA drive over at downs took longer than expected for Alex. We will be installing the new code on the new fb machine tomorrow and running it. We will be running off of a timer on that machine until Monday. On Monday, a Symmetricom card will be arriving from LLO so that we can connect an IRIG-B timing signal into the frame builder and use a proper time signal. There is no running frame builder for tonight and thus will be no trends until we get the new FB running tomorrow morning.
  3620   Wed Sep 29 12:08:28 2010 josephb, alexSummaryCDSLast burt save of old controls

This is being recorded for posterity so we know where to look for the old controls settings.

The last good burt restore that was saved before turning off scipe25 aka c1dcuepics was on September 29, 11:07.

  3625   Thu Sep 30 11:07:20 2010 josephb, alexUpdateCDStest points starting to work

The centos 5.5 compiled gds code is currently living on rosalba in the /opt/app directory (this is local to Rosalba only).  It has not been fully compiled properly yet.  It is still missing ezcaread/write/ and so forth.  Once we have a fully working code, we'll propagate it to the correct directories on linux1.

So to have a working dtt session with the new front ends, log into rosalba, go to opt/apps/, and source gds-env.bash in /opt/apps (you need to be in bash for this to work, Alex has not made a tcsh environment script yet).  This will let get testpoints and be able to make transfer function measurements, for example

Also, to build the latest awgtpman, got to fb, go to /opt/rtcds/caltech/c1/core/advLigoRTS/src/gds, and type make.  This has been done and mentioned just as reference.

The awgtpman along with the front end models should startup automatically on reboot of c1sus (courtesy of the /etc/rc.local file).

  3628   Thu Sep 30 16:29:35 2010 josephb, alexUpdateCDSfb update

There currently seems to be a timing issue with  the frame builder.  We switched over to using a symmetricom card to get an IRIG-B signal into the fb machine, but the gps time stamp is way off (~80 years Alex said).

If there is a frame buiilder issue, its currently often necessary to kill the associated mx_stream processes, since they don't seem to restart gracefully.  To fix it the following steps should be taken:

Kill frame builder, kill the two mx_stream processes, then /etc/restart_streams/, then restart the frame builder (usual daqd -c ./daqdrc >& ./daqd.log in /opt/rtcds/caltech/c1/target/fb).

To restart (or start after a boot) the nds server, you need to go to /opt/rtcds/caltech/c1/target/fb and type

./nds /opt/rtcds/caltech/c1/target/fb/pipe

At this time, testpoints are kind of working, but timing issues seem to be preventing useful work being done with it.  I'm leaving with Alex working on the code.

 

  3633   Fri Oct 1 11:33:15 2010 josephb, alexConfigurationCDSChanging gds code to the new working version

Alex is installing the newly compiled gds code (compiled on Centos 5.5 on Rosalba) which does in fact include the ezca type tools. 

At the moment we don't have a solaris compile, although that should be done at somepoint in the future.  It means the gds tools (diaggui, foton, etc) won't work on op440m.  On the bright side, this newer gds code has a foton that doesn't seem to crash all the time on Linux.

 

  3635   Fri Oct 1 14:13:29 2010 josephb, alexUpdateCDSfb work that still needs to be done

1) Need to check 1 PPS signal alignment

2) Figure out why 1PPS and ADC/DAC testpoints went away from feCodeGen.pl?

3) Fix 1PPS testpoint giving NaN data

4) Figure out why is daqd printing "making gps time correction" twice?

5) Need to investigate why mx_streams are still getting stuck

6) Epics channels should not go out on 114 network (seen messages when doing
burt restore/save).

7) Dataviewer leaves test points hanging, daqd does not deallocate them
(net_Writer.c shutdown_netwriter call)

8) Need to install wiper scripts on fb

9) Need to install newer kernel on fb to avoid loading myrinet firmware
(avoid boot delay)

  3648   Tue Oct 5 13:46:26 2010 josephb, alexUpdateCDSRestarted fb trending

Fb is now once again actually recording trends.

A section of the daqdrc file (located in /opt/rtcds/caltech/c1/target/fb/ directory) had been commented out by Alex and never uncommented.  This section included the commands which actually make the fb record trends.

The section now reads as:

# comment out this block to stop saving data

#
start frame-saver;
sync frame-saver;
start trender;
start trend-frame-saver;
sync trend-frame-saver;
start minute-trend-frame-saver;
sync minute-trend-frame-saver;
start raw_minute_trend_saver;
#start frame-writer "225.225.225.1" broadcast="131.215.113.0" all;
#sleep 5;

  3651   Tue Oct 5 14:11:09 2010 josephb, alexUpdateCDSGoing to from rtlinux to Gentoo requires front end code clean out

Apparently when updating front end codes from rtlinux to the patched Gentoo, certain files don't get deleted when running make clean, such as the sysfe.rtl files in the advLigoRTS/src/fe/sys directories.  This fouls the start up scripts by making it think it should be configured for rtlinux rather than the Gentoo kernel module.

  3696   Tue Oct 12 13:05:00 2010 josephb, alexUpdateCDSfb still flaky, models time out fixed

Interesting information from Alex.  We're limited to 2 Megabytes per second per front end model.  Assuming all your channels are running at a 2kHz rate, we can have at most 256 channels being set to the frame builder from the front end (assuming 4 byte data).  We're fine for the moment, but perhaps useful to keep in mind.

I talked to Alex this morning and he said the frame builder is being flaky (it crashed on us twice this morning, but the third time seemed to stay up when requesting data).  I've added a new wiki page called "New Computer Restart Prodecures" under Computers and Scripts, found here. It includes all the codes that need to be running, and also a start order if things seem to be in a particularly bad state.  Unfortunately, there were no fixes done to the frame builder but it is on Alex's list of things to do.

In regards to the timing out of the front ends, Alex came over to the 40m this morning and we sat down debugging.  We tried several things, such as removing all filters from C1MCS.txt file in the chans directory, and watching the timing as we pressed various medm control buttons. We traced it to a filters used by the DAC in the model talking to the IOP front end, which actually sends the data to the physical DAC card.  The filter is used when converting between sample rates, in this case between the 16 kHz of the front end model and the 64 kHz of the IOP.  Sending it raw zeros after having had real data seemed to cause this filter to eat up an usually large amount of CPU time. 

We modified the /opt/rtcds/caltech/c1/core/advLigoRTS/src/include/drv/fm10Gen.c file.

We reverted a change that was done between version 908 and 929, where underflows (really small numbers) were dealt with by adding and then subtracting a very small number.  We left the adding and subtracting, but also restored the hard limits on the history.

So instead of relying on just:

input += 1e-16;
junk = input;
input -= 1e-16;

we also now use

if((new_hist < 1e-20) && (new_hist > -1e-20)) new_hist = new_hist<0 ? -1e-20: 1e-20;

Thus any filter value who's absolute value is less than 1e-20 will be clamped to -1e-20 or 1e-20.  On the bright side, we no longer crash the front ends when we turn something off.

 

  3735   Mon Oct 18 15:33:00 2010 josephb, alexUpdateCDSPossibly broken timing cards

It looks as though we may have two IO chassis with bad timing cards.

Symptoms are as follows:

We can get our front end models writing data and timestamps out on the RFM network.

However, they get rejected on the receiving end because the timestamps don't match up with the receiving front end's timestamp.  Once started, the system is consistently off by the same amount. Stopping the front end module on c1ioo and restarting it, generated a new consistent offset.  Say off by 29,000 cycles in the first case and on restart we might be 11,000 cycles off.  Essentially, on start up, the IOP isn't using the 1PPS signal to determine when to start counting.

We tried swapping the spare IO chassis (intended for the LSC) in ....

# Joe will finish this in 3 days.

# Basically, in conclusion, in a word, we found that c1ioo IO chassis is wrong.

ELOG V3.1.3-