40m QIL Cryo_Lab CTN SUS_Lab CAML OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 46 of 354  Not logged in ELOG logo
ID Date Author Type Categoryup Subject
  6020   Mon Nov 28 06:53:30 2011 kiwamuUpdateCDSc1sus shutdown

I have restarted the c1sus machine around 9:00 PM yesterday and then shut it down around 4:00 AM this morning after a little bit of taking care of the interferomter.

Quote from #6016

c1sus has been shutdown so that the optics dont bang around.  This is because the watch dogs are not working.

  6026   Mon Nov 28 16:46:55 2011 kiwamuUpdateCDSc1sus is now up

I have restarted the c1sus machine and burt-restored c1sus and c1mcs to the day before Thank giving, namely 23rd of November.

Quote from #6020

I have restarted the c1sus machine around 9:00 PM yesterday and then shut it down around 4:00 AM this morning after a little bit of taking care of the interferometer.

  6030   Mon Nov 28 19:24:51 2011 JenneUpdateCDSBeware of fancy filter modules

[Rana, Jenne]

Some of the funniness is some kind of mysterious interaction between 2 filter modules in the filter banks.  Just FM1 (30:0.0) or just FM4 (Cheby, which is 2 cheby1's) has reasonable coherence.  Both FM1 and FM4 together doesn't do so well - the coherence goes way down.

Just FM1 (30:0.0)

SUSPOS_ETMY_30and0_measured_vs_idealTF.pdf

Just FM4 (Cheby)

SUSPOS_ETMY_Cheby_measured_vs_idealTF.pdf

 Both FM1 and FM4

SUSPOS_ETMY_30and0andCheby_measured_vs_idealTF.pdf

 All the coherences plotted together

SUSPOS_ETMY_30and0andCheby_compareCoherence.pdf

You'd think that the signal encounters FM1, gets filtered, and that result is the signal sent to the next active filter module, FM4, so the 2 filter modules shouldn't interact.  But clearly there's some funny business here since engaging both makes things crappy. 

Matlab investigations to replicate this behavior offline are in progress.

Attachment 4: SUSPOS_ETMY_30and0andCheby_compareCoherence.pdf
SUSPOS_ETMY_30and0andCheby_compareCoherence.pdf
  6031   Mon Nov 28 22:09:24 2011 ranaUpdateCDSBeware of fancy filter modules

To see what might be causing the problem, I used a version of the filter noise test matlab code that Matt had in the elog.

To see if it was a single precision problem, I just recast the input data:   x = single(x)

This is not strictly correct, since some of the rest of the operations are as double precision, but I think that attached plot shows that a casting from double to single is close to the right amount of noise to explain our excess noise problem in the 0.1-1 Hz region.

Den is going to interview Alex to find out if we have some kind of issue like this. My understanding was that all of our filter module calculations were being done in double precision (64 bit), but its possible that some single stuff has crept back in. Currently the FIR filtering code IS single precision and in the past, the SUS code which didn't carry the LSC signals (meaning ASC and damping) were done in single precision.

Attachment 1: noise.pdf
noise.pdf
  6033   Tue Nov 29 04:47:49 2011 kiwamuUpdateCDSc1sus shut down again

I have shut down the c1sus machine at 3:30 AM.

  6037   Tue Nov 29 15:30:01 2011 jamieUpdateCDSlocation of currently used filter function

So I tracked down where the currently-used filter function code is defined (the following is all relative to /opt/rtcds/caltech/c1/core/release):

Looking at one of the generated front-end C source codes (src/fe/c1lsc/c1lsc.c) it looks like the relevant filter function is:

filterModuleD()

which is defined in:

src/include/drv/fm10Gen.c

and an associated header file is:

src/include/fm10Gen.h
  6038   Tue Nov 29 15:57:43 2011 DenUpdateCDSlocation of currently used filter function

 

We are interested in the following question : Can the structures defined in fm10Gen.h (or some other *.c *.h files with defined as FLOAT variables) create single precision instead of double in the filter calculations?

 

typedef struct FM_OP_IN{
  UINT32 opSwitchE;     /* Epics Switch Control Register; 28/32 bits used*/
  UINT32 opSwitchP;     /* PIII Switch Control Register; 28/32 bits used*/
  UINT32 rset;          /* reset switches */
  float offset;         /* signal offset */
  float outgain;        /* module gain */
  float limiter;        /* used to limit the filter output to +/- limit val */
  int rmpcmp[FILTERS];  /* ramp counts: ramps on a filter for type 2 output*/
                        /* comparison limit: compare limit for type 3 output*/
                        /* not used for type 1 output filter */
  int timeout[FILTERS]; /* used to timeout wait in type 3 output filter */
  int cnt[FILTERS];     /* used to keep track of up and down cnt of rmpcmp */
                        /* should be initialized to zero */
  float gain_ramp_time; /* gain change ramping time in seconds */
} FM_OP_IN;  

 

  6042   Tue Nov 29 18:54:29 2011 kiwamuUpdateCDSc1sus machine up

[Zach / Kiwamu]

 Woke up the c1sus machine in order to lock PSL to MC so that we can observe the effect of not having the EOM heater.

  6048   Wed Nov 30 01:35:49 2011 JenneUpdateCDSOSEM noise / nullstream and what does it mean for satellites

I'm picking points off of this no-magnet OSEM plot, and I thought I'd write them down somewhere so I don't have to do it again when I lose my sticky note...

1e-2 Hz        1.05e-2 um/rtHz

1e-1 Hz        3.4e-3 um/rtHz

1 Hz            1.3e-3 um/rtHz

10 Hz          2.5e-4 um/rtHz

60 Hz          7.5e-5 um/rtHz

100 Hz        7e-5 um/rtHz

400 Hz        7e-5 um/rtHz

  6049   Wed Nov 30 02:04:26 2011 rana, den, jenne, kiwamu, jzweizigUpdateCDSFiltering Noise issue tracked down ???

 You can read through all of our past tests to see what didn't work in tracking things down. As Den mentions, there was actually a lot of evidence that there was some double->single precision action in the filter calculation causing the noise we saw.

However, it turns out that this is NOT the case.

This afternoon I was so confused that I enlisted JZ to help us out. He came over and I tried to replicate the error. When looking at the time series, we noticed that it wasn't random noise; the signals seem to be getting clipped as they crossed zero. Sort of like a stiction problem. JZ left to go replicate the error on an offline system.

This turned out to be the important clue. As we examine the code we find this inside of fm10Gen.c:

if((new_hist < 1e-20) && (new_hist > -1e-20)) new_hist = new_hist<0 ? -1e-20: 1e-20;

this is line is basically trapping the filter history at 1e-20, to prevent some kind of numerical underflow problem (?). Seems reasonable, except that some filters which have higher order low passing in them actually have an overall scale factor which can be small (even as small as 1e-23, as Den pointed out).

So the reason we saw such weird behavior is that the first filter in SUSPOS is an AC coupling filter. This takes the OSEM signal and remove the large mean value. Then the next filter multiplies it by 1e-23 before doing the filtering and you end up with this noise in the filter history.

I looked and this line is commented out in the new BiQuad code, but as far as I can tell this issue has been around in aLIGO, eLIGO, iLIGO, etc. for a long time and could have been causing many cases of excess noise whenever we ended up a tiny gain factor in an IIR filter. At the 40m, there are easily a hundred such cases.

For now, I suppose we can just change this number to 1e-40 or so. I don't know how to calculate what the right number should be. Not sure why this underflow is not an issue for the BiQuad, however.

  6051   Wed Nov 30 11:04:26 2011 josephbUpdateCDSFiltering Noise issue tracked down ???

Quote:

For now, I suppose we can just change this number to 1e-40 or so. I don't know how to calculate what the right number should be. Not sure why this underflow is not an issue for the BiQuad, however.

According to the RCG SVN logs, the reason it was removed was a more general change done to the compiled code, not specific to just the biquad.  Basically, the ability to have an underflow number (subnormal) has been turned off completely by having any number that underflows set to zero. I'm not positive, but from a quick search looks that the smallest number before hitting is an underflow as a double is 2.2250738585072014e-308.

Alex's entry from the SVN log for 2663:

Added new fz_daz() function to turn on two bits in the FPU SSE control register.
Bits FZ (flush underflows to zero) and DOZ (denorms are zeros) are set to
avoid runaway code on float/double denorms (really small numbers).
Ref: http://software.intel.com/en-us/articles/how-to-avoid-performance-penalties-for-gradual-underflow-behavior/

SVN log 2664:

Removed +- 1e-20 limiting code, this is taken care of by setting FZ/DOZ bits
in the CPU SEE control register (see mathInline.h)

SVN log 2665:

Kill the underflows and roll down float denorms to zero,
see fz_doz() in mathInline.h.

  6052   Wed Nov 30 11:36:12 2011 DenUpdateCDSFiltering Noise issue tracked down ???

Quote:

 if((new_hist < 1e-20) && (new_hist > -1e-20)) new_hist = new_hist<0 ? -1e-20: 1e-20;

20 is indeed a random number. We can change it to 300. Alex said that during that iir filter calculations sometimes numbers are very small and if they are less then 1e-308 then a very slow code in the processor is executed and this will crash the online system. For single precision this number is 1e-38 and may be 10 years ago it was not decided for sure what to use - float or double. 20 will be "OK" for both but as we can see causes other problems.

Anyway, Alex removed this line from the code and added another code that sets the two proper bits in the MXCSR register and prohibits to the CPU to run the slow code. As far as I understand if the numbers are less then 1e-308 they become 0. Roughly, this is equivalent to 

 if((new_hist < 1e-308) && (new_hist > -1e-308)) new_hist = 0;

This is in 2.4 release. It is in the svn. I think we can install it and figure out if the problem is gone.

 

  6091   Thu Dec 8 19:48:23 2011 kiwamuUpdateCDSrestarted c1lsc machine and daqd

Since the c1lsc machine became frozen I restarted the c1lsc machined and daqd.

Then I burtrestored c1lsc, c1ass and c1oaf to this evening. They seem running okay.

  6095   Fri Dec 9 15:14:41 2011 DenUpdateCDSrelease 2.4

Alex has created a 2.4 branch of the RCD. Jamie, we can try to compile and install it. As a test a did it for c1oaf, it compiles, installs and runs once variables SITE, IFO, RCD_LIBRARY_PATH are properly defined. As we do not want to run one model at 2.4 code and others at 2.1, I recompiled c1oaf back to 2.1. Jamie, please, let me know when you are ready to upgrade to 2.4 release.

  6106   Mon Dec 12 13:02:08 2011 kiwamuUpdateCDSdaqd restarted

I have restarted the daqd process at 1:01 PM since I have added some new ALS's daq channels.

  6124   Thu Dec 15 11:47:43 2011 jamieUpdateCDSRTS UPGRADE IN PROGRESS

I'm now in the middle of upgrading the RTS to version 2.4.

All RTS systems will be down until futher notice...

  6125   Thu Dec 15 22:22:18 2011 jamieUpdateCDSRTS upgrade aborted; restored to previous settings

Unfortunately, after working on it all day, I had to abort the upgrade and revert the system back to yesterday's state.

I think I got most of the upgrade working, but for some reason I could never get the new models to talk to the framebuilder.  Unfortunately, since the upgrade procedure isn't document anywhere, it was really a fly by the seat of my pants thing.  I got some help from Joe, which got me through one road block, but I ultimately got stumped.

I'll try to post a longer log later about what exactly I went through.

In any event, the system is back to the state is was in yesterday, and everything seems to be working.

  6149   Mon Dec 26 12:04:41 2011 kiwamuUpdateCDSc1gcy.ini hand edited

I have edited c1scx.ini by hand in order to acquire some green locking related channels.

Somehow c1sus.ini, c1mcs.ini, c1scx.ini and c1scy.ini are not accessible via the daqconfig script.

As far as I remember it had been accessible via daqconfig a week ago when I edited c1scy.ini.

Anyway I had to edit it by hand. They need to be fixed at some point

  6173   Thu Jan 5 09:59:27 2012 JamieUpdateCDSRTS/RCG/DAQ UPGRADE TO COMMENCE

RTS/RCG/DAQ UPGRADE TO COMMENCE

I will be attempting (again) to upgrade the RTS, including the RCG and the daqd, to version 2.4 today.  The RTS will be offline until further notice.

  6174   Thu Jan 5 20:40:21 2012 JamieUpdateCDSRTS upgrade aborted; restored to previous settings; fb symmetricom card failing?

After running into more problems with the upgrade, I eventually decided to abort todays upgrade attempt, and revert back to where we were this morning (RTS 2.1).  I'll try to follow this with a fuller report explaining what problems I encountered when attempting the upgrade.

However, when Alex and I were trying to figure out what was going wrong in the upgrade, it appears that the fb symmetricom card lost the ability to sync with the GPS receiver.  When the symmeticom module is loaded, dmesg shows the following:

[  285.591880] Symmetricom GPS card on bus 6; device 0
[  285.591887] PIC BASE 2 address = fc1ff800
[  285.591924] Remapped 0x17e2800
[  285.591932] Current time 947125171s 94264us 800ns 
[  285.591940] Current time 947125171s 94272us 600ns 
[  285.591947] Current time 947125171s 94280us 200ns 
[  285.591955] Current time 947125171s 94287us 700ns 
[  285.591963] Current time 947125171s 94295us 800ns 
[  285.591970] Current time 947125171s 94303us 300ns 
[  285.591978] Current time 947125171s 94310us 800ns 
[  285.591985] Current time 947125171s 94318us 300ns 
[  285.591993] Current time 947125171s 94325us 800ns 
[  285.592001] Current time 947125171s 94333us 900ns 
[  285.592005] Flywheeling, unlocked...

Because of this, the daqd doesn't get the proper timing signal, and consequently is out of sync with the timing from the models.

It's completely unclear what caused this to happen.  The card seemed to be working all day today, then Alex and I were trying to debug some other(maybe?) timing issues and the symmetricom card all of a sudden stopped syncing to the GPS.  We tried rebooting the frame builder and even tried pulling all the power to the machine, but it never came back up.  We checked the GPS signal itself and to the extend that we know what that signal is supposed to look like it looked ok.

I speculate that this is also the cause of the problems were were seeing earlier in the week.  Maybe the symmetricom card has just been acting flaky, and something we did pushed it over the edge.

Anyway, we will try to replace it tomorrow, but Alex is skeptical that we have a replacement of this same card.  There may be a newer Spectracom card we can use, but there may be problems using it on the old sun hardware that the fb is currently running on.  We'll see.

In the mean time, the daqd is running rogue, off of it's own timing.  Surprisingly all of the models are currently showing 0x0 status, which means no problems.  It doesn't seem to be recording any data, though.  Hopefully we'll get it all sorted out tomorrow.

  6175   Fri Jan 6 01:00:56 2012 kiwamuUpdateCDSc1scx out of sync

Both the c1scx and its IOP realtime processes became out of sync.

Initially I found that the c1scx didn't show any ADC signals, though the sync sign was green.

Then I software-rebooted the c1iscex machine and then it became out of sync.

For tonight this is fine because I am concentrating on the central part anyway.

  6176   Fri Jan 6 11:49:13 2012 JamieUpdateCDSframebuilder taken offline to diagnose problem with symmetricom timing card

Alex and I have taken the framebuilder offline to try to see what's wrong with the symmetricom card.  We have removed the card from the chassis and Alex has taken it back to downs to do some more debugging.

We have been formulating some alternate methods to get timing to the fb in case we can't end up getting the card working.

  6177   Fri Jan 6 14:31:54 2012 JamieUpdateCDSframebuilder back online, using NTP time syncronization

The framebuilder is back online now, minus it's symmetricom GPS card.  The card seems to have failed entirely, and was not able to be made to work at downs either.  It has been entirely removed from fb.

As a fall back, the system has been made to work off of the system NTP-based time synchronization.  The latest symmetricom driver, which is part of the RCG 2.4 branch, will fall back to using local time if the GPS synchronization fails.  The new driver was compiled from our local checkout of the 2.4 source in the new to-be-used-in-the-future rtscore directory:

controls@fb ~ 0$ diff {/opt/rtcds/rtscore/branches/branch-2.4/src/drv/symmetricom,/lib/modules/2.6.34.1/kernel/drivers/symmetricom}/symmetricom.ko
controls@fb ~ 0$ 

The driver was reloaded.   daqd was also linked against the last running stable version and restarted:

controls@fb ~ 0$ ls -al $(readlink -f /opt/rtcds/caltech/c1/target/fb/daqd)
-rwxr-xr-x 1 controls controls 6592694 Dec 15 21:09 /opt/rtcds/caltech/c1/target/fb/daqd.20120104
controls@fb ~ 0$ 

We'll have to keep an eye on the system, to see that it continues to record data properly, and that the fb and the front-ends remain in sync.

The question now is what do we do moving forward.  CDS is not supporting the symmetricom cards anymore, and have moved to using Spectracom GPS/IRIG-B cards.  However, Downs has neither at the moment.  Even if we get a new Spectracom card, it might not work in this older Sun hardware, in which case we might need to consider upgrading the framebuilder to a new machine (one supported by CDS).

  6179   Fri Jan 6 20:10:48 2012 ranaUpdateCDSframebuilder back online, using NTP time syncronization

 

 You (Jamie) should talk with Rolf and Alex to find out what framebuilder they will support for > 3 years. Then we should buy that along with the adapter card which allows us to use the same RAID we now have for the frames.

  6182   Mon Jan 9 23:52:15 2012 kiwamuUpdateCDSSUS channels not accessible from dataviewer

[John / Kiwamu]

 We found that some of the suspensions channels (for example C1:SUS-BS_POS_IN1 and etc) were not accessible from dataviewer for some reasons.

So far it seems none of the channels associated with c1sus are accessible from dataviewer.

  6204   Tue Jan 17 02:44:59 2012 kiwamuUpdateCDSawg not working

AWG is not working. This needs to be fixed.

I could set the channel and the parameters in the AWGGUI screen, but it never inject signals to the realtime system.

  6207   Tue Jan 17 16:09:20 2012 kiwamuUpdateCDSawg not working on the c1sus machine

Actually awg works fine without any problems when the excitation channels belong to the c1lsc machine.

It seems that the awg doesn't inject signals on the channels of the c1sus machine, for example C1:SUS-BS_LSC_EXC and so on.

Quote from #6204

AWG is not working. This needs to be fixed.

  6319   Fri Feb 24 23:14:09 2012 kiwamuUpdateCDStdsavg went crazy

I found that the LSCoffset script didn't work today. The script is supposed to null the electrical offsets in all the LSC channels.

I went through the sentences in the script and eventually found that the tdsavg command returns 0 every time.

I thought this was related to the test points, so I ran the following commands to flush all the test point running and the issue was solved.

[term]> diag              
[diag]>open               
[diag]> diag  tp clear *  
 

EDIT, JCD 11June2012:  3rd line there should just be [diag]> tp clear *

  6327   Mon Feb 27 19:04:13 2012 jamieUpdateCDSspontaneous timing glitch in c1lsc IO chassis?

For some reason there appears to have been a spontaneous timing glitch in the c1lsc IO chassis that caused all models running on c1lsc to loose timing sync with the framebuilder.  All the models were reporting "0x4000" ("Timing mismatch between DAQ and FE application") in the DAQ status indicator.  Looking in the front end logs and dmesg on the c1lsc front end machine I could see no obvious indication why this would have happened.  The timing seemed to be hooked up fine, and the indicator lights on the various timing cards were nominal.

I restarted all the models on c1lsc, including and most importantly the c1x04 IOP, and things came back fine.  Below is the restart procedure I used.  Note I killed all the control models first, since the IOP can't be restarted if they're still running.  I then restarted the IOP, followed by all the other control models.

controls@c1lsc ~ 0$ for m in lsc ass oaf; do /opt/rtcds/caltech/c1/scripts/killc1${m}; done
controls@c1lsc ~ 0$ /opt/rtcds/caltech/c1/scripts/startc1x04
c1x04epics C1 IOC Server started
 * Stopping IOP awgtpman ...                                                                      [ ok ]
controls@c1lsc ~ 0$ for m in lsc ass oaf; do /opt/rtcds/caltech/c1/scripts/startc1${m}; done
c1lscepics: no process found
ERROR: Module c1lscfe does not exist in /proc/modules
c1lscepics C1 IOC Server started
 * WARNING:  awgtpman_c1lsc has not yet been started.
c1assepics: no process found
ERROR: Module c1assfe does not exist in /proc/modules
c1assepics C1 IOC Server started
 * WARNING:  awgtpman_c1ass has not yet been started.
c1oafepics: no process found
ERROR: Module c1oaffe does not exist in /proc/modules
c1oafepics C1 IOC Server started
 * WARNING:  awgtpman_c1oaf has not yet been started.
controls@c1lsc ~ 0$ 
  6392   Fri Mar 9 11:59:38 2012 Zweizig the ELOG MavenSummaryCDSNDS2 restart

 Hi Rana,

It looks like the channe list file has a few blank lines that the channel list reader is choking on. I removed the lines and it is working now.. I have made the error message a bit more obvious (gave the file name  and line number) and allowed it to ignore empty lines so this won't cause problems with future versions (when installed). The bottom line is nds2 is now running on mafalda.
Best regards,

John   

  6397   Fri Mar 9 20:44:24 2012 Jim LoughUpdateCDSDAQ restart with new ini file

DAQ reload/restart was performed at about 1315 PST today. The previous ini file was backed up as c1pem20120309.ini in the /chans/daq/working_backups/ directory.

I set the following to record:

The two JIMS channels at 2048:
[C1:PEM-JIMS_CH1_DQ] Persistent version of JIMS channel. When bit drops to zero indicating something bad (BLRMS threshold exceeded) happens the bit stays at zero  for >= the value of the persist EPICS variable.
[C1:PEM-JIMS_CH2_DQ] Non-persistent version of JIMS channel.

And all of the BLRMS channels at 256:
Names are of the form:
[C1:PEM-RMS_ACC1_F0p1_0p3_DQ]
[C1:PEM-RMS_ACC1_F0p3_1_DQ]

On monday I intend to look at the weekend seismic data to establish thresholds on the JIMS channels.

256 was the lowest rate possible according to the RCG manual. The JIMS channels are recorded at 2048 because I couldn't figure out how to disable the decimation filter. I will look into this further.

  6404   Tue Mar 13 13:28:31 2012 Ryan FisherUpdateCDSDAQ restart with new ini file
Extra note: This was the ini file that was edited:
/cvs/cds/rtcds/caltech/c1/chans/daq/C1PEM.ini
  6422   Thu Mar 15 08:48:40 2012 RyanSummaryCDSSummary of Syracuse Visit to 40m Mar 5-9 2012

JIMS Channels in PEM Model

The PEM model has been modified now to include the JIMS(Joint Information Management System) channel processing. Additionally Jim added test points at the outputs of the BLRMS.

For each seismometer channel, five bands are compared to threshold values to produce boolean results.  Bands with RMS below threshold produce bits with value 1, above threshold results in 0.  These bits are combined to produce one output channel that contains all of the results.

A persistent version of the channel is generated by a new library block that called persist which holds the value at 0 for a number of time steps equal to an EPICS variable setting from the time the boolean first drops to zero. The persist allows excursions shorter than the timestep of a downsampled timeseries to be seen reliably.

The EPICS variables for the thresholds are of the form (in order of increasing frequency):
C1:PEM-JIMS_GUR1X_THRES1
C1:PEM-JIMS_GUR1X_THRES2
etc.
 
The EPICS variables for the persist step size are of the form:
C1:PEM-JIMS_GUR1X_PERSIST
C1:PEM-JIMS_GUR1Y_PERSIST
etc.

The JIMS Channels are being recorded and written to frames:

The two JIMS channels at 2048:
[C1:PEM-JIMS_CH1_DQ] Persistent version of JIMS channel. When bit drops to zero indicating something bad (BLRMS threshold exceeded) happens the bit stays at zero  for >= the value of the persist EPICS variable.
[C1:PEM-JIMS_CH2_DQ] Non-persistent version of JIMS channel.

And all of the BLRMS channels at 256:
Names are of the form:
[C1:PEM-RMS_ACC1_F0p1_0p3_DQ]
[C1:PEM-RMS_ACC1_F0p3_1_DQ]

For additional details about the JIMS Channels and the implementation, please see the previous elog entries by Jim.

 

Conlog

I have a working aLIGO Conlog/EPICS Log installed and running on megatron.

Please see this wiki page for the details of use:
https://wiki-40m.ligo.caltech.edu/aLIGO%20EPICs%20log%20%28conlog%29

I also edited this page with restart instructions for megatron:
https://wiki-40m.ligo.caltech.edu/Computer_Restart_Procedures#megatron

Please see Ryan's previous elog entries for installation details.

Future Work

  • Determine useful thresholds for each band
  • Generate MEDM Screens for JIMS Channels
  • Add a decimation option to channels
  • Add EPICS Strings in PEM model to describe bits in JIMS Channels
  • Add additional JIMS Channels: Testing additional characterization methods
  • Implement a State Log on Megatron: Will Provide a 1Hz index into JIMS Channels
  • Generate a single web page that allows access to aLIGO Conlog/EPICS Log and State Log

 

  6436   Thu Mar 22 16:45:06 2012 kiwamuUpdateCDSc1scx and c1scy not properly running

It seems that neither c1scx nor c1scy is working properly as their ADC counts are showing digital-zeros.

However the IOPs, c1gcx and c1gcy look running fine, and also the IOPs seem successfully recognizing the ADCs according to dmesg.

Also there is one more confusing fact : c1scx and c1scy are synchronizing to the timing signal somehow.

I restarted the c1scx front end model to see if this helps, but unfortunately it didn't work.

As this is not the top priority concern for now, I am leaving them as they are now with the watchgods off.

(I may try hardware rebooting them in this evening)

Quote from #6434

The power was turned back on at 4pm It took some time for Suresh to restart the computers. We have damping but things are not perfect yet. Auto BURTH did not work well.

 

  6438   Thu Mar 22 17:41:15 2012 sureshUpdateCDSc1scx and c1scy not properly running

Quote:

It seems that neither c1scx nor c1scy is working properly as their ADC counts are showing digital-zeros.

Quote from #6434

The power was turned back on at 4pm It took some time for Suresh to restart the computers. We have damping but things are not perfect yet. Auto BURTH did not work well.

 When Steve and I restarted the c1iscex and c1iscey computers after the power shutdown, the models within them did not start-up automatically.  I had to start them manually from a terminal in the control room. 

I also tried rebooting the FB a couple of times.  Did not make any difference.

Manually starting the c1x05, c1scy and c1x01, c1scx models (with the Burt Restore button ON) did not resolve the issue of zeros in the epics screens.  though it did re-establish timing. 

  6439   Thu Mar 22 23:43:56 2012 KojiUpdateCDSc1scx and c1scy not properly running

Did you guys checked if the simplant switch is set to "REAL WORLD" mode?

Edit by KI:

Bingo ! The input signals were bypassed to the simplant. I switched the simplant settings to REAL WORLD and now both end suspensions are working fine.

  6540   Tue Apr 17 11:05:04 2012 JamieUpdateCDSCDS upgrade in progress

I am continuing to attempt to upgrade the CDS system to RTS 2.5.  Systems will continue to be up and down for the rest of the day.

  6541   Tue Apr 17 19:03:09 2012 JamieUpdateCDSCDS upgrade in progress

Upgrade progresses, but not complete.  There are some relatively minor issues, and one potentially big issue.

All new software has been installed, including the new epics that supports long channel names.

I've been doing a LOT of cleanup.  It was REALLY messy in there.

The new framebuilder/daqd code is running on fb.

Models are compiling with the new RCG and I am able to get them running.  Some of them are not compiling for relatively minor reasons (the simulink models need updating).  I'm also running into compile problems with IOPs that are using the dolphin drivers.

The major issue is that the framebuilder and the models are not syncing their timing, so there's no data collection.  I've spoken to Alex and he and Rolf are going to come over tomorrow to sort it out.  It's possible that we're missing timing hardware that the new code is expecting.

There are still some stability issues I haven't sorted out yet, and I have a lot more cleanup to do.

At this rate I'm going to shoot for being done Thursday.

  6546   Wed Apr 18 19:59:48 2012 JamieUpdateCDSCDS upgrade success

The upgrade is nearly complete:

  • new daqd code is running on fb
  • the fe/daqd timing issue was resolved by adjusting the GPS offset in the daqdrc.  I will document this more later.
  • the power outage conveniently rebooted all the front-end machines, so they're all now running new caRepeater
  • all models have been successfully recompiled with RCG 2.5 (with only a couple small glitches)
  • all new models are running on all front-end machines (with a couple exceptions)
  • all suspension models seem to be damping under local control (PRM is having troubles that are likely unrelated to the upgrade).
  • a lot of cleanup has been done

Remaining tasks/issues:

  • more testing OF EVERYTHING needs o be done
  • I did not yet update the DIS dolphin code, so we're running with the old code.  I don't think this is a problem, but it would be nice to get us running what they're running at the sites
  • I tried to cleanup/simplify how front-end initialization is done.  However, there is a problem and models are not auto-starting after reboot.  This needs to be fixed.
  • the userapps directory is in a new place (/opt/rtcds/userapps).  Not everything in the old location was checked into the repository, so we need to check to make sure everything that needs to be is checked in, and that all the models are running the right code.
  • the c1oaf model seems to be having a dolphin issue that needs to be sorted
  • the c1gfd model causes c1ioo to crash immediately upon being loaded.  I have removed it from the rtsystab.  That model needs to be fixed.
  • general model cleanup is in order.
  • more front-end cleanup is needed, particularly in regards to boot-up procedure.
  • document the entire upgrade procedure.

I'll finish up these remaining tasks tomorrow.

  6547   Wed Apr 18 23:12:49 2012 DenUpdateCDSoaf

Adaptive filter outputs some non-zero signal in the OFF position

Screenshot.png

I turned "ON" one of them and c1lsc suspended, I've rebooted it and restarted models on c1lsc and c1sus.

Now it also outputs something non-zero though the first line of the adaptive code is "if(OFF) output=0.0; return;" May be another version of the code has been compiled.

Edit: Old version (~september) of the code and oaf model is running now. In the 2.1 code there was a link from src/epics/simLink to oaf code for each DOF. It seems that 2.5 version finds models and c codes in standard directories. I need to move working code to the proper directory.

  6548   Thu Apr 19 08:43:16 2012 JamieUpdateCDSoaf

Quote:

Edit: Old version (~september) of the code and oaf model is running now. In the 2.1 code there was a link from src/epics/simLink to oaf code for each DOF. It seems that 2.5 version finds models and c codes in standard directories. I need to move working code to the proper directory.

 Yes, things have changed.  Please wait until I have things cleaned up before working on models.  I'll explain what the new setup is.

  6552   Fri Apr 20 19:54:57 2012 JamieUpdateCDSCDS upgrade problems

I ran into a couple of snags today.

A big one is that the framebuilder daqd started going haywire when I told it to start writing frames.  After restart the logs started showing this:

[Fri Apr 20 17:23:40 2012] main profiler warning: 0 empty blocks in the buffer
[Fri Apr 20 17:23:41 2012] main profiler warning: 0 empty blocks in the buffer
[Fri Apr 20 17:23:42 2012] main profiler warning: 0 empty blocks in the buffer
[Fri Apr 20 17:23:43 2012] main profiler warning: 0 empty blocks in the buffer
[Fri Apr 20 17:23:44 2012] main profiler warning: 0 empty blocks in the buffer
[Fri Apr 20 17:23:45 2012] main profiler warning: 0 empty blocks in the buffer
GPS time jumped from 1019002442 to 1019003041
FATAL: exception not rethrown
FATAL: exception not rethrown
FATAL: exception not rethrown

and the network seemed like it started to get really slow.  I wasn't able to figure out what was going on, so I shut the frame writing off again.  I'll have to work with Rolf on that next week.

Another big problem is the workstation application upgrades.  The NDS protocol version has been incremented, which means that all the NDS client applications have to be upgraded.  The new dataviewer is working fine (on pianosa), but dtt is not:

controls@pianosa:~ 0$ diaggui
diaggui: symbol lookup error: /ligo/apps/linux-x86_64/gds-2.15.1/lib/libligogui.so.0: undefined symbol: _ZN18TGScrollBarElement11ShowMembersER16TMemberInspector
controls@pianosa:~ 127$ 

I don't know what's going on here.  All the library paths are ok.  Hopefully I'll be able to figure this out soon.  The old version of dtt definitely does not work with the new setup.

I might go ahead and upgrade some more of the workstations to Ubuntu in the next couple of days as well, so everything is more on the same page.

I also tried to cleanup the front-end boot process, which has it's own problems (models won't auto-start).  I haven't figured that out yet either.  It really needs to just be completely overhauled.

  6554   Sat Apr 21 17:38:19 2012 JamieUpdateCDSdtt, dataviewer working; problem with trend frames

Quote:

Another big problem is the workstation application upgrades.  The NDS protocol version has been incremented, which means that all the NDS client applications have to be upgraded.  The new dataviewer is working fine (on pianosa), but dtt is not:

controls@pianosa:~ 0$ diaggui
diaggui: symbol lookup error: /ligo/apps/linux-x86_64/gds-2.15.1/lib/libligogui.so.0: undefined symbol: _ZN18TGScrollBarElement11ShowMembersER16TMemberInspector
controls@pianosa:~ 127$

 dtt (diaggui) and dataviewer are now working on pianosa to retrieve realtime data and past data from DQ channels.

Unfortunately it looks like there may be a problem with trend data, though.  If I try to retrieve 1 minute of "full data" with dataviewer for channel C1:SUS-ITMX_SUSPOS_IN1_DQ around GPS 1019089138 everything works fine:

Connecting to NDS Server fb (TCP port 8088)
Connecting.... done
T0=12-04-01-00-17-45; Length=60 (s)
60 seconds of data displayed

but if I specify any trend data (second, minute, etc.) I get the following:

Connecting to NDS Server fb (TCP port 8088)
Connecting.... done
Server error 18: trend data is not available
datasrv: DataWriteTrend failed in daq_send().
T0=12-04-01-00-17-45; Length=60 (s)
No data output.

Alex warned me that this might have happened when I was trying to test the new daqd without first turning off frame writing.

I'm not sure how to check the integrity of the frames, though.  Hopefully they can help sort this out on Monday.

  6555   Sat Apr 21 20:40:28 2012 JamieUpdateCDSframebuilder frame writing working again

Quote:

A big one is that the framebuilder daqd started going haywire when I told it to start writing frames.  After restart the logs started showing this:

[Fri Apr 20 17:23:40 2012] main profiler warning: 0 empty blocks in the buffer
[Fri Apr 20 17:23:41 2012] main profiler warning: 0 empty blocks in the buffer
[Fri Apr 20 17:23:42 2012] main profiler warning: 0 empty blocks in the buffer
[Fri Apr 20 17:23:43 2012] main profiler warning: 0 empty blocks in the buffer
[Fri Apr 20 17:23:44 2012] main profiler warning: 0 empty blocks in the buffer
[Fri Apr 20 17:23:45 2012] main profiler warning: 0 empty blocks in the buffer
GPS time jumped from 1019002442 to 1019003041
FATAL: exception not rethrown
FATAL: exception not rethrown
FATAL: exception not rethrown

and the network seemed like it started to get really slow.  I wasn't able to figure out what was going on, so I shut the frame writing off again.  I'll have to work with Rolf on that next week.

So the framebuilder/daqd frame writing issue seems to have been a transient one.  Alex said he tried enabling frame writing manually and it worked fine, so I tried re-enabling the frame writing config lines and sure enough it worked fine.  So it's a mystery.  Alex said the "main profiler warning" lines tend to show up when data is backed up in the buffer, although he didn't explain why exactly that would have been the issue here.

daqdrc frame writing directives:

start frame-saver;
sync frame-saver;
start trender;
start trend-frame-saver;
sync trend-frame-saver;
start minute-trend-frame-saver;
sync minute-trend-frame-saver;
start raw_minute_trend_saver;
  6556   Sat Apr 21 21:10:34 2012 JamieUpdateCDStrend frame issue partially resolved

Quote:

Unfortunately it looks like there may be a problem with trend data, though.  If I try to retrieve 1 minute of "full data" with dataviewer for channel C1:SUS-ITMX_SUSPOS_IN1_DQ around GPS 1019089138 everything works fine:

Connecting to NDS Server fb (TCP port 8088)
Connecting.... done
T0=12-04-01-00-17-45; Length=60 (s)
60 seconds of data displayed

but if I specify any trend data (second, minute, etc.) I get the following:

Connecting to NDS Server fb (TCP port 8088)
Connecting.... done
Server error 18: trend data is not available
datasrv: DataWriteTrend failed in daq_send().
T0=12-04-01-00-17-45; Length=60 (s)
No data output.

Alex warned me that this might have happened when I was trying to test the new daqd without first turning off frame writing.

Alex told me that the "trend data is not available" message comes from the "trender" functionality not being enabled in daqd.  After re-enabling it (see #6555) minute trend data was available again.  However, there still seems to be an issue with second trends.  When I try to retrieve second trend data from dataviewer for which minute trend data *is* available I get the following error message:

Connecting to NDS Server fb (TCP port 8088)
Connecting.... done
No data found

read(); errno=9
read(); errno=9
T0=12-04-04-02-14-29; Length=3600 (s)
No data output.

Awaiting more help from Alex...

  6560   Tue Apr 24 14:30:08 2012 JamieUpdateCDSIntroducing: rtcds, a utility for interacting with the CDS RTS/RCG

The new rtcds utility

I have written a new utility for interacting with the CDS RTS/RCG system: "rtcds".  It should be available on all workstations and front-end machines, but certain commands are restricted to run on certain front end machines (build, start, stop, etc.).  Here's the help:

controls@c1lsc ~ 0$ rtcds help
Usage: rtcds <command> [arg]

Available commands:

  build|make SYS      build model
  install SYS         install model
  start SYS|all       start model
  restart SYS|all     restart running model
  stop|kill SYS|all   stop running model
  list                list all models for current host

controls@c1lsc ~ 0$ 

Please use this new utility from now on when interacting with RTS.

I'll be improving and expanding it as we go.   Please let me know if you encounter any problems.

 

  6561   Tue Apr 24 14:35:37 2012 JamieUpdateCDSlimited second trend lookback

Quote:

Alex told me that the "trend data is not available" message comes from the "trender" functionality not being enabled in daqd.  After re-enabling it (see #6555) minute trend data was available again.  However, there still seems to be an issue with second trends.  When I try to retrieve second trend data from dataviewer for which minute trend data *is* available I get the following error message:

Connecting to NDS Server fb (TCP port 8088)
Connecting.... done
No data found

read(); errno=9
read(); errno=9
T0=12-04-04-02-14-29; Length=3600 (s)
No data output.

Awaiting more help from Alex...

It looks like this is actually just a limit of how long we're saving the second trends, which is just not that long.  I'll look into extending the second trend look-back.

  6562   Tue Apr 24 14:55:00 2012 JamieUpdateCDScds code paths (rtscore, userapps) have moved

NOTE: Unless you really care about what's going on under the hood, please ignore this entire post and go here: USE THE NEW RTCDS UTILITY

Quote:

  • the userapps directory is in a new place (/opt/rtcds/userapps).  Not everything in the old location was checked into the repository, so we need to check to make sure everything that needs to be is checked in, and that all the models are running the right code.

 An important new aspect of this upgrade is that we have some new directories and some of the code paths have moved to correspond with the "standard" LIGO CDS filesystem hierarchy:

  • rtscore:      /opt/rtcds/rtscore/release       -->  RTS RCG source code (svn: adcLigoRTS)
  • userapps: /opt/rtcds/userapps/release  -->  CDS userapps source for models, RCG C code, medm screens, scripts, etc (svn: cds_user_apps)
  • rtbuild:       /opt/rtcds/caltech/c1/rtbuild    -->  RCG build directory

All work should be done in the "userapps" directory, and all builds should be done in the build directory.  Some important points:

WARNING: DO NOT MODIFY ANYTHING IN RTSCORE

This is important.  The rtscore directory is now just where the RCG source code is stored.  WE NO LONGER BUILD MODELS IN THE RTS SOURCE  Please use the rtbuild directory instead.

NO MORE MODEL/CODE SYMLINKS

You don't need to link you model or code anywhere to get it to compile.  The RCG now uses proper search paths to source in the RCG_LIB_PATH to find the needed source.  This has been configured by the admin, so as long as you put your code in the right place it should be found.

ALL CODE/MODELS/ETC GO IN USERAPPS

All RTS code is now stored ENTIRELY in the userapps directory (e.g. /opt/rtcds/userapps/release/isc/c1/models/c1lsc.mdl).  This is more-or-less the same as before, except that symlinking is no longer necessary.  I have placed a symlink at the old location for convenience.

BUILD ALL MODELS IN RTSCORE

You can run "make c1lsc" in the rtbuild directory, just as you used to in the old core directory.  However, don't do that.  USE THE NEW RTCDS UTILITY instead.

  6573   Thu Apr 26 16:35:34 2012 JamieSummaryCDSrosalba now running Ubuntu 10.04

This morning I installed Ubuntu 10.04 on rosalba.  This is the same version that is running on pianosa.  The two machines should be identically configured, although rosalba may be missing some apt-getable packages.

  6574   Thu Apr 26 18:15:59 2012 JamieUpdateCDSpossible issue with mx_stream on front ends

I'm noticing what appears to be occasional failures of mx_stream on the front end machines.  It doesn't happen that frequently, but I've noticed it a couple of times already since the upgrade.

The symptom is that the DC Status goes to "0xbad" (red) and the "FE NET" goes red for all models on a given front end.

The solution seems to be restarting mx_stream on the given front end:    sudo  /etc/init.d/mx_stream restart"

There is nothing in the mx_stream log:

 controls@c1sus ~ 0$ cat /opt/rtcds/caltech/c1/target/fb/mx_stream_logs/c1sus.log 
 c1x02
 c1sus
 c1mcs
 c1rfm
 c1pem
 mmapped address is 0x7f43740ec000
 mapped at 0x7f43740ec000
 mmapped address is 0x7f43700ec000
 mapped at 0x7f43700ec000
 mmapped address is 0x7f436c0ec000
 mapped at 0x7f436c0ec000
 mmapped address is 0x7f43680ec000
 mapped at 0x7f43680ec000
 mmapped address is 0x7f43640ec000
 mapped at 0x7f43640ec000
 send len = 263596
 Connection Made

but I do see some funny messages in the front end dmesg:

 [200341.317912] DXH Adapter 0 : Heartbeat alive-check for node=12 failed (cnt=8387 state=0x1 deb=0 val=0).
 [200341.318670] DXH Adapter 0 : Session for node 12 is disabled - Status = 0x5
 [200341.319062] Session callback reason=1 status=5 target_node=12
 [200341.319069] Session callback reason=3 status=0 target_node=12
 [200341.359534] (map_table_check_access:752):my id 1 ->  remote id 2 : entry was valid - is now tentatively valid
 [200341.859584] DXH Adapter 0 : Probe failure for node=12 - disabling session probeStatus=0x40000f02
 [200341.860335] DXH Adapter 0 : Session for node 12 is disabled - Status = 0x3
 [200341.860728] Session callback reason=1 status=3 target_node=12
 [200374.006111] DXH Adapter 0 : Set reachable remote node list.
 [200409.020670] DXH Adapter 0 : Set reachable remote node list.
 [200409.021076] DXH Adapter 0 : Session for node 12 is deleted - Status = 0x0
 [200409.021468] Session callback reason=5 status=0 target_node=12
 [200412.362824] (map_table_insert:648):** successfully inserted **(valid unicast) inst 0 node 1->0 fwd 0 fwd_tp 4 egress 0
 [200418.025994] (map_table_check_access:752):my id 1 ->  remote id 0 : entry was valid - is now invalid
 [200418.025998] (map_table_insert:648):** successfully inserted **(valid unicast) inst 0 node 1->2 fwd 0 fwd_tp 4 egress 0
 [200421.743916] Session callback reason=0 status=0 target_node=12
 [200422.073776] DXH Adapter 0 : Set reachable remote node list.
 [200422.342446] Session callback reason=7 status=0 target_node=12
 [200422.342454] DXH Adapter 0 : Session for node 12 is ok.

I'm awaiting feedback from experts.

 

ELOG V3.1.3-