40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
 40m Log, Page 49 of 339 Not logged in
ID Date Author Type Category Subject
11412   Tue Jul 14 16:51:01 2015 JamieSummaryCDSCDS upgrade: problem is not disk access

I think I have now determined once and for all that the daqd problems are NOT due to disk IO contention.

I have mounted a tmpfs at /frames/tmp and have told daqd to write frames there.  The tmpfs exists entirely in RAM.  There is essentially zero IO wait for such a filesystem, so daqd should never have trouble writing out the frames.

But yet daqd continues to fail with the "0 empty blocks in the buffer" warnings.  I've been down a rabbit hole.

11413   Tue Jul 14 17:06:00 2015 jamieUpdateCDSrunning test on daqd, please leave undisturbed

I have reverted daqd to the previous configuration, so that it's writing frames to disk.  It's still showing instability.

11415   Wed Jul 15 13:19:14 2015 JamieSummaryCDSCDS upgrade: reducing mx end-points as last ditch effort

I tried one last thing, suggested by Keith and Gerrit.  I tried reducing the number of mx end-points on fb to zero, which should reduce the total number of fb threads, in the hope that the extra threads were causing the chokes.

On Tue, Jul 14 2015, Keith Thorne <kthorne@ligo-la.caltech.edu> wrote:
> Assumptions
>  1) Before the upgrade (from RCG 2.6?), the DAQ had been working, reading out front-ends, writing frames trends
>  2) In upgrading to RCG 2.9, the mx start-up on the frame builder was modified to use multiple end-points
> (i.e. /etc/init.d/mx has a line like
> # 1 10G card - X2
> MX_MODULE_PARAMS="mx_max_instance=1 mx_max_endpoints=16 $MX_MODULE_PARAMS" > (This can be confirmed by the daqd log file with lines at the top like > 263596 > MX has 16 maximum end-points configured > 2 MX NICs available > [Fri Jul 10 16:12:50 2015] ->4: set thread_stack_size=10240 > [Fri Jul 10 16:12:50 2015] new threads will be created with the stack of size 10 > 240K > > If this is the case, the problem may be that the additional thread on the frame-builder (one per end-point) take up so many slots on the 8-core > frame-builder that they interrupt the frame-writing thread, thus preventing the main buffer from being emptied. > > One could go back to a single end-point. This only helps keep restart of front-end A from hiccuping DAQ for front-end B. > > You would have to remove code on front-ends (/etc/init.d/mx_stream) that chooses endpoints. i.e. > # find line number in rtsystab. Use that to mx_stream slot on card (0-15) > line_num=grep -v ^# /etc/rtsystab | grep --perl-regexp -n "^${hostname}\s" | se
> d 's/^$$[0-9]*$$:.*/\1/g'
> line_off=$(expr$line_num - 1)
> epnum=$(expr$line_off % 2)
> cnum=$(expr$line_off / 2)
>
>     start-stop-daemon --start --quiet -b -m --pidfile /var/log/mx_stream0.pid --exec /opt/rtcds/tst/x2/target/x2daqdc0/mx_stream -- -e 0 -r "$epnum" -W 0 -w 0 -s "$sys" -d x2daqdc0:$cnum -l /opt/rtcds/tst/x2/target/x2daqdc0/mx_stream_logs/$hostname.log

As per Keith's suggestion, I modified the mx startup script to only initialize a single endpoint, and I modified the mx_stream startup to point them all to endpoint 0.  I verified that indeed daqd was a single MX end-point:

MX has 1 maximum end-points configured

It didn't help.  After 5-10 minutes daqd crashes with the same "0 empty blocks" messages.

I should also mention that I'm pretty sure the start of these messages does not seem coincident with any frame writing to disk; further evidence that it's not a disk IO issue.

Keith is looking at the system now, so we if he can see anything obvious.  If not, I will start reverting to 2.5.

11417   Wed Jul 15 18:19:12 2015 JamieSummaryCDSCDS upgrade: tentative stabilty?

Keith Thorne provided his eyes on the situation today and had some suggestions that might have helped things

Reorder ini file list in master file.  Apparently the EDCU.ini file (C0EDCU.ini in our case), which describes EPICS subscriptions to be recorded by the daq, now has to be specified *after* all other front end ini files.  It's unclear why, but it has something to do with RTS 2.8 which changed all slow channels to be transported over the mx network.  This alone did not fix the problem, though.

Increase second trend frame size.  Interestingly, this might have been the key.  The second trend frame size was increased to 600 seconds:

start trender 600 60;

The two numbers are the lengths in seconds for the second and minute trends respectively.  They had been set to "60 60", but Keith suggested that longer second trend frames are better, for whatever reason.  It seems he may be right, given that daqd has been running and writing full and trend frames for 1.5 hours now without issue.

As I'm writing this, though, the daqd just crashed again.  I note, though, that it's right after the hour, and immediately following writing out a one hour minute trend file.  We've been seeing these hour, on the hour, crashes of daqd for quite a while now.  So maybe this is nothing new.  I've actually been wondering if the hourly daqd crashes were associated with writing out the minute trend frames, and I think we might have more evidence to point to that.

If increasing the size of the second trend frames from 60 seconds (35M) to 600 seconds (70M) made a difference in stability, could there be an issue since writing out files that are smaller than some value?  The full frames are 60M, and the minute trends are 35M.

11427   Sat Jul 18 15:37:19 2015 JamieSummaryCDSCDS upgrade: current status

So it appears we have found a semi-stable configuration for the DAQ system post upgrade:

Here are the issues:

## daqd

dadq is running mostly stably for the moment, although it still crashes at the top of every hour (see below).  Here are some relevant points of about the current configuration:

• recording data from only a subset of front-ends, to reduce the overall load:
• c1x01
• c1scx
• c1x02
• c1sus
• c1mcs
• c1pem
• c1x04
• c1lsc
• c1ass
• c1x05
• c1scy
• 16 second main buffer:
start main 16;
• trend lengths: second: 600, minute: 60
start trender 600 60;
• writing to frames:
• full
• second
• minute
• (NOT raw minute trends)
• frame compression ON

This elliminates most of the random daqd crashing.  However, daqd still crashes at the top of every hour after writing out the minute trend frame. Still unclear what the issue is, but Keith is investigating.  In some sense this is no worse that where we were before the upgrade, since daqd was also crashing hourly then.  It's still crappy, though, so hopefully we'll figure something out.

The inittab on fb automatically restarts daqd after it crashes, and monit on all of the front ends automatically restarts the mx_stream processes.

## front ends

The front end modules are mostly running fine.

One issue is that the execution times seem to have increased a bit, which is problematic for models that were already on the hairy edge.  For instance, the rough aversage for c1sus has some from ~48us to 50us.  This is most problematic for c1cal, which is now running at ~66us out of 60, which is obviously untenable.  We'll need to reduce the load in c1cal somehow.

All other front end models seem to be working fine, but a full test is still needed.

There was an issue with the DACs on c1sus, but I rebooted and everything came up fine, optics are now damped:

11428   Sat Jul 18 16:03:00 2015 jamieUpdateCDSEPICS freezes persist

I notice that the periodic EPICS freezes persist.  They last for 5-10 seconds.  MEDM completely freezes up, but then it comes back.

The sites have been noticing similar issues on a less dramatic scale.  Maybe we can learn from whatever they figure out.

fb has been loading a 'symmetricom' kernel module, presumably because it was once being used to help with timing.  It's no longer needed, so I unloaded it and commented out the lines that loaded it in /etc/conf.d/local.start.

11435   Wed Jul 22 14:10:04 2015 ericqUpdateCDSRCG Diagnostics

Now that we've seemed to landed on a working configuration, I've re-ran the tests first described in ELOG 11347. I've also compared the real filtering of filter modules to their designs.

TL;DR: No adverse, or even observable, differences have been witnessed.

As a reminder: In c1tst, there are three loops, called LOOPA, LOOPB, and LOOPC.

• LOOPA is a filter module feeding back onto its own input, with a unit time delay block
• LOOPB is a FM whose output goes to the DAC. In meatspace, the AI output is hooked up directly to an AA chassis input, and back to the FM in CDS
• LOOPC includes RFM connections to c1rfm and back again.

Here are the loop delay results, which measure the slope of the phase response of the OLTF. For the purely digital loops (A, C), we know the number of cycles we expect to compare the delay to.

At this time, I haven't done the adding up of cycles, zero-order-holds, etc. to get the delay we expect from the analog loop (B), so I've just looked at whether it changed at all.

Anyways, I've attached the code that analyzes data from DTT-exported text files containing the continuous phase data from the loop measurements.

Before:

Single Model loop cycles: 1.0000000+/-0.0000006, disparity: -0.00+/-0.25 nsec 2 Model RFM loop cycles: 1.9999999+/-0.0000013, disparity: 0.0+/-0.5 nsec ADC->DAC loop time: 338.2+/-0.4 usec

After:

Single Model loop cycles: 0.9999999+/-0.0000008, disparity: 0.02+/-0.29 nsec 2 Model RFM loop cycles: 2.0000001+/-0.0000011, disparity: -0.0+/-0.4 nsec ADC->DAC loop time: 338.18+/-0.35 usec

So, the digital loops take the number of cycles we expect, and there are no real differences after the upgrade.

Additionally, for all three loops, I created a simple 100:10 filter in foton, and injected broadband noise with awggui, to measure the real TF applied by the FM code. I want to turn this whole process into a single script that will switch the filter on and off, read the foton file, and compare the measured TF to the ideal shape.

In our system, before and after the upgrade, all three loops showed no appreciable difference from the designed filter shape, other than some tiny uptick in phase when approaching the nyquist frequency. This may be due to the fact that I'm comparing to the ideal analog filter, rather than what a 16kHz digital filter looks like.

What I've plotted below is the devitation from the ideal zpk(100Hz, 10Hz, 0.1) frequency response, i.e. Hmeasured / Hideal. The code to do this analysis is also attached, it estimates the TF by dividing the CSD of the filter input and output by the PSD of the input. The single worst coherence in any bin of all the measurements is 0.997, so I didn't really bother to estimate the error of the TF estimate.

Attachment 1: filterShapes.png
Attachment 2: CDS_diag_codes.zip
11461   Wed Jul 29 21:40:39 2015 KojiSummaryCDSStatus of the frame data syncing

The trend data hasn't been synced with LDAS since Jul 27 5AM local.

40m:

controls@nodus|minute > pwd /frames/trend/minute controls@nodus|minute > ls -l 11222 | tail total 590432 -rw-r--r-- 1 controls controls 35758781 Jul 29 11:59 C-M-1122228000-3600.gwf -rw-r--r-- 1 controls controls 35501472 Jul 29 12:59 C-M-1122231600-3600.gwf -rw-r--r-- 1 controls controls 35296271 Jul 29 13:59 C-M-1122235200-3600.gwf -rw-r--r-- 1 controls controls 35459901 Jul 29 14:59 C-M-1122238800-3600.gwf -rw-r--r-- 1 controls controls 35550346 Jul 29 15:59 C-M-1122242400-3600.gwf -rw-r--r-- 1 controls controls 35699944 Jul 29 16:59 C-M-1122246000-3600.gwf -rw-r--r-- 1 controls controls 35549480 Jul 29 17:59 C-M-1122249600-3600.gwf -rw-r--r-- 1 controls controls 35481070 Jul 29 18:59 C-M-1122253200-3600.gwf -rw-r--r-- 1 controls controls 35518238 Jul 29 19:59 C-M-1122256800-3600.gwf -rw-r--r-- 1 controls controls 35514930 Jul 29 20:59 C-M-1122260400-3600.gwf

LDAS Minute trend:

[koji.arai@ldas-pcdev3 C-M-11]$pwd /archive/frames/trend/minute-trend/40m/C-M-11 [koji.arai@ldas-pcdev3 C-M-11]$ ls -l | tail -rw-r--r-- 1 1001 1001 35488497 Jul 26 19:59 C-M-1121997600-3600.gwf -rw-r--r-- 1 1001 1001 35477333 Jul 26 21:00 C-M-1122001200-3600.gwf -rw-r--r-- 1 1001 1001 35498259 Jul 26 21:59 C-M-1122004800-3600.gwf -rw-r--r-- 1 1001 1001 35509729 Jul 26 22:59 C-M-1122008400-3600.gwf -rw-r--r-- 1 1001 1001 35472432 Jul 26 23:59 C-M-1122012000-3600.gwf -rw-r--r-- 1 1001 1001 35472230 Jul 27 00:59 C-M-1122015600-3600.gwf -rw-r--r-- 1 1001 1001 35468199 Jul 27 01:59 C-M-1122019200-3600.gwf -rw-r--r-- 1 1001 1001 35461729 Jul 27 02:59 C-M-1122022800-3600.gwf -rw-r--r-- 1 1001 1001 35486755 Jul 27 03:59 C-M-1122026400-3600.gwf -rw-r--r-- 1 1001 1001 35467084 Jul 27 04:59 C-M-1122030000-3600.gwf

11465   Thu Jul 30 11:47:54 2015 KojiSummaryCDSStatus of the frame data syncing

Today it was synced at 5AM but that was all.

40m:

controls@nodus|minute > pwd /frames/trend/minute controls@nodus|minute > ls -l 11222|tail -rw-r--r-- 1 controls controls 35521183 Jul 29 21:59 C-M-1122264000-3600.gwf -rw-r--r-- 1 controls controls 35509281 Jul 29 22:59 C-M-1122267600-3600.gwf -rw-r--r-- 1 controls controls 35511705 Jul 29 23:59 C-M-1122271200-3600.gwf -rw-r--r-- 1 controls controls 35809690 Jul 30 00:59 C-M-1122274800-3600.gwf -rw-r--r-- 1 controls controls 35752082 Jul 30 01:59 C-M-1122278400-3600.gwf -rw-r--r-- 1 controls controls 35927246 Jul 30 02:59 C-M-1122282000-3600.gwf -rw-r--r-- 1 controls controls 35775843 Jul 30 03:59 C-M-1122285600-3600.gwf -rw-r--r-- 1 controls controls 35648583 Jul 30 04:59 C-M-1122289200-3600.gwf -rw-r--r-- 1 controls controls 35643898 Jul 30 05:59 C-M-1122292800-3600.gwf -rw-r--r-- 1 controls controls 35704049 Jul 30 06:59 C-M-1122296400-3600.gwf controls@nodus|minute > ls -l 11223|tail total 139616 -rw-r--r-- 1 controls controls 35696854 Jul 30 08:02 C-M-1122300000-3600.gwf -rw-r--r-- 1 controls controls 35675136 Jul 30 08:59 C-M-1122303600-3600.gwf -rw-r--r-- 1 controls controls 35701754 Jul 30 09:59 C-M-1122307200-3600.gwf -rw-r--r-- 1 controls controls 35718038 Jul 30 10:59 C-M-1122310800-3600.gwf

LDAS Minute trend:

[koji.arai@ldas-pcdev3 C-M-11]$pwd /archive/frames/trend/minute-trend/40m/C-M-11 [koji.arai@ldas-pcdev3 C-M-11]$ ls -l |tail -rw-r--r-- 1 1001 1001 35518238 Jul 29 19:59 C-M-1122256800-3600.gwf -rw-r--r-- 1 1001 1001 35514930 Jul 29 20:59 C-M-1122260400-3600.gwf -rw-r--r-- 1 1001 1001 35521183 Jul 29 21:59 C-M-1122264000-3600.gwf -rw-r--r-- 1 1001 1001 35509281 Jul 29 22:59 C-M-1122267600-3600.gwf -rw-r--r-- 1 1001 1001 35511705 Jul 29 23:59 C-M-1122271200-3600.gwf -rw-r--r-- 1 1001 1001 35809690 Jul 30 00:59 C-M-1122274800-3600.gwf -rw-r--r-- 1 1001 1001 35752082 Jul 30 01:59 C-M-1122278400-3600.gwf -rw-r--r-- 1 1001 1001 35927246 Jul 30 02:59 C-M-1122282000-3600.gwf -rw-r--r-- 1 1001 1001 35775843 Jul 30 03:59 C-M-1122285600-3600.gwf -rw-r--r-- 1 1001 1001 35648583 Jul 30 04:59 C-M-1122289200-3600.gwf

11479   Wed Aug 5 10:56:07 2015 ericqUpdateCDSMany models crashed

Last night around 1AM, many of the the frontend models crashed due to an ADC timeout. (But none of the IOPs, and all the c1lsc models were fine.)

First, on c1sus (Wed Aug  5 00:56:46 PDT 2015)
[1502036.695639] c1rfm: ADC TIMEOUT 0 46281 9 46153 [1502036.945259] c1pem: ADC TIMEOUT 0 56631 55 56695 [1502036.965969] c1mcs: ADC TIMEOUT 1 56706 2 56770 [1502036.965971] c1sus: ADC TIMEOUT 1 56706 2 56770

Then, simultaneously on c1ioo, c1iscex, and c1iscey. (Wed Aug  5 01:10:53 PDT 2015)

[1509007.391124] c1ioo: ADC TIMEOUT 0 46329 57 46201 [1509007.702792] c1als: ADC TIMEOUT 1 63128 24 63192

[2448096.252002] c1scx: ADC TIMEOUT 0 46293 21 46165 [2448096.258001] c1asx: ADC TIMEOUT 0 46669 13 46541

[1674945.583003] c1scy: ADC TIMEOUT 0 46297 25 46169 [1674945.685002] c1tst: ADC TIMEOUT 0 52993 1 52865

I'm still working on getting things back up and running. Just restarting models wasn't working, so I'm trying some soft reboots...

UPDATE: A soft reboot of all frontends seems to have worked,

Attachment 1: crashes.png
11551   Tue Sep 1 02:44:44 2015 KojiSummaryCDSc1oaf, c1mcs modified for the IMC angular FF

[Koji, Ignacio]

In order to allow us to work on the IMC angular FF, we made the signal paths from PEM to MC SUSs.
In fact, there already were the paths from c1pem to c1oaf. So, the new paths were made from c1oaf to c1mcs. (Attachment 1~3)

After some debugging those two models started running. The additional cost of the processing time is insignificant.
FB was restarted to accomodate the change.

Once the modification of the models was completed, the OAF screens were modified. It seemed that the Kissel button
for the output matrix haven't been updated for the PRM ASC implementation. This was fixed as the button was updated this time.
In addition, the button for the FM matrix was also made and pasted.

Attachment 1: c1oaf_screenshot1.png
Attachment 2: c1oaf_screenshot2.png
Attachment 3: c1mcs_screenshot.png
Attachment 4: OAF_MEDM1.png
Attachment 5: OAF_MEDM2.png
11564   Thu Sep 3 02:12:08 2015 ranaUpdateCDSSimulink Webview updated

Back in 2011, JoeB wrote some entries on how to automatically update the Simulink webview stuff.

Somehow, the cron broke down over the years. I reran the matlab file by hand today and it worked fine, so now you can see the up to date models using the internet.

https://nodus.ligo.caltech.edu:30889/FE/

11565   Thu Sep 3 02:30:46 2015 ranaUpdateCDSc1cal time reduced by deleting LSC sensing matrix

I experimented with removing somethings here and there to reduce the c1cal runtime. Eventually I deleted the LSC Sensing Matrix from it.

• Ever since the upgrade, the c1cal has gone from 60 to 68 usec run time, so its constantly over.
• Back when Jenne set it up back in Oct 2013, it was running at 39 usec.
• The purely CAL stuff had some wacko impossible filters in it: please don't try to invert the AA filters making a filter with multiple zeros in it Masayuki.
• I removed the weird / impossible / unstable filters.
• I'm guessing that the sensing matrix code had some hand-rolled C-code blocks which are just not very speedy, so we need to rethink how to do the lockin / oscillator stuff so that it doesn't overload the CPU. I bet its somewhere in the weird way the I/Q signals were untangled. My suggestion is to change this stuff to use the standard CDS lockin modules and just record the I/Q stuff. We don't need to try to make magnitude and phase in the front end.

After removing sensing matrix, the run time is now down to 6 usec.

11567   Thu Sep 3 13:25:40 2015 ranaUpdateCDSSimulink Webview updated

added the cron script for this to megatron to run at 8:44 AM each morning. Here's the new MegaCron attached :-()-

** it takes ~13 minutes to complete on megatron

Attachment 1: crontab_150903.rtf
MAILTO=ericq@caltech.edu

# m h  dom mon dow   command
#0 */1 * * * bash /home/controls/public_html/summary/bin/c1_summary_page.sh > /dev/null 2>&1
#15 5 * * * /ligo/apps/nds2/nds2-megatron/test-restart

# MEDM Screen caps for the webpage
2,13,25,37,49 * * * * /cvs/cds/project/statScreen/bin/cronjob.sh

# op340m transplants -ericq

... 18 more lines ...
11570   Fri Sep 4 00:58:29 2015 ranaUpdateCDSsoldering the Generic Pentek interface board

Q and Ignacio were taking a second look at the Pentek interface board which we're using to acquire the POP QPD, ALS trans, and MCF/MCL channels. It has a differential intput, two jumper able whitening stages inside and some low pass filtering.

I noticed that each channel has a 1.5 kHz pole associate with each 150:15 whitening stage. It also has 2 2nd order Butterworth low pass at 800 Hz. Also there's a RF filter on the front end. We don't need all that low passing, so I started modifying the filters. Tonight I moved the 800 Hz poles to 8000 Hz. Tomorrow we'll move the others if Steve can find us enough (> 16) 1 nF SMD caps (1206 NPO).

After this those signals ought to have less phase lag and more signal above 1 kHz. Since the ADC is running at 64 kHz, we don't need any analog filtering below 8 kHz.

11573   Fri Sep 4 08:00:49 2015 IgnacioUpdateCDSRC low pass circuit (1s stage) of Pentek board

Here is the transfer function and cutoff frequency (pole) of the first stage low pass circuit of the Pentek whitening board.

Circuit:

R1 = R2 = 49.9 Ohm, R3 = 50 kOhm, C = 0.01uF. Given a differential voltage of 30 volts, the voltage across the 50k resistor should be 29.93 volts.

Transfer Function:

Given by,

$H(s) = \frac{1.002\text{e}06}{s+1.002\text{e}06}$

So low pass RC filter with one pole at 1 MHz.

I have updated the schematic, up to the changes mentioned by Rana plus some notes, see the DCC link here: [PLACEHOLDER]

I should have done this by hand...

Attachment 1: circuit.pdf
11574   Fri Sep 4 09:23:32 2015 IgnacioUpdateCDSModified Pentek schematic

Attached is the modifed Pentek whitening board schematic. It includes the yet to be installed 1nF capacitors  and comments.

Attachment 1: schematic.pdf
11579   Fri Sep 4 20:42:14 2015 gautam, ranaUpdateCDSCheckout of the Wenzel dividers

Some years ago I bought some dividers from Wenzel. For each arm, we have x256 and a x64 divider. Wired in series, that means we can divide each IR beat by 2^14.

The highest frequency we can read in our digital system is ~8100 Hz. This corresponds to an RF frequency of ~132 MHz which as much as the BBPD could go, but less than the fiber PDs.

Today we checked them out:

1. They run on +15V power.
2. For low RF frequencies (< 40 MHz) the signal level can be as low as -25 dBm.
3. For frequencies up to 130 MHz, the signal should be > 0 dBm.
4. In all cases, we get a square wave going from 0 ~ 2.5 V, so the limiter inside keeps the output amplitude roughly fixed at a high level.
5. When the RF amplitude goes below the minimum, the output gets shaky and eventually drops to 0 V.

Since this seems promising, we're going to make a box on Monday to package both of these. There will one SMA input and output per channel.

Each channel will have a an amplifier since this need not be a low noise channel. The ZKL-1R5 seems like a good choice to me. G=40 dB and +15 dBm output.

Then Gautam will make a frequency counter module in the RCG which can do counting with square waves and not care about the wiggles in the waveform.

I think this ought to do the trick for our Coarse frequency discriminator. Then our Delay Box ought to be able to have a few MHz range and do all of the Fast ALS Carm that we need.

Attachment 1: TEK00000.PNG
Attachment 2: TEK00001.PNG
Attachment 3: TEK00002.PNG
11647   Tue Sep 29 03:14:04 2015 gautamUpdateCDSFrequency divider box

Earlier today, the front panels for the 1U chassis I obtained to house the Wenzel dividers + RF amplifiers arrived, which meant that finally I had everything needed to complete the assembly. Pictures of the finished arrangement attached.

Summary of the arrangement:

• Two identical channels (RF amplifier + /64 divider + /256 divider), one for each arm
• The front panels are anodized, and isolated SMA feedthroughs are used
• Given the large number of units to be supplied with DC power (2 amplifiers + 4 dividers), I chose to use two D1000217 power regulators (the default configuration takes +-18V as input, and outputs regulated +-15V, which was fine for the dividers, but the ZKL-1R5 requires +12V, so I changed the resistor R2 in the schematic from a 10.7K to 8.451K so as to accommodate this).
• The amplifiers and dividers are mounted on a steel plate, which is itself mounted on the chassis via insulating posts.

Testing:

• I first verified the power regulator circuitry without hooking up the amplifiers/dividers - with a multimeter, I verified I was getting +15V and +12V as expected.
• I then connected the amplifiers and dividers, and decided to first check the behaviour of each channel using the Fluke 6061 RF function generator and an oscilloscope. One of the channels (X-arm in the current configuration) worked fine - I got a 0-2.5V square wave as the output for input signals as low as -38dBm at 130MHz (consistent with out earlier observations).
• The Y-arm channel however did not give me any output. In order to debug the problem, I decided to check the output after the amplifier first. The amplifier does not seem to be working for this channel - I get the same amplitude at the output as at the input. I verified that the correct DC power voltage of +12V was being supplied with a multimeter, but I am not sure how to debug this further. The amplifier is basically straight out of the box, and as far as I can tell, I have not done anything to damage it, as this was the first time I am connecting it to anything, and I repeated the same steps on the Y-arm as the X-arm, which seems to work alright.
• The rest of the Y-arm signal chain was verified to be working by bypassing the amplifier stage (the attached photographs show the box in this configuration. There seems to be no issues with the divider part of the signal chain.

Once I figure out the problem with this amplifier/replace it, the box is ready to be installed.

Attachment 1: IMG_0014.JPG
Attachment 2: IMG_0015.JPG
11661   Sun Oct 4 12:07:11 2015 jamieConfigurationCDSCSD network tests in progress

I'm about to start conducting some tests on the CDS network.  Things will probably be offline for a bit.  Will post when things are back to normal.

11663   Sun Oct 4 14:23:42 2015 jamieConfigurationCDSCSD network test complete

I've finished, for now, the CDS network tests that I was conducting.  Everything should be back to normal.

What I did:

I wanted to see if I could make the EPICS glitches we've been seeing go away if I unplugged everything from the CDS martian switch in 1X6 except for:

• fb
• fb1
• chiara
• all the front end machines

What I unplugged were things like megatron, nodus, the slow computers, etc.  The control room workstations were still connected, so that I could monitor.

I then used StripTool to plot the output of a front end oscillator that I had set up to generate a 0.1 Hz sine wave (see elog 11662).  The slow sine wave makes it easy to see the glitches, which show up as flatlines in the trace.

More tests are needed, but there was evidence that unplugging all the extra stuff from the switch did make the EPICS glitches go away.  During the duration of the test I did not see any EPICS glitches.  Once I plugged everything back in, I started to see them again.  However, I'm currently not seeing many glitches (with everything plugged back in) so I'm not sure what that means.  I think more tests are needed.  If unplugging everything did help, we still need to figure out which machine is the culprit.

11665   Sun Oct 4 14:32:49 2015 jamieConfigurationCDSCSD network test complete

Here's an example of the glitches we've been seeing, as seen in the StripTool trace of the front end oscillator:

You can clearly see the glitch at around T = -18.  Obviously during non-glitch times the sine wave is nice and cleanish (there are still the very small discretisation from the EPICS sample times).

11683   Fri Oct 9 19:54:58 2015 gautamUpdateCDSFrequency divider box - installation in 1X2 rack

The new ZKL-1R5 RF amplifier that Steve ordered arrived yesterday. I installed this in the frequency divider box and did a quick check using the Fluke RF signal generator and an oscilloscope to verify that both the X and Y paths were working.

I've now installed the box in the 1X2 rack where the olf "RF amplifiers for ALS and FOL" box used to sit (I swapped that out as I needed the L brackets on that chassis to mount mine, see Attachment #1 for the new layout). The power cable that used to power the old chassis was available, but the connector was of the wrong gender, so I had to switch this out. After verifying that I was getting the correct voltage (+15V), I connected it to the chassis.

I then did a quick check with the Fluke generator to make sure that all was working as expected - Eric had set up some ADC channels for me earlier today in the C1ALS model, and I copied over my frequency counting module from C1TST into C1ALS, and recompiled the model. The RF generator was set to generate a 25MHz signal at -20dBm, which I then split using an RF power splitter between the X and Y arms. I then checked the output using dataviewer - I recovered an output frequency of ~27.64 MHz with a jitter of ~0.02 MHz with a 20Hz low-pass filter in place (see Attachment #2), which looks consistent with the systematic error inherent in the zero-crossing counting algorithm and random fluctuations I had observed in my earlier trials, discussed here. But a more systematic investigation needs to be carried out in this regard. The interfacing between the hardware and software seems to be working alright though. I've left the RF generator near the 1x2 rack for now, though its powered off.

The mode cleaner unlocked quite a few times while I was working but looks stable now.

Attachment 1: IMG_0027.JPG
Attachment 2: time_seris_25MHz.pdf
11684   Mon Oct 12 17:04:02 2015 gautamUpdateCDSFrequency divider box - further tests

I carried out some more tests on the digital frequency counting system today, mainly to see if the actual performance mirrors the expected systematic errors I had calculated here

Setup and measurement details:

I used the Fluke 6061A RF signal generator to output an RF signal at various frequencies, one at a time, between 10 and 70 MHz. I split the signal (at -15 dBm) into two parts, one for the X-channel and one for the Y-channel using a mini-circuits splitter. I then looked at the input signal using testpoints I had set up within the model, to decide what thresholds to set for the Scmitt trigger. Finally, I averaged the outputs of the X and Y channels using z avg -s 10 C1:ALS-FC_X_FREQUENCY_OUT and also looked at the standard deviation as a measure of the fluctuations in the output (these averages were taken after a low-pass filter stage with two poles at 20Hz, chosen arbitrarily).

Results:

• Attachment #1 shows a plot of the measured RF frequency as a function of the frequency set on the Fluke 6061A. The errorbars on this plot are the standard deviations mentioned above.
• Attachment #2 shows a plot of the systematic error (mean measured value - expected value) for the two channels. It is consistent with the predictions of Attachments #3 and #4 in elog 11628 (although I need to change the plots there to a frequency-frequency comparison). This error is due to the inherent limitations of frequency counting using zero crossings, I can't think of a way to get around this).
• I found that a lower threshold of 1800 and an upper threshold of 2200 worked well over this range of frequencies (the output of the Wenzel dividers goes between 0V and 2.5V, and the "zero" level for the digitized signal corresponds to ~2000, as determined by looking at a dataviewer plot of a tetspoint I had set up in my model). Koji suggested taking a look at the raw ADC input signal sampled at 64 kHz but this is not available for c1x03, the machine that c1als runs on.

Attachment 1: calibration.pdf
Attachment 2: systematics.pdf
11687   Tue Oct 13 17:04:54 2015 ericqUpdateCDSCDS things

After some discussion at last week's 40m meeting, I increased the frequency of daqd trying to write out minute trends from hourly to every two hours.

This has eliminated the hourly crashes. daqd still crashes sometimes, but only a few times per day.

However, looking at the oplev summary pages that actually use the minute trends, it looks like they're only sporadically getting succesfully written out.

Also, I was having a lot of problems with the frontends' EPICS processes dying when I would try to update the SDF table. I rebuilt all of the frontends with RCG 2.9.6, which differs from the 2.9.4 that we had been running by SDF bugfixes and an RMS calculation bugfix. The SDF procedures are much more stable now.

I have not yet discovered anything broken by this chage, and the tests I made for the last upgrade were all fine; last weeks tiny DRFPMI lock was achieved after this change.

11690   Wed Oct 14 17:40:50 2015 gautamUpdateCDSFrequency divider box - further tests

Summary:

I carried out some further diagnostics and found some ways in which I could optimize the zero-crossing-counting algorithm, such that the error in the measured frequency is now entirely within the expected range (due to a +-1 clock cycle error in the counting). We can now determine frequencies up to ~60 MHz with less than 1 MHz systematic error and <10 kHz statistical error (fluctuations after the 20 Hz lowpass). This should be sufficient for slow control of the end-laser temperatures.

Details:

The conclusion from my earleir tests was that there was possibly an improvement that could be made to setting the thresholds for the Schmitt trigger stage in the model. In order to investigate this, I wanted to have a look at the 64K sampled raw input to the ADCs. Yesterday Eric helped me edit the appropriate .par file for viewing these channels for c1x03, and for an input frequency of 70MHz (after division, ~4.3 kHz square wave), the signal looked as expected (top left plot, attachment #1). This prompted me to check the counting algorithm again with the help of various test points I had setup within the model. I found that there was a tendency to under-count the number of clock-cycles between zero-crossings by more than 1 clock cycle, due to the way my code was organized. I fixed this and found that the performance improved dramatically, compared to my previous trials. With the revised counting algortihm, there was at most a +-1 clock cycle error in the counting, and the systematic error between the measured and requested RF frequencies is now completely accounted for taking this consideration into account. The origin of this residual error can be understood by looking at the top right plot in Attachment #1 - presumably because of the effects of some downsampling filter, the input signal to the Schmitt trigger isnt a clean square wave (even at 4kHz) - specifically, the time spent in the LOW and HIGH states of the Schmitt trigger can vary between successive zero crossings because of the shape of the input waveform. As a result, there can be a +-1 clock cycle error in the counting process. Attachment #2 shows this - the red and blue lines envelope the measured frequency for the whole range investigated: 10-70MHz. Attachment #3 shows the systematic error as a function of the requested frequency.

If there was some way to bypass the downsampling filter, perhaps the high-frequency performance could be improved a little.

Attachment 1: time_series_input_signals.pdf
Attachment 2: calibration_20151012.pdf
Attachment 3: systematics_20151012.pdf
11699   Mon Oct 19 16:16:01 2015 ericqUpdateCDSC1ALS crashes

Gautam was working on his digital frequency counter stuff when the c1als model crashed. I had trouble bringing it back until I realized that, for reasons unknown to me, the safe.snap file that the model looks for at boot had been deleted. (This file lives in /target/c1als/c1alsepics/burt). I copied over this morning's version from Chiara's local backup.

At the sites, these files are under version control in the userapps svn repository, presumably symlinked into the target directory. We should definitely do something along these lines.

11704   Tue Oct 20 17:36:01 2015 gautamUpdateCDSFrequency counting with moving average

I'm working on setting up a moving-average in the custom C code block that counts the zero crossings to see if this approach is able to mitigate the glitchy frequency readout due to mis-counting by one clock cycle between successive zero crossings. I'm storing an array the size of the moving average window of frequency readouts at each clock cycle, and then taking the arithmetic mean over the window. By keeping a summing variable that updates itself each clock cycle, the actual moving average process isnt very intensive in terms of computational time. The array does take up some memory, but even if I perform the moving average over 1 second with 16384 double precision numbers stored in the array, its still only 130 kB so I don't think it is a concern. Some tests I've been doing while tuning the code suggest that with a moving average over 16384 samples (i.e. 1 second), I can eliminate glitches at the 1Hz level in the frequency readout for frequencies up to 5 kHz (generated digitally using an oscillator block). Some tuning still needs to be done, and the window could possibly be shortened. I also need to take a look at the systematic errors in this revised counting scheme, preferably with an analog source, but this is overall, I think, an improvement.

On a side note, I noticed some strange behaviour while running the cds average command - even though my signal had zero fluctuations, using z avg 10 -s C1:TST-FC_FREQUENCY_OUT gave me a standard deviation of ~1 kHz. I'm not sure what the problem is here, but all the calibration data I took in earlier trials were obtained using this so it would be useful to perform the calibration again.

11709   Fri Oct 23 18:36:48 2015 gautamUpdateCDSFrequency counting - workable setup prepared

I've made quite a few changes in the software as well as the hardware of the digital frequency counting setup.

• The main change on the software side is that the custom C code that counts intervals between successive zero crossings and updates the frequency output now has a moving average capability - the window size is readily changable (by a macro in the first line of the code, which resides at /opt/rtcds/userapps/release/cds/c1/src/countZeroCrossingWindowed.c - however, changing the window size requires that the model be recompiled and restarted), and is currently set to 4096 because based on some empirical trials I did, this seemed to give me the frequency output with the least jitter, and also smaller systematic errors than in my earlier trials described here.
• The filter modules for both the X and Y channels now have 2 pole butterworth low pass filters with poles at 64Hz, 32Hz, 16Hz, 8Hz, 4Hz, 2Hz and 1Hz loaded. Again, based on my empirical trials, a combination of a moving average filter in the C code and the IIR filters after that worked best in terms of reducing the jitter in the frequency readout. I think the fact that the moving average 'spreads' the impulse caused by a glitch in the counting algorithm improves the response of the combination as compared to having only the IIR filters in place.
• The Frequency Counting SIMULINK block has been cleaned up a little - I have removed unnecessary test points I had set up for debugging, and is now a library part called "FC".
• After the experience of having C1ALS crash as noted here, I was doing all my testing in the C1TST model. Having made all the changes above, I reverted to the C1ALS model, which compiled and ran successfully this time.
• On the hardware side, I interchanged the couplers mentioned here - so the 20dB coupler now sits on the X green beat PD, while the 10dB coupler sits on the Y green beat PD. This change was motivated by wanting to test out the digital frequency counting setup by performing an arm scan through an IR resonance using ALS, and we found that the PSL+Y green beat frequency was better behaved than the X+PSL combination.

Arm scan

Eric helped me test the new setup by doing an arm scan through an IR resonance by ramping the ALS offset from -3 to +3 with a ramp time of 45 seconds. The data was acquired with the window size of the moving average set to 4096 clock cycles, and a 2 Hz low pass IIR filter before the frequency readout. Attachment 1 shows a plot of the data, and a fit with a function of the form trans = a/(1+((x-b)/c)^2), where a = normalization, b = center of lorentzian, and c = linewidth (FWHM) of the peak (the fitted parameter values, along with 95% confidence bounds are also quoted on the plot). In terms of the data acquisition, comparing this dataset to one from an earlier scan Eric did (elog11111) suggests that the frequency counting setup is working reasonably well - at any rate, I think the data is a lot cleaner than before implementing the moving average and having a 20Hz lowpass IIR filter. In any case, we plan to repeat this measurement sometime next week during a nighttime locking session. It remains to calculate the arm loss from these numbers analogous to what was done earlier for the X arm.

Calculation of loss:

Fitted linewidth = 10.884 kHz +/- 11Hz (95% C.I.)

FSR of Y arm (from elog 9804) = 3.9468 MHz +/- 1.1 kHz

=> Y arm Finesse = FSR/fitted linewidth =  362.6 +/- 0.5

Total round trip loss = 2*pi/Finesse = 0.0173

Attachment 1: Yscan.pdf
11710   Fri Oct 23 19:27:19 2015 KojiUpdateCDSFrequency counting - workable setup prepared

It is a nice scan. I'm still thinking about the equivalence of the moving average and the FIR low pass that I have mentioned in the meeting.

Anyways:

I'm confused by the plot. The bottom axis says "green beat frequency".
If you scan the IR laser frequency by df, you get 2*df shift of the green beatnote. You need to have this factor of two somewhere.
If you are looking at the IR beat freq, just the label is not correct. (I believe this is the case.)

Accepting the rather-too-low finesse of 363 (nominally 450), the total round trip loss is said to be 0.0173.
If we subtract the front transmission of 1.38% (and ignoring the transmission loss from the ETM),
the round trip loss is 3500ppm. Is this compatible with the following elog?
http://nodus.ligo.caltech.edu:8080/40m/11111
In fact, I'm afraid that the loss number in the above elog 11111 was not correct by a factor of 10.
Then, if so, can we believe this high loss number? (Nominally we expect ~100ppm loss per round trip...)

11712   Sat Oct 24 12:34:43 2015 gautamUpdateCDSFrequency counting - workable setup prepared

Sorry for the confusion - I did mean Green beat frequency, and I had neglected the factor of 2 in my earlier calculations. However, the fit parameter "c" in my fit was actually the half-width at half maximum and not the full width at half maximum. After correcting for both these errors (new fit is Attachment #1, where I have now accounted for the factor of 2, and the X axis is the IR beat frequency), I don't think the numbers change too much. It could be that the frequency counter wasn't reading out the frequency correctly, but looking at a time series plot of the frequency counter readout (Attachment #2), and my earlier trials, I don't think this is the case (38 MHz is a frequency at which I don't expect much systematic error - also, the offset was stepped from -3 to 3 over 45 seconds).

The revised numbers:

Fitted linewidth = 2*c = 10.884 kHz +/- 2 Hz (95% C.I.)

FSR of Y arm (from elog 9804) = 3.9468 MHz +/- 1.1 kHz

=> Y arm Finesse = FSR/fitted linewidth =  362.6 +/- 0.5

Total round trip loss = 2*pi/Finesse = 0.0173

Attachment 1: Yscan.pdf
11723   Fri Oct 30 16:56:04 2015 gautamUpdateCDSis there a problem with the SCHMITTTRIGGER CDS library part?

Over the course of my investigations into the systematic errors in the frequency readout using digital frequency counting, I noticed that my counter variable that keeps track of the number of clock cycles between successive zero crossings was NOT oscillating between 2 values as I would have expected (because of there being a +/- 1 clock cycle difference between successive zero crossings due to the fixed sampling time of 1/16384 seconds), but that there were occassional excursions to values that were +/- 3 clock cycles away. I then checked the output of the SCHMITTTRIGGER CDS library part (which I was using to provide some noise immunity), and noticed that it wasn't triggering on every zero crossing at higher frequencies. I tested this by hooking up a digital oscillator to the SCHMITTTRIGGER part, and looked at its output for different frequencies. The parameters used were as follows:

CLKGAIN: 10000

SCHMITTTRIGGER lower threshold: -1.0

SCHMITTTRIGGER upper threshold: +1.0

I am attaching plots for two frequencies, 3000Hz and 4628Hz (Attachments #1 and #2) . I would have expected a flip in the state of the output of the schmitt trigger between every pair of horizontal red lines in this plot, but at 4628 Hz, it looks like the schmitt trigger isn't catching some of the zero crossings? Come to think of it, I am not even sure why the output of the schmitt trigger takes on any values other than 0 or 1 (could this be an artefact of some sort of interpolation in the visualization of these plots? But this would not affect the conclusion about the schmitt trigger missing some of the zero crossings?)

As an interim measure, I implemented a Schmitt trigger in my C code block - it was just a couple of extra lines anyways - I have designated the schmitt trigger output as a static variable that should hold its value in successive execution cycles, unless it is updated by comparing the input value to the thresholds (code attached for reference). Attachments #3 and #4 show the output of this implementation of a Schmitt trigger at the same two frequencies, and I am seeing the expected flip in the state between successive zero crossings as expected (though I'm still not sure why it takes on values other than 0 and 1?). Anyways this warrants further investigation. An elog regarding the implications of this on the systematic error in the frequency counter readout to follow.

Attachment 1: 3000Hz.pdf
Attachment 2: 4628Hz.pdf
Attachment 3: 3000Hz_software_SCHMITT.pdf
Attachment 4: 4628Hz_software_SCHMITT.pdf
11731   Wed Nov 4 16:12:07 2015 ericqUpdateCDSOPTIMUS

There is a new machine on the martian network: 32 cores and 128GB of RAM. Probably this is more useful for intensive number crunching in MATLAB or whatever as opposed to IFO control. I've set up some of the LIGO analysis tools on it as well.

A successor to Megatron, I dub it: OPTIMUS

11736   Thu Nov 5 03:04:13 2015 gautamUpdateCDSFrequency counting - systematics and further changes

I've made a few more changes to the frequency counting code - these are mostly details and the algorithm is essentially unchanged.

• The C-code block in the simulink model now outputs the number of clock cycles (=1/16384 seconds) rather than the frequency itself. I've kept converting this period to frequency step by taking the reciprocal as the last step of the signal chain, i.e. after the LP filter.
• In the current version of the FC library block, I've disabled the moving average in the custom C code - I've left the functionality in the code, but the window length at the moment is set to 1, which in effect means that there is no moving average. I found that comparable jitters in the output are obtained by using no moving average, and having two 2-pole IIR low-pass filters in series (at 4Hz and 2Hz) as by doing a moving average over 4096 clock cycles and then passing that through a 2Hz IIR lowpass filter (as is to be expected).

Systematic error

The other thing that came up in the meeting last week was this issue of the systematic errors in the measured frequency, and how it was always over-estimating the 'actual' frequency. I've been investigating the origin of this over the last few days, and think I've found an explanation. But first, Attachment #1 shows why there is a systematic error in the first place - because we are counting the period of the input signal in terms of clock cycles, which can only take on discrete, integer values, we expect this number to fluctuate between the two integers bounding the 'true value'. So, if I'm trying to measure an input signal of 3000Hz, I would measure its period as either 5 or 6 clock cycles, while the "true"  value should be 5.4613 clock cycles. In attachment #1, I've plotted the actual measured frequency and the measured frequency if we always undercounted/overcounted to the nearest integer clock cycles, as functions of the requested frequency. So the observed systematic error is consistent with what is to be expected.

The reason why this doesn't average out to zero is shown in Attachment #2. In order to investigate this further, I recorded some additional diagnostic variables. If I were to average the period (in terms of clock cycles - i.e. I look for the peaks in the blue cuve, add them up, and divide by the number of peaks), I find that I can recover the expected period in terms of clock cycles pretty accurately. However, the way the code is set up at the moment, the c code block outputs a value every 1/16384 seconds (red curve) - but this is only updated each time I detect a zero crossing - and as a result, if I average this, I am in effect performing some sort of weighted average that distorts the true ratio of the number of times each integer clock-cycle-period is observed. This is the origin on the systematic error, and is a function of the relative frequency each of the two integer values of the clock-cycle-period occurs, which explains why the systematic error was a function of the requested frequency as seen in Attachment #1, and not a constant offset.

At the frequencies I investigated (10-70MHz in 5MHz steps), the maximum systematic error was ~1%.

Is there a fix?

I've been reading up a bit on the two approaches to frequency counting - direct and reciprocal. My algorithm is the latter, which is generally regarded as the more precise of the two. However, in both these approaches, there is a parameter known as the 'gate-time': this is effectively how long a frequency counter measures for before outputting a value. In the current approach, the gate time is effectively 1/16384 seconds. I would think that it is perhaps possible to eliminate the systematic error by setting the gate time to something like 0.25 seconds, and within the gate time, do an average of the total number of periods measured. Something like 0.25 seconds should be long enough that if, within the window, we do the averaging, and between windows, we hold the averaged value, the systematic error could be eliminated. I will give this a try tomorrow. This would be different from the moving average approach already explored in that within the gate-time, I would perform the average only using those datapoints where the 'running counter variable' shown in Attachment #2 is reset to zero - this way, I avoid the artificial weighting that is an artefact of spitting out a value every clock cycle.

Attachment 1: Systematic_error.pdf
Attachment 2: systematics_origin.pdf
11754   Wed Nov 11 22:50:39 2015 KojiConfigurationCDSSlow machine time&date

I was gazing at the log file for Autolocker script (/opt/rtcds/caltech/c1/scripts/MC/logs/AutoLocker.log )
and found quite old time stamps. e.g.

Old : C1:IOO-MC_VCO_GAIN             1991-08-08 14:36:28.889032 -3 New : C1:IOO-MC_VCO_GAIN             1991-08-08 14:36:36.705699 18 Old : C1:PSL-FSS_FASTGAIN            1991-08-09 19:05:39.972376 14 New : C1:PSL-FSS_FASTGAIN            1991-08-09 19:05:44.939043 18

It was found that the date/time setting of some of the slow machines (at least c1psl and c1iool0) is not correct.
I could not figure out how to fix it.

Question: Is this anything critical?

Another thing: While I was in c1iool0 I frequently saw the message like

c1iool0 > 0xc461f0 (CA event): Events lost, discard count was 514

Is this anything related to EPICS Freeze?

 controls@nodus|~ > telnet c1psl.martian Trying 192.168.113.53... Connected to c1psl.martian. Escape character is '^]'. c1psl > date Aug 09, 1991 19:13:26.439024274 value = 32 = 0x20 = ' ' c1psl > telnet> q Connection closed. controls@nodus|~ > telnet c1iool0.martian Trying 192.168.113.57... Connected to c1iool0.martian. Escape character is '^]'. c1iool0 > date Aug 08, 1991 14:44:39.755679528 value = 32 = 0x20 = ' ' c1iool0 > 0xc461f0 (CA event): Events lost, discard count was 514 Change MC VCO gain to -3. 0xc461f0 (CA event): Events lost, discard count was 423 Change MC VCO gain to 18. Change boost gain to 1. Change boost gain to 2.

11764   Sat Nov 14 00:52:25 2015 KojiSummaryCDSInvestion on EPICS freeze

[Yutaro, Koji]

Recently "EPICS Freeze" is so frequent and the normal work on the MEDM screen became almost impossible.

As a part of the investigation, all 19 realtime processes were stopped in order to see any effect on the probem.

IN FACT, when the realtime processes were absent (still slow machines were running), frequency of EPICS Freeze
became much less. This might mean that the issue is related to the data collection of the slow channels. We need more investigation.

After the testing, all the processes were restored although it was not so straghtforward. Particularly, c1sus DAC had an error which was
not visible from the CDS Status screen. We noticed it as suspension damping was not effective on any of the vertex suspensions.
This has been solved by restarting c1x02 process.

11772   Tue Nov 17 14:31:25 2015 ericqUpdateCDS16Hz frame writing temporarily disabled

To test the effect on EPICS latency, I've restarted daqd with modified ini files which disable all frame writing of 16Hz channels.

This happened at GPS:1131835955 aka Nov 17 2015 22:52:18 UTC

Last night, I started running a script written by Dave Barker that monitors a specified EPICS channel (in this case C1:IOO-MC_TRANS_SUM), to look for seconds in which it does not update the expected number of times. This is still running, so I will be able to compare the rate of EPICS slowdowns before and after this change.

I will revert back to the nominal state of things in a few hours, or until someone asks me to.

11777   Tue Nov 17 20:57:43 2015 ericqUpdateCDS16Hz frame writing running again

Back to nominal FB configuration at 1131857782, aka Nov 18 2015 04:56:05 UTC.

Weirdly, during this time, the script watching MC_TRANS_SUM from pianosa saw tons of freezes, but another instance  watching LSC-TRY_OUT16 on optimus saw no freezes.

11778   Wed Nov 18 10:10:53 2015 ericqUpdateCDSc1iscey IO chassis missing brackets

Steve and I inadvertently discovered that the c1iscey IO chassis doesn't have brackets to secure the cards where the ADC/DAC cables are connected, making them very easy to knock loose. All other IO chassis have these brackets. Pictures of c1iscey and c1lsc IO chassis to compare:

11789   Thu Nov 19 15:16:24 2015 ericqUpdateCDSc1iscey IO chassis now has brackets

[Steve, ericq]

Brackets for the c1iscey IO chassis cards have been installed. Now, I can't unseat the cards by wiggling the ADC or DAC cable.

11791   Thu Nov 19 17:06:57 2015 KojiConfigurationCDSDisabled auto-launching RT processes upon FE booting

We want to startup the RT processes one by one on boot of FE machines.

Therefore /diskless/root/etc/rc.local on FB was modified as follows. The last sudo line was commented out.

for sys in $(/etc/rt.sh); do #sudo -u controls sh -c ". /opt/rtapps/rtapps-user-env.sh && /opt/rtcds/cal\ tech/c1/scripts/start${sys}"

    # NOTE: we need epics stuff AND iniChk.pl in PATH     # we use -i here so that the .bashrc is sourced, which should also     # source rtapps and rtcds user env (for epics and scripts paths)

    # commented out Nov 19, 2015, KA     # see ELOG 11791 http://nodus.ligo.caltech.edu:8080/40m/11791     # sudo -u controls -i /opt/rtcds/caltech/c1/scripts/start\${sys} done

11809   Wed Nov 25 14:46:53 2015 gautamUpdateCDSc1lsc models restarted

I noticed that all the models running on C1LSC had crashed when I came in earlier today. I restarted all of them by ssh-ing into C1LSC and running rtcds restart all. The models seem to be running fine now.

11833   Tue Dec 1 17:16:31 2015 gautamUpdateCDSScript for copying BLRMS filters

We've been talking about putting in BLRMS filters for several channels - it would be a pain to manually copy over the correct bandpass and lowpass filter coefficients into the newly created filter banks, and so I've set up a script (attached) that can do the job. As template filters, I'vm using the filters rana detailed here. Essentially, what the script does is identify the (empty on creation) block of text for a given filter: e.g. RMS_STS1Z_BP_0p01_0p03 for STS1Z), and appends the template filter coefficients. To test my script, I first backed up the original C1PEM.txt file from /opt/rtcds/caltech/c1/chans, removed all the filter coefficients for the STS1Z BLRMS filters, and then replaced it with one generated using my script. I then loaded the coefficients for all the filters in the C1PEM modules, without any obvious error messages being generated. I also checked that foton could read the new file, and checked tmake sure that sensible filter shapes were seen for some channels. Since this seems to be working, I'm going to start putting in BLRMS blocks into the models tomorrow.

Attachment 1: makeFilterFiles.bash.zip
11840   Wed Dec 2 18:54:20 2015 gautamUpdateCDSChanges to C1MCS, C1PEM

Summary:

I've made several changes to the C1MCS model and C1PEM model, and have installed BLRMS filters for the MC mirror coils, which are now running. The main idea behind this test was to see how much CPU time was added as a result of setting up IPC channels to take the signals from C1MCS to C1PEM (where the BLRMS filtering happens) - I checked the average CPU time before and after installing the BLRMS filters, and saw that the increase was about 1 usec for 15 IPC channels installed (it increased from ~27usec to 28usec). A direct scaling would suggest that setting up the BLRMS for the vertex optics might push the c1sus model close to timing out - it is at ~50usec right now, and I would need, per optic, 12 IPC channels, and so for the 5 vertex optics, this would suggest that the CPU timing would be ~55usecI have not committed either of the changed models to the SVN just yet

Details:

1. On Eric's suggestion, I edited the C1_SUS_SINGLE_CONTROL library part to tap the signal at the input to the output filters to the coils, as this is what we want the BLRMS of. I essentially added 5 more outputs to this part, one for each of the coils, and they are named ULIn etc to differentiate it from the signal after the output filters. I have not committed the changed library part to the SVN just yet
2. I used the cdsIPCx_SHMEM part to pipe the signals from C1MCS to C1PEM - a total of 15 such channels were required for the three IMC mirrors.
3. I added the same cdsIPCx_SHMEM parts to the C1PEM model in the receiving configuration, and connected their outputs to BLRMS blocks. The BLRMS blocks themselves are named as RMS_MC2_UL_COIL_IN and so on.
4. I shutdown the watchdogs to MC1, MC2 and MC3, and restarted the C1PEM and C1MCS models on C1SUS. Yutaro pointed out that on restarting C1MCS, the IMC autolocker was disabled by default - I have enabled it again manually.
5. I was under the impression that each time a BLRMS block is added, a filter bank is automatically added to the C1PEM.txt file in /opt/rtcds/caltech/c1/chans - turns out, it doesn't and my script for copying the template bandpass and lowpass filtes into the .txt files was needlessly complicated. It suffices to change the filter names in the template file, and append the template file to C1PEM.txt using the cat command: i.e. cat template.txt > C1PEM.txt. The computer generated file seems to organize the filters in alphabetical order, and my approach obviously does not do so, but the coefficients are loaded correctlty and the filters seem to be functioning correctly so I don't think this is a problem (I measured the transfer function of one of the filters with DTT, it seems to match up well with the Foton bode plot).
6. I added a few lines to the script to also turn on the filter switches after loading the filter coefficients.

Next steps:

Now that I have a procedure in place to install the BLRMS filters, we can do so for other channels as well, such as for the coils and Oplevs of the vertex optics, and the remaining PEM channels (SEIS, accelerometers, microphones?). For the vertex optics though, I am not sure if we need to do some rearrangement to the c1sus model to make sure it does not time out...

11844   Thu Dec 3 18:18:48 2015 gautamUpdateCDSBLRMS for optics suspensions - library block

In order to be consistent with the naming conventions for the new BLRMS filters, I made a library block that takes all the input signals of interest (i.e. for a generic optic, the coil signals, the local damping shadow sensors, and the Oplev Pitch and Yaw signals - so a total of 12 signals, unused ones can be grounded). The block is called "sus_single_BLRMS". Inside the block, I've put in 12 BLRMS library blocks, with each input signal going to one of them. All the 7 outputs of the BLRMS block are terminated (I got a compiling error if I did not do this). The idea is to identify the optic using this block, e.g. MC2_BLRMS. The BLRMS filters inside are called UL_COIL, UR_COIL etc, so the BLRMS channels will end up being called C1:SUS-MC2_BLRMS_UL_COIL_0p01_0p03 and so on. I tried implementing this in C1PEM, but immediately after compiling and restarting the model, I noticed some strange behaviour in the seismic rainbow STS strip in the control room - this was right after the model was restarted, before I attempted to make any changes to the C1PEM.txt file and add filters. I then manually opened up the filter bank screens for the RMS_STS1Z bandpass and lowpass filters, and saw that the filter switches were OFF - I wonder if this has something to do with these settings not being updated in the SDF tables? So I manually turned them on and cleared the filter hitsory for all 7 low pass and band pass filter banks, but the traces on the seismic striptool did not return to their nominal levels. I manually checked the filter shapes with Foton and they seem alright. Anyways, for now, I've reverted to the C1PEM model before I made any changes, and the seismic strip looks to be back at its normal level - when I recompiled and restarted the model with the changes I made removed, the STS1Z BLRMS bandpass and lowpass filters were ON by default again! I'm not sure what I'm doing wrong here, I will investigate this further.

11848   Fri Dec 4 13:12:41 2015 ericqUpdateCDSc1ioo Timing Overruns Solved

I had noticed for a while that the c1ioo frontend model had much higher variability than any of the other other 16k models, and would run longer than 60us multiple times an hour. This struck me as odd, since all it does is control the WFS loops. (You can see this on the Nov17 Summary page. (But somehow, the CDS tab seems broken since then, and I'm not sure why...))

This has happily now been solved! While poking around the model, I noticed that the MC2 transmission QPD data streams being sent over from c1mcs were using RFM blocks. This seemed weird to me, since I wasn't even aware that c1ioo had an RFM card. Since the c1sus and c1ioo frontends are both on the Dolphin network, I changed them to Dolphin blocks and voila! The cylce time now holds steady at 21usec.

Update: I think I figured out the problem with the CDS summary pages. Looking at the .err files in /home/40m/public_html/summary/logs on the 40m LDAS account showed that C1:FEC-33_CPU_METER wasn't found in the frame files. Indeed, this channel was commented out in c1/chans/daq/C0EDCU.ini. I enabled it and restarted daqd. Hopefully the CDS tab will come back soon...

11849   Fri Dec 4 18:20:36 2015 gautamUpdateCDSBLRMS for IMC setup

[ericq, gautam]

BLRMS filters have been set up for the coil outputs and shadow sensor signals. The signals are sent to the C1PEM model from C1MCS, where I use the library block mentioned in the previous elog to put the filters in place. Some preliminary observations:

1. The entire operation seems to be computationally quite expensive - just for the 3 IMC mirrors, the average CPU time for C1PEM increased from ~50 usec (before any changes were made to C1PEM) to ~105 usec just as a result of installing 420 filter modules with no filter coefficients loaded (3 optics x 10 channels x 14 filter banks) to ~120 usec when all the BP and LP filters for the BLRMS blocks have been loaded and turned on (Attachment #1). Eric suggested that it may be computationally more efficient to do this without using the BLRMS library part - i.e. rather than having so many filter modules, do the RMS-ing using a piece of C code that essentially just implements the same SOS filters that FOTON generates, I will work on setting this up and checking if it makes a difference. The fact that just having empty filter modules in the model almost doubled the computation time suggests that this approach may help, but I have to think about how to implement some of the automatic checks that having a filter bank in place gives us, or if these are strictly necessary at all...
2. While restarting the C1PEM model, we noticed that some of the optics were shaking - looking at the CPU timing signals for all the models on C1SUS revealed that both the C1SUS model and the C1MCS model were geting overclocked when C1PEM was killed (see the sharp spikes in the red and green traces in Attachment #2 - the Y scale in this plot is poor and doesnt put numbers to the overclocking, I will upload a better screenshot that Eric took once I find it). The cause is unknown.
3. Yesterday, I noticed that when C1PEM was restarted, the states of the filter bank switches were not reverted to their states in the SDF tables. They are showing up as changes, but it is unclear why we have to manually revert them. I have also not yet added the states of the newly installed filters (BPs and LPs for the MC BLRMS blocks) to the SDF tables.

Unrelated to this work: we cleaned up the correspondence between the accelerometer numbers and channels in the C1PEM model. Also, the 3 unused ADC blocks in C1PEM (ADC0, ADC1 and ADC2) are required and cannot be removed as the ADC blocks have to be numbered sequentially and the signals needed in C1PEM come from ADC3 (as we found out when we tried recompiling the model after deleting these blocks).

Attachment 1: c1pem_timing.png
Attachment 2: C1SUS_overclocked.png
11854   Mon Dec 7 00:45:28 2015 ericqUpdateCDSdaqd is mad

I glanced at the summary pages and noticed that, since Friday around when we first loaded up the new BLRMS parts, daqd has crashing very frequently (few times per hour).

I'm going to comment out the c1pem lines from the daqd master file for tonight, and see if that helps.

11856   Mon Dec 7 10:45:21 2015 ericqUpdateCDSdaqd is mad

Since removing c1pem from the daqd master file, daqd has not crashed. I suppose we're running into the stability issue that motivated us to disable some of the other models (IOPs, RFM, etc.) during the RCG upgrade.

A question to Jamie: although the new framebuilder prototype still had the same problem with trend writing, can it handle this higher testpoint/DQ channel load?

ELOG V3.1.3-