40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 51 of 344  Not logged in ELOG logo
ID Date Author Type Categoryup Subject
  11712   Sat Oct 24 12:34:43 2015 gautamUpdateCDSFrequency counting - workable setup prepared

Sorry for the confusion - I did mean Green beat frequency, and I had neglected the factor of 2 in my earlier calculations. However, the fit parameter "c" in my fit was actually the half-width at half maximum and not the full width at half maximum. After correcting for both these errors (new fit is Attachment #1, where I have now accounted for the factor of 2, and the X axis is the IR beat frequency), I don't think the numbers change too much. It could be that the frequency counter wasn't reading out the frequency correctly, but looking at a time series plot of the frequency counter readout (Attachment #2), and my earlier trials, I don't think this is the case (38 MHz is a frequency at which I don't expect much systematic error - also, the offset was stepped from -3 to 3 over 45 seconds). 

The revised numbers:

Fitted linewidth = 2*c = 10.884 kHz +/- 2 Hz (95% C.I.)

FSR of Y arm (from elog 9804) = 3.9468 MHz +/- 1.1 kHz

=> Y arm Finesse = FSR/fitted linewidth =  362.6 +/- 0.5

Total round trip loss = 2*pi/Finesse = 0.0173

 

Attachment 1: Yscan.pdf
Yscan.pdf
Attachment 2: Frequency_readout.pdf
Frequency_readout.pdf
  11723   Fri Oct 30 16:56:04 2015 gautamUpdateCDSis there a problem with the SCHMITTTRIGGER CDS library part?

Over the course of my investigations into the systematic errors in the frequency readout using digital frequency counting, I noticed that my counter variable that keeps track of the number of clock cycles between successive zero crossings was NOT oscillating between 2 values as I would have expected (because of there being a +/- 1 clock cycle difference between successive zero crossings due to the fixed sampling time of 1/16384 seconds), but that there were occassional excursions to values that were +/- 3 clock cycles away. I then checked the output of the SCHMITTTRIGGER CDS library part (which I was using to provide some noise immunity), and noticed that it wasn't triggering on every zero crossing at higher frequencies. I tested this by hooking up a digital oscillator to the SCHMITTTRIGGER part, and looked at its output for different frequencies. The parameters used were as follows:

CLKGAIN: 10000

SCHMITTTRIGGER lower threshold: -1.0

SCHMITTTRIGGER upper threshold: +1.0

I am attaching plots for two frequencies, 3000Hz and 4628Hz (Attachments #1 and #2) . I would have expected a flip in the state of the output of the schmitt trigger between every pair of horizontal red lines in this plot, but at 4628 Hz, it looks like the schmitt trigger isn't catching some of the zero crossings? Come to think of it, I am not even sure why the output of the schmitt trigger takes on any values other than 0 or 1 (could this be an artefact of some sort of interpolation in the visualization of these plots? But this would not affect the conclusion about the schmitt trigger missing some of the zero crossings?)

As an interim measure, I implemented a Schmitt trigger in my C code block - it was just a couple of extra lines anyways - I have designated the schmitt trigger output as a static variable that should hold its value in successive execution cycles, unless it is updated by comparing the input value to the thresholds (code attached for reference). Attachments #3 and #4 show the output of this implementation of a Schmitt trigger at the same two frequencies, and I am seeing the expected flip in the state between successive zero crossings as expected (though I'm still not sure why it takes on values other than 0 and 1?). Anyways this warrants further investigation. An elog regarding the implications of this on the systematic error in the frequency counter readout to follow.

Attachment 1: 3000Hz.pdf
3000Hz.pdf
Attachment 2: 4628Hz.pdf
4628Hz.pdf
Attachment 3: 3000Hz_software_SCHMITT.pdf
3000Hz_software_SCHMITT.pdf
Attachment 4: 4628Hz_software_SCHMITT.pdf
4628Hz_software_SCHMITT.pdf
  11731   Wed Nov 4 16:12:07 2015 ericqUpdateCDSOPTIMUS

There is a new machine on the martian network: 32 cores and 128GB of RAM. Probably this is more useful for intensive number crunching in MATLAB or whatever as opposed to IFO control. I've set up some of the LIGO analysis tools on it as well. 

A successor to Megatron, I dub it: OPTIMUS

  11736   Thu Nov 5 03:04:13 2015 gautamUpdateCDSFrequency counting - systematics and further changes

I've made a few more changes to the frequency counting code - these are mostly details and the algorithm is essentially unchanged.

  • The C-code block in the simulink model now outputs the number of clock cycles (=1/16384 seconds) rather than the frequency itself. I've kept converting this period to frequency step by taking the reciprocal as the last step of the signal chain, i.e. after the LP filter.
  • In the current version of the FC library block, I've disabled the moving average in the custom C code - I've left the functionality in the code, but the window length at the moment is set to 1, which in effect means that there is no moving average. I found that comparable jitters in the output are obtained by using no moving average, and having two 2-pole IIR low-pass filters in series (at 4Hz and 2Hz) as by doing a moving average over 4096 clock cycles and then passing that through a 2Hz IIR lowpass filter (as is to be expected).

Systematic error

The other thing that came up in the meeting last week was this issue of the systematic errors in the measured frequency, and how it was always over-estimating the 'actual' frequency. I've been investigating the origin of this over the last few days, and think I've found an explanation. But first, Attachment #1 shows why there is a systematic error in the first place - because we are counting the period of the input signal in terms of clock cycles, which can only take on discrete, integer values, we expect this number to fluctuate between the two integers bounding the 'true value'. So, if I'm trying to measure an input signal of 3000Hz, I would measure its period as either 5 or 6 clock cycles, while the "true"  value should be 5.4613 clock cycles. In attachment #1, I've plotted the actual measured frequency and the measured frequency if we always undercounted/overcounted to the nearest integer clock cycles, as functions of the requested frequency. So the observed systematic error is consistent with what is to be expected.

The reason why this doesn't average out to zero is shown in Attachment #2. In order to investigate this further, I recorded some additional diagnostic variables. If I were to average the period (in terms of clock cycles - i.e. I look for the peaks in the blue cuve, add them up, and divide by the number of peaks), I find that I can recover the expected period in terms of clock cycles pretty accurately. However, the way the code is set up at the moment, the c code block outputs a value every 1/16384 seconds (red curve) - but this is only updated each time I detect a zero crossing - and as a result, if I average this, I am in effect performing some sort of weighted average that distorts the true ratio of the number of times each integer clock-cycle-period is observed. This is the origin on the systematic error, and is a function of the relative frequency each of the two integer values of the clock-cycle-period occurs, which explains why the systematic error was a function of the requested frequency as seen in Attachment #1, and not a constant offset. 

At the frequencies I investigated (10-70MHz in 5MHz steps), the maximum systematic error was ~1%.

Is there a fix?

I've been reading up a bit on the two approaches to frequency counting - direct and reciprocal. My algorithm is the latter, which is generally regarded as the more precise of the two. However, in both these approaches, there is a parameter known as the 'gate-time': this is effectively how long a frequency counter measures for before outputting a value. In the current approach, the gate time is effectively 1/16384 seconds. I would think that it is perhaps possible to eliminate the systematic error by setting the gate time to something like 0.25 seconds, and within the gate time, do an average of the total number of periods measured. Something like 0.25 seconds should be long enough that if, within the window, we do the averaging, and between windows, we hold the averaged value, the systematic error could be eliminated. I will give this a try tomorrow. This would be different from the moving average approach already explored in that within the gate-time, I would perform the average only using those datapoints where the 'running counter variable' shown in Attachment #2 is reset to zero - this way, I avoid the artificial weighting that is an artefact of spitting out a value every clock cycle. 

Attachment 1: Systematic_error.pdf
Systematic_error.pdf
Attachment 2: systematics_origin.pdf
systematics_origin.pdf
  11754   Wed Nov 11 22:50:39 2015 KojiConfigurationCDSSlow machine time&date

I was gazing at the log file for Autolocker script (/opt/rtcds/caltech/c1/scripts/MC/logs/AutoLocker.log )
and found quite old time stamps. e.g.

Old : C1:IOO-MC_VCO_GAIN             1991-08-08 14:36:28.889032 -3
New : C1:IOO-MC_VCO_GAIN             1991-08-08 14:36:36.705699 18
Old : C1:PSL-FSS_FASTGAIN            1991-08-09 19:05:39.972376 14
New : C1:PSL-FSS_FASTGAIN            1991-08-09 19:05:44.939043 18

It was found that the date/time setting of some of the slow machines (at least c1psl and c1iool0) is not correct.
I could not figure out how to fix it.

Question: Is this anything critical?

Another thing: While I was in c1iool0 I frequently saw the message like

c1iool0 > 0xc461f0 (CA event): Events lost, discard count was 514

Is this anything related to EPICS Freeze?

controls@nodus|~ > telnet c1psl.martian
Trying 192.168.113.53...
Connected to c1psl.martian.
Escape character is '^]'.
c1psl > date
Aug 09, 1991 19:13:26.439024274
value = 32 = 0x20 = ' '
c1psl >
telnet> q
Connection closed.

controls@nodus|~ > telnet c1iool0.martian
Trying 192.168.113.57...
Connected to c1iool0.martian.
Escape character is '^]'.

c1iool0 > date
Aug 08, 1991 14:44:39.755679528
value = 32 = 0x20 = ' '
c1iool0 > 0xc461f0 (CA event): Events lost, discard count was 514
Change MC VCO gain to -3.
0xc461f0 (CA event): Events lost, discard count was 423
Change MC VCO gain to 18.
Change boost gain to 1.
Change boost gain to 2.

 

  11764   Sat Nov 14 00:52:25 2015 KojiSummaryCDSInvestion on EPICS freeze

[Yutaro, Koji]

Recently "EPICS Freeze" is so frequent and the normal work on the MEDM screen became almost impossible.

As a part of the investigation, all 19 realtime processes were stopped in order to see any effect on the probem.

IN FACT, when the realtime processes were absent (still slow machines were running), frequency of EPICS Freeze
became much less. This might mean that the issue is related to the data collection of the slow channels. We need more investigation.

After the testing, all the processes were restored although it was not so straghtforward. Particularly, c1sus DAC had an error which was
not visible from the CDS Status screen. We noticed it as suspension damping was not effective on any of the vertex suspensions.
This has been solved by restarting c1x02 process.

  11772   Tue Nov 17 14:31:25 2015 ericqUpdateCDS16Hz frame writing temporarily disabled

To test the effect on EPICS latency, I've restarted daqd with modified ini files which disable all frame writing of 16Hz channels. 

This happened at GPS:1131835955 aka Nov 17 2015 22:52:18 UTC

Last night, I started running a script written by Dave Barker that monitors a specified EPICS channel (in this case C1:IOO-MC_TRANS_SUM), to look for seconds in which it does not update the expected number of times. This is still running, so I will be able to compare the rate of EPICS slowdowns before and after this change. 

I will revert back to the nominal state of things in a few hours, or until someone asks me to. 

  11777   Tue Nov 17 20:57:43 2015 ericqUpdateCDS16Hz frame writing running again

Back to nominal FB configuration at 1131857782, aka Nov 18 2015 04:56:05 UTC.

Weirdly, during this time, the script watching MC_TRANS_SUM from pianosa saw tons of freezes, but another instance  watching LSC-TRY_OUT16 on optimus saw no freezes. 

  11778   Wed Nov 18 10:10:53 2015 ericqUpdateCDSc1iscey IO chassis missing brackets

Steve and I inadvertently discovered that the c1iscey IO chassis doesn't have brackets to secure the cards where the ADC/DAC cables are connected, making them very easy to knock loose. All other IO chassis have these brackets. Pictures of c1iscey and c1lsc IO chassis to compare:

  11789   Thu Nov 19 15:16:24 2015 ericqUpdateCDSc1iscey IO chassis now has brackets

[Steve, ericq]

Brackets for the c1iscey IO chassis cards have been installed. Now, I can't unseat the cards by wiggling the ADC or DAC cable. yes

  11791   Thu Nov 19 17:06:57 2015 KojiConfigurationCDSDisabled auto-launching RT processes upon FE booting

We want to startup the RT processes one by one on boot of FE machines.

Therefore /diskless/root/etc/rc.local on FB was modified as follows. The last sudo line was commented out.

for sys in $(/etc/rt.sh); do
    #sudo -u controls sh -c ". /opt/rtapps/rtapps-user-env.sh && /opt/rtcds/cal\
tech/c1/scripts/start${sys}"

    # NOTE: we need epics stuff AND iniChk.pl in PATH
    # we use -i here so that the .bashrc is sourced, which should also
    # source rtapps and rtcds user env (for epics and scripts paths)

    # commented out Nov 19, 2015, KA
    # see ELOG 11791 http://nodus.ligo.caltech.edu:8080/40m/11791
    # sudo -u controls -i /opt/rtcds/caltech/c1/scripts/start${sys}
done

  11809   Wed Nov 25 14:46:53 2015 gautamUpdateCDSc1lsc models restarted

I noticed that all the models running on C1LSC had crashed when I came in earlier today. I restarted all of them by ssh-ing into C1LSC and running rtcds restart all. The models seem to be running fine now.

  11833   Tue Dec 1 17:16:31 2015 gautamUpdateCDSScript for copying BLRMS filters

We've been talking about putting in BLRMS filters for several channels - it would be a pain to manually copy over the correct bandpass and lowpass filter coefficients into the newly created filter banks, and so I've set up a script (attached) that can do the job. As template filters, I'vm using the filters rana detailed here. Essentially, what the script does is identify the (empty on creation) block of text for a given filter: e.g. RMS_STS1Z_BP_0p01_0p03 for STS1Z), and appends the template filter coefficients. To test my script, I first backed up the original C1PEM.txt file from /opt/rtcds/caltech/c1/chans, removed all the filter coefficients for the STS1Z BLRMS filters, and then replaced it with one generated using my script. I then loaded the coefficients for all the filters in the C1PEM modules, without any obvious error messages being generated. I also checked that foton could read the new file, and checked tmake sure that sensible filter shapes were seen for some channels. Since this seems to be working, I'm going to start putting in BLRMS blocks into the models tomorrow.

Attachment 1: makeFilterFiles.bash.zip
  11840   Wed Dec 2 18:54:20 2015 gautamUpdateCDSChanges to C1MCS, C1PEM

Summary:

I've made several changes to the C1MCS model and C1PEM model, and have installed BLRMS filters for the MC mirror coils, which are now running. The main idea behind this test was to see how much CPU time was added as a result of setting up IPC channels to take the signals from C1MCS to C1PEM (where the BLRMS filtering happens) - I checked the average CPU time before and after installing the BLRMS filters, and saw that the increase was about 1 usec for 15 IPC channels installed (it increased from ~27usec to 28usec). A direct scaling would suggest that setting up the BLRMS for the vertex optics might push the c1sus model close to timing out - it is at ~50usec right now, and I would need, per optic, 12 IPC channels, and so for the 5 vertex optics, this would suggest that the CPU timing would be ~55usecI have not committed either of the changed models to the SVN just yet

Details:

  1. On Eric's suggestion, I edited the C1_SUS_SINGLE_CONTROL library part to tap the signal at the input to the output filters to the coils, as this is what we want the BLRMS of. I essentially added 5 more outputs to this part, one for each of the coils, and they are named ULIn etc to differentiate it from the signal after the output filters. I have not committed the changed library part to the SVN just yet
  2. I used the cdsIPCx_SHMEM part to pipe the signals from C1MCS to C1PEM - a total of 15 such channels were required for the three IMC mirrors.
  3. I added the same cdsIPCx_SHMEM parts to the C1PEM model in the receiving configuration, and connected their outputs to BLRMS blocks. The BLRMS blocks themselves are named as RMS_MC2_UL_COIL_IN and so on.
  4. I shutdown the watchdogs to MC1, MC2 and MC3, and restarted the C1PEM and C1MCS models on C1SUS. Yutaro pointed out that on restarting C1MCS, the IMC autolocker was disabled by default - I have enabled it again manually.
  5. I was under the impression that each time a BLRMS block is added, a filter bank is automatically added to the C1PEM.txt file in /opt/rtcds/caltech/c1/chans - turns out, it doesn't and my script for copying the template bandpass and lowpass filtes into the .txt files was needlessly complicated. It suffices to change the filter names in the template file, and append the template file to C1PEM.txt using the cat command: i.e. cat template.txt > C1PEM.txt. The computer generated file seems to organize the filters in alphabetical order, and my approach obviously does not do so, but the coefficients are loaded correctlty and the filters seem to be functioning correctly so I don't think this is a problem (I measured the transfer function of one of the filters with DTT, it seems to match up well with the Foton bode plot). 
  6. I added a few lines to the script to also turn on the filter switches after loading the filter coefficients.

Next steps:

Now that I have a procedure in place to install the BLRMS filters, we can do so for other channels as well, such as for the coils and Oplevs of the vertex optics, and the remaining PEM channels (SEIS, accelerometers, microphones?). For the vertex optics though, I am not sure if we need to do some rearrangement to the c1sus model to make sure it does not time out...

  11844   Thu Dec 3 18:18:48 2015 gautamUpdateCDSBLRMS for optics suspensions - library block

In order to be consistent with the naming conventions for the new BLRMS filters, I made a library block that takes all the input signals of interest (i.e. for a generic optic, the coil signals, the local damping shadow sensors, and the Oplev Pitch and Yaw signals - so a total of 12 signals, unused ones can be grounded). The block is called "sus_single_BLRMS". Inside the block, I've put in 12 BLRMS library blocks, with each input signal going to one of them. All the 7 outputs of the BLRMS block are terminated (I got a compiling error if I did not do this). The idea is to identify the optic using this block, e.g. MC2_BLRMS. The BLRMS filters inside are called UL_COIL, UR_COIL etc, so the BLRMS channels will end up being called C1:SUS-MC2_BLRMS_UL_COIL_0p01_0p03 and so on. I tried implementing this in C1PEM, but immediately after compiling and restarting the model, I noticed some strange behaviour in the seismic rainbow STS strip in the control room - this was right after the model was restarted, before I attempted to make any changes to the C1PEM.txt file and add filters. I then manually opened up the filter bank screens for the RMS_STS1Z bandpass and lowpass filters, and saw that the filter switches were OFF - I wonder if this has something to do with these settings not being updated in the SDF tables? So I manually turned them on and cleared the filter hitsory for all 7 low pass and band pass filter banks, but the traces on the seismic striptool did not return to their nominal levels. I manually checked the filter shapes with Foton and they seem alright. Anyways, for now, I've reverted to the C1PEM model before I made any changes, and the seismic strip looks to be back at its normal level - when I recompiled and restarted the model with the changes I made removed, the STS1Z BLRMS bandpass and lowpass filters were ON by default again! I'm not sure what I'm doing wrong here, I will investigate this further. 

  11848   Fri Dec 4 13:12:41 2015 ericqUpdateCDSc1ioo Timing Overruns Solved

I had noticed for a while that the c1ioo frontend model had much higher variability than any of the other other 16k models, and would run longer than 60us multiple times an hour. This struck me as odd, since all it does is control the WFS loops. (You can see this on the Nov17 Summary page. (But somehow, the CDS tab seems broken since then, and I'm not sure why...))

This has happily now been solved! While poking around the model, I noticed that the MC2 transmission QPD data streams being sent over from c1mcs were using RFM blocks. This seemed weird to me, since I wasn't even aware that c1ioo had an RFM card. Since the c1sus and c1ioo frontends are both on the Dolphin network, I changed them to Dolphin blocks and voila! The cylce time now holds steady at 21usec. 


Update: I think I figured out the problem with the CDS summary pages. Looking at the .err files in /home/40m/public_html/summary/logs on the 40m LDAS account showed that C1:FEC-33_CPU_METER wasn't found in the frame files. Indeed, this channel was commented out in c1/chans/daq/C0EDCU.ini. I enabled it and restarted daqd. Hopefully the CDS tab will come back soon...

  11849   Fri Dec 4 18:20:36 2015 gautamUpdateCDSBLRMS for IMC setup

[ericq, gautam]

BLRMS filters have been set up for the coil outputs and shadow sensor signals. The signals are sent to the C1PEM model from C1MCS, where I use the library block mentioned in the previous elog to put the filters in place. Some preliminary observations:

  1. The entire operation seems to be computationally quite expensive - just for the 3 IMC mirrors, the average CPU time for C1PEM increased from ~50 usec (before any changes were made to C1PEM) to ~105 usec just as a result of installing 420 filter modules with no filter coefficients loaded (3 optics x 10 channels x 14 filter banks) to ~120 usec when all the BP and LP filters for the BLRMS blocks have been loaded and turned on (Attachment #1). Eric suggested that it may be computationally more efficient to do this without using the BLRMS library part - i.e. rather than having so many filter modules, do the RMS-ing using a piece of C code that essentially just implements the same SOS filters that FOTON generates, I will work on setting this up and checking if it makes a difference. The fact that just having empty filter modules in the model almost doubled the computation time suggests that this approach may help, but I have to think about how to implement some of the automatic checks that having a filter bank in place gives us, or if these are strictly necessary at all...
  2. While restarting the C1PEM model, we noticed that some of the optics were shaking - looking at the CPU timing signals for all the models on C1SUS revealed that both the C1SUS model and the C1MCS model were geting overclocked when C1PEM was killed (see the sharp spikes in the red and green traces in Attachment #2 - the Y scale in this plot is poor and doesnt put numbers to the overclocking, I will upload a better screenshot that Eric took once I find it). The cause is unknown.
  3. Yesterday, I noticed that when C1PEM was restarted, the states of the filter bank switches were not reverted to their states in the SDF tables. They are showing up as changes, but it is unclear why we have to manually revert them. I have also not yet added the states of the newly installed filters (BPs and LPs for the MC BLRMS blocks) to the SDF tables. 

Unrelated to this work: we cleaned up the correspondence between the accelerometer numbers and channels in the C1PEM model. Also, the 3 unused ADC blocks in C1PEM (ADC0, ADC1 and ADC2) are required and cannot be removed as the ADC blocks have to be numbered sequentially and the signals needed in C1PEM come from ADC3 (as we found out when we tried recompiling the model after deleting these blocks).

Attachment 1: c1pem_timing.png
c1pem_timing.png
Attachment 2: C1SUS_overclocked.png
C1SUS_overclocked.png
  11854   Mon Dec 7 00:45:28 2015 ericqUpdateCDSdaqd is mad

I glanced at the summary pages and noticed that, since Friday around when we first loaded up the new BLRMS parts, daqd has crashing very frequently (few times per hour). 

I'm going to comment out the c1pem lines from the daqd master file for tonight, and see if that helps. 

  11856   Mon Dec 7 10:45:21 2015 ericqUpdateCDSdaqd is mad

Since removing c1pem from the daqd master file, daqd has not crashed. I suppose we're running into the stability issue that motivated us to disable some of the other models (IOPs, RFM, etc.) during the RCG upgrade. 

A question to Jamie: although the new framebuilder prototype still had the same problem with trend writing, can it handle this higher testpoint/DQ channel load?

  11859   Mon Dec 7 11:25:10 2015 jamieUpdateCDSdaqd is mad
Quote:

A question to Jamie: although the new framebuilder prototype still had the same problem with trend writing, can it handle this higher testpoint/DQ channel load?

The new fb1 daqd was also crashing even without the trend writing enabled.  I'm not sure how much that's affected by the load, though, e.g. it might be able to handle the extra load fine but then die because of some other issue not related to the number of channels being acquired.

We should schedule some time this week to work on fb1 some more.

  11868   Wed Dec 9 19:01:45 2015 jamieUpdateCDSback to fb1

I spent this afternoon trying to debug fb1, with very little to show for it.  We're back to running from fb.

The first thing I did was to recompile EPICS from source, so that all the libraries needed by daqd were compiled for the system at hand.  I compiled epics-3.14-12-2_long from source, and installed it at /opt/rtapps/epics on local disk, not on the /opt/rtapps network mount.  I then recompiled daqd against that, and the framecpp, gds, etc from the LSCSoft packages.  So everything has been compiled for this version of the OS.  The compilation goes smoothly.

There are two things that I see while running this new daqd on fb1:

instability with mx_streams

The mx stream connection between the front ends and the daqd is flaky.  Everything will run fine for a while, the spontaneously one or all of the mx_stream processes on the front ends will die.  It appears more likely that all mx_stream processes will die at the same time.  It's unclear if this is some sort of chain reaction thing, or if something in daqd or in the network itself is causing them all to die at the same time.  It is independent of whether or not we're using multiple mx "end points" (i.e. a different one for each front end and separate receiver threads in the daqd) or just a single one (all front ends connecting to a single mx receiver thread in daqd).

Frequently daqd will recover from this.  The monit processes on the front ends restart the mx_stream processes and all will be recovered.  However occaissionally, possibly if the mx_streams do not recover fast enough (which seems to be related to how frequently the receiver threads in daqd can clear themselves), daqd will start to choke and will start spitting out the "empty blocks" messages that are harbirnger of doom:

Aborted 2 send requests due to remote peer 00:30:48:be:11:5d (c1iscex:0) disconnected
00:30:48:d6:11:17 (c1iscey:0) disconnected
mx_wait failed in rcvr eid=005, reqn=182; wait did not complete; status code is Remote endpoint is closed
disconnected from the sender on endpoint 005
mx_wait failed in rcvr eid=001, reqn=24; wait did not complete; status code is Remote endpoint is closed
disconnected from the sender on endpoint 001
[Wed Dec  9 18:40:14 2015] main profiler warning: 1 empty blocks in the buffer
[Wed Dec  9 18:40:15 2015] main profiler warning: 0 empty blocks in the buffer
[Wed Dec  9 18:40:16 2015] main profiler warning: 0 empty blocks in the buffer

My suspicion is that this time of failure is tied to the mx stream failures, so we should be looking at the mx connections and network to solve this problem.

frame writing troubles

There's possibly a separate issue associated with writing the second or minute trend files to disk.  With fair regularity daqd will die soon after it starts to write out the trend frames, producing the similar "empty blocks" messages.

  11882   Mon Dec 14 23:56:29 2015 ericqUpdateCDSc1pem reverted

To get C1PEM data back into the frames, I removed the new BLRMS blocks, recompiled, reinstalled, re-enabled it in daqd, restarted.

We still really want more headroom in our framebuilder situation. 

  11883   Tue Dec 15 11:22:53 2015 gautamUpdateCDSc1scx and c1asx crashed

I noticed what I thought was excessive movement of the beam spot on ITMX and ETMX on the control room monitors, and when I checked the CDS FE status overview MEDM screen, I saw that c1scx and c1asx had crashed. I ssh-ed into c1iscex and restarted both models, and then restarted fb as well. However, the DAQ-DCO_C1SCX_STATUS indicator remains red even after restarting fb (see attached screenshot). I am not sure how to fix this so I am leaving it as is for now, and the X arm looks to have settled down.

Attachment 1: CDS_FE_STATUS_OVERVIEW_15DEC2015.png
CDS_FE_STATUS_OVERVIEW_15DEC2015.png
  11886   Wed Dec 16 10:56:22 2015 gautamUpdateCDShard reboot of FB

[ericq,gautam]

Forgot to submit this yesterday...

While we were trying to get the X-arm locked to IR using MC2, frame-builder mysteriously crashed, necessitating us having to go down to the computer and perform a hard reboot (after having closed the PSL shutter and turning all the watchdogs to "shutdown"). All the models restarted by themselves, and everything seems back to normal now..

  11890   Thu Dec 17 14:02:05 2015 gautamUpdateCDSIPC channels for beat frequency control set up

I've set up two IPC channels that take the output from the digital frequency counters and send them to the end front-ends (via the RFM model). A summary of the steps I followed:

  1. Set up two Dolphin channels in C1ALS to send the output of the frequency counter blocks to C1RFM (I initially used RFM blocks for these, but eric suggested using Dolphin IPC for the ALS->RFM branch, as they're faster.. Eric's removed the redundant channel names)
  2. Set up two RFM channels in C1RFM to send the out put of the frequency counter blocks to C1SCX/Y (along with CDS monitor points to monitor the error rate and a filter module between the ALS->RFM and RFM->SCX/Y IPC blocks - I just followed what seemed to be the convention in the RFM model).
  3. Set up the receiving channels in C1SCX and C1SCY
  4. Re-compiled and re-started the models in the order C1ALS, C1RFM, C1SCX and C1SCY.

I've set things up such that we can select either the "PZT IN" or the frequency counter as the input to the slow servo, via means of a EPICS variable called "FC_SWITCH" (so C1:ALS-X_FC_SWITCH or C1:ALS-Y_FC_SWITCH). If this is 0, we use the default "PZT IN" signal, while setting it to 1 will change the input to the slow servo to be the frequency readout from the digital frequency counter. I've not updated the MEDM screens to reflect the two new paths yet, but will do so soon. It also remains to install appropriate filters for the servo path that takes the frequency readout as the input.

Tangentially related to this work: I've modified the FC library block so that it outputs frequency in MHz as opposed to Hz, just for convenience..

  11891   Thu Dec 17 16:44:03 2015 gautamUpdateCDSALS Slow control MEDM screen updated
Quote:

I've not updated the MEDM screens to reflect the two new paths yet, but will do so soon. It also remains to install appropriate filters for the servo path that takes the frequency readout as the input.

A few more related changes:

  1. The couplers that used to sit on the green beat PDs on the PSL table have now been shifted to the IR broadband PDs in the FOL box so that I can get the IR beat frequency over to the frequency counters. The FOL box itself, along with the fibers that bring IR light to the PSL table, have been relocated to the corner of the PSL table where the green beat PDs sit because of cable length constraints.
  2. I've updated the ALS slow control MEDM screen to allow for slow control of the beat frequency. The servo shape for now is essentially just an integrator with a zero at 1 Hz. The idea is to set an offset in the new filter module, which is the desired beat frequency, and let the integrator maintain this beat frequency. One thing I've not taken care of yet is automatically turning this loop off when the IMC loses lock. Screenshot of the modified MEDM screen is attached. 
  3. I checked the performance by using the temperature sliders to introduce an offset. The integrator is able to bring the beat frequency back to the setpoint in a few seconds, provided the step I introduced was not two big (~20 counts, but this is a pretty large shift in beat frequency, nearly 20MHz).

To do:

  1. Figure out how to deal with the IMC losing lock. I guess this is important if we want to use the IR beatnote as a diagnostic for the state of the X AUX laser.
  2. Optimize the servo gains a little - I still see some ringing when I introduce an offset, this could be avoided...
Attachment 1: ALS_SLOW_17DEC2015.png
ALS_SLOW_17DEC2015.png
  11972   Wed Feb 3 08:39:17 2016 SteveUpdateCDSblank daily summary pages

Daily summary pages are blank today. Yesterday is ok

  12024   Sun Mar 6 15:24:05 2016 gautamUpdateCDSFB down again

I came in to check the status of the nitrogen and noticed that the striptool panels in the control room were all blank.

  • PMC was unlocked but I was able to relock it using the usual procedure
  • FB seems to be down: I was unable to ssh into it (or any of the FEs for that matter). I checked the lights on the RAID array, they are all green. I am holding off on doing a hard reboot of FB in case there is some other debugging that can be done first
  • None of the watchdogs were tripped, but judging by the green spots on the mirrors, all of them are moving quite a bit. I've shutdown the watchdogs on all the optics except the MC mirrors, but the ITMs and ETMs still seem to be moving quite a bit.

I am leaving things in this state for now. It is unclear why this should have happened, it doesn't seem like there was a power glitch?

Attachment 1: 58.png
58.png
  12025   Mon Mar 7 20:40:02 2016 ericqUpdateCDSFB down again

We went and looked at the monitor plugged into FB. All kinds of messages were being spammed to the screen (maybe RAM errors), and nothing could be done to interrupt. Sadly, a hard reboot of FB was neccesary.

Video of error messages: https://youtu.be/7rea_kokhPY

After the reboot, it just took a couple of model restarts to get the CDS screen happy.

  12064   Tue Apr 5 14:16:34 2016 gautamUpdateCDSBLRMS for optics suspensions - library block UPDATED

As discussed in a Wednesday meeting some time ago, we don't need to be writing channels from BLRMS filter modules to frames at 16k (we suspect this is leading to the frequent daqd crashes which were seen the last time we tried setting BLRMS up for all the suspensions). EricQ pointed out to me that there conveniently exists a library block that is much better suited to our purposes, called BLRMS_2k. I've replaced all the BLRMS library blocks in the sus_single_BLRMS library block that I made with there BLRMS_2k blocks. I need to check that the filters used by the BLRMS_2k block (which reside in /opt/rtcds/userapps/release/cds/common/src/BLRMSFILTER.c) are appropriate, after which we can give setting up BLRMS for all the suspensions a second try...

  12140   Mon May 30 18:19:50 2016 JohannesUpdateCDSASS medm screen update

I noticed that the TRY button in the ASS main screen was linking to LSC_TRX instead of LSC_TRY. Gautam fixed it.

  12148   Fri Jun 3 13:05:18 2016 ericqUpdateCDSCDS Notes

Some CDS related things:


Keith Thorne has told us about a potential fix for our framebuilder woes. Jamie is going to be at the 40m next week to implement this, which could interfere with normal interferometer operation - so plan accordingly. 


I spent a little time doing some plumbing in the realtime models for Varun's audio processing work. Specifically, I tried to spin up a new model (C1DAF), running on the c1lsc machine. This included:

  • Removing the unused TT3 and TT4 parts from the IOO block in c1ass.mdl, freeing up some DAC outputs on the LSC rack
  • Adding an output row to the LSC input matrix which pipes to a shared memory IPC block. (This seemed like the simplest way for the DAFI model to have access to lots of signals with minimal overhead).
  • Removing two unused ADC inputs from c1lsc.mdl (that went to things like PD_XXX), to give c1daf.mdl the required two ADC inputs - and to give us the option of feeding in some analog signals.
  • Editing the rtsystab file to include c1daf in the list of models that run on c1lsc
  • Editing the existing DAFI .mdl file (which just looked like an old recolored cut-n-paste of c1ioo.mdl) to accept the IPC and ADC connections, and one DAC output that would go to the fibox. 

The simple DAFI model compiled and installed without complaint, but doesn't succesfully start. For some reason, the frontend never takes the CPU offline. Jamie will help with this next week. Since things aren't working, these changes have not been commited to the userapps svn. 

  12151   Mon Jun 6 16:41:36 2016 ericqUpdateCDSFB upgrade work

Barring objections, starting tomorrow morning, Jamie will be testing the new FB code. The IFO will not be available for other use while this is ongoing.

  12152   Tue Jun 7 11:12:47 2016 jamieUpdateCDSDAQD UPGRADE WORK UNDERWAY

I am re-starting work on the daqd upgrade again now.  Expect the daqd to be offline for most of the day.  I will report progress.

  12155   Tue Jun 7 20:49:50 2016 jamieUpdateCDSDAQD work ongoing

Summary: new daqd code running overnight test on fb1.  Stability issues persist.

The code is from Keith's "tests/advLigoRTS-40m" branch, which is a branch of the current trunk.  It's supposed to include patches to fix the crashes when multiple frame types are written to disk at the same time.  However, the issue is not fixed:

2016-06-07_20:38:55 about to write frame @ 1149392336
2016-06-07_20:38:55 Begin Full WriteFrame()
2016-06-07_20:38:57 full frame write done in 2seconds
2016-06-07_20:39:11 about to write frame @ 1149392352
2016-06-07_20:39:11 Begin Full WriteFrame()
2016-06-07_20:39:13 full frame write done in 2seconds
2016-06-07_20:39:27 about to write frame @ 1149392368
2016-06-07_20:39:27 Begin Full WriteFrame()
2016-06-07_20:39:29 full frame write done in 2seconds
2016-06-07_20:39:43 about to write second trend frame @ 1149391800
2016-06-07_20:39:43 Begin second trend WriteFrame()
2016-06-07_20:39:43 about to write frame @ 1149392384
2016-06-07_20:39:43 Begin Full WriteFrame()
2016-06-07_20:39:44 full frame write done in 1seconds
2016-06-07_20:39:59 about to write frame @ 1149392400
2016-06-07_20:40:04 Begin Full WriteFrame()
2016-06-07_20:40:04 Second trend frame write done in 21 seconds
2016-06-07_20:40:14 [Tue Jun  7 20:40:14 2016] main profiler warning: 1 empty blocks in the buffer
2016-06-07_20:40:15 [Tue Jun  7 20:40:15 2016] main profiler warning: 0 empty blocks in the buffer
2016-06-07_20:40:16 [Tue Jun  7 20:40:16 2016] main profiler warning: 0 empty blocks in the buffer
2016-06-07_20:40:17 [Tue Jun  7 20:40:17 2016] main profiler warning: 0 empty blocks in the buffer
2016-06-07_20:40:18 [Tue Jun  7 20:40:18 2016] main profiler warning: 0 empty blocks in the buffer
2016-06-07_20:40:19 [Tue Jun  7 20:40:19 2016] main profiler warning: 0 empty blocks in the buffer
2016-06-07_20:40:20 [Tue Jun  7 20:40:20 2016] main profiler warning: 0 empty blocks in the buffer
2016-06-07_20:40:21 [Tue Jun  7 20:40:21 2016] main profiler warning: 0 empty blocks in the buffer
2016-06-07_20:40:22 [Tue Jun  7 20:40:22 2016] main profiler warning: 0 empty blocks in the buffer
2016-06-07_20:40:23 [Tue Jun  7 20:40:23 2016] main profiler warning: 0 empty blocks in the buffer

This failure comes when a full frame (1149392384+16) is written to disk at the same time as a second trend (1149391800+600).  It seems like every time this happens daqd crashes.

I have seen other stability issues as well, maybe caused by mx flakiness, or some sort of GPS time synchronization issue caused by our lack of IRIG-B cards.  I'm going to look to see if I can get the GPS issue taken care of so we take that out of the picture.

For the last couple of hours I've only seen issues with the frame writing every 20 minutes, when the full and second trend frames happen to be written at the same time.  Running overnight to gather more statistics.

  12156   Wed Jun 8 08:34:55 2016 jamieUpdateCDSDAQD work ongoing

38 restarts overnight.  Problem definitely not fixed.  I'll be reverting back to old daqd and fb this morning.  Then regroup and evaluate options.

  12158   Wed Jun 8 13:50:39 2016 jamieConfigurationCDSSpectracom IRIG-B card installed on fb1

[EDIT: corrected name of installed card]

We just installed a Spectracom TSyc-PCIe timing card on fb1.  The hope is that this will help with the GPS timeing syncronization issues we've been seeing in the new daqd on fb1, hopefully elliminating some of the potential failure channels.

The driver, called "symmetricom" in the advLigoRTS source (name of product from competing vendor), was built/installed (from DCC T1500227):

controls@fb1:~/rtscore/tests/advLigoRTS-40m 0$ cd src/drv/symmetricom/
controls@fb1:~/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom 0$ ls
Makefile  stest.c  symmetricom.c  symmetricom.h
controls@fb1:~/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom 0$ make
make -C /lib/modules/3.2.0-4-amd64/build SUBDIRS=/home/controls/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom modules
make[1]: Entering directory `/usr/src/linux-headers-3.2.0-4-amd64'
  CC [M]  /home/controls/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom/symmetricom.o
/home/controls/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom/symmetricom.c:59:9: warning: initialization from incompatible pointer type [enabled by default]
/home/controls/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom/symmetricom.c:59:9: warning: (near initialization for ‘symmetricom_fops.unlocked_ioctl’) [enabled by default]
/home/controls/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom/symmetricom.c: In function ‘get_cur_time’:
/home/controls/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom/symmetricom.c:89:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
/home/controls/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom/symmetricom.c: In function ‘symmetricom_init’:
/home/controls/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom/symmetricom.c:188:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
/home/controls/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom/symmetricom.c:222:3: warning: label ‘out_remove_proc_entry’ defined but not used [-Wunused-label]
/home/controls/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom/symmetricom.c:158:22: warning: unused variable ‘pci_io_addr’ [-Wunused-variable]
/home/controls/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom/symmetricom.c:156:6: warning: unused variable ‘i’ [-Wunused-variable]
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /home/controls/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom/symmetricom.mod.o
  LD [M]  /home/controls/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom/symmetricom.ko
make[1]: Leaving directory `/usr/src/linux-headers-3.2.0-4-amd64'
controls@fb1:~/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom 0$ sudo make install
#remove all old versions of the driver
find /lib/modules/3.2.0-4-amd64 -name symmetricom.ko -exec rm -f {} \; || true
find /lib/modules/3.2.0-4-amd64 -name symmetricom.ko.gz -exec rm -f {} \; || true
# Install new driver
install -D -m 644 symmetricom.ko /lib/modules/3.2.0-4-amd64/extra/symmetricom.ko
/sbin/depmod -a || true
/sbin/modprobe symmetricom
if [ -e /dev/symmetricom ] ; then \
        rm -f /dev/symmetricom ; \
    fi
mknod /dev/symmetricom c `grep symmetricom /proc/devices|awk '{print $1}'` 0
chown controls /dev/symmetricom
controls@fb1:~/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom 0$ ls /dev/symmetricom
/dev/symmetricom
controls@fb1:~/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom 0$ ls -al /dev/symmetricom
crw-r--r-- 1 controls root 250, 0 Jun  8 13:42 /dev/symmetricom
controls@fb1:~/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom 0$ 
  12161   Thu Jun 9 13:28:07 2016 jamieConfigurationCDSSpectracom IRIG-B card installed on fb1

Something is wrong with the timing we're getting out of the symmetricom driver, associated with the new spectracom card.

controls@fb1:~/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom 127$ lalapps_tconvert 
1149538884
controls@fb1:~/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom 0$ cat /proc/gps 
704637380.00
controls@fb1:~/rtscore/tests/advLigoRTS-40m/src/drv/symmetricom 0$ 

The GPS time is way off, and it's counting up at something like 900 seconds/second.  Something is misconfigured, but I haven't figured out what yet.

The timing distribution module we're using is spitting out what appears to be an IRIG B122 signal (amplitude moduled 1 kHz carrier), which I think is what we expect.  This is being fed into the "AM IRIG input" connector on the card.

Not sure why the driver is spinning so fast, though, with the wrong baseline time.  Reboot of the machine didn't help.

  12162   Thu Jun 9 15:14:46 2016 jamieUpdateCDSold fb restarted, test of new daqdon fb1 aborted for time being

I've restarted the old daqd on fb until I can figure out what's going on with the symmetricom driver on fb1.

Steve:  Jamie with hair.... long time ago
 

Attachment 1: Jamie.jpg
Jamie.jpg
  12166   Fri Jun 10 12:09:01 2016 jamieConfigurationCDSIRIG-B debugging

Looks like we might have a problem with the IRIG-B output of the GPS receiver.

Rolf came over this morning to help debug the strange symmetricom driver behavior on fb1 with the new Spectracom card.  We restarted the machine againt and this time when we loaded the drive rit was clocking at a normal rate (second/second).  However, the overall GPS time was still wrong, showing a time in October from this year.

The IRIG-B122 output is supposed to encode the time of year via amplitude modulation of a 1kHz carrier.  The current time of year is:

controls@fb1:~ 0$ TZ=utc date +'%j day, %T'
162 day, 18:57:35
controls@fb1:~ 0$ 

The absolute year is not encoded, though, so the symmetricon driver has the year offset hard coded into the driver (yuck), to which it adds the time of year from the IRIG-B signal to get the correct GPS time.

However, loading the symmetricom module shows the following:

...
[ 1601.607403] Spectracom GPS card on bus 1; device 0
[ 1601.607408] TSYNC PIC BASE 0 address = fb500000
[ 1601.607429] Remapped 0xffffc90017012000
[ 1606.606164] TSYNC NOT receiving YEAR info, defaulting to by year patch
[ 1606.606168] date = 299 days 18:28:1161455320
[ 1606.606169] bcd time = 1161455320 sec  959 milliseconds 398 microseconds  959398630 nanosec
[ 1606.606171] Board sync = 1
[ 1606.616076] TSYNC NOT receiving YEAR info, defaulting to by year patch
[ 1606.616079] date = 299 days 18:28:1161455320
[ 1606.616080] bcd time = 1161455320 sec  969 milliseconds 331 microseconds  969331350 nanosec
[ 1606.616081] Board sync = 1
controls@fb1:~ 0$ 

Apparently the symmetricom driver thinks it's the 299nth day of the year, which of course corresponds to some time in october, which jives with the GPS time the driver is spitting out.

Rolf then noticed that the timing module in the VME crate in the adjacent rack, which also receives an IRIG-B signal from the distribution box, was also showing day 299 on it's front panel display. We checked and confirmed that the symmetricom card and the VME timing module both agree on the wrong time of year, strongly suggesting that the GPS receiver is outputing bogus data on it's IRIG-B output, even though it's showing the correct time on it's front panel.  We played around with setting in the GPS receiver to no avail.  Finally we rebooted the GPS receiver, but it seemed to come up with the same bogus IRIG-B output (again both symmetricom driver and VME timing module agree on the wrong day).

So maybe our GPS receiver is busted?  Not sure what to try now.

 

  12167   Fri Jun 10 12:21:54 2016 jamieConfigurationCDSGPS receiver not resetting properly

The GPS receiver (EndRun Technologies box in 1Y5? (rack closest to door)) seems to not coming back up properly after the reboot.  The front pannel says that it's "LKD", but the "sync" LED is flashing instead of solid, and the time of year displayed on the front panel is showing day 6.  The fb1 symmetricom driver and VME timing module are still both seeing day 299, though.  So something may definitely be screwy with the GPS receiver.

  12173   Mon Jun 13 20:01:30 2016 varunUpdateCDSDAFI GUI update

Summary: I am implementing digital audio filtering on various interferometer signals in order to listen to the processed audio which will help in characterizing and noise reduction in the interferometer. following is a summary of the gui i have made towards a general purpose DAF module linked to the LSC. 

Details:  attachment 1 shows the top level overview of the daf module.

The "INPUTS" button shown redirects to the medm screen shown in attachment 2, which is a collection of inputs going into the module.

Each of the buttons shown in "C1DAFI_INPUTS.png" is further linked to various i/o boxes like adc1, adc2, lsc signal and exitation. An example is shown in attachment 3. This is the specific I/O box for the LSC signal.

The field labelled "INPUT_MTRX" is linked to a matrix which routes these 4 inputs to various DSP blocks. Similarly, the "OUTPUT_MTRX" tab is useful for choosing which output goes to the speaker. 

Time and computational load monitoring is done in the "GDS_TP" tab which links to the medm screen shown in attachment 4.

Currently the AGC is successfully implemented as one of the DSP block. The details of the AGC implementation were given in a previous elog: https://nodus.ligo.caltech.edu:8081/40m/12159

I need to make a few changes to the code for Frequency Shifting and Whitening before uploading them on the FE. I will put the details soon.

Some more things that I think need to be added: 

1) "Enable" buttons for each of the DSP blocks.

2) Labels for each of the matrix elements.

3) Further headers and other description for each of the tabs 

Attachment 1: C1DAF_OVERVIEW.png
C1DAF_OVERVIEW.png
Attachment 2: C1DAF_INPUTS.png
C1DAF_INPUTS.png
Attachment 3: C1DAF_LSC.png
C1DAF_LSC.png
Attachment 4: MONITOR.png
MONITOR.png
  12179   Tue Jun 14 19:37:40 2016 jamieUpdateCDSOvernight daqd test underway

I'm running another overnight test with new daqd software on fb1.  The normal daqd process on fb has been shutdown, and the front ends are sending their signals to fb1.

fb1 is running separate data concentrator (dc) and  frame writer (fw) processes, to see if this is a more stable configuration than the all-in-one framebuilder (fb) that we have been trying to run with.  I'll report on the test tomorrow.

  12180   Tue Jun 14 20:10:19 2016 varunUpdateCDSDAFI GUI update

I have added Enable buttons for each of the DSP blocks, and labels for the matrix elements. The input matrix takes inputs from each of the 4 channels: ADC1, ADC2, LSC and EXC, and routes them to the audio processing blocks (attachment 2). The output matrix (attachment 3) takes the outputs of the various DSP blocks and routes them to the output and then to the speakers. 

Attachment 1: C1DAF_OVERVIEW.png
C1DAF_OVERVIEW.png
Attachment 2: input_matrix.png
input_matrix.png
Attachment 3: output_matrix.png
output_matrix.png
  12181   Wed Jun 15 09:52:02 2016 jamieUpdateCDSVery encouraging results from overnight split daqd test

laughVery encouraging results from the test last night.  The new configuration did not crash once overnight, and seemed to write out full, second trend, and minute trend frames without issueyes.  However, full validity of all the written out frames has not been confirmed.

overview

The configuration under test involves two separate daqd binaries instead of one.  We usually run with what is referred to as a "framebuilder" (fb) configuration:

  • fb: a single daqd binary that:
    • collect the data from the front ends
    • coallate full data into frame file format
    • calculates trend data
    • writes frame files to disk.

The current configuration separates the tasks into multiple separate binaries: a "data concentrator" (dc) and a "frame writer" (fw):

  • dc:
    • collect data from front ends
    • coallate full data into frame file format
    • broadcasts frame files over local network
  • fw:
    • receives frame files from broadcast
    • calculates trend data
    • writes frame files to disk

This configuration is more like what is run at the sites, where all the various components are separate and run on separate hardware.  In our case, I tried just running the two binaries on the same machine, with the broadcast going over the loopback interface.  None of the systems that use separated daqd tasks see the failures that we've been seeing with the all-in-one fb configuration (and other sites like AEI have also seen).

My guess frown is that there's some busted semaphore somewhere in daqd that's being shared between the concentrator and writer components.  The writer component probably aquires the lock while it's writing out the frame, which prevents the concentrator for doing what it needs to be doing while the frame is being written out.  That causes the concentrator to lock up and die if the frame writing takes too long (which it seems to almost necessarily do, especially when trend frames are also being written out).

results

The current configuration hasn't been tweaked or optimized at all.  There is of course basically no documentation on the meaning of the various daqdrc directives.  Hopefully I can get Keith Thorne to help me figure out a well optimized configuration.

There is at least one problem whereby the fw component is issuing an excessively large number of re-transmission requests:

2016-06-15_09:46:22 [Wed Jun 15 09:46:22 2016] Ask for retransmission of 6 packets; port 7097
2016-06-15_09:46:22 [Wed Jun 15 09:46:22 2016] Ask for retransmission of 8 packets; port 7097
2016-06-15_09:46:22 [Wed Jun 15 09:46:22 2016] Ask for retransmission of 3 packets; port 7097
2016-06-15_09:46:22 [Wed Jun 15 09:46:22 2016] Ask for retransmission of 5 packets; port 7097
2016-06-15_09:46:22 [Wed Jun 15 09:46:22 2016] Ask for retransmission of 5 packets; port 7097
2016-06-15_09:46:22 [Wed Jun 15 09:46:22 2016] Ask for retransmission of 5 packets; port 7097
2016-06-15_09:46:22 [Wed Jun 15 09:46:22 2016] Ask for retransmission of 5 packets; port 7097
2016-06-15_09:46:22 [Wed Jun 15 09:46:22 2016] Ask for retransmission of 6 packets; port 7097
2016-06-15_09:46:23 [Wed Jun 15 09:46:23 2016] Ask for retransmission of 1 packets; port 7097

It's unclear why.  Presumably the retransmissions requests are being honored, and the fw eventually gets the data it needs.  Otherwise I would hope that there would be the appropriate errors.

The data is being written out as expected:

 full/11500: total 182G
drwxr-xr-x  2 controls controls 132K Jun 15 09:37 .
-rw-r--r--  1 controls controls  69M Jun 15 09:37 C-R-1150043856-16.gwf
-rw-r--r--  1 controls controls  68M Jun 15 09:37 C-R-1150043840-16.gwf
-rw-r--r--  1 controls controls  68M Jun 15 09:37 C-R-1150043824-16.gwf
-rw-r--r--  1 controls controls  69M Jun 15 09:36 C-R-1150043808-16.gwf
-rw-r--r--  1 controls controls  69M Jun 15 09:36 C-R-1150043792-16.gwf
-rw-r--r--  1 controls controls  68M Jun 15 09:36 C-R-1150043776-16.gwf
-rw-r--r--  1 controls controls  68M Jun 15 09:36 C-R-1150043760-16.gwf
-rw-r--r--  1 controls controls  69M Jun 15 09:35 C-R-1150043744-16.gwf

 trend/second/11500: total 11G
drwxr-xr-x  2 controls controls 4.0K Jun 15 09:29 .
-rw-r--r--  1 controls controls 148M Jun 15 09:29 C-T-1150042800-600.gwf
-rw-r--r--  1 controls controls 148M Jun 15 09:19 C-T-1150042200-600.gwf
-rw-r--r--  1 controls controls 148M Jun 15 09:09 C-T-1150041600-600.gwf
-rw-r--r--  1 controls controls 148M Jun 15 08:59 C-T-1150041000-600.gwf
-rw-r--r--  1 controls controls 148M Jun 15 08:49 C-T-1150040400-600.gwf
-rw-r--r--  1 controls controls 148M Jun 15 08:39 C-T-1150039800-600.gwf
-rw-r--r--  1 controls controls 148M Jun 15 08:29 C-T-1150039200-600.gwf
-rw-r--r--  1 controls controls 148M Jun 15 08:19 C-T-1150038600-600.gwf

 trend/minute/11500: total 152M
drwxr-xr-x 2 controls controls 4.0K Jun 15 07:27 .
-rw-r--r-- 1 controls controls  51M Jun 15 07:27 C-M-1150023600-7200.gwf
-rw-r--r-- 1 controls controls  51M Jun 15 04:31 C-M-1150012800-7200.gwf
-rw-r--r-- 1 controls controls  51M Jun 15 01:27 C-M-1150002000-7200.gwf

The frame sizes look more or less as expected, and they seem to be valid as determined with some quick checks with the framecpp command line utilities.

  12182   Wed Jun 15 10:19:10 2016 SteveConfigurationCDSGPS antennas... debugging

Visual inspection of rooftop GPS antennae:

The 2 GPS antennas are located on the north west corner of CES roof. Their condition looks well weathered. I'd be surprised if they work.

The BNC connector of the 1553 head is inside of the conduit so it is more likely to have a better connection than the other one.

I have not had a chance to look into the "GPS Time Server" unit.

Attachment 1: GPStimeServer.jpg
GPStimeServer.jpg
Attachment 2: GPSsn.jpg
GPSsn.jpg
Attachment 3: GPSantennas.jpg
GPSantennas.jpg
Attachment 4: GPSantHead.jpg
GPSantHead.jpg
Attachment 5: GPSantHead2.jpg
GPSantHead2.jpg
Attachment 6: GPSantCabels.jpg
GPSantCabels.jpg
  12183   Wed Jun 15 11:21:51 2016 jamieUpdateCDSstill work to do to transition to new configuration/code

Just to be clear, there's still quite a bit of work to fully transition the 40m to this new system/configuration.  Once we determine a good configuration we need to complete the install, and modify the setup to run the two binaries instead of just the one.  The data is also being written to a raid on the new fb1, and we need to decide if we should use this new raid, or try to figure out how to move the old jetstor raid to the new fb1 machine.

  12185   Wed Jun 15 22:12:55 2016 varunUpdateCDSDAFI update: stereo output

I wish to have stereo audio output for the DAF module. Hence, there needs to be a second output from the DAF. I added this second output to the model. Following are the details:

FiBox: It consists of two analog inputs which are digitized and multiplexed and transmitted optically. (only 1 fiber is needed due to multiplexing). Attachment 1 shows the fibox with its 2 analog inputs (one of which, is connected), and 1 fiber output. The output of the DAF goes to the FiBox. Until today, the Fibox recieved only 1 analog input. This analog signal comes from the DAC-8 (count starting from 0), which is located at "CH 1 OUT" SMA output in the "MONITORS" bin on the racks (attachment 2).

I have added another output channel to the DAF model both in software and in hardware. The DAF now also uses DAC-9 analog output which goes to the second analog input of the FiBox. The DAC-9 output is located at "CH 2 OUT" SMA output in the "MONITORS" bin on the racks (attachment 4).

After making the changes, the Fibox is shown in attacment 3.

Testing: The LSC input on passing through the DAF block is given through two different DAC outputs, to the same Fibox channel (one after the other), and the output is heard. More concrete testing will be done tomorrow. It will be as follows:

1) Currently, I need to search for a suitable cable that would connect the second channel of the output fibox to the audio mixer. After doing this, end to end testing of both channels will be done.

2) I could not access the AWG, probably because the DAQ was offline today afternoon. Using a signal from the AWG will give a more concrete testing of the stereo output.

3) After this, I will separate the two channels of the stereo completely (currectly they are seperated only at the DAF output stage)

4) I also will edit the medm gui appropriately.

 

Quote:

I have added Enable buttons for each of the DSP blocks, and labels for the matrix elements. The input matrix takes inputs from each of the 4 channels: ADC1, ADC2, LSC and EXC, and routes them to the audio processing blocks (attachment 2). The output matrix (attachment 3) takes the outputs of the various DSP blocks and routes them to the output and then to the speakers. 

 

Attachment 1: IMG_20160615_145535907.jpg
IMG_20160615_145535907.jpg
Attachment 2: IMG_20160615_145413005_HDR.jpg
IMG_20160615_145413005_HDR.jpg
Attachment 3: IMG_20160616_101229499.jpg
IMG_20160616_101229499.jpg
Attachment 4: IMG_20160616_101157096.jpg
IMG_20160616_101157096.jpg
  12191   Thu Jun 16 16:11:11 2016 jamieUpdateCDSupgrade aborted for now

After poking at the new configuration more, it also started to show instability.  I couldn't figure out how to make test points or excitations available in this configuration, and adding in the full set of test point channels, and trying to do simple things like plotting channels with dtt, the frame writer (fw) would fall over, apparetnly unable to keep up with the broadcast from the dc.

I've revered everything back to the old semi-working fb configuration, and will be kicking this to the CDS group to deal with.

  12200   Mon Jun 20 11:11:20 2016 SteveConfigurationCDSGPS receiver not resetting properly

I called https://www.endruntechnologies.com/pdf/USM3014-0000-000.pdf  and they said it's very likely just needs a software update. They will email Jamie the details.

Quote:

The GPS receiver (EndRun Technologies box in 1Y5? (rack closest to door)) seems to not coming back up properly after the reboot.  The front pannel says that it's "LKD", but the "sync" LED is flashing instead of solid, and the time of year displayed on the front panel is showing day 6.  The fb1 symmetricom driver and VME timing module are still both seeing day 299, though.  So something may definitely be screwy with the GPS receiver.

 

ELOG V3.1.3-