  40m Log, Page 42 of 339
ID   Date   Author   Type   Category   Subject
  3839   Mon Nov 1 16:43:24 2010   Koji   Summary   CDS   CDS time delay measurement

Um, Beautiful.

Actually, 123.5usec is almost exactly twice 1/16384Hz.
Because of the loop, we have a delay of 1/16384Hz. I wonder where the delay actually comes from.
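
As a quick check of that arithmetic, here is a minimal sketch. The variable names are only for illustration; the numbers are the ones quoted in this thread.

```python
# Compare the measured CDS delay with the 16384 Hz sample period.
FS_HZ = 16384.0                      # model rate quoted above
sample_period_us = 1e6 / FS_HZ       # one sample period in microseconds
measured_delay_us = 123.5            # magnitude of the delay from the quoted result

ratio = measured_delay_us / sample_period_us
print(f"sample period = {sample_period_us:.3f} usec")
print(f"measured delay / sample period = {ratio:.3f}")
```

The ratio comes out slightly above 2, which is what "almost exactly twice" refers to.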

In order to understand the behaviour of the system, can I ask you to test the following things?

1) What is the delay without IOPs with fsampl of 16k, 32k, 64k?

2) What is the delay with IOP with fsampl of 32k, 64k?

Quote:

Result:
  TF agreed well with 2-time feCoeff4x and CDS time delay was -123.5 usec.
CDSdelay2.png

 

  3961   Sat Nov 20 03:37:11 2010   yuta   Summary   CDS   CDS time delay measurement - the ripple

(Koji, Joe, Yuta)

Motivation:
  We wanted to know more about CDS.

Setup:
  Same as in elog #3829.

What we did:

  1. Made test RT models c1tst and c1nio for c1iscex.
     c1tst has only 2 filter modules (the minimum for a model), 2 inputs, and 2 outputs, and it runs with IOP c1x01.
     c1nio is the same as c1tst except that it runs (or should run) without an IOP.

  2. Measured the time delay from ADC through DAC on different machines and at different sampling rates by measuring transfer functions.

  3. c1nio (without IOP) didn't seem to be running correctly and we couldn't measure the TF.
     A "1 PPS" error appeared in the GDS screen (C1:FEC-39_TIME_ERR).
     It looks like c1nio is receiving the signal, as we could see in the MEDM screen, but the signal doesn't come out of the DAC.

TF we expected:
  All the filters and gains are set to 1.

  The DAC has its own TF when putting the 64K signal out to the analog world.
    D(f)=exp(-i*pi*f*Ts)*sin(pi*f*Ts)/(pi*f*Ts)  (Ts: sample time)

  We have an AA filter and an AI filter when downsampling and upsampling.
    A(f)=G*(1+b11/z+b12/z/z)/(1+a11/z+a12/z/z)*(1+b21/z+b22/z/z)/(1+a21/z+a22/z/z)       z=exp(i*2*pi*f*Ts)
  Coefficients can be found in /cvs/cds/rtcds/caltech/c1/core/advLigoRTS/src/fe/controller.c.

/* Coeffs for the 2x downsampling (32K system) filter */
static double feCoeff2x[9] =
        {0.053628649721183,
        -1.25687596603711,    0.57946661417301,    0.00000415782507,    1.00000000000000,
        -0.79382359542546,    0.88797791037820,    1.29081406322442,    1.00000000000000};
/* Coeffs for the 4x downsampling (16K system) filter */
static double feCoeff4x[9] =
    {0.014805052402446, 
    -1.71662585474518,    0.78495484219691,   -1.41346289716898,   0.99893884152400,
    -1.68385964238855,    0.93734519457266,    0.00000127375260,   0.99819981588176};


  For 64K system, we expect H=1.

  We also have a delay.
    S(f)=exp(-i*2*pi*f*dt)   (dt: delay time)

  So, the total TF we expect is:
    H(f)=a*A(f)^2*D(f)*S(f)
  a is a constant depending on the range of the ADC and DAC (I think). Currently, a=1/4.

  We may need to think about the TF when upsampling. (D(f) is the TF of upsampling 64K to analog.)
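
  The expected TF above can be evaluated numerically. This is only a sketch under assumptions, not the script used for the plots: it assumes the feCoeff4x entries are ordered {G, a11, a12, b11, b12, a21, a22, b21, b22}, that A(f) is evaluated at the 64K sample time, and the function names are made up for illustration.

```python
import cmath
import math

TS_64K = 1.0 / 65536.0  # assumed sample time of the 64K side, in seconds

# feCoeff4x as quoted from controller.c.
# Assumed layout: [G, a11, a12, b11, b12, a21, a22, b21, b22].
FE_COEFF_4X = [
    0.014805052402446,
    -1.71662585474518, 0.78495484219691, -1.41346289716898, 0.99893884152400,
    -1.68385964238855, 0.93734519457266, 0.00000127375260, 0.99819981588176,
]

def aa_filter(f, coeff=FE_COEFF_4X, ts=TS_64K):
    """A(f): gain times two second-order sections, with z = exp(i*2*pi*f*Ts)."""
    zinv = cmath.exp(-2j * math.pi * f * ts)
    h = complex(coeff[0])
    for k in (0, 1):
        a1, a2, b1, b2 = coeff[1 + 4 * k: 5 + 4 * k]
        h *= (1 + b1 * zinv + b2 * zinv ** 2) / (1 + a1 * zinv + a2 * zinv ** 2)
    return h

def dac_tf(f, ts=TS_64K):
    """D(f) = exp(-i*pi*f*Ts) * sin(pi*f*Ts) / (pi*f*Ts)."""
    x = math.pi * f * ts
    sinc = 1.0 if x == 0.0 else math.sin(x) / x
    return cmath.exp(-1j * x) * sinc

def delay_tf(f, dt):
    """S(f) = exp(-i*2*pi*f*dt)."""
    return cmath.exp(-2j * math.pi * f * dt)

def expected_tf(f, dt, a=0.25):
    """H(f) = a * A(f)^2 * D(f) * S(f)."""
    return a * aa_filter(f) ** 2 * dac_tf(f) * delay_tf(f, dt)

if __name__ == "__main__":
    for f in (10.0, 100.0, 1000.0, 5000.0):
        h = expected_tf(f, dt=84.3e-6)
        print(f"{f:8.1f} Hz  |H| = {abs(h):.4f}  "
              f"phase = {math.degrees(cmath.phase(h)):8.2f} deg")
```

  Under the assumed coefficient layout the DC gain of A(f) comes out very close to 1, so |H| tends to a at low frequency. Fitting dt to the measured phase is left out of this sketch.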

Result:

  Example plot is attached.
  For other plots and the raw data, see /cvs/cds/caltech/users/yuta/scripts/CDSdelay2/ directory.
  As you can see, the TFs are slightly different from what we expect.
  They show a ripple near the cutoff frequency that we don't understand.

  If we ignore the ripple, here are the delay times for each condition:

data file       host      FE      IOP        rate   sample time   delay       delay/Ts
c1rms16K.dat    c1sus     c1rms   adcSlave   16K    61.0usec      110.4usec   1.8
c1scx16K.dat    c1iscex   c1scx   adcSlave   16K    61.0usec       85.5usec   1.4
c1tst16K.dat    c1iscex   c1tst   adcSlave   16K    61.0usec       84.3usec   1.4
c1tst32K.dat    c1iscex   c1tst   adcSlave   32K    30.5usec       53.7usec   1.8
c1tst64K.dat    c1iscex   c1tst   adcSlave   64K    15.3usec       38.4usec   2.5

  The delay times shown above do not include the delay of the DAC. To include it, add 7.6usec (Ts/2).

  - the delay time is different for different machines
  - the number of filters (c1scx is full of filters for the ETMX suspension, c1tst has only 2) doesn't seem to affect the delay time much
  - the higher the sampling rate, the larger the (delay time)/(sample time) ratio
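
  The delay/Ts column and the Ts/2 correction can be reproduced from the table. A minimal sketch, assuming "16K", "32K" and "64K" mean 16384, 32768 and 65536 Hz (consistent with the sample times listed); the names are only for illustration.

```python
# (data file, assumed model rate in Hz, measured delay in usec) from the table
MEASUREMENTS = [
    ("c1rms16K.dat", 16384, 110.4),
    ("c1scx16K.dat", 16384, 85.5),
    ("c1tst16K.dat", 16384, 84.3),
    ("c1tst32K.dat", 32768, 53.7),
    ("c1tst64K.dat", 65536, 38.4),
]

# DAC hold delay: half of the 64K sample time, in usec
DAC_HOLD_US = 0.5e6 / 65536

def delay_in_samples(rate_hz, delay_us):
    """Delay expressed in units of the model sample time Ts = 1/rate."""
    return delay_us * rate_hz / 1e6

ratios = [round(delay_in_samples(rate, d), 1) for _, rate, d in MEASUREMENTS]

for (name, rate, d), r in zip(MEASUREMENTS, ratios):
    print(f"{name}  delay/Ts = {r}  delay incl. DAC = {d + DAC_HOLD_US:.1f} usec")
```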

Plan:

 - figure out how to run a model without IOP
 - where do the ripples come from?
 - why we didn't see significant ripple at previous measurement on c1sus?

Attachment 1: c1tst16Kdelay.png
  4302   Tue Feb 15 15:06:25 2011   josephb   Update   CDS   CDS todo list for tomorrow morning

Currently, there is a test directory called /opt/rtcds/caltech/c1/new_core where we have the latest svn checkout.  Tomorrow (after everything works), it will become the core directory.

1) On the fb machine, modify the /diskless/root/etc/ld.so.cache file.  This is done by logging into fb, going to /etc/ld.so.conf.d/, modifying epics-x86_64.conf to only have .10 stuff, and running sudo /sbin/ldconfig.  Copy the newly generated /etc/ld.so.cache file to /diskless/root/etc/.

2) Modify the rc.local file on the fb machine in /diskless/root/etc/ to take advantage of the new subscripts and init.d/ start scripts.

3) Add the no_rfm_dma to all the iop models (c1x01,c1x02,c1x03,c1x04,c1x05).

4) Rebuild all front end models with new code.  Install.

5) Build awgtpman and mx_streams with new code.

6) Rerun activateDaq.py (to fix channel names from all the rebuilt code).

7) Double check Burt request files have the switch fix.

8) Restart the front ends.

9) Restart the frame builder.

10) Check channels, excitations, RFM connections.

11) Check Monit is working.

  3036   Wed Jun 2 17:34:33 2010   josephb, alex, valera   Update   CDS   CDS updates

From what I understand, Alex rewrote portions of the framebuilder and testpoint codes and then recompiled them in order to get more than 1 testpoint per front end working.   I've tested up to 5 testpoints at once so far, and it worked.

We also have a new noise component added to the RCG code.  This piece of code uses the random number generator from chapter 7.1 of Numerical Recipes, Third Edition to generate uniform numbers from 0 to 1.  Placing a filter bank after it should give us sufficient flexibility in generating the necessary noise types.  We did a coherence test between two instances of this noise piece, and they looked pretty incoherent.  Valera will add a picture of it to this elog when it finishes 1000 averages.

I'm in the process of propagating the old suspension control filters to the new RCG filter banks to give us a starting point.  Tomorrow Valera and I are planning to choose a subset of the plant filters and put them in, and then work out some initial control filters to correspond to the plant.  I also need to think about adding the anti-aliasing filters and whitening/dewhitening filters.

 

  4445   Mon Mar 28 15:18:04 2011   josephb   Update   CDS   CDS updates on Friday

Last Friday, we discovered a bug in the RCG where the delay part was not actually delaying.  We reported this to Alex who promptly put a fix in the same day.  This allowed Matt's newly proposed frequency discriminator to work properly.

It also required a checkout of the latest RCG code (revision 2328) and a rebuild of the various codes.  We first backed up all the kernel modules and executables, such as mbuf.ko and awgtpman.

We did the following:

1) Log into the fb machine.

2) Go to /opt/rtcds/caltech/c1/core/advLigoRTS/src/drv/mbuf and run make.  Copy the newly built mbuf.ko file to /diskless/root/modules/2.6.34.1/kernel/drivers/mbuf/mbuf.ko on the fb machine.

3) Use "sudo cp" to copy the newly built mbuf.ko file to /diskless/root/modules/2.6.34.1/kernel/drivers/mbuf/

4) Go to /cvs/cds/rtcds/caltech/c1/core/advLigoRTS/src/gds and run make.

5) Copy the newly built awgtpman executable to /opt/rtcds/caltech/c1/target/gds/bin/

6) Go to /opt/rtcds/caltech/c1/core/advLigoRTS/src/mx_stream/ and run make.

7) Copy the newly built mx_stream executable to /opt/rtcds/caltech/c1/target/fb/

  17074   Wed Aug 10 20:51:14 2022   Tega   Update   Computers   CDS upgrade Front-end machine setup

Here is a summary of what needs doing following the chat with Jamie today.

 

Jamie brought over the KVM switch shown in the attachment and I tested all 16 ports and 7 cables and can confirm that they all work as expected.

 

TODO

1. Do a rack space budget to get a clear picture of how many front-ends we can fit into the new rack

2. Look into what needs doing and how much effort would be needed to clear rack 1X7 and use that instead of the new rack. The power down on Friday would present a good opportunity to do this work on Monday, so get the info ready before then. 

3. Start mounting front-ends, KVM and dolphin network switch

4. Add the BOX rack layout to the CDS upgrade page.

Attachment 1: IMG_20220810_171002928.jpg
Attachment 2: IMG_20220810_171019633.jpg
  6540   Tue Apr 17 11:05:04 2012   Jamie   Update   CDS   CDS upgrade in progress

I am continuing to attempt to upgrade the CDS system to RTS 2.5.  Systems will continue to be up and down for the rest of the day.

  6541   Tue Apr 17 19:03:09 2012   Jamie   Update   CDS   CDS upgrade in progress

Upgrade progresses, but not complete.  There are some relatively minor issues, and one potentially big issue.

All new software has been installed, including the new epics that supports long channel names.

I've been doing a LOT of cleanup.  It was REALLY messy in there.

The new framebuilder/daqd code is running on fb.

Models are compiling with the new RCG and I am able to get them running.  Some of them are not compiling for relatively minor reasons (the simulink models need updating).  I'm also running into compile problems with IOPs that are using the dolphin drivers.

The major issue is that the framebuilder and the models are not syncing their timing, so there's no data collection.  I've spoken to Alex and he and Rolf are going to come over tomorrow to sort it out.  It's possible that we're missing timing hardware that the new code is expecting.

There are still some stability issues I haven't sorted out yet, and I have a lot more cleanup to do.

At this rate I'm going to shoot for being done Thursday.

  11390   Wed Jul 1 19:16:21 2015   Jamie   Summary   CDS   CDS upgrade in progress

The CDS upgrade is now underway.

Here's what's happened so far:

  • Installed and linked in all the RTS supporting software packages in /opt/rtapps (only on front end machines and fb):
    controls@c1lsc ~ 2$ find /opt/rtapps/ -mindepth 1 -maxdepth 1 -type l -ls
    12582916    0 lrwxrwxrwx   1 controls 1001           12 Jul  1 13:16 /opt/rtapps/gds -> gds-2.16.3.2
    12603452    0 lrwxrwxrwx   1 controls 1001           10 Jul  1 13:17 /opt/rtapps/fftw -> fftw-3.3.2
    12603451    0 lrwxrwxrwx   1 controls 1001           15 Jul  1 13:16 /opt/rtapps/libframe -> libframe-8.17.2
    12603450    0 lrwxrwxrwx   1 controls 1001           13 Jul  1 13:16 /opt/rtapps/libmetaio -> libmetaio-8.2
    12582915    0 lrwxrwxrwx   1 controls 1001           34 Jul  1 15:24 /opt/rtapps/framecpp -> ldas-tools-1.19.32-p1/linux-x86_64
    12582914    0 lrwxrwxrwx   1 controls 1001           20 Jul  1 13:15 /opt/rtapps/epics -> epics-3.14.12.2_long
  • Checked out the RTS source for the version we'll be using: 2.9.4

/opt/rtcds/rtscore/tags/advLigoRTS-2.9.4

  • built and installed all of the RTS components:
    • mbuf
    • mx_stream
    • daqd
    • nds
    • awgtpman
       
  • mx_stream is not working. Unknown why. It won't start on the front end machines (only tested on c1lsc so far) with the following error:
    controls@c1lsc ~ 1$ /opt/rtcds/caltech/c1/target/fb/mx_stream -s c1x04 c1lsc c1ass c1oaf c1cal -d fb:0
    mmapped address is 0x7ff7b71a0000
    send len = 263596
    mx_connect failed Remote Endpoint is Closed
    controls@c1lsc ~ 1$
    
    Have contacted Keith T. and Rolf B. for backup.  This is a blocker, since this is what ferries the data from the front ends.
     
  • Rebuilt almost all models.  This was good.  Initially nothing would compile because of IPC creation errors, so I moved the old chans/ipc/C1.ipc file out of the way and generated a new one and then everything compiled (of course senders have to be compiled before receivers).
    I only had to fix a couple of things in the models themselves:
    • c1ioo - unterminated FiltCtrl inputs
    • C1_SUS_SINGLE_CONTROL - unterminated FiltCtrl inputs
    • c1oaf - bad part named "STATIC". There is some hacky namespace stuff going on in the RCG. I was able to just explode that part and it now works.
    • c1lsc - unterminated FiltCtrl inputs
    Haven't installed or tried to run anything yet, but the fact they compile is good.
    Some models are not compiling because they have C code in src blocks that are throwing errors:
    • c1lsc
    • c1cal
    It shouldn't be too hard to fix whatever is causing those compile errors.

That's it for today.  Will pick up again first thing tomorrow.

  6552   Fri Apr 20 19:54:57 2012   Jamie   Update   CDS   CDS upgrade problems

I ran into a couple of snags today.

A big one is that the framebuilder daqd started going haywire when I told it to start writing frames.  After restart the logs started showing this:

[Fri Apr 20 17:23:40 2012] main profiler warning: 0 empty blocks in the buffer
[Fri Apr 20 17:23:41 2012] main profiler warning: 0 empty blocks in the buffer
[Fri Apr 20 17:23:42 2012] main profiler warning: 0 empty blocks in the buffer
[Fri Apr 20 17:23:43 2012] main profiler warning: 0 empty blocks in the buffer
[Fri Apr 20 17:23:44 2012] main profiler warning: 0 empty blocks in the buffer
[Fri Apr 20 17:23:45 2012] main profiler warning: 0 empty blocks in the buffer
GPS time jumped from 1019002442 to 1019003041
FATAL: exception not rethrown
FATAL: exception not rethrown
FATAL: exception not rethrown

and the network seemed like it started to get really slow.  I wasn't able to figure out what was going on, so I shut the frame writing off again.  I'll have to work with Rolf on that next week.

Another big problem is the workstation application upgrades.  The NDS protocol version has been incremented, which means that all the NDS client applications have to be upgraded.  The new dataviewer is working fine (on pianosa), but dtt is not:

controls@pianosa:~ 0$ diaggui
diaggui: symbol lookup error: /ligo/apps/linux-x86_64/gds-2.15.1/lib/libligogui.so.0: undefined symbol: _ZN18TGScrollBarElement11ShowMembersER16TMemberInspector
controls@pianosa:~ 127$ 

I don't know what's going on here.  All the library paths are ok.  Hopefully I'll be able to figure this out soon.  The old version of dtt definitely does not work with the new setup.

I might go ahead and upgrade some more of the workstations to Ubuntu in the next couple of days as well, so everything is more on the same page.

I also tried to clean up the front-end boot process, which has its own problems (models won't auto-start).  I haven't figured that out yet either.  It really needs to just be completely overhauled.

  6546   Wed Apr 18 19:59:48 2012   Jamie   Update   CDS   CDS upgrade success

The upgrade is nearly complete:

  • new daqd code is running on fb
  • the fe/daqd timing issue was resolved by adjusting the GPS offset in the daqdrc.  I will document this more later.
  • the power outage conveniently rebooted all the front-end machines, so they're all now running new caRepeater
  • all models have been successfully recompiled with RCG 2.5 (with only a couple small glitches)
  • all new models are running on all front-end machines (with a couple exceptions)
  • all suspension models seem to be damping under local control (PRM is having troubles that are likely unrelated to the upgrade).
  • a lot of cleanup has been done

Remaining tasks/issues:

  • more testing OF EVERYTHING needs to be done
  • I did not yet update the DIS dolphin code, so we're running with the old code.  I don't think this is a problem, but it would be nice to get us running what they're running at the sites
  • I tried to cleanup/simplify how front-end initialization is done.  However, there is a problem and models are not auto-starting after reboot.  This needs to be fixed.
  • the userapps directory is in a new place (/opt/rtcds/userapps).  Not everything in the old location was checked into the repository, so we need to check to make sure everything that needs to be is checked in, and that all the models are running the right code.
  • the c1oaf model seems to be having a dolphin issue that needs to be sorted
  • the c1gfd model causes c1ioo to crash immediately upon being loaded.  I have removed it from the rtsystab.  That model needs to be fixed.
  • general model cleanup is in order.
  • more front-end cleanup is needed, particularly in regards to boot-up procedure.
  • document the entire upgrade procedure.

I'll finish up these remaining tasks tomorrow.

  16881   Fri May 27 17:46:48 2022   Paco   Summary   Computers   CDS upgrade visit, downfall and rise of c1lsc models

[Paco, Anchal-remote, Yuta, JC]

Sometime around noon today, right after the CDS upgrade planning tour, the c1lsc FE fell. We thought this was ok because c1sus was still up anyway, but somehow the IFO alignment was compromised (this is in fact how we first noticed the loss). Yuta couldn't see REFL on the camera, nor on the AP table (!!), so somehow any or all of TT1, TT2, and PRM got affected by this model stopping. We even tried kicking PRM slightly to see if the beam was nearby, with no success.

We decided to restart the models. To do this we first ssh'd into c1lsc, c1ioo and c1sus and stopped all models. During this step, c1ioo and c1sus dropped their connection, so we had to physically restart them. We then noticed a DC 0x4000 error in c1x04 (c1lsc IOP) and, after checking, found that the gpstimes differed by 1 second. We then stopped the model again and, from fb1, restarted all daqd_* services, ran modprobe -r gpstime and modprobe gpstime, restarted c1lsc, and started the c1x04 model. This fixed the issue, so we finished restarting all FE models and burt-restored all the relevant snap files to today 02:19 AM PDT.

This made the IFO recover its nominal alignment, minus the usual drift.

* The OAF model failed to start but we left it like so for now.

  11397   Wed Jul 8 21:02:02 2015   Jamie   Summary   CDS   CDS upgrade: another step forward, so we're back to where we started (plus a bit?)

Koji did a bit of googling to determine that the 'Wrong Network' status message could be explained by the fb myrinet operating in the wrong mode:
(This was the useful link to track down the issue (KA))
 

    Network:    Myrinet 10G

I didn't notice it before, but we should in fact be operating in "Ethernet" mode, since that's the fabric we're using for the DC network.  Digging a bit deeper we found that the new version of mx (1.2.16) had indeed been configured with a different compile option than the 1.2.15 version had:

controls@fb ~ 0$ grep '$ ./configure' /opt/src/mx-1.2.15/config.log          
  $ ./configure --enable-ether-mode --prefix=/opt/mx
controls@fb ~ 0$ grep '$ ./configure' /opt/src/mx-1.2.16/config.log
  $ ./configure --enable-mx-wire --prefix=/opt/mx-1.2.16
controls@fb ~ 0$

So that would entirely explain the problem.  I re-linked mx to the older version (1.2.15), reloaded the mx drivers, and everything showed up correctly:

controls@fb ~ 0$ /opt/mx/bin/mx_info
MX Version: 1.2.12
MX Build: root@fb:/root/mx-1.2.12 Mon Nov  1 13:34:38 PDT 2010
1 Myrinet board installed.
The MX driver is configured to support a maximum of:
    8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
===================================================================
Instance #0:  299.8 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
    Status:        Running, P0: Link Up
    Network:    Ethernet 10G

    MAC Address:    00:60:dd:46:ea:ec
    Product code:    10G-PCIE-8AL-S
    Part number:    09-03916
    Serial number:    352143
    Mapper:        00:60:dd:46:ea:ec, version = 0x00000000, configured
    Mapped hosts:    6

                                                        ROUTE COUNT
INDEX    MAC ADDRESS     HOST NAME                        P0
-----    -----------     ---------                        ---
   0) 00:60:dd:46:ea:ec fb:0                              1,0
   1) 00:25:90:0d:75:bb c1sus:0                           1,0
   2) 00:30:48:be:11:5d c1iscex:0                         1,0
   3) 00:30:48:d6:11:17 c1iscey:0                         1,0
   4) 00:30:48:bf:69:4f c1lsc:0                           1,0
   5) 00:14:4f:40:64:25 c1ioo:0                           1,0
controls@fb ~ 0$

The front end hosts are also showing good omx info (even though they had been previously as well):

controls@c1lsc ~ 0$ /opt/open-mx/bin/omx_info
Open-MX version 1.5.2
 build: controls@fb:/opt/src/open-mx-1.5.2 Tue May 21 11:03:54 PDT 2013

Found 1 boards (32 max) supporting 32 endpoints each:
 c1lsc:0 (board #0 name eth1 addr 00:30:48:bf:69:4f)
   managed by driver 'igb'

Peer table is ready, mapper is 00:30:48:d6:11:17
================================================
  0) 00:30:48:bf:69:4f c1lsc:0
  1) 00:60:dd:46:ea:ec fb:0
  2) 00:25:90:0d:75:bb c1sus:0
  3) 00:30:48:be:11:5d c1iscex:0
  4) 00:30:48:d6:11:17 c1iscey:0
  5) 00:14:4f:40:64:25 c1ioo:0
controls@c1lsc ~ 0$

This got all the mx_stream connections back up and running.

Unfortunately, daqd is back to being a bit flaky.  With all frame writing enabled we saw daqd crash again.  I then shut off all trend frame writing and we're back to a marginally stable state: we have data flowing from all front ends, and full frames are being written, but not trends.

I'll pick up on this again tomorrow, and maybe try to rebuild the new version of mx with the proper flags.

  11402   Mon Jul 13 01:11:14 2015   Jamie   Summary   CDS   CDS upgrade: current assessment

daqd is still behaving unstably.  It's still unclear what the issue is.

The current failures look like disk IO contention.  However, it's hard to see any evidence that daqd is suffering from large IO wait while it's failing.

The frame size itself is currently smaller than it was before the upgrade:

controls@fb /frames/full 0$ ls -alth 11190 | head
total 369G
drwxr-xr-x 321 controls controls  36K Jul 12 22:20 ..
drwxr-xr-x   2 controls controls 268K Jun 23 06:06 .
-rw-r--r--   1 controls controls  67M Jun 23 06:06 C-R-1119099984-16.gwf
-rw-r--r--   1 controls controls  68M Jun 23 06:06 C-R-1119099968-16.gwf
-rw-r--r--   1 controls controls  69M Jun 23 06:05 C-R-1119099952-16.gwf
-rw-r--r--   1 controls controls  69M Jun 23 06:05 C-R-1119099936-16.gwf
-rw-r--r--   1 controls controls  67M Jun 23 06:05 C-R-1119099920-16.gwf
-rw-r--r--   1 controls controls  68M Jun 23 06:05 C-R-1119099904-16.gwf
-rw-r--r--   1 controls controls  68M Jun 23 06:04 C-R-1119099888-16.gwf
controls@fb /frames/full 0$ ls -alth 11208 | head
total 17G
drwxr-xr-x   2 controls controls  20K Jul 13 01:00 .
-rw-r--r--   1 controls controls  45M Jul 13 01:00 C-R-1120809632-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 01:00 C-R-1120809408-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 00:56 C-R-1120809392-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 00:56 C-R-1120809376-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 00:56 C-R-1120809360-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 00:55 C-R-1120809344-16.gwf
-rw-r--r--   1 controls controls  50M Jul 13 00:55 C-R-1120809328-16.gwf
controls@fb /frames/full 0$

This would seem to indicate that it's not an increase in frame size that's to blame.
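
The comparison can be made a bit more quantitative from the two listings above. A minimal sketch, assuming the file names follow C-R-<GPS start>-<duration in seconds>.gwf and taking the sizes as printed by ls (in MB); the helper names are hypothetical.

```python
import re

FRAME_RE = re.compile(r"C-R-(\d+)-(\d+)\.gwf$")

def parse_frame(name):
    """Return (gps_start, duration_s) parsed from a frame file name."""
    m = FRAME_RE.search(name)
    if m is None:
        raise ValueError(f"not a frame file name: {name}")
    return int(m.group(1)), int(m.group(2))

# (file name, size in MB) copied from the listings above
PRE_UPGRADE = [
    ("C-R-1119099984-16.gwf", 67), ("C-R-1119099968-16.gwf", 68),
    ("C-R-1119099952-16.gwf", 69), ("C-R-1119099936-16.gwf", 69),
    ("C-R-1119099920-16.gwf", 67), ("C-R-1119099904-16.gwf", 68),
    ("C-R-1119099888-16.gwf", 68),
]
POST_UPGRADE = [
    ("C-R-1120809632-16.gwf", 45), ("C-R-1120809408-16.gwf", 50),
    ("C-R-1120809392-16.gwf", 50), ("C-R-1120809376-16.gwf", 50),
    ("C-R-1120809360-16.gwf", 50), ("C-R-1120809344-16.gwf", 50),
    ("C-R-1120809328-16.gwf", 50),
]

def mean_rate_mb_per_s(listing):
    """Average frame data rate: total size over total frame duration."""
    total_mb = sum(size for _, size in listing)
    total_s = sum(parse_frame(name)[1] for name, _ in listing)
    return total_mb / total_s

def start_time_steps(listing):
    """Differences between consecutive GPS start times, oldest first."""
    starts = sorted(parse_frame(name)[0] for name, _ in listing)
    return [b - a for a, b in zip(starts, starts[1:])]

if __name__ == "__main__":
    print("pre  MB/s:", round(mean_rate_mb_per_s(PRE_UPGRADE), 2))
    print("post MB/s:", round(mean_rate_mb_per_s(POST_UPGRADE), 2))
    print("post start-time steps (s):", start_time_steps(POST_UPGRADE))
```

Any step larger than the 16 s frame length in the second listing marks a stretch with no full frames on disk, which would be consistent with daqd having been down then.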

Because slow data is now transported to daqd over the MX data concentrator network rather than via EPICS (RTS 2.8), there is more traffic on the MX network.  I note also that the channel lists have increased in size:

controls@fb /opt/rtcds/caltech/c1/chans/daq 0$ ls -alt archive/C1LSC* | head -20
-rw-r--r-- 1 4294967294 4294967294 262554 Jul  6 18:21 archive/C1LSC_150706_182146.ini
-rw-r--r-- 1 4294967294 4294967294 262554 Jul  6 18:16 archive/C1LSC_150706_181603.ini
-rw-r--r-- 1 4294967294 4294967294 262554 Jul  6 16:09 archive/C1LSC_150706_160946.ini
-rw-r--r-- 1 4294967294 4294967294  43366 Jul  1 16:05 archive/C1LSC_150701_160519.ini
-rw-r--r-- 1 4294967294 4294967294  43366 Jun 25 15:47 archive/C1LSC_150625_154739.ini
...

I would have thought, though, that data transmission errors would show up in the daqd status bits.

  11427   Sat Jul 18 15:37:19 2015   Jamie   Summary   CDS   CDS upgrade: current status

So it appears we have found a semi-stable configuration for the DAQ system post upgrade:

Here are the issues:

daqd

daqd is running mostly stably for the moment, although it still crashes at the top of every hour (see below).  Here are some relevant points about the current configuration:

  • recording data from only a subset of front-ends, to reduce the overall load:
    • c1x01
    • c1scx
    • c1x02
    • c1sus
    • c1mcs
    • c1pem
    • c1x04
    • c1lsc
    • c1ass
    • c1x05
    • c1scy
  • 16 second main buffer:
    start main 16;
  • trend lengths: second: 600, minute: 60
    start trender 600 60;
  • writing to frames:
    • full
    • second
    • minute
    • (NOT raw minute trends)
  • frame compression ON

This eliminates most of the random daqd crashing.  However, daqd still crashes at the top of every hour after writing out the minute trend frame.  It is still unclear what the issue is, but Keith is investigating.  In some sense this is no worse than where we were before the upgrade, since daqd was also crashing hourly then.  It's still crappy, though, so hopefully we'll figure something out.

The inittab on fb automatically restarts daqd after it crashes, and monit on all of the front ends automatically restarts the mx_stream processes.

front ends

The front end modules are mostly running fine.

One issue is that the execution times seem to have increased a bit, which is problematic for models that were already on the hairy edge.  For instance, the rough average for c1sus has gone from ~48us to 50us.  This is most problematic for c1cal, which is now running at ~66us out of 60, which is obviously untenable.  We'll need to reduce the load in c1cal somehow.
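
For reference, the per-cycle time budget can be worked out as below. This is only a sketch and assumes both models run at 16384 Hz, which is not stated in this entry; the "out of 60" above would then be the ~61 usec cycle period.

```python
# Per-cycle time budget for a front-end model, assuming a 16384 Hz model rate.
ASSUMED_RATE_HZ = 16384
cycle_budget_us = 1e6 / ASSUMED_RATE_HZ

# Rough execution times quoted above, in usec
exec_time_us = {"c1sus": 50.0, "c1cal": 66.0}

# Positive margin: fits within a cycle. Negative margin: overruns the cycle.
margin_us = {model: cycle_budget_us - t for model, t in exec_time_us.items()}

for model, m in margin_us.items():
    status = "ok" if m > 0 else "OVERRUN"
    print(f"{model}: margin {m:+.1f} usec ({status})")
```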

All other front end models seem to be working fine, but a full test is still needed.

There was an issue with the DACs on c1sus, but I rebooted and everything came up fine, optics are now damped:

  11400   Thu Jul 9 16:50:13 2015   Jamie   Summary   CDS   CDS upgrade: if all else fails try throwing metal at the problem

I roped Rolf into coming over and adding his eyes to the problem.  After much discussion we couldn't come up with any reasonable explanation for the problems we've been seeing other than daqd just needing a lot more resources than it did before.  He said he had some old Sun SunFire X4600s from which we could pilfer memory.  I went over to Downs and ripped all the CPU/memory cards out of one of his machines and stuffed them into fb:

fb now has 8 CPU and 16G of RAM

Unfortunately, this is still not enough.  Or at least it didn't solve the problem; daqd is showing the same instabilities, falling over a couple of minutes after I turn on trend frame writing.  As always, before daqd fails it starts spitting out the following to the logs:

[Thu Jul  9 16:37:09 2015] main profiler warning: 0 empty blocks in the buffer

followed by lines like:

[Thu Jul  9 16:37:27 2015] GPS MISS dcu 44 (ASX); dcu_gps=1120520264 gps=1120519812

right before it dies.

I'm no longer convinced that this is a resource issue, though, judging by the resource usage right before the crash:

top - 16:47:32 up 48 min,  5 users,  load average: 0.91, 0.62, 0.61
Tasks:   2 total,   0 running,   2 sleeping,   0 stopped,   0 zombie
Cpu(s):  8.9%us,  0.9%sy,  0.0%ni, 89.1%id,  0.9%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  15952104k total, 13063468k used,  2888636k free,   138648k buffers
Swap:  1023996k total,        0k used,  1023996k free,  7672292k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12016 controls  20   0 8098m 4.4g 104m S  106 29.1   6:45.79 daqd
 4953 controls  20   0 53580 6092 5096 S    0  0.0   0:00.04 nds

Load average less than 1 per CPU, plenty of free memory (~3G free, 0 swap), no waiting for IO (0.9%wa), etc.  daqd is utilizing lots of  threads, which should be spread across many cpus, so even the >100%CPU should be ok.   I'm at a loss...
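
The numbers behind that reading of the top output, as a small sketch. The CPU count is the 8 quoted above; the variable names are only for illustration.

```python
# Values copied from the top output above
N_CPU = 8                     # "fb now has 8 CPU and 16G of RAM"
load_avg_1min = 0.91
mem_free_kb = 2888636
swap_used_kb = 0
iowait_percent = 0.9

load_per_cpu = load_avg_1min / N_CPU
mem_free_gib = mem_free_kb / 1024 ** 2   # kB to GiB

print(f"load per CPU : {load_per_cpu:.2f}")
print(f"free memory  : {mem_free_gib:.2f} GiB")
print(f"swap used    : {swap_used_kb} kB")
print(f"IO wait      : {iowait_percent} %")
```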

  11404   Mon Jul 13 18:12:50 2015   Jamie   Summary   CDS   CDS upgrade: left running in semi-stable configuration

I have been watching daqd all day and I don't feel particularly closer to understanding what the issues are.  However, things are

Interestingly, though, the stability appears highly variable at the moment.  This morning, daqd was very unstable and was crashing within a couple of minutes of starting.  However, this afternoon things seemed much more stable.  As of this moment, daqd has been running for 25 minutes, writing full frames as well as minute and second trends (no minute_raw), without any issues.  What has changed?

To reiterate, I have been closely watching disk IO to /frames.  I see no indication that there is any disk contention while daqd is failing.  It's still possible, though, that there are disk IO issues affecting daqd at a level that is not readily visible.  From dstat, the frame writes are visible, but nothing else.

I have made one change that could be positively affecting things right now: I un-exported /frames from NFS.  This eliminates anything external from reading /frames over the network.  In particular, it also shuts off the transfer of frames to LDAS.  Since I've done this, daqd has appeared to be more stable.  It's NOT totally stable, though, as the instance that I described above did eventually just die after 43 minutes, as I was writing this.

In any event, as things are currently as stable as I've seen them, I'm leaving it running in this configuration for the moment, with the following relevant daqdrc parameters:

start main 16;
start frame-saver;
sync frame-saver;
start trender 60 60;
start trend-frame-saver;
sync trend-frame-saver;
start minute-trend-frame-saver;
sync minute-trend-frame-saver;
start profiler;
start trend profiler;
  11406   Tue Jul 14 09:08:37 2015   Jamie   Summary   CDS   CDS upgrade: left running in semi-stable configuration

Overnight daqd restarted itself only about twice an hour, which is an improvement:

controls@fb /opt/rtcds/caltech/c1/target/fb 0$ tail logs/restart.log
daqd: Tue Jul 14 03:13:50 PDT 2015
daqd: Tue Jul 14 04:01:39 PDT 2015
daqd: Tue Jul 14 04:09:57 PDT 2015
daqd: Tue Jul 14 05:02:46 PDT 2015
daqd: Tue Jul 14 06:01:57 PDT 2015
daqd: Tue Jul 14 06:43:18 PDT 2015
daqd: Tue Jul 14 07:02:19 PDT 2015
daqd: Tue Jul 14 07:58:16 PDT 2015
daqd: Tue Jul 14 08:02:44 PDT 2015
daqd: Tue Jul 14 09:02:24 PDT 2015

Un-exporting /frames might have helped a bit.  However, the problem is obviously still not fixed.

  11408   Tue Jul 14 10:28:02 2015   ericq   Summary   CDS   CDS upgrade: left running in semi-stable configuration

There remains a pattern to some of the restarts: the following times are all reported as restart times. (There are others in between, however.)

daqd: Tue Jul 14 00:02:48 PDT 2015
daqd: Tue Jul 14 01:02:32 PDT 2015
daqd: Tue Jul 14 03:02:33 PDT 2015
daqd: Tue Jul 14 05:02:46 PDT 2015
daqd: Tue Jul 14 06:01:57 PDT 2015
daqd: Tue Jul 14 07:02:19 PDT 2015
daqd: Tue Jul 14 08:02:44 PDT 2015
daqd: Tue Jul 14 09:02:24 PDT 2015
daqd: Tue Jul 14 10:02:03 PDT 2015

Before the upgrade, we suffered from hourly crashes too:

daqd_start Sun Jun 21 00:01:06 PDT 2015
daqd_start Sun Jun 21 01:03:47 PDT 2015
daqd_start Sun Jun 21 02:04:04 PDT 2015
daqd_start Sun Jun 21 03:04:35 PDT 2015
daqd_start Sun Jun 21 04:04:04 PDT 2015
daqd_start Sun Jun 21 05:03:45 PDT 2015
daqd_start Sun Jun 21 06:02:43 PDT 2015
daqd_start Sun Jun 21 07:04:42 PDT 2015
daqd_start Sun Jun 21 08:04:34 PDT 2015
daqd_start Sun Jun 21 09:03:30 PDT 2015
daqd_start Sun Jun 21 10:04:11 PDT 2015

So, this isn't necessarily new behavior, just something that remains unfixed.
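
One way to pick the hourly pattern out of restart.log is to flag restarts that land shortly after the top of the hour. A minimal sketch using the lines from the tail in elog 11406; the 5-minute window is an arbitrary choice for illustration, not something daqd defines.

```python
import re

TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2})")

def minutes_past_hour(line):
    """Minutes (with fractional seconds) past the hour for a restart.log line."""
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError(f"no HH:MM:SS timestamp in: {line}")
    return int(m.group(2)) + int(m.group(3)) / 60.0

def near_top_of_hour(line, window_min=5.0):
    """True if the restart happened within window_min minutes after the hour."""
    return minutes_past_hour(line) < window_min

# restart.log lines as quoted in elog 11406
RESTART_LOG = [
    "daqd: Tue Jul 14 03:13:50 PDT 2015",
    "daqd: Tue Jul 14 04:01:39 PDT 2015",
    "daqd: Tue Jul 14 04:09:57 PDT 2015",
    "daqd: Tue Jul 14 05:02:46 PDT 2015",
    "daqd: Tue Jul 14 06:01:57 PDT 2015",
    "daqd: Tue Jul 14 06:43:18 PDT 2015",
    "daqd: Tue Jul 14 07:02:19 PDT 2015",
    "daqd: Tue Jul 14 07:58:16 PDT 2015",
    "daqd: Tue Jul 14 08:02:44 PDT 2015",
    "daqd: Tue Jul 14 09:02:24 PDT 2015",
]

hourly = [line for line in RESTART_LOG if near_top_of_hour(line)]
other = [line for line in RESTART_LOG if not near_top_of_hour(line)]

print(f"{len(hourly)} restarts just after the hour, {len(other)} at other times")
```

The timestamp is matched with a regular expression rather than parsed as a full date, since the "PDT" zone name is not reliably handled by date parsers on every machine.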

  11409   Tue Jul 14 11:57:27 2015   jamie   Summary   CDS   CDS upgrade: left running in semi-stable configuration
Quote:

There remains a pattern to some of the restarts, the following times are all reported as restart times. (There are others in between, however.)

daqd: Tue Jul 14 00:02:48 PDT 2015
daqd: Tue Jul 14 01:02:32 PDT 2015
daqd: Tue Jul 14 03:02:33 PDT 2015
daqd: Tue Jul 14 05:02:46 PDT 2015
daqd: Tue Jul 14 06:01:57 PDT 2015
daqd: Tue Jul 14 07:02:19 PDT 2015
daqd: Tue Jul 14 08:02:44 PDT 2015
daqd: Tue Jul 14 09:02:24 PDT 2015
daqd: Tue Jul 14 10:02:03 PDT 2015

Before the upgrade, we suffered from hourly crashes too:

daqd_start Sun Jun 21 00:01:06 PDT 2015
daqd_start Sun Jun 21 01:03:47 PDT 2015
daqd_start Sun Jun 21 02:04:04 PDT 2015
daqd_start Sun Jun 21 03:04:35 PDT 2015
daqd_start Sun Jun 21 04:04:04 PDT 2015
daqd_start Sun Jun 21 05:03:45 PDT 2015
daqd_start Sun Jun 21 06:02:43 PDT 2015
daqd_start Sun Jun 21 07:04:42 PDT 2015
daqd_start Sun Jun 21 08:04:34 PDT 2015
daqd_start Sun Jun 21 09:03:30 PDT 2015
daqd_start Sun Jun 21 10:04:11 PDT 2015

So, this isn't necessarily new behavior, just something that remains unfixed.

That's interesting, that we're still seeing those hourly crashes.

We're not writing out the full set of channels, though, and we're getting more failures than just those at the hour, so we're still suffering.

  11398   Thu Jul 9 13:26:47 2015 JamieSummaryCDSCDS upgrade: new mx 1.2.16 installed

I rebuilt/installed mx 1.2.16 to use "ether-mode", instead of the default MX-10G:

controls@fb /opt/src/mx-1.2.16 0$ ./configure --enable-ether-mode --prefix=/opt/mx-1.2.16
...
controls@fb /opt/src/mx-1.2.16 0$ make
..
controls@fb /opt/src/mx-1.2.16 0$ make install
...

I then rebuilt/installed daqd so that it properly linked against the updated mx install:

controls@fb /opt/rtcds/rtscore/release/src/daqd 0$ ./configure --enable-debug --disable-broadcast --without-myrinet --with-mx --with-epics=/opt/rtapps/epics/base --with-framecpp=/opt/rtapps/framecpp --enable-local-timing
...
controls@fb /opt/rtcds/rtscore/release/src/daqd 0$ make
...
controls@fb /opt/rtcds/rtscore/release/src/daqd 0$ install daqd /opt/rtcds/caltech/c1/target/fb/

It's now back to running and receiving data from the front ends (still not stable yet, though).

  11396   Wed Jul 8 20:37:02 2015 JamieSummaryCDSCDS upgrade: one step forward, two steps back

After determining yesterday that all the daqd issues were coming from the frame writing, I started to dig into it more today.  I also spoke to Keith Thorne, and got some good suggestions from Gerrit Kuhn at GEO.

I realized that it probably wasn't the trend writing per se, but that turning on more writing to disk was causing increased load on daqd, and consequently on the system itself.  With more frame writing turned on, the memory consumption increased to the point of maxing out the physical RAM.  The system then probably started swapping, which certainly would have choked daqd.

I noticed that fb only had 4G of RAM, which Keith suggested was just not enough.  Even if the memory consumption of daqd has increased significantly, it still seems like 4G would not be enough.  I opened up fb only to find that fb actually had 8G of RAM installed!  Not sure what happened to the other 4G, but somehow they were not visible to the system.  Koji and I eventually determined, via some frankenstein operations with megatron, that the RAM was just dead.  We then pulled 4G of RAM from megatron and replaced the bad RAM in fb, so that fb now has a full 8G of RAM.

Unfortunately, when we got fb fully back up and running we found that fb is not able to see any of the other hosts on the data concentrator network.  mx_info, which displays the card and network status for the myricom myrinet fiber card, shows:

MX Version: 1.2.16
MX Build: controls@fb:/opt/src/mx-1.2.16 Tue May 21 10:58:40 PDT 2013
1 Myrinet board installed.
The MX driver is configured to support a maximum of:
    8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
===================================================================
Instance #0:  299.8 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
    Status:        Running, P0: Wrong Network
    Network:    Myrinet 10G

    MAC Address:    00:60:dd:46:ea:ec
    Product code:    10G-PCIE-8AL-S
    Part number:    09-03916
    Serial number:    352143
    Mapper:        00:60:dd:46:ea:ec, version = 0x63e745ee, configured
    Mapped hosts:    1

                                                        ROUTE COUNT
INDEX    MAC ADDRESS     HOST NAME                        P0
-----    -----------     ---------                        ---
   0) 00:60:dd:46:ea:ec fb:0                            D 0,0

Note that all front end machines should be listed in the table at the bottom, and they're not.   Also note the "Wrong Network" note in the Status line above.  It appears that the card has maybe been initialized in a bad state?  Or Koji and I somehow disturbed the network when we were cleaning up things in the rack.  "sudo /etc/init.d/mx restart" on fb doesn't solve the problem.  We even rebooted fb and it didn't seem to help.
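
The two symptoms called out above (the "Wrong Network" status and a host table containing only fb) can be checked mechanically. This is a minimal sketch that parses only the text layout shown in this entry; the function names and the `expected_hosts` parameter are hypothetical, and nothing here talks to the MX driver itself.

```python
import re


def mx_summary(mx_info_text):
    """Extract the port status string and mapped-host count from mx_info output."""
    status = re.search(r"^\s*Status:\s*(.+)$", mx_info_text, re.M)
    mapped = re.search(r"^\s*Mapped hosts:\s*(\d+)", mx_info_text, re.M)
    return {
        "status": status.group(1).strip() if status else None,
        "mapped_hosts": int(mapped.group(1)) if mapped else None,
    }


def looks_healthy(summary, expected_hosts):
    """True only if the status is free of 'Wrong Network' and enough hosts are mapped.

    expected_hosts is an assumed site-specific number (fb plus the front ends).
    """
    if summary["status"] is None or summary["mapped_hosts"] is None:
        return False
    if "Wrong Network" in summary["status"]:
        return False
    return summary["mapped_hosts"] >= expected_hosts
```

With the output above, `mx_summary` reports one mapped host and a status containing "Wrong Network", so `looks_healthy` returns False for any `expected_hosts` greater than one.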

In any event, we're back to no data flow.  I'll pick up again tomorrow.

  11412   Tue Jul 14 16:51:01 2015 JamieSummaryCDSCDS upgrade: problem is not disk access

I think I have now determined once and for all that the daqd problems are NOT due to disk IO contention.

I have mounted a tmpfs at /frames/tmp and have told daqd to write frames there.  The tmpfs exists entirely in RAM.  There is essentially zero IO wait for such a filesystem, so daqd should never have trouble writing out the frames.

And yet daqd continues to fail with the "0 empty blocks in the buffer" warnings.  I've been down a rabbit hole.

  11393   Tue Jul 7 18:27:54 2015 JamieSummaryCDSCDS upgrade: progress!

After a couple of days of struggle, I made some progress on the CDS upgrade today:

Front end status:

  • RTS upgraded to 2.9.4, and linked in as "release":

/opt/rtcds/rtscore/release -> tags/advLigoRTS-2.9.4

  • mbuf kernel module built and installed
  • All front ends have been rebooted with the latest patched kernel (from 2.6 upgrade)
  • All models have been rebuilt, installed, restarted.  Only minor model issues had to be corrected (unterminated unused inputs mostly).
  • awgtpman rebuilt, and installed/running on all front-ends
  • open-mx upgraded to 1.5.2:

/opt/open-mx -> open-mx-1.5.2

  • All front ends running latest version of mx_stream, built against 2.9.4 and open-mx-1.5.2.

We have new GDS overview screens for the front end models:

It's possible that our current lack of IRIG-B GPS distribution means that the 'TIM' status bit will always be red on the IOP models.  Will consult with Rolf.

There are other new features in the front ends that I can get into later.

DAQ (fb) status:

  • daqd and nds rebuilt against 2.9.4, both now running on fb

40m daqd compile flags:

cd src/daqd
./configure --enable-debug --disable-broadcast --without-myrinet --with-mx --enable-local-timing --with-epics=/opt/rtapps/epics/base --with-framecpp=/opt/rtapps/framecpp
make
make clean
install daqd /opt/rtcds/caltech/c1/target/fb/

However, daqd has unfortunately been very unstable, and I've been trying to figure out why.  I originally thought it was some sort of timing issue, but now I'm not so sure.

I had to make the following changes to the daqdrc:

set gps_leaps = 820108813 914803214 1119744016;

That enumerates some list of leap seconds since some time.  Not sure if that actually does anything, but I added the latest leap seconds anyway:

set symm_gps_offset=315964803;

This updates the silly, arbitrary GPS offset, that is required to be correct when not using external GPS reference.
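
For reference, the three gps_leaps values quoted above can be reproduced from UTC dates. The sketch below assumes each value is the GPS second occupied by the leap second itself (23:59:60 UTC) and uses the GPS minus UTC offsets of 14 s, 15 s and 17 s that applied after the leap seconds at the end of 2005, the end of 2008 and the end of June 2015. The function name is illustrative and nothing here comes from the daqd source.

```python
import calendar

# Unix time of the GPS epoch, 1980-01-06 00:00:00 UTC.
GPS_EPOCH_UNIX = calendar.timegm((1980, 1, 6, 0, 0, 0))


def leap_second_gps(year, month, gps_minus_utc_after):
    """GPS second of the leap second inserted just before year-month-01 00:00:00 UTC.

    gps_minus_utc_after is the GPS-UTC offset (in seconds) valid after the insertion.
    """
    midnight_unix = calendar.timegm((year, month, 1, 0, 0, 0))
    gps_at_midnight = midnight_unix - GPS_EPOCH_UNIX + gps_minus_utc_after
    return gps_at_midnight - 1  # the leap second is the second before midnight


leaps = [
    leap_second_gps(2006, 1, 14),
    leap_second_gps(2009, 1, 15),
    leap_second_gps(2015, 7, 17),
]
```

Under these assumptions the list matches the three numbers in the daqdrc line. The same arithmetic gives 1025136015 for the leap second at the end of June 2012 (offset 16 s), which does not appear in that line. `GPS_EPOCH_UNIX` evaluates to 315964800, three less than the symm_gps_offset value; the entry does not say what the extra seconds represent.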

Finally, the last thing I did that finally got it running stably was to turn off all trend frame writing:

# start trender;
# start trend-frame-saver;
# sync trend-frame-saver;
# start minute-trend-frame-saver;
# sync minute-trend-frame-saver;
# start raw_minute_trend_saver;

For whatever reason, it's the trend frame writing that was causing daqd to fall over after a short amount of time.  I'll continue investigating tomorrow.

 

We still have a lot of cleanup, burt restores, testing, etc. to do, but we're getting there.

  11415   Wed Jul 15 13:19:14 2015 JamieSummaryCDSCDS upgrade: reducing mx end-points as last ditch effort

I tried one last thing, suggested by Keith and Gerrit.  I tried reducing the number of mx end-points on fb to one, which should reduce the total number of fb threads, in the hope that the extra threads were causing the chokes.

On Tue, Jul 14 2015, Keith Thorne <kthorne@ligo-la.caltech.edu> wrote:
> Assumptions
>  1) Before the upgrade (from RCG 2.6?), the DAQ had been working, reading out front-ends, writing frames trends
>  2) In upgrading to RCG 2.9, the mx start-up on the frame builder was modified to use multiple end-points
> (i.e. /etc/init.d/mx has a line like
> # 1 10G card - X2
> MX_MODULE_PARAMS="mx_max_instance=1 mx_max_endpoints=16 $MX_MODULE_PARAMS"
>  (This can be confirmed by the daqd log file with lines at the top like
> 263596
> MX has 16 maximum end-points configured
> 2 MX NICs available
> [Fri Jul 10 16:12:50 2015] ->4: set thread_stack_size=10240
> [Fri Jul 10 16:12:50 2015] new threads will be created with the stack of size 10240K
>
> If this is the case, the problem may be that the additional thread on the frame-builder (one per end-point) take up so many slots on the 8-core
> frame-builder that they interrupt the frame-writing thread, thus preventing the main buffer from being emptied.  
>
> One could go back to a single end-point. This only helps keep restart of front-end A from hiccuping DAQ for front-end B.
>
> You would have to remove code on front-ends (/etc/init.d/mx_stream) that chooses endpoints. i.e.
> # find line number in rtsystab. Use that to mx_stream slot on card (0-15)
> line_num=`grep -v ^# /etc/rtsystab | grep --perl-regexp -n "^${hostname}\s" | sed 's/^\([0-9]*\):.*/\1/g'`
> line_off=$(expr $line_num - 1)
> epnum=$(expr $line_off % 2)
> cnum=$(expr $line_off / 2)
>
>     start-stop-daemon --start --quiet -b -m --pidfile /var/log/mx_stream0.pid --exec /opt/rtcds/tst/x2/target/x2daqdc0/mx_stream -- -e 0 -r "$epnum" -W 0 -w 0 -s "$sys" -d x2daqdc0:$cnum -l /opt/rtcds/tst/x2/target/x2daqdc0/mx_stream_logs/$hostname.log
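
The slot arithmetic in the quoted mx_stream snippet can be written out as follows. This is a rough Python transcription of the shell logic only (strip comment lines from rtsystab, find the host's line, then split the zero-based line offset into an endpoint and a card number); the function name and the sample file contents in the usage note are made up for illustration.

```python
import re


def mx_slot(rtsystab_text, hostname):
    """Return (endpoint, card) for hostname, mirroring the shell arithmetic.

    Comment lines are dropped first, as 'grep -v ^#' does, so the line number
    is counted among the remaining lines only.
    """
    lines = [ln for ln in rtsystab_text.splitlines() if not ln.startswith("#")]
    pattern = re.compile(re.escape(hostname) + r"\s")
    for line_num, line in enumerate(lines, start=1):
        if pattern.match(line):
            line_off = line_num - 1
            epnum = line_off % 2    # alternates 0, 1 between consecutive hosts
            cnum = line_off // 2    # advances every second host
            return epnum, cnum
    raise KeyError(hostname)
```

For a hypothetical rtsystab whose non-comment lines start with c1sus, c1lsc and c1ioo in that order, this gives (0, 0), (1, 0) and (0, 1). Pointing every front end at endpoint 0, as Keith suggests, amounts to skipping this calculation.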

As per Keith's suggestion, I modified the mx startup script to only initialize a single endpoint, and I modified the mx_stream startup to point them all to endpoint 0.  I verified that indeed daqd was a single MX end-point:

MX has 1 maximum end-points configured

It didn't help.  After 5-10 minutes daqd crashes with the same "0 empty blocks" messages.

I should also mention that I'm pretty sure the start of these messages does not seem coincident with any frame writing to disk; further evidence that it's not a disk IO issue.

Keith is looking at the system now, so we'll see if he can spot anything obvious.  If not, I will start reverting to 2.5.

  11417   Wed Jul 15 18:19:12 2015 JamieSummaryCDSCDS upgrade: tentative stability?

Keith Thorne provided his eyes on the situation today and had some suggestions that might have helped things:

Reorder ini file list in master file.  Apparently the EDCU.ini file (C0EDCU.ini in our case), which describes EPICS subscriptions to be recorded by the daq, now has to be specified *after* all other front end ini files.  It's unclear why, but it has something to do with RTS 2.8 which changed all slow channels to be transported over the mx network.  This alone did not fix the problem, though.

Increase second trend frame size.  Interestingly, this might have been the key.  The second trend frame size was increased to 600 seconds:

start trender 600 60;

The two numbers are the lengths in seconds for the second and minute trends respectively.  They had been set to "60 60", but Keith suggested that longer second trend frames are better, for whatever reason.  It seems he may be right, given that daqd has been running and writing full and trend frames for 1.5 hours now without issue. 


As I'm writing this, though, the daqd just crashed again.  I note, though, that it's right after the hour, and immediately following writing out a one hour minute trend file.  We've been seeing these hourly, on-the-hour crashes of daqd for quite a while now.  So maybe this is nothing new.  I've actually been wondering if the hourly daqd crashes were associated with writing out the minute trend frames, and I think we might have more evidence to point to that.

If increasing the size of the second trend frames from 60 seconds (35M) to 600 seconds (70M) made a difference in stability, could there be an issue with writing out files that are smaller than some value?  The full frames are 60M, and the minute trends are 35M.

  8547   Tue May 7 23:03:12 2013 KojiConfigurationCDSCDS work

Summary:

c1rfm / c1lsc / c1ass / c1sus were modified. They were recompiled and installed. They are running fine,
and I confirmed PRMI locking (attempt), arm locking, and Yarm ASS with the new codes.

Motivation:

1a. SQRTing switching for POP110 was wrong. 0 enabled sqrting, 1 disabled sqrting. I wanted to fix this.
1b. Sqrting for POP22 was not implemented.

2. Preparation for the shadow sensor control with POPDC.

3. ASS had only an input. I want to run two ASS for the X and Y arms.

SQRTing for POP110/22:

- Flipped the input of the bypass switch. Corresponding MEDM indicators were fixed on the power normalization screen.
- Copied the sqrting structure from POP110 to POP22. A corresponding MEDM button was made on the power normalization screen.

- The function of the sqrting buttons was confirmed.

Additional ASS output:

- The output path "NPRO" was removed. Corresponding RFM channels have also been removed.
- The previous NPRO path was turned into the "ASS1" path. The previous "ASS" path was turned into "ASS2".
- Corresponding shared memory channels were created/renamed.
- c1ass was modified to receive the new ASS shared memory channels. ASS1 is assigned to the X arm. ASS2 is assigned to the Y arm.
- The output matrix screen and the lockin screen were modified accordingly.
- Only script/ASS/Arm_ASS_Setup.py was affected. The corresponding lines (matrix assignment) were fixed.

- The function of Den's version of  ASS was confirmed.

LSC->PRM ASC path

- We want to connect POPDC to PRM ASC. POPDC is acquired on c1lsc.
- So, for now we use the LSC input matrix to assign POPDC to one of the servo banks.
- The last row of the LSC output matrix was assigned to the PCIE connection to c1sus.
- This PCIE connection was connected to the PRM ASC YAW input.

- The connection between LSC and SUS was confirmed.

- During this process I found that there are a bunch of channels transferred from LSC to SUS via RFM.
  These channels are transferred via PCIE(dolphin) and then via RFM. But LSC and SUS are connected
  with dolphin. So this just adds additional sampling delay while there is no benefit. I think we should remove the RFM part.
  Note that we need to use RFM for the end mirrors but this also should use only the RFM connection.


Rebuilding the codes

- Prior to the tests of the new functionalities, the codes were rebuilt/installed as usual.
- The suspensions were shut down with the watchdogs before the restart of the realtime codes.
- Once the realtime codes were restarted successfully, the watchdogs were reloaded.
- As we removed/added the channels, fb was restarted.
- c1rfm / c1lsc / c1ass / c1sus codes were checked-in to svn
 

  11221   Wed Apr 15 20:54:18 2015 JenneUpdateComputer Scripts / ProgramsCDSutils upgrade bad

The SUS align/misalign scripts don't work after the new CDS utils upgrade. 

I don't know if it's looking for the _SWSTAT channel to confirm that the offset has been turned on/off, or if it is trying to set that channel, to do the switching, but either way, the script is failing.  Recall that our version of the RCG still has _SW1R and _SW2R, rather than the newer _SWSTAT for the filter banks. 

ezca.ezca.EzcaConnectError: Could not connect to channel (timeout=2s): C1:SUS-PRM_OL_PIT_SWSTAT

Q, can you please (please, please, pretty please) undo this upgrade, and then hold off on any further changes to the system for a few weeks?

  11223   Wed Apr 15 23:29:08 2015 JenneUpdateComputer Scripts / ProgramsCDSutils upgrade undone

Q remotely reverted this change.  Scripts seem to work again.

Quote:

The SUS align/misalign scripts don't work after the new CDS utils upgrade. 

I don't know if it's looking for the _SWSTAT channel to confirm that the offset has been turned on/off, or if it is trying to set that channel, to do the switching, but either way, the script is failing.  Recall that our version of the RCG still has _SW1R and _SW2R, rather than the newer _SWSTAT for the filter banks. 

ezca.ezca.EzcaConnectError: Could not connect to channel (timeout=2s): C1:SUS-PRM_OL_PIT_SWSTAT

Q, can you please (please, please, pretty please) undo this upgrade, and then hold off on any further changes to the system for a few weeks?

 

  11240   Thu Apr 23 21:05:23 2015 ranaUpdateComputer Scripts / ProgramsCDSutils upgrade undone

Q: please update this Wiki page with the go-back procedure:

https://wiki-40m.ligo.caltech.edu/CDSutils_Upgrade_Procedure

  11220   Wed Apr 15 15:14:18 2015 ericqUpdateComputer Scripts / ProgramsCDSutils upgraded to v474

CDSutils has been updated to the newest version, 474; there are some matrix interface methods that will make our locking scripts easier to read, modify, and maintain.

I've tested the ALS and CARM down scripts, and the LSC offsets script, and they all work fine. 

  7013   Mon Jul 23 20:34:38 2012 JamieOmnistructureComputersCHECK IN YOUR CHANGES TO THE SVN

I'm seeing LOTS OF STUFF NOT CHECKED INTO THE SVN!!!  both modified things that haven't been updated, and things that looked like they haven't been checked in at all.

Please check in your stuff to the SVN!  We need the record!

Look through EVERYTHING that you think you might have touched, or even care about, and make sure it's checked in.

  5581   Fri Sep 30 03:36:19 2011 SureshUpdateComputer Scripts / ProgramsCIOO modified, but not yet compiled

I have added a switch in series with the WFS_GAIN. And I have also added a LKIN_OUT_MATRX between the lockin-outputs and the MC suspensions.  This will enable us to drive the MC mirrors in any combination so that we can (in principle) attain pure translations and rotations of the MC axis.

I will compile the model later during the day.  This is just in case anyone else were to compile c1ioo.mdl before then.


  16248   Thu Jul 15 14:25:48 2021 PacoUpdateLSCCM board

[gautam, paco]

We tested the CM board by implementing the high bandwidth IR lock (single arm). In preparation for this test we temporarily connected the POY11_Q_MON output to the CM board IN1 input and checked the YARM POY transfer function by running the AA_YARM_TEMPLATE under users/Templates/LSC/LSC_loops/YARM_POY/. We made sure the YARM dither optimized TRY so as to maximize the optical gain stage. Then we proceeded as follows:

  • From the LSC --> CM Servo screen, we controlled the REFL 1 Gain (dB) slider (nominal +25) and MC Servo IN2 Gain (dB) slider (nominal -32 dB) to transfer the low bandwidth (digital) control to the high bandwidth (analog) control of the YARM.
  • During this game, we monitored the C1:LSC-POY11_I_ERR_DQ & C1:LSC-CM_SLOW_OUT_DQ error signal channels for saturation, oscillations, or stability.
  • Once a set of gains was successful in maintaining a stable lock, we measured the OLTF using SR 785 to track the UGF as we mix the two paths.
  • Once the gains have increased, the boost and super-boost stages may be enabled as well.

Ultimately, our ability to progressively increase the control bandwidth of the YARM is a proxy that the CM board is working properly. Attachment 1 shows the OLTF progression as we increased the loop's UGF. Note how, as we approached the maximum measured UGF of ~ 22 kHz, our phase margin decreased, signifying poor stability.


At the end of this measurement, at ~15:45, I restored the CM board IN1 input and disconnected the POY11_Q_MON.

gautam: the conclusion here is that the CM board seems to work as advertised, and it's not solely responsible for not being able to achieve the IR handoff. 

Attachment 1: high_BW_TFs.pdf
  14769   Wed Jul 17 21:22:41 2019 gautamUpdateCDSCM board Latch Enable subtlety

[koji, gautam]

Koji pointed out an important subtlety pertaining to the "LATCH ENABLE" signal line on the CM board. The purpose of this line is to smoothly facilitate the transition of a change in the "multi-bit-binary-outputs", a.k.a. "mbbo", that are controlled by MEDM gain sliders, to the analog electronics on the CM board. Why is this necessary? Imagine changing the gain from 7dB (=0111 in mbbo representation) to 8dB (=1000 in mbbo representation). In order to realize this change, all 4 bits have to change their state. But this almost certainly doesn't happen synchronously, because our EPICS interface isn't synchronous. So at some intermediate times, the mbbo representation could be 0100 (=4dB), or 1111 (=15dB), or many other possible values, which are all significantly different from either the initial value or the desired final state. This is clearly undesirable.
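
The 7 dB to 8 dB example can be made concrete by listing every word the four bits can pass through if they flip one at a time in an arbitrary order. This is an illustrative sketch of the argument only, not code from the CM board or the EPICS database, and the function name is made up.

```python
from itertools import combinations


def intermediate_codes(start, stop, nbits=4):
    """Values an nbits word can show while changing from start to stop,
    if the differing bits are allowed to flip at different times."""
    differing = [b for b in range(nbits) if ((start ^ stop) >> b) & 1]
    seen = set()
    # Any non-empty, proper subset of the differing bits may have flipped so far.
    for k in range(1, len(differing)):
        for subset in combinations(differing, k):
            value = start
            for bit in subset:
                value ^= 1 << bit
            seen.add(value)
    return sorted(seen)


transient = intermediate_codes(0b0111, 0b1000)
```

For 0111 to 1000 all four bits differ, so `transient` contains every 4-bit value except 7 and 8, including the 0100 (4 dB) and 1111 (15 dB) cases mentioned above. A step that changes a single bit, such as 6 dB to 7 dB, has no intermediate values at all.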

In order to protect against this kind of error, a Latched output part, 74ALS573, is used to buffer the physical digital logic levels from the switches in the analog gain stages. So in the default state, the "LATCH ENABLE" signal line is held "LOW". When a change happens in the EPICS value corresponding to a gain slider, the "LATCH ENABLE" state is quickly toggled to "HIGH", so as to enable the appropriate analog gain stages to be switched, and then again to "LOW", at which point the latch holds its output state. This logic is currently implemented by a piece of code called "latch.o", which is the compiled version of "latch.st", which may be found in /cvs/cds/caltech/target/c1iool0 where it presumably was written for the IMC servo board, but not in /cvs/cds/caltech/target/c1iool0  , which is where the CM board database files reside. The only elog reference I can find pertaining to this particular piece of code is from Alan, and doesn't say anything about the actual logic.

For the new c1iscaux, we need to implement this logic somehow. After discussion between Koji and me, we feel that a piece of python code is sufficient. This would continuously run in the background on the supermicro server machine. The channel hierarchy for each gain channel is as follows (I've taken the example of C1:LSC-CM_REFL1_GAIN):

  • C1:LSC-CM_REFL1_GAIN ------ this is the channel tied to an MEDM slider, and so is a "soft" channel
  • C1:LSC-CM_REFL1_SET ------- this is a "soft" channel that gets converted to an mbbo
  • C1:LSC-CM_REFL1_BITS ------ this is a channel that actually controls (multiple) physical binary outputs on the Acromag

So the logic will be that it continuously scans the EPICS channel C1:LSC-CM_REFL1_GAIN  for a change in set value. When a change is detected, it has to update the C1:LSC-CM_REFL1_SET channel. In the next EPICS refresh cycle, this would result in the mbbo bits, C1:LSC-CM_REFL1_BITS , all changing to the appropriate values. After these changes have happened, we need to toggle the LATCH ENABLE in order to allow the changes to propagate to the analog gain stage switches. Need to think about what's the best way to do this.
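
The sequence described above could be sketched roughly as follows. To keep the example self-contained it uses a dictionary-backed stand-in instead of a real EPICS client, and the latch channel name `C1:LSC-CM_LATCH_ENABLE`, the `settle` delay and both helper names are hypothetical placeholders. It is not the eventual latch.py.

```python
import time


class FakeChannels(dict):
    """Dictionary stand-in for an EPICS client that records the order of writes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.log = []

    def put(self, name, value):
        self[name] = value
        self.log.append((name, value))


def service_gain(ch, prefix, last_gain,
                 latch="C1:LSC-CM_LATCH_ENABLE", settle=0.0):
    """One polling step: propagate a changed _GAIN to _SET, then pulse the latch.

    Returns the gain value that is now latched (unchanged if nothing moved).
    """
    gain = ch[prefix + "_GAIN"]
    if gain == last_gain:
        return last_gain            # nothing to do, latch stays LOW
    ch.put(prefix + "_SET", gain)   # the mbbo bits follow from this soft channel
    time.sleep(settle)              # assumed wait for the _BITS outputs to update
    ch.put(latch, 1)                # latch transparent: new bits reach the switches
    ch.put(latch, 0)                # latch holds its output again
    return gain
```

A real implementation would call this in a loop for every gain channel and would need a `settle` time long enough for the binary outputs to reach their final state, which is the open question raised at the end of the paragraph above.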

  14790   Sun Jul 21 12:55:38 2019 gautamUpdateCDSCM board Latch Enable test script

DATED, SEE ELOG14941 for the most up-to-date info on latch.py.

I wrote (/cvs/cds/caltech/target/c1iscaux3/latch.py) and tested the logic illustrated in Attachment #1. Results of a test are shown in Attachment #2, the various channels change as expected. Note that for negative values of the gain channel, the corresponding "BITS" channel will take on values like 65536 - this is because the mbboDirect data type is a 16 bit data type, and presumably the MSB is the sign bit. A bit mask is applied to this channel before the actual BIO unit bits are set - we should verify that the correct behavior happens, but I don't immediately see any problems.
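
The large values seen for negative gains are what a 16-bit two's complement word looks like when read as unsigned. The sketch below assumes that representation (the entry only presumes it), and the 6-bit mask width is a made-up example, not the mask used in the database.

```python
def as_uint16(value):
    """Reinterpret a (possibly negative) integer as an unsigned 16-bit word."""
    return value & 0xFFFF


def low_bits(word, nbits):
    """Keep only the lowest nbits of word, as a bit mask would."""
    return word & ((1 << nbits) - 1)


raw = as_uint16(-32)        # how a gain of -32 would appear in a 16-bit field
masked = low_bits(raw, 6)   # hypothetical 6-bit mask before driving the outputs
```

Here `raw` is 65504 and `masked` is 32, so under these assumptions the bits that survive the mask are independent of the upper, sign-related bits.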

To me, this is a robust logic, but it will benefit from more sets of eyes giving it a look over. The idea is to run this continuously on the Supermicro machine.

Apart from this, I also fixed some errors in the mbboDirect record syntax - so now I am able to start up the EPICS server without it throwing any error messages. It remains to verify that changing an EPICS gain slider results in the appropriate gain bits being flipped in the correct way (on the hardware side, I think the correct behavior is happening on the software end). For this testing, I turned off the old c1iscaux crate at ~10am, and started up the server on c1iscaux3. I am reverting to the nominal config now (~1pm).

Further testing will require the wiring inside the Acromag chassis to be completed. This should be the priority task for next week.

*Update 1130 22 July 2019: I've now installed the required dependencies on c1iscaux3 and setup the latch.py script to run as a systemctl process dependent on modbusIOC.service.

Attachment 1: LatchLogic.pdf
Attachment 2: LatchLogicTest.png
  9935   Fri May 9 04:09:39 2014 JenneUpdateLSCCM board boost turn-on checkout

As part of checking the common mode board before we get too carried away with using it, I looked at the time series of the AO servo output when I turned on various boosts, or changed gain values.  As it turns out, basically anything that I did caused glitches.  Oooops.

I plugged a function generator to the IN1 port of the CM board, with a freq of 400Hz, and a voltage of 10mVpp (which is the smallest value that it would allow).  I plugged the BNC version of the servo output into a 300MHz 'scope.

First I looked at "boost" and "super boost", and then I looked at various steps of the AO gain slider.  For all of the button presses that gave me glitches, I saved .png's of the 'scope screen (on a floppy, so I'll have to fetch the data tomorrow...).

Both enabling, and disabling the "Boost" button gave me glitches.

For "Super Boost", I saw glitches for all of the steps, 0->1, 1->2, 2->3. 

For the AO path, I only started at 0dB, and only captured screenshots of glitches when I increased the gain, since presumably that's when we'll care the most during acquisition.  I found that going down in gain caused glitches at every step!  For increasing the gain, steps from an odd number of dBs to an even number consistently caused glitches.  Steps from an even number to an odd number occasionally caused glitches, but they weren't very common.  For the steps that did cause glitches, some were worse than others (7dB to 8dB, 15 dB to 16 dB, and 23 dB to 24 dB seemed the worst.)
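
One possible reading of the odd/even asymmetry, offered as a guess rather than a measured cause: if the gain value is sent to the board as a plain binary word (as described for the mbbo gain bits in elog 14769), the number of bits that must change differs strongly between steps. The sketch below simply counts the changed bits for each 1 dB step up; the binary encoding is an assumption.

```python
def bits_flipped(a, b):
    """Number of bit positions that differ between the binary codes a and b."""
    return bin(a ^ b).count("1")


# Bits that change for each step n dB -> n+1 dB, for n = 0..29.
steps = {n: bits_flipped(n, n + 1) for n in range(30)}
worst = sorted(n for n, k in steps.items() if k >= 4)
```

Every even-to-odd step changes exactly one bit, while every odd-to-even step changes at least two. The steps that change four or more bits are 7 to 8, 15 to 16 and 23 to 24, the same three singled out above as the worst.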

After my work, I put all of the cables back, so that we should be ready to utilize the CM board for locking this evening.


For posterity, here are the notes that I took while I was working - I'll make them more coherent when I fold them in with my images tomorrow.  The "first .png, next, etc." are because the 'scope numbers them in order as a default.

1st png = boost enable, then disable
2nd png = super boost, start at 0, then 1, then 2, then 3
3rd png = AO gain from 1 to 0
4th is AO gain from 0 to 1 (happens less often than 1->0, which is every time I get a glitch)
Next is AO gain 1->2, got 2 glitches!
3->2 glitch often, 2->3 much less often
next is 2->3
next png is 3->4, 2 glitches with weird dip
4->5, rare
next png is 5->6
6->7 is rare
next png is 7->8, which is nasty!!
8->9 is rare
png 9->10
10->11 is rare
png 11-> 12, 3 glitches
 12->13 rare
png 13->14, 2 glitches
14->15, rare
png 15->16, kind of nasty
png 17->18, 2 glitches
png 19->20, 3 glitches
png 21->22, 2 glitches
png 23->24, kind of nasty
png 25->26, 2 glitches
png 27->28, 3 glitches, at least
png 29->30, 2 glitches

 

Somehow, the images got put into a whole new entry, even though I thought I was editing this one.  Anyhow, please see elog 9938.

  9938   Fri May 9 14:01:24 2014 JenneUpdateLSCCM board boost turn-on checkout

Note:  I thought I was editing elog 9935, but somehow this became a whole new entry.  Either way, all the info is in here.

 

As part of checking the common mode board before we get too carried away with using it, I looked at the time series of the AO servo output when I turned on various boosts, or changed gain values.  As it turns out, basically anything that I did caused glitches.  Oooops.

I plugged a function generator to the IN1 port of the CM board, with a freq of 400Hz, and a voltage of 10mVpp (which is the smallest value that it would allow).  I plugged the BNC version of the servo output into a 300MHz 'scope.

First I looked at "boost" and "super boost", and then I looked at various steps of the AO gain slider.  For all of the button presses that gave me glitches, I saved .png's of the 'scope screen (on a floppy, so I'll have to fetch the data tomorrow...).

Both enabling, and disabling the "Boost" button gave me glitches.

For "Super Boost", I saw glitches for all of the steps, 0->1, 1->2, 2->3. 

For the AO path, I only started at 0dB, and only captured screenshots of glitches when I increased the gain, since presumably that's when we'll care the most during acquisition.  I found that going down in gain caused glitches at every step!  For increasing the gain, steps from an odd number of dBs to an even number consistently caused glitches.  Steps from an even number to an odd number occasionally caused glitches, but they weren't very common.  For the steps that did cause glitches, some were worse than others (7dB to 8dB, 15 dB to 16 dB, and 23 dB to 24 dB seemed the worst.)

After my work, I put all of the cables back, so that we should be ready to utilize the CM board for locking this evening.


For posterity, here are the notes that I took while I was working - I'll make them more coherent when I fold them in with my images tomorrow.  The "first .png, next, etc." are because the 'scope numbers them in order as a default.

1st png = boost enable, then disable
2nd png = super boost, start at 0, then 1, then 2, then 3
3rd png = AO gain from 1 to 0
4th is AO gain from 0 to 1 (happens less often than 1->0, which is every time I get a glitch)
Next is AO gain 1->2, got 2 glitches!
3->2 glitch often, 2->3 much less often
next is 2->3
next png is 3->4, 2 glitches with weird dip
4->5, rare
next png is 5->6
6->7 is rare
next png is 7->8, which is nasty!!
8->9 is rare
png 9->10
10->11 is rare
png 11-> 12, 3 glitches
 12->13 rare
png 13->14, 2 glitches
14->15, rare
png 15->16, kind of nasty
png 17->18, 2 glitches
png 19->20, 3 glitches
png 21->22, 2 glitches
png 23->24, kind of nasty
png 25->26, 2 glitches
png 27->28, 3 glitches, at least
png 29->30, 2 glitches

 


The screenshot of the Boost enable / disable I'll have to re-take.  Apparently I instead caught a screenshot of the list of files on the floppy...ooops.

This is a shot of enabling the Super Boosts.  At the beginning, it's at "0", so no superboosts (also, regular boost was off).  Then, I switch to "1", and the trace gets a little fuzzy.  Then I switch to "2", and it gets very fuzzy.  Then I switch to "3", and a lot of the fuzz goes away.  There's a glitch at each transition.

SuperBoosts.PNG

The following screenshots are all of various steps of the AO gain slider.  For all of these, both the "boost" and "super boosts" were off.  Each screenshot is a single gain step, even if there are several glitches captured.

First, 0dB to 1dB:

AOgain_0dBto1dB.PNG

Next, 1dB to 2dB:

AOgain_1dBto2dB.PNG

2dB to 3dB:

AOgain_2dBto3dB.PNG

3dB to 4dB:

AOgain_3dBto4dB.PNG

While increasing the gain, I didn't find any more steps from an even to an odd number where I got a glitch.  They would glitch when I undid that step (decreased the gain), but over ~5 trials for each increase, I didn't ever catch a glitch.  The odd to even steps still had glitches while increasing the gain though.

5dB to 6dB:

AOgain_5dBto6dB.PNG

7dB to 8dB:

AOgain_7dBto8dB.PNG

9dB to 10dB:

AOgain_9dBto10dB.PNG

11dB to 12dB:

AOgain_11dBto12dB.PNG

13dB to 14dB:

AOgain_13dBto14dB.PNG

15dB to 16dB:

AOgain_15dBto16dB.PNG

17dB to 18dB:

AOgain_17dBto18dB.PNG

19dB to 20dB:

AOgain_19dBto20dB.PNG

21dB to 22dB:

AOgain_21dBto22dB.PNG

23dB to 24dB:

AOgain_23dBto24dB.PNG

25dB to 26dB:

AOgain_25dBto26dB.PNG

27dB to 28dB:

AOgain_27dBto28dB.PNG

29dB to 30dB:

AOgain_29dBto30dB.PNG

  15330   Thu May 14 00:21:03 2020 gautamUpdateLSCCM board boosts

Summary:

I think the boosts that are currently stuffed on the CM board are too aggressive to be usable for locking the interferometer. I propose some changes.

Details:

[CM board schematic]

[CM board transfer function measurement]

[Measurement of the AO path TF]. Empirically, I have observed that the CARM OLTF has ~90 degrees phase margin available at the UGF when no boosts are engaged, which is consistent with Koji's measurement. Assuming we want at least 30 degrees phase margin in the final configuration, and assuming a UGF of ~10 kHz, the current boosts eat up way too much phase at 10 kHz. Attachment #1 shows the current TFs (dashed lines), as the boosts are serially engaged. I have subtracted the 180 degrees coming from the inverting input stage. The horizontal dash-dot line on the lower plot marks 60 degrees of phase lag; the frequency at which each trace crosses it is where the boost stages have eaten up 60 degrees of phase, which tells us whether we can meet the 30 degree PM requirement.

In solid lines on Attachment #1, I have plotted the analogous TFs, with the following changes:

  • R52, R54: 1.21k --> 3.16k (changes 4 kHz zero to 1.5 kHz).
  • R61, R62: 82.5 --> 165 (changes 20 kHz zero to 10 kHz).
  • R63: 165 --> 300 (changes 10 kHz zero to 5 kHz).

These changes will allow possibly two super boosts to be engaged if we can bump up the CARM UGF to ~15 kHz. We sacrifice some DC gain - I have not yet done the noise analysis of the full CARM loop, but it may be that we don't need 120 dB gain at DC to be sensing noise limited. I suppose the pole frequencies can also be halved if we want to keep the same low frequency gain. In any case, in the current form, we can't access all that gain anyways because we can't enable the boosts without the loop going unstable.

The input referred noise gets worse by a factor of 2 as a result of these changes, but the IN1 gain stage noise is maybe already higher? If this sounds like a reasonable plan, I'll implement it the next time I'm in the lab.
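As a rough cross-check of the resistor changes listed above, the sketch below assumes that each zero frequency scales inversely with its resistor value (f_z proportional to 1/R, with the associated capacitor unchanged). That scaling is an assumption made here for illustration, not something read off the schematic, and the helper name is hypothetical.

```python
def scaled_zero_hz(f_old_hz, r_old_ohm, r_new_ohm):
    """New zero frequency, assuming f_z is proportional to 1/R (fixed C)."""
    return f_old_hz * r_old_ohm / r_new_ohm


# (label, quoted old zero [Hz], old R [ohm], new R [ohm]) from the list above
changes = [
    ("R52, R54", 4e3, 1.21e3, 3.16e3),
    ("R61, R62", 20e3, 82.5, 165.0),
    ("R63", 10e3, 165.0, 300.0),
]

for label, f_old, r_old, r_new in changes:
    f_new = scaled_zero_hz(f_old, r_old, r_new)
    print(f"{label}: {f_old / 1e3:.1f} kHz -> {f_new / 1e3:.2f} kHz")
```

Under this assumption the first two changes reproduce the quoted ~1.5 kHz and 10 kHz, while the R63 change lands somewhat above the quoted 5 kHz, so the frequencies in the list are probably best read as rounded values.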

Attachment 1: boosts.pdf
boosts.pdf
Attachment 2: boosts_noise.pdf
boosts_noise.pdf
  10965   Mon Feb 2 22:59:49 2015 diegoUpdateLSCCM board input switched to AS55

[Diego, Jenne]

We just changed the input to the CM board from REFL11 to AS55.

 

  15042   Thu Nov 21 12:46:22 2019 gautamUpdateLSCCM board study

In preparation for trying out some high-bandwidth Y arm cavity locking using the CM board, I hooked up the POY11_Q_Mon channel of the POY11 demod board to the IN1 of the CM board (and disconnected the usual REFL11 cable that goes to IN1). The digital phase rotation for usual POY Yarm locking is 106 degrees, so the analog POY11_Q channel contains most of the signal. I then set the IN1 gain of the servo to 0dB, and looked at the CM_Slow signal - I changed the whitening gain of this channel to +18dB (to match that used for POY11_I and POY11_Q), and found that I had to apply a digital gain of 0.5 to get the PDH horns in the usual POY11_I signal and the CM_Slow signal to line up. There was also a sign inversion. Then I was able to use the digital LSC system and lock the Y arm cavity length to the PSL frequency by actuating on ETMY, using CM_Slow as an error signal. A comparison of the in-loop POY11_I ASD when the arm is locked is shown in Attachment #1 - CM_Slow seems to be dominated by some kind of electronics noise above ~100 Hz, so it possibly needs more whitening (even though the nominal whitening filter is engaged)?

Anyway, now that I have this part of the servo working, the next step is to try and engage the AO path and achieve a higher BW lock of the Y arm cavity to the PSL frequency (= IMC length). Maybe it makes more sense to actuate on MC2 for the slow path.

Attachment 1: YARM_CMslow.pdf
YARM_CMslow.pdf
  15043   Thu Nov 21 13:14:33 2019 KojiUpdateLSCCM board study

One of the differences between the direct POY and the CM_SLOW POY is the presence of the CM Servo gain stages. So this might mean that you need to move some of the whitening gain to the CM IN1 gain.

  10966   Tue Feb 3 04:01:55 2015 diegoUpdateLSCCM servo & AO path status

[Diego, Jenne]

Tonight we worked on the CM board and AO path:

  • at first we changed the REFL1 input to the CM board from REFL11 to AS55, as written in my previous elog; we tried following Koji's procedures from http://nodus.ligo.caltech.edu:8080/40m/9500 but we didn't get any result: we could lock using the regular digital path but no luck at all for the analog path;
  • then we decided to follow the procedure to the letter, using POY11Q as input to the CM board;
    • we still couldn't lock following the Path #2, even after adjusting the gains to match the current configuration for the Yarm filter bank;
    • we had some more success using Path #1, but we had to lower the REFL1 Gain to ~3-4 (from the original 31) because of the different configuration of the Yarm filter bank, in order to have the same sensing in both of them; we managed to acquire lock a few times, it's not super stable but it can keep lock for a while;
    • when we tried to increase the gain of the MC filter bank and the AO Gain, however, we immediately had some gain peaking, and we couldn't go further than 0.15 and 9 dB respectively. We currently don't have an answer for that.
    • anyhow, we took a few measurements with the SR785:

 

The BLUE plot is at MC Gain = 0.10 and REFL1 Gain = 4dB; the GREEN plot is for MC Gain = 0.10 and REFL1 Gain = 3dB, which seemed a more stable configuration; after this last configuration, we increased the MC Gain to 0.15 and the AO Gain from 8dB to 9dB and took another measurement, the RED plot; this is as far as we got as of now. We also couldn't increase the REFL11 Gain because it made things unstable and more prone to unlock.

So, some little progress on the AO path procedure, but we are very low on our UGF and we have to find a way to increase our gains without breaking the lock and avoiding the gain peaking we have witnessed tonight.

 

Notes:

  • is the REFL1 Gain dB slider supposed to go to negative dBs? During the night we also tried negative dB values, but they didn't seem to do anything;
  • when we plugged POY11Q into the CM board, we noticed that it wasn't connected to anything at the moment; since we phase rotate POY11, we were assuming that we were using that signal somewhere. We are confused by this...
  • reminder: REFL11 is no longer connected to the CM board input; POY11 is connected instead.
Attachment 1: CARM_03-02-2015_031754.pdf
CARM_03-02-2015_031754.pdf
  10969   Tue Feb 3 16:36:33 2015 ericqUpdateLSCCM servo & AO path status

I have removed REFLDC and the SR560 offsetter from the CM board IN2. Now, analog AS55 I lives there, for our single arm testing. (Analog I has more of the single arm Y PDH signal in it). REFL11 has been reconnected to IN1. 


With ITMX super misaligned, Diego and I locked the Y-arm with the AO path on AS55, ultimately at 4kHz bandwidth, but with plenty of gain margin. We didn't allocate the gains too intelligently, and had the CM board input gain slider maxed out, but plenty of headroom in the digital and AO sliders, making it inconvenient to up the UGF even more, to engage the super boosts. However, since this is just a test case to make sure we still can AO lock, I'm not too worried about this. 

Since LSC FMs and such had changed around, old recipes didn't necessarily work 1:1. Diego is writing a script for the current recipe, and will post an elog with the steps. 

Gains and signs can be tracked with loop TFs; the real sticking point is a stable crossover. We used the 1.6k:80 hardware filter in the CM board to give the AO Path a 1/f shape in the crossover region, and undid it digitally in the CM_SLOW input FM. However, we do use a 300:80 in the MC2 sus FM to make the digital loop like 1/f^2 around the crossover, once a little bit of AO has come in to pull up the digital loop's phase. We used the CARM filter bank to do this, so I think we should be able to use a similar technique to do it in the PRFPMI case, as long as the coupled cavity pole is around ~100Hz. 

Attached are a few OLTFs from the progression.

Attachment 1: yarmAO.pdf
yarmAO.pdf
  10991   Mon Feb 9 17:47:17 2015 diegoUpdateLSCCM servo & AO path status

I wrote the script with the recipe we used, using the Yarm and AS55 on the IN2 of the CM board; however, the steps where the offset should be reduced are not completely deterministic, as we saw that the initial offset (and, therefore, the following ones) could change because of different states we were in. In the script I tried to "servo" the offset using C1:LSC-POY11_I_MON as the reference, but in the comments I wrote the actual values we used during our best test; the main points of the recipe are:

  • misalign the Xarm and the recycling mirrors;
  • setting up CARM_B for POY11 locking and enabling it;
  • setting up CARM_A for CM_SLOW;
  • setting up the CM_SLOW filter bank, with only FM1 and FM4 enabled;
  • setting up the CARM filter bank: FM1 FM2 FM6 triggered, only FM3 and FM5 on; usual CARM gain = 0.006;
  • setting up CARM actuating on MC2;
  • turn off the violin filter FM6 for MC2;
  • setting up the default configuration for the Common Mode Servo and the Mode Cleaner Servo; along with all the initial parameters, here is where the initial offset is set;
  • turn on the CARM output and, then, enable LSC mode;
  • wait until usual POY11 lock is acquired and, a bit later, transition from CARM_B to CARM_A;
  • then, the actual CM_SLOW recipe:
    • CM_AO_GAIN = 6 dB;
    • SUS-MC2_LSC FM6 on (the 300:80 filter);
    • CM_REFL2_GAIN = 18 dB;
    • servo CM_REFL_OFFSET;
    • CM_AO_GAIN = 9 dB;
    • CM_REFL2_GAIN = 21 dB;
    • servo CM_REFL_OFFSET;
    • CM_REFL2_GAIN = 24 dB;
    • servo CM_REFL_OFFSET;
    • CM_REFL2_GAIN = 27 dB;
    • servo CM_REFL_OFFSET;
    • CM_REFL2_GAIN = 31 dB;
    • servo CM_REFL_OFFSET;
    • CM_AO_GAIN = 6 dB;
    • SUS-MC2_LSC FM7 on (the :300 compensating filter);
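The CM_SLOW part of the recipe above can be written down as an ordered list of steps. The following is only a dry-run sketch of that ordering: it is not the contents of the actual script, it makes no EPICS calls, and the function names and step encoding (`build_recipe`, `dry_run`, the action strings) are hypothetical. The gain values and the use of C1:LSC-POY11_I_MON as the offset reference are taken from this entry.

```python
def build_recipe():
    """Ordered CM_SLOW steps as (action, target, value) tuples."""
    steps = [
        ("set", "CM_AO_GAIN", 6),
        ("filter_on", "SUS-MC2_LSC", "FM6"),  # the 300:80 filter
        ("set", "CM_REFL2_GAIN", 18),
        ("servo", "CM_REFL_OFFSET", None),
        ("set", "CM_AO_GAIN", 9),
    ]
    # Each further REFL2 gain step is followed by re-servoing the offset.
    for gain_db in (21, 24, 27, 31):
        steps.append(("set", "CM_REFL2_GAIN", gain_db))
        steps.append(("servo", "CM_REFL_OFFSET", None))
    steps += [
        ("set", "CM_AO_GAIN", 6),
        ("filter_on", "SUS-MC2_LSC", "FM7"),  # the :300 compensating filter
    ]
    return steps


def dry_run(steps):
    """Describe each step as text instead of touching any hardware."""
    lines = []
    for action, target, value in steps:
        if action == "set":
            lines.append(f"set {target} = {value} dB")
        elif action == "filter_on":
            lines.append(f"turn on {target} {value}")
        else:
            lines.append(f"servo {target} (reference: C1:LSC-POY11_I_MON)")
    return lines


for line in dry_run(build_recipe()):
    print(line)
```

Keeping the sequence as data makes it easy to review the order of the gain steps before anything is written to the real channels.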

I tried the procedure and it seems fine, as it did during the tries Q and I made; however, since it touches many things in many places, one should be careful about which state the IFO is in before trying it.

The script is in scripts/CM/CM_Servo_OneArm_CARM_ON.py and in the SVN.

 

  14948   Tue Oct 8 03:32:42 2019 KojiUpdateCDSCM servo board testing

[Koji]

The logic chips 74ALS573 were replaced. And now the gain sliders are working properly.

== Test Status ==

[done] Whitening gain switching test
[done] AA enable/disable switching
[0th order] LO Det Mon channel check
[none] PD I/F board check
[done] QPD I/F board check
[done] CM Board
[none] ALS I/F board


Last week we found that the logic chip for the REFL1 gain switching was not transmitting the input logic. I went to Downs and obtained the chips. After some inspection, some other latch chips also looked suspicious. Therefore U46, U47, and U48 (#1, #3, and #4 from the top) were replaced. After the replacement, the gain measurements were repeated. This time the test for the AO gain was also performed. Now all three sliders show the gain as expected except for a consistent -0.2dB deficit.

Note that the transfer functions for the REFL gains were measured with the input at IN1 or IN2 and the output at TESTA1. The TFs for the AO gain were measured with the excitation at EXC B, the input at TESTB2 and the output at the SERVO output. The gain and phase variations for the AO gain at low frequency are the effect of the AC coupling between the excitation and the servo output.

[Update on Oct 14, 2019]

The measured transfer functions show the phase delay determined by the opamps involved. The phase delay well below the pole frequencies can be represented well by a simple time delay (a phase delay linear in frequency). Attachment 7 shows the time delay estimated by LISO for each gain setting of each gain stage. REFL2 has a particularly large phase delay because of the use of OP27s. The delay is even larger when the gain is high, presumably because of the limited GBW.
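To illustrate what "a phase delay linear in frequency" means in practice, here is a minimal sketch that recovers a time delay from phase data using the model phase = -360 deg * f * tau. It is not the LISO fit used for Attachment 7; the function name is hypothetical and the 100 ns delay in the self-check is a made-up number, not a measured value.

```python
def delay_from_phase(freqs_hz, phases_deg):
    """Least-squares delay (s) for phase_deg = -360 * f * tau, a line through the origin."""
    num = sum(f * p for f, p in zip(freqs_hz, phases_deg))
    den = sum(f * f for f in freqs_hz)
    return -num / (360.0 * den)


# Synthetic self-check with a made-up 100 ns delay (not measured data)
tau_true = 100e-9
freqs = [1e3, 3e3, 1e4, 3e4, 1e5]
phases = [-360.0 * f * tau_true for f in freqs]

tau_est = delay_from_phase(freqs, phases)
print(f"estimated delay: {tau_est * 1e9:.1f} ns")
```

With real data, only points well below the stage's pole frequencies should be fed to such a fit, since that is where the pure-delay approximation described above holds.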

Attachment 1: REFL1_2_GAIN1.pdf
REFL1_2_GAIN1.pdf
Attachment 2: REFL1_2_GAIN2.pdf
REFL1_2_GAIN2.pdf
Attachment 3: REFL2_2_GAIN1.pdf
REFL2_2_GAIN1.pdf
Attachment 4: REFL2_2_GAIN2.pdf
REFL2_2_GAIN2.pdf
Attachment 5: AO_GAIN1.pdf
AO_GAIN1.pdf
Attachment 6: AO_GAIN2.pdf
AO_GAIN2.pdf
Attachment 7: delay.pdf
delay.pdf
  14955   Tue Oct 8 18:42:39 2019 KojiUpdateCDSCM servo board testing

The boost filters of the CM servo board were tested. Their ZPK models were made.


The transfer functions of the boost filters were measured with the SG output of a SR785 connected to IN1. The IN1 gain was set to be 0dB. The transfer function was taken between the IN1 input and the TEST1A output.
With no boost and normal boost, the input signal amplitude was fixed to 20mVpk. For the other boosts, however, I expected a large gain variation within a single sweep, so automatic SG amplitude tracking was used. The target was an output of 1V, with a maximum amplitude of 100mV.

Attachment 1 shows the measured transfer functions.

The pole and zero frequencies of the boosts were estimated using LISO. Here the TFs were normalized by the TF of 'no boost' to cancel the delay of the other stages including that of the monitor channel.

 

ZPK model of Normal Boost:

pole 44.0597566447
zero 4.3927650910k

factor 98.8275377818

 

ZPK model of Super Boost (State1):

pole 878.5368382789
zero 17.5107366335k
factor 20.0840668188

 

ZPK model of Super Boost (State2):

pole 714.8112014271
pole 1.0147609373k
zero 13.2470941080k
zero 22.2259701828k

factor 404.5411036031
 

ZPK model of Super Boost (State3):

pole 886.3650348470
pole 420.4089305781
pole 887.8490768202
zero 8.3635166134k
zero 15.7953592754k
zero 20.5144907279k

factor 8.2051379423k
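The fitted models above can be evaluated numerically to read off gain and phase at a frequency of interest. The sketch below assumes the normalized form for real poles and zeros given in Hz, H(f) = factor * prod(1 + i f/f_z) / prod(1 + i f/f_p); that convention is an assumption here, and the function name is hypothetical. With this form the Normal Boost comes out close to unity gain well above its zero, which is at least consistent with what a boost stage should do.

```python
import cmath
import math


def liso_zpk(f_hz, poles_hz, zeros_hz, factor):
    """Evaluate factor * prod(1 + i f/fz) / prod(1 + i f/fp) for real poles/zeros in Hz."""
    h = complex(factor)
    for fz in zeros_hz:
        h *= 1 + 1j * f_hz / fz
    for fp in poles_hz:
        h /= 1 + 1j * f_hz / fp
    return h


# Normal Boost values copied from the fit above
NORMAL_BOOST = dict(
    poles_hz=[44.0597566447],
    zeros_hz=[4.3927650910e3],
    factor=98.8275377818,
)

for f in (10.0, 1e3, 1e4, 1e6):
    h = liso_zpk(f, **NORMAL_BOOST)
    gain_db = 20 * math.log10(abs(h))
    phase_deg = math.degrees(cmath.phase(h))
    print(f"{f:>9.0f} Hz: {gain_db:6.1f} dB, {phase_deg:6.1f} deg")
```

The same function can be pointed at the Super Boost models by swapping in their pole, zero, and factor lists.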

 

Attachment 1: boosts.pdf
boosts.pdf
  14965   Mon Oct 14 16:06:28 2019 KojiUpdateCDSCM servo board testing

CM Board Slow out (digital length control) path transfer function / pole-zero filter pair (79Hz/1.6kHz) transfer function

The excitation was given from EXC A. The denominator was TESTA2, and the numerator was OUT1.

Attachment 1 shows the measured transfer function with the PZ filter off and on. The PZ filter provides ~26dB attenuation at high frequency. The output stage has a single order 100kHz LPF and it is visible in the transfer function.

The transfer function without the PZ filter was modelled by LISO as the following ZPK representation. There appeared to be a small step in the TF, which caused the additional PZ pair (66~67Hz), but it has a very minor effect on the magnitude and phase.

pole 66.2720207366
zero 67.2660731875
pole 93.3044858160k

factor -995.5583556921m

The transfer function of the PZ filter was separately analyzed. The TF with the switch ON was normalized by the one with the switch OFF. Thus it revealed the pure effect of the switch. The ZPK model of the stage was estimated to be

pole 79.7312926438
zero 1.6395485993k

factor 996.2196584165m
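As a quick consistency check between this fit and the "~26dB attenuation at high frequency" quoted above: for a single real pole/zero pair with a factor near unity, the high-frequency attenuation is approximately the ratio of the zero frequency to the pole frequency. The sketch assumes the normalized (1 + i f/f0) form for the pole and zero; the variable names are illustrative.

```python
import math

# Fitted PZ filter values from above
pole_hz = 79.7312926438
zero_hz = 1.6395485993e3

# Well above the zero, |H| tends to factor * pole_hz / zero_hz, with factor ~ 1
attenuation_db = 20 * math.log10(zero_hz / pole_hz)
print(f"high-frequency attenuation: {attenuation_db:.1f} dB")
```

This comes out slightly above 26 dB, in line with the measured step.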

Attachment 1: pole_zero_filter.pdf
pole_zero_filter.pdf
  14966   Mon Oct 14 16:19:30 2019 KojiUpdateCDSCM servo board testing

For the CM board modeling purpose, the transfer function from TESTA2 to TESTB2 was needed. (Attachment 1)

The ZPK model of this part is

pole 76.2369881805
zero 77.4655685092
pole 7.0761486105M

factor -993.0593433578m

 

Attachment 1: testb2.pdf
testb2.pdf
  14967   Mon Oct 14 16:25:03 2019 KojiUpdateCDSCM servo board testing

The output stage (and AO GAIN stage) of the CM board was modelled. The transfer function was measured with the injection from EXC B. The denominator was TESTB2, and the numerator was SERVO OUT.

This stage is AC coupled by 2x 1st order HPFs. Firstly, this transfer function was measured with AO GAIN set to be 0dB. (Attachment 1)
This TF was used to characterize the cutoffs of the HPF stages, represented as the following ZPK:

zero 1m
zero 1m
pole 6.0502599855
pole 6.0624642854
factor -26.2725046079n

The AO GAIN TFs had already been measured (see [ELOG 14948]). The AO gain TF was then modeled by LISO with the above HPF as the preset. This allows us to characterize the time delay of the AO GAIN part.

Attachment 1: servo_out.pdf
servo_out.pdf