  17109   Sun Aug 28 23:14:22 2022   Jamie | Update | Computers | rack reshuffle proposal for CDS upgrade

@tega This looks great, thank you for putting this together.  The rack drawing in particular is great.  Two notes:

  1. In "1X6 - proposed" I would move the "PEM AA + ADC Adapter" down lower in the rack, maybe where "Old FB + JetStor" are, after removing those units since they're no longer needed.  That would keep all the timing stuff together at the top without any other random stuff in between them.  If we can't yet remove Old FB and the JetStor then I would move the VME GPS/Timing chassis up a couple units to make room for the PEM module between the VME chassis and FB1.
  2. We'll eventually want to move FB1 and Megatron into 1X7, since it seems like there will be room there.  That will put all the computers into one rack, which will be very nice.  FB1 should also be on the KVM switch.

I think most of this work can be done with very little downtime.

  17111   Mon Aug 29 15:15:46 2022   Tega | Update | Computers | 3 FEs from LLO got delivered today

[JC, Tega]

We got the 3 front-ends from LLO today. The contents of each box are:

  1. FE machine
  2. OSS adapter card for connecting to I/O chassis
  3. PCI riser cards (x2)
  4. Timing Card and cable
  5. Power cables, mounting brackets and accompanying screws
Attachment 1: IMG_20220829_145533452.jpg
Attachment 2: IMG_20220829_144801365.jpg
  17113   Tue Aug 30 15:21:27 2022   Tega | Update | Computers | 3 FEs from LHO got delivered today

[Tega, JC]

We received the remaining 3 front-ends from LHO today. They each have a timing card and an OSS host adapter card installed. We also received 3 dolphin DX cards. As with the previous packages from LLO, each box contains a rack mounting kit for the supermicro machine.

Attachment 1: IMG_20220830_144925325.jpg
Attachment 2: IMG_20220830_142307495.jpg
Attachment 3: IMG_20220830_143059443.jpg
  17127   Fri Sep 2 13:30:25 2022   Ian MacMillan | Summary | Computers | Quantization Noise Calculation Summary

P. P. Vaidyanathan wrote a chapter in the book "Handbook of Digital Signal Processing: Engineering Applications" called "Low-Noise and Low-Sensitivity Digital Filters" (Chapter 5, pg. 359).  I took a quick look at it and wanted to give some thoughts in case they are useful. The experts in the field would be Leland B. Jackson, P. P. Vaidyanathan, Bernard Widrow, and István Kollár.  Widrow and Kollár wrote the book "Quantization Noise: Roundoff Error in Digital Computation, Signal Processing, Control, and Communications" (a copy of which is at the 40m). It is good that P. P. Vaidyanathan is at Caltech.

Vaidyanathan's chapter serves as a good introduction to the topic of quantization noise. He starts off with the basic theory, similar to my own document on the topic. From there, two topics are especially relevant to our goals.

The first is Error-Spectrum Shaping (pg. 387). I had not investigated this idea before, but the general gist is that as poles and zeros move closer to the unit circle the SNR deteriorates, and error feedback is a way of alleviating this problem. See Fig. 5.20 for a full realization of a second-order section with error feedback.
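To make that gist concrete, here is the standard error-feedback argument in my own notation (a sketch only; the details of the Fig. 5.20 realization may differ). If the recursive part of the section is A(z) and the sum is quantized once at the accumulator output, the quantization error e[n] reaches the output through 1/A(z), which gets large as the poles approach the unit circle:

\[ A(z) = 1 + a_1 z^{-1} + a_2 z^{-2}, \qquad Y(z) = H(z)\,X(z) + \frac{1}{A(z)}\,E(z). \]

Storing the error and feeding it back into the accumulator with coefficients \beta_1, \beta_2 changes only the noise path:

\[ Y(z) = H(z)\,X(z) + \frac{1 + \beta_1 z^{-1} + \beta_2 z^{-2}}{A(z)}\,E(z), \]

so choosing \beta_i \approx a_i (or cheap power-of-two approximations of them) cancels the noise gain near the poles without touching the signal transfer function H(z).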

The second starts on page 402 and is an overview of state-space filters, with an example of a state-space realization (Fig. 5.26). I tested this exact realization a while ago and found that it was better than the direct form II filter but not as good as the current low-noise implementation that LIGO uses. This realization is very close to the current one except that it uses one less addition block.

Overall I think it is a useful chapter. I like the idea of using some sort of error correction and I'm sure his other work will talk more about this stuff. It would be useful to look into.

One thought I had recently: if the quantization noise is uncorrelated between two different realizations, then connecting them in parallel and averaging their outputs (as shown in Attachment 1) may actually yield lower quantization noise. It would require double the computational power for filtering, but it may work. For example, averaging the current LIGO realization with the realization given in this book might yield a lower quantization noise. This would only work with two similarly low-noise realizations: since we would be averaging two independent samples of the noise instead of one, the variance would be cut in half, and the ASD would show a 1/√2 reduction if the two realizations have the same level of quantization noise. It is only beneficial if the noisier realization has less than about 1.7 times the noise of the quieter one. I included a simple simulation demonstrating this in the zip file in Attachment 2 for my own reference.
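For reference, the back-of-the-envelope version of the 1/√2 and the factor of ~1.7, assuming the two quantization noises are uncorrelated and the signal paths are identical (signals add coherently, noises add in quadrature): if S_1 \le S_2 are the quantization-noise ASDs of the two realizations, the averaged output has

\[ S_{\rm avg}(f) = \tfrac{1}{2}\sqrt{S_1^2(f) + S_2^2(f)}, \]

which equals S_1/\sqrt{2} when S_2 = S_1, and beats the quieter realization alone (S_{\rm avg} < S_1) exactly when S_2 < \sqrt{3}\,S_1 \approx 1.7\,S_1.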

Another thought that I had is that the transpose of this low-noise state-space filter (Fig. 5.26) or even of LIGO's current filter realization would yield even lower quantization noise because both of their transposes require one less calculation.

Attachment 1: averagefiltering.pdf
Attachment 2: AveragingFilter.py.zip
  17144   Mon Sep 19 20:21:06 2022   Tega | Update | Computers | 1X7 and 1X6 work

[Tega, Paco, JC]


We moved the GPS network time server and the frequency distribution amplifier from 1X7 to 1X6, and the PEM AA, ADC adapter, and martian network switch from 1X6 to 1X7. We also mounted the dolphin IX switch at the rear of 1X7 together with the DAQ and martian switches. This cleared up enough space to mount all the front-ends; however, we found that the mounting brackets for the front-ends do not fit in the 1X7 rack, so I have decided to mount them on the upper part of the test stand for now while we come up with a fix for this problem. Attachments 1 to 3 show the current state of racks 1X6, 1X7 and the test stand.

 

Attachment 1: Front of racks 1X6 and 1X7

Attachment 2: Rear of rack 1X7

Attachment 3: Front of teststand rack


Plan for the remainder of the week

Tuesday

  • Setup the 6 new front-ends to boot off the FB1 clone.
  • Test the PCIe I/O cables by connecting them between the front-ends and the teststand I/O chassis one at a time to ensure they work
  • Then lay the fiber cables to the various I/O chassis.

Wednesday

  • Migrate the current models on the 6 front-ends to the new system.
  • Replace RFM IPC parts with dolphin IPC parts in the c1rfm model running on the c1sus machine
  • Replace the RFM parts in c1iscex and c1iscey models
  • Drop the c1daf and c1oaf models from the c1isc machine, since the front-ends only have 6 cores
  • Build and install models

Thursday [CAN I GET THE IFO ON THIS DAY PLEASE?]

  • Complete any remaining model work
  • Connect all I/O chassis to their respective (new) front-end and see if we can start the models (Need to think of a safe way to do this. Should we disconnect the coil drivers before starting the models?)

Friday

  • Tie-up any loose ends
Attachment 1: IMG_20220919_204013819.jpg
Attachment 2: IMG_20220919_203541114.jpg
Attachment 3: IMG_20220919_203458952.jpg
  17148   Tue Sep 20 23:06:23 2022   Tega | Update | Computers | Setup the 6 new front-ends to boot off the FB1 clone

[Tega, Radhika, JC]

We wired the front-ends for power, DAQ and martian network connections. Then we moved the I/O chassis from the bottom of the rack to the middle, just above the KVM switch, so we can leave the top of the I/O chassis open for access to the ports of the OSS target adapter card for testing the extension fiber cables.

Attachment 1 (top -> bottom): c1sus2, c1iscey, c1iscex, c1ioo, c1sus, c1lsc

When I turned on the test stand with the new front-ends, the power to 1X7 was cut off after a few minutes, due to overloading I assume. This brought down nodus, chiara and FB1. After Paco reset the tripped switch, everything came back without us actually doing anything, which is an interesting observation.


After this event, I moved the test stand power plug to the side wall rail socket. This seems fine so far. I then brought chiara (clone) and FB1 (clone) online. Here are some changes I made to get things going:

Chiara (clone)

  • Edited '/etc/dhcp/dhcpd.conf' to update the MAC addresses of the front-ends to match the new machines (see the sketch below), then ran
  • sudo service isc-dhcp-server restart
  • and restarted the front-ends
  • Edited '/etc/hosts' on chiara to include c1iscex and c1iscey, as these were missing
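For the record, a minimal sketch of what the dhcpd.conf edit looks like (the MAC and IP below are placeholders, not the values actually used):

# hypothetical per-host stanza in /etc/dhcp/dhcpd.conf on chiara (clone):
#
#   host c1lsc {
#     hardware ethernet 00:25:90:aa:bb:cc;   # MAC of the new front-end's martian NIC
#     fixed-address 192.168.113.62;          # the front-end's existing martian IP
#   }
#
# after editing, restart the DHCP server, then reboot the front-end so it picks up the lease:
sudo service isc-dhcp-server restart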

 

FB1 (clone)

Getting the new front-ends booting off FB1 clone:

1. I found that the KVM screen was flooded with setup info about the dolphin cards on the LLO machines. This actually prevented login via the KVM switch for two of these machines.  Strangely, one of them, 'c1sus', seemed to be fine (see Attachment 2), so I guessed this was because the dolphin network was already configured earlier when we were testing the dolphin communications. So I decided to configure the remaining dolphin cards. To do so, we do the following:

Dolphin Configuration:

1a. Ideally running

sudo /opt/DIS/sbin/dis_mkconf -fabrics 1 -sclw 8 -stt 1 -nodes c1lsc c1sus c1ioo c1iscex c1iscey c1sus2 -nosessions

should set up all the nodes, but this did not happen. In fact, I could no longer use the '/opt/DIS/sbin/dis_admin' GUI after running this operation and restarting the 'dis_networkmgr.service' via

sudo systemctl restart dis_networkmgr.service

so  I logged into each front-end and configured the dolphin adapter there using

sudo /opt/DIS/sbin/dis_config

After that I shut down FB1 (clone), because restarting it earlier didn't work, waited a few minutes, and then started it again.  Everything was fine afterward, although I am not quite sure what solved the issue, as I tried a few things and was just glad to see the problem go!

1b. After configuring all the dolphin nodes, I later found that 2 of them failed the '/opt/DIS/sbin/dis_diag' test with an error message suggesting three possible issues, one of which was 'faulty cable'. I looked at the units in question and found that swapping both cables with the remaining spares solved the problem. So it seems like these cables are faulty (need to double-check this). Attachment 3 shows the current state of the dolphin nodes on the front-ends and the dolphin switch.


2. I noticed that the NFS mount service for the mount points '/opt/rtcds' and '/opt/rtapps' in /etc/fstab exited with an error, so I ran 

sudo mount -a

3. Edited '/etc/hosts' to include c1iscex and c1iscey, as these were missing

 

Front-ends

To test the PCIe extension fiber cables that connect the front-ends to their respective I/O chassis, we run the following command (after booting the machine with the cable connected): 

controls@c1lsc:~$ lspci -vn | grep 10b5:3
    Subsystem: 10b5:3120
    Subsystem: 10b5:3101

If we see the output above, then both the cable and the OSS card are fine (we know from previous tests that the OSS card on the I/O chassis is good). Since we only have one I/O chassis, we repeat the step above for each of the 8 cables, also cycling through the six new front-ends as we go so that we test the installed OSS host adapter cards at the same time. I was able to test 4 cables and 4 OSS host cards (c1lsc, c1sus, c1ioo, c1sus2), but the remaining results were inconclusive: they seem to suggest that 3 of the remaining 5 fiber cables are faulty, which in itself would be unfortunate, but I began to question the reliability of the test when I went back to check the 2 remaining OSS host cards using a cable that had passed earlier and it did not pass. After a few retries I decided to call it a day before I lost my mind. These tests need to be redone tomorrow.
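As a convenience, the pass/fail decision above can be scripted; a small sketch using the same lspci test as above:

# run on the front-end under test, with the fiber cable and I/O chassis connected
if [ "$(lspci -vn | grep -c '10b5:3')" -ge 2 ]; then
    echo "OneStop link OK: host adapter and cable see the I/O chassis"
else
    echo "OneStop link NOT detected: suspect the cable, card seating, or chassis power"
fi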

 

Note: We were unable to lay the cables today because these tests were not complete, so we are a bit behind the plan. We'll see if we can catch up tomorrow.

 

Quote:

Plan for the remainder of the week

Tuesday

  • Setup the 6 new front-ends to boot off the FB1 clone.
  • Test the PCIe I/O cables by connecting them between the front-ends and the teststand I/O chassis one at a time to ensure they work
  • Then lay the fiber cables to the various I/O chassis.

 

Attachment 1: IMG_20220921_084220465.jpg
Attachment 2: dolphin_err_init_state.png
Attachment 3: dolphin_final_state.png
  17151   Wed Sep 21 17:16:14 2022   Tega | Update | Computers | Setup the 6 new front-ends to boot off the FB1 clone

[Tega, JC]

We laid 4 out of 6 fiber cables today. The remaining 2 cables are for the I/O chassis on the vertex, so we will test those cables and lay them tomorrow. We were also able to identify the problems with the 2 supposedly faulty cables, which turned out not to be faulty. One of them had a small bend in the connector that I was able to straighten out with small pliers, and the other was a loose connection at the switch end. So there was no faulty cable, which is great! Chris wrote a Matlab script that does the migration of all the model files. I am going through them, i.e. looking at the CDS parameter block to check that all is well. The next task is to build and install the updated models. We also need to update the '/opt/rtcds' and '/opt/rtapps' directories to the latest in the 40m chiara system.

 

  17153   Thu Sep 22 20:57:16 2022   Tega | Update | Computers | build, install and start 40m models on teststand

[Tega, Chris]

We built, installed and started all the 40m models on the teststand today. The configuration we employ is to connect c1sus to the teststand I/O chassis and use dolphin to send the timing to the other front-ends. To get this far, we encountered a few problems that were solved by doing the following:

0. Fixed front-end timing sync to FB1 via NTP

1. Set the rtcds environment variable `CDS_SRC=/opt/rtcds/userapps/trunk/cds/common/src` in the file '/etc/advligorts/env'

2. Resolved the chgrp error during model installation by setting the setgid bit on chiara, i.e. `sudo chmod g+s -R /opt/rtcds/caltech/c1/target`

3. Replaced `sqrt` with `lsqrt` in `RMSeval.c` to eliminate a compilation error for c1ioo

4. Created symlinks for 'activateDQ.py' and 'generate_KisselButton.py' in '/opt/rtcds/caltech/c1/post_build' (see the sketch after this list)

5. Installed and configured dolphin for the new front-end 'c1shimmer'

6. Replaced 'RFM0' with 'PCIE' in the ipc file, '/opt/rtcds/caltech/c1/chans/ipc/C1.ipc'
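A sketch of what steps 1, 2 and 4 above amount to on the command line (the run locations and the source path of the two post-build scripts are my assumptions; they are not recorded in this entry):

# step 1: set the rtcds source path used by the model builds
echo 'CDS_SRC=/opt/rtcds/userapps/trunk/cds/common/src' | sudo tee -a /etc/advligorts/env

# step 2: setgid on the target tree (run on chiara) so installed files inherit the group
sudo chmod g+s -R /opt/rtcds/caltech/c1/target

# step 4: expose the post-build scripts in the expected directory
# (/path/to is a placeholder for wherever the scripts actually live)
ln -s /path/to/activateDQ.py            /opt/rtcds/caltech/c1/post_build/activateDQ.py
ln -s /path/to/generate_KisselButton.py /opt/rtcds/caltech/c1/post_build/generate_KisselButton.py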

 

We still have a few issues namely:

1. The user models are not running smoothly: the CPU usage jumps to its maximum value every second or so.

2. c1omc seems to be unable to get its timing from its IOP model (issue resolved by changing the CDS block parameter 'specific_cpu' from 6 to 4, because the new FEs only have 6 cores, 0-5)

3. The need to load the `dolphin-proxy-km` library and start the `rts-dolphin_daemon` service whenever we reboot the front-end

Attachment 1: dolphin_state_plus_c1shimmer.png
Attachment 2: FE_status_overview.png
  17158   Fri Sep 23 19:07:03 2022   Tega | Update | Computers | Work to improve stability of 40m models running on teststand

[Chris, Tega]

Timing glitch investigation:

  • Moved the dolphin transmit node from c1sus to c1lsc because we suspect that the glitch might be coming from the c1sus machine (earlier, c1pem on c1sus was running faster than realtime).
  • Installed and started c1oaf to remove the shared memory IPC error to/from c1lsc model
  • /opt/DIS/sbin/dis_diag gives two warnings on c1sus2
    • [WARN] IXH Adapter 0 - PCIe slot link speed is only Gen1
    • [WARN] Node 28 not reachable, but is an entry in the dishosts.conf file - c1shimmer is currently off, so this is fine.

DAQ network setup:

  • Added the DAQ ethernet MAC addresses and fixed IPv4 addresses for the front-ends to '/etc/dhcp/dhcpd.conf'
  • Added the fixed DAQ IPv4 address and port for all the front-ends to '/etc/advligorts/subscriptions.txt' for the `cps_recv` service
  • Edited '/etc/advligorts/master' to include all the IOP and user models' '.ini' files in '/opt/rtcds/caltech/c1/chans/daq/' containing the channel info, and the corresponding testpoint files in '/opt/rtcds/caltech/c1/target/gds/param/'
  • Created a systemd environment file for each front-end in '/diskless/root/etc/advligorts/' containing the arguments for the local data concentrator and DAQ data transmitter (`local_dc_args` and `cps_xmit_args`). While we were facing timing glitch issues we staggered the delay (-D waitValue) times of the front-ends by setting each to the last number of its DAQ IP address, but we should probably set them back to zero to see if that has any effect.

Other:

  • Edited /etc/resolv.conf on fb1 and 'diskless/root' to enable name resolution via, for example, `host c1shimmer`, but the file gets overwritten on chiara for some reason

Issues:

  1. Frame writing is not working at the moment. It did work at some point in the past for a couple of days, but it stopped working earlier today and we can't quite figure out why. 
  2. We can't get data via diaggui or ndscope either. Again, we recall this working in the past too, but we are not sure why it has stopped working now.   
  3. The CPU load on c1sus2 is too high, so we should split it into two models
  4. We still get the occasional IPC glitch, both for shared memory and dolphin, see attachments
Attachment 1: dolphin_state_all_green.png
Attachment 2: dolphin_state_IPC_glitch.png
  17164   Thu Sep 29 15:12:02 2022   JC | Update | Computers | Setup the 6 new front-ends to boot off the FB1 clone

[Jamie, Christopher, JC]

This morning we decided to label the fiber optic cables. While doing this, we noticed that the ends had different labels, 'Host' and 'Target'. Come to find out, the fiber optic cables are directional. Four out of six of the cables were reversed. Luckily, the cable for the 1Y3 I/O chassis already has a spare laid (the cable we are currently using).  Chris, Jamie, and I have begun reversing these cables to their correct orientation.

Quote:

[Tega, JC]

We laid 4 out of 6 fiber cables today. The remaining 2 cables are for the I/O chassis on the vertex, so we will test those cables and lay them tomorrow. We were also able to identify the problems with the 2 supposedly faulty cables, which turned out not to be faulty. One of them had a small bend in the connector that I was able to straighten out with small pliers, and the other was a loose connection at the switch end. So there was no faulty cable, which is great! Chris wrote a Matlab script that does the migration of all the model files. I am going through them, i.e. looking at the CDS parameter block to check that all is well. The next task is to build and install the updated models. We also need to update the '/opt/rtcds' and '/opt/rtapps' directories to the latest in the 40m chiara system.

 

 

  17172   Tue Oct 4 21:00:49 2022   Chris | Update | Computers | Failed takeover attempt with the new front ends

[Jamie, JC, Chris]

Today we made a failed attempt to take over the 40m hardware with the new front ends on the test stand.

As an initial test, we connected the new c1iscey to its I/O chassis using the OneStop fiber link. This went smoothly, so we tried to proceed with the rest of the system, which uncovered several problems. Consequently, we’ve reverted control back to the old front ends tonight, and will regroup and make another attempt tomorrow.

Status summary:

  • c1iscey worked on the first try
  • c1lsc worked, after we sorted out which of the two OneStop cables running to its rack we needed to use
  • c1sus2 sort of worked (its models have been crashing sporadically)
  • c1ioo had a busted OneStop cable, and worked after that was replaced
  • c1sus refused to work with the fiber OneStop cables (we tried several, including the known working one from c1ioo), but we jury-rigged it to run over a copper cable, after nudging the teststand rack a bit closer to the chassis
  • c1iscex refused to work with the fiber OneStop cables, and substituting copper was not an option, so we were stuck

There are various pathologies that we've seen with the OneStop interface cards in the I/O chassis. We don't seem to have the documentation for these cards, but our interpretive guesses are as follows:

  • When working, it is supposed to illuminate all the green LEDs along the top of the card, and the four next to the connector. In this state, you can run lspci -vt on the host, and see the various PLX/Contec/etc devices that populate the chassis.
  • When the cable is unplugged or bad, only four green LEDs illuminate on the card, and none by the connector. No devices from the chassis can be seen from the host.
  • On c1iscex and c1sus, when a fiber link is plugged in, it turns on all the LEDs along the top of the card, but the four next to the connector remain dark. We’re not sure yet what this is trying to tell us, but lspci finds no devices from the chassis, same as if it is unplugged.
  • Also, sometimes on c1iscex, no LEDs would illuminate at all (possibly the card was not seated properly).

Tomorrow, we plan to swap out the c1iscex I/O chassis for the one in the test stand, and see if that lets us get the full system up and running.

  17173   Thu Oct 6 07:29:30 2022   Chris | Update | Computers | Successful takeover attempt with the new front ends

[JC, Chris]

Last night’s CDS upgrade attempt succeeded in taking over the IFO. If the IFO users are willing, let’s try to run with it today.

The new system was left in control of the IFO hardware overnight, to check its stability. All looks OK so far.

The next step will be to connect the new FEs, fb1, and chiara to the martian network, so they’re directly accessible from the control room workstations (currently the system remains quarantined on the teststand network). We’ll also need to bring over the changes to models, scripts, etc that have been made since Tega’s last sync of the filesystem on chiara.

The previous elog noted a mysterious broken state of the OneStop link between FE and IO chassis, where all green LEDs light up on the OneStop board in the IO chassis, except the four next to the fiber link connector. This was seen on c1sus and c1iscex. It was recoverable last night on c1iscex, by fully powering down both FE computer and chassis, waiting a bit, and then powering up chassis and computer again. Currently c1sus is running with a copper OneStop cable because of the fiber link troubles we had, but this procedure should be tried to see if one of the fiber links can be made to work after all.

In order to string the short copper OneStop cable for c1sus, we had to move the teststand rack closer to the IO chassis, up against the back of 1X6/1X7. This is a temporary state while we prepare to move the FEs to their new rack. It hopefully also allows sufficient clearance to the exit door to pass the upcoming fire inspection.

At first, we connected the teststand rack’s power cables to the receptacle in 1X7, but this eventually tripped 1X7’s circuit breaker in the wall panel. Now, half of the teststand rack is on the receptacle in 1X6, and the other half is on 1X7 (these are separate circuits).

After the breaker trip, daqd couldn’t start. It turned out that no data was flowing to it, because the power cycle caused the DAQ network switch to forget a setting I had applied to enable jumbo frames on the network. The configuration has now been saved so that it should apply automatically on future restarts. For future reference, the web interface of this switch is available by running firefox on fb1 and navigating to 10.0.113.254.

When the FE machines are restarted, a GPS timing offset in /sys/kernel/gpstime/offset sometimes fails to initialize. It shows up as an incorrect GPS time in /proc/gps and on the GDS_TP MEDM screens, and prevents the data from getting timestamped properly for the DAQ. This needs to be looked at and fixed soon. In the meantime, it can be worked around by setting the offset manually: look at the value on one of the FEs that got it right, and apply it using sudo sh -c "echo CORRECT_OFFSET >/sys/kernel/gpstime/offset".
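A sketch of that manual workaround (the host names are just examples; use any FE that shows the correct GPS time as the reference):

# read the offset from a front-end that initialized correctly...
OFFSET=$(ssh controls@c1lsc cat /sys/kernel/gpstime/offset)
# ...and apply it on one that came up with the wrong time
ssh controls@c1sus "sudo sh -c 'echo $OFFSET > /sys/kernel/gpstime/offset'"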

In the first ~30 minutes after the system came up last night, there were transient IPC errors, caused by drifting timestamps while the GPS cards in the FEs got themselves resynced to the satellites. Since then, timing has remained stable, and no further errors occurred overnight. However, the timing status is still reported as red in the IOP state vectors. This doesn’t seem to be an operational problem and perhaps can be ignored, but we should check it out later to make sure.

Also, the DAC cards in c1ioo and c1iscey reported FIFO EMPTY errors, triggering their DACKILL watchdogs. This situation may have existed in the old system and gone undetected. To bypass the watchdog, I’ve added the optimizeIO=1 flag to the IOP models on those systems, which makes them skip the empty FIFO check. This too should be further investigated when we get a chance.

  309   Mon Feb 11 22:44:29 2008   rob | Configuration | DAQ | Change in channel trending on C1SUS2

I removed the MC2 optical lever related channels from trends, and the SRM POUT and YOUT (as these are redundant if we're also trending PERROR and YERROR). I did this because the c1susvme2 processor was having bursts of un-syncy lateness every ~15 seconds or so, and I suspected this might interfere with locking activities. This behaviour appears to have been happening for a month or so, and has been getting steadily worse. Rebooting did not fix the issue, but it appears that removing some trends has actually helped. Attached is a 50 day trend of the c1susvme2 sync-fe monitor.
Attachment 1: srm_sync.png
  422   Wed Apr 16 21:11:12 2008   rana | Summary | DAQ | AA/AI Filters for the DAQ & FE systems
I used Foton to make up some new filters which will be used all over the project in order to downsample/upsample.

There will be 2 flavors:

  • The first one will be a downsampling filter for use in the DAQ system.
    Whenever you specify a sampling rate in the .ini files below the natural rate of the ADC,
    the data will be downsampled using this filter (called ULYAW_0 in the plot). This one was
    designed for flat bandpass and a 'good' bandstop but no care given to the phase shift.

  • The second one will be used in the FE systems to downsample the ADC signal which is often
    sampled at 64 kHz down to something manageable like 2k or 16k. This one was tweaked for
    getting less phase lag in the 'control' band (usually 3x or so below Nyquist).

Here is the associated filter file:
# SAMPLING ULYAW 16384
# DESIGN   ULYAW 0 zpk([0.512+i*1024;0.512-i*1024;2.048+i*2048;2.048-i*2048], \
#                      [515.838+i*403.653;515.838-i*403.653;318.182+i*623.506;318.182-i*623.506;59.2857+i*827.88; \
#                      59.2857-i*827.88],0.988553,"n")
# DESIGN   ULYAW 1 zpk([0.512513+i*1024;0.512513-i*1024;1.53754+i*2048;1.53754-i*2048], \
#                      [200+i*346.41;200-i*346.41;45+i*718.592;45-i*718.592],1,"n")
# DESIGN   ULYAW 2 zpk([0.768769+i*1024;0.768769-i*1024;1.53754+i*2048;1.53754-i*2048], \
#                      [194.913-i*331.349;194.913+i*331.349;53.1611+i*682.119;53.1611-i*682.119],1,"n")
###                                                                          ###
ULYAW    0 21 3      0      0 DAQAA         0.00091455950698073    -1.62010355523604     0.67259370084279    -1.84740554170818     0.99961738977942
                                                                   -1.72089534598832     0.78482029284220    -1.41321371411946     0.99858678588255
                                                                   -1.85800352005967     0.95626992044093     2.00000000000000     1.00000000000000
ULYAW    1 21 2      0      0 FEAA            0.018236566955641    -1.83622978049494     0.85804776530302    -1.84740518752455     0.99961700649533
                                                                   -1.89200532023258     0.96649324616546    -1.41346289594856     0.99893883979950
ULYAW    2 21 2      0      0 ELP             0.015203943102927    -1.84117829296043     0.86136943504058    -1.84722827171918     0.99942556512240
                                                                   -1.89339022414279     0.96048849609619    -1.41346289594856     0.99893883979950
Attachment 1: DAQ_filters_080416.pdf
  691   Thu Jul 17 16:39:58 2008   Max Jones | Update | DAQ | Magnetometer Installed
Today I installed the magnetometer near the beam splitter chamber. It is located on the BSC chamber at head height on the inner part of the interferometer (meaning I had to crawl under the arms to install it). I don't think I disturbed anything during installation, but I think it's probably prudent to tell everyone that I was back there, just in case. I plan to run 3 BNC cables (one for each axis) from the magnetometer to the DAQ input either tonight or tomorrow. Suggestions are appreciated. - Max.
  857   Tue Aug 19 19:14:17 2008   Yoichi | Configuration | DAQ | Fixed C1:IOO-MC_RFAMPDDC
Yoichi, Rob

C1:IOO-MC_RFAMPDDC, which is a PD at the transmission port of the MC, was not recording sensible values.
So I tracked down the problem starting from the centering of the beam on the PD.
The beam was hitting the PD properly. The DC output BNC on the PD provided +1.25V output when the light was
falling on the PD. The PD is fine.
The flat cable from the PD runs to the IOO rack and fed into the LSC PD interface card.
The output from the interface card is connected to a VMIC3113A DAQ card, through cross connects.
The voltages on the cross connects were ok.
The VMIC3113A was controlled by an EPICS machine (c1iool0). So it provides only a slow channel.
By looking at C1IOOF.ini and tpchn_C1.par, I figured that C1:IOO-MC_RFAMPDDC is using chnnum=13639 in the RFM
network and it is named C1:IOO-ICS_CHAN_15 in the .par file. So it is reading values from the ICS DAQ board.
Actually nothing was connected to the channel 15 of the ICS board and that was why C1:IOO-MC_RFAMPDDC was reading
nothing. So I took the PD signal from the cross connect and hooked it up to the Ch15 of the ICS DAQ through
the large black break out box with 4-pin LEMOs. Now C1:IOO-MC_RFAMPDDC reads the DC output of the PD.
I also put an ND filter in front of the RFAMPD to avoid the saturation of the ADC. The attenuation should have been done
electronically, but I was too lazy. Since the ND filter changes the Stochmon values, someone should remove it and reduce the
gain of the LSC PD interface accordingly.
  1184   Sun Dec 7 16:12:53 2008   rana | Update | DAQ | booted awg
because it was red
  1292   Wed Feb 11 10:52:22 2009   Yoichi | Configuration | DAQ | C1:PEM-OSA_APTEMP and C1:PEM-OSA_SPTEMP disconnected

During the cleanup of the lab. Steve found a box with two BNCs going to the ICS DAQ interface and an unconnected D-SUB on the floor under the AP table.  It seemed like a temperature sensor.

The BNCs were connected to C1:PEM-OSA_APTEMP and C1:PEM-OSA_SPTEMP.

Steve removed the box from the floor. These channels can be now used as spare DAQ channels. I labeled those cables.

  1769   Tue Jul 21 17:01:18 2009   pete | DAQ | DAQ | temp channel PEM-PETER_FE

I added a temporary channel, to input 9 on the PEM ADCU.    Beware the 30, 31, and 32 inputs.  I tried 32 and it only gave noise.

 

 

  1973   Tue Sep 8 15:14:26 2009   rana, alex | Configuration | DAQ | RAID update to Framebuilder: directories added + lookback increased

 Alex logged in around 10:30 this morning and, at our request, adjusted the configuration of fb40m to have 20 days of lookback.

I wasn't able to get him to elog, but he did email the procedure to us:


1) create a bunch of new "Data???" directories in /frames/full
2) change the setting in /usr/controls/daqdrc file
       set num_dirs=480;

my guess is that the next step is:

3) telnet fb0 8087

    daqd>  shutdown

I checked and we do, in fact, now have 480 directories in /frames/full and are so far using up 11% of our 13TB capacity. Let's try to remember to check up on this so that it doesn't get overfull and crash the framebuilder.
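The check itself is quick (run on fb):

# how full is the frame RAID, and how many lookback directories exist (should be ~480)
df -h /frames
ls -d /frames/full/*/ | wc -l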

  2073   Fri Oct 9 01:31:56 2009   rana | Configuration | DAQ | tpchn mystery

Does anyone know if this master file is the real thing that's in use now? Are we really using a file called tpchn_C1_new.par? If anyone sees Alex, please get to the bottom of this.

allegra:daq>pwd
/cvs/cds/caltech/chans/daq
allegra:daq>more master
/cvs/cds/caltech/chans/daq/C1ADCU_PEM.ini
#/cvs/cds/caltech/chans/daq/C1ADCU_SUS.ini
/cvs/cds/caltech/chans/daq/C1LSC.ini
/cvs/cds/caltech/chans/daq/C1ASC.ini
/cvs/cds/caltech/chans/daq/C1SOS.ini
/cvs/cds/caltech/chans/daq/C1SUS_EX.ini
/cvs/cds/caltech/chans/daq/C1SUS_EY.ini
/cvs/cds/caltech/chans/daq/C1SUS1.ini
/cvs/cds/caltech/chans/daq/C1SUS2.ini
#/cvs/cds/caltech/chans/daq/C1SUS4.ini
/cvs/cds/caltech/chans/daq/C1IOOF.ini
/cvs/cds/caltech/chans/daq/C1IOO.ini
/cvs/cds/caltech/chans/daq/C0GDS.ini
/cvs/cds/caltech/chans/daq/C0EDCU.ini
/cvs/cds/caltech/chans/daq/C1OMC.ini
/cvs/cds/caltech/chans/daq/C1ASS.ini
/cvs/cds/gds/param/tpchn_C1_new.par
/cvs/cds/gds/param/tpchn_C2.par
/cvs/cds/gds/param/tpchn_C3.par

  2075   Fri Oct 9 14:23:53 2009   Alex Ivanov | Configuration | DAQ | tpchn mystery

"Yes. This master file is used."

Quote:

Does anyone know if this master file is the real thing that's in use now? Are we really using a file called tpchn_C1_new.par? If anyone sees Alex, please get to the bottom of this.

allegra:daq>pwd
/cvs/cds/caltech/chans/daq
allegra:daq>more master
/cvs/cds/caltech/chans/daq/C1ADCU_PEM.ini
#/cvs/cds/caltech/chans/daq/C1ADCU_SUS.ini
/cvs/cds/caltech/chans/daq/C1LSC.ini
/cvs/cds/caltech/chans/daq/C1ASC.ini
/cvs/cds/caltech/chans/daq/C1SOS.ini
/cvs/cds/caltech/chans/daq/C1SUS_EX.ini
/cvs/cds/caltech/chans/daq/C1SUS_EY.ini
/cvs/cds/caltech/chans/daq/C1SUS1.ini
/cvs/cds/caltech/chans/daq/C1SUS2.ini
#/cvs/cds/caltech/chans/daq/C1SUS4.ini
/cvs/cds/caltech/chans/daq/C1IOOF.ini
/cvs/cds/caltech/chans/daq/C1IOO.ini
/cvs/cds/caltech/chans/daq/C0GDS.ini
/cvs/cds/caltech/chans/daq/C0EDCU.ini
/cvs/cds/caltech/chans/daq/C1OMC.ini
/cvs/cds/caltech/chans/daq/C1ASS.ini
/cvs/cds/gds/param/tpchn_C1_new.par
/cvs/cds/gds/param/tpchn_C2.par
/cvs/cds/gds/param/tpchn_C3.par

 

  3216   Wed Jul 14 11:54:33 2010   josephb | Update | DAQ | Debugging Guralp and reboots

This is regards to zero signal being reported by the channels C1:PEM-SEIS_GUR1_X, C1:PEM-SEIS_GUR1_Y, and C1:PEM-SEIS_GUR1_Z.

I briefly swapped Guralp 1 EW and Guralp 2 EW to confirm to myself that it was not on the Guralp end (although the fact that it's a digital zero is highly indicative of a problem in the digital realm).  I then unplugged the 17-32, and then the 1-16, channel connections to the 110B.  I saw floating noise on the GUR2 channels, but still digital zero on the GUR1 channels, which means it's not the BNC break-out box.

There was a spare 110B, unconnected in the crate, so to do a quick test of the 110B, I turned off the crate and swapped the 110Bs, after copying the switch configuration of the first 110B to the second one.  The original 110B was labeled ADC 1, while the second 110B was labeled ADC 0.  The switches were identical except for the ones closest to the Dsub connectors on the front.  All those switches in that set were to the right, when looking down at the switches and the Dsub connectors pointing towards yourself.

Unfortunately, the c0dcu1 never seemed to come up with the new 110B (ADC 0).  So we put the original 110B back, and turned the crate back on. 

The fb then didn't seem to come back quite right.  We tried rebooting fb40m, but it's still red with status 1.  c0daqctrl is green, but c0dcu1 is red, although I'm not positive whether that's due to fb40m being in a strange state.  Jenne tried a telnet in to port 8087 and a shutdown, but that didn't seem to help.  At this point, we're going to contact Alex when he gets in around 12:30.

 

  3220   Wed Jul 14 16:39:06 2010   Jenne | Update | DAQ | Debugging Guralp and reboots

[Joe, Jenne]

Joe got on the phone with Alex, and Alex's magic Alex intuition told him to ask about the RFM switch.  The C0DAQ_CTRL's overload light was orange. Alex suggested hitting the reset button on that RFM switch, which we did. That fixed everything -> c0dcu1 came back, as did the frame builder.  Rana had pointed out earlier that we could have brought back all of the other front ends, and enabled the damping of the optics even though the FB was still down.  It's okay to leave the front ends & watchdogs on, and just reboot the FB, AWG, and DAQ_CTRL computers if that is necessary.

Anyhow, once the FB was back online, we got around to bringing back all of the front ends (as usual, except for the ones which are unplugged because they're in the middle of being upgraded).  Everything is back online now.

After all of this craziness, all of the Guralp channels are working happily again. It is still unknown why they started being digital zero, but they're back again. Maybe I should have rebooted the frame builder in addition to c0dcu1 last night?

 

Quote:

This is regards to zero signal being reported by the channels C1:PEM-SEIS_GUR1_X, C1:PEM-SEIS_GUR1_Y, and C1:PEM-SEIS_GUR1_Z.

I briefly swapped Guralp 1 EW and Guralp 2 EW to confirm to myself that it was not on the Guralp end (although the fact that it's a digital zero is highly indicative of a problem in the digital realm).  I then unplugged the 17-32, and then the 1-16, channel connections to the 110B.  I saw floating noise on the GUR2 channels, but still digital zero on the GUR1 channels, which means it's not the BNC break-out box.

There was a spare 110B, unconnected in the crate, so to do a quick test of the 110B, I turned off the crate and swapped the 110Bs, after copying the switch configuration of the first 110B to the second one.  The original 110B was labeled ADC 1, while the second 110B was labeled ADC 0.  The switches were identical except for the ones closest to the Dsub connectors on the front.  All those switches in that set were to the right, when looking down at the switches and the Dsub connectors pointing towards yourself.

Unfortunately, the c0dcu1 never seemed to come up with the new 110B (ADC 0).  So we put the original 110B back, and turned the crate back on. 

The fb then didn't seem to come back quite right.  We tried rebooting fb40m, but it's still red with status 1.  c0daqctrl is green, but c0dcu1 is red, although I'm not positive whether that's due to fb40m being in a strange state.  Jenne tried a telnet in to port 8087 and a shutdown, but that didn't seem to help.  At this point, we're going to contact Alex when he gets in around 12:30.

 

 

  3247   Mon Jul 19 21:47:36 2010   rana | Summary | DAQ | DAQ timing test

Since we now have a good measurement of the phase noise of the Marconi locked to the Rb clock, I wanted to use that to check out the old DAQ system:

I used Megan's phase noise setup - Marconi #2 is putting out 11000013 Hz at 13 dBm into the ZP-3MH mixer. Marconi #1 is putting out 3 dBm at 11000000 Hz into the RF input.

The output goes through a 50 Ohm load and then a Mini-Circuits BNC LP filter (either 2 or 5 MHz). Then an SR560 set for low noise, G = 5, AC coupling, 1-pole LP @ 1 kHz.

This SR560 output goes into the channel C1:IOO-MC_DRUM1 (which is sampled at 16384 Hz with ICS-110B after the usual Sander Liu AA chassis containing the INA134s).

  3299   Tue Jul 27 16:03:36 2010   rana | Summary | DAQ | DAQ timing test

Quote:

Since we now have a good measurement of the phase noise of the Marconi locked to the Rb clock, I wanted to use that to check out the old DAQ system:

I used Megan's phase noise setup - Marconi #2 is putting out 11000013 Hz at 13 dBm into the ZP-3MH mixer. Marconi #1 is putting out 3 dBm at 11000000 Hz into the RF input.

The output goes through a 50 Ohm load and then a Mini-Circuits BNC LP filter (either 2 or 5 MHz). Then an SR560 set for low noise, G = 5, AC coupling, 1-pole LP @ 1 kHz.

This SR560 output goes into the channel C1:IOO-MC_DRUM1 (which is sampled at 16384 Hz with ICS-110B after the usual Sander Liu AA chassis containing the INA134s).

 This is the 0.3 mHz BW spectrum of this test - as you can see the apparent linewidth (assuming the width is all caused by the DAQ jitter) is comparable to the BW and therefore not resolved.

Basically, the Hanning window function is not sharp enough to do this test and so I will do it offline in Matlab.

Attachment 1: Untitled.png
  3657   Wed Oct 6 00:32:01 2010   rana | Summary | DAQ | NDS2

This is the link to the NDS2 webpage:

https://www.lsc-group.phys.uwm.edu/daswg/wiki/NetworkDataServer2

We should install this so that we can use this modern interface to get 40m data from outside and inside of the 40m.

  3702   Tue Oct 12 23:45:55 2010   rana | Configuration | DAQ | NDS2

I installed the NDS2 Client onto the workstations today using the instructions that Zach put onto the Wiki with a couple of modifications.

1) Instead of adding the path stuff in Matlab, I added the LD_LIBRARY_PATH and MATLABPATH variables into the .cshrc as instructed by JZ's NDS2 Wiki.

2) I installed the stuff into the shared /cvs/cds/caltech/apps/linux64/ partition so that it works now on all the 64-bit CentOS 5.5 workstations.

To run it you do:

> kinit albert.einstein

> matlab -nodesktop -nosplash

> help NDS2_GetData

(set the server to the NDS2 server that you like - the example in the help is fine)

> result = NDS2_GetData({'L1:LSC-DARM_ERR'}, 957313530, 10, server);

> plot(result.data)

Now you can get any of the S6 data super fast.

(** Remember to run kdestroy as soon as you are finished so that no one else in the control room can use your personal credentials. **)

Attachment 1: cerberus.jpg
  3939   Wed Nov 17 15:49:53 2010   rana | Update | DAQ | Ole Channel Names

The following channels should be named as below to keep in line with their names pre-upgrade rather than use _DAQ in the name.

Channel name                                Corresponding signal
C1:SUS-{OPT}_{POS,PIT,YAW}                  SUS{POS,PIT,YAW}_IN1
C1:SUS-{OPT}_OPLEV_{P,Y}ERROR               OL{PIT,YAW}_IN1
C1:SUS-{OPT}_SENSOR_{UL,UR,LL,LR,SIDE}      {UL,UR,LL,LR,SD}SEN_OUT
C1:SUS-{OPT}_OPLEV_{P,Y}OUT                 OL{PIT,YAW}_OUT
C1:IOO-MC_TRANSPD                           MC2_OLSUM_IN1

 

  4109   Wed Jan 5 00:23:30 2011   rana | Summary | DAQ | FrameBuilder fails in a new way

Since Leo was trying to demo his LIGO Data Listener code, he noticed that there was an NDS2 issue. The NDS2 guy (JZ) noticed that the FrameBuilder had an issue.

We investigated. At 4 PM on Dec 31, the GPS timestamps of the frame file names started to be recorded wrong. In fact, the files started to get names matching the corresponding time from 1 year in the past.

So that's our version of the Y2011 bug. Here's the 'ls' of /frames/full:

drwxr-xr-x 2 controls controls 252K Dec 26 03:59 9773
drwxr-xr-x 2 controls controls 260K Dec 27 07:46 9774
drwxr-xr-x 2 controls controls 256K Dec 28 11:33 9775
drwxr-xr-x 2 controls controls 252K Dec 29 15:19 9776
drwxr-xr-x 2 controls controls 244K Dec 30 19:06 9777
drwxr-xr-x 2 controls controls 188K Dec 31 16:00 9778
drwxr-xr-x 2 controls controls 148K Jan  1 08:53 9463
drwxr-xr-x 2 controls controls 260K Jan  2 12:39 9464
drwxr-xr-x 2 controls controls 252K Jan  3 16:26 9465
drwxr-xr-x 2 controls controls 248K Jan  4 20:13 9466
drwxr-xr-x 2 controls controls  36K Jan  5 00:22 9467
controls@fb /frames/full $

The culprit is the directory whose name starts out as 9463, whereas it should be 9779.

 

  4112   Wed Jan 5 16:00:11 2011   rana, alex | Summary | DAQ | FrameBuilder fails in a new way

Email from Alex:

Turned out to be the lack of current year information in the IRIG-B signal
received by the Symmetricom GPS card in the frame builder machine caused
this. I have added a constant in daqdrc to bring the seconds forward:

controls@fb /opt/rtcds/caltech/c1/target/
fb $ grep symm daqdrc
#set symm_gps_offset=-1;
set symm_gps_offset=31536001;

Hopefully we will be upgrading to the newer timing system at the 40M this
year, so this will not happen again next year.


 

Doing an 'ls -lrt' in /frames/full/ now shows that the names are correct:

drwxr-xr-x 2 controls controls 249856 Dec 30 19:06 9777
drwxr-xr-x 2 controls controls 192512 Dec 31 16:00 9778
drwxr-xr-x 2 controls controls 151552 Jan  1 08:53 9463
drwxr-xr-x 2 controls controls 266240 Jan  2 12:39 9464
drwxr-xr-x 2 controls controls 258048 Jan  3 16:26 9465
drwxr-xr-x 2 controls controls 253952 Jan  4 20:13 9466
drwxr-xr-x 2 controls controls 151552 Jan  5 13:54 9467
drwxr-xr-x 2 controls controls  12288 Jan  5 15:57 9783

  4115   Wed Jan 5 22:14:41 2011   rana | Summary | DAQ | FrameBuilder fails in a new way

Just a proof that the DAQ is working - ran DTT on nodus from 3 hours ago.

Attachment 1: Screen_shot_2011-01-05_at_10.13.21_PM.png
  4185   Fri Jan 21 23:17:54 2011   rana | HowTo | DAQ | DAQ Wiki Failure

The DAQ Wiki pages say to use port 8088 for restarting the Frame Builder. I tried this to no avail.

op440m:daq>telnet fb 8088
Trying 192.168.113.202...
Connected to fb.martian.
Escape character is '^]'.
^]
telnet> quit
Connection to fb.martian closed.
op440m:daq>telnet fb 8087
Trying 192.168.113.202...
Connected to fb.martian.
Escape character is '^]'.
daqd> shutdown
OK
Connection to fb.martian closed by foreign host.

Apparently, 8087 is the right port. Various elog entries from Joe and Kiwamu say 8087 or 8088. Not sure what's going on here.

After figuring this out, I activated the C1:GCV-XARM_COARSE_OUT_DAQ and C1:GCV-XARM_FINE_OUT_DAQ and set both of them to be recorded at 2048 Hz. We are loading filters and setting gains into these filter modules such that the OUT signals will be calibrated into Hz (that's why we used the OUT instead of the IN1 as there was last night).

  4194   Mon Jan 24 10:39:16 2011   josephb | HowTo | DAQ | DAQ Wiki Failure

Actually, both ports 8087 and 8088 work to talk to the frame builder.  Don't let the lack of a daqd prompt fool you.

 

Here's putting in the commands:

rosalba:~>telnet fb 8088
Trying 192.168.113.202...
Connected to fb.martian (192.168.113.202).
Escape character is '^]'.

shutdown

0000Connection closed by foreign host.

rosalba:~>date
Mon Jan 24 10:30:59 PST 2011

 

Then looking at the last 3 lines of restart.log in /opt/rtcds/caltech/c1/target/fb/

daqd_start Fri Jan 21 15:20:48 PST 2011

daqd_start Fri Jan 21 23:06:38 PST 2011

daqd_start Mon Jan 24 10:30:29 PST 2011

 

So clearly it's talking to the frame builder, it just doesn't have the right formatting for the prompt.  If you try typing in "help" at the prompt, you still get all the frame builder commands listed and can try using any of them.

However, I'll edit the DAQ wiki and indicate 8087 should be used because of the better formatting for the prompt.


Quote:
Apparently, 8087 is the right port. Various elog entries from Joe and Kiwamu say 8087 or 8088. Not sure what's going on here.

After figuring this out, I activated the C1:GCV-XARM_COARSE_OUT_DAQ and C1:GCV-XARM_FINE_OUT_DAQ and set both of them to be recorded at 2048 Hz. We are loading filters and setting gains into these filter modules such that the OUT signals will be calibrated into Hz (that's why we used the OUT instead of the IN1 as there was last night).

 

  4319   Thu Feb 17 23:41:46 2011   rana | Frogs | DAQ | Frames Directory got the wrong name: Data unreachable

DTT stopped working for recent data. An 'ls' in the frames/full/ directory reveals:

drwxr-xr-x 2 controls controls 258048 Feb  3 12:26 9807
drwxr-xr-x 2 controls controls 258048 Feb  4 16:13 9808
drwxr-xr-x 2 controls controls 262144 Feb  5 19:59 9809
drwxr-xr-x 2 controls controls 258048 Feb  6 23:46 9810
drwxr-xr-x 2 controls controls 258048 Feb  8 03:33 9811
drwxr-xr-x 2 controls controls 262144 Feb  9 07:19 9812
drwxr-xr-x 2 controls controls 253952 Feb 10 11:06 9813
drwxr-xr-x 2 controls controls 266240 Feb 11 14:53 9814
drwxr-xr-x 2 controls controls 266240 Feb 12 18:39 9815
drwxr-xr-x 2 controls controls 266240 Feb 13 22:26 9816
drwxr-xr-x 2 controls controls 262144 Feb 15 02:13 9817
drwxr-xr-x 2 controls controls 253952 Feb 16 05:59 9818
drwxr-xr-x 2 controls controls 241664 Feb 17 09:46 9819
drwxr-xr-x 2 controls controls  28672 Feb 17 12:22 9820
drwxr-xr-x 2 controls controls  32768 Feb 17 15:06 6663
drwxr-xr-x 2 controls controls  73728 Feb 17 23:39 6664
controls@fb /frames/full $ date
Thu Feb 17 23:39:27 PST 2011

  4407   Sun Mar 13 00:00:58 2011   jzweizig, rana | Configuration | DAQ | NDS2 code change and restart

John has changed the NDS2 code and restarted it on Mafalda. The issue is that it goes off the rails every time DAQD is restarted on FB, because of a filename convention war between GDS and CDS.

Until this is resolved, please make sure to restart the NDS2 process on Mafalda every time you restart DAQD by doing this:

pkill -KILL nds2

/users/jzweizig/nds2-mafalda/start_nds2

  4705   Thu May 12 22:54:20 2011   rana | Update | DAQ | Input Beam Naming change (no more IP)

 We decided to rename the Input Beam channels (while keeping temporary backwards compatible aliases) as:

C1:ASC-IB_POS_X, C1:ASC-IB_POS_Y, C1:ASC-IB_ANG_SUM, etc.

  4779   Thu Jun 2 10:19:37 2011   Alex Ivanov | Summary | DAQ | installed new daqd (frame builder) program on fb (target/fb/daqd)

I hope that the new daqd code will fix the problem of frame-file GPS times not being aligned to 16 seconds.

I have compiled the new daqd program under /opt/rtcds/caltech/c1/core/release/build/mx and installed it under target/fb/daqd, then restarted the daqd process on the "fb" computer. It was installed with the ownership of user root and I did chmod +s on it (set UID on execution bit). This was done in order to turn on some code that renices the daqd process to the value of -20 on startup. Currently it runs at the lowest nice value (high priority).

 

controls@fb /opt/rtcds/caltech/c1/target/fb $ ls -alt daqd
-rwsr-sr-x 1 root controls 6592694 Jun  2 10:00 daqd

 

Backup daqd is here:

 

controls@fb /opt/rtcds/caltech/c1/target/fb $ ls -alt daqd.02jun11
-rwxr-xr-x 1 controls controls 6768158 Feb 21 11:30 daqd.02jun11

 

 

  4926   Thu Jun 30 21:55:16 2011   rana | Configuration | DAQ | NDS2 conf change

As I recently had trouble getting all of the SUS SENSOR channels at once from NDS2, I asked J.Z. for help. He found that the number of buffers on mafalda was set to only allow a small amount of data to be requested at one time.

He's going to have to figure out a more permanent fix, but for now he has increased the data buffer size to allow somewhat larger chunks to be fetched. I have made a workaround in Matlab, which gets smaller chunks and then concatenates them together.

It's in SUS/peakFit/.

Attachment 1: Untitled.png
  4992   Tue Jul 19 21:05:55 2011   haixing | Update | DAQ | choose the right relay

Rana and I are working on the AA/AI circuit for Cymac. We need relays to bypass certain paths in the circuit, and we just found a nice website
explaining how to choose the right relay:

http://zone.ni.com/devzone/cda/tut/p/id/2774

This piece of information could be useful for others.

  6381   Wed Mar 7 21:13:30 2012   rana | Update | DAQ | NDS2

 I noticed that NDS2 was not running on mafalda as it should be. Instead, there were a couple of zombie MEDMs using up 99% of the CPU. I killed the zombies and have run the 'build channel list' script. When it finished, I tried to restart the nds server, but got the following error in the log file. Email has been dispatched to JZ.

mafalda:logs>less nds2-mafalda-201203072111.log

Configuring from file: nds2.conf
Allow list: ALL
terminate called after throwing an instance of 'std::runtime_error'
  what():  Insufficient arguments
  8861   Tue Jul 16 19:16:12 2013   rana | Update | DAQ | NDS2 Status

I have modified the settings on the router that connects our Martian network to the outside world so that one can access the NDS2 server running on megatron:31200.

To get at the data you point your data getting client (Matlab, ligoDV, DTT, etc.) at our router and the megatron port will be forwarded to you:

131.215.115.189:31200

is what you should point to. Now, it should be possible to run DetChar jobs (e.g. our 40m Summary pages) from the outside on some remote server. You can also grab 40m data on your laptop directly by using matlab or python NDS software.

  10507   Mon Sep 15 18:55:51 2014   rana | Update | DAQ | 40m frames onto the cluster

Dan Kozak is rsync transferring /frames from NODUS over to the LDAS grid. He's doing this without a BW limit, but even so it's going to take a couple of weeks. If nodus seems pokey or the net connection to the outside world is too tight, then please let me and him know so that he can throttle the pipe a little.

  10632   Wed Oct 22 21:06:33 2014   Chris | Update | DAQ | 40m frames onto the cluster

Quote:

Dan Kozak is rsync transferring /frames from NODUS over to the LDAS grid. He's doing this without a BW limit, but even so it's going to take a couple of weeks. If nodus seems pokey or the net connection to the outside world is too tight, then please let me and him know so that he can throttle the pipe a little.

The recently observed daqd flakiness looks related to this transfer. It appears to still be ongoing:

nodus:~>ps -ef | grep rsync
controls 29089   382  5 13:39:20 pts/1   13:55 rsync -a --inplace --delete --exclude lost+found --exclude .*.gwf /frames/trend
controls 29100   382  2 13:39:43 pts/1    9:15 rsync -a --delete --exclude lost+found --exclude .*.gwf /frames/full/10975 131.
controls 29109   382  3 13:39:43 pts/1    9:10 rsync -a --delete --exclude lost+found --exclude .*.gwf /frames/full/10978 131.
controls 29103   382  3 13:39:43 pts/1    9:14 rsync -a --delete --exclude lost+found --exclude .*.gwf /frames/full/10976 131.
controls 29112   382  3 13:39:43 pts/1    9:18 rsync -a --delete --exclude lost+found --exclude .*.gwf /frames/full/10979 131.
controls 29099   382  2 13:39:43 pts/1    9:14 rsync -a --delete --exclude lost+found --exclude .*.gwf /frames/full/10974 131.
controls 29106   382  3 13:39:43 pts/1    9:13 rsync -a --delete --exclude lost+found --exclude .*.gwf /frames/full/10977 131.
controls 29620 29603  0 20:40:48 pts/3    0:00 grep rsync

Diagnosing the problem:

I logged into fb and ran "top". It said that fb was waiting for disk I/O ~60% of the time (according to the "%wa" number in the header). There were 8 nfsd (network file server) processes running with several of them listed in status "D" (waiting for disk). The daqd logs were ending with errors like the following suggesting that it couldn't keep up with the flow of data:

[Wed Oct 22 18:58:35 2014] main profiler warning: 1 empty blocks in the buffer
[Wed Oct 22 18:58:36 2014] main profiler warning: 0 empty blocks in the buffer
GPS time jumped from 1098064730 to 1098064731

This all pointed to the possibility that the file transfer load was too heavy.

Reducing the load:

The following configuration changes were applied on fb.

Edited /etc/conf.d/nfs to reduce the number of nfsd processes from 8 to 1:

OPTS_RPC_NFSD="1"

(was "8")

Ran "ionice" to raise the priority of the framebuilder process (daqd):

controls@fb /opt/rtcds/rtscore/trunk/src/daqd 0$ sudo ionice -c 1 -p 10964

And to reduce the priority of the nfsd process:

controls@fb /opt/rtcds/rtscore/trunk/src/daqd 0$ sudo ionice -c 2 -p 11198

I also tried punishing nfsd with an even lower priority ("-c 3"), but that was causing the workstations to lag noticeably.

After these changes the %wa value went from ~60% to ~20%, and daqd seems to die less often, but some further throttling may still be in order.

  11265   Fri May 1 13:22:08 2015   ericq | Update | DAQ | PEM Slow channels added to saved frames

Rana asked me to add the slow outputs (OUT16) of the seismometer BLRMS channels to the frames. 

All of the PEM slow channels are already set up in c1/chans/daq/C1EDCU_PEM.ini, but up to this point daqd had no knowledge of this file, since it wasn't included in c1/target/fb/master, which defines all the places to look for files describing channels to be written to disk. This file already includes lines for C1EDCU_LSC.ini and such, which, from old elogs, looks like it was set up by hand for the subsystems we care about. 

Hence, since we now care about slow trends for the PEM subsystem, I have added a line to the daqd master file to tell it to save the PEM slow channels. This looks to have increased the size of the individual 16 second frame files from 57MB to 59MB, which isn't so bad.
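Concretely, the change amounts to one extra line in the daqd master file, followed by a daqd restart; a sketch (run on fb, assuming the position of the line within the file doesn't matter):

# tell daqd about the PEM EDCU channel list
echo /opt/rtcds/caltech/c1/chans/daq/C1EDCU_PEM.ini >> /opt/rtcds/caltech/c1/target/fb/master
# then restart the framebuilder so it re-reads the master file
# (e.g. telnet fb 8087, then 'shutdown' at the daqd prompt, as in the older DAQ Wiki entries above)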

  11266   Fri May 1 16:42:42 2015   rana | Update | DAQ | PEM Slow channels added to saved frames

Still processing, but I think it should work fine once we have a day of data. Until then, here are the summary pages so far, including Vac channels:

http://www.ligo.caltech.edu/~misi/summary/day/20150501/pem/

  11627   Mon Sep 21 15:22:19 2015   jamie | Update | DAQ | working on new fb replacement

I've been putting together a new machine that Rolf got for us as a replacement for fb.

I've installed and configured the OS, and compiled daqd and the necessary supporting software.  I want to try acquiring data with it.  This will require removing the current/old fb from the DAQ network, and adding the new machine.  It should be able to be done relatively non-invasively, such that none of the front end configuration needs to be adjusted, and the old fb can be put back in place easily.

If the test is successful, then I'll push ahead with the rest of the replacement (such as either moving or copying the /frames RAID to the new machine).

I will do this work in the early AM tomorrow, September 22, 2015.

  11636   Tue Sep 22 17:30:55 2015   jamie | Update | DAQ | attempts at getting new fb working

Today I've been trying to get the new frame builder, tentatively 'fb1', to work.  It's not fully working yet, so I'm about to revert the system back to using 'fb'.  The switch-over process is annoying, since our one myrinet card has to be moved between the hosts.

A brief update on the process so far:

I'm being a little bold with this system by trying to build daqd against more system libraries, instead of the manually installed stuff usually nominally required.  Here's some of the relevant info about the fb1 system:

  • Debian 7 (wheezy)
  • lscsoft ldas-tools-framecpp-dev 2.4.1-1+deb7u0
  • lscsoft gds-dev 2.17.2-2+deb7u0
  • lscsoft libmetaio-dev 8.4.0-1+deb7u0
  • lscsoft libframe-dev 8.20-1+deb7u0
  • /opt/rtapps/epics-1.4.12.2_long
  • /opt/mx-1.2.16
  • advLigoRTS trunk

I finally managed to get daqd to build against the advLigoRTS trunk (post 2.9 branch).  I'll post a detailed build log once I work out all the kinks.  It runs ok, including writing out full frames, as well as second and minute trends and raw minute trends, but there are a couple of show-stopper problems:

  • daqd segfaults if the C1EDCU.ini is specified.  If I comment out that one file from the 'master' channel ini file list then it runs without segfaulting.
  • Something is going on with the mx_streams from the front ends:
    • They appear to look ok from the daqd side, but the FEC-<ID>_FB_NET_STATUS indicators remain red.  The "DAQ" bit in the STATE_WORD is also red.  Again, this is even though data seems to be flowing.
    • The mx_stream processes on the front ends are dying (and restarting via monit) about every 2 minutes.  It's unclear what exactly is happening, but they all die around the same time, so it was possibly initiated by a daqd problem.  Around the time of the mx_stream failures, we see this in the daqd log:
[Tue Sep 22 17:24:07 2015] GPS MISS dcu 91 (TST); dcu_gps=1127003062 gps=1127003063

Aborted 1 send requests due to remote peer Aborted 1 send requests due to remote peer 00:25:90:0d:75:bb (c1sus:0) disconnected
mx_wait failed in rcvr eid=004, reqn=11; wait did not complete; status code is Remote endpoint is closed
00:30:48:d6:11:17 (c1iscey:0) disconnected
mx_wait failed in rcvr eid=002, reqn=235; wait did not complete; status code is Remote endpoint is closed
disconnected from the sender on endpoint 002
mx_wait failed in rcvr eid=005, reqn=253; wait did not complete; status code is Bad session (missing mx_connect?)
disconnected from the sender on endpoint 005
disconnected from the sender on endpoint 004
[Tue Sep 22 17:24:13 2015] GPS MISS dcu 39 (PEM); dcu_gps=1127003062 gps=1127003069
  • Occasionally the daqd process dies when the front-end mx_stream processes die.

I'll keep investigating, hopefully with some feedback from Keith and Rolf tomorrow.

  11645   Fri Sep 25 17:51:11 2015   jamie | Update | DAQ | fb replacement work update

Brief update about the fb replacement status.

The new hardware for fb is in the rack, temporarily sitting on top of megatron, and on the CDS network with the name 'fb1'.  I've installed an OS on it and have re-built daqd.

Earlier this week I swapped it into the network and tried to get it to acquire data from the front ends.  I was ultimately unsuccessful.  The problem seemed to be the mx_stream communication from the front ends to the new host.

The swap is sort of a pain because we only have one Myrinet fiber network adapter card that has to be moved between machines, which of course requires shutting down both machines and opening up their chassis.  I instructed Steve to order us a new Myrinet card for the new machine, which will allow us to swap daqd machines by just moving the fiber connection.  Once that's in place (early next week) I'll go back to trying to figure out what the issue is with the mx_streams.

If all else fails I'll take the repulsive last resort of either swapping or cloning the disk from the old fb.

  11653   Wed Sep 30 13:59:49 2015   jamie | Update | DAQ | attempts at getting new fb working

I got Steve to get us a new Myrinet fiber network adapter card for fb1:

  • Myrinet 10G-PCIE-8B-S

I just finished installing the card in fb1, and it came up fine.  We happened to have a spare fiber, and a spare fiber jack in the DAQ switch, so I went ahead and plugged it in in parallel to the old fb:

controls@fb1:~/rtbuild/trunk 130$ /opt/mx/bin/mx_info
MX Version: 1.2.16
MX Build: controls@fb1:/opt/src/mx-1.2.16 Fri Sep 18 18:32:59 PDT 2015
1 Myrinet board installed.
The MX driver is configured to support a maximum of:
    8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
===================================================================
Instance #0:  364.4 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
    Status:         Running, P0: Link Up
    Network:        Ethernet 10G

    MAC Address:    00:60:dd:43:74:62
    Product code:   10G-PCIE-8B-S
    Part number:    09-04228
    Serial number:  485052
    Mapper:         00:60:dd:46:ea:ec, version = 0x00000000, configured
    Mapped hosts:   7

                                                        ROUTE COUNT
INDEX    MAC ADDRESS     HOST NAME                        P0
-----    -----------     ---------                        ---
   0) 00:60:dd:43:74:62 fb1:0                             1,0
   1) 00:25:90:0d:75:bb c1sus:0                           1,0
   2) 00:30:48:be:11:5d c1iscex:0                         1,0
   3) 00:30:48:d6:11:17 c1iscey:0                         1,0
   4) 00:30:48:bf:69:4f c1lsc:0                           1,0
   5) 00:14:4f:40:64:25 c1ioo:0                           1,0
   6) 00:60:dd:46:ea:ec fb:0                              1,0

We can now work on fb1 while fb continues to run and collect data from the front ends.

I'm still not getting the mx_stream connections to the new fb1 daq to work.  I'm leaving everything running as is on fb for the moment.
