40m Log, Page 207 of 339
ID   Date   Author   Type   Category   Subject
  13060   Mon Jun 12 17:42:39 2017   gautam   Update   ASS   ETMY Oplev Pentek board pulled out

As part of my Oplev servo investigations, I have pulled out the Pentek Generic Whitening board (D020432) from the Y-end electronics rack. The ETMY watchdog was shut down for this; I will restore it once the Oplev is re-installed.

  13063   Wed Jun 14 18:15:06 2017   gautam   Update   ASS   ETMY Oplev restored

I replaced the Pentek Generic Whitening Board and the Optical Lever PD Interface Board (D010033) which I had pulled out. The ETMY optical lever servo is operational again. I will post a more detailed elog with deviations from schematics + photos + noise and TF measurements shortly.

Quote:

As part of my Oplev servo investigations, I have pulled out the Pentek Generic Whitening board (D020432) from the Y-end electronics rack. ETMY watchdog was shutdown for this, I will restore it once the Oplev is re-installed.

 

  13064   Thu Jun 15 01:56:50 2017   gautam   Update   ASS   ETMY Oplev restored

Summary:

I tried playing around with the Oplev loop shape on ITMY, in order to see if I could successfully engage the Coil Driver whitening. Unfortunately, I had no success tonight.

Details:

I was trying to guess a loop shape that would work - I guess this will need some more careful thought about loop shape optimization. I basically kept all the existing filters and modified the low-passing that minimizes control noise injection. Adding a 4th order elliptic low pass with corner at 50Hz and stopband attenuation of 60dB yielded a stable loop with an upper UGF of ~6Hz and ~25deg of phase margin (which is on the low side). I was able to successfully engage this loop, and as seen in Attachment #1, the noise performance above 50Hz is vastly improved. But it also seems that there is some injection of noise around 6Hz. In any case, as soon as I tried to engage the dewhitening, the DAC output quickly saturated. The whitening filter for the ITMs already has ~40dB of gain at ~40Hz, so it looks like the high frequency roll-off has to be more severe.

I am not even sure the elliptic filter is the right choice here - it does have the steepest roll-off for a given filter order, but I need to look up how to achieve good roll-off without compromising the phase margin of the overall loop. I am going to try to do the optimization in a more systematic way, and perhaps play around with some of the other filters' poles and zeros as well, to get a stable controller that minimizes control noise injection everywhere.
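The phase/roll-off trade can be checked numerically before loading a filter. A minimal scipy sketch of the 4th-order elliptic design described above (50Hz corner, 60dB stopband attenuation; the 2dB passband ripple is my assumption, since the ripple wasn't stated), evaluating how much phase lag the low pass alone contributes at the ~6Hz UGF:

```python
import numpy as np
from scipy import signal

# 4th-order analog elliptic low pass: 50 Hz corner, 60 dB stopband
# attenuation. 2 dB passband ripple is an ASSUMPTION (not stated above).
b, a = signal.ellip(4, 2, 60, 2 * np.pi * 50, btype='low', analog=True)

# Evaluate at the ~6 Hz upper UGF and at a representative stopband frequency
w, h = signal.freqs(b, a, worN=2 * np.pi * np.array([6.0, 500.0]))

phase_at_ugf = np.angle(h[0], deg=True)     # phase lag the ELP alone eats at 6 Hz
atten_at_500 = 20 * np.log10(np.abs(h[1]))  # should be at least 60 dB down

print(f"ELP phase at 6 Hz: {phase_at_ugf:.1f} deg")
print(f"ELP magnitude at 500 Hz: {atten_at_500:.1f} dB")
```

Every degree of lag the low pass contributes at the UGF comes straight out of the loop's phase margin, which is why a steeper roll-off (or lower corner) trades directly against stability.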

Attachment 1: ITMY_OLspec.pdf
  13069   Fri Jun 16 13:53:11 2017   gautam   Update   CDS   slow machine bootfest

Reboots for c1psl, c1iool0, c1iscaux today. MC autolocker log was complaining that the C1:IOO-MC_AUTOLOCK_BEAT EPICS channel did not exist, and running the usual slow machine check script revealed that these three machines required reboots. PMC was relocked, IMC Autolocker was restarted on Megatron and everything seems fine now.
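For reference, the sort of check the slow machine check script performs can be sketched as a simple TCP probe of each machine's telnet port. This is a hypothetical reconstruction, not the actual script - the host list and port are illustrative:

```python
import socket

def is_responsive(host, port=23, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    A rough stand-in for the slow-machine check: a wedged VME crate
    often still answers ping but no longer accepts telnet connections,
    so probing the telnet port is the more informative test.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical host list -- the real script maintains its own.
for host in ["c1psl", "c1iool0", "c1iscaux"]:
    status = "ok" if is_responsive(host) else "NEEDS REBOOT?"
    print(f"{host}: {status}")
```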

 

  13079   Sun Jun 25 22:30:57 2017   gautam   Update   General   c1iscex timing troubles

I saw that the CDS overview screen indicated problems with c1iscex (also ETMX was erratic). I took a closer look and thought it might be a timing issue - a walk to the X-end confirmed this, the 1pps status light on the timing slave card was no longer blinking. 

I tried all the power cycling and debugging approaches known to me, including those suggested in this thread and from a more recent time. I am leaving things as is for the night and will look into this more tomorrow. I've also shut down the ETMX watchdog for the time being. Looks like this has been down since 24 Jun 8am UTC.

Attachment 1: c1iscex_status.png
  13082   Tue Jun 27 16:11:28 2017   gautam   Update   Electronics   Coil whitening

I got back to trying to engage the coil driver whitening today, the idea being to try and lock the DRMI in a lower noise configuration - from the last time we had the DRMI locked, it was determined that A2L coupling from the OL loops and coil driver noise were dominant from ~10-200Hz. All of this work was done on the Y-arm, while the X-arm CDS situation is being resolved.

To re-cap, every time I tried to do this in the last month or so, the optic would get kicked around. I suspected that the main cause was the insufficient low-pass filtering on the Oplev loops, which was causing the DAC rms to rail when the whitening was turned on. 

I had tried some hand-tweaking of the OL loops without much success last week - today I had a little more success. The existing OL loops comprise the following:

  • Differentiator at low frequencies (zero at DC, 2 poles at 300Hz)
  • Resonant gain peaked around 0.6 Hz with a Q of ______ (to be filled in)
  • BR notches 
  • A 2nd order elliptic low pass with 2dB passband ripple and 20dB stopband attenuation

The elliptic low pass was too shallow. For a first pass at loop shaping today, I checked if the resonant gain filter had any effect on the transmitted power RMS profile - it turned out to have negligible effect. So I disabled this filter and replaced the elliptic low pass with a 5th order ELP with 2dB passband ripple and 80dB stopband attenuation. I also adjusted the overall loop gain to set the upper UGF of the OL loops around 2Hz. Looking at the spectrum of one coil output in this configuration (ITMY UL), I determined that the DAC rms was no longer in danger of railing.
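The "DAC rms in danger of railing" judgement is just an integral of the coil-output spectrum. A minimal numpy sketch of the bookkeeping, with a made-up flat spectrum standing in for the measured ITMY UL one (the ±2^15-count range of a 16-bit DAC is the only real number here):

```python
import numpy as np

# Made-up coil-output amplitude spectral density [counts/rtHz] -- a flat
# spectrum standing in for the measured ITMY UL coil output.
f = np.linspace(0.1, 1000.0, 10000)   # frequency grid [Hz]
asd = np.full_like(f, 10.0)           # 10 counts/rtHz, for illustration

# RMS is the square root of the integrated power spectral density
df = f[1] - f[0]
rms = np.sqrt(np.sum(asd**2) * df)

dac_range = 2**15                     # 16-bit DAC: +/-32768 counts
print(f"output rms: {rms:.0f} counts ({100 * rms / dac_range:.2f}% of DAC range)")
```

Comparing this cumulative rms (plus a few sigma of headroom) against the DAC range is the quantitative version of the check described above.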

However, I was still unable to smoothly engage the de-whitening. The optic again kept getting kicked around each time I tried. So I tried engaging the de-whitening on the ITM with just the local damping loop on, but with the arm locked. This transition was successful, but not smooth. Looking at the transmon spot on the camera, every time I engage the whitening, the spot gets a sizeable kick (I will post a video shortly).  In my ~10 trials this afternoon, the arm is able to stay locked when turning the whitening on, but always loses lock when turning the whitening off. 

The issue here is certainly not the DAC rms railing. I had a brief discussion with Gabriele just now about this, and he suggested checking for some electronic voltage offset between the two paths (de-whitening engaged and bypassed). I also wonder if this has something to do with some latency between the actual analog switching of paths (done by a slow machine) and the fast computation by the real time model? To be investigated.

GV 170628 11pm: I guess this isn't a viable explanation, as the de-whitening switching is handled by one of the BIO cards, which is itself handled by the fast FEs, so there isn't any question of latency.

With the Oplev loops disengaged, the initial kick given to the optic when engaging the whitening settles down in about a second. Once the ITM was stable again, I was able to turn on both Oplev loops without any problems. I did not investigate the new Oplev loop shape in detail, but compared to the original loop shape, there wasn't a significant difference in the TRY spectrum in this configuration (plot to follow). This remains to be done in a systematic manner. 

Plots to support all of this to follow later in the evening.

Attachment #1: Video of ETMY transmission CCD while engaging whitening. I confirmed that this "glitch" happens while engaging the whitening on the UL channel. This is reminiscent of the Satellite Box glitches seen recently. In that case, the problem was resolved by replacing the high-current buffer in the offending channel. Perhaps something similar is the problem here?

Attachment #2: Summary of the ITMY UL coil output spectra under various conditions.

 

Attachment 1: ETMYT_1182669422.mp4
Attachment 2: ITMY_whitening_studies.pdf
  13085   Wed Jun 28 20:15:46 2017   gautam   Update   General   c1iscex timing troubles

[Koji, gautam]

Here is a summary of what we did today to fix the timing issue on c1iscex. The power supply to the timing card in the X end expansion chassis was to blame.

  1. We prepared the Y-end expansion chassis for transport to the X end. To do so, we disconnected the following from the expansion chassis
    • Cables going to the ADC/DAC adaptor boards
    • Dolphin connector
    • BIO connector
    • RFM fiber
    • Timing fiber
  2. We then carried the expansion chassis to the X end electronics rack. There we repeated the above steps for the X-end expansion chassis
  3. We swapped the X and Y end expansion chassis in the X end electronics rack. Powering the unit, we immediately saw the green lights on the front of the timing card turn on, suggesting that the Y-end expansion chassis works fine at the X end as well (as it should). To further confirm that all was well, we were able to successfully start all the RT models on c1iscex without running into any timing issues.
  4. Next, we decided to verify if the spare timing card is functional. So we swapped out the timing card in the expansion chassis brought over to the X end from the Y end with the spare. In this test too, all worked as expected. So at this stage, we concluded that
    • There was nothing wrong with the fiber bringing the timing signal to the X end
    • The Y-end expansion chassis works fine
    • The spare timing card works fine.
  5. Then we decided to try the original X-end expansion chassis timing card in the Y-end expansion chassis. This test too was successful - so there was nothing wrong with any of the timing cards!
  6. Next, we decided to power the X-end timing chassis with its original timing card, which was just verified to work fine. Surprisingly, the indicator lights on the timing card did not turn on.
  7. The timing card has 3 external connections
    • A 40 pin IDE connector
    • Power
    • Fiber carrying the timing signal
  8. We went back to the Y-end expansion chassis, and checked that the indicator lights on the timing card turned on even when the 40 pin IDE connector was left unconnected (so the timing card just gets power and the timing signal).
  9. We concluded that the power supply in the X end expansion chassis was to blame. Indeed, when Koji jiggled the connector around a little, the indicator lights came on!
  10. The connection was diagnosed to be somewhat flaky - it employs the screw-in variety of terminal blocks, and one of the connections was quite loose - Koji was able to pull the cable out of the slot applying a little pressure.
  11. I replaced the cabling (swapped the wires for thicker gauge, more flexible variety), and re-tightened the terminal block screws. The connection was reasonably secure even when I applied some force. A quick test verified that the timing card was functional when the unit was powered.
  12. We then replaced the X and Y-end expansion chassis (complete with their original timing cards, so the spare is back in the CDS cabinet), in the racks. The models started up again without complaint, and the CDS overview screen is now in a good state [Attachment #1]. The arms are locked and aligned for maximum transmission now.
  13. There was some additional difficulty in getting the 40-pin IDE connector in on the Y-end expansion chassis. It looked like we had bent some of the pins on the timing board while pulling this cable out, but Koji was able to fix this with a screwdriver. Care should be taken when disconnecting this cable in the future!

There were a few more flaky things in the Expansion chassis - the IDE connectors don't have "keys" that fix the orientation they should go in, and the whole timing card assembly is kind of difficult and not exactly secure. But for now, things are back to normal it seems.

Wouldn't it be nice if this fix also eliminates the mystery ETMX glitching problem? After all, seems like this flaky power supply has been a problem for a number of years. Let's keep an eye out.

Attachment 1: CDS_status_28Jun2017.png
  13088   Fri Jun 30 02:13:23 2017   gautam   Update   General   DRMI locking attempt

Summary:

I attempted to re-lock the DRMI and try and realize some of the noise improvements we have identified. Summary elog, details to follow.

  1. Locked arms, ran ASS, centered OLs on ITMs and BS on their respective QPDs.
  2. Looked into changing the BS Oplev loop shape to match that of the ITMs - it looks like the analog electronics that take in the QPD signals for the BS Oplev are a little different: the 800Hz poles are absent. But I thought I had managed to do this successfully, in that the error signal suppression improved and the performance of the modified loop didn't look worse anywhere except possibly at the stack resonance of ~3Hz --- see Attachment #1 (will be rotated later). The TRX spectra before and after this modification also didn't raise any red flags.
  3. Re-aligned PRM - went to the AS table and centered beam on all REFL PDs
  4. Locked PRMI on carrier, ran MICH and AS dither alignment. PRC angular feedforward also seemed to work well.
  5. Re-aligned SRM, looked for DRMI locks - there was a brief lock of a couple of seconds, but after this, the BS behaviour changed dramatically.

Basically after this point, I was unable to repeat stuff I did earlier in the evening just a couple of hours ago. The single arm locks catch quickly, and seem stable over the hour timescale, but when I run the X arm dither, the BS PITCH loop starts to oscillate at ~0.1 Hz. Moreover, I am unable to acquire the PRMI carrier lock. I must have changed a setting somewhere that I am not catching right now (although I've scripted most of these things for repeatability, so I am at a loss as to what I'm missing). The only change I can think of is that I changed the BS Oplev loop shape. But I went back into the filter file archives and restored these to their original configuration. Hopefully I'll have better luck figuring this out tomorrow.

Attachment 1: BS_OLmods.pdf
  13090   Fri Jun 30 11:50:17 2017   gautam   Update   General   DRMI locking attempt

Seems like the problem is actually with ITMX - the attached DV plots are for ITMX with just the local damping loops on (no OLs); LR seems to be the suspect.

I'm going to go squish cables and the usual sat. box voodoo, hopefully that settles it.

Quote:

Summary:

I attempted to re-lock the DRMI and try and realize some of the noise improvements we have identified. Summary elog, details to follow.

  1. Locked arms, ran ASS, centered OLs on ITMs and BS on their respective QPDs.
  2. Looked into changing the BS Oplev loop shape to match that of the ITMs - it looks like the analog electronics that take the QPD signals in for the BS Oplev is a little different, the 800Hz poles are absent. But I thought I had managed to do this successfully in that the error signal suppression improved and it didn't look like the performance of the modified loop was worse anywhere except possibly at the stack resonance of ~3Hz --- see Attachment #1 (will be rotated later). The TRX spectra before and after this modification also didn't raise any red flags.
  3. Re-aligned PRM - went to the AS table and centered beam on all REFL PDs
  4. Locked PRMI on carrier, ran MICH and AS dither alignment. PRC angular feedforward also seemed to work well.
  5. Re-aligned SRM, looked for DRMI locks - there was a brief lock of a couple of seconds, but after this, the BS behaviour changed dramatically.

Basically after this point, I was unable to repeat stuff I did earlier in the evening just a couple of hours ago. The single arm locks catch quickly, and seem stable over the hour timescale, but when I run the X arm dither, the BS PITCH loop starts to oscillate at ~0.1 Hz. Moreover, I am unable to acquire PRMI carrier lock. I must have changed a setting somewhere that I am not catching right now (although I've scripted most of these things for repeatability, so I am at a loss what I'm missing). The only change I can think of is that I changed the BS Oplev loop shape. But I went back into the filter file archives and restored these to their original configuration. Hopefully I'll have better luck figuring this out tomorrow.

 

Attachment 1: ITMX_glitchy.png
  13093   Fri Jun 30 22:28:27 2017   gautam   Update   General   DRMI re-locked

Summary:

Reverted to old settings, and tried to reproduce the DRMI lock with settings as close as possible to those used in May this year. Tonight, I was successful in getting a couple of ~10min DRMI 1f locks. Now I can go ahead and try to reduce the noise.

I am not attempting a full characterization tonight, but the important changes since the May locks are in the de-whitening boards and coil driver boards. I did not attempt to engage the coil-dewhitening, but the PD whitening works fine.

As a quick check, I tested the hypothesis that the BS OL loop A2L coupling dominates between ~10-50Hz. The attached control signal spectra [Attachment #2] supports this hypothesis. Now to actually change the loop shape.

I've centered Oplevs of all vertex optics, and also the beams on the REFL and AS PDs. The ITMs and BS have been repeatedly aligned since re-installing their respective coil driver electronics, but the SRM alignment needed some adjustment of the bias sliders.

Full characterization to follow. Some things to check:

  • Investigate and fix the suspect X-arm ASS loop
  • Is there too much power on the AS110 PD post Oct2016 vent? Is the PD saturating?

Lesson learnt: Don't try and change too many things at once!

GV July 5 1130am: Looks like the MICH loop gain wasn't set correctly when I took the attached spectra, seems like the bump around 300Hz was caused by this. On later locks, this feature wasn't present.

Attachment 1: DRMI_relocked.png
Attachment 2: MICH_OL.pdf
  13096   Wed Jul 5 16:09:34 2017   gautam   Update   CDS   slow machine bootfest

Reboots for c1susaux, c1iscaux today.

 

  13097   Wed Jul 5 19:10:36 2017   gautam   Update   General   NB code checkout - updated

I've been making NBs on my laptop, thought I would get the copy under version control up-to-date since I've been negligent in doing so.

The code resides in /ligo/svncommon/NoiseBudget, which as a whole is a git directory. For neatness, most of Evan's original code has been put into the sub-directory  /ligo/svncommon/NoiseBudget/H1NB/, while my 40m NB specific adaptations of them are in the sub-directory /ligo/svncommon/NoiseBudget/NB40. So to make a 40m noise budget, you would have to clone and edit the parameter file accordingly, and run python C1NB.py C1NB_2017_04_30.py for example. I've tested that it works in its current form. I had to install a font package in order to make the code run (with sudo apt-get install tex-gyre ), and also had to comment out calls to GwPy (it kept throwing up an error related to the package "lal", I opted against trying to debug this problem as I am using nds2 instead of GwPy to get the time series data anyways).

There are a few things I'd like to implement in the NB like sub-budgets, I will make a tagged commit once it is in a slightly neater state. But the existing infrastructure should allow making of NBs from the control room workstations now.

Quote:

[evan, gautam]

We spent some time trying to get the noise-budgeting code running today. I guess eventually we want this to be usable on the workstations so we cloned the git repo into /ligo/svncommon. The main objective was to see if we had all the dependencies for getting this code running already installed. The way Evan has set the code up is with a bunch of dictionaries for each of the noise curves we are interested in - so we just commented out everything that required real IFO data. We also commented out all the gwpy stuff, since (if I remember right) we want to be using nds2 to get the data. 

Running the code with just the gwinc curves produces the plots it is supposed to, so it looks like we have all the dependencies required. It now remains to integrate actual IFO data, I will try and set up the infrastructure for this using the archived frame data from the 2016 DRFPMI locks..

 

  13101   Sat Jul 8 17:09:50 2017   gautam   Update   General   ETMY TRANS QPD anomaly

About 2 weeks ago, I noticed some odd behaviour of the LSC TRY data stream. Its DC value seems to be drifting ~10x more than TRX. Both signals come from the transmission QPDs. At the time, we were dealing with various CDS FE issues but things have been stable on that end for the last two weeks, so I looked into this a bit more today. It seems like one particular channel is bad - Quadrant 4 of the ETMY TRANS QPD. Furthermore, there is a bump around 150Hz, and some features above 2kHz, that are only present for the ETMY channels and not the ETMX ones.

Since these spectra were taken with the PSL shutter closed and all the lab room lights off, it would suggest something is wrong in the electronics - to be investigated.

The drift in TRY can be as large as 0.3 (with 1.0 being the transmitted power in the single arm lock). This seems unusually large, indeed we trigger the arm LSC loops when TRY > 0.3. Attachment #2 shows the second trend of the TRX and TRY 16Hz EPICS channels for 1 day. In the last 12 hours or so, I had left the LSC master switch OFF, but the large drift of the DC value of TRY is clearly visible.

In the short term, we can use the high-gain THORLABS PD for TRY monitoring.
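To attach a number to "drifting ~10x more", one can compare the spread of the slow component of the two trends over the same stretch. A minimal sketch on synthetic data (real data would come from the frames via nds2/dataviewer; all numbers below are invented to mimic the behaviour described above):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(86400)                  # one day of 1 Hz trend samples

# Synthetic stand-ins: TRX wanders a little, TRY wanders ~10x more
# (both nominally 1.0 in single-arm lock).
trx = 1.0 + 0.02 * np.sin(2 * np.pi * t / 20000) + 0.005 * rng.standard_normal(t.size)
try_ = 1.0 + 0.20 * np.sin(2 * np.pi * t / 20000) + 0.005 * rng.standard_normal(t.size)

def drift(x):
    """Peak-to-peak of the slow component (crude 100-point boxcar smoothing)."""
    smooth = np.convolve(x, np.ones(100) / 100, mode="valid")
    return smooth.max() - smooth.min()

ratio = drift(try_) / drift(trx)
print(f"TRY drift / TRX drift = {ratio:.1f}")
```

The same drift() applied to the two real trend channels would make the "~10x" claim quantitative, and tracking it per quadrant would isolate the misbehaving Quadrant 4 channel.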

Attachment 1: ETMY_QPD.pdf
Attachment 2: ETMY_QPD.png
  13103   Mon Jul 10 09:49:02 2017   gautam   Update   General   All FEs down

Attachment #1: State of CDS overview screen as of 9.30AM today morning when I came in.

Looks like there may have been a power glitch, although judging by the wall StripTool traces, if there was one, it happened more than 8 hours ago. FB is down atm so I can't trend to find out when this happened.

All FEs and FB are unreachable from the control room workstations, but Megatron, Optimus and Chiara are all ssh-able. The latter reports an uptime of 704 days, so all seems okay with its UPS. Slow machines are all responding to ping as well as telnet.

Recovery process to begin now. Hopefully it isn't as complicated as the most recent effort [FAMOUS LAST WORDS]

Attachment 1: CDS_down_10Jul2017.png
  13104   Mon Jul 10 11:20:20 2017   gautam   Update   General   All FEs down

I am unable to get FB to reboot to a working state. A hard reboot throws it into a loop of "Media Test Failure. Check Cable".

Jetstor RAID array is complaining about some power issues, the LCD display on the front reads "H/W Monitor", with the lower line cycling through "Power#1 Failed", "Power#2 Failed", and "UPS error". Going to 192.168.113.119 on a martian machine browser and looking at the "Hardware information" confirms that System Power #1 and #2 are "Failed", and that the UPS status is "AC power loss". So far I've been unable to find anything on the elog about how to handle this problem, I'll keep looking.


In fact, looks like this sort of problem has happened in the past. It seems one power supply failed back then, but now somehow two are down (but there is a third which is why the unit functions at all). The linked elog thread strongly advises against any sort of power cycling. 

  13106   Mon Jul 10 17:46:26 2017   gautam   Update   General   All FEs down

A bit more digging on the diagnostics page of the RAID array reveals that the two power supplies actually failed on Jun 2 2017 at 10:21:00. Not surprisingly, this was the date and approximate time of the last major power glitch we experienced. Apart from this, the only other error listed on the diagnostics page is "Reading Error" on "IDE CHANNEL 2", but these errors precede the power supply failure.

Perhaps the power supplies are not really damaged, and they have just been in some funky state since the power glitch. After discussing with Jamie, I think it should be safe to power cycle the Jetstor RAID array once the FB machine has been powered down. Perhaps this will bring back one or both of the faulty power supplies. If not, we may have to get new ones.

The problem with FB may or may not be related to the state of the Jetstor RAID array. It is unclear to me at what point during the boot process we are getting stuck. It may be that because the RAID disk is in some funky state, the boot process is getting disrupted.

Quote:

I am unable to get FB to reboot to a working state. A hard reboot throws it into a loop of "Media Test Failure. Check Cable".

Jetstor RAID array is complaining about some power issues, the LCD display on the front reads "H/W Monitor", with the lower line cycling through "Power#1 Failed", "Power#2 Failed", and "UPS error". Going to 192.168.113.119 on a martian machine browser and looking at the "Hardware information" confirms that System Power #1 and #2 are "Failed", and that the UPS status is "AC power loss". So far I've been unable to find anything on the elog about how to handle this problem, I'll keep looking.


In fact, looks like this sort of problem has happened in the past. It seems one power supply failed back then, but now somehow two are down (but there is a third which is why the unit functions at all). The linked elog thread strongly advises against any sort of power cycling. 

 

  13107   Mon Jul 10 19:15:21 2017   gautam   Update   General   All FEs down

The Jetstor RAID array is back in its nominal state now, according to the web diagnostics page. I did the following:

  1. Powered down the FB machine - to avoid messing around with the RAID array while the disks are potentially mounted.
  2. Turned off all power switches on the back of the Jetstor unit - there were 4 of them, all of them were toggled to the "0" position.
  3. Disconnected all power cords from the back of the Jetstor unit - there were 3 of them.
  4. Reconnected the power cords, turned the power switches back on to their "1" position.

After a couple of minutes, the front LCD display seemed to indicate that it had finished running some internal checks. The messages indicating failure of power units, which was previously constantly displayed on the front LCD panel, was no longer seen. Going back to the control room and checking the web diagnostics page, everything seemed back to normal.

However, FB still will not boot up. The error is identical to that discussed in this thread by Intel. It seems FB is having trouble finding its boot disk. I was under the impression that only the FE machines were diskless, and that FB had its own local boot disk - in which case I don't know why this error is showing up. According to the linked thread, it could also be a problem with the network card/cable, but I saw both lights on the network switch port FB is connected to turn green when I powered the machine on, so this seems unlikely. I tried following the steps listed in the linked thread but got nowhere, and I don't know enough about how FB is supposed to boot up, so I am leaving things in this state now. 

  13111   Tue Jul 11 15:03:55 2017   gautam   Update   General   All FEs down

Jamie suggested verifying that the problem is indeed with the disk and not with the controller, so I tried switching the original boot disk to Slot #1 (from Slot #0 where it normally resides), but the same problem persists - the green "OK" indicator light keeps flashing even in Slot #1, which was verified to be a working slot using the spare 2.5 inch disk. So I think it is reasonable to conclude that the problem is with the boot disk itself.

The disk is a Seagate Savvio 10K.2 146GB disk. The datasheet doesn't explicitly suggest any recovery options, but Table 24 on page 54 suggests that a blinking LED means the disk is "spinning up or spinning down". Is this indicative of any particular failure mode? Any ideas on how to go about recovery? Is it even possible to access the data on the disk if it doesn't spin up to the nominal operating speed?

Quote:

I think this is the boot disk failure. I put the spare 2.5 inch disk into the slot #1. The OK indicator of the disk became solid green almost immediately, and it was recognized on the BIOS in the boot section as "Hard Disk". On the contrary, the original disk in the slot #0 has the "OK" indicator kept flashing and the BIOS can't find the harddisk.

 

 

  13113   Wed Jul 12 10:21:07 2017   gautam   Update   General   All FEs down

Seems like the connector on this particular disk is of the SAS variety (and not SATA). I'll ask Steve to order a SAS to USB cable. In the meantime I'm going to see if the people at Downs have something we can borrow.

Quote:

If we have a SATA/USB adapter, we can test if the disk is still responding or not. If it is still responding, can we probably salvage the files?
Chiara used to have a 2.5" disk that is connected via USB3. As far as I know, we have remote and local backup scripts running (TBC), we can borrow the USB/SATA interface from Chiara.

If the disk is completely gone, we need to rebuilt the disk according to Jamie, and I don't know how to do it. (Don't we have any spare copy?)

 

  13114   Wed Jul 12 14:46:09 2017   gautam   Update   General   All FEs down

I couldn't find an external docking setup for this SAS disk, seems like we need an actual controller in order to interface with it. Mike Pedraza in Downs had such a unit, so I took the disk over to him, but he wasn't able to interface with it in any way that allows us to get the data out. He wants to try switching out the logic board, for which we need an identical disk. We have only one such spare at the 40m that I could locate, but it is not clear to me whether this has any important data on it or not. It has "hda RTLinux" written on its front panel with a sharpie. Mike thinks we can back this up to another disk before trying anything, but he is going to try locating a spare in Downs first. If he is unsuccessful, I will take the spare from the 40m to him tomorrow, first to be backed up, and then for swapping out the logic board.

Chatting with Jamie and Koji, it looks like the options we have are:

  1. Get the data from the old disk, copy it to a working one, and try and revert the original FB machine to its last working state. This assumes we can somehow transfer all the data from the old disk to a working one.
  2. Prepare a fresh boot disk, load the old FB daqd code (which is backed up on Chiara) onto it, and try and get that working. But Jamie isn't very optimistic of this working, because of possible conflicts between the code and any current OS we would install.
  3. Get FB1 working. Jamie is looking into this right now.
Quote:

Seems like the connector on this particular disk is of the SAS variety (and not SATA). I'll ask Steve to order a SAS to USB cable. In the meantime I'm going to see if the people at Downs have something we can borrow.

 

 

  13117   Fri Jul 14 17:47:03 2017   gautam   Update   General   Disks from LLO have arrived

[jamie, gautam]

Today morning, the disks from LLO arrived. Jamie and I have been trying to get things back up and running, but have not had much success today. Here is a summary of what we tried.

Keith Thorne sent us two disks: one has the daqd code and the second is the boot disk for the FE machines. Since Jamie managed to successfully compile the daqd code on FB1 yesterday, we decided to try the following: mount the boot disk KT sent us (using a SATA/USB adapter) on /mnt on FB1, get the FEs booted up, and restart the RT models. 

Quote:

I just want to mention that the situation is actually much more dire than we originally thought.  The diskless NFS root filesystem for all the front-ends was on that fb disk.  If we can't recover it we'll have to rebuilt the front end OS as well.

As of right now none of the front ends are accessible, since obviously their root filesystem has disappeared.

While on FB1, Jamie realized he actually had a copy of the /diskless/root directory, which is the NFS filesystem for the FEs, on FB1. So we decided to try and boot some of the FEs with this (instead of starting from scratch with the disks KT sent us). The way things were set up, the FEs were querying the FB machine as the DHCP server. But today, we followed the instructions here to get the FEs to get their IP address from chiara instead. We also added the line 

/diskless/root *(sync,rw,no_root_squash,no_all_squash,no_subtree_check)

to /etc/exports, followed by exportfs -ra on FB1. At that point, the FE machine we were testing (c1lsc) was able to boot up.

However, it looks like the NFS filesystem isn't being mounted correctly, for reasons unknown. We commented out some of the rtcds related lines in /etc/rc.local because they were causing a whole bunch of errors at boot (the lines that were touched have been tagged with today's date).


So in summary, the status as of now is:

  1. Front-end machines are able to boot
  2. There seems to be some problem during the boot process, leading to the NFS file system not being correctly mounted. The closest related thing I could find from an elog search is this entry, but I think we are facing a different problem.
  3. We wanted to see if we could start the realtime models (but without daqd for now), but we weren't even able to get that far today.

We will resume recovery efforts on Monday.

  13120   Sat Jul 15 16:19:00 2017   gautam   Update   Cameras   Makeshift PyPylon

Some days ago, I stumbled upon this github page, by a grad student at KIT who developed this code as he was working with Basler GigE cameras. Since we are having trouble installing SnapPy, I figured I'd give this package a try. Installation was very easy, took me ~10mins, and while there isn't great documentation, basic use is very easy - for instance, I was able to adjust the exposure time, and capture an image, all from Pianosa. The attached is some kind of in-built function rendering of the captured image - it is a piece of paper with some scribbles on it near Jigyasa's BRDF measurement setup on the SP table, but it should be straightforward to export the images in any format we like. I believe the axes are pixel indices.

Of course this is only a temporary solution as I don't know if this package will be amenable to interfacing with EPICS servers etc, but seems like a useful tool to have while we figure out how to get SnapPy working. For instance, the HDR image capture routine can now be written entirely as a Python script, and executed via an MEDM button or something.

A rudimentary example file can be found at /opt/rtcds/caltech/c1/scripts/GigE/PyPylon/examples - some of the dictionary keywords to access various properties of the camera (e.g. Exposure time) are different, but these are easy enough to figure out.

 

Attachment 1: pyPylon_test.png
  13124   Wed Jul 19 00:59:47 2017   gautam   Update   General   FINESSE model of DRMI (no arms)

Summary:

I've been working on improving the 40m FINESSE model I set up sometime last year (where the goal was to model various RC folding mirror scenarios). Specifically, I wanted to get the locking feature of FINESSE working, and also simulate the DRMI (no arms) configuration, which is what I have been working on locking the real IFO to. This elog is a summary of what I have from the last few days of working on this.

Model details:

  • No IMC included for now.
  • Core optics R and T from the 40m wiki page.
  • Cavity lengths are the "ideal" ones - see the attached ipynb for the values used.
  • RF modulation depths from here. But for now, the relative phase between f1 and f2 at the EOM is set to 0.
  • I've not included flipped folding mirrors - instead, I put a loss of 0.5% on PR3 and SR3 in the model to account for the AR surface of these optics being inside the RCs. 
  • I've made the AR surfaces of all optics FINESSE "beamsplitters" - there was some discussion on the FINESSE mattermost channel about how not doing this can lead to slightly inaccurate results, so I've tried to be more careful in this respect.
  • I'm using "maxtem 1" in my FINESSE file, which means TEM_mn modes up to (m+n=1) are taken into account - setting this to 0 makes it a plane wave model. This parameter can significantly increase the computational time. 

Model validation:

  • As a first check, I made the PRM and SRM transparent, and used the in-built routines in FINESSE to mode-match the input beam to the arm cavities.
  • I then scanned one arm cavity about a resonance, and compared the transmisison profile to the analytical FP cavity expression - agreement was good.
  • Next, I wanted to get a sensing matrix for the DRMI (no arms) configuration (see attached ipynb notebook).
    • First, I make the ETMs in the model transparent
    • I started with the phases for the BS, PRM and SRM set to their "naive" values of 0, 0 and 90 (for the standard DRMI configuration)
    • I then scanned these optics around, used various PDs to look at the points where appropriate circulating fields reached their maximum values, and updated the phase of the optic with these values.
    • Next, I set the demod phase of various RFPDs such that the PDH error signal is entirely in one quadrature. I use the RFPDs in pairs, with demod phases separated by 90 degrees. I arbitrarily set the demod phase of the Q phase PD as 90 + phase of I phase PD. I also tried to mimic the RFPD-IFO DoF pairing that we use for the actual IFO - so for example, PRCL is controlled by REFL11_I.
    • Confident that I was close enough to the ideal operating point, I then fed the error signals from these RFPDs to the "lock" routine in FINESSE. The manual recommends setting the locking loop gain to 1/optical gain, which is what I did.
    • The tunings for the BS and RMs in the attached kat file are the result of this tuning.
    • For the actual sensing matrix, I moved each of PRM, BS and SRM +/-5 degrees (~15nm) around each resonance. I then computed the numerical derivative around the zero crossing of each RFPD signal, and then plotted all of this in some RADAR plots - see Attachment #1.
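The slope-at-the-zero-crossing step in the last bullet can be sketched as follows, with a toy error signal standing in for the FINESSE sweep output (the slope factor of 20 per radian is an arbitrary example, not a model value):

```python
import numpy as np

# Toy stand-in for a FINESSE sweep: optic tuning in degrees vs. RFPD output.
# The real data comes from the kat-file sweeps; this only illustrates the
# numerical-derivative-at-the-zero-crossing step described above.
tuning = np.linspace(-5, 5, 1001)            # +/- 5 deg sweep, as in the text
err = np.sin(np.radians(tuning) * 20.0)      # toy error signal, zero at 0 deg

i0 = np.argmin(np.abs(err))                  # sample nearest the zero crossing
slope = (err[i0 + 1] - err[i0 - 1]) / (tuning[i0 + 1] - tuning[i0 - 1])
# slope ~ 20*pi/180 per degree here; in the real computation this number,
# together with the tuning-to-meters conversion, gives the W/m sensing element.
```

The choice of differencing points only matters if they leave the linear region of the error signal, which is the caveat raised in the GV edit at the end of this entry.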

Explanation of Attachments and Discussion:

  • Attachment #1 - Computed sensing matrix from this model. Compare to an actual measurement, for example here - the relative angles between the sensing matrix elements don't exactly line up with what is measured. EQ suggested today that I should look into tuning the relative phase between the RF frequencies at the EOM. Nevertheless, I tried comparing the magnitudes of the MICH sensing element in AS55 Q - the model tells me that it should be ~7.8*10^5 W/m. In this elog, I measured it to be 2.37*10^5 W/m. On the AS table, there is a 50-50 BS splitting the light between the AS55 and AS110 photodiodes which is not accounted for in the model. Factoring this in, along with the fact that there are 6 in-vacuum steering mirrors (assume 98% reflectivity for these), 3 in-air steering mirrors, and the window, the sensing matrix element from the model starts to be in the same ballpark as the measurement, at ~3*10^5 W/m. So the model isn't giving completely crazy results.
  • Attachment #2 - Example of the signals at various RFPDs in response to sweeping the PRM around its resonance. To be compared with actual IFO data. Teal lines are the "I" phase, and orange lines are "Q" phase.
  • Attachment #3 - FINESSE kat file and the IPython notebook I used to make these plots. 
  • Next steps
    • More validation against measurements from the actual IFO.
    • Try and resolve differences between modeled and measured sensing matrices.
    • Get locking working with full IFO - there was a discussion on the mattermost thread about sequential/parallel locking some time ago, I need to dig that up to see what is the right way to get this going. Probably the DRMI operating point will also change, because of the complex reflectivities of the arm cavities seen by the RF sidebands (this effect is not present in the current configuration where I've made the ETMs transparent).
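The AS55 throughput bookkeeping in the first bullet above can be written out explicitly. Only the 50-50 BS and the 98% in-vacuum mirror reflectivity come from the text; the in-air mirror and window numbers below are guesses for illustration:

```python
# Scale the modelled MICH sensing element at the AS port down to what the
# AS55 PD actually sees. In-air mirror and window transmissions are assumed.
model_sensing = 7.8e5            # W/m, from the FINESSE model
bs_5050 = 0.5                    # 50-50 BS ahead of AS55/AS110 (not in model)
in_vac = 0.98 ** 6               # six in-vacuum steering mirrors, ~98% each
in_air = 0.99 ** 3               # three in-air steering mirrors (assumed 99%)
window = 0.99                    # vacuum window transmission (assumed)

at_pd = model_sensing * bs_5050 * in_vac * in_air * window
print(f"{at_pd:.2e} W/m")        # ~3e5 W/m, vs. the measured 2.37e5 W/m
```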

GV Edit: EQ pointed out that my method of taking the slope of the error signal to compute the sensing element isn't the most robust - it relies on choosing points to compute the slope that are close enough to the zero crossing and also well within the linear region of the error signal. Instead, FINESSE allows this computation to be done as we do in the real IFO - apply an excitation at a given frequency to an optic and look at the twice-demodulated output of the relevant RFPD (e.g. for PRCL sensing element in the 1f DRMI configuration, drive PRM and demodulate REFL11 at 11MHz and the drive frequency). Attachment #4 is the sensing matrix recomputed in this way - in this case, it produces almost identical results as the slope method, but I think the double-demod technique is better in that you don't have to worry about selecting points for computing the slope etc. 

 

Attachment 1: DRMI_sensingMat.pdf
Attachment 2: DRMI_errSigs.pdf
Attachment 3: 40m_DRMI_FINESSE.zip
Attachment 4: DRMI_sensingMat_19Jul.pdf
  13135   Mon Jul 24 10:45:23 2017   gautam   Update   CDS   c1iscex models died

This morning, all the c1iscex models were dead. Attachment #1 shows the state of the cds overview screen when I came in. The machine itself was ssh-able, so I just restarted all the models and they came back online without fuss.

Quote:

All front ends and model are (mostly) running now

Attachment 1: c1iscexFailure.png
  13137   Mon Jul 24 12:00:21 2017   gautam   Update   PSL   PSL NPRO mysteriously shut off

Summary:

At around 10:30AM this morning, the PSL mysteriously shut off. Steve and I confirmed that the NPRO controller had the RED "OFF" LED lit up. It is unknown why this happened. We manually turned the NPRO back on, and the PMC has been stably locked for the last hour or so.

Details:

There are so many changes to lab hardware/software that have been happening recently, it's not entirely clear to me what exactly was the problem here. But here are the observations:

  1. Yesterday, when I came into the lab, the MC REFL trace on the wall StripTool was 0 for the full 8 hour history - since we don't have data records, I can't go back further than this. I remember the PMC TRANS and REFL cameras looked normal, but there was no MC REFL spot on the CCD monitors. This is consistent with the PSL operating normally, the PMC being locked, and the PSL shutter being closed. Isn't the emergency vacuum interlock also responsible for automatically closing the PSL shutter? Perhaps if the turbo controller failure happened prior to Jamie/me coming in yesterday, maybe this was just the interlock doing its job. On Friday evening, the PSL shutter was certainly open and the MC REFL spot was visible on the camera. I also confirmed with Jamie that he didn't close the shutter.
  2. Attachment #1 shows the wall StripTool traces from earlier this morning. It looks like at ~7:40AM, the MC REFL level went back up. Steve says he didn't manually open the shutter, and in any case, this was before the turbo pump controller failure was diagnosed. So why did the shutter open again?
  3. When I came in at ~10AM, the CCD monitor showed that the PMC was locked, and the MC REFL spot was visible. 
  4. Also on attachment #1, there is a ~10min dip in the MC REFL level. This corresponds to ~10:30AM this morning. Both Steve and I were sitting in the control room at this time. We noticed that the PMC TRANS and REFL CCDs were dark. When we went in to check on the laser, we saw that it was indeed off. There was no one inside the lab area at this time to our knowledge, and as far as I know, the only direct emergency shutoff for the PSL is on the North-West corner of the PSL enclosure. So it is unclear why the laser just suddenly went off.

Steve says that this kind of behaviour is characteristic of a power glitch/surge, but nothing else seems to have been affected (I confirmed that the X and Y end lasers are ON). 

Attachment 1: IMG_7454.JPG
  13139   Mon Jul 24 19:57:54 2017   gautam   Update   CDS   IMC locked, Autolocker re-enabled

Now that all the front end models are running, I re-aligned the IMC, locked it manually, and then tweaked the alignment some more. The IMC transmission now is hovering around 15300 counts. I re-enabled the Autolocker and FSS Slow loops on Megatron as well.

Quote:

MX/OpenMX network running

Today I got the mx/open-mx networking working for the front ends.  This required some tweaking to the network interface configuration for the diskless front ends, and recompiling mx and open-mx for the newer kernel.  Again, this will all be documented.

 

  13141   Tue Jul 25 02:03:59 2017   gautam   Update   Optical Levers   Optical lever tuning thoughts

Summary:

Currently, I am unable to engage the coil-dewhitening filters without destroying cavity locks. One reason is that the present Oplev servos have a high-frequency roll-off that is not steep enough - engaging the digital whitening + analog de-whitening just causes the DAC output to saturate. Today, Rana and I discussed some ideas about how to approach this problem. This elog collects these thoughts. As I flesh out these ideas, I will update them in a more complete writeup in T1700363 (placeholder for now). Past relevant elogs: 5376, 9680

  1. Why do we need optical levers?
    • ​​To stabilize the low-frequency seismic driven angular motion of the optics.
  2.  In what frequency range can we / do we need to stabilize the angular motion of the optics? How much error signal suppression do we need in the control band? How much is achievable given the current Oplev setup?
    • ​​To answer these questions, we need to build a detailed Oplev noise budget.
    • Ultimately, the Oplev error signal is sensing the differential motion between the suspended optic and the incident laser beam.
    • In what frequency range does laser beam jitter dominate the actual optic motion? What about mechanical drifts of the optical tables the HeNes sit on? And for many of the vertex optics, the Oplev beam has multiple bounces on steering mirrors on the stack. What is the contribution of the stack motion to the error signal?
    • The answers to the above will tell us what lower and upper UGFs we should and can pick. It will also be instructive to investigate if we can come up with a telescope design near the Oplev QPD that significantly reduces beam jitter effects (see elog 10732). Also, can we launch/extract the beam into/from the vacuum chamber in such a way that we aren't so susceptible to motion of the stack?
  3. What are some noises that have to be measured and quantified?
    • Seismic noise
    • ​Shot noise
    • Electronics noise of the QPD readout chain
    • HeNe intensity noise (does this matter since we are normalizing by QPD sum?)
    • HeNe beam pointing / jitter noise (How? N-corner hat method?)
    • Stack motion contribution to the Oplev error signal
  4. How do we design the Oplev controller?
    • ​The main problem is to frame the right cost function for this problem. Once this cost function is made, we can use MATLAB's PSO tool (which is what was used for the PR3 coating design optimization, and also successfully for this kind of loop shaping problems by Rana for aLIGO) to find a minimum by moving the controller poles and zeros around within bounds we define.
  5. What terms should enter the cost function?

    • ​In addition to those listed in elog 5376
    • We need the >10Hz roll-off to be steep enough that turning on the digital whitening will not significantly increase the DAC output RMS or drive it to saturation.
    • We'd like for the controller to be insensitive to 5% (?) errors in the assumed optical plant and noise models i.e. the closed loop shouldn't become unstable if we made a small error in some assumed parameters.
    • Some penalty for using excessive numbers of poles/zeros? Penalty for having too many high-frequency features.
  6. Other things to verify / look into
    • ​Verify if the counts -> urad calibration is still valid for all the Oplevs. We have the arm-cavity power quadratic dependance method, and the geometry method to do this.
    •  Check if the Oplev error signals are normalized by the quadrant sum.
    • How important is it to balance the individual quadrant gains?
    • Check with Koji / Rich about new QPDs. If we can get some, perhaps we can use these in the setup that Steve is going to prepare, as part of the temperature vs HeNe noise investigations.

Before the CDS went down, I had taken error signal spectra for the ITMs. I will update this elog tomorrow with these measurements, as well as some noise estimates, to get started.

  13146   Thu Jul 27 22:42:24 2017   gautam   Update   SUS   Seismic noise, DAC noise, and Coil Driver electronics noise

Summary:

Yesterday at the meeting, we talked about how the analog de-whitening filters in the coil driver path may be more aggressive than necessary. I think Attachment #1 shows that this is indeed the case.

Details:

I had done some modeling and measurement of some of these noises while I was putting together the initial DRMI noise budget, but I had never put things together in one plot. In Attachment #1, I've plotted the following:

  1. Quadrature sum of seismic noise (from GWINC calculations) for 3 suspended optics (I'm sticking to the case of 3 optics since I've been doing all the noise-budgeting for MICH - for DARM, it will be 4 suspended optics).
  2. The unfiltered DAC noise estimate. The voltage noise was measured in this elog. To convert this to displacement noise for 3 suspended optics, I've used the value of 1.55e-9/f^2 m/ct as the actuator coefficient. This number should be accurate under the assumption that the series resistance on the coil driver board output is 400 ohms (we could increase this - by how much depends on how much actuation range is needed).  
  3. Coil driver board and de-whitening board electronics noises (added in quadrature). I've used the LISO model noises, which line up well with the measured noises in elogs 13010 and 13015.
  4. The DAC noise filtered by the de-whitening transfer function, separately for the cases of using one or both of the available biquad stages. This cannot be lower than the preceding trace (electronics noise of de-whitening and coil driver boards), so should be disregarded where it dips below it. 
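The DAC-noise-to-displacement conversion in item 2 above can be sketched numerically; the flat DAC noise level below is a placeholder, not the measured value:

```python
import numpy as np

# Convert a DAC noise ASD to test-mass displacement noise using the
# 1.55e-9/f^2 m/ct actuator coefficient quoted above (400 ohm series R).
# The flat noise level is a placeholder standing in for the measured ASD.
f = np.logspace(1, 3, 200)              # 10 Hz - 1 kHz
dac_noise_cts = 1.0e-3                  # cts/rtHz, placeholder value
alpha = 1.55e-9 / f**2                  # m/ct, single suspended optic
disp_one = alpha * dac_noise_cts        # m/rtHz for one optic
disp_mich = np.sqrt(3) * disp_one       # quadrature sum, 3 identical optics
```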

It would seem that the coil driver + de-whitening board electronic noises dominate above ~150Hz. The electronics noise is ~10nV/rtHz at the output of the coil driver board, which is only a factor of 100 below the DAC noise - so the stopband attenuation of ~70dB on the de-whitening boards seems excessive.

We can lower this noise by a factor of 2.5 if we up the series resistance on the coil driver boards from 400ohm to 1kohm, but even so, the displacement noise is ~1e-18 m/rtHz. I need to investigate the electronics noises a little more carefully - I only measured it for the case when both biquad stages were engaged, I will need to do the model for all permutations - to be updated. 
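The factor of 2.5 above is just the ratio of series resistances, since the coil current (and hence both the actuation coefficient and the noise coupling) scales as 1/R. A trivial check:

```python
# Coil current = V/R, so both the actuation range and the DAC/electronics
# noise coupling scale as 1/R_series.
R_now, R_proposed = 400.0, 1000.0
noise_reduction = R_proposed / R_now            # -> 2.5x lower noise
alpha_now = 1.55e-9                             # m/ct coefficient from above
alpha_proposed = alpha_now * R_now / R_proposed # correspondingly reduced range
print(noise_reduction, alpha_proposed)
```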

Attachment #2 has an iPython notebook used to generate this plot along with all the data.


Edit 28 Jul 2.30pm: I've added Attachment #3 with traces for different assumed values of the series resistance on the coil driver board - although I have not re-computed the Johnson noise contribution for the various resistances. If we can afford to reduce the actuation range by a factor of 25, then it looks like we get to within a factor of ~5 of the seismic noise at ~150Hz. 

Attachment 1: noiseComparison.pdf
Attachment 2: deWhiteConfigs.zip
Attachment 3: noiseComparison_resistances.pdf
  13147   Fri Jul 28 15:36:32 2017   gautam   Update   Optical Levers   Optical lever tuning thoughts

Attachment #1 - Measured error signal spectrum with the Oplev loop disabled, measured at the IN1 input for ITMY. The y-axis calibration into urad/rtHz may not be exact (I don't know when this was last calibrated).

From this measurement, I've attempted to disentangle what is the seismic noise contribution to the measured plant output.

  • To do so, I first modelled the plant as a pair of complex poles @0.95 Hz, Q=3. This gave the best agreement with the measurement by eye; I didn't try to optimize it too carefully. 
  • Next, I assumed all the noise between DC-10Hz comes from seismic disturbance alone. Dividing the measured PSD by the squared magnitude of the plant transfer function then gives the PSD of the seismic disturbance. I further assumed this to be flat, and so I averaged it between DC-10Hz.
  • This will be a first seismic noise model to the loop shape optimizer. I can probably get a better model using the GWINC calculations but for a start, this should be good enough.
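The steps above can be sketched as follows, with a placeholder spectrum standing in for the DTT export (pole frequency and Q as assumed above):

```python
import numpy as np
from scipy import signal

# Pendulum plant model: complex pole pair at 0.95 Hz, Q = 3, unity DC gain.
f0, Q = 0.95, 3.0
w0 = 2 * np.pi * f0
plant = signal.TransferFunction([w0**2], [1.0, w0 / Q, w0**2])

f = np.linspace(0.05, 10, 500)                 # DC-10 Hz band (avoiding f=0)
_, H = signal.freqresp(plant, w=2 * np.pi * f)

# Placeholder "measured" PSD: a flat seismic drive passed through the plant.
true_seismic_psd = 4.0                         # urad^2/Hz, arbitrary
measured_psd = true_seismic_psd * np.abs(H) ** 2

# Divide out the plant and average flat over the band, as described above.
seismic_estimate = (measured_psd / np.abs(H) ** 2).mean()
```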

It remains to characterize various other noise sources.

Quote:

Before the CDS went down, I had taken error signal spectra for the ITMs. I will update this elog tomorrow with these measurements, as well as some noise estimates, to get started.


I have also confirmed that the "QPD" Simulink block, which is what is used for Oplevs, does indeed have the PIT and YAW outputs normalized by the SUM (see Attachment #2). This was not clear to me from the MEDM screen.


GV 30 Jul 5pm: I've included in Attachment #3 the block diagram of the general linear feedback topology, along with the specific "disturbances" and "noises" w.r.t. the Oplev loop. The measured (open loop) error signal spectrum of Attachment #1 (call it y) is given by:

y_{meas}(s) = P(s)\sum_{i=1}^{3}d_{i}(s) + \sum_{k=1}^{4}n_{k}(s)

If it turns out that one (or more) term(s) in each of the summations above dominates in all frequency bands of interest, then I guess we can drop the others. An elog with a first pass at a mathematical formulation of the cost-function for controller optimization to follow shortly.

Attachment 1: errSig.pdf
Attachment 2: QPD_simulink.png
Attachment 3: feedbackTopology.pdf
  13148   Fri Jul 28 16:47:16 2017   gautam   Update   General   PSL StripTool flatlined

About 3.5 hours ago, all the PSL wall StripTool traces "flatlined", as happened during the EPICS freezes in the past - except that this time the traces were flat for more than 3 hours. I checked that the c1psl slow machine responded to ping, and I could also telnet into it. I tried opening the StripTool on pianosa and all the traces were responsive. So I simply re-started the PSL StripTool on zita. All traces look responsive now.

  13150   Sat Jul 29 14:05:19 2017   gautam   Update   General   PSL StripTool flatlined

The PMC was unlocked when I came in ~10mins ago. The wall StripTool traces suggest it has been this way for > 8hours. I was unable to get the PMC to re-lock by using the PMC MEDM screen. The c1psl slow machine responded to ping, and I could also telnet into it. But despite burt-restoring c1psl, I could not get the PMC to lock. So I re-started c1psl by keying the crate, and then burt-restored the EPICS values again. This seems to have done the trick. Both the PMC and IMC are now locked.


Unrelated to this work: It looks like some/all of the FE models were re-started. The x3 gain on the coil outputs of the 2 ITMs and BS, which I had manually engaged when I re-aligned the IFO on Monday, was off, and in general, the IMC and IFO alignment seems much worse now than it was yesterday. I will do the re-alignment later as I'm not planning to use the IFO today.

  13152   Mon Jul 31 15:13:24 2017   gautam   Update   CDS   FB ---> FB1

[jamie, gautam]

In order to test the new daqd config that Jamie has been working on, we felt it would be most convenient for the host name "fb" (martian network IP 192.168.113.202) to point to the physical machine "fb1" (martian network IP 192.168.113.201).

I made this change in /var/lib/bind/martian.hosts on chiara, and then ran sudo service bind9 restart. It seems to have done the job. So as things stand, both hostnames "fb" and "fb1" point to 192.168.113.201.

Now, when starting up DTT or dataviewer, the NDS server is automatically found.

More details to follow.

  13156   Tue Aug 1 16:05:01 2017   gautam   Update   Optical Levers   Optical lever tuning - cost function construction

Summary:

I've been trying to put together the cost-function that will be used to optimize the Oplev loop shape. Here is what I have so far.

Details:

All of the terms that we want to include in the cost function can be derived from:

  1. A measurement of the open-loop error signal [using DTT, calibrated to urad/rtHz]. We may want a breakdown of this in terms of "sensing noises" and "disturbances" (see the previous elog in this thread), but just a spectrum will suffice for the optimal controller given the current noises.
  2. A model of the optical plant, P(s) [validated with a DTT swept-sine measurement]. 
  3. A model of the controller, C(s). Some/all of the poles and zeros of this transfer function are what the optimization algorithm will tune to satisfy the design objectives.

From these, we can derive, for a given controller, C(s):

  1. Closed-loop stability (i.e. all poles should be in the left-half of the complex plane), and exactly 2 UGFs. We can use MATLAB's allmargin function for this. An unstable controller can be rejected by assigning it an extremely high cost.
  2. RMS error signal suppression in the frequency band (0.5Hz - 2Hz). We can require this to be >= 15dB (say).
  3. Minimize gain peaking and noise injection - this information will be in the sensitivity function, \left | \frac{1}{1+P(s)C(s)} \right |. We can require this to be <= 10dB (say).
  4. RMS of the control signal between 10 Hz and 200 Hz, multiplied by the digital suspension whitening filter, should be <10% of the DAC range (so that we don't have problems engaging the coil de-whitening).
  5. Smallest gain margin (there will be multiple because of the various notches we have) should be > 10dB (say). Phase margin at both UGFs should be >30 degrees.
  6. Terms 1-5 should not change by more than 10% for perturbations in the plant model parameters (f0 and Q of the pendulum) at the 10% (?) level. 

We can add more terms to the cost function if necessary, but I want to get some minimal set working first. All the "requirements" I've quoted above are just numbers out of my head at the moment, I will refine them once I get some feeling for how feasible a solution is for these requirements.
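A minimal version of the stability/margin check (term 1) can be done without MATLAB's allmargin, by locating the unity-gain crossings of a frequency response directly. The plant below is the pendulum model from earlier in this thread; the controller is an arbitrary toy, not the real Oplev filter:

```python
import numpy as np

def crossings_and_margins(f, L):
    """Find |L| = 1 crossings and the phase distance from +/-180 deg at each.
    A crude stand-in for MATLAB's allmargin, for the 'exactly 2 UGFs' check."""
    idx = np.nonzero(np.diff(np.sign(np.abs(L) - 1.0)))[0]
    ugfs = f[idx]
    margins = 180.0 - np.abs(np.degrees(np.angle(L[idx])))
    return ugfs, margins

f = np.logspace(-2, 2, 4000)
s = 2j * np.pi * f
w0, Q = 2 * np.pi * 0.95, 3.0
P = w0**2 / (s**2 + s * w0 / Q + w0**2)     # pendulum plant
C = 20.0 * s / (s + 2 * np.pi * 20.0)       # toy damping-like controller
ugfs, margins = crossings_and_margins(f, P * C)
# An optimizer cost could hard-reject len(ugfs) != 2 or any margin < 30 deg.
```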

Quote:

An elog with a first pass at a mathematical formulation of the cost-function for controller optimization to follow shortly.


For a start, I attempted to model the current Oplev loop. The modeling of the plant and open-loop error signal spectrum have been described in the previous elogs in this thread.

I am, however, confused by the controller - the MEDM screen (see Attachment #2) would have me believe that the digital transfer function is FM2*FM5*FM7*FM8*gain(10). However, I get much better agreement between the measured and modelled in-loop error signal if I exclude the overall gain of 10 (see Attachments #1 for the models and #3 for measurements).

What am I missing? Getting this right will be important in specifying Term #4 in the cost function...

GV Edit 2 Aug 0030: As another sanity check, I computed the whitened Oplev control signal given the current loop shape (with sub-optimal high-frequency roll-off). In Attachment #4, I converted the y-axis from urad/rtHz to cts/rtHz using the approximate calibration of 240urad/ct (and the fact that the Oplev error signal is normalized by the QPD sum of ~13000 cts), and divided by 4 to account for the fact that the control signal is sent to 4 coils. It is clear that attempting to whiten the coil driver signals with the present Oplev loop shapes causes DAC saturation. I'm going to use this formulation for Term #4 in the cost function, and to solve a simpler optimization problem first - given the existing loop shape, what is the optimal elliptic low-pass filter to implement such that the cost function is minimized? 


There is also the question of how to go about doing the optimization, given that our cost function is a vector rather than a scalar. In the coating optimization code, we converted the vector cost function to a scalar one by taking a weighted sum of the individual components. This worked adequately well.

But there are techniques for vector cost-function optimization as well, which may work better. Specifically, the question is  if we can find the (infinite) solution set for which no one term in the error function can be made better without making another worse (the so-called Pareto front). Then we still have to make a choice as to which point along this curve we want to operate at.
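The weighted-sum scalarization mentioned above is one line; the weights and term values here are arbitrary examples:

```python
import numpy as np

# Scalarize a vector cost by a weighted sum, as in the coating optimization
# code. Weights and normalized sub-cost values are arbitrary examples.
weights = np.array([5.0, 2.0, 1.0])   # e.g. suppression, gain peaking, DAC RMS
terms = np.array([0.1, 0.4, 0.2])     # normalized sub-costs (0 = ideal)
scalar_cost = float(weights @ terms)
```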

Attachment 1: loopPerformance.pdf
Attachment 2: OplevLoop.png
Attachment 3: OL_errSigs.pdf
Attachment 4: DAC_saturation.pdf
  13160   Wed Aug 2 15:04:15 2017   gautam   Configuration   Computers   control room workstation power distribution

The 4 control room workstation CPUs (Rossa, Pianosa, Donatella and Allegra) are now connected to the UPS.

The 5 monitors are connected to the recently acquired surge-protecting power strips.

Rack-mountable power strip + spare APC Surge Arrest power strip have been stored in the electronics cabinet.

Quote:

this is not the right one; this Ethernet controlled strip we want in the racks for remote control.

Buy some of these for the MONITORS.

 

  13161   Thu Aug 3 00:59:33 2017   gautam   Update   CDS   NDS2 server restarted, /frames mounted on megatron

[Koji, Nikhil, Gautam]

We couldn't get data using python nds2. There seems to have been many problems.

  1. /frames wasn't mounted on megatron, which was the nds2 server. Solution: added /frames 192.168.113.209(sync,ro,no_root_squash,no_all_squash,no_subtree_check) to /etc/exports on fb1, followed by sudo exportfs -ra. Using showmount -e, we confirmed that /frames was being exported.
  2. Edited /etc/fstab on megatron to be fb1:/frames/ /frames nfs ro,bg,soft 0 0. Tried to run mount -a, but console stalled.
  3. Used nfsstat -m on megatron. Found out that megatron was trying to mount /frames from old FB (192.168.113.202). Used sudo umount -f /frames to force unmount /frames/ (force was required).
  4. Re-ran mount -a on megatron.
  5. Killed nds2 using /etc/init.d/nds2 stop - didn't work, so we manually kill -9'ed it.
  6. Restarted nds2 server using /etc/init.d/nds2 start.
  7. Waited for ~10mins before everything started working again. Now usual nds2 data getting methods work.

I have yet to check about getting trend data via nds2 - I can't find the syntax. EDIT: As Jamie mentioned in his elog, the second trend data is being written but is inaccessible over NDS (either with dataviewer, which uses fb as the NDS server, or with python NDS, which uses megatron as the NDS server). So as of now, we cannot read any kind of trends directly, although the full data from the past can be downloaded either with dataviewer or python nds2. On the control room workstations, this can also be done with cds.getdata.

  13163   Thu Aug 3 11:11:29 2017   gautam   Update   CDS   NDS2 server restarted, /frames mounted on nodus

I added nodus' eth0 IP (192.168.113.200) to the list of allowed NFS clients in /etc/exports on fb1, and then ran sudo mount -a on nodus. Now /frames is mounted.

Quote:

needs more debugging - this is the machine that allows us to have backed up frames in LDAS. Permissions issues from fb1 ?

 

  13167   Fri Aug 4 18:25:15 2017   gautam   Update   General   Bilinear noise coupling

[Nikhil, gautam]

We repeated the test that EricQ detailed here today. We have downloaded ~10min of data (between GPS times 1185925523 - 1185926117), and Nikhil will analyze it.

Attachment 1: bilinearTest.pdf
  13168   Sat Aug 5 11:04:07 2017   gautam   Update   SUS   MC1 glitches return

See Attachment #1, which is full (2048Hz) data for a 3 minute stretch around when I saw the MC1 glitch. At the time of the glitch, WFS loops were disabled, so the only actuation on MC1 was via the local damping loops. The oscillations in the MC2 channels are the autolocker turning on the MC2 length tickle.

Nikhil and I tried the usual techniques of squishing cables at the satellite box, and also at 1X4/1X5, but the glitching persists. I will try and localize the problem this weekend. This thread details investigations from the last time something like this happened. In the past, I was able to fix this kind of glitching by replacing the (high speed) current buffer IC LM6321M. These are present in two places: the Satellite box (for the shadow sensor LED current drive), and the coil driver boards. I think we can rule out the slow machine ADCs that supply the static PIT and YAW bias voltages to the optic, as that path is low-passed with a 4th order filter @1Hz, while the glitches that show up in the OSEM sensor channels do not appear to be low-passed, as seen in the zoomed-in view of the glitch in Attachment #2 (but there is an LM6321 in this path as well).

Attachment 1: MC1_glitch_Aug42017.png
MC1_glitch_Aug42017.png
Attachment 2: MC1_glitch_zoomed.png
MC1_glitch_zoomed.png
  13173   Tue Aug 8 20:48:06 2017 gautamUpdateSUSITMX stuck

Somewhere between CDS model restarts and the IFO venting, ITMX got stuck.

I shook it loose using the usual bias slider technique. It appears to be free now - I was able to lock the green beam on a TEM00 mode without touching the green input pointing. The ITMX Oplev spot has also returned to within its MEDM display bounds.

  13174   Wed Aug 9 11:33:49 2017 gautamUpdateElectronicsMC2 de-whitening

Summary:

The analog de-whitening filters for MC2 are different from those on the other optics (i.e. ITMs and ETMs). They have one complex pole pair @7Hz, Q~sqrt(2), one complex zero pair @50Hz, Q~sqrt(2), one real pole at 2.5kHz, and one real zero @250Hz (with a DC gain of 10dB).

Details:

I took the opportunity last night to measure all 4 de-whitening channel TFs. Measurements and overlaid LISO fits are seen in Attachment #1. 
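The pole/zero model quoted in the summary can be written down and checked numerically. This is a sketch: the pole/zero frequencies, Q values, and the 10 dB DC-gain normalization are as stated above; everything else is an illustration rather than the LISO fit itself.

```python
import numpy as np
from scipy import signal

def complex_pair(f0, q):
    """Roots of s^2 + (w0/q)s + w0^2, i.e. a complex pole/zero pair at f0 (Hz)."""
    w0 = 2 * np.pi * f0
    return np.roots([1.0, w0 / q, w0 ** 2])

# MC2 de-whitening model as described: zeros @50Hz (Q~sqrt2) and 250Hz,
# poles @7Hz (Q~sqrt2) and 2.5kHz.
zeros = np.concatenate([complex_pair(50.0, np.sqrt(2)), [-2 * np.pi * 250.0]])
poles = np.concatenate([complex_pair(7.0, np.sqrt(2)), [-2 * np.pi * 2500.0]])

# Normalize the overall gain for 10 dB at DC, as quoted above.
k = 10 ** (10.0 / 20.0) * np.abs(np.prod(poles) / np.prod(zeros))

f = np.logspace(-1, 4, 500)
_, h = signal.freqs_zpk(zeros, poles, k, worN=2 * np.pi * f)
mag_db = 20 * np.log10(np.abs(h))
print("gain at %.1f Hz: %.2f dB" % (f[0], mag_db[0]))
```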

The motivation behind this investigation was that last week, I was unable to lock the IMC to one of the arms. In the past, this has been done simply by routing the control signal of the appropriate arm filter bank (e.g. C1:LSC-YARM_OUT) to MC2 instead of ETMY via the LSC output matrix (if the matrix element to ETMY is 1, the matrix element to MC2 is -1).

Looking at the coil output filter banks on the MC2 suspension MEDM screen (see Attachment #2), the positions of the filters in the filter banks are different from those on the other optics. In general, the BIO outputs of the DAC are wired such that disengaging FM9 on the MEDM screen engages the analog de-whitening path. FM10 then has the inverse of the de-whitening filter, such that the overall TF from DAC to optic is unity. But on MC2, these filters occupy FM7 and FM8, and FM9 was originally a 28Hz elliptic low-pass filter.

So presumably, I was unable to lock the IMC to an arm because for either configuration of FM9 (ON or OFF), the signal to the optic was being aggressively low-passed. To test this hypothesis, I simply copied the 28Hz elliptic to FM6, put a gain of 1 on FM9, left it engaged (so that the analog path TF is just flat with gain x3), and tried locking the IMC to the arm again - I was successful. See Attachment #3 for a comparison of the X-arm control signal spectra, with the IMC locked to the Y-arm cavity.

In this test, I also confirmed that toggling FM9 in the coil output filter banks actually switches the analog path on the de-whitening boards.

Since I now have the measurements for individual channels, I am going to re-configure the filter arrangement on MC2 to mirror that on the other optics. 


Unrelated to this work: the de-whitening boards used for MC1 and MC3 are D000316, as opposed to D000183 used for all other SOS optics. From the D000316 schematic, it looks like the signals from the AI board are routed to this board via the backplane. I will try squishing this backplane connector in the hope it helps with the glitching MC1 suspension.


GV Aug 13 11:45pm - I've made a DCC page for the MC2 dewhitening board. For now, it has the data from this measurement, but if/when we modify the filter shape, we can keep track of it on this page (for MC2 - for the other suspensions, there are other pages). 

Attachment 1: MC2deWhites.pdf
MC2deWhites.pdf
Attachment 2: MC2Coils.png
MC2Coils.png
Attachment 3: MC2stab.pdf
MC2stab.pdf
  13177   Wed Aug 9 12:35:47 2017 gautamUpdateALSFiber ALS

Last week, we were talking about reviving the Fiber ALS box. Right now, it's not in great shape. Some changes to be made:

  1. Supply power to the PDs (Menlo FPD310) via a power regulator board. The datasheet says the current consumption per PD is 250 mA. So we need 500mA. We have the D1000217 power regulator board available in the lab. It uses the LM2941 and LM2991 power regulator ICs, both of which are rated for 1A output current, so this seems suitable for our purposes. Thoughts?
  2. Install power decoupling capacitors on the PDs.
  3. Clean up the fiber arrangement inside the box.
  4. Install better switches, plus LED indicators.
  5. Cover the box.
  6. Install it in a better way on the PSL table. Thoughts? e.g. can we mount the unit in some electronics rack and route the fibers to the rack? Perhaps the PSL IR and one of the arm fibers are long enough, but the other arm might be tricky.
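The current budget in item 1 is simple, but for the record (numbers as quoted above; the 50% headroom figure follows from the 1A regulator rating):

```python
# Current budget for the D1000217 regulator board driving two Menlo FPD310 PDs.
i_per_pd = 0.250      # A per FPD310, from the datasheet as quoted above
n_pds = 2
i_total = n_pds * i_per_pd
i_rating = 1.0        # A, LM2941/LM2991 rated output current

print("total draw: %.2f A, headroom: %.0f%%" % (i_total, 100 * (1 - i_total / i_rating)))
```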

Previous elog thread about work done on this box: elog11650

Attachment 1: IMG_3942.JPG
IMG_3942.JPG
  13178   Wed Aug 9 15:15:47 2017 gautamUpdateSUSMC1 glitches return

Happened again just now, although the characteristics of the glitch are very different from those in the previous post - it's less abrupt. The only actuation on MC1 at this point was local damping.

Attachment 1: MC1_glitch.png
MC1_glitch.png
  13180   Wed Aug 9 19:21:18 2017 gautamUpdateALSALS recovery

Summary:

Between frequent MC1 excursions, I worked on ALS recovery today. Attachment #1 shows the out-of-loop ALS noise as of this evening (taken with the arms locked to IR) - I have yet to check the loop shapes of the ALS servos; it looks like there is some tuning to be done.

On the PSL table:

  • First, I locked the arms to IR, ran the dither alignment servos to maximize transmission.
  • I used the IR beat PDs to make sure a beat existed.
  • Then I used a scope to monitor the green beat, and tweaked steering mirror alignment until the beat amplitude was maximized. I was able to improve the X arm beat amplitude, which Koji and Naomi had tweaked last week, by ~factor of 2, and Y arm by ~factor of 10.
  • I used the DC outputs of the BBPDs to center the beam onto the PD.
  • Currently, the beat notes have amplitudes of ~-40dBm on the scopes in the control room (there are various couplers/amplifiers in the path, so I am not sure what beatnote amplitude this translates to at the BBPD output). I have yet to do a thorough power budget, but I recall that they used to be ~-30dBm. To be investigated.
  • Removed the fiber beat PD 1U chassis unit from the PSL table for further work. The fibers have been capped and remain on the PSL table. Cleaned the NW corner of the PSL table up a bit.
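For translating the scope numbers, the usual dBm-to-voltage conversion into 50 ohm is the following (a quick reference, not a measurement - it ignores the unquantified couplers/amplifiers in the path):

```python
import math

def dbm_to_vrms(dbm, r_ohm=50.0):
    """RMS voltage of a sinusoid of the given power (dBm) into r_ohm."""
    p_watt = 1e-3 * 10 ** (dbm / 10.0)
    return math.sqrt(p_watt * r_ohm)

# The two beatnote levels mentioned above, in mVrms and mVpp into 50 ohm.
for dbm in (-40, -30):
    vrms = dbm_to_vrms(dbm)
    print("%d dBm -> %.2f mVrms (%.2f mVpp)"
          % (dbm, 1e3 * vrms, 1e3 * 2 * math.sqrt(2) * vrms))
```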

To do:

  • Optimization of the input pointing of the green beam for X (with PZTs) and Y (manual) arms.
  • ALS PDH servo loop measurement. Attachment #1 suggests some loop gain adjustment is required for both arms (although the hump centered around ~70Hz seems to be coming from the IR lock).
  • Power budgeting on the PSL table to compare to previous such efforts.

Note: Some of the ALS scripts are suffering from the recent inability of cdsutils to pull up testpoints (e.g. the script that is used to set the UGFs of the phase tracker servo). The workaround is to use DTT to open the test points first (just grab 0.1s of time series for all channels of interest). Then the cdsutils scripts can read the required channels (but you have to keep DTT open).

Attachment 1: ALS_oolSpec.pdf
ALS_oolSpec.pdf
  13185   Thu Aug 10 14:25:52 2017 gautamUpdateCDSSlow EPICS channels -> Frames re-enabled

I went into /opt/rtcds/caltech/c1/target/daqd, opened the master file, and uncommented the line with C0EDCU.ini (this is the file in which all the slow machine channels are defined). So now I am able to access, for example, the c1vac1 channels.
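In case this needs to be repeated, the procedure was roughly the following (paths as given in this entry; the sed pattern is an illustration - check the master file by hand first):

```shell
cd /opt/rtcds/caltech/c1/target/daqd
grep -n C0EDCU.ini master          # locate the (commented-out) EDCU line
# uncomment it by hand in an editor, or e.g.:
# sed -i.bak 's|^#\(.*C0EDCU\.ini\)|\1|' master
sudo systemctl restart daqd_*      # restart the daqd processes on fb1
```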

The location of the master file is no longer in /opt/rtcds/caltech/c1/target/fb, but is in the above mentioned directory instead. This is part of the new daqd paradigm in which separate processes are handling the data transfer between FEs and FB, and the actual frame-writing. Jamie will explain this more when he summarizes the CDS revamp.

It looks like trend data is also available for these newly enabled channels, but thus far, I've only checked second trends. I will update with a more exhaustive check later in the evening.

So, the two major pending problems (that I can think of) are:

  1. Inability to unload models cleanly
  2. Inability of dataviewer (and cdsutils) to open testpoints.

Apart from this, dataviewer frequently hangs on Donatella at startup. I used ipcs -a | grep 0x | awk '{printf( "-Q %s ", $1 )}' | xargs ipcrm to remove all the extra messages in the dataviewer queue.


Restarting the daqd processes on fb1 using Jamie's instructions from earlier in this thread works - but the mx_stream processes do not seem to come back automatically on c1lsc, c1sus and c1ioo (reasons unknown). I've made a copy of the mxstreamrestart.sh script with the new mxstream restart commands, called mxstreamrestart_debian.sh, which lives in /opt/rtcds/caltech/c1/scripts/cds. I've also modified the CDS overview MEDM screen such that the "mxstream restart" calls this modified script. For now, this requires you to enter the controls password for each machine. I don't know what is a secure way to do it otherwise, but I recall not having to do this in the past with the old mxstreamrestart.sh script.

  13187   Thu Aug 10 21:01:43 2017 gautamUpdateSUSMC1 glitches debugging

I have squished cables in all the places I can think of - but MC1 has been glitching regularly today. Before starting to pull electronics out, I am going to attempt a more systematic debugging in the hope I can localize the cause.

To this end, I've disabled the MC autolocker, and have shutdown the MC1 watchdog. I plan to leave it in this state overnight. From this, I hope to look at the free-swinging optic spectra to see that this isn't a symptom of something funky with the suspension itself.

Some possible scenarios (assuming the free swinging spectra look alright and the various resonances are where we expect them to be):

  1. With the watchdog shutdown, the PIT/YAW bias voltages still go to the coils (low-passed by 4 poles @1Hz). So if the glitching happens in this path, we should see it in both the shadow sensors and the DC spot positions on the WFS.
  2. If the glitching happens in the shadow sensor readout electronics/cabling, we should see it in the shadow sensor channels, but NOT in the DC spot positions on the WFS (as the watchdog is shutdown, so there should be no actuation to the coils based on OSEM signals).
  3. If we don't see any glitches in WFS spot positions or shadow sensors, then it is indicative of the problem being in the coil driver board / dewhitening board/anti-aliasing board.
  4. I am discounting the problem being in the Satellite box, as we have switched around the MC1 satellite box multiple times - the glitches remain on MC1 and don't follow a Satellite Box. Of course there is the possibility that the cabling from 1X5/1X6 to the Satellite box is bad.

MC1 has been in a glitchy mood today, with large glitches (MC-REFL spot shifts by ~1 beam diameter on the CCD monitor) happening every ~2-3 hours. Hopefully it hasn't gone into an extended quiet period. For reference, I've attached screen-grabs of MC-QUAD and MC-REFL as they are now.


GV 9.20PM: Just to make sure of good SNR in measuring the pendulum eigenfreqs, I ran /opt/rtcds/caltech/c1/scripts/SUS/freeswing MC1 in a terminal. The result looked rather violent on the camera but it's already settling down. The terminal output:

The following optics were kicked:
MC1
Thu Aug 10 21:21:24 PDT 2017
1186460502
Quote:

Happened again just now, although the characteristics of the glitch are very different from the previous post, its less abrupt. Only actuation on MC1 at this point was local damping.

 

Attachment 1: MC_QUAD_10AUG2017.jpg
MC_QUAD_10AUG2017.jpg
Attachment 2: MCR_10AUG2017.jpg
MCR_10AUG2017.jpg
  13189   Fri Aug 11 00:10:03 2017 gautamUpdateCDSSlow EPICS channels -> Frames re-enabled

Seems like something has failed after I did this - full frames are no longer being written, as of ~2.30pm PDT on Aug 10. I found out when I tried to download some of the free-swinging MC1 data.

To clarify, I logged into fb1, and ran sudo systemctl restart daqd_*. The only change I made was to uncomment the line quoted below in the master file.

Looking at the log using systemctl, I see the following (I just tried restarting the daqd processes again):

Aug 11 00:00:31 fb1 daqd_fw[16149]: LDASUnexpected::unexpected: Caught unexpected exception      "This is a bug. Please log an LDAS problem report including this message.
Aug 11 00:00:31 fb1 daqd_fw[16149]: daqd_fw: LDASUnexpected.cc:131: static void LDASTools::Error::LDASUnexpected::unexpected(): Assertion `false' failed.
Aug 11 00:00:32 fb1 systemd[1]: daqd_fw.service: main process exited, code=killed, status=6/ABRT
Aug 11 00:00:32 fb1 systemd[1]: Unit daqd_fw.service entered failed state.
Aug 11 00:00:32 fb1 systemd[1]: daqd_fw.service holdoff time over, scheduling restart.
Aug 11 00:00:32 fb1 systemd[1]: Stopping Advanced LIGO RTS daqd frame writer...
Aug 11 00:00:32 fb1 systemd[1]: Starting Advanced LIGO RTS daqd frame writer...
Aug 11 00:00:32 fb1 systemd[1]: daqd_fw.service start request repeated too quickly, refusing to start.
Aug 11 00:00:32 fb1 systemd[1]: Failed to start Advanced LIGO RTS daqd frame writer.
Aug 11 00:00:32 fb1 systemd[1]: Unit daqd_fw.service entered failed state.

Oddly, I am able to access second trends for the same channels from the past (which will be useful for the MC1 debugging). Not sure what's going on.


The live data grabbing using cdsutils still seems to be working though - so I've kicked MC1 again, and am grabbing 2 hours of data live on Pianosa.

Quote:

I went into /opt/rtcds/caltech/c1/target/daqd, opened the master file, and uncommented the line with C0EDCU.ini (this is the file in which all the slow machine channels are defined). So now I am able to access, for example, the c1vac1 channels.

The location of the master file is no longer in /opt/rtcds/caltech/c1/target/fb, but is in the above mentioned directory instead. This is part of the new daqd paradigm in which separate processes are handling the data transfer between FEs and FB, and the actual frame-writing. Jamie will explain this more when he summarizes the CDS revamp.

It looks like trend data is also available for these newly enabled channels, but thus far, I've only checked second trends. I will update with a more exhaustive check later in the evening.

So, the two major pending problems (that I can think of) are:

  1. Inability to unload models cleanly
  2. Inability of dataviewer (and cdsutils) to open testpoints.

Apart from this, dataviewer frequently hangs on Donatella at startup. I used ipcs -a | grep 0x | awk '{printf( "-Q %s ", $1 )}' | xargs ipcrm to remove all the extra messages in the dataviewer queue.


Restarting the daqd processes on fb1 using Jamie's instructions from earlier in this thread works - but the mx_stream processes do not seem to come back automatically on c1lsc, c1sus and c1ioo (reasons unknown). I've made a copy of the mxstreamrestart.sh script with the new mxstream restart commands, called mxstreamrestart_debian.sh, which lives in /opt/rtcds/caltech/c1/scripts/cds. I've also modified the CDS overview MEDM screen such that the "mxstream restart" calls this modified script. For now, this requires you to enter the controls password for each machine. I don't know what is a secure way to do it otherwise, but I recall not having to do this in the past with the old mxstreamrestart.sh script.

 

  13192   Fri Aug 11 11:14:24 2017 gautamUpdateCDSSlow EPICS channels -> Frames re-enabled

I commented out the line pertaining to C0EDCU again, now full frames are being written again.

But we no longer have access to the slow EPICS records.

I am not sure what the failure mode is here - In the master file, there is a line that says the EDCU list "*MUST* COME *AFTER* ALL OTHER FAST INI DEFINITIONS" which it does. But there are a bunch of lines that are testpoint lists after this EDCU line. I wonder if that is the problem?

Quote:

Seems like something has failed after I did this - full frames are no longer being written, as of ~2.30pm PDT on Aug 10. I found out when I tried to download some of the free-swinging MC1 data.

 

  13195   Fri Aug 11 12:32:46 2017 gautamUpdateSUSMC1 glitches debugging

Attachment #1: Free-swinging sensor spectra. I haven't done any peak fitting, but the locations of the resonances seem consistent with where we expect them to be.

The MC_REFL spot appears to not have shifted significantly (so slow bias voltages are probably not to blame). Now I have to look at trend data to see if there is any evidence of glitching.

I'm not sure I understand the input matrix though - the matrix elements would have me believe that the sensing of POS in UL is ~5x stronger than in UR and LL, but the peak heights don't back that up.

Attachment #3: Second trend over 5 hours (since frame writing was re-enabled this morning). Note that MC1 is still free-swinging, but there is no evidence of steps of ~30 cts like those observed some days ago. Also, from my observations yesterday, MC1 glitched multiple times over a few-hours timescale. More data will have to be looked at, but as things stand, Hypothesis #3 below looks the best.

Quote:
 

Some possible scenarios (assuming the free swinging spectra look alright and the various resonances are where we expect them to be):

  1. With the watchdog shutdown, the PIT/YAW bias voltages still go to the coils (low-passed by 4 poles @1Hz). So if the glitching happens in this path, we should see it in both the shadow sensors and the DC spot positions on the WFS.
  2. If the glitching happens in the shadow sensor readout electronics/cabling, we should see it in the shadow sensor channels, but NOT in the DC spot positions on the WFS (as the watchdog is shutdown, so there should be no actuation to the coils based on OSEM signals).
  3. If we don't see any glitches in WFS spot positions or shadow sensors, then it is indicative of the problem being in the coil driver board / dewhitening board/anti-aliasing board.
  4. I am discounting the problem being in the Satellite box, as we have switched around the MC1 satellite box multiple times - the glitches remain on MC1 and don't follow a Satellite Box. Of course there is the possibility that the cabling from 1X5/1X6 to the Satellite box is bad.

 

Attachment 1: MC1_freeswinging.pdf
MC1_freeswinging.pdf
Attachment 2: MC1_inmatrix.png
MC1_inmatrix.png
Attachment 3: MC1_sensors.png
MC1_sensors.png
  13196   Fri Aug 11 17:36:47 2017 gautamUpdateSUSMC1 <--> MC3

About 30mins ago, I saw another glitch on MC1 - this happened while the Watchdog was shutdown.

In order to further narrow down the cause of the glitch, we switched the Coil Driver Board --> Satellite Box DB(15?) connectors between the MC1 and MC3 coil driver boards. I also changed the static PIT/YAW bias voltages to MC1 and MC3 such that MC-REFL is now approximately back at the center of the CCD monitor.

 

Attachment 1: MC1_glitch_watchdog_shutdown.png
MC1_glitch_watchdog_shutdown.png
  13197   Fri Aug 11 18:53:35 2017 gautamUpdateCDSSlow EPICS channels -> Frames re-enabled
Quote:

Seems like something has failed after I did this - full frames are no longer being written, as of ~2.30pm PDT on Aug 10. I found out when I tried to download some of the free-swinging MC1 data.

To clarify, I logged into fb1, and ran sudo systemctl restart daqd_*. The only change I made was to uncomment the line quoted below in the master file.

Looking at the log using systemctl, I see the following (I just tried restarting the daqd processes again):

Aug 11 00:00:31 fb1 daqd_fw[16149]: LDASUnexpected::unexpected: Caught unexpected exception      "This is a bug. Please log an LDAS problem report including this message.
Aug 11 00:00:31 fb1 daqd_fw[16149]: daqd_fw: LDASUnexpected.cc:131: static void LDASTools::Error::LDASUnexpected::unexpected(): Assertion `false' failed.
Aug 11 00:00:32 fb1 systemd[1]: daqd_fw.service: main process exited, code=killed, status=6/ABRT
Aug 11 00:00:32 fb1 systemd[1]: Unit daqd_fw.service entered failed state.
Aug 11 00:00:32 fb1 systemd[1]: daqd_fw.service holdoff time over, scheduling restart.
Aug 11 00:00:32 fb1 systemd[1]: Stopping Advanced LIGO RTS daqd frame writer...
Aug 11 00:00:32 fb1 systemd[1]: Starting Advanced LIGO RTS daqd frame writer...
Aug 11 00:00:32 fb1 systemd[1]: daqd_fw.service start request repeated too quickly, refusing to start.
Aug 11 00:00:32 fb1 systemd[1]: Failed to start Advanced LIGO RTS daqd frame writer.
Aug 11 00:00:32 fb1 systemd[1]: Unit daqd_fw.service entered failed state.

Oddly, I am able to access second trends for the same channels from the past (which will be useful for the MC1 debugging). Not sure what's going on.


The live data grabbing using cdsutils still seems to be working though - so I've kicked MC1 again, and am grabbing 2 hours of data live on Pianosa.

So we tried this again with a fresh build of daqd_fw, and it still fails.  The error message is pointing to an underlying bug in the framecpp library ("LDASTools"), which may be tricky to solve.  I'm rustling the appropriate bushes...
