40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 276 of 337  Not logged in ELOG logo
ID Date Author Typeup Category Subject
  13096   Wed Jul 5 16:09:34 2017 gautamUpdateCDSslow machine bootfest

Reboots for c1susaux, c1iscaux today.

 

  13097   Wed Jul 5 19:10:36 2017 gautamUpdateGeneralNB code checkout - updated

I've been making NBs on my laptop, thought I would get the copy under version control up-to-date since I've been negligent in doing so.

The code resides in /ligo/svncommon/NoiseBudget, which as a whole is a git directory. For neatness, most of Evan's original code has been put into the sub-directory  /ligo/svncommon/NoiseBudget/H1NB/, while my 40m NB specific adaptations of them are in the sub-directory /ligo/svncommon/NoiseBudget/NB40. So to make a 40m noise budget, you would have to clone and edit the parameter file accordingly, and run python C1NB.py C1NB_2017_04_30.py for example. I've tested that it works in its current form. I had to install a font package in order to make the code run (with sudo apt-get install tex-gyre ), and also had to comment out calls to GwPy (it kept throwing up an error related to the package "lal", I opted against trying to debug this problem as I am using nds2 instead of GwPy to get the time series data anyways).

There are a few things I'd like to implement in the NB like sub-budgets, I will make a tagged commit once it is in a slightly neater state. But the existing infrastructure should allow making of NBs from the control room workstations now.

Quote:

[evan, gautam]

We spent some time trying to get the noise-budgeting code running today. I guess eventually we want this to be usable on the workstations so we cloned the git repo into /ligo/svncommon. The main objective was to see if we had all the dependencies for getting this code running already installed. The way Evan has set the code up is with a bunch of dictionaries for each of the noise curves we are interested in - so we just commented out everything that required real IFO data. We also commented out all the gwpy stuff, since (if I remember right) we want to be using nds2 to get the data. 

Running the code with just the gwinc curves produces the plots it is supposed to, so it looks like we have all the dependencies required. It now remains to integrate actual IFO data, I will try and set up the infrastructure for this using the archived frame data from the 2016 DRFPMI locks..

 

  13098   Thu Jul 6 11:58:28 2017 jigyasaUpdateCamerasHDR images of ETMX

I captured a few images of the beam spot on ETMX at 5ms, 10ms, 14ms, 50ms, 100ms, 500ms, 1000ms exposure and ran them through my python script for HDR images. Here's what I obtained. 
The resulting image is an improvement over the highly saturated images at say, 500ms and 1 second exposures. 
Additionally, I also included a colormapped version of the image. 

Attachment 1: ETMXHDRcolormap.png
ETMXHDRcolormap.png
Attachment 2: ETMXHDRimage.png
ETMXHDRimage.png
  13100   Fri Jul 7 14:34:27 2017 ranaUpdateCamerasHDR images of ETMX

i wonder how 'HDR' these images really are. is there a quantitative way to check that we are really getting more bits? also, how many bits does the PNG format allow for monochrome images? i worry that these elog images are already lossy.

 

  13101   Sat Jul 8 17:09:50 2017 gautamUpdateGeneralETMY TRANS QPD anomaly

About 2 weeks ago, I noticed some odd behaviour of the LSC TRY data stream. Its DC value seems to be drifting ~10x more than TRX. Both signals come from the transmission QPDs. At the time, we were dealing with various CDS FE issues but things have been stable on that end for the last two weeks, so I looked into this a bit more today. It seems like one particular channel is bad - Quadrant 4 of the ETMY TRANS QPD. Furthermore, there is a bump around 150Hz, and some features above 2kHz, that are only present for the ETMY channels and not the ETMX ones.

Since these spectra were taken with the PSL shutter closed and all the lab room lights off, it would suggest something is wrong in the electronics - to be investigated.

The drift in TRY can be as large as 0.3 (with 1.0 being the transmitted power in the single arm lock). This seems unusually large, indeed we trigger the arm LSC loops when TRY > 0.3. Attachment #2 shows the second trend of the TRX and TRY 16Hz EPICS channels for 1 day. In the last 12 hours or so, I had left the LSC master switch OFF, but the large drift of the DC value of TRY is clearly visible.

In the short term, we can use the high-gain THORLABS PD for TRY monitoring.

Attachment 1: ETMY_QPD.pdf
ETMY_QPD.pdf
Attachment 2: ETMY_QPD.png
ETMY_QPD.png
  13102   Sun Jul 9 08:58:07 2017 ranaUpdateGeneralETMY TRANS QPD anomaly

Indeed, the whole point of the high/low gain setup is to never use the QPDs for the single arm work. Only use the high gain Thorlabs PD and then the switchover code uses the QPD once the arm powers are >5.

I don't know how the operation procedure went so higgledy piggledy.

  13103   Mon Jul 10 09:49:02 2017 gautamUpdateGeneralAll FEs down

Attachment #1: State of CDS overview screen as of 9.30AM today morning when I came in.

Looks like there may have bene a power glitch, although judging by the wall StripTool traces, if there was one, it happened more than 8 hours ago. FB is down atm so can't trend to find out when this happened.

All FEs and FB are unreachable from the control room workstations, but Megatron, Optimus and Chiara are all ssh-able. The latter reports an uptime of 704 days, so all seems okay with its UPS. Slow machines are all responding to ping as well as telnet.

Recovery process to begin now. Hopefully it isn't as complicated as the most recent effort indecision[FAMOUS LAST WORDS]

Attachment 1: CDS_down_10Jul2017.png
CDS_down_10Jul2017.png
  13104   Mon Jul 10 11:20:20 2017 gautamUpdateGeneralAll FEs down

I am unable to get FB to reboot to a working state. A hard reboot throws it into a loop of "Media Test Failure. Check Cable".

Jetstor RAID array is complaining about some power issues, the LCD display on the front reads "H/W Monitor", with the lower line cycling through "Power#1 Failed", "Power#2 Failed", and "UPS error". Going to 192.168.113.119 on a martian machine browser and looking at the "Hardware information" confirms that System Power #1 and #2 are "Failed", and that the UPS status is "AC power loss". So far I've been unable to find anything on the elog about how to handle this problem, I'll keep looking.


In fact, looks like this sort of problem has happened in the past. It seems one power supply failed back then, but now somehow two are down (but there is a third which is why the unit functions at all). The linked elog thread strongly advises against any sort of power cycling. 

  13105   Mon Jul 10 17:13:21 2017 jigyasaUpdateComputer Scripts / ProgramsCapture image without pylon GUI

Over the day, I have been working on a C++ program to interface with Pylon to capture images and reduce dependence on the Pylon GUI. The program uses the Pylon header files along with opencv headers. While ultimately a wrapper in python may be developed for the program, the current C++ program at, 

/users/jigyasa/GigEcode/Grab/Grab.cpp when compiled as

g++ -Wl,--enable-new-dtags -Wl,-rpath,/opt/pylon5/lib64 -o Grab Grab.o -L/opt/pylon5/lib64 -Wl,-E -lpylonbase -lpylonutility -lGenApi_gcc_v3_0_Basler_pylon_v5_0 -lGCBase_gcc_v3_0_Basler_pylon_v5_0 `pkg-config opencv --cflags --libs`

returns an executable file named Grab which can be executed as ./Grab

This captures one image from the camera and displays it, additionally it also displays the gray value of the first pixel.

I am working on adding more utility to the program such as manually adjusting exposure, gain and also on the python wrapper (Cython has been installed locally on Ottavia for the purpose)!

  13106   Mon Jul 10 17:46:26 2017 gautamUpdateGeneralAll FEs down

A bit more digging on the diagnostics page of the RAID array reveals that the two power supplies actually failed on Jun 2 2017 at 10:21:00. Not surprisingly, this was the date and approximate time of the last major power glitch we experienced. Apart from this, the only other error listed on the diagnostics page is "Reading Error" on "IDE CHANNEL 2", but these errors precede the power supply failure.

Perhaps the power supplies are not really damaged, and its just in some funky state since the power glitch. After discussing with Jamie, I think it should be safe to power cycle the Jetstor RAID array once the FB machine has been powered down. Perhaps this will bring back one/both of the faulty power supplies. If not, we may have to get new ones. 

The problem with FB may or may not be related to the state of the Jestor RAID array. It is unclear to me at what point during the boot process we are getting stuck at. It may be that because the RAID disk is in some funky state, the boot process is getting disrupted.

Quote:

I am unable to get FB to reboot to a working state. A hard reboot throws it into a loop of "Media Test Failure. Check Cable".

Jetstor RAID array is complaining about some power issues, the LCD display on the front reads "H/W Monitor", with the lower line cycling through "Power#1 Failed", "Power#2 Failed", and "UPS error". Going to 192.168.113.119 on a martian machine browser and looking at the "Hardware information" confirms that System Power #1 and #2 are "Failed", and that the UPS status is "AC power loss". So far I've been unable to find anything on the elog about how to handle this problem, I'll keep looking.


In fact, looks like this sort of problem has happened in the past. It seems one power supply failed back then, but now somehow two are down (but there is a third which is why the unit functions at all). The linked elog thread strongly advises against any sort of power cycling. 

 

  13107   Mon Jul 10 19:15:21 2017 gautamUpdateGeneralAll FEs down

The Jetstor RAID array is back in its nominal state now, according to the web diagnostics page. I did the following:

  1. Powered down the FB machine - to avoid messing around with the RAID array while the disks are potentially mounted.
  2. Turned off all power switches on the back of the Jetstor unit - there were 4 of them, all of them were toggled to the "0" position.
  3. Disconnected all power cords from the back of the Jetstor unit - there were 3 of them.
  4. Reconnected the power cords, turned the power switches back on to their "1" position.

After a couple of minutes, the front LCD display seemed to indicate that it had finished running some internal checks. The messages indicating failure of power units, which was previously constantly displayed on the front LCD panel, was no longer seen. Going back to the control room and checking the web diagnostics page, everything seemed back to normal.

However, FB still will not boot up. The error is identical to that discussed in this thread by Intel. It seems FB is having trouble finding its boot disk. I was under the impression that only the FE machines were diskless, and that FB had its own local boot disk - in which case I don't know why this error is showing up. According to the linked thread, it could also be a problem with the network card/cable, but I saw both lights on the network switch port FB is connected to turn green when I powered the machine on, so this seems unlikely. I tried following the steps listed in the linked thread but got nowhere, and I don't know enough about how FB is supposed to boot up, so I am leaving things in this state now. 

  13108   Mon Jul 10 21:03:48 2017 jamieUpdateGeneralAll FEs down

 

Quote:
 

However, FB still will not boot up. The error is identical to that discussed in this thread by Intel. It seems FB is having trouble finding its boot disk. I was under the impression that only the FE machines were diskless, and that FB had its own local boot disk - in which case I don't know why this error is showing up. According to the linked thread, it could also be a problem with the network card/cable, but I saw both lights on the network switch port FB is connected to turn green when I powered the machine on, so this seems unlikely. I tried following the steps listed in the linked thread but got nowhere, and I don't know enough about how FB is supposed to boot up, so I am leaving things in this state now. 

It's possible the fb bios got into a weird state.  fb definitely has it's own local boot disk (*not* diskless boot).  Try to get to the BIOS during boot and make sure it's pointing to it's local disk to boot from.

If that's not the problem, then it's also possible that fb's boot disk got fried in the power glitch.  That would suck, since we'd have to rebuild the disk.  If it does seem to be a problem with the boot disk then we can do some invasive poking to see if we can figure out what's up with the disk before rebuilding.

  13110   Mon Jul 10 22:07:35 2017 KojiUpdateGeneralAll FEs down

I think this is the boot disk failure. I put the spare 2.5 inch disk into the slot #1. The OK indicator of the disk became solid green almost immediately, and it was recognized on the BIOS in the boot section as "Hard Disk". On the contrary, the original disk in the slot #0 has the "OK" indicator kept flashing and the BIOS can't find the harddisk.

 

  13111   Tue Jul 11 15:03:55 2017 gautamUpdateGeneralAll FEs down

Jamie suggested verifying that the problem is indeed with the disk and not with the controller, so I tried switching the original boot disk to Slot #1 (from Slot #0 where it normally resides), but the same problem persists - the green "OK" indicator light keeps flashing even in Slot #1, which was verified to be a working slot using the spare 2.5 inch disk. So I think it is reasonable to conclude that the problem is with the boot disk itself.

The disk is a Seagate Savvio 10K.2 146GB disk. The datasheet doesn't explicitly suggest any recovery options. But Table 24 on page 54 suggests that a blinking LED means that the disk is "spinning up or spinning down". Is this indicative of any particular failure moed? Any ideas on how to go about recovery? Is it even possible to access the data on the disk if it doesn't spin up to the nominal operating speed?

Quote:

I think this is the boot disk failure. I put the spare 2.5 inch disk into the slot #1. The OK indicator of the disk became solid green almost immediately, and it was recognized on the BIOS in the boot section as "Hard Disk". On the contrary, the original disk in the slot #0 has the "OK" indicator kept flashing and the BIOS can't find the harddisk.

 

 

  13112   Tue Jul 11 15:12:57 2017 KojiUpdateGeneralAll FEs down

If we have a SATA/USB adapter, we can test if the disk is still responding or not. If it is still responding, can we probably salvage the files?
Chiara used to have a 2.5" disk that is connected via USB3. As far as I know, we have remote and local backup scripts running (TBC), we can borrow the USB/SATA interface from Chiara.

If the disk is completely gone, we need to rebuilt the disk according to Jamie, and I don't know how to do it. (Don't we have any spare copy?)

  13113   Wed Jul 12 10:21:07 2017 gautamUpdateGeneralAll FEs down

Seems like the connector on this particular disk is of the SAS variety (and not SATA). I'll ask Steve to order a SAS to USB cable. In the meantime I'm going to see if the people at Downs have something we can borrow.

Quote:

If we have a SATA/USB adapter, we can test if the disk is still responding or not. If it is still responding, can we probably salvage the files?
Chiara used to have a 2.5" disk that is connected via USB3. As far as I know, we have remote and local backup scripts running (TBC), we can borrow the USB/SATA interface from Chiara.

If the disk is completely gone, we need to rebuilt the disk according to Jamie, and I don't know how to do it. (Don't we have any spare copy?)

 

  13114   Wed Jul 12 14:46:09 2017 gautamUpdateGeneralAll FEs down

I couldn't find an external docking setup for this SAS disk, seems like we need an actual controller in order to interface with it. Mike Pedraza in Downs had such a unit, so I took the disk over to him, but he wasn't able to interface with it in any way that allows us to get the data out. He wants to try switching out the logic board, for which we need an identical disk. We have only one such spare at the 40m that I could locate, but it is not clear to me whether this has any important data on it or not. It has "hda RTLinux" written on its front panel with a sharpie. Mike thinks we can back this up to another disk before trying anything, but he is going to try locating a spare in Downs first. If he is unsuccessful, I will take the spare from the 40m to him tomorrow, first to be backed up, and then for swapping out the logic board.

Chatting with Jamie and Koji, it looks like the options we have are:

  1. Get the data from the old disk, copy it to a working one, and try and revert the original FB machine to its last working state. This assumes we can somehow transfer all the data from the old disk to a working one.
  2. Prepare a fresh boot disk, load the old FB daqd code (which is backed up on Chiara) onto it, and try and get that working. But Jamie isn't very optimistic of this working, because of possible conflicts between the code and any current OS we would install.
  3. Get FB1 working. Jamie is looking into this right now.
Quote:

Seems like the connector on this particular disk is of the SAS variety (and not SATA). I'll ask Steve to order a SAS to USB cable. In the meantime I'm going to see if the people at Downs have something we can borrow.

 

 

  13115   Wed Jul 12 14:52:32 2017 jamieUpdateGeneralAll FEs down

I just want to mention that the situation is actually much more dire than we originally thought.  The diskless NFS root filesystem for all the front-ends was on that fb disk.  If we can't recover it we'll have to rebuilt the front end OS as well.

As of right now none of the front ends are accessible, since obviously their root filesystem has disappeared.

  13117   Fri Jul 14 17:47:03 2017 gautamUpdateGeneralDisks from LLO have arrived

[jamie, gautam]

Today morning, the disks from LLO arrived. Jamie and I have been trying to get things back up and running, but have not had much success today. Here is a summary of what we tried.

Keith Thorne sent us two disks: one has the daqd code and the second is the boot disk for the FE machines. Since Jamie managed to successfully compile the daqd code on FB1 yesterday, we decided to try the following: mount the boot disk KT sent us (using a SATA/USB adapter) on /mnt on FB1, get the FEs booted up, and restart the RT models. 

Quote:

I just want to mention that the situation is actually much more dire than we originally thought.  The diskless NFS root filesystem for all the front-ends was on that fb disk.  If we can't recover it we'll have to rebuilt the front end OS as well.

As of right now none of the front ends are accessible, since obviously their root filesystem has disappeared.

While on FB1, Jamie realized he actually had a copy of the /diskless/root directory, which is the NFS filesystem for the FEs, on FB1. So we decided to try and boot some of the FEs with this (instead of starting from scratch with the disks KT sent us). The way things were set up, the FEs were querying the FB machine as the DHCP server. But today, we followed the instructions here to get the FEs to get their IP address from chiara instead. We also added the line 

/diskless/root *(sync,rw,no_root_squash,no_all_squash,no_subtree_check)

to /etc/exports followed by exportfs -ra on FB1. At which point the FE machine we were testing (c1lsc) was able to boot up. 

However, it looks like the NFS filesystem isn't being mounted correctly, for reasons unknown. We commented out some of the rtcds related lines in /etc/rc.local because they were causing a whole bunch of errors at boot (the lines that were touched have been tagged with today's date).


So in summary, the status as of now is:

  1. Front-end machines are able to boot
  2. There seems to be some problem during the boot process, leading to the NFS file system not being correctly mounted. The closest related thing I could find from an elog search is this entry, but I think we are facing a different probelm.
  3. We wanted to see if we could start the realtime models (but without daqd for now), but we weren't even able to get that far today.

We will resume recovery efforts on Monday.

  13118   Sat Jul 15 01:28:53 2017 jigyasaUpdateCamerasBRDF Calibrations

This evening, Gautam helped me with setting up the apparatus for calibrating the GigE for BRDF measurements.
The SP table was chosen to set up the experiment and for this reason a few things including a laser and power meter (presumably set up by Steve) had to be moved around.

We initially started by setting up the Crysta laser with its power source (Crysta #2, 150-190 mW 1064 laser) on the SP table. The Ophir power meter was used to measure the laser power. We discovered that the laser was highly unstable as its output on the power meter fluctuated (kind of periodically) between 40 and 150 mW. The beam spot on the beam card also appeared to validate this change in intensity. So we decided to use another 1064 nm laser instead.
Gautam got the LightWave NPro laser from the PSL table and set it up on the SP table and with this laser the output as measured by the same power meter was quite stable.

We manually adjusted the power to around 150 mW. This was followed by setting up the half wave plate(HWP) with the polarizing beam splitter (PBS), which was very gently and precisely done by Gautam, while explaining how to handle the optics to me.
 On first installing the PBS, we found that the beam was already quite strongly polarized as there seemed to be zero transmission but a strong reflection.
With the HWP in place, we get a control over the transmitted intensity. The reflected beam is directed to a beam dump.
I have taken down the GigE(+mount) at ETMX and wired a spare PoE injector.
We tried to interface with the camera wirelessly through the wireless network extenders but that seems to render an unstable connection to the GigE so while a single shot works okay, a continuous shot on the GigE didn’t succeed.

The GigE was connected to the Martian via Ethernet cable and images were observed using a continuous shot on the Pylon Viewer App on Paola. 

We deliberated over the need of a beam expander, but it has been omitted presently. White printer paper is currently being used to model the Lambertian scatterer. So light scattered off the paper was observed at a distance of about 40 cm from the sample.
While proceeding with the calibrations further tonight, we realized a few challenges.

While the CCD is able to observe the beam spot perfectly well, measuring the actual power with the power meter seems to be tricky. As the scattered power is quite low, we can’t actually see any spot using a beam card and hence can’t really ensure if we are capturing the entire beam spot on the active region of the power meter (placed at a distance of ~40cm from the paper) or if we are losing out on some light, all the while ensuring that the power meter and the CCD are in the same plane.

We tried to think of some ways around that, the description of which will follow. Any ideas would be greatly appreciated.

Thanks a ton for all your patience and help Gautam! :) 

More to follow.. 

  13119   Sat Jul 15 13:40:59 2017 ranaUpdateCamerasBRDF Calibrations

Power meter only needed to measure power going into the paper not out. We use the BRDF of paper to estimate the power going out given the power going in.

  13120   Sat Jul 15 16:19:00 2017 gautamUpdateCamerasMakeshift PyPylon

Some days ago, I stumbled upon this github page, by a grad student at KIT who developed this code as he was working with Basler GigE cameras. Since we are having trouble installing SnapPy, I figured I'd give this package a try. Installation was very easy, took me ~10mins, and while there isn't great documentation, basic use is very easy - for instance, I was able to adjust the exposure time, and capture an image, all from Pianosa. The attached is some kind of in-built function rendering of the captured image - it is a piece of paper with some scribbles on it near Jigyasa's BRDF measurement setup on the SP table, but it should be straightforward to export the images in any format we like. I believe the axes are pixel indices.

Of course this is only a temporary solution as I don't know if this package will be amenable to interfacing with EPICS servers etc, but seems like a useful tool to have while we figure out how to get SnapPy working. For instance, the HDR image capture routine can now be written entirely as a Python script, and executed via an MEDM button or something.

A rudimentary example file can be found at /opt/rtcds/caltech/c1/scripts/GigE/PyPylon/examples - some of the dictionary keywords to access various properties of the camera (e.g. Exposure time) are different, but these are easy enough to figure out.

 

Attachment 1: pyPylon_test.png
pyPylon_test.png
  13121   Sun Jul 16 11:58:36 2017 jigyasaUpdateCamerasBRDF Calibrations

 

From what I understood froom my reading, [Large-angle scattered light measurements for quantum-noise filter cavity design studies(Refer https://arxiv.org/abs/1204.2528)], we do the white paper test in order to calibrate for the radiometric response, i.e. the response of the CCD sensor to radiance.‘We convert the image counts measured by the CCD camera into a calibrated measure of scatter. To do this we measure the scattered light from a diffusing sample twice, once with the CCD camera and once with a calibrated power meter. We then compare their readings.’

But thinking about this further, if we assume that the BRDF remains unscaled and estimate the scattered power from the images, we get a calibration factor for the scattered power and the angle dependence of the scattered power!

Quote:

Power meter only needed to measure power going into the paper not out. We use the BRDF of paper to estimate the power going out given the power going in.

 

  13122   Sun Jul 16 12:09:47 2017 jigyasaUpdateCamerasBRDF Calibrations

With this idea in mind, we can now actually take images of the illuminated paper at different scattering angles, assume BRDF is the constant value of (1/pi per steradian), 

then scattered power Ps= BRDF * Pi cosθ * Ω, where Pi is the incident power, Ω is the solid angle of the camera and θ is the scattering angle at which measurement is taken. This must also equal the sum of pixel counts divided by the exposure time multiplied by some calibration factor. 

From these two equations we can obtain the calibration factor of the CCD. And for further BRDF measurements, scale the pixel count/ exposure by this calibration factor.  

Quote:

 

From what I understood froom my reading, [Large-angle scattered light measurements for quantum-noise filter cavity design studies(Refer https://arxiv.org/abs/1204.2528)], we do the white paper test in order to calibrate for the radiometric response, i.e. the response of the CCD sensor to radiance.‘We convert the image counts measured by the CCD camera into a calibrated measure of scatter. To do this we measure the scattered light from a diffusing sample twice, once with the CCD camera and once with a calibrated power meter. We then compare their readings.’

But thinking about this further, if we assume that the BRDF remains unscaled and estimate the scattered power from the images, we get a calibration factor for the scattered power and the angle dependence of the scattered power!

Quote:

Power meter only needed to measure power going into the paper not out. We use the BRDF of paper to estimate the power going out given the power going in.

 

 

  13123   Mon Jul 17 16:22:01 2017 SteveUpdateSUSruby wire standoff pictures

Bluebean Optical Tech Limited of Shanghai delivered 50 pieces red ruby prisms with radius.  The first prism pictures were taken at June 5

and it was retaken again as BB#1 later

More samples were selected randomly as one from each bag of 5 and labeled as BB#2.......6    

 The R10 mm radius can be seen agains the  ruler edge.  The v-groove edge was labeled with blue marker and pictures were taken

from both side of this ridge. The top view is shown as the wire laying across on it.

SOS sus wire of 43 micron OD used as calibration as it was placed close to the side that it was focused on.

The V-groove ridge surface quality was evaluated based on as scale of 1 – 10 with 10 being the most positive.

 BB# Edge quality score
1 4
2 8
3 3
4 9.5
5 2
6 9

Remaining thing to examin, take picture of the contacting ridge to SOS from the side.

Attachment 1: contacting_ridge.bmp
Attachment 2: contacting_ridge.png
contacting_ridge.png
  13124   Wed Jul 19 00:59:47 2017 gautamUpdateGeneralFINESSE model of DRMI (no arms)

Summary:

I've been working on improving the 40m FINESSE model I set up sometime last year (where the goal was to model various RC folding mirror scenarios). Specifically, I wanted to get the locking feature of FINESSE working, and also simulate the DRMI (no arms) configuration, which is what I have been working on locking the real IFO to. This elog is a summary of what I have from the last few days of working on this.

Model details:

  • No IMC included for now.
  • Core optics R and T from the 40m wiki page.
  • Cavity lengths are the "ideal" ones - see the attached ipynb for the values used.
  • RF modulation depths from here. But for now, the relative phase between f1 and f2 at the EOM is set to 0.
  • I've not included flipped folding mirrors - instead, I put a loss of 0.5% on PR3 and SR3 in the model to account for the AR surface of these optics being inside the RCs. 
  • I've made the AR surfaces of all optics FINESSE "beamsplitters" - there was some discussion on the FINESSE mattermost channel about how not doing this can lead to slightly inaccurate results, so I've tried to be more careful in this respect.
  • I'm using "maxtem 1" in my FINESSE file, which means TEM_mn modes up to (m+n=1) are taken into account - setting this to 0 makes it a plane wave model. This parameter can significantly increase the computational time. 

Model validation:

  • As a first check, I made the PRM and SRM transparent, and used the in-built routines in FINESSE to mode-match the input beam to the arm cavities.
  • I then scanned one arm cavity about a resonance, and compared the transmisison profile to the analytical FP cavity expression - agreement was good.
  • Next, I wanted to get a sensing matrix for the DRMI (no arms) configuration (see attached ipynb notebook).
    • First, I make the ETMs in the model transparent
    • I started with the phases for the BS, PRM and SRM set to their "naive" values of 0, 0 and 90 (for the standard DRMI configuration)
    • I then scanned these optics around, used various PDs to look at the points where appropriate circulating fields reached their maximum values, and updated the phase of the optic with these values.
    • Next, I set the demod phase of various RFPDs such that the PDH error signal is entirely in one quadrature. I use the RFPDs in pairs, with demod phases separated by 90 degrees. I arbitrarily set the demod phase of the Q phase PD as 90 + phase of I phase PD. I also tried to mimic the RFPD-IFO DoF pairing that we use for the actual IFO - so for example, PRCL is controlled by REFL11_I.
    • Confident that I was close enough to the ideal operating point, I then fed the error signals from these RFPDs to the "lock" routine in FINESSE. The manual recommends setting the locking loop gain to 1/optical gain, which is what I did.
    • The tunings for the BS and RMs in the attached kat file are the result of this tuning.
    • For the actual sensing matrix, I moved each of PRM, BS and SRM +/-5 degrees (~15nm) around each resonance. I then computed the numerical derivative around the zero crossing of each RFPD signal, and then plotted all of this in some RADAR plots - see Attachment #1.

Explanation of Attachments and Discussion:

  • Attachment #1 - Computed sensing matrix from this model. Compare to an actual measurement, for example here - the relative angle between the sensing matrix elements dont exactly line up with what is measured. EQ suggested today that I should look into tuning the relative phase between the RF frequencies at the EOM. Nevertheless, I tried comparing the magnitudes of the MICH sensing element in AS55 Q - the model tells me that it should be ~7.8*10^5 W/m. In this elog, I measured it to be 2.37*10^5 W/m. On the AS table, there is a 50-50 BS splitting the light between the AS55 and AS110 photodiodes which is not accounted for in the model. Factoring this in, along with the fact that there are 6 in-vaccuum steering mirrors (assume 98% reflectivity for these), 3 in air steering mirrors, and the window, the sensing matrix element from the model starts to be in the same ballpark as the measurement, at ~3*10^5 W/m. So the model isn't giving completely crazy results.
  • Attachment #2 - Example of the signals at various RFPDs in response to sweeping the PRM around its resonance. To be compared with actual IFO data. Teal lines are the "I" phase, and orange lines are "Q" phase.
  • Attachment #3 - FINESSE kat file and the IPython notebook I used to make these plots. 
  • Next steps
    • More validation against measurements from the actual IFO.
    • Try and resolve differences between modeled and measured sensing matrices.
    • Get locking working with full IFO - there was a discussion on the mattermost thread about sequential/parallel locking some time ago, I need to dig that up to see what is the right way to get this going. Probably the DRMI operating point will also change, because of the complex reflectivities of the arm cavities seen by the RF sidebands (this effect is not present in the current configuration where I've made the ETMs transparent).

GV Edit: EQ pointed out that my method of taking the slope of the error signal to compute the sensing element isn't the most robust - it relies on choosing points to compute the slope that are close enough to the zero crossing and also well within the linear region of the error signal. Instead, FINESSE allows this computation to be done as we do in the real IFO - apply an excitation at a given frequency to an optic and look at the twice-demodulated output of the relevant RFPD (e.g. for PRCL sensing element in the 1f DRMI configuration, drive PRM and demodulate REFL11 at 11MHz and the drive frequenct). Attachment #4 is the sensing matrix recomputed in this way - in this case, it produces almost identical results as the slope method, but I think the double-demod technique is better in that you don't have to worry about selecting points for computing the slope etc. 

 

Attachment 1: DRMI_sensingMat.pdf
DRMI_sensingMat.pdf
Attachment 2: DRMI_errSigs.pdf
DRMI_errSigs.pdf
Attachment 3: 40m_DRMI_FINESSE.zip
Attachment 4: DRMI_sensingMat_19Jul.pdf
DRMI_sensingMat_19Jul.pdf
  13125   Wed Jul 19 08:37:21 2017 JamieUpdateCDSUpdate on front-end/DAQ rebuild

After the catastrophic fb disk failure last week we lost essentially the entire front end system (not any of the userapp code, but the front end boot server, operating system, and DAQ).  The fb disk was entirely unrecoverable, so we've been trying to rebuild everything from the bits and pieces lying around, and some disks that Keith Thorne sent from LLO.  We're trying to get the front ends working first, and will work on recovering daqd after.

Luckily, fb1, which was being configured as an fb replacement, is mostly fully configured, including having a copy of the front end diskless root image.  We setup fb1 as the new boot server, and were able to get front ends booting again.  Unfortunately, we've been having trouble running and building models, so something is still amis.  We've been taking a three-pronged approach to getting the front ends running:

  • /diskless/root.fb: This involves booting the front ends from the backup of the diskless root from fb.  Runs gentoo kernel 2.6.34.1.  This should correspond to the environment that all models were built and running against.  But something is missing in the configuration.  The front ends were also mounting /opt from fb, which included the dolphin drivers, and we don't have a copy of that, so models aren't loading or recompiling.
  • /diskless/root.x1boot: Keith sent a disk image of the entire x1boot server from LLO.  It uses gentoo kernel 3.0.8.  This ostensibly includes everything we should need to run the front ends, but it's unfortunately configured with newer versions of some of the software and also isn't loading our existing models or building new ones.  This also seems to be having issues with the dolphin drivers.
  • /diskless/root.jessie: This is an entirely new boot image build from scratch with Debian jessie, using an RTS-patched 3.2 kernel.  This would use the latest versions of everything.  It's mostly working, we just need to rebuild the dolphin driver and source.

It seems that in all cases we need to rebuild the dolphin drivers from source.

  13127   Wed Jul 19 14:26:50 2017 JamieUpdateCDSUpdate on front-end/DAQ rebuild

 

Quote:

After the catastrophic fb disk failure last week we lost essentially the entire front end system (not any of the userapp code, but the front end boot server, operating system, and DAQ).  The fb disk was entirely unrecoverable, so we've been trying to rebuild everything from the bits and pieces lying around, and some disks that Keith Thorne sent from LLO.  We're trying to get the front ends working first, and will work on recovering daqd after.

Luckily, fb1, which was being configured as an fb replacement, is mostly fully configured, including having a copy of the front end diskless root image.  We setup fb1 as the new boot server, and were able to get front ends booting again.  Unfortunately, we've been having trouble running and building models, so something is still amis.  We've been taking a three-pronged approach to getting the front ends running:

  • /diskless/root.fb: This involves booting the front ends from the backup of the diskless root from fb.  Runs gentoo kernel 2.6.34.1.  This should correspond to the environment that all models were built and running against.  But something is missing in the configuration.  The front ends were also mounting /opt from fb, which included the dolphin drivers, and we don't have a copy of that, so models aren't loading or recompiling.
  • /diskless/root.x1boot: Keith sent a disk image of the entire x1boot server from LLO.  It uses gentoo kernel 3.0.8.  This ostensibly includes everything we should need to run the front ends, but it's unfortunately configured with newer versions of some of the software and also isn't loading our existing models or building new ones.  This also seems to be having issues with the dolphin drivers.
  • /diskless/root.jessie: This is an entirely new boot image build from scratch with Debian jessie, using an RTS-patched 3.2 kernel.  This would use the latest versions of everything.  It's mostly working, we just need to rebuild the dolphin driver and source.

It seems that in all cases we need to rebuild the dolphin drivers from source.

To clarify, we're able to boot the x1boot image with the existing 2.6.25 kernel that we have from fb.  The issue with the root.x1boot image is not the kernel version but some of the other support libraries, such as dolphin.

  13130   Fri Jul 21 18:03:17 2017 JamieUpdateCDSUpdate on front-end/DAQ rebuild

Update:

  • front ends booting with the new Debian jessie diskless root image and a linux 3.2 version of the RTS-patched kernel
  • dolphin is configured correctly and running on c1lsc and c1sus
  • models building and running with RCG 3.0.3

Up next:

  • add c1ioo to the dolphin network
  • recompile/restart all front end models
  • daqd

I'll try to get the first two of those done tomorrow, although it's unclear what model updates we'll have to do to get things working with the newer RCG.

 

  13133   Sun Jul 23 22:16:55 2017 Jamie, gautamUpdateCDSfront-end now running with new OS, RCG

All front ends and model are (mostly) running now

All suspensions are damped:

It should be possible at this point to do more recovery, like locking the MC.

Some details on the restore process:

  • all models were recompiled with the new RCG version 3.0.3
  • the new RCG does stricter simulink drawing checks, and was complaining about unterminated outputs in some of the SUS models.  Terminated all outputs it was concerned about and saved.
  • RCG 3.0 requires a new directory for doing better filter module diagnostics: /opt/rtcds/caltech/c1/chans/tmp
  • had to reset the slow machines c1susaux, c1auxex, c1auxey

The daqd is not yet running.  This is the next task.

I have been taking copious notes and will fully document the restore process once complete.

c1ioo issues

c1ioo has been giving us a little bit of trouble.  The c1ioo model kept crashing and taking down the whole c1ioo host.  We found a red light on one of the ADCs (ADC1).  We pulled the card and replaced it with a spare from the CDS cabinet.  That seemed to fix the problem and c1ioo became more stable.

We've still been seeing a lot of glitching in c1ioo, though, with CPU cycle times frequently (every couple of seconds) running above threshold for all models, up to 200 us.  I tried unloading every kernel module I could and shutting down every non-critical process, but nothing seemed to help.

We eventually tried stopping the c1ioo model altogether and that seemed to help quite a bit, dropping the long cycle rate down to something like one every 30 seconds or so.  Not sure what that means.  We should look into the BIOS again, to see if there could be something interacting with the newer kernel.

So currently the c1ioo model is not running (which is why it's all white in the CDS overview snapshot above).  The fact that c1ioo is not running and the remaining models are still occaissionly glitching is also causing various IPC errors on auxilliary models (see c1mcs, c1rfm, c1ass, c1asx). 

RCG compile warnings

the new RCG tries to do more checks on custom c code, but it seems to be having trouble finding our custom "ccodeio.h" files that live with the c definitions in USERAPPS/*/common/src/.  Unclear why yet.  This is causing the RCG to spit out warnings like the following:

Cannot verify the number of ins/outs for C function BLRMS.
    File is /opt/rtcds/userapps/release/cds/c1/src/BLRMSFILTER.c
    Please add file and function to CDS_SRC or CDS_IFO_SRC ccodeio.h file.

This are just warnings and will not prevent the model form compiling or warning.  We'll figure out what the problem is to make these go away, but they can be ignored for the time being.

model unload instability

Probably the worst problem we're facing right now is an instability that will occaissionally, but not always, cause the entire front end host to freeze up upon unloading an RTS kernel module.  This is a known issue with the newer linux kernels (we're using kernel version 3.2.35), and is being looked into.

This is particularly annoying with the machines on the dolphin network, since if one of the dolphin hosts goes down it manages to crash all the models reading from the dolphin network.  Since half the time they can't be cleanly restarted, this tends to cause a boot fest with c1sus, c1lsc, and c1ioo.  If this happens, just restart those machines, wait till they've all fully booted, then restart all the models on all hosts with "rtcds start all".

  13135   Mon Jul 24 10:45:23 2017 gautamUpdateCDSc1iscex models died

This morning, all the c1iscex models were dead. Attachment #1 shows the state of the cds overview screen when I came in. The machine itself was ssh-able, so I just restarted all the models and they came back online without fuss.

Quote:

All front ends and model are (mostly) running now

Attachment 1: c1iscexFailure.png
c1iscexFailure.png
  13136   Mon Jul 24 10:59:08 2017 JamieUpdateCDSc1iscex models died
Quote:

This morning, all the c1iscex models were dead. Attachment #1 shows the state of the cds overview screen when I came in. The machine itself was ssh-able, so I just restarted all the models and they came back online without fuss.

This was me.  I had rebooted that machine and hadn't restarted the models.  Sorry for the confusion.

  13137   Mon Jul 24 12:00:21 2017 gautamUpdatePSLPSL NPRO mysteriously shut off

Summary:

At around 10:30AM today morning, the PSL mysteriously shut off. Steve and I confirmed that the NPRO controller had the RED "OFF" LED lit up. It is unknown why this happened. We manually turned the NPRO back on and hte PMC has been stably locked for the last hour or so.

Details:

There are so many changes to lab hardware/software that have been happening recently, it's not entirely clear to me what exactly was the problem here. But here are the observations:

  1. Yesterday, when I came into the lab, the MC REFL trace on the wall StripTool was 0 for the full 8 hour history - since we don't have data records, I can't go back further than this. I remember the PMC TRANS and REFL cameras looked normal, but there was no MC REFL spot on the CCD monitors. This is consistent with the PSL operating normally, the PMC being locked, and the PSL shutter being closed. Isn't the emergency vacuum interlock also responsible for automatically closing the PSL shutter? Perhaps if the turbo controller failure happened prior to Jamie/me coming in yesterday, maybe this was just the interlock doing its job. On Friday evening, the PSL shutter was certainly open and the MC REFL spot was visible on the camera. I also confirmed with Jamie that he didn't close the shutter.
  2. Attachment #1 shows the wall StripTool traces from earlier this morning. It looks like ~7.40AM, the MC REFL level went back up. Steve says he didn't manually open the shutter, and in any case, this was before the turbo pump controller failure was diagnosed. So why did the shutter open again
  3. When I came in at ~10AM, the CCD monitor showed that the PMC was locked, and the MC REFL spot was visible. 
  4. Also on attachment #1, there is a ~10min dip in the MC REFL level. This corresponds to ~10:30AM this morning. Both Steve and I were sitting in the control room at this time. We noticed that the PMC TRANS and REFL CCDs were dark. When we went in to check on the laser, we saw that it was indeed off. There was no one inside the lab area at this time to our knowledge, and as far as I know, the only direct emergency shutoff for the PSL is on the North-West corner of the PSL enclosure. So it is unclear why the laser just suddenly went off.

Steve says that this kind of behaviour is characteristic of a power glitch/surge, but nothing else seems to have been affected (I confirmed that the X and Y end lasers are ON). 

Attachment 1: IMG_7454.JPG
IMG_7454.JPG
  13138   Mon Jul 24 19:28:55 2017 JamieUpdateCDSfront end MX stream network working, glitches in c1ioo fixed

MX/OpenMX network running

Today I got the mx/open-mx networking working for the front ends.  This required some tweaking to the network interface configuration for the diskless front ends, and recompiling mx and open-mx for the newer kernel.  Again, this will all be documented.

controls@fb1:~ 0$ /opt/mx/bin/mx_info
MX Version: 1.2.16
MX Build: root@fb1:/opt/src/mx-1.2.16 Mon Jul 24 11:33:57 PDT 2017
1 Myrinet board installed.
The MX driver is configured to support a maximum of:
    8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
===================================================================
Instance #0:  364.4 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
    Status:        Running, P0: Link Up
    Network:    Ethernet 10G

    MAC Address:    00:60:dd:43:74:62
    Product code:    10G-PCIE-8B-S
    Part number:    09-04228
    Serial number:    485052
    Mapper:        00:60:dd:43:74:62, version = 0x00000000, configured
    Mapped hosts:    6

                                                        ROUTE COUNT
INDEX    MAC ADDRESS     HOST NAME                        P0
-----    -----------     ---------                        ---
   0) 00:60:dd:43:74:62 fb1:0                             1,0
   1) 00:30:48:be:11:5d c1iscex:0                         1,0
   2) 00:30:48:bf:69:4f c1lsc:0                           1,0
   3) 00:25:90:0d:75:bb c1sus:0                           1,0
   4) 00:30:48:d6:11:17 c1iscey:0                         1,0
   5) 00:14:4f:40:64:25 c1ioo:0                           1,0
controls@fb1:~ 0$

c1ioo timing glitches fixed

I also checked the BIOS on c1ioo and found that the serial port was enabled, which is known to cause timing glitches.  I turned off the serial port (and some power management stuff), and rebooted, and all the c1ioo timing glitches seem to have gone away.

It's unclear why this is a problem that's just showing up now.  Serial ports have always been a problem, so it seems unlikely this is just a problem with the newer kernel.  Could the BIOS have somehow been reset during the power glitch?

In any event, all the front ends are now booting cleanly, with all dolphin and mx networking coming up automatically, and all models running stably:

Now for daqd...

  13139   Mon Jul 24 19:57:54 2017 gautamUpdateCDSIMC locked, Autolocker re-enabled

Now that all the front end models are running, I re-aligned the IMC, locked it manually, and then tweaked the alignment some more. The IMC transmission now is hovering around 15300 counts. I re-enabled the Autolocker and FSS Slow loops on Megatron as well.

Quote:

MX/OpenMX network running

Today I got the mx/open-mx networking working for the front ends.  This required some tweaking to the network interface configuration for the diskless front ends, and recompiling mx and open-mx for the newer kernel.  Again, this will all be documented.

 

  13141   Tue Jul 25 02:03:59 2017 gautamUpdateOptical LeversOptical lever tuning thoughts

Summary:

Currently, I am unable to engage the coil-dewhitening filters without destroying cavity locks. One reason why this is so is because the present Oplev servos have a roll-off at high frequencies that is not steep enough - engaging the digital whitening + analog de-whitening just causes the DAC output to saturate. Today, Rana and I discussed some ideas about how to approach this problem. This elog collects these thoughts. As I flesh out these ideas, I will update them in a more complete writeup in T1700363 (placeholder for now). Past relevant elogs: 5376, 9680

  1. Why do we need optical levers?
    • ​​To stabilize the low-frequency seismic driven angular motion of the optics.
  2.  In what frequency range can we / do we need to stabilize the angular motion of the optics? How much error signal suppression do we need in the control band? How much is achievable given the current Oplev setup?
    • ​​To answer these questions, we need to build a detailed Oplev noise budget.
    • Ultimately, the Oplev error signal is sensing the differential motion between the suspended optic and the incident laser beam.
    • What frequency range does laser beam jitter dominate the actual optic motion? What about mechanical drifts of the optical tables the HeNes sit on? And for many of the vertex optics, the Oplev beam has multiple bounces on steering mirrors on the stack. What is the contribution of the stack motion to the error signal?
    • The answers to the above will tell us what lower and upper UGFs we should and can pick. It will also be instructive to investigate if we can come up with a telescope design near the Oplev QPD that significantly reduces beam jitter effects (see elog 10732). Also, can we launch/extract the beam into/from the vacuum chamber in such a way that we aren't so susceptible to motion of the stack?
  3. What are some noises that have to be measured and quantified?
    • Seismic noise
    • ​Shot noise
    • Electronics noise of the QPD readout chain
    • HeNe intensity noise (does this matter since we are normalizing by QPD sum?)
    • HeNe beam pointing / jitter noise (How? N-corner hat method?)
    • Stack motion contribution to the Oplev error signal
  4. How do we design the Oplev controller?
    • ​The main problem is to frame the right cost function for this problem. Once this cost function is made, we can use MATLAB's PSO tool (which is what was used for the PR3 coating design optimization, and also successfully for this kind of loop shaping problems by Rana for aLIGO) to find a minimum by moving the controller poles and zeros around within bounds we define.
  5. What terms should enter the cost function?

    • ​In addition to those listed in elog 5376
    • We need the >10Hz roll-off to be steep enough that turning on the digital whitening will not significantly increase the DAC output RMS or drive it to saturation.
    • We'd like for the controller to be insensitive to 5% (?) errors in the assumed optical plant and noise models i.e. the closed loop shouldn't become unstable if we made a small error in some assumed parameters.
    • Some penalty for using excessive numbers of poles/zeros? Penalty for having too many high-frequency features.
  6. Other things to verify / look into
    • ​Verify if the counts -> urad calibration is still valid for all the Oplevs. We have the arm-cavity power quadratic dependance method, and the geometry method to do this.
    •  Check if the Oplev error signals are normalized by the quadrant sum.
    • How important is it to balance the individual quadrant gains?
    • Check with Koji / Rich about new QPDs. If we can get some, perhaps we can use these in the setup that Steve is going to prepare, as part of the temperature vs HeNe noise invenstigations.

Before the CDS went down, I had taken error signal spectra for the ITMs. I will update this elog tomorrow with these measurements, as well as some noise estimates, to get started.

  13142   Tue Jul 25 08:48:57 2017 SteveUpdateVACRGA scan at d278

The RGA did not shut down at the turbo pump controller failing.

Quote:

Ifo pressure was 5.5 mTorr this morning. The PSL shutter was still open. TP2 controller failed. Interlock closed V1, V4 and VM1

Turbo pump 2 is the fore pump of the Maglev. The pressure here was 3.9 Torr so The Magelv got warm ~38C but it was still rotating at 560 Hz normal with closed V1

What I did:

Looked at pressures of Hornet and Super Bee  Instru Tech. Inc

Closed all annuloses and VA6,  disconnected V4 and VA6 and turned on external fan to cool Maglev

Opened V7 to pump the Maglev fore line with TP3

V1 opened manually when foreline pressure dropped to <2mTorr at P2 and the body temp of the Maglev cooled down to  25-27 C

VM1 opened at 1e-5 Torr

Valve configuration: vacuum normal with annuloses not pumped

Ifo pressure 8.5e-6 Torr -IT at 10am,  P2 foreline pressure 64 mTorr, TP3 controller 0.17A   22C  50Krpm

note: all valves open manually, interlock can only close them

 

Quote:

While walking down to the X end to reset c1iscex I heard what I would call a "rythmic squnching" sound coming from under the turbo pump.  I would have said the sound was coming from a roughing pump, but none of them are on (as far as I can tell).

Steve maybe look into this??

PS: please call me next time you see the vacuum is not Vacuum Normal

 

Attachment 1: RGA278d.png
RGA278d.png
  13143   Tue Jul 25 14:04:06 2017 SteveUpdateVACturbo controller installed and we are running at vac normal

Gautam and Steve,

Spare Varian turbo-V 70 controller, Model 969-9505, sn 21612 was swapped in. It is running the turbo fine @ 50Krpm but it does not allow it's V4 valve to be opened............

It turns out that TP2 @ 75Krpm will allow V4 to open and close. This must be a software issue.

So Vacuum Normal is operational if TP2 is running 75,000 rpm

We want to run at 50,000 rpm on the long term.

Note: the RS232 Dsub connector on the back of this controller is mounted 180 degrees opposite than TP3  and old failed TP2 controller

 

PS: controller is shipping out for repair 7-28-2017

 

Attachment 1: TP2@75Krpm.png
TP2@75Krpm.png
  13144   Tue Jul 25 14:27:19 2017 SteveUpdatesafetysafety training

Kira Dubrovina and Naomi Wharton received 40m specific basic safety training.

Attachment 1: safety.jpg
safety.jpg
  13145   Wed Jul 26 19:13:07 2017 JamieUpdateCDSdaqd showing same instability as before

I recompiled daqd on the updated fb1, similar to how I had before, and we're seeing the same instability: process crashes when it tries to write out the second trend (technically it looks like it crashes while it's trying to write out the full frame while the second trend is also being written out).  Jonathan Hanks and I are actively looking into it and i'll provide further report soon.

  13146   Thu Jul 27 22:42:24 2017 gautamUpdateSUSSeismic noise, DAC noise, and Coil Driver electronics noise

Summary:

Yesterday at the meeting, we talked about how the analog de-whitening filters in the coil driver path may be more aggressive than necessary. I think Attachment #1 shows that this is indeed the case.

Details:

I had done some modeling and measurement of some of these noises while I was putting together the initial DRMI noise budget, but I had never put things together in one plot. In Attachment #1, I've plotted the following:

  1. Quadrature sum of seismic noise (from GWINC calculations) for 3 suspended optics (I'm sticking to the case of 3 optics since I've been doing all the noise-budgeting for MICH - for DARM, it will be 4 suspended optics).
  2. The unfiltered DAC noise estimate. The voltage noise was measured in this elog. To convert this to displacement noise for 3 suspended optics, I've used the value of 1.55e-9/f^2 m/ct as the actuator coefficient. This number should be accurate under the assumption that the series resistance on the coil driver board output is 400 ohms (we could increase this - by how much depends on how much actuation range is needed).  
  3. Coil driver board and de-whitening board electronics noises (added in quadrature). I've used the LISO model noises, which line up well with the measured noises in elogs 13010 and 13015.
  4. The DAC noise filtered by the de-whitening transfer function, separately for the cases of using one or both of the available biquad stages. This cannot be lower than the preceeding trace (electronics noise of de-whitening and coil driver boards), so should be disregarded where it dips below it. 

It would seem that the coil driver + de-whitening board electronic noises dominate above ~150Hz. The electronics noise is ~10nV/rtHz at the output of the coil driver board, which is only a factor of 100 below the DAC noise - so the stopband attenuation of ~70dB on the de-whitening boards seems excessive.

We can lower this noise by a factor of 2.5 if we up the series resistance on the coil driver boards from 400ohm to 1kohm, but even so, the displacement noise is ~1e-18 m/rtHz. I need to investigate the electronics noises a little more carefully - I only measured it for the case when both biquad stages were engaged, I will need to do the model for all permutations - to be updated. 

Attachment #2 has an iPython notebook used to generate this plot along with all the data.


Edit 28 Jul 2.30pm: I've added Attachment #3 with traces for different assumed values of the series resistance on the coil driver board - although I have not re-computed the Johnson noise contribution for the various resistances. If we can afford to reduce the actuation range by a factor of 25, then it looks like we get to within a factor of ~5 of the seismic noise at ~150Hz. 

Attachment 1: noiseComparison.pdf
noiseComparison.pdf
Attachment 2: deWhiteConfigs.zip
Attachment 3: noiseComparison_resistances.pdf
noiseComparison_resistances.pdf
  13147   Fri Jul 28 15:36:32 2017 gautamUpdateOptical LeversOptical lever tuning thoughts

Attachment #1 - Measured error signal spectrum with the Oplev loop disabled, measured at the IN1 input for ITMY. The y-axis calibration into urad/rtHz may not be exact (I don't know when this was last calibrated).

From this measurement, I've attempted to disentangle what is the seismic noise contribution to the measured plant output.

  • To do so, I first modelled the plant as a pair of complex poles @0.95 Hz, Q=3. This gave the best agreement with measurement by eye, I didn't try and optimize this too carefully. 
  • Next, I assumed all the noise between DC-10Hz comes from only seismic disturbance. So dividing the measured PSD by the plant transfer function gives the spectrum of the seismic disturbance. I further assumed this to be flat, and so I averaged it between DC-10Hz.
  • This will be a first seismic noise model to the loop shape optimizer. I can probably get a better model using the GWINC calculations but for a start, this should be good enough.

It remains to characterize various other noise sources.

Quote:

Before the CDS went down, I had taken error signal spectra for the ITMs. I will update this elog tomorrow with these measurements, as well as some noise estimates, to get started.


I have also confirmed that the "QPD" Simulink block, which is what is used for Oplevs, does indeed have the PIT and YAW outputs normalized by the SUM (see Attachment #2). This was not clear to me from the MEDM screen.


GV 30 Jul 5pm: I've included in Attachment #3 the block diagram of the general linear feedback topology, along with the specific "disturbances" and "noises" w.r.t. the Oplev loop. The measured (open loop) error signal spectrum of Attachment #1 (call it y) is given by:

y_{meas}(s) = P(s)\sum_{i=1}^{3}d_{i}(s) + \sum_{k=1}^{4}n_{k}(s)

If it turns out that one (or more) term(s) in each of the summations above dominates in all frequency bands of interest, then I guess we can drop the others. An elog with a first pass at a mathematical formulation of the cost-function for controller optimization to follow shortly.

Attachment 1: errSig.pdf
errSig.pdf
Attachment 2: QPD_simulink.png
QPD_simulink.png
Attachment 3: feedbackTopology.pdf
feedbackTopology.pdf
  13148   Fri Jul 28 16:47:16 2017 gautamUpdateGeneralPSL StripTool flatlined

About 3.5 hours ago, all the PSL wall StripTool traces "flatlined", as happens when we had the EPICS freezes in the past - except that all these traces were flat for more than 3 hours. I checked that the c1psl slow machine responded to ping, and I could also telnet into it. I tried opening the StripTool on pianosa and all the traces were responsive. So I simply re-started the PSL StripTool on zita. All traces look responsive now.

  13149   Fri Jul 28 20:22:41 2017 JamieUpdateCDSpossible stable daqd configuration with separate DC and FW

This week Jonathan Hanks and I have been trying to diagnose why the daqd has been unstable in the configuration used by the 40m, with data concentrator (dc) and frame writer (fw) in the same process (referred to generically as 'fb').  Jonathan has been digging into the core dumps and source to try to figure out what's going on, but he hasn't come up with anything concrete yet.

As an alternative, we've started experimenting with a daqd configuration with the dc and fw components running in separate processes, with communication over the local loopback interface.  The separate dc/fw process model more closely matches the configuration at the sites, although the sites put dc and fwprocesses on different physical machines.  Our experimentation thus far seems to indicate that this configuration is stable, although we haven't yet tested it with the full configuration, which is what I'm attempting to do now.

Unfortunately I'm having trouble with the mx_stream communication between the front ends and the dc process.  The dc does not appear to be receiving the streams from the front ends and is producing a '0xbad' status message for each.  I'm investigating.

  13150   Sat Jul 29 14:05:19 2017 gautamUpdateGeneralPSL StripTool flatlined

The PMC was unlocked when I came in ~10mins ago. The wall StripTool traces suggest it has been this way for > 8hours. I was unable to get the PMC to re-lock by using the PMC MEDM screen. The c1psl slow machine responded to ping, and I could also telnet into it. But despite burt-restoring c1psl, I could not get the PMC to lock. So I re-started c1psl by keying the crate, and then burt-restored the EPICS values again. This seems to have done the trick. Both the PMC and IMC are now locked.


Unrelated to this work: It looks like some/all of the FE models were re-started. The x3 gain on the coil outputs of the 2 ITMs and BS, which I had manually engaged when I re-aligned the IFO on Monday, were off, and in general, the IMC and IFO alignment seem much worse now than it was yesterday. I will do the re-alignment later as I'm not planning to use the IFO today.

  13151   Sat Jul 29 16:24:55 2017 jamieUpdateGeneralPSL StripTool flatlined
Quote:
Unrelated to this work: It looks like some/all of the FE models were re-started. The x3 gain on the coil outputs of the 2 ITMs and BS, which I had manually engaged when I re-aligned the IFO on Monday, were off, and in general, the IMC and IFO alignment seem much worse now than it was yesterday. I will do the re-alignment later as I'm not planning to use the IFO today.

This was me.  I restarted the front ends when I was getting the MX streams working yesterday.  I'll try to me more conscientious about logging front end restarts.

  13152   Mon Jul 31 15:13:24 2017 gautamUpdateCDSFB ---> FB1

[jamie, gautam]

In order to test the new daqd config that Jamie has been working on, we felt it would be most convenient for the host name "fb" (martian network IP 192.168.113.202) to point to the physical machine "fb1" (martian network IP 192.168.113.201).

I made this change in /var/lib/bind/martian.hosts on chiara, and then ran sudo service bind9 restart. It seems to have done the job. So as things stand, both hostnames "fb" and "fb1" point to 192.168.113.201.

Now, when starting up DTT or dataviewer, the NDS server is automatically found.

More details to follow.

  13153   Mon Jul 31 18:44:40 2017 JamieUpdateCDSCDS system essentially fully recovered

The CDS system is mostly fully recovered at this point.  The mx_streams are all flowing from all front ends, and from all models, and the daqd processes are receiving them and writing the data to frames:

Remaining unresolved issues:

  • IFO needs to be fully locked to make sure ALL components of all models are working.
  • The remaining red status lights are from the "FB NET" diagnostics, which are reflecting a missing status bit from the front end processes due to the fact that they were compiled with an earlier RCG version (3.0.3) than the mx_streams were (3.3+/trunk).  There will be a new release of the RTS soon, at which point we'll compile everything from the same version, which should get us all green again.
  • The entire system has been fully modernized, to the target CDS reference OS (Debian jessie) and more recent RCG versions.  The management of the various RTS components, both on the front ends and on fb, have as much as possible been updated to use the modern management tools (e.g. systemd, udev, etc.).  These changes need to be documented.  In particular...
  • The fb daqd process has been split into three separate components, a configuration that mirrors what is done at the sites and appears to be more stable: The "target" directory for all of these components is now:
    • daqd_dc: data concentrator (receives data from front ends)
    • daqd_fw: receives frames from dc and writes out full frames and second/minute trends
    • daqd_rcv: NDS1 server (raises test points and receives archive data from frames from 'nds' process)
    The "target" directory for all of these new components is:
    • /opt/rtcds/caltech/c1/target/daqd
    All of these processes are now managed under systemd supervision on fb, meaning the daqd restart procedure has changed.  This needs to be simplified and clarified.
  • Second trend frames are being written, but for some reason they're not accessible over NDS.
  • Have not had a chance to verify minute trend and raw minute trend writing yet.  Needs to be confirmed.
  • Get wiper script working on new fb.
  • Front end RTS kernel will occaissionally crash when the RTS modules are unloaded.  Keith Thorne apparently has a kernel version with a different set of patches from Gerrit Kuhn that does not have this problem.  Keith's kernel needs to be packaged and installed in the front end diskless root.
  • The models accessing the dolphin shared memory will ALL crash when one of the front end hosts on the dolphin network goes away.  This results in a boot fest of all the dolphin-enabled hosts.  Need to figure out what's going on there.
  • The RCG settings snapshotting has changed significantly in later RCG versions.  We need to make sure that all burt backup type stuff is still working correctly.
  • Restoration of /frames from old fb SCSI RAID?
  • Backup of entirety of fb1, including fb1 root (/) and front end diskless root (/diskless)
  • Full documentation of rebuild procedure from Jamie's notes.
  13155   Mon Jul 31 23:39:02 2017 ranaUpdateCOCCavity Scan Simulation Code

Hiro Yamamoto has updated SIS (Static Interferometer Simulation) to allow us to do the MCMC based inference of the 40m arm cavity mirror maps. 

The latest version is in git.ligo.org: IFOsim/SIS/

In the examples directory I have put 3 files:

  1. mcmcCavityScans.m - runs many cavity scans using parfor and saves the data
  2. plotCavityScans.m - loads the .mat file with the data and plots it
  3. plotCavityScans.py - python file which also loads & plots, but nicer since python has a transparency option for the traces.

Attached is the plots and the data. The first attached plot is a low resolution one: 200 scans of 100 frequency points each. Second plot is 200 scans of 300 points each.

The run was done assuming perfect LIGO arm params with a random set of Zernike perturbations for each run. The amplitude of each Zernike was chosen from a Normal distribution with a standard deviation of 10 nm.

We need to come up with a better guess for the initial distribution from which to sample, and also to use the more smart sampling that one does using the MCMC Hammer.

Attachment 1: manyCavityScans-SIS.pdf
manyCavityScans-SIS.pdf
Attachment 2: manyCavityScans-SIS.pdf
manyCavityScans-SIS.pdf
Attachment 3: MonteCarlo_CavityScans.mat
  13156   Tue Aug 1 16:05:01 2017 gautamUpdateOptical LeversOptical lever tuning - cost function construction

Summary:

I've been trying to put together the cost-function that will be used to optimize the Oplev loop shape. Here is what I have so far.

Details:

All of the terms that we want to include in the cost function can be derived from:

  1. A measurement of the open-loop error signal [using DTT, calibrated to urad/rtHz]. We may want a breakdown of this in terms of "sensing noises" and "disturbances" (see the previous elog in this thread), but just a spectrum will suffice for the optimal controller given the current noises.
  2. A model of the optical plant, P(s) [validated with a DTT swept-sine measurement]. 
  3. A model of the controller, C(s). Some/all of the poles and zeros of this transfer function is what the optimization algorithm will tune to satisfy the design objectives.

From these, we can derive, for a given controller, C(s):

  1. Closed-loop stability (i.e. all poles should be in the left-half of the complex plane), and exactly 2 UGFs. We can use MATLAB's allmargin function for this. An unstable controller can be rejected by assigning it an extremely high cost.
  2. RMS rrror signal suppression in the frequency band (0.5Hz - 2Hz). We can require this to be >= 15dB (say).
  3. Minimize gain peaking and noise injection - this information will be in the sensitivity function, \left | \frac{1}{1+P(s)C(s)} \right |. We can require this to be <= 10dB (say).
  4. RMS of the control signal between 10 Hz and 200 Hz, multiplied by the digital suspension whitening filter, should be <10% of the DAC range (so that we don't have problems engaging the coil de-whitening).
  5. Smallest gain margin (there will be multiple because of the various notches we have) should be > 10dB (say). Phase margin at both UGFs should be >30 degrees.
  6. Terms 1-5 should not change by more than 10% for perturbations in the plant model parameters (f0 and Q of the pendulum) at the 10% (?) level. 

We can add more terms to the cost function if necessary, but I want to get some minimal set working first. All the "requirements" I've quoted above are just numbers out of my head at the moment, I will refine them once I get some feeling for how feasible a solution is for these requirements.

Quote:

An elog with a first pass at a mathematical formulation of the cost-function for controller optimization to follow shortly.


For a start, I attempted to model the current Oplev loop. The modeling of the plant and open-loop error signal spectrum have been described in the previous elogs in this thread.

I am, however, confused by the controller - the MEDM screen (see Attachment #2) would have me believe that the digital transfer function is FM2*FM5*FM7*FM8*gain(10). However, I get much better agreement between the measured and modelled in-loop error signal if I exclude the overall gain of 10 (see Attachments #1 for the models and #3 for measurements).

What am I missing? Getting this right will be important in specifying Term #4 in the cost function...

GV Edit 2 Aug 0030: As another sanity check, I computed the whitened Oplev control signal given the current loop shape (with sub-optimal high-frequency roll-off). In Attachment #4, I converted the y-axis from urad/rtHz to cts/rtHz using the approximate calibration of 240urad/ct (and the fact that the Oplev error signal is normalized by the QPD sum of ~13000 cts), and divided by 4 to account for the fact that the control signal is sent to 4 coils. It is clear that attempting to whiten the coil driver signals with the present Oplev loop shapes causes DAC saturation. I'm going to use this formulation for Term #4 in the cost function, and to solve a simpler optimization problem first - given the existing loop shape, what is the optimal elliptic low-pass filter to implement such that the cost function is minimized? 


There is also the question of how to go about doing the optimization, given that our cost function is a vector rather than a scalar. In the coating optimization code, we converted the vector cost function to a scalar one by taking a weighted sum of the individual components. This worked adequately well.

But there are techniques for vector cost-function optimization as well, which may work better. Specifically, the question is  if we can find the (infinite) solution set for which no one term in the error function can be made better without making another worse (the so-called Pareto front). Then we still have to make a choice as to which point along this curve we want to operate at.

Attachment 1: loopPerformance.pdf
loopPerformance.pdf
Attachment 2: OplevLoop.png
OplevLoop.png
Attachment 3: OL_errSigs.pdf
OL_errSigs.pdf
Attachment 4: DAC_saturation.pdf
DAC_saturation.pdf
ELOG V3.1.3-