40m Log
ID | Date | Author | Type | Category | Subject
  13085 | Wed Jun 28 20:15:46 2017 | gautam | Update | General | c1iscex timing troubles

[Koji, gautam]

Here is a summary of what we did today to fix the timing issue on c1iscex. The power supply to the timing card in the X end expansion chassis was to blame.

  1. We prepared the Y-end expansion chassis for transport to the X end. To do so, we disconnected the following from the expansion chassis:
    • Cables going to the ADC/DAC adaptor boards
    • Dolphin connector
    • BIO connector
    • RFM fiber
    • Timing fiber
  2. We then carried the expansion chassis to the X end electronics rack. There we repeated the above steps for the X-end expansion chassis.
  3. We swapped the X and Y end expansion chassis in the X end electronics rack. Powering the unit, we immediately saw the green lights on the front of the timing card turn on, suggesting that the Y-end expansion chassis works fine at the X end as well (as it should). To further confirm that all was well, we were able to successfully start all the RT models on c1iscex without running into any timing issues.
  4. Next, we decided to verify that the spare timing card is functional, so we replaced the timing card in the Y-end expansion chassis (now at the X end) with the spare. In this test too, everything worked as expected. So at this stage, we concluded that
    • There was nothing wrong with the fiber bringing the timing signal to the X end
    • The Y-end expansion chassis works fine
    • The spare timing card works fine.
  5. Then we tried the original X-end timing card in the Y-end expansion chassis. This test too was successful, so there was nothing wrong with any of the timing cards!
  6. Next, we powered the X-end expansion chassis with its original timing card, which had just been verified to work. Surprisingly, the indicator lights on the timing card did not turn on.
  7. The timing card has 3 external connections
    • A 40 pin IDE connector
    • Power
    • Fiber carrying the timing signal
  8. We went back to the Y-end expansion chassis and checked that the indicator lights on the timing card turned on even when the 40-pin IDE connector was left unconnected (so the timing card just gets power and the timing signal).
  9. We concluded that the power supply in the X end expansion chassis was to blame. Indeed, when Koji jiggled the connector around a little, the indicator lights came on!
  10. The connection was diagnosed to be somewhat flaky: it uses screw-in terminal blocks, and one of the connections was quite loose. Koji was able to pull the cable out of its slot by applying only a little force.
  11. I replaced the cabling (swapping the wires for a thicker-gauge, more flexible variety) and re-tightened the terminal block screws. The connection was reasonably secure even when I applied some force. A quick test verified that the timing card was functional when the unit was powered.
  12. We then returned the X and Y-end expansion chassis (complete with their original timing cards, so the spare is back in the CDS cabinet) to their racks. The models started up again without complaint, and the CDS overview screen is now in a good state [Attachment #1]. The arms are now locked and aligned for maximum transmission.
  13. There was some additional difficulty in getting the 40-pin IDE connector back in on the Y-end expansion chassis. It looked like we had bent some of the pins on the timing board while pulling this cable out, but Koji was able to fix this with a screwdriver. Care should be taken when disconnecting this cable in the future!

There were a few more flaky things in the expansion chassis: the IDE connectors don't have "keys" that fix their orientation, and the whole timing card assembly is awkward to work with and not very secure. But for now, things seem to be back to normal.

Wouldn't it be nice if this fix also eliminated the mysterious ETMX glitching problem? After all, it seems like this flaky power supply has been a problem for a number of years. Let's keep an eye out.

  13086 | Thu Jun 29 00:13:08 2017 | Kaustubh | Update | Computer Scripts / Programs | Transfer Function Testing

Continuing from my previous posts, I have been working on evaluating the transfer function data. Recently, I calculated the correlation between the real and imaginary parts of the transfer function. I have also written code that plots the transfer function data stream at each frequency in the Argand plane, for reference. Finally, I used the errors in the real and imaginary parts of the transfer function to calculate the errors in magnitude and phase. More details on the process are in this git repository.
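The correlation and the magnitude/phase error calculation described above can be sketched as follows. This is a minimal, stdlib-only Python sketch, not the code in the repository: it assumes the repeated measurements at a single frequency are available as a list of complex numbers, and it propagates the Re/Im scatter to |H| and arg(H) to first order through the Jacobian (the linear approximation). The function name `mag_phase_errors` is hypothetical.

```python
import math

def mag_phase_errors(samples):
    """First-order (linear) propagation of Re/Im scatter to |H| and arg(H).

    samples: repeated complex transfer-function measurements at ONE frequency.
    Returns (|mean H|, sigma_mag, phase [rad], sigma_phase [rad], corr(Re, Im)).
    Assumes at least two samples with nonzero scatter in both Re and Im.
    """
    n = len(samples)
    re = [z.real for z in samples]
    im = [z.imag for z in samples]
    x = sum(re) / n
    y = sum(im) / n
    # Sample variances and covariance (n - 1 normalization).
    vx = sum((r - x) ** 2 for r in re) / (n - 1)
    vy = sum((q - y) ** 2 for q in im) / (n - 1)
    cxy = sum((r - x) * (q - y) for r, q in zip(re, im)) / (n - 1)
    corr = cxy / math.sqrt(vx * vy)  # Pearson correlation of Re and Im
    a2 = x * x + y * y
    # Jacobian terms: d|H|/dx = x/|H|, d|H|/dy = y/|H|,
    #                 darg/dx = -y/|H|^2, darg/dy = x/|H|^2.
    var_mag = (x * x * vx + y * y * vy + 2 * x * y * cxy) / a2
    var_phase = (y * y * vx + x * x * vy - 2 * x * y * cxy) / (a2 * a2)
    return math.sqrt(a2), math.sqrt(var_mag), math.atan2(y, x), math.sqrt(var_phase), corr
```

Because `var_phase` scales as 1/|H|^2, this estimate blows up as the mean magnitude approaches zero, which is where the linear approximation stops being trustworthy.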

The following attachments have been added:

  1. The correlation plot at different frequencies. This is for 100 data files.
  2. The test files used to produce the above plot, along with the plotting code and a text file containing the correlation values. (Most of the code is commented out, as that part wasn't needed for the recent changes.)

 

Conclusion:

Judging by the correlation values, it seems reasonable that the approximation of uncorrelated Gaussian errors in the real and imaginary parts actually holds, because the correlation values are mostly quite small. This can also be seen from the distribution of the transfer function in the Argand plane: the distribution is somewhat, if not entirely, circular. Even where it looks quite elliptical, the axes of the ellipse still appear to be aligned with the real and imaginary axes, i.e., the correlation between them is still low.

 

To Do:

  1. Find a better way to estimate the errors in magnitude and phase. The current method is only valid in the linear approximation, and gives unreasonable, out-of-bounds values when the magnitude is extremely small and the phase varies wildly.
  2. Use the errors in the transfer function to estimate the coherence of the data at each frequency point, i.e., plot coherence vs. frequency to show how the coherence of the measurements varies with frequency.
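For To Do item 1, one option (a swapped-in technique, not necessarily what will end up in the repository) is a Monte Carlo estimate: draw (Re, Im) pairs from a bivariate Gaussian with the measured means, standard deviations and correlation, and measure the scatter of |H| and arg(H) directly. Using the circular standard deviation for the phase keeps it bounded when the magnitude is small compared with the noise. The function name and default parameters below are hypothetical.

```python
import math
import random

def mc_mag_phase_errors(x, y, sx, sy, rho, n=20000, seed=0):
    """Monte Carlo estimate of magnitude/phase scatter for H = x + i*y.

    Draws (Re, Im) from a bivariate Gaussian with standard deviations sx, sy
    and correlation rho (|rho| <= 1), then measures the spread of |H| and arg(H).
    Returns (mean |H|, sigma_mag, circular mean phase [rad], circular std [rad]).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    mags = []
    cos_sum = 0.0
    sin_sum = 0.0
    for _ in range(n):
        g1 = rng.gauss(0.0, 1.0)
        g2 = rng.gauss(0.0, 1.0)
        re = x + sx * g1
        # Mixing g1 into Im gives corr(Re, Im) = rho.
        im = y + sy * (rho * g1 + math.sqrt(1.0 - rho * rho) * g2)
        mags.append(math.hypot(re, im))
        phi = math.atan2(im, re)
        cos_sum += math.cos(phi)
        sin_sum += math.sin(phi)
    mean_mag = sum(mags) / n
    sigma_mag = math.sqrt(sum((m - mean_mag) ** 2 for m in mags) / (n - 1))
    r = math.hypot(cos_sum, sin_sum) / n  # mean resultant length, 0 < r <= 1
    sigma_phase = math.sqrt(-2.0 * math.log(r))  # circular standard deviation
    return mean_mag, sigma_mag, math.atan2(sin_sum, cos_sum), sigma_phase
```

At high SNR this agrees with the linear estimate (phase scatter of roughly sigma/|H|); when the mean is buried in the noise, the phase spread saturates at a few radians instead of diverging.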

 

To test the above again with an even larger data set, I am leaving a script running on Ottavia. It should take more than just the night (I estimate around 10-11 hours) if there are no problems.

  13087 | Thu Jun 29 10:04:18 2017 | jigyasa | Update | Computer Scripts / Programs | MC2 Pitch-Yaw offset

The script is running again now.

Quote:

I worked on the code today and have left a script (MC2rerun.py) running on Ottavia which should run overnight.

 

 

  13088 | Fri Jun 30 02:13:23 2017 | gautam | Update | General | DRMI locking attempt

Summary:

I attempted to re-lock the DRMI and try and realize some of the noise improvements we have identified. Summary elog, details to follow.

  1. Locked arms, ran ASS, centered OLs on ITMs and BS on their respective QPDs.
  2. Looked into changing the BS Oplev loop shape to match that of the ITMs. It looks like the analog electronics that take in the QPD signals for the BS Oplev are a little different: the 800Hz poles are absent. But I thought I had managed to do this successfully, in that the error signal suppression improved and the modified loop didn't look worse anywhere except possibly at the stack resonance of ~3Hz (see Attachment #1, which will be rotated later). The TRX spectra before and after this modification also didn't raise any red flags.
  3. Re-aligned PRM - went to the AS table and centered beam on all REFL PDs
  4. Locked PRMI on carrier, ran MICH and AS dither alignment. PRC angular feedforward also seemed to work well.
  5. Re-aligned SRM, looked for DRMI locks - there was a brief lock of a couple of seconds, but after this, the BS behaviour changed dramatically.

Basically, after this point, I was unable to repeat things I had done earlier in the evening, just a couple of hours before. The single arm locks catch quickly and seem stable on the hour timescale, but when I run the X arm dither, the BS PITCH loop starts to oscillate at ~0.1 Hz. Moreover, I am unable to acquire PRMI carrier lock. I must have changed a setting somewhere that I am not catching right now (although I've scripted most of these things for repeatability, so I am at a loss as to what I'm missing). The only change I can think of is the BS Oplev loop shape, but I went back into the filter file archives and restored it to its original configuration. Hopefully I'll have better luck figuring this out tomorrow.
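To get a rough feel for what the missing 800Hz poles (item 2) mean for the BS Oplev loop shape, here is a small sketch that assumes they are two identical real poles with unity DC gain. The actual pole configuration in the ITM Oplev electronics is not stated above, so the pole count and the function names are assumptions and this is illustrative only.

```python
import math

def pole_phase_lag_deg(f_hz, f_pole_hz=800.0, n_poles=2):
    """Phase lag in degrees of n identical real poles at f_pole_hz, evaluated at f_hz."""
    return n_poles * math.degrees(math.atan(f_hz / f_pole_hz))

def pole_magnitude(f_hz, f_pole_hz=800.0, n_poles=2):
    """Magnitude of n identical real poles (unity DC gain) evaluated at f_hz."""
    return (1.0 + (f_hz / f_pole_hz) ** 2) ** (-n_poles / 2.0)

for f in (10.0, 50.0, 800.0):
    print(f, round(pole_phase_lag_deg(f), 2), round(pole_magnitude(f), 3))
```

Under these assumptions the pole pair costs about 1.4 degrees of phase at 10 Hz and about 7 degrees at 50 Hz, and reaches 90 degrees (half amplitude) at 800 Hz.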

  13089 | Fri Jun 30 11:08:26 2017 | jigyasa | Update | Cameras | GigE camera at ETMX
With Steve's help in getting the right depth of field for imaging and focusing on the test mass with the new AR-coated lens, and Gautam's help in locking the arm, I tried my hand at adjusting the focus of the camera yesterday, and we were able to get some images of the IR beam with the green shutter open and closed at different exposures. Since the CCD is at an angle to the optic, the exposure time had to be increased significantly (varied between 0.08 and 0.5 seconds) to capture bright images.
A few frames with the IR off and the green shutter closed were also captured. These show the OSEM and the Oplev on the test mass.
 
Steve's note: AR-coated camera lens M5018-SW installed at ~40 degrees.
                    Attachment #2: picture taken through a dirty window.
 
Quote:

Also, the GigE has been wired and connected to the Martian network. Image acquisition is possible with Pylon.

 

 

  13090 | Fri Jun 30 11:50:17 2017 | gautam | Update | General | DRMI locking attempt

Seems like the problem is actually with ITMX: the attached DV plots are for ITMX with just the local damping loops on (no OLs), and LR seems to be suspect.

I'm going to go squish cables and do the usual sat. box voodoo; hopefully that settles it.


  13091 | Fri Jun 30 15:25:19 2017 | jigyasa | Update | Cameras | GigE camera at ETMX

Thanks to Steve, we cleaned the viewport on ETMX on which the camera is installed, and with a little fine-tuning of the camera focus, here's a really good image of the beam spot at 6 ms and 14 ms exposure.


  13092 | Fri Jun 30 16:03:54 2017 | jigyasa | Update | Cameras | GigE camera at ETMX

 


  13093 | Fri Jun 30 22:28:27 2017 | gautam | Update | General | DRMI re-locked

Summary:

Reverted to old settings and tried to reproduce the DRMI lock with settings as close as possible to those used in May this year. Tonight, I succeeded in getting a couple of ~10min DRMI 1f locks. Now I can go ahead and try to reduce the noise.

I am not attempting a full characterization tonight, but the important changes since the May locks are in the de-whitening boards and coil driver boards. I did not attempt to engage the coil-dewhitening, but the PD whitening works fine.

As a quick check, I tested the hypothesis that the BS OL loop A2L coupling dominates between ~10-50Hz. The attached control signal spectra [Attachment #2] support this hypothesis. Now to actually change the loop shape.

I've centered Oplevs of all vertex optics, and also the beams on the REFL and AS PDs. The ITMs and BS have been repeatedly aligned since re-installing their respective coil driver electronics, but the SRM alignment needed some adjustment of the bias sliders.

Full characterization to follow. Some things to check:

  • Investigate and fix the suspect X-arm ASS loop
  • Is there too much power on the AS110 PD since the Oct 2016 vent? Is the PD saturating?

Lesson learnt: Don't try and change too many things at once!

GV July 5 1130am: Looks like the MICH loop gain wasn't set correctly when I took the attached spectra, seems like the bump around 300Hz was caused by this. On later locks, this feature wasn't present.

  13094 | Sat Jul 1 14:27:00 2017 | Koji | Update | General | DRMI re-locked

Basically, we use the arm cavities as the reference for the beam alignment. The incident beam is aligned such that the ITMY angle dither is minimized (at least at the dither freq).
This means that we have no ability to adjust the spot positions on the PRM, SRM, BS, and ITMX optics.

We are still able to minimize A2L by adding intentional asymmetry to the coil actuators.

  13095 | Wed Jul 5 10:23:18 2017 | Steve | Update | safety | liquid nitrogen boil off

The liquid nitrogen container has a pressure relief valve set to 35 PSI. This valve will open periodically when the container holds LN2.

The escaping gas is very cold and can cause burns, so it must not hit your eyes or skin directly. Point this valve toward the corner.

Leave the entry door open so that the nitrogen concentration cannot build up.

Oxygen deficiency
Nitrogen can displace oxygen in the air, reducing the percentage of oxygen to below safe levels. Because the brain needs a continuous supply of oxygen to remain active, lack of oxygen prevents the brain from functioning properly, and it shuts down.
Being odorless, colorless, tasteless, and nonirritating, nitrogen has no properties that can warn people of its presence. Inhalation of excessive amounts of nitrogen can cause dizziness, nausea, vomiting, loss of consciousness, and death.

  13096 | Wed Jul 5 16:09:34 2017 | gautam | Update | CDS | slow machine bootfest

Reboots for c1susaux, c1iscaux today.

 

  13097 | Wed Jul 5 19:10:36 2017 | gautam | Update | General | NB code checkout - updated

I've been making NBs on my laptop, thought I would get the copy under version control up-to-date since I've been negligent in doing so.

The code resides in /ligo/svncommon/NoiseBudget, which as a whole is a git directory. For neatness, most of Evan's original code has been put into the sub-directory  /ligo/svncommon/NoiseBudget/H1NB/, while my 40m NB specific adaptations of them are in the sub-directory /ligo/svncommon/NoiseBudget/NB40. So to make a 40m noise budget, you would have to clone and edit the parameter file accordingly, and run python C1NB.py C1NB_2017_04_30.py for example. I've tested that it works in its current form. I had to install a font package in order to make the code run (with sudo apt-get install tex-gyre ), and also had to comment out calls to GwPy (it kept throwing up an error related to the package "lal", I opted against trying to debug this problem as I am using nds2 instead of GwPy to get the time series data anyways).

There are a few things I'd like to implement in the NB like sub-budgets, I will make a tagged commit once it is in a slightly neater state. But the existing infrastructure should allow making of NBs from the control room workstations now.
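At its core, a noise budget adds uncorrelated noise contributions in quadrature, and a sub-budget is just the quadrature sum of a subset of terms, which can itself be treated as a single term in the parent budget. A minimal sketch of that bookkeeping, assuming every ASD is already sampled on a common frequency grid; the function name, term names and toy numbers are hypothetical and not taken from Evan's code:

```python
import math

def total_noise(asds):
    """Quadrature sum of uncorrelated noise ASDs on a common frequency grid.

    asds: dict mapping a noise-term name to a list of ASD values (all the same length).
    Returns a list with the total ASD in each frequency bin.
    """
    terms = list(asds.values())
    n_bins = len(terms[0])
    if any(len(t) != n_bins for t in terms):
        raise ValueError("all ASDs must share the same frequency grid")
    return [math.sqrt(sum(t[k] ** 2 for t in terms)) for k in range(n_bins)]

# Toy numbers: a sub-budget enters the parent budget as a single term.
sub_budget = total_noise({"term_a": [3.0, 0.0], "term_b": [4.0, 1.0]})
total = total_noise({"sub_budget": sub_budget, "term_c": [12.0, 0.0]})
print(total)  # [13.0, 1.0]
```

Because the quadrature sum is associative, grouping terms into sub-budgets does not change the overall total.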

Quote:

[evan, gautam]

We spent some time trying to get the noise-budgeting code running today. I guess eventually we want this to be usable on the workstations so we cloned the git repo into /ligo/svncommon. The main objective was to see if we had all the dependencies for getting this code running already installed. The way Evan has set the code up is with a bunch of dictionaries for each of the noise curves we are interested in - so we just commented out everything that required real IFO data. We also commented out all the gwpy stuff, since (if I remember right) we want to be using nds2 to get the data. 

Running the code with just the gwinc curves produces the plots it is supposed to, so it looks like we have all the dependencies required. It now remains to integrate actual IFO data; I will try and set up the infrastructure for this using the archived frame data from the 2016 DRFPMI locks.

 

  13098 | Thu Jul 6 11:58:28 2017 | jigyasa | Update | Cameras | HDR images of ETMX

I captured a few images of the beam spot on ETMX at 5ms, 10ms, 14ms, 50ms, 100ms, 500ms, 1000ms exposure and ran them through my python script for HDR images. Here's what I obtained. 
The resulting image is an improvement over the highly saturated images at, say, 500ms and 1 second exposures.
I have also included a colormapped version of the image.
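For reference, one common way to merge bracketed exposures like these into an HDR image is a weighted average of pixel value divided by exposure time. The sketch below assumes a linear sensor response, an 8-bit saturation level of 255, and a triangular ("hat") weighting; it is not the script used above, and the function name is hypothetical.

```python
def merge_hdr(frames, exposures_s, saturation=255, floor=0):
    """Merge bracketed exposures into a relative-radiance map (linear sensor assumed).

    frames: list of images, each a flat list of pixel values (all the same length).
    exposures_s: exposure time of each frame, in seconds.
    Each pixel's radiance is a weighted mean of value / exposure over the frames.
    Clipped samples (at or above saturation, at or below floor) get zero weight,
    and a hat weight favours mid-range values over ones near either end.
    """
    mid = 0.5 * (saturation + floor)
    out = []
    for k in range(len(frames[0])):
        num = 0.0
        den = 0.0
        for frame, t in zip(frames, exposures_s):
            v = frame[k]
            if v >= saturation or v <= floor:
                continue  # clipped sample carries no radiance information
            w = 1.0 - abs(v - mid) / (mid - floor)  # hat weight, peaks at mid-range
            num += w * v / t
            den += w
        out.append(num / den if den > 0 else float("nan"))
    return out
```

Clipped samples are dropped, so a pixel that saturates in the long exposures is recovered from the short ones; a pixel that is clipped in every frame comes out as NaN.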

  13100 | Fri Jul 7 14:34:27 2017 | rana | Update | Cameras | HDR images of ETMX

i wonder how 'HDR' these images really are. is there a quantitative way to check that we are really getting more bits? also, how many bits does the PNG format allow for monochrome images? i worry that these elog images are already lossy.
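One quantitative check along these lines is to compute how many bits of dynamic range the merged radiance map actually spans, and whether those levels survive quantization to the output format. PNG allows 1, 2, 4, 8 or 16 bits per sample for grayscale images, so a 16-bit grayscale PNG can in principle hold more levels than an 8-bit camera frame. A minimal sketch with hypothetical helper names and toy values:

```python
import math

def dynamic_range_bits(values):
    """log2 of the ratio between the largest and smallest positive finite values."""
    pos = [v for v in values if v > 0 and math.isfinite(v)]
    return math.log2(max(pos) / min(pos))

def quantize(values, bits):
    """Linearly map values onto 2**bits integer levels (e.g. 8- or 16-bit PNG samples)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    levels = 2 ** bits - 1
    return [round((v - lo) / (hi - lo) * levels) for v in values]

radiance = [1.0, 2.0, 3.0, 1000.0]  # toy radiance values, not measured data
print(dynamic_range_bits(radiance))
print(len(set(quantize(radiance, 8))), len(set(quantize(radiance, 16))))
```

In this toy example the map spans about 10 bits, and the four distinct levels collapse to three when linearly quantized to 8 bits but stay distinct at 16 bits.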

 

  13101 | Sat Jul 8 17:09:50 2017 | gautam | Update | General | ETMY TRANS QPD anomaly

About 2 weeks ago, I noticed some odd behaviour of the LSC TRY data stream. Its DC value seems to be drifting ~10x more than TRX. Both signals come from the transmission QPDs. At the time, we were dealing with various CDS FE issues but things have been stable on that end for the last two weeks, so I looked into this a bit more today. It seems like one particular channel is bad - Quadrant 4 of the ETMY TRANS QPD. Furthermore, there is a bump around 150Hz, and some features above 2kHz, that are only present for the ETMY channels and not the ETMX ones.

Since these spectra were taken with the PSL shutter closed and all the lab room lights off, it would suggest something is wrong in the electronics - to be investigated.

The drift in TRY can be as large as 0.3 (with 1.0 being the transmitted power in the single arm lock). This seems unusually large, indeed we trigger the arm LSC loops when TRY > 0.3. Attachment #2 shows the second trend of the TRX and TRY 16Hz EPICS channels for 1 day. In the last 12 hours or so, I had left the LSC master switch OFF, but the large drift of the DC value of TRY is clearly visible.

In the short term, we can use the high-gain THORLABS PD for TRY monitoring.

  13102 | Sun Jul 9 08:58:07 2017 | rana | Update | General | ETMY TRANS QPD anomaly

Indeed, the whole point of the high/low gain setup is to never use the QPDs for the single arm work. Only use the high gain Thorlabs PD and then the switchover code uses the QPD once the arm powers are >5.

I don't know how the operation procedure went so higgledy piggledy.
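The switchover described above can be sketched as follows. This is a hypothetical illustration, not the actual 40m switchover code: the threshold of 5 comes from the note above, while the lower hand-back threshold (hysteresis, to avoid chattering near the threshold) and all names are assumptions.

```python
def select_trans_signal(pd_high_gain, qpd_sum, using_qpd, switch_up=5.0, switch_down=4.0):
    """Choose the arm-transmission signal (hypothetical sketch).

    Use the high-gain Thorlabs PD at low arm power and hand over to the QPD once
    the PD reading exceeds switch_up; hand back only when the QPD reading drops
    below switch_down (assumed hysteresis). Returns (signal, using_qpd).
    """
    if not using_qpd and pd_high_gain > switch_up:
        using_qpd = True
    elif using_qpd and qpd_sum < switch_down:
        using_qpd = False
    return (qpd_sum if using_qpd else pd_high_gain), using_qpd
```

The caller keeps the `using_qpd` state between calls, so single-arm work (arm power well below 5) always uses the high-gain PD and never touches the QPDs.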

  13103 | Mon Jul 10 09:49:02 2017 | gautam | Update | General | All FEs down

Attachment #1: State of the CDS overview screen as of 9:30 AM this morning, when I came in.

Looks like there may have been a power glitch, although judging by the wall StripTool traces, if there was one, it happened more than 8 hours ago. FB is down at the moment, so I can't trend to find out when this happened.

All FEs and FB are unreachable from the control room workstations, but Megatron, Optimus and Chiara are all ssh-able. The latter reports an uptime of 704 days, so all seems okay with its UPS. Slow machines are all responding to ping as well as telnet.

Recovery process to begin now. Hopefully it isn't as complicated as the most recent effort [FAMOUS LAST WORDS]

  13104 | Mon Jul 10 11:20:20 2017 | gautam | Update | General | All FEs down

I am unable to get FB to reboot to a working state. A hard reboot throws it into a loop of "Media Test Failure. Check Cable".

Jetstor RAID array is complaining about some power issues, the LCD display on the front reads "H/W Monitor", with the lower line cycling through "Power#1 Failed", "Power#2 Failed", and "UPS error". Going to 192.168.113.119 on a martian machine browser and looking at the "Hardware information" confirms that System Power #1 and #2 are "Failed", and that the UPS status is "AC power loss". So far I've been unable to find anything on the elog about how to handle this problem, I'll keep looking.


In fact, looks like this sort of problem has happened in the past. It seems one power supply failed back then, but now somehow two are down (but there is a third which is why the unit functions at all). The linked elog thread strongly advises against any sort of power cycling. 

  13105 | Mon Jul 10 17:13:21 2017 | jigyasa | Update | Computer Scripts / Programs | Capture image without pylon GUI

Over the day, I have been working on a C++ program that interfaces with Pylon to capture images, reducing dependence on the Pylon GUI. The program uses the Pylon header files along with the OpenCV headers. While a Python wrapper for the program may ultimately be developed, the current C++ program at

/users/jigyasa/GigEcode/Grab/Grab.cpp, once compiled to the object file Grab.o and linked as

g++ -Wl,--enable-new-dtags -Wl,-rpath,/opt/pylon5/lib64 -o Grab Grab.o -L/opt/pylon5/lib64 -Wl,-E -lpylonbase -lpylonutility -lGenApi_gcc_v3_0_Basler_pylon_v5_0 -lGCBase_gcc_v3_0_Basler_pylon_v5_0 `pkg-config opencv --cflags --libs`

produces an executable file named Grab, which can be run as ./Grab

This captures one image from the camera and displays it, additionally it also displays the gray value of the first pixel.

I am working on adding more utility to the program such as manually adjusting exposure, gain and also on the python wrapper (Cython has been installed locally on Ottavia for the purpose)!

  13106 | Mon Jul 10 17:46:26 2017 | gautam | Update | General | All FEs down

A bit more digging on the diagnostics page of the RAID array reveals that the two power supplies actually failed on Jun 2 2017 at 10:21:00. Not surprisingly, this was the date and approximate time of the last major power glitch we experienced. Apart from this, the only other error listed on the diagnostics page is "Reading Error" on "IDE CHANNEL 2", but these errors precede the power supply failure.

Perhaps the power supplies are not really damaged, and it's just in some funky state since the power glitch. After discussing with Jamie, I think it should be safe to power cycle the Jetstor RAID array once the FB machine has been powered down. Perhaps this will bring back one or both of the faulty power supplies. If not, we may have to get new ones.

The problem with FB may or may not be related to the state of the Jetstor RAID array. It is unclear to me at what point during the boot process we are getting stuck. It may be that because the RAID disk is in some funky state, the boot process is getting disrupted.


  13107 | Mon Jul 10 19:15:21 2017 | gautam | Update | General | All FEs down

The Jetstor RAID array is back in its nominal state now, according to the web diagnostics page. I did the following:

  1. Powered down the FB machine - to avoid messing around with the RAID array while the disks are potentially mounted.
  2. Turned off all power switches on the back of the Jetstor unit - there were 4 of them, all of them were toggled to the "0" position.
  3. Disconnected all power cords from the back of the Jetstor unit - there were 3 of them.
  4. Reconnected the power cords, turned the power switches back on to their "1" position.

After a couple of minutes, the front LCD display seemed to indicate that it had finished running some internal checks. The messages indicating failure of the power units, which were previously displayed constantly on the front LCD panel, were no longer shown. Going back to the control room and checking the web diagnostics page, everything seemed back to normal.

However, FB still will not boot up. The error is identical to that discussed in this thread by Intel. It seems FB is having trouble finding its boot disk. I was under the impression that only the FE machines were diskless, and that FB had its own local boot disk - in which case I don't know why this error is showing up. According to the linked thread, it could also be a problem with the network card/cable, but I saw both lights on the network switch port FB is connected to turn green when I powered the machine on, so this seems unlikely. I tried following the steps listed in the linked thread but got nowhere, and I don't know enough about how FB is supposed to boot up, so I am leaving things in this state now. 

  13108 | Mon Jul 10 21:03:48 2017 | jamie | Update | General | All FEs down

 

It's possible the fb BIOS got into a weird state. fb definitely has its own local boot disk (*not* diskless boot). Try to get into the BIOS during boot and make sure it's pointing to its local disk to boot from.

If that's not the problem, then it's also possible that fb's boot disk got fried in the power glitch.  That would suck, since we'd have to rebuild the disk.  If it does seem to be a problem with the boot disk then we can do some invasive poking to see if we can figure out what's up with the disk before rebuilding.

  13110 | Mon Jul 10 22:07:35 2017 | Koji | Update | General | All FEs down

I think this is a boot disk failure. I put the spare 2.5 inch disk into slot #1. The OK indicator of the disk became solid green almost immediately, and it was recognized by the BIOS in the boot section as "Hard Disk". By contrast, the original disk in slot #0 kept its "OK" indicator flashing, and the BIOS can't find the hard disk.

 

  13111 | Tue Jul 11 15:03:55 2017 | gautam | Update | General | All FEs down

Jamie suggested verifying that the problem is indeed with the disk and not with the controller, so I tried switching the original boot disk to Slot #1 (from Slot #0 where it normally resides), but the same problem persists - the green "OK" indicator light keeps flashing even in Slot #1, which was verified to be a working slot using the spare 2.5 inch disk. So I think it is reasonable to conclude that the problem is with the boot disk itself.

The disk is a Seagate Savvio 10K.2 146GB disk. The datasheet doesn't explicitly suggest any recovery options, but Table 24 on page 54 suggests that a blinking LED means that the disk is "spinning up or spinning down". Is this indicative of any particular failure mode? Any ideas on how to go about recovery? Is it even possible to access the data on the disk if it doesn't spin up to the nominal operating speed?


  13112 | Tue Jul 11 15:12:57 2017 | Koji | Update | General | All FEs down

If we have a SATA/USB adapter, we can test whether the disk is still responding. If it is, we can probably salvage the files.
Chiara used to have a 2.5" disk connected via USB3. As far as I know, we have remote and local backup scripts running (TBC), so we can borrow the USB/SATA interface from Chiara.

If the disk is completely gone, we need to rebuild the disk according to Jamie, and I don't know how to do that. (Don't we have any spare copy?)

  13113 | Wed Jul 12 10:21:07 2017 | gautam | Update | General | All FEs down

Seems like the connector on this particular disk is of the SAS variety (and not SATA). I'll ask Steve to order a SAS to USB cable. In the meantime I'm going to see if the people at Downs have something we can borrow.


  13114 | Wed Jul 12 14:46:09 2017 | gautam | Update | General | All FEs down

I couldn't find an external docking setup for this SAS disk, seems like we need an actual controller in order to interface with it. Mike Pedraza in Downs had such a unit, so I took the disk over to him, but he wasn't able to interface with it in any way that allows us to get the data out. He wants to try switching out the logic board, for which we need an identical disk. We have only one such spare at the 40m that I could locate, but it is not clear to me whether this has any important data on it or not. It has "hda RTLinux" written on its front panel with a sharpie. Mike thinks we can back this up to another disk before trying anything, but he is going to try locating a spare in Downs first. If he is unsuccessful, I will take the spare from the 40m to him tomorrow, first to be backed up, and then for swapping out the logic board.

After chatting with Jamie and Koji, it looks like our options are:

  1. Get the data from the old disk, copy it to a working one, and try to revert the original FB machine to its last working state. This assumes we can somehow transfer all the data from the old disk to a working one.
  2. Prepare a fresh boot disk, load the old FB daqd code (which is backed up on Chiara) onto it, and try to get that working. But Jamie isn't very optimistic about this working, because of possible conflicts between the code and any current OS we would install.
  3. Get FB1 working. Jamie is looking into this right now.

  13115   Wed Jul 12 14:52:32 2017 jamieUpdateGeneralAll FEs down

I just want to mention that the situation is actually much more dire than we originally thought. The diskless NFS root filesystem for all the front-ends was on that fb disk. If we can't recover it, we'll have to rebuild the front end OS as well.

As of right now none of the front ends are accessible, since obviously their root filesystem has disappeared.

  13117   Fri Jul 14 17:47:03 2017 gautamUpdateGeneralDisks from LLO have arrived

[jamie, gautam]

This morning, the disks from LLO arrived. Jamie and I have been trying to get things back up and running, but have not had much success today. Here is a summary of what we tried.

Keith Thorne sent us two disks: one has the daqd code and the second is the boot disk for the FE machines. Since Jamie managed to successfully compile the daqd code on FB1 yesterday, we decided to try the following: mount the boot disk KT sent us (using a SATA/USB adapter) on /mnt on FB1, get the FEs booted up, and restart the RT models. 

While on FB1, Jamie realized he actually had a copy of the /diskless/root directory, which is the NFS root filesystem for the FEs. So we decided to try to boot some of the FEs with this (instead of starting from scratch with the disks KT sent us). Previously, the FEs queried the FB machine as their DHCP server; today, we followed the instructions here to have the FEs get their IP addresses from chiara instead. We also added the line 

/diskless/root *(sync,rw,no_root_squash,no_all_squash,no_subtree_check)

to /etc/exports on FB1, followed by exportfs -ra. At this point, the FE machine we were testing (c1lsc) was able to boot up. 
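The export entry above has three parts: the exported directory, the allowed clients (`*` matches any host), and a comma-separated option list. As a minimal sketch, a hypothetical Python helper (not an existing 40m script) can split such a line and flag missing options. The `REQUIRED` set is an assumption: a writable diskless root needs `rw`, and `no_root_squash` keeps the front ends' root user from being mapped to an anonymous user on the server. `exportfs` remains the authority on what a line actually means; this only handles the simple `PATH CLIENT(options)` form used here.

```python
def parse_export_line(line):
    """Split a simple /etc/exports entry of the form 'PATH CLIENT(opt1,opt2,...)'."""
    path, client_spec = line.split()[:2]
    client, _, opts = client_spec.partition("(")
    options = set(opts.rstrip(")").split(",")) if opts else set()
    return path, client, options


# Hypothetical set of options expected for a writable diskless root export.
REQUIRED = {"rw", "sync", "no_root_squash"}


def missing_options(line, required=REQUIRED):
    """Return the required options that the export line does not set, sorted."""
    _, _, options = parse_export_line(line)
    return sorted(required - options)


line = "/diskless/root *(sync,rw,no_root_squash,no_all_squash,no_subtree_check)"
print(parse_export_line(line))
print(missing_options(line))
```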

However, it looks like the NFS filesystem isn't being mounted correctly, for reasons unknown. We commented out some of the rtcds related lines in /etc/rc.local because they were causing a whole bunch of errors at boot (the lines that were touched have been tagged with today's date).


So in summary, the status as of now is:

  1. Front-end machines are able to boot
  2. There seems to be some problem during the boot process that leads to the NFS filesystem not being mounted correctly. The closest related thing I could find from an elog search is this entry, but I think we are facing a different problem.
  3. We wanted to see if we could start the realtime models (but without daqd for now), but we weren't even able to get that far today.

We will resume recovery efforts on Monday.

  13118   Sat Jul 15 01:28:53 2017 jigyasaUpdateCamerasBRDF Calibrations

This evening, Gautam helped me with setting up the apparatus for calibrating the GigE for BRDF measurements.
We chose the SP table for the experiment, so a few things, including a laser and a power meter (presumably set up by Steve), had to be moved around.

We started by setting up the Crysta laser with its power source (Crysta #2, 150-190 mW, 1064 nm) on the SP table, and used the Ophir power meter to measure the laser power. The laser turned out to be highly unstable: its output on the power meter fluctuated, roughly periodically, between 40 and 150 mW, and the brightness of the beam spot on a beam card visibly changed as well. So we decided to use another 1064 nm laser instead.
Gautam got the LightWave NPro laser from the PSL table and set it up on the SP table and with this laser the output as measured by the same power meter was quite stable.

We manually adjusted the power to around 150 mW. Next, Gautam very gently and precisely set up the half-wave plate (HWP) and the polarizing beam splitter (PBS), explaining to me how to handle the optics along the way.
On first installing the PBS, we found that the beam was already strongly polarized: there was essentially zero transmission and a strong reflection.
With the HWP in place, we can control the transmitted intensity. The reflected beam is directed to a beam dump.
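The HWP/PBS pair acts as a variable attenuator. As a sketch, assuming ideal optics and an input beam that is purely s-polarized (consistent with the near-zero PBS transmission seen before the HWP went in), rotating the HWP by θ from the input polarization rotates the polarization by 2θ, so the transmitted fraction is sin²(2θ). The helper names below are hypothetical.

```python
import math


def transmitted_power(p_in, hwp_angle_deg):
    """Ideal HWP + PBS with purely s-polarized input (fully reflected at 0 deg).

    The HWP rotates the polarization by twice its angle, so the transmitted
    (p-polarized) fraction is sin^2(2 * theta).
    """
    return p_in * math.sin(math.radians(2 * hwp_angle_deg)) ** 2


def hwp_angle_for_power(p_in, p_target):
    """HWP angle in degrees, measured from the input polarization, giving p_target."""
    if not 0 <= p_target <= p_in:
        raise ValueError("target power must be between 0 and the input power")
    return math.degrees(0.5 * math.asin(math.sqrt(p_target / p_in)))


# Example: half of a 150 mW beam transmitted through the PBS.
angle = hwp_angle_for_power(150.0, 75.0)
print(f"HWP angle: {angle:.1f} deg -> {transmitted_power(150.0, angle):.1f} mW")
```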
I have taken down the GigE(+mount) at ETMX and wired a spare PoE injector.
We tried to interface with the camera wirelessly through the wireless network extenders, but the connection to the GigE was unstable: single shots worked, but continuous shots did not.

The GigE was then connected to the Martian network via an Ethernet cable, and images were observed in continuous-shot mode with the Pylon Viewer app on Paola. 

We deliberated over the need for a beam expander, but have omitted it for now. White printer paper is currently being used as a model Lambertian scatterer, and the light scattered off the paper was observed at a distance of about 40 cm from the sample.
While proceeding with the calibrations tonight, we ran into a few challenges.

While the CCD can see the beam spot perfectly well, measuring the actual scattered power with the power meter is tricky. Because the scattered power is so low, we can't see any spot on a beam card, so we can't be sure that the whole spot falls on the active area of the power meter (placed ~40 cm from the paper) rather than losing some light, all while keeping the power meter and the CCD in the same plane.

We tried to think of some ways around that, the description of which will follow. Any ideas would be greatly appreciated.

Thanks a ton for all your patience and help Gautam! :) 

More to follow.

  13119   Sat Jul 15 13:40:59 2017 ranaUpdateCamerasBRDF Calibrations

Power meter only needed to measure power going into the paper not out. We use the BRDF of paper to estimate the power going out given the power going in.

  13120   Sat Jul 15 16:19:00 2017 gautamUpdateCamerasMakeshift PyPylon

Some days ago, I stumbled upon this GitHub page by a grad student at KIT, who developed the code while working with Basler GigE cameras. Since we are having trouble installing SnapPy, I figured I'd give this package a try. Installation took ~10 mins, and while the documentation isn't great, basic use is simple: for instance, I was able to adjust the exposure time and capture an image, all from Pianosa. The attachment is a rendering of the captured image made with a built-in plotting function. It shows a piece of paper with some scribbles on it near Jigyasa's BRDF measurement setup on the SP table; it should be straightforward to export the images in any format we like. I believe the axes are pixel indices.

Of course this is only a temporary solution as I don't know if this package will be amenable to interfacing with EPICS servers etc, but seems like a useful tool to have while we figure out how to get SnapPy working. For instance, the HDR image capture routine can now be written entirely as a Python script, and executed via an MEDM button or something.

A rudimentary example file can be found at /opt/rtcds/caltech/c1/scripts/GigE/PyPylon/examples - some of the dictionary keywords to access various properties of the camera (e.g. Exposure time) are different, but these are easy enough to figure out.
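The HDR capture routine mentioned above boils down to grabbing frames at several exposure times and merging them. The acquisition calls are deliberately left out, since the package's API for this isn't documented here; the merge step below is a numpy-only sketch that assumes a linear sensor response and a 12-bit pixel format (saturation at 4095, an assumption). For each pixel it keeps only the frames where that pixel is neither saturated nor zero, and estimates the rate as total counts over total exposure time. The function name is hypothetical.

```python
import numpy as np


def merge_exposures(frames, exposures_us, saturation=4095):
    """Merge frames of one scene taken at different exposures into counts per microsecond.

    For each pixel, only frames where it is neither saturated nor zero are used,
    and the estimate is (sum of counts) / (sum of exposure times).
    Pixels with no usable frame are returned as NaN.
    """
    frames = [np.asarray(f, dtype=float) for f in frames]
    counts = np.zeros_like(frames[0])
    time_us = np.zeros_like(frames[0])
    for frame, t_us in zip(frames, exposures_us):
        usable = (frame > 0) & (frame < saturation)
        counts[usable] += frame[usable]
        time_us[usable] += t_us
    rate = np.full_like(counts, np.nan)
    np.divide(counts, time_us, out=rate, where=time_us > 0)
    return rate
```

Frames from the camera would be passed in as 2D arrays together with the exposure time each was taken at.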

 

  13121   Sun Jul 16 11:58:36 2017 jigyasaUpdateCamerasBRDF Calibrations

 

From what I understood from my reading of Large-angle scattered light measurements for quantum-noise filter cavity design studies (https://arxiv.org/abs/1204.2528), we do the white paper test in order to calibrate the radiometric response, i.e. the response of the CCD sensor to radiance: ‘We convert the image counts measured by the CCD camera into a calibrated measure of scatter. To do this we measure the scattered light from a diffusing sample twice, once with the CCD camera and once with a calibrated power meter. We then compare their readings.’

But thinking about this further, if we assume that the BRDF remains unscaled and estimate the scattered power from the images, we get a calibration factor for the scattered power and the angle dependence of the scattered power!


  13122   Sun Jul 16 12:09:47 2017 jigyasaUpdateCamerasBRDF Calibrations

With this idea in mind, we can now take images of the illuminated paper at different scattering angles and assume the BRDF is constant at 1/π per steradian.

The scattered power is then Ps = BRDF * Pi * cos(θ) * Ω, where Pi is the incident power, Ω is the solid angle subtended by the camera, and θ is the scattering angle at which the measurement is taken. This must also equal the sum of the pixel counts divided by the exposure time, multiplied by some calibration factor.

From these two equations we can obtain the calibration factor of the CCD. For subsequent BRDF measurements, we then scale the pixel counts per unit exposure time by this calibration factor.  
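The two equations above combine into a short calculation. This is a minimal sketch: the function names are hypothetical, the camera's solid angle uses a small-angle approximation for a circular aperture, and the default `brdf=1/math.pi` follows the ideal-Lambertian assumption stated above. The example numbers (150 mW incident, a 1 cm aperture at 0.4 m) are illustrative only.

```python
import math


def camera_solid_angle(aperture_diameter_m, distance_m):
    """Solid angle of a circular aperture seen from the sample (small-angle approximation)."""
    return math.pi * (aperture_diameter_m / 2) ** 2 / distance_m ** 2


def scattered_power(p_in, theta_deg, omega_sr, brdf=1 / math.pi):
    """Ps = BRDF * Pi * cos(theta) * Omega."""
    return brdf * p_in * math.cos(math.radians(theta_deg)) * omega_sr


def calibration_factor(p_in, theta_deg, omega_sr, total_counts, exposure_s):
    """Watts per (count/second), from one image of the white-paper target."""
    return scattered_power(p_in, theta_deg, omega_sr) / (total_counts / exposure_s)


def measured_brdf(cal, total_counts, exposure_s, p_in, theta_deg, omega_sr):
    """Apply the calibration to an image of another sample to estimate its BRDF (1/sr)."""
    ps = cal * total_counts / exposure_s
    return ps / (p_in * math.cos(math.radians(theta_deg)) * omega_sr)


omega = camera_solid_angle(0.01, 0.4)
print(f"Omega = {omega:.3e} sr, Ps at normal incidence = {scattered_power(0.15, 0, omega):.3e} W")
```

One white-paper image gives the calibration factor; `measured_brdf` then applies it to images of another sample taken with the same camera geometry.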


  13123   Mon Jul 17 16:22:01 2017 SteveUpdateSUSruby wire standoff pictures

Bluebean Optical Tech Limited of Shanghai delivered 50 red ruby prisms with a radius. The first prism pictures were taken on June 5 and were later retaken as BB#1.

More samples were selected at random, one from each bag of 5, and labeled BB#2 through BB#6.

The R10 mm radius can be seen against the ruler edge. The v-groove edge was marked with a blue marker, and pictures were taken from both sides of this ridge. The top view shows the wire lying across it.

A 43 micron OD SOS suspension wire was used for scale calibration, placed close to the side that was in focus.

The V-groove ridge surface quality was evaluated on a scale of 1-10, with 10 being the best.

BB#   Edge quality score (1-10)
1     4
2     8
3     3
4     9.5
5     2
6     9

Remaining task: take a picture, from the side, of the ridge that contacts the SOS.

  13124   Wed Jul 19 00:59:47 2017 gautamUpdateGeneralFINESSE model of DRMI (no arms)

Summary:

I've been working on improving the 40m FINESSE model I set up sometime last year (where the goal was to model various RC folding mirror scenarios). Specifically, I wanted to get the locking feature of FINESSE working, and also simulate the DRMI (no arms) configuration, which is what I have been working on locking the real IFO to. This elog is a summary of what I have from the last few days of working on this.

Model details:

  • No IMC included for now.
  • Core optics R and T from the 40m wiki page.
  • Cavity lengths are the "ideal" ones - see the attached ipynb for the values used.
  • RF modulation depths from here. But for now, the relative phase between f1 and f2 at the EOM is set to 0.
  • I've not included flipped folding mirrors - instead, I put a loss of 0.5% on PR3 and SR3 in the model to account for the AR surface of these optics being inside the RCs. 
  • I've made the AR surfaces of all optics FINESSE "beamsplitters" - there was some discussion on the FINESSE mattermost channel about how not doing this can lead to slightly inaccurate results, so I've tried to be more careful in this respect.
  • I'm using "maxtem 1" in my FINESSE file, which means TEM_mn modes up to (m+n=1) are taken into account - setting this to 0 makes it a plane wave model. This parameter can significantly increase the computational time. 

Model validation:

  • As a first check, I made the PRM and SRM transparent, and used the in-built routines in FINESSE to mode-match the input beam to the arm cavities.
  • I then scanned one arm cavity about a resonance and compared the transmission profile to the analytical FP cavity expression; the agreement was good.
  • Next, I wanted to get a sensing matrix for the DRMI (no arms) configuration (see attached ipynb notebook).
    • First, I made the ETMs in the model transparent.
    • I started with the phases for the BS, PRM and SRM set to their "naive" values of 0, 0 and 90 (for the standard DRMI configuration)
    • I then scanned these optics around, used various PDs to look at the points where appropriate circulating fields reached their maximum values, and updated the phase of the optic with these values.
    • Next, I set the demod phase of various RFPDs such that the PDH error signal is entirely in one quadrature. I use the RFPDs in pairs, with demod phases separated by 90 degrees. I arbitrarily set the demod phase of the Q phase PD as 90 + phase of I phase PD. I also tried to mimic the RFPD-IFO DoF pairing that we use for the actual IFO - so for example, PRCL is controlled by REFL11_I.
    • Confident that I was close enough to the ideal operating point, I then fed the error signals from these RFPDs to the "lock" routine in FINESSE. The manual recommends setting the locking loop gain to 1/optical gain, which is what I did.
    • The tunings for the BS and RMs in the attached kat file are the result of this procedure.
    • For the actual sensing matrix, I moved each of PRM, BS and SRM +/-5 degrees (~15nm) around each resonance. I then computed the numerical derivative around the zero crossing of each RFPD signal, and then plotted all of this in some RADAR plots - see Attachment #1.
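The slope computation in the last step can be sketched as follows. It converts FINESSE mirror tunings to displacement (360 degrees corresponds to one wavelength, consistent with the 5 degrees ~ 15 nm quoted above) and fits a straight line to the error signal within a small window around the operating point. This is a sketch, not the notebook's code: it assumes the sweep tunings are offsets from the operating point, and a local linear fit stands in for the numerical derivative. Function names are hypothetical.

```python
import numpy as np

WAVELENGTH = 1064e-9  # m


def tuning_to_meters(tuning_deg, wavelength=WAVELENGTH):
    """FINESSE-style mirror tuning: 360 degrees corresponds to one wavelength of displacement."""
    return np.asarray(tuning_deg, dtype=float) * wavelength / 360.0


def sensing_element(tuning_deg, signal_w, half_width_deg=1.0):
    """Slope (W/m) of an error signal around the operating point at zero tuning.

    Fits a straight line to the points within +/- half_width_deg of zero,
    i.e. well inside the linear region of the error signal.
    """
    tuning_deg = np.asarray(tuning_deg, dtype=float)
    signal_w = np.asarray(signal_w, dtype=float)
    near = np.abs(tuning_deg) <= half_width_deg
    if near.sum() < 2:
        raise ValueError("need at least two points near the operating point")
    slope, _intercept = np.polyfit(tuning_to_meters(tuning_deg[near]), signal_w[near], 1)
    return slope
```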

Explanation of Attachments and Discussion:

  • Attachment #1 - Computed sensing matrix from this model. Compare to an actual measurement, for example here - the relative angles between the sensing matrix elements don't exactly line up with what is measured. EQ suggested today that I should look into tuning the relative phase between the RF frequencies at the EOM. Nevertheless, I tried comparing the magnitudes of the MICH sensing element in AS55 Q: the model says it should be ~7.8*10^5 W/m, while in this elog I measured it to be 2.37*10^5 W/m. On the AS table, there is a 50-50 BS splitting the light between the AS55 and AS110 photodiodes, which is not accounted for in the model. Factoring this in, along with the 6 in-vacuum steering mirrors (assuming 98% reflectivity for these), the 3 in-air steering mirrors, and the window, the sensing matrix element from the model comes out at ~3*10^5 W/m, in the same ballpark as the measurement. So the model isn't giving completely crazy results.
  • Attachment #2 - Example of the signals at various RFPDs in response to sweeping the PRM around its resonance. To be compared with actual IFO data. Teal lines are the "I" phase, and orange lines are "Q" phase.
  • Attachment #3 - FINESSE kat file and the IPython notebook I used to make these plots. 
  • Next steps
    • More validation against measurements from the actual IFO.
    • Try and resolve differences between modeled and measured sensing matrices.
    • Get locking working with full IFO - there was a discussion on the mattermost thread about sequential/parallel locking some time ago, I need to dig that up to see what is the right way to get this going. Probably the DRMI operating point will also change, because of the complex reflectivities of the arm cavities seen by the RF sidebands (this effect is not present in the current configuration where I've made the ETMs transparent).
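The ballpark comparison in Attachment #1 is just a product of power-loss factors. Only the 50-50 BS and the 98% in-vacuum mirror reflectivity come from the text above; the 98% for the in-air mirrors and the 99% window transmission are assumptions made for this sketch.

```python
import math


def propagated_sensing(model_w_per_m, factors):
    """Scale a modeled sensing element by a list of power transmission factors."""
    return model_w_per_m * math.prod(factors)


MODEL_AS55Q_MICH = 7.8e5  # W/m, from the FINESSE model
MEASURED = 2.37e5  # W/m, from the earlier measurement

factors = (
    [0.5]  # 50-50 BS between AS55 and AS110
    + [0.98] * 6  # in-vacuum steering mirrors (98% as assumed in the text)
    + [0.98] * 3  # in-air steering mirrors (98% is an assumption here)
    + [0.99]  # vacuum window transmission (assumed)
)
estimate = propagated_sensing(MODEL_AS55Q_MICH, factors)
print(f"estimate = {estimate:.2e} W/m, measured = {MEASURED:.2e} W/m, "
      f"ratio = {estimate / MEASURED:.2f}")
```

With these numbers the estimate comes out at roughly 3.2e5 W/m, about 1.4 times the measured value.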

GV Edit: EQ pointed out that my method of taking the slope of the error signal to compute the sensing element isn't the most robust: it relies on choosing points for the slope that are close enough to the zero crossing and also well within the linear region of the error signal. Instead, FINESSE allows this computation to be done as we do in the real IFO: apply an excitation at a given frequency to an optic and look at the twice-demodulated output of the relevant RFPD (e.g. for the PRCL sensing element in the 1f DRMI configuration, drive PRM and demodulate REFL11 at 11 MHz and at the drive frequency). Attachment #4 is the sensing matrix recomputed in this way. In this case, it produces almost identical results to the slope method, but I think the double-demod technique is better in that you don't have to worry about selecting points for computing the slope. 
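The double-demodulation idea can be illustrated with a time-domain lock-in at the drive frequency. FINESSE itself computes this in the frequency domain; the sketch below only mimics the second (audio-frequency) demodulation, assuming the RF-demodulated error signal (e.g. REFL11 I) and the optic drive are available as sampled time series. Function names are hypothetical. The ratio of the two complex amplitudes is the sensing element, including its phase.

```python
import numpy as np


def complex_amplitude(x, fs, f):
    """Complex amplitude A*exp(1j*phi) of the component A*cos(2*pi*f*t + phi) in x.

    Exact when the record holds an integer number of cycles of f; other
    frequency components then average to zero.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size) / fs
    return 2 * np.mean(x * np.exp(-2j * np.pi * f * t))


def sensing_element_from_drive(error_signal, drive, fs, f_drive):
    """Transfer function error/drive at the drive frequency (e.g. W/m)."""
    return complex_amplitude(error_signal, fs, f_drive) / complex_amplitude(drive, fs, f_drive)
```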

 

  13125   Wed Jul 19 08:37:21 2017 JamieUpdateCDSUpdate on front-end/DAQ rebuild

After the catastrophic fb disk failure last week we lost essentially the entire front end system (not any of the userapp code, but the front end boot server, operating system, and DAQ).  The fb disk was entirely unrecoverable, so we've been trying to rebuild everything from the bits and pieces lying around, and some disks that Keith Thorne sent from LLO.  We're trying to get the front ends working first, and will work on recovering daqd after.

Luckily, fb1, which was being configured as an fb replacement, is mostly configured, including having a copy of the front end diskless root image. We set up fb1 as the new boot server and were able to get the front ends booting again. Unfortunately, we've been having trouble running and building models, so something is still amiss. We've been taking a three-pronged approach to getting the front ends running:

  • /diskless/root.fb: This involves booting the front ends from the backup of the diskless root from fb.  Runs gentoo kernel 2.6.34.1.  This should correspond to the environment that all models were built and running against.  But something is missing in the configuration.  The front ends were also mounting /opt from fb, which included the dolphin drivers, and we don't have a copy of that, so models aren't loading or recompiling.
  • /diskless/root.x1boot: Keith sent a disk image of the entire x1boot server from LLO.  It uses gentoo kernel 3.0.8.  This ostensibly includes everything we should need to run the front ends, but it's unfortunately configured with newer versions of some of the software and also isn't loading our existing models or building new ones.  This also seems to be having issues with the dolphin drivers.
  • /diskless/root.jessie: This is an entirely new boot image built from scratch with Debian jessie, using an RTS-patched 3.2 kernel. This would use the latest versions of everything. It's mostly working; we just need to rebuild the dolphin driver from source.

It seems that in all cases we need to rebuild the dolphin drivers from source.

  13127   Wed Jul 19 14:26:50 2017 JamieUpdateCDSUpdate on front-end/DAQ rebuild

 

To clarify, we're able to boot the x1boot image with the existing 2.6.25 kernel that we have from fb.  The issue with the root.x1boot image is not the kernel version but some of the other support libraries, such as dolphin.

  13130   Fri Jul 21 18:03:17 2017 JamieUpdateCDSUpdate on front-end/DAQ rebuild

Update:

  • front ends booting with the new Debian jessie diskless root image and a linux 3.2 version of the RTS-patched kernel
  • dolphin is configured correctly and running on c1lsc and c1sus
  • models building and running with RCG 3.0.3

Up next:

  • add c1ioo to the dolphin network
  • recompile/restart all front end models
  • daqd

I'll try to get the first two of those done tomorrow, although it's unclear what model updates we'll have to do to get things working with the newer RCG.

 

  13133   Sun Jul 23 22:16:55 2017 Jamie, gautamUpdateCDSfront-end now running with new OS, RCG

All front ends and models are (mostly) running now.

All suspensions are damped.

It should be possible at this point to do more recovery, like locking the MC.

Some details on the restore process:

  • all models were recompiled with the new RCG version 3.0.3
  • the new RCG does stricter simulink drawing checks, and was complaining about unterminated outputs in some of the SUS models.  Terminated all outputs it was concerned about and saved.
  • RCG 3.0 requires a new directory for doing better filter module diagnostics: /opt/rtcds/caltech/c1/chans/tmp
  • had to reset the slow machines c1susaux, c1auxex, c1auxey

The daqd is not yet running.  This is the next task.

I have been taking copious notes and will fully document the restore process once complete.

c1ioo issues

c1ioo has been giving us a little bit of trouble.  The c1ioo model kept crashing and taking down the whole c1ioo host.  We found a red light on one of the ADCs (ADC1).  We pulled the card and replaced it with a spare from the CDS cabinet.  That seemed to fix the problem and c1ioo became more stable.

We've still been seeing a lot of glitching in c1ioo, though, with CPU cycle times frequently (every couple of seconds) running above threshold for all models, up to 200 us.  I tried unloading every kernel module I could and shutting down every non-critical process, but nothing seemed to help.

We eventually tried stopping the c1ioo model altogether and that seemed to help quite a bit, dropping the long cycle rate down to something like one every 30 seconds or so.  Not sure what that means.  We should look into the BIOS again, to see if there could be something interacting with the newer kernel.

So currently the c1ioo model is not running (which is why it's all white in the CDS overview snapshot above). The fact that c1ioo is not running, and that the remaining models are still occasionally glitching, is also causing various IPC errors on auxiliary models (see c1mcs, c1rfm, c1ass, c1asx). 

RCG compile warnings

The new RCG tries to do more checks on custom C code, but it seems to be having trouble finding our custom "ccodeio.h" files that live with the C definitions in USERAPPS/*/common/src/. Unclear why yet. This is causing the RCG to spit out warnings like the following:

Cannot verify the number of ins/outs for C function BLRMS.
    File is /opt/rtcds/userapps/release/cds/c1/src/BLRMSFILTER.c
    Please add file and function to CDS_SRC or CDS_IFO_SRC ccodeio.h file.

These are just warnings and will not prevent the model from compiling or running. We'll figure out what the problem is to make them go away, but they can be ignored for the time being.

model unload instability

Probably the worst problem we're facing right now is an instability that will occasionally, but not always, cause the entire front end host to freeze up upon unloading an RTS kernel module. This is a known issue with the newer linux kernels (we're using kernel version 3.2.35), and is being looked into.

This is particularly annoying with the machines on the dolphin network, since if one of the dolphin hosts goes down it manages to crash all the models reading from the dolphin network.  Since half the time they can't be cleanly restarted, this tends to cause a boot fest with c1sus, c1lsc, and c1ioo.  If this happens, just restart those machines, wait till they've all fully booted, then restart all the models on all hosts with "rtcds start all".

  13135   Mon Jul 24 10:45:23 2017 gautamUpdateCDSc1iscex models died

This morning, all the c1iscex models were dead. Attachment #1 shows the state of the cds overview screen when I came in. The machine itself was ssh-able, so I just restarted all the models and they came back online without fuss.

  13136   Mon Jul 24 10:59:08 2017 JamieUpdateCDSc1iscex models died

This was me.  I had rebooted that machine and hadn't restarted the models.  Sorry for the confusion.

  13137   Mon Jul 24 12:00:21 2017 gautamUpdatePSLPSL NPRO mysteriously shut off

Summary:

At around 10:30AM this morning, the PSL mysteriously shut off. Steve and I confirmed that the NPRO controller had the RED "OFF" LED lit. It is unknown why this happened. We manually turned the NPRO back on, and the PMC has been stably locked for the last hour or so.

Details:

There are so many changes to lab hardware/software that have been happening recently, it's not entirely clear to me what exactly was the problem here. But here are the observations:

  1. Yesterday, when I came into the lab, the MC REFL trace on the wall StripTool was 0 for the full 8-hour history; since we don't have data records, I can't go back further than this. I remember the PMC TRANS and REFL cameras looked normal, but there was no MC REFL spot on the CCD monitors. This is consistent with the PSL operating normally, the PMC being locked, and the PSL shutter being closed. Isn't the emergency vacuum interlock also responsible for automatically closing the PSL shutter? If the turbo controller failure happened before Jamie and I came in yesterday, perhaps this was just the interlock doing its job. On Friday evening, the PSL shutter was certainly open and the MC REFL spot was visible on the camera. I also confirmed with Jamie that he didn't close the shutter.
  2. Attachment #1 shows the wall StripTool traces from earlier this morning. It looks like at ~7:40AM, the MC REFL level went back up. Steve says he didn't manually open the shutter, and in any case, this was before the turbo pump controller failure was diagnosed. So why did the shutter open again?
  3. When I came in at ~10AM, the CCD monitor showed that the PMC was locked, and the MC REFL spot was visible. 
  4. Also on attachment #1, there is a ~10min dip in the MC REFL level. This corresponds to ~10:30AM this morning. Both Steve and I were sitting in the control room at this time. We noticed that the PMC TRANS and REFL CCDs were dark. When we went in to check on the laser, we saw that it was indeed off. There was no one inside the lab area at this time to our knowledge, and as far as I know, the only direct emergency shutoff for the PSL is on the North-West corner of the PSL enclosure. So it is unclear why the laser just suddenly went off.

Steve says that this kind of behaviour is characteristic of a power glitch/surge, but nothing else seems to have been affected (I confirmed that the X and Y end lasers are ON). 

  13138   Mon Jul 24 19:28:55 2017 JamieUpdateCDSfront end MX stream network working, glitches in c1ioo fixed

MX/OpenMX network running

Today I got the mx/open-mx networking working for the front ends.  This required some tweaking to the network interface configuration for the diskless front ends, and recompiling mx and open-mx for the newer kernel.  Again, this will all be documented.

controls@fb1:~ 0$ /opt/mx/bin/mx_info
MX Version: 1.2.16
MX Build: root@fb1:/opt/src/mx-1.2.16 Mon Jul 24 11:33:57 PDT 2017
1 Myrinet board installed.
The MX driver is configured to support a maximum of:
    8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
===================================================================
Instance #0:  364.4 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
    Status:        Running, P0: Link Up
    Network:    Ethernet 10G

    MAC Address:    00:60:dd:43:74:62
    Product code:    10G-PCIE-8B-S
    Part number:    09-04228
    Serial number:    485052
    Mapper:        00:60:dd:43:74:62, version = 0x00000000, configured
    Mapped hosts:    6

                                                        ROUTE COUNT
INDEX    MAC ADDRESS     HOST NAME                        P0
-----    -----------     ---------                        ---
   0) 00:60:dd:43:74:62 fb1:0                             1,0
   1) 00:30:48:be:11:5d c1iscex:0                         1,0
   2) 00:30:48:bf:69:4f c1lsc:0                           1,0
   3) 00:25:90:0d:75:bb c1sus:0                           1,0
   4) 00:30:48:d6:11:17 c1iscey:0                         1,0
   5) 00:14:4f:40:64:25 c1ioo:0                           1,0
controls@fb1:~ 0$
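Checking the route table by eye works, but a small parser makes it easy to confirm that every front end made it into the MX map. This is a hypothetical helper whose regular expression is written against the `mx_info` output shown above; other MX versions may format the table differently.

```python
import re

# One row of the route table printed by mx_info, e.g.
#    1) 00:30:48:be:11:5d c1iscex:0                         1,0
ROUTE_ROW = re.compile(r"^\s*\d+\)\s+((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\s+([\w.-]+):\d+")


def mapped_hosts(mx_info_text):
    """Return {hostname: MAC address} for every row of the route table."""
    hosts = {}
    for line in mx_info_text.splitlines():
        m = ROUTE_ROW.match(line)
        if m:
            hosts[m.group(2)] = m.group(1)
    return hosts


def missing_hosts(mx_info_text, expected):
    """Expected hosts that do not appear in the MX map, sorted."""
    return sorted(set(expected) - set(mapped_hosts(mx_info_text)))
```

On fb1, the text could come from `subprocess.run(["/opt/mx/bin/mx_info"], capture_output=True, text=True).stdout`.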

c1ioo timing glitches fixed

I also checked the BIOS on c1ioo and found that the serial port was enabled, which is known to cause timing glitches.  I turned off the serial port (and some power management stuff), and rebooted, and all the c1ioo timing glitches seem to have gone away.

It's unclear why this is a problem that's just showing up now.  Serial ports have always been a problem, so it seems unlikely this is just a problem with the newer kernel.  Could the BIOS have somehow been reset during the power glitch?

In any event, all the front ends are now booting cleanly, with all dolphin and mx networking coming up automatically, and all models are running stably.

Now for daqd...

  13139   Mon Jul 24 19:57:54 2017   gautam   Update   CDS   IMC locked, Autolocker re-enabled

Now that all the front-end models are running, I re-aligned the IMC, locked it manually, and then tweaked the alignment some more. The IMC transmission is now hovering around 15300 counts. I also re-enabled the Autolocker and FSS Slow loops on Megatron.

Quote:

MX/OpenMX network running

Today I got the mx/open-mx networking working for the front ends.  This required some tweaking to the network interface configuration for the diskless front ends, and recompiling mx and open-mx for the newer kernel.  Again, this will all be documented.

 

  13141   Tue Jul 25 02:03:59 2017   gautam   Update   Optical Levers   Optical lever tuning thoughts

Summary:

Currently, I am unable to engage the coil-dewhitening filters without destroying cavity locks. One reason is that the present Oplev servos do not roll off steeply enough at high frequencies, so engaging the digital whitening + analog de-whitening simply saturates the DAC output. Today, Rana and I discussed some ideas about how to approach this problem; this elog collects these thoughts. As I flesh out these ideas, I will add them to a more complete writeup in T1700363 (a placeholder for now). Past relevant elogs: 5376, 9680
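To see why the roll-off steepness matters, here is a rough numerical sketch in Python/NumPy. It compares the RMS of the DAC output with and without a digital whitening stage in front of it. All filter parameters are hypothetical placeholders, not the actual 40m filters:

- a flat error-signal spectrum;
- an n-th order low-pass at 10 Hz, standing in for the high-frequency part of the Oplev controller;
- a whitening filter with two zeros at 15 Hz and two poles at 150 Hz;
- an integration band of 0.1 Hz to 8 kHz.

```python
import numpy as np

# Frequency grid for the integration: 0.1 Hz to 8 kHz (assumed upper limit).
F = np.logspace(-1, np.log10(8000.0), 5000)


def zpk_mag(f, zeros_hz, poles_hz, gain=1.0):
    """|H(f)| for a filter with real zeros/poles given in Hz, normalized to `gain` at DC."""
    s = 1j * f
    h = np.full(f.shape, gain, dtype=complex)
    for z in zeros_hz:
        h *= 1 + s / z
    for p in poles_hz:
        h /= 1 + s / p
    return np.abs(h)


def rms(f, asd):
    """RMS from an amplitude spectral density: sqrt of the trapezoidal integral of ASD^2."""
    psd = asd ** 2
    return np.sqrt(np.sum(0.5 * (psd[1:] + psd[:-1]) * np.diff(f)))


def whitening_rms_ratio(order, f=F):
    """Factor by which the (hypothetical) digital whitening raises the DAC output RMS."""
    err_asd = np.ones_like(f)                              # flat error-signal ASD (arb. units)
    ctrl = zpk_mag(f, [], [10.0] * order)                  # stand-in roll-off: n-th order low-pass at 10 Hz
    whitening = zpk_mag(f, [15.0, 15.0], [150.0, 150.0])   # +40 dB at high frequency
    drive = err_asd * ctrl                                 # DAC output without whitening
    return rms(f, drive * whitening) / rms(f, drive)


for order in (2, 4, 6):
    print(f"roll-off order {order}: whitening raises DAC RMS by x{whitening_rms_ratio(order):.2f}")
```

A steeper roll-off leaves less high-frequency drive for the whitening to amplify, so the RMS increase shrinks as the order goes up. That is the qualitative behavior behind the saturation described above.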

  1. Why do we need optical levers?
    • To stabilize the low-frequency, seismically driven angular motion of the optics.
  2.  In what frequency range can we / do we need to stabilize the angular motion of the optics? How much error signal suppression do we need in the control band? How much is achievable given the current Oplev setup?
    • To answer these questions, we need to build a detailed Oplev noise budget.
    • Ultimately, the Oplev error signal is sensing the differential motion between the suspended optic and the incident laser beam.
    • In what frequency range does laser beam jitter dominate over the actual optic motion? What about mechanical drifts of the optical tables the HeNes sit on? And for many of the vertex optics, the Oplev beam makes multiple bounces off steering mirrors on the stack. What is the contribution of the stack motion to the error signal?
    • The answers to the above will tell us what lower and upper UGFs we should and can pick. It will also be instructive to investigate if we can come up with a telescope design near the Oplev QPD that significantly reduces beam jitter effects (see elog 10732). Also, can we launch/extract the beam into/from the vacuum chamber in such a way that we aren't so susceptible to motion of the stack?
  3. What are some noises that have to be measured and quantified?
    • Seismic noise
    • Shot noise
    • Electronics noise of the QPD readout chain
    • HeNe intensity noise (does this matter, since we are normalizing by the QPD sum?)
    • HeNe beam pointing / jitter noise (How? N-corner hat method?)
    • Stack motion contribution to the Oplev error signal
  4. How do we design the Oplev controller?
    • The main challenge is to frame the right cost function. Once it is defined, we can use MATLAB's PSO tool (which was used for the PR3 coating design optimization, and which Rana has also used successfully for this kind of loop-shaping problem for aLIGO) to find a minimum by moving the controller poles and zeros around within bounds we define.
  5. What terms should enter the cost function?

    • In addition to those listed in elog 5376:
    • We need the >10 Hz roll-off to be steep enough that turning on the digital whitening will not significantly increase the DAC output RMS or drive it to saturation.
    • We'd like the controller to be insensitive to 5% (?) errors in the assumed optical plant and noise models, i.e., the closed loop shouldn't become unstable if we made a small error in some assumed parameters.
    • A penalty for using an excessive number of poles/zeros? A penalty for having too many high-frequency features?
  6. Other things to verify / look into
    • Verify whether the counts -> urad calibration is still valid for all the Oplevs. We have two methods to do this: the quadratic dependence of arm-cavity power on misalignment, and the geometry method.
    • Check whether the Oplev error signals are normalized by the quadrant sum.
    • How important is it to balance the individual quadrant gains?
    • Check with Koji / Rich about new QPDs. If we can get some, perhaps we can use them in the setup that Steve is going to prepare as part of the temperature vs. HeNe noise investigations.
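For the arm-power calibration method, the arithmetic is simple. Near alignment, the arm power falls off quadratically with misalignment angle, P = P0 (1 - a_u θ²). If the Oplev reading is x = x0 + θ/k, a quadratic fit of P versus x gives a normalized curvature a_x = a_u k², so k = sqrt(a_x / a_u).

Below is a minimal NumPy sketch of that fit. The function name is hypothetical. The geometric curvature a_u (in 1/urad²) is assumed to be supplied from a cavity model, and the demo numbers are made up.

```python
import numpy as np


def calibrate_urad_per_count(counts, power, curvature_per_urad2):
    """Return (urad per count, Oplev reading at peak power) from a power-vs-misalignment scan.

    Model: P = P0 * (1 - a_u * theta**2) with theta = k * (x - x0), where x is the
    Oplev reading in counts. A quadratic fit in x gives the normalized curvature
    a_x = a_u * k**2, so k = sqrt(a_x / a_u). The geometric curvature a_u
    (1/urad^2) is an input here and has to come from the cavity model.
    """
    c2, c1, c0 = np.polyfit(counts, power, 2)   # P(x) = c2*x^2 + c1*x + c0
    x0 = -c1 / (2 * c2)                         # vertex: reading at peak power
    p0 = c0 - c1 ** 2 / (4 * c2)                # peak power
    a_x = -c2 / p0                              # normalized curvature, 1/counts^2
    return np.sqrt(a_x / curvature_per_urad2), x0


if __name__ == "__main__":
    # Synthetic scan with made-up numbers: 0.3 urad/count, peak at 50 counts.
    x = np.linspace(-100.0, 200.0, 31)
    p = 1.0 - 1e-4 * (0.3 * (x - 50.0)) ** 2
    print(calibrate_urad_per_count(x, p, 1e-4))
```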

Before the CDS went down, I had taken error signal spectra for the ITMs. I will update this elog tomorrow with these measurements, as well as some noise estimates, to get started.
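For the HeNe pointing-noise question, the simplest N-corner hat case (N = 3) works as follows. Assume three sensors see the same common motion with mutually uncorrelated noises. Then the variance of the difference between any two sensors is the sum of their two noise variances, and the three pairwise differences can be solved for each sensor's noise.

Below is a minimal NumPy sketch on synthetic data, with made-up noise levels. In practice, the same algebra would be applied frequency by frequency to the PSDs of the differences.

```python
import numpy as np


def three_cornered_hat(x1, x2, x3):
    """Noise variance of each of three sensors from variances of pairwise differences.

    Assumes x_i = s + n_i, with the n_i mutually uncorrelated and uncorrelated
    with the common motion s, which cancels exactly in every difference:
    Var(x_i - x_j) = sigma_i^2 + sigma_j^2.
    """
    v12 = np.var(x1 - x2)
    v13 = np.var(x1 - x3)
    v23 = np.var(x2 - x3)
    return (
        0.5 * (v12 + v13 - v23),
        0.5 * (v12 + v23 - v13),
        0.5 * (v13 + v23 - v12),
    )


# Synthetic demo: large common random-walk motion plus made-up sensor noise levels.
rng = np.random.default_rng(0)
n = 200_000
common = np.cumsum(rng.normal(size=n))
true_sigma = (1.0, 0.5, 2.0)
sensors = [common + rng.normal(scale=s, size=n) for s in true_sigma]
for s_true, var in zip(true_sigma, three_cornered_hat(*sensors)):
    print(f"true sigma {s_true:.2f}, estimated {np.sqrt(var):.3f}")
```

If the sensor noises are correlated, the estimates are biased and can even come out negative, so the uncorrelated-noise assumption has to be checked for the actual setup.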

  13142   Tue Jul 25 08:48:57 2017   Steve   Update   VAC   RGA scan at d278

The RGA did not shut down when the turbo pump controller failed.

Quote:

IFO pressure was 5.5 mTorr this morning. The PSL shutter was still open. The TP2 controller had failed, and the interlock closed V1, V4 and VM1.

Turbo pump 2 is the fore pump of the Maglev. The pressure here was 3.9 Torr, so the Maglev got warm (~38 C), but it was still rotating at its normal 560 Hz with V1 closed.

What I did:

Looked at the pressures on the Hornet and Super Bee gauges (InstruTech, Inc.).

Closed all annuli and VA6, disconnected V4 and VA6, and turned on an external fan to cool the Maglev.

Opened V7 to pump the Maglev fore line with TP3

Opened V1 manually once the foreline pressure at P2 had dropped below 2 mTorr and the Maglev body temperature had cooled to 25-27 C.

VM1 opened at 1e-5 Torr

Valve configuration: vacuum normal with annuli not pumped

IFO pressure 8.5e-6 Torr (IT) at 10 am, P2 foreline pressure 64 mTorr, TP3 controller 0.17 A, 22 C, 50 krpm

Note: all valves must be opened manually; the interlock can only close them.

 

Quote:

While walking down to the X end to reset c1iscex, I heard what I would call a "rhythmic squnching" sound coming from under the turbo pump.  I would have said the sound was coming from a roughing pump, but none of them are on (as far as I can tell).

Steve, maybe look into this?

PS: please call me next time you see the vacuum is not Vacuum Normal

 

  13143   Tue Jul 25 14:04:06 2017   Steve   Update   VAC   turbo controller installed and we are running at vac normal

Gautam and Steve,

A spare Varian Turbo-V 70 controller (Model 969-9505, SN 21612) was swapped in. It runs the turbo fine at 50 krpm, but it does not allow its V4 valve to be opened.

It turns out that with TP2 running at 75 krpm, V4 can be opened and closed. This must be a software issue.

So Vacuum Normal is operational if TP2 is running at 75,000 rpm.

We want to run at 50,000 rpm in the long term.

Note: the RS232 D-sub connector on the back of this controller is mounted rotated 180 degrees relative to those on the TP3 controller and the old (failed) TP2 controller.

 

PS: the controller is shipping out for repair on 7-28-2017.

 

  13144   Tue Jul 25 14:27:19 2017   Steve   Update   safety   safety training

Kira Dubrovina and Naomi Wharton received 40m-specific basic safety training.

ELOG V3.1.3-