40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 265 of 341  Not logged in ELOG logo
ID Date Author Type Category Subjectdown
  10352   Fri Aug 8 14:27:18 2014 AkhilUpdateComputer Scripts / ProgramsFOL Scripts

 The scripts written for interfacing the FC with R Pi, building EPICS database, piping data into EPICS channels,PID loop for FOL are contained in :

 /opt/rtcds/caltech/c1/scripts/FOL 

The instructions to run these codes on R Pi( controls@domenica) will be available on FOL 40m wiki page.

Also instructions regarding EPICS installation on R Pi and building an EPICS SoftIoc to streamline data from hardware devices into channels will be updated shortly.

 

 

 

 

 

 

  10376   Wed Aug 13 16:12:55 2014 HarryUpdateGeneralFOL Layout Diagram

Per Q's request, I've made up a diagram of the complete FOL layout for general reference.

FOLLayout2.png

  10273   Fri Jul 25 17:28:31 2014 HarryUpdateGeneralFOL Box and PER Update

 Purpose

We're putting together a box to go into the 1X2 rack, to facilitate the frequency counters, and Raspberry Pi that will be used in FOL.

Separately, I am working on characterizing the Polarization Extinction Ratio of the PM980 fibers, for further use in FOL.

What's Been Done

The frequency counters have been mounted on the face of the box, and nylon spacers installed in the bottom, which will insulate the RPi in the future, once it's finally installed.

FOLBox.png

In regard to the PER setup, there is an issue, in that the mounts which hold the collimators rotate, so as to align the axes of the fibers with the polarization of the incoming light.

This rotational degree of freedom, however, isn't "sticky" enough, and rotates under the influence of the stress in the fiber. (It's not much, but enough.)

This causes wild fluctuations in coupled power, making it impossible to make accurate measurements of PER.

What's Next

In the FOL box's case, we've ordered a longer power cable for the raspberry pi (the current one is ~9 inches long).

Once it arrives, we will install the RPi, and move the box into its place in the rack.

In the case of the PER measurement, we've ordered more collimator mounts//adapters, which will hopefully give better control over rotation.

 

  1498   Mon Apr 20 05:18:42 2009 YoichiConfigurationLockingFM6 and FM10 of LSC-MC were restored
During tonight's locking work, I realized that FM6 and FM10 (both resonant gains around 20Hz) were actually activated by cm_step.
So I restored those filters from the svn history.
Instead, I removed a bunch of unused filters from LSC-DEMOD and LSC-DEMOD_A (moving zero filters) to off load c1lsc.

As for the locking itself, the DARM loop becomes unstable at around arm power = 30. I may have to add a filter to give a broader phase bubble.
  10226   Thu Jul 17 02:57:32 2014 AndresUpdate40m Xend Table upgradeFInish Calculation on Current X-arm mode Matching

Data and Calculation for the Xarm Current Mode Matching

Two days ago, Nick, Jenne, and I took a measurement for the Green Transmission for the X-arm. I took the data and I analyzed it. The first figure attached below is the raw data plotted. I used the function findpeaks in Matlab, and I found all the peaks. Then, by taking close look at the plot, I chose two peaks as shown in the second figure attached below. I took the ratio of the TEM00 and the High order mode, and I average them. This gave me a Mode Matching of 0.9215, which this value is pretty close to the value that I predicted by using a la Mode in http://nodus.ligo.caltech.edu:8080/40m/10191, which is 0.9343. Nick and I measured the reflected power when the cavity is unlocked and when the cavity is locked, so we measured the PreflUnLocked=52+1µW and PreflOnLocked=16+2µW and the backgroundNoise=0.761µW. Using this information we calculated  Prefl/Pin=0.297. Now, since Prefl/Pin=|Eref/Ein|2, we looked at the electric fields component by using the reflectivity of the mirror we calculated 0.67. The number doesn't agree, but this is because we didn't take into account the losses when making this calculation. I'm working in the calculation that will include the losses.

Today, Nick and I ordered the lenses and the mirrors. I'm working in putting together a representation of how much improvement the new design will give us in comparison to the current setup.

Attachment 1: RawDataForTheModeGreenScan.png
RawDataForTheModeGreenScan.png
Attachment 2: ResultForModeMatching.png
ResultForModeMatching.png
Attachment 3: DataAndCalculationOfModeMismatch.zip
  10237   Fri Jul 18 16:52:56 2014 AndresUpdate40m Xend Table upgradeFInish Calculation on Current X-arm mode Matching

Quote:

Data and Calculation for the Xarm Current Mode Matching

Two days ago, Nick, Jenne, and I took a measurement for the Green Transmission for the X-arm. I took the data and I analyzed it. The first figure attached below is the raw data plotted. I used the function findpeaks in Matlab, and I found all the peaks. Then, by taking close look at the plot, I chose two peaks as shown in the second figure attached below. I took the ratio of the TEM00 and the High order mode, and I average them. This gave me a Mode Matching of 0.9215, which this value is pretty close to the value that I predicted by using a la Mode in http://nodus.ligo.caltech.edu:8080/40m/10191, which is 0.9343. Nick and I measured the reflected power when the cavity is unlocked and when the cavity is locked, so we measured the PreflUnLocked=52+1µW and PreflOnLocked=16+2µW and the backgroundNoise=0.761µW. Using this information we calculated  Prefl/Pin=0.297. Now, since Prefl/Pin=|Eref/Ein|2, we looked at the electric fields component by using the reflectivity of the mirror we calculated 0.67. The number doesn't agree, but this is because we didn't take into account the losses when making this calculation. I'm working in the calculation that will include the losses.

Today, Nick and I ordered the lenses and the mirrors. I'm working in putting together a representation of how much improvement the new design will give us in comparison to the current setup.

We want to be able to graphically see how much better it is the new optical table setup in comparison to the current optical table setup. In other words, we want to be able to see how displacement of the beam and how much angle change can be obtained at the ETM from changing the mirrors angles independently. Depending on the spread of the mirrors' vectors we can observe whether the Gouy phase is good. In the plot below, the dotted lines correspond to the current set up, and we can see that the lines are not spread from each other, which essentially mean that changing the angles of the two mirrors just contribute to small change in angle and in the displacement of the beam at the ETM, and therefore the Gouy phase is not good. Now on the other hand. The other solid lines correspond to the new setup mirrors. We can observe that the spread of the line of mirror 1 and mirror 4 is almost 90 degrees, which just implies that there is a good Gouy phase different between these two mirrors. For the angles chosen in the plot, I looked at how much the PZT yaw the mirrors from the elog http://nodus.ligo.caltech.edu:8080/40m/8912. In this elog, they give a plot in mrad/v for the pitch and yaw, so I took the range that the PZT can yaw the mirrors, and I converted into mdegrees/v and then I plotted as shown below. I plot for the current setup and for the new setup in the same plot. The matlab code is also attached below.

Attachment 1: OldAndNewSetupPlotsOfDisplacementAndAngleAtTheETM.png
OldAndNewSetupPlotsOfDisplacementAndAngleAtTheETM.png
Attachment 2: OldSetUpDisplacementAndNewSetup.m.zip
  13124   Wed Jul 19 00:59:47 2017 gautamUpdateGeneralFINESSE model of DRMI (no arms)

Summary:

I've been working on improving the 40m FINESSE model I set up sometime last year (where the goal was to model various RC folding mirror scenarios). Specifically, I wanted to get the locking feature of FINESSE working, and also simulate the DRMI (no arms) configuration, which is what I have been working on locking the real IFO to. This elog is a summary of what I have from the last few days of working on this.

Model details:

  • No IMC included for now.
  • Core optics R and T from the 40m wiki page.
  • Cavity lengths are the "ideal" ones - see the attached ipynb for the values used.
  • RF modulation depths from here. But for now, the relative phase between f1 and f2 at the EOM is set to 0.
  • I've not included flipped folding mirrors - instead, I put a loss of 0.5% on PR3 and SR3 in the model to account for the AR surface of these optics being inside the RCs. 
  • I've made the AR surfaces of all optics FINESSE "beamsplitters" - there was some discussion on the FINESSE mattermost channel about how not doing this can lead to slightly inaccurate results, so I've tried to be more careful in this respect.
  • I'm using "maxtem 1" in my FINESSE file, which means TEM_mn modes up to (m+n=1) are taken into account - setting this to 0 makes it a plane wave model. This parameter can significantly increase the computational time. 

Model validation:

  • As a first check, I made the PRM and SRM transparent, and used the in-built routines in FINESSE to mode-match the input beam to the arm cavities.
  • I then scanned one arm cavity about a resonance, and compared the transmisison profile to the analytical FP cavity expression - agreement was good.
  • Next, I wanted to get a sensing matrix for the DRMI (no arms) configuration (see attached ipynb notebook).
    • First, I make the ETMs in the model transparent
    • I started with the phases for the BS, PRM and SRM set to their "naive" values of 0, 0 and 90 (for the standard DRMI configuration)
    • I then scanned these optics around, used various PDs to look at the points where appropriate circulating fields reached their maximum values, and updated the phase of the optic with these values.
    • Next, I set the demod phase of various RFPDs such that the PDH error signal is entirely in one quadrature. I use the RFPDs in pairs, with demod phases separated by 90 degrees. I arbitrarily set the demod phase of the Q phase PD as 90 + phase of I phase PD. I also tried to mimic the RFPD-IFO DoF pairing that we use for the actual IFO - so for example, PRCL is controlled by REFL11_I.
    • Confident that I was close enough to the ideal operating point, I then fed the error signals from these RFPDs to the "lock" routine in FINESSE. The manual recommends setting the locking loop gain to 1/optical gain, which is what I did.
    • The tunings for the BS and RMs in the attached kat file are the result of this tuning.
    • For the actual sensing matrix, I moved each of PRM, BS and SRM +/-5 degrees (~15nm) around each resonance. I then computed the numerical derivative around the zero crossing of each RFPD signal, and then plotted all of this in some RADAR plots - see Attachment #1.

Explanation of Attachments and Discussion:

  • Attachment #1 - Computed sensing matrix from this model. Compare to an actual measurement, for example here - the relative angle between the sensing matrix elements dont exactly line up with what is measured. EQ suggested today that I should look into tuning the relative phase between the RF frequencies at the EOM. Nevertheless, I tried comparing the magnitudes of the MICH sensing element in AS55 Q - the model tells me that it should be ~7.8*10^5 W/m. In this elog, I measured it to be 2.37*10^5 W/m. On the AS table, there is a 50-50 BS splitting the light between the AS55 and AS110 photodiodes which is not accounted for in the model. Factoring this in, along with the fact that there are 6 in-vaccuum steering mirrors (assume 98% reflectivity for these), 3 in air steering mirrors, and the window, the sensing matrix element from the model starts to be in the same ballpark as the measurement, at ~3*10^5 W/m. So the model isn't giving completely crazy results.
  • Attachment #2 - Example of the signals at various RFPDs in response to sweeping the PRM around its resonance. To be compared with actual IFO data. Teal lines are the "I" phase, and orange lines are "Q" phase.
  • Attachment #3 - FINESSE kat file and the IPython notebook I used to make these plots. 
  • Next steps
    • More validation against measurements from the actual IFO.
    • Try and resolve differences between modeled and measured sensing matrices.
    • Get locking working with full IFO - there was a discussion on the mattermost thread about sequential/parallel locking some time ago, I need to dig that up to see what is the right way to get this going. Probably the DRMI operating point will also change, because of the complex reflectivities of the arm cavities seen by the RF sidebands (this effect is not present in the current configuration where I've made the ETMs transparent).

GV Edit: EQ pointed out that my method of taking the slope of the error signal to compute the sensing element isn't the most robust - it relies on choosing points to compute the slope that are close enough to the zero crossing and also well within the linear region of the error signal. Instead, FINESSE allows this computation to be done as we do in the real IFO - apply an excitation at a given frequency to an optic and look at the twice-demodulated output of the relevant RFPD (e.g. for PRCL sensing element in the 1f DRMI configuration, drive PRM and demodulate REFL11 at 11MHz and the drive frequenct). Attachment #4 is the sensing matrix recomputed in this way - in this case, it produces almost identical results as the slope method, but I think the double-demod technique is better in that you don't have to worry about selecting points for computing the slope etc. 

 

Attachment 1: DRMI_sensingMat.pdf
DRMI_sensingMat.pdf
Attachment 2: DRMI_errSigs.pdf
DRMI_errSigs.pdf
Attachment 3: 40m_DRMI_FINESSE.zip
Attachment 4: DRMI_sensingMat_19Jul.pdf
DRMI_sensingMat_19Jul.pdf
  13385   Tue Oct 17 23:07:52 2017 gautamUpdateCDSFEs unresponsive

While working on the IFO tonight, I noticed that the blinky status lights on c1iscex and c1iscey were frozen (but those on the other 3 FEs seemed fine). But all other lights on the CDS overview screen were green I couldn't access testpoints from these machines, and the EPICS readbacks for models on these FEs (e.g. Oplev servo inputs outputs etc) were frozen at some fixed value. This lasted for a good 5 minutes at least. But the blinky lights started blinking again without me doing anything. Not sure what to make of this. I am also not sure how to diagnose this problem, as trending the slow EPICS records of the CPU execution cycle time (for example) doesn't show any irregularity.

  13386   Wed Oct 18 01:41:32 2017 jamieUpdateCDSFEs unresponsive
Quote:

While working on the IFO tonight, I noticed that the blinky status lights on c1iscex and c1iscey were frozen (but those on the other 3 FEs seemed fine). But all other lights on the CDS overview screen were green I couldn't access testpoints from these machines, and the EPICS readbacks for models on these FEs (e.g. Oplev servo inputs outputs etc) were frozen at some fixed value. This lasted for a good 5 minutes at least. But the blinky lights started blinking again without me doing anything. Not sure what to make of this. I am also not sure how to diagnose this problem, as trending the slow EPICS records of the CPU execution cycle time (for example) doesn't show any irregularity.

So this wasn't just an EPICS freeze?  I don't see how this had anything to do with any of the work I did earlier today.  I didn't modify any of the running front ends, didn't touch either of the end station machines or the DAQ, and didn't modify the network in any way.  I didn't leave anything running.

If you couldn't access test points then it sounds like it was more than just EPICS.  It sounds like maybe the end machines somehow fell of the network momentarily.  Was there anything else going on at the time?

  13387   Wed Oct 18 02:09:32 2017 gautamUpdateCDSFEs unresponsive

I was looking at the ASDC channel on dataviewer, and toggling various settings like whitening gain. At some point, the signal just froze. So I quit dataviewer and tried restarting it, at which point it complained about not being able to connect to FB. This is when I brought up the CDS_OVERVIEW medm screen, and noticed the frozen 1pps indicator lights. There was certainly something going on with the end FEs, because I was able to ping the machine, but not ssh into it. Once the 1pps lights came back, I was able to ssh into c1iscex and c1iscey, no problems.

Could it be that some of the mx processes stalled, but the systemctl routine automatically restarted them after some time? 

Quote:

So this wasn't just an EPICS freeze?  I don't see how this had anything to do with any of the work I did earlier today.  I didn't modify any of the running front ends, didn't touch either of the end station machines or the DAQ, and didn't modify the network in any way.  I didn't leave anything running.

If you couldn't access test points then it sounds like it was more than just EPICS.  It sounds like maybe the end machines somehow fell of the network momentarily.  Was there anything else going on at the time?

 

  13388   Wed Oct 18 09:21:22 2017 jamieUpdateCDSFEs unresponsive
Quote:

I was looking at the ASDC channel on dataviewer, and toggling various settings like whitening gain. At some point, the signal just froze. So I quit dataviewer and tried restarting it, at which point it complained about not being able to connect to FB. This is when I brought up the CDS_OVERVIEW medm screen, and noticed the frozen 1pps indicator lights. There was certainly something going on with the end FEs, because I was able to ping the machine, but not ssh into it. Once the 1pps lights came back, I was able to ssh into c1iscex and c1iscey, no problems.

Could it be that some of the mx processes stalled, but the systemctl routine automatically restarted them after some time?

An mx_stream glitch would have interrupted data flowing from the front end to the DAQ, but it wouldn't have affected the heartbeat.  The heartbeat stop could mean either that the front end process froze, or the EPICS communication stopped.  The fact that everything came back fine after a couple of minutes indicates to me that the front end processes all kept running fine.  If they hadn't I'm sure the machines would have locked up.  The fact that you couldn't connect to the FE machine is also suspicious.

My best guess is that there was a network glitch on the martian network.  I don't know how to account for the fact that pings still worked, though.

  13394   Wed Oct 18 23:11:53 2017 gautamUpdateCDSFEs unresponsive

This happened again just now - it was roughly this time when this happened last night as well.

There was certainly an EPICS freeze of the kind we were used to seeing prior to replacing the martian wireless router sometime in late 2015 (or early 2016?). I was trying to run the dither alignment servos on the Y-arm at this time, and all the StripTool traces flatlined.

I took the opportunity to try accessing testpoints from the iscey ADCs - specifically C1:SUS-TRY_OUT, and it seemed to work just fine. However, I couldn't ssh into c1iscey.

Looking at the dmesg once I was able to ssh in eventually (~2 minutes deadtime tonight, I feel like it was longer yesterday but can't quantify), I see the following: not sure if there are any clues in here, or whether this is the correct log to check. But there are many instances of the nfs server related message in the log. Note that the system time-stamp corresponds to when this freeze happened.

[5461308.784018] nfs: server 192.168.113.201 not responding, still trying
[5461412.936284] nfs: server 192.168.113.201 OK
[5461412.937130] systemd[1]: Starting Journal Service...
[5461412.947947] systemd-journald[20281]: Received SIGTERM from PID 1 (systemd).
[5461412.996063] systemd[1]: Unit systemd-journald.service entered failed state.
[5461413.002627] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
[5461413.008983] systemd[1]: Stopping Journal Service...
[5461413.014664] systemd[1]: Starting Journal Service...
[5461413.044262] systemd[1]: Started Journal Service.
[5461413.694838] systemd-journald[400]: Received request to flush runtime journal from PID 1

 

  1038   Fri Oct 10 00:34:52 2008 robOmnistructureComputersFEs are down

The front-end machines are all down. Another cosmic-ray in the RFM, I suppose. Whoever comes in first in the morning should do the all-boot described in the wiki.
  1039   Fri Oct 10 10:20:42 2008 AlbertoOmnistructureComputersFEs are down

Quote:

The front-end machines are all down. Another cosmic-ray in the RFM, I suppose. Whoever comes in first in the morning should do the all-boot described in the wiki.


Yoichi and I went along the arms turning off and on all the FE machines. Then, from the control room we rebooted them all following the procedures in the wiki. Everything is now up again.

I restored the full IFO, re-locked the mode cleaner.
  15998   Tue Apr 6 11:13:01 2021 JonUpdateCDSFE testing

I/O chassis assembly

Yesterday I installed all the available ADC/DAC/BIO modules and adapter boards into the new I/O chassis (c1bhd, c1sus2). We are still missing three ADC adapter boards and six 18-bit DACs. A thorough search of the FE cabinet turned up several 16-bit DACs, but only one adapter board. Since one 16-bit DAC is required anyway for c1sus2, I installed the one complete set in that chassis.

Below is the current state of each chassis. Missing components are highlighted in yellow. We cannot proceed to loopback testing until at least some of the missing hardware is in hand.

C1BHD

Component Qty Required Qty Installed
16-bit ADC 1 1
16-bit ADC adapter 1 0
18-bit DAC 1 0
18-bit DAC adapter 1 1
16-ch DIO 1 1

C1SUS2

Component Qty required Qty Installed
16-bit ADC 2 2
16-bit ADC adapter 2 0
16-bit DAC 1 1
16-bit DAC adapter 1 1
18-bit DAC 5 0
18-bit DAC adapter 5 5
32-ch DO 6 6
16-ch DIO 1 1

Gateway for remote access

To enable remote access to the machines on the test stand subnet, one machine must function as a gateway server. Initially, I tried to set this up using the second network interface of the chiara clone. However, having two active interfaces caused problems for the DHCP and FTS servers and broke the diskless FE booting. Debugging this would have required making changes to the network configuration that would have to be remembered and reverted, were the chiara disk to ever to be used in the original machine.

So instead, I simply grabbed another of the (unused) 1U Supermicro servers from the 1Y1 rack and set it up on the subnet as a standalone gateway server. The machine is named c1teststand. Its first network interface is connected to the general computing network (ligo.caltech.edu) and the second to the test-stand subnet. It has no connection to the Martian subnet. I installed Debian 10.9 anticipating that, when the machine is no longer needed in the test stand, it can be converted into another docker-cymac for to run additional sim models.

Currently, the outside-facing IP address is assigned via DHCP and so periodically changes. I've asked Larry to assign it a static IP on the ligo.caltech.edu domain, so that it can be accessed analogously to nodus.

  1308   Mon Feb 16 10:18:13 2009 AlbertoUpdateLSCFE system rebooted

Quote:

Quote:

I didn't get a chance to do much testing since the sus controller (susvme1) went nuts. In retrospect, this could be due to something in the script, so maybe we should try a burt restore to Friday afternoon next time someone wants to look at it.


I tried the burtrestore today, it didn't work. Also tried some switching of timing cables, and multiple reboots, to no avail. This will require some more debugging. We might try diagnosing the clock driver and fanout modules, the penteks, and we can also try rebooting the whole FE system.


I rebooted the whole FE system and now c1susvme1 and c1susvme2 are back on.

I can't restart the MC autolocker on c1susvme2 because it doesn't let me ssh in. I tried to reboot it a few times but it didn't work. Once you restart it, it becomes inaccessible and doesn't even respond to pinging. Although the controls for the MC mirrors are on.

The mode cleaner stays unlocked.
  1309   Mon Feb 16 14:12:21 2009 YoichiUpdateLSCFE system rebooted

Quote:

I can't restart the MC autolocker on c1susvme2 because it doesn't let me ssh in. I tried to reboot it a few times but it didn't work. Once you restart it, it becomes inaccessible and doesn't even respond to pinging. Although the controls for the MC mirrors are on.

The mode cleaner stays unlocked.


MC autolocker runs on op340m, not on c1susvme2.
I restarted it and now MC locks fine.
Before that, I had to reboot c1iool0 and restore the alignment of the MC mirrors (for some reason, burt did not restore the alignment properly, so I used conlog).
  4249   Fri Feb 4 13:31:16 2011 josephbUpdateCDSFE start scripts moved to scripts/FE/ from scripts/

All start and kill scripts for the front end models have been moved into the FE directory under scripts:  /opt/rtcds/caltech/c1/scripts/FE/.  I modified the Makefile in /opt/rtcds/caltech/c1/core/advLigoRTS/ to update and place new scripts in that directory. 

This was done by using

sed -i 's[scripts/start$${system}[scripts/FE/start$${system}[g' Makefile

sed -i 's[scripts/kill$${system}[scripts/FE/kill$${system}[g' Makefile

  15695   Wed Dec 2 17:54:03 2020 gautamUpdateCDSFE reboot

As discussed at the meeting, I commenced the recovery of the CDS status at 1750 local time.

  • Started by attempting to just soft-restart the c1rfm model and see if that fixes the issue. It didn't and what's more, took down the c1sus machine.
  • So hard reboots of the vertex machines was required. c1iscey also crashed. I was able to keep the EX machine up, but I soft-stopped all the RT models on it.
  • All systems were recovered by 1815. For anyone checking, the DC light on the c1oaf model is red - this is a "known" issue and requires a model restart, but i don't want to get into that now and it doesn't disrupt normal operation.

Single arm POX/POY locking was checked, but not much more. Our IMC WFS are still out of service so I hand aligned the IMC a bit, IMC REFL DC went from ~0.3 to ~0.12, which is the usual nominal level.

  2627   Mon Feb 22 12:48:31 2010 josephb, alex, kojiUpdateComputersFE machines now coming up

Even after bringing up the fb40m, I was unable to get the front ends to come up, as they would error out with an RFM problem.

We proceeded to reboot everything I could get my hands on, although its likely it was daqawg and daqctrl which were the issue, as on the C0DAQ_DETAIL screen their status had been showing as 0xbad, but after the reboot showed up as 0x0.  They had originally come up before the frame builder had been fixed, so this might have been the culprit.  In the course of rebooting, I also found c1omc and c1lsc had been turned off as well, and turned them on.

After this set of reboots, we're now able to bring the front ends up one by one.

  1916   Mon Aug 17 02:12:53 2009 YoichiSummaryComputersFE bootfest
Rana, Yoichi

All the FE computers went red this evening.
We power cycled all of them.
They are all green now.

Not related to this, the CRT display of op540m is not working since Friday night.
We are not sure if it is the failure of the display or the graphics card.
Rana started alarm handler on the LCD display as a temporary measure.
  8895   Mon Jul 22 22:06:18 2013 KojiUpdateCDSFE Web view was fixed

FE Web view was broken for a long time. It was fixed now.

The problem was that path names were not fixed when we moved the models from the old local place to the SVN structure.

The auto updating script (/cvs/cds/rtcds/caltech/c1/scripts/AutoUpdate/update_webview.cron) is running on Mafalda.

Link to the web view: https://nodus.ligo.caltech.edu:30889/FE/

  9364   Mon Nov 11 12:19:36 2013 ranaUpdateCDSFE Web view was fixed

Quote:

FE Web view was broken for a long time. It was fixed now.

The problem was that path names were not fixed when we moved the models from the old local place to the SVN structure.

The auto updating script (/cvs/cds/rtcds/caltech/c1/scripts/AutoUpdate/update_webview.cron) is running on Mafalda.

Link to the web view: https://nodus.ligo.caltech.edu:30889/FE/

 Seems partially broken again. Not updating for most of the FE. I've commented out the cron lines for this as well as the mostly broken MEDM Snapshots job. I'm in the process of adding them to the megatron cron (since that machine is at least running 64 bit Ubuntu 12, instead of 32-bit CentOS)

  9366   Tue Nov 12 15:04:35 2013 ranaUpdateCDSFE Web view was fixed

Quote:

 Seems partially broken again. Not updating for most of the FE. I've commented out the cron lines for this as well as the mostly broken MEDM Snapshots job. I'm in the process of adding them to the megatron cron (since that machine is at least running 64 bit Ubuntu 12, instead of 32-bit CentOS)

 https://nodus.ligo.caltech.edu:30889/medm/screenshot.html

Seems to now be working. I made several fixes to the scripts to get it working again:

  1. changed TCSH scripts to BASH. Used /usr/bin/env to find bash.
  2. fixed stdout and stderr redirection so that we could see all error messages.
  3. made the PERL scripts executable. most of the PERL errors are not being logged yet.
  4. fixed paths for the MEDM screens to point to the right directories.
  5. the screen cap only works on screens which pop open on the left monitor, so I edited the screens so that they open up there by default.
  6. moved the CRON jobs from mafalda over to megatron. Mafalda no longer is running any crons.
  7. op540m used to run the 3 projector StripTool displays and have its screen dumped for this web page. Now zita is doing it, but I don't know how to make zita dump her screen.
  8483   Wed Apr 24 14:20:49 2013 KojiUpdateCDSFE Web view not updated?

The FE web view seems not up-to-date, does it? ( maybe for a year)

https://nodus.ligo.caltech.edu:30889/FE/c1mcs_slwebview_files/index.html

  5211   Fri Aug 12 16:50:37 2011 YoichiConfigurationCDSFE Status screen rearranged
I rearranged the FE_STATUS.adl so that I have a space to add c1ffc in the screen.
So, please be aware that the FE monitors are no longer in their original positions
in the screen.
  7138   Fri Aug 10 09:47:19 2012 MashaUpdateComputersFE Status

The c1lsc and c1sus screens are red in the front-end status. I restarted the frame builder, and hit the "global diag reset" button, but to no avail. Yesterday, the only thing Den and I did to c1sus was install a new c1pem model. I got rid of the changes and switched to the old one (I ran rtcds build, install, restart), but the status is still the same.

  7143   Fri Aug 10 11:08:26 2012 jamieUpdateComputersFE Status

Quote:

The c1lsc and c1sus screens are red in the front-end status. I restarted the frame builder, and hit the "global diag reset" button, but to no avail. Yesterday, the only thing Den and I did to c1sus was install a new c1pem model. I got rid of the changes and switched to the old one (I ran rtcds build, install, restart), but the status is still the same.

 The issue you're seeing here is stalled mx_stream processes on the front ends.  On the troublesome front ends you can log in and restart the mx_streams with the "mxstreamrestart" command.

  459   Tue Apr 29 21:09:12 2008 ranaDAQCDSFE Filters
These are new FE filters for downsampling and upsampling. We will be going from native hardware sampling rates of 64k down to 32k, 16k, and 2k.

The attached plot shows these filters. They are 3dB ripple, 40 dB stopband, 4th order elliptic filters in which I have moved the zeros around
into good places (e.g. to the Nyquist frequency).

I'm also attaching the .txt file containg the filter coefficients and the design strings. The filters are called x2, x4, and x32, for the
D2, D4, and D32 downsampling, respectively.
Attachment 1: fefilters.jpg
fefilters.jpg
Attachment 2: fefilters.txt
# FILTERS FOR ONLINE SYSTEM
#
# Computer generated file: DO NOT EDIT
#
# MODULES ULYAW
#
################################################################################
### ULYAW                                                                    ###
################################################################################
# SAMPLING ULYAW 65536
... 28 more lines ...
  546   Thu Jun 19 20:22:03 2008 ranaUpdateGeneralFE Computer Status
I called Rolf (@LLO) who called Alex (@MIT) who suggested that we power cycle every crate
with an RFM connection as we did before (twice in the past year).

Rob and I followed Yoichi around the lab as he turned off and on everything. There
was no special order; he started at the Y-end and worked his way into the corner and
finishing at the X-End. Along the way we also reset the 2 RFM switches around fb0.

This cured the EPICS problem; the FEs could now boot and received the EPICS data.

However, there are still some residual channel hopping-ish issues which Rob and Yoichi are
now working on.
  12325   Fri Jul 22 03:02:37 2016 KojiUpdateCOCFC painting

[Koji Gautam]

We have worked on the FC painting on ITMX and ITMY. We also replaced the OSEM fixing screws with the ones with a hex knob.
This was done except for the SD OSEM as the new screw was not long enough. We left an allen-key version of the screw for the SD OSEM.

All the full-resolution photos can be found on g-photo.


ITMY

Attachment1: The barrel was pretty dusty. Some dusts were observed on the HR face but it was not so terrible. The barrel and the HR face were blown with the ionized N2 and then wiped with IPA. The face wiping was done n a similar way as the drag wiping.

Attachment2: FC was applied to the HR surface.

Attachment3: The AR surface was also painted with FC. The brush touched the coil holder.

Attachment4: The brush touched the coil holder. Another PEEK tab was applied to remove this FC stain on the metal holder.

Attachment5: This is the result of successful removal of the FC stain.

ITMX

Attachment6: The OSEM arrangement before removal. We confirmed that the OSEM arrangement was as described on Wiki.

Attachment7/8: The ITMX was obviously a lot dirtier than ITMY. The barrel accumulated dusts.

Attachment9: This is the HR face picture with large dusts on it.

Attachment10: The HR surface was painted with FC.

Attachment11: This is the AR surface with FC painted.

Attachment 1: ITMY_barrel_dust.jpg
ITMY_barrel_dust.jpg
Attachment 2: ITMY_HR_FC.jpg
ITMY_HR_FC.jpg
Attachment 3: ITMY_AR_FC.jpg
ITMY_AR_FC.jpg
Attachment 4: ITMY_drip_removal.jpg
ITMY_drip_removal.jpg
Attachment 5: ITMY_drip_removed.jpg
ITMY_drip_removed.jpg
Attachment 6: ITMX_OSEMS.jpg
ITMX_OSEMS.jpg
Attachment 7: ITMX_barrel_dust1.jpg
ITMX_barrel_dust1.jpg
Attachment 8: ITMX_barrel_dust2.jpg
ITMX_barrel_dust2.jpg
Attachment 9: ITMX_HR_dusty.jpg
ITMX_HR_dusty.jpg
Attachment 10: ITMX_HR_FC.jpg
ITMX_HR_FC.jpg
Attachment 11: ITMX_AR_FC.jpg
ITMX_AR_FC.jpg
  6157   Tue Jan 3 15:45:04 2012 JenneUpdateComputersFB?

Is there a reason the framebuilder status light is red for all the front ends?

Also, I reenabled PRM watchdog.

  13396   Fri Oct 20 16:30:17 2017 gautamUpdateCDSFB1 installed on shelves

[steve, jamie, gautam]

The machine that now serves as out Frame Builder, FB1, was sitting on top of megatron. I decided that this wasn't ideal, and asked Steve to get some alternative mounting solution. Today, he procured some shelves to put FB1 on. Jamie suggested looking for the slider-rail that came with the machine, and using that instead, as it will allow us to slide FB1 out of the rack as we do megatron and the old FB. But as luck would have it, the distance between the rack vertical posts is 26 inches, but the rail is 27 inches. So we had to accept using the less ideal solution of putting FB1 on two shelves, with no sliding option. Photo to be uploaded shortly.

For this work, I had to shutdown FB1 for about 1 hour between 3pm and 4pm. It seems to have come back up fine now.

  354   Tue Mar 4 00:42:51 2008 ranaUpdateComputersFB0 still down ?
The framebuilder is still down. I tried restarting the daqd task and resetting the RFM
switch like it says in the Wiki but it still doesn't work right. The computer itself is
running (I can ssh to it) and the daqd process is running but there's a red light for
it on the RFM screen and dataviewer won't connect to it.

If Alex isn't over by ~10 AM, we should call him and ask for help.
  8274   Tue Mar 12 00:35:56 2013 JenneUpdateComputersFB's RAID is beeping

[Manasa, Jenne]

Manasa just went inside to recenter the AS beam on the camera after our Yarm spot centering exercises of the evening, and heard a loud beeping.  We determined that it is the RAID attached to the framebuilder, which holds all of our frame data that is beeping incessantly.  The top center power switch on the back (there are FOUR power switches, and 3 power cables, btw.  That's a lot) had a red light next to it, so I power cycled the box.  After the box came back up, it started beeping again, with the same front panel message:

H/W monitor power #1 failed.

Right now the fb is trying to stay connected to things, and we can kind of use dataviewer, but we lose our connection to the framebuilder every ~30 seconds or so.  This rough timing estimate comes from how often we see the fb-related lights on the frontend status screen cycle from green to white to red back to green (or, how long do the lights stay green before going white again).  We weren't having trouble before the RAID went down a few minutes ago, so I'm hopeful that once that's fixed, the fb will be fine. 

In other news, just to make Jamie's day a little bit better, Dataviewer does not open on Pianosa or Rosalba.  The window opens, but it stays a blank grey box.  This has been going on for Pianosa for a few days, but it's new (to me at least) on Rosalba.  This is different from the lack of ability to connect to the fb that Rossa and Ottavia are seeing.

  13312   Fri Sep 15 15:54:28 2017 gautamUpdateCDSFB wiper script

A wiper script is not yet set up for our new Frame-Builder. The disk usage is ~80% now, so I think we should start running a wiper script that manages overall disk usage and deletes old frame files to this end.

From what I could find on the elog, the way this was done was by running a cron job on FB. There is a perl script, /opt/rtcds/caltech/c1/target/fb/wiper.pl, which from what I could understand, runs a bunch of du commands on different directories to determine if there is a need to delete any files.

I copied this script over to /opt/rtcds/caltech/c1/target/daqd/wiper.pl. This is the directory in which all the new FB stuff resides. Conveniently, the script has a "dry-run" option, which I tried running on FB1. However, I get the following error message:

Fri Sep 15 15:44:45 PDT 2017
Dry run, will not remove any files!!!
You need to rerun this with --delete argument to really delete frame files
Directory disk usage:
 /frames/trend/minute_rawk
Combined 0k or 0m or 0Gb
Illegal division by zero at ./wiper.pl line 98.


So it would seem that for some reason, the du commands aren't working. From what I could tell, there aren't any directory paths specific to the old FB machine that need to be changed. I believe the script was working prior to the FB disk crash - unfortunately it doesn't look like this script was under version control but I don't think any changes have been made to this script.

Before I go down a Perl rabbit hole, has anyone seen such an error or is aware of some reason why this might not work on the new FB? Am I even using the correct scripts?

  13317   Mon Sep 18 17:17:49 2017 gautamUpdateCDSFB wiper script

After trying to debug this issue using the Perl debugger, I concluded that the problem is in the part of the code that splits the output of the "du" command into directory and disk usage. For whatever, reason, this isn't working. The version of perl running on the new FB1 machine is 5.20.2, whereas I suspect the version running on the old FB machine was 5.14.2 (which is the version on all the Ubuntu 12 workstations and megatron). Unclear whether downgrading the Perl version is the right way to go.

The FB1 disk is now getting close to full, the usage is up to 85% today.

Quote:

Before I go down a Perl rabbit hole, has anyone seen such an error or is aware of some reason why this might not work on the new FB? Am I even using the correct scripts?

 

  13318   Mon Sep 18 17:30:54 2017 ChrisUpdateCDSFB wiper script

Attached is the version of the wiper script we use on the CryoLab cymac. It works with perl v5.20.2. Is this different from what you have?

Attachment 1: wiper.pl
#!/usr/bin/perl
use File::Basename;

print "\n" .  `date` . "\n";
# Dry run, do not delete anything
$dry_run = 1;

if ($ARGV[0] eq "--delete") { $dry_run = 0; }

print "Dry run, will not remove any files!!!\n" if $dry_run;
... 184 more lines ...
  13319   Mon Sep 18 17:51:26 2017 gautamUpdateCDSFB wiper script

It is a little different - specifically, the way the splitting of the output of the "du" command into disk usage and directory is different (see Attachment #1). Apart from this, some of the parameters (e.g. what percentage to keep free) are different.

I changed the percentages to match what we had here, and edited a couple of other lines to print out the files that will be deleted. The dry run seemed to work okay, it produced the output below. Not sure why "df -h" reports a different use percentage though...

Since the script seems to be working now, I am going to set it up on FB1's crontab. Thanks Chris!.

controls@fb1:/opt/rtcds/caltech/c1/target/daqd 0$ ./wiper.pl
Mon Sep 18 17:47:06 PDT 2017
Dry run, will not remove any files!!!
You need to rerun this with --delete argument to really delete frame files
Directory disk usage:
/frames/trend/minute_raw 47126124k
/frames/trend/minute 22900668k
/frames/trend/second 760359168k
/frames/full 19337278516k
Combined 20167664476k or 19694984m or 19233Gb
/frames size 25097525144k at 80.36%
/frames is below keep value of 85.00%
Will not delete any files
df reported usage 80.36%
controls@fb1:/opt/rtcds/caltech/c1/target/daqd 0$ df -h
Filesystem                        Size  Used Avail Use% Mounted on
/dev/sda4                         2.0T  1.7T  152G  92% /
udev                               10M     0   10M   0% /dev
tmpfs                              13G  177M   13G   2% /run
tmpfs                              32G     0   32G   0% /dev/shm
tmpfs                             5.0M     0  5.0M   0% /run/lock
tmpfs                              32G     0   32G   0% /sys/fs/cgroup
/dev/sda2                          19G  3.7G   14G  21% /var
/dev/sda1                         461M   65M  373M  15% /boot
/dev/sdb1                          24T   19T  3.5T  85% /frames
192.168.113.104:/home/cds/rtcds   2.0T  1.6T  291G  85% /opt/rtcds
192.168.113.104:/home/cds/rtapps  2.0T  1.6T  291G  85% /opt/rtapps
tmpfs                             6.3G     0  6.3G   0% /run/user/1001
Quote:

Attached is the version of the wiper script we use on the CryoLab cymac. It works with perl v5.20.2. Is this different from what you have?

 

Attachment 1: perlDiff.png
perlDiff.png
  13320   Mon Sep 18 18:40:34 2017 gautamUpdateCDSFB wiper script

I did a further check on the wiper script by changing the "percent_keep" from 85.0 to 75.0, and running the script in "dry_run" mode again. The script then output to console the names of all the files it would delete in order to free up the required amount of space (but didn't actually delete any files as it was a dry run). Seemed to be sensible.

To set up the cron job, I did the following on FB1:

  • crontab -e opened up the crontab
  • Copied over a script called "wiper.cron" from /opt/rtcds/caltech/c1/target/fb to /opt/rtcds/caltech/c1/target/daqd. This essentially contains a bunch of instructions to run the wiper script with the --delete flag, and write the console output to a log file.
  • Added the following line: 33 3 * * * /opt/rtcds/caltech/c1/target/daqd/wiper.cron. So the cron job should be executed at 3:33AM everyday.
  • The cron daemon seems to be running - sudo systemctl status cron.service yields the following output:
    controls@fb1:~ 0$ sudo systemctl status cron.service
    ● cron.service - Regular background program processing daemon
       Loaded: loaded (/lib/systemd/system/cron.service; enabled)
       Active: active (running) since Mon 2017-09-18 18:16:58 PDT; 27min ago
         Docs: man:cron(8)
     Main PID: 30183 (cron)
       CGroup: /system.slice/cron.service
               └─30183 /usr/sbin/cron -f
    Sep 18 18:16:58 fb1 cron[30183]: (CRON) INFO (Skipping @reboot jobs -- not system startup)
    Sep 18 18:17:01 fb1 CRON[30205]: pam_unix(cron:session): session opened for user root by (uid=0)
    Sep 18 18:17:01 fb1 CRON[30206]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
    Sep 18 18:17:01 fb1 CRON[30205]: pam_unix(cron:session): session closed for user root
    Sep 18 18:25:01 fb1 CRON[30820]: pam_unix(cron:session): session opened for user root by (uid=0)
    Sep 18 18:25:01 fb1 CRON[30821]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
    Sep 18 18:25:01 fb1 CRON[30820]: pam_unix(cron:session): session closed for user root
    Sep 18 18:35:01 fb1 CRON[31515]: pam_unix(cron:session): session opened for user root by (uid=0)
    Sep 18 18:35:01 fb1 CRON[31516]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
    Sep 18 18:35:01 fb1 CRON[31515]: pam_unix(cron:session): session closed for user root

     

  • crontab -l on FB1 now shows the following:
    controls@fb1:~ 0$ crontab -l
    # Edit this file to introduce tasks to be run by cron.
    #
    # Each task to run has to be defined through a single line
    # indicating with different fields when the task will be run
    # and what command to run for the task
    #
    # To define the time you can provide concrete values for
    # minute (m), hour (h), day of month (dom), month (mon),
    # and day of week (dow) or use '*' in these fields (for 'any').#
    # Notice that tasks will be started based on the cron's system
    # daemon's notion of time and timezones.
    #
    # Output of the crontab jobs (including errors) is sent through
    # email to the user the crontab file belongs to (unless redirected).
    #
    # For example, you can run a backup of all your user accounts
    # at 5 a.m every week with:
    # 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
    #
    # For more information see the manual pages of crontab(5) and cron(8)
    #
    # m h  dom mon dow   command
    33 3 * * * /opt/rtcds/caltech/c1/target/daqd/wiper.cron

Let's see if this works.

Quote:

Since the script seems to be working now, I am going to set it up on FB1's crontab. Thanks Chris!.

 

  12151   Mon Jun 6 16:41:36 2016 ericqUpdateCDSFB upgrade work

Barring objections, starting tomorrow morning, Jamie will be testing the new FB code. The IFO will not be available for other use while this is ongoing.

  9839   Tue Apr 22 01:39:57 2014 JenneUpdateCDSFB unhappy again

[Jenne, Q]

The frame builder (or something) is unhappy again.  I know that we've seen this before, but I can't find the elog entry that relates to this particular problem.

Every few minutes, the fb status lights on the CDS_STATUS screen go white, and then come back green.  It's annoying when it happens every hour or so (which is unfortunately typical), but it's pretty debilitating when it stops dataviewer and dtt every few minutes.  Just from the way the lights change, it looks like perhaps the daqd process is restarting itself periodically? 

  10050   Tue Jun 17 17:04:26 2014 ericqUpdateComputer Scripts / ProgramsFB troubles

Quote:

Also, the CDS FE status screen had red lights blinking as if it required an 'mxstream restart'. I did the same and it did not fix the problem. So I tried to restart fb using the usual 'telnet fb 8087'; but could not restart fb that way.

 FB is acting strange. When ssh-ing in, certain commands cause an inescapable hang, which can't be ctrl-c'd out of. Telling it to reboot does nothing. This kind of situation was seen by me before, when we were getting all the front ends back, I eventually hard rebooted it, hoping it was a one time thing. Guess it's not. 

Looking at the dmesg output, daqd seems to be segfault-ing all over the place. This may be related... Here are some examples:

451314.730502] daqd[17339]: segfault at 7ff589ae3b30 ip 00007ff589ae3b30 sp 00007ff49931dfb8 error 15 in libmyriexpress.so[7ff589ae3000+1000]

[530516.313238] daqd[18442] general protection ip:7f3f2ce73a6c sp:7f3e29949d50 error:0

[530516.313250] daqd[18420] general protection ip:7f3f2ce73a6c sp:7f3e2a19fd50 error:0 in libc-2.10.1.so[7f3f2ce3f000+14c000]

[530516.313262]  in libc-2.10.1.so[7f3f2ce3f000+14c000]

[530516.327083] daqd[18412]: segfault at 3b04c9cd0 ip 00007f3f2ce73a6c sp 00007f3e2a4a7d50 error 4 in libc-2.10.1.so[7f3f2ce3f000+14c000]

[537695.364481] daqd[18489]: segfault at 12dbbcae0 ip 00007fa35a3b8a0a sp 00007fa298381af0 error 6 in libmyriexpress.so[7fa35a399000+28000]

[577316.821618] daqd[18758]: segfault at 7f5c4d3e9b30 ip 00007f5c4d3e9b30 sp 00007f5b5cc23fb8 error 15 in libmyriexpress.so[7f5c4d3e9000+1000]

I'm not inclined to go reboot it right now, but not sure how to address these problems...

 

 

  9437   Wed Dec 4 12:02:39 2013 KojiUpdateCDSFB restored

Now FB is fixed: daqd and nds are running


When I rebooted FB, I noticed that any of the nfs file systems were not mounted.
I started tracking down the issues from here.

I googled the common issues of the nfs mounting during the boot sequence.
- It is good to give "_netdev" option to fstab to mount the system after the network connection is established.

- "auto" option specifies that the file system is mounted when mount -a is run

Resulting /etc/fstab is this:

/dev/sdb1                            /            ext3    noatime                    0 1
/swapfile                            none         swap    sw                         0 0
shm                                  /dev/shm     tmpfs   nodev,nosuid,noexec        0 0
/dev/sda1                            /frames      ext3    noatime                    0 0
linux1:/home/cds/                    /cvs/cds     nfs     _netdev,auto,rw,bg,soft    0 0
linux1:/home/cds/rtcds               /opt/rtcds   nfs     _netdev,auto,rw,bg,soft    0 0
linux1:/home/cds/rtapps              /opt/rtapps  nfs     _netdev,auto,rw,bg,soft    0 0
linux1:/home/cds/caltech/apps/linux  /opt/apps    nfs     _netdev,auto,rw,bg,soft    0 0

But this didn't help mounting the nfs file systems at boot yet. I dug into google again and found a command "/sbin/rc-update".
"/sbin/rc-update show" shows what services are activated at boot. It did not include "nfsmount". So the following command
was executed

 

> sudo /sbin/rc-update add nfsmount boot

> /sbin/rc-update show

* Broken runlevel entry: /etc/runlevels/boot/portmap
            bootmisc | boot                         
             checkfs | boot                         
           checkroot | boot                         
               clock | boot                         
         consolefont | boot                         
               dcron |      default                 
               dhcpd |      default                 
            hostname | boot                         
            in.tftpd | boot                         
             keymaps | boot                         
               local |      default nonetwork       
          localmount | boot                         
             modules | boot                         
               monit |      default                 
                  mx |      default                 
            net.eth0 |      default                 
              net.lo | boot                         
            netmount |      default                 
                 nfs | boot                         
            nfsmount | boot                         
          ntp-client | boot default                 
           rmnologin | boot                         
           rpc.statd | boot                         
                sshd | boot                         
           syslog-ng | boot                         
      udev-postmount |      default                 
             urandom | boot                         
              xinetd |      default

After rebooting, I confirmed that the nfs file systems are correctly mounted
and daqd and nds are automatically started.

This means that FB had never been configured to run correctly at boot. Shame on you!

  8278   Tue Mar 12 12:06:22 2013 JamieUpdateComputersFB recovered, RAID power supply #1 dead

The framebuilder RAID is back online.  The disk had been mounted read-only (see below) so daqd couldn't write frames, which was in turn causing it to segfault immediately, so it was constantly restarting.

The jetstor RAID unit itself has a dead power supply.  This is not fatal, since it has three.  It has three so it can continue to function if one fails.  I have removed the bad supply and gave it to Steve so he can get a suitable replacement.

Some recovery had to be done on fb to get everything back up and running again.  I ran into issues trying to do it on the fly, so I eventually just rebooted.  It seemed to come back ok, except for something going on with daqd.  It was reporting the following error upon restart:

[Tue Mar 12 11:43:54 2013] main profiler warning: 0 empty blocks in the buffer

It was spitting out this message about once a second, until eventually the daqd died.  When it restarted it seemed to come back up fine.  I'm not exactly clear what those messages were about, but I think it has something to do with not being able to dump it's data buffers to disk.  I'm guessing that this was a residual problem from the umounted /frames, which somehow cleared on it's own.  Everything seems to be ok now.

Quote:

Manasa just went inside to recenter the AS beam on the camera after our Yarm spot centering exercises of the evening, and heard a loud beeping.  We determined that it is the RAID attached to the framebuilder, which holds all of our frame data that is beeping incessantly.  The top center power switch on the back (there are FOUR power switches, and 3 power cables, btw.  That's a lot) had a red light next to it, so I power cycled the box.  After the box came back up, it started beeping again, with the same front panel message:

H/W monitor power #1 failed.

DO NOT DO THIS.  This is what caused all the problems.  The unit has three redundant power supplies, for just this reason.  It was probably continuing to function fine.  The beeping was just to tell you that there was something that needed attention.  Rebooting the device does nothing to solve the problem.  Rebooting in an attempt to silence beeping is not a solution.  Shutting of the RAID unit is basically the equivalent of ripping out a mounted external USB drive.  You can damage the filesystem that way.  The disk was still functioning properly.  As far as I understand it the only problem was the beeping, and there were no other issues.  After you hard rebooted the device, fb lost it's mounted disk and then went into emergency mode, which was to remount the disk read-only.  It didn't understand what was going on, only that the disk seemed to disappear and the reappear.  This was then what caused the problems.  It was not the beeping, it was the restarting the RAID that was mounted on fb.

Computers are not like regular pieces of hardware.  You can't just yank the power on them.  Worse yet is yanking the power on a device that is connected to a computer.  DON"T DO THIS UNLESS YOU KNOW WHAT YOU"RE DOING.  If the device is a disk drive, then doing this is a sure-fire way to damage data on disk.

 

  9354   Wed Nov 6 15:12:01 2013 JenneUpdateCDSFB not talking to LSC?

Something funny is going on with the framebuilder's communication with the LSC machine. 

This is a different failure mode / error than I have seen before.  It's not the type of problem that is solved by restarting the mxstreams (that is indicated by also the 2 blocks on top of one another, that are green on the lsc machine right now, being red), although I did try that, before I looked closer and realized that that wasn't the problem.

ssh-ing to c1lsc, and doing a "rtcds restart all" seems to be fixing the problem.  Both c1oaf and c1cal needed another round of restarting, because they needed their BURT buttons pressed manually.  All of the models on the lsc machine are running fine now, though.

Here's a screenshot of the CDS overview screen, with the error lights:

Screenshot-Untitled_Window-1.png

  9357   Wed Nov 6 17:21:58 2013 JamitUpdateCDSFB not talking to LSC?

Quote:

Something funny is going on with the framebuilder's communication with the LSC machine. 

This is a different failure mode / error than I have seen before.  It's not the type of problem that is solved by restarting the mxstreams (that is indicated by also the 2 blocks on top of one another, that are green on the lsc machine right now, being red), although I did try that, before I looked closer and realized that that wasn't the problem.

ssh-ing to c1lsc, and doing a "rtcds restart all" seems to be fixing the problem.  Both c1oaf and c1cal needed another round of restarting, because they needed their BURT buttons pressed manually.  All of the models on the lsc machine are running fine now, though.

Here's a screenshot of the CDS overview screen, with the error lights:

Screenshot-Untitled_Window-1.png

This definitely looks like a timing problem on the c1lsc front end computer.  The red lights on the left mean that the timing synchronization is lost at the user model.  I'm perplexed why it looks like the IOP is not seeing the same error, though, since it should originate at the ADC.  The red lights to the right just mean the timing synchronization is lost with the DAQ, which is too be expected given a timing loss at the front end.

We'll have to take a closer look when this happens again.

  9021   Sun Aug 18 16:04:07 2013 ranaSummaryCDSFB lights all RED: mxstream restart

Sun Aug 18 15:52:50 2013

Found the FB lights (C1:FEC-NN_FB_NET_STATUS and C1:DAQ-DC0_C1XXX_STATUS) RED for everything on the CDS_FE_STATUS screen.

I used the (! mxstream restart) button ro restart the mxstreams. Everything is green now.

PMC was out of lock- relocked it and the IMC locked itself as did the X & Y arms on IR. X was already green locked.

Attachment 1: IFO-Trend.png
IFO-Trend.png
  16294   Tue Aug 24 18:44:03 2021 KojiUpdateCDSFB is writing the frames with a year old date

Dan Kozak pointed out that the new frame files of the 40m has not been written in 2021 GPS time but 2020 GPS time.

Current GPS time is 1313890914 (or something like that), but the new files are written as C-R-1282268576-16.gwf

I don't know how this can happen but this may explain why we can't have the agreement between the FB gps time and the RTS gps time.

(dataviewer seems dependent on the FB GPS time and it indicates 2020 date. DTT/diaggui does not.)


This is the way to check the gpstime on fb1. It's apparently a year off.

controls@fb1:~ 0$ cat /proc/gps
1282269402.89

Attachment 1: Screen_Shot_2021-08-24_at_18.46.24.png
Screen_Shot_2021-08-24_at_18.46.24.png
  16298   Wed Aug 25 17:31:30 2021 PacoUpdateCDSFB is writing the frames with a year old date

[paco, tega, koji]

After invaluable assistance from Jamie in fixing this yearly offset in the gps time reported by cat /proc/gps, we managed to restart the real time system correctly (while still manually synchronizing the front end machine times). After this, we recovered the mode cleaner and were able to lock the arms with not much fuss.

Nevertheless, tega and I noticed some weird noise in the C1:LSC-TRX_OUT which was not present in the YARM transmission, and that is present even in the absence of light (we unlocked the arms and just saw it on the ndscope as shown in Attachment #1). It seems to affect the XARM and in general the lock acquisition...

We took some quick spectrum with diaggui (Attachment #2) but it doesn't look normal; there seems to be broadband excess noise with a remarkable 1 kHz component. We will probably look into it in more detail.

Attachment 1: TRX_noise_2021-08-25_17-40-55.png
TRX_noise_2021-08-25_17-40-55.png
Attachment 2: TRX_TRY_power_spectra.pdf
TRX_TRY_power_spectra.pdf
ELOG V3.1.3-