There has been an ongoing memory error in optimus with the following messages:
Message from syslogd@optimus at Jun 30 14:57:48 ...
kernel:[1292439.705127] [Hardware Error]: Corrected error, no action required.
Message from syslogd@optimus at Jun 30 14:57:48 ...
kernel:[1292439.705174] [Hardware Error]: CPU:24 (10:4:2) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc04410032080a13
Message from syslogd@optimus at Jun 30 14:57:48 ...
kernel:[1292439.705237] [Hardware Error]: MC4_ADDR: 0x0000001ad2bd06d0
Message from syslogd@optimus at Jun 30 14:57:48 ...
kernel:[1292439.705264] [Hardware Error]: MC4 Error (node 6): DRAM ECC error detected on the NB.
Message from syslogd@optimus at Jun 30 14:57:48 ...
kernel:[1292439.705323] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
Optimus is a Sun Fire X4600 M2 Split-Plane server. Based on this message, the issue seems to be in memory controller (MC) 6, chip set row (csrow) 7, channel 0. I got this same result again after installing edac-utils and running edac-util -v, which gave me:
mc6: csrow7: mc#6csrow#7channel#0: 287 Corrected Errors
and said that all other DIMMs were working fine with 0 errors. Each MC has 4 csrows numbered 4-7. I shut off optimus and checked inside and found that it consists of 8 CPU slots lined up horizontally, each with 4 DIMMs stacked vertically and 4 empty DIMM slots beneath. I'm thinking that each of the 8 CPU slots has its own memory controller (0-7) and that the csrow corresponds to the position in the vertical stack, with csrow 7 being the topmost DIMM in the stack. This would mean that MC 6, csrow 7 would be the 7th memory controller, topmost DIMM. The channel would then correspond to which one of the DIMMs in the pair is faulty although if the DIMM was replaced, both channels 0 and 1 would be switched out. Here are some sources that I used:
I'll find the exact part needed to replace soon.
Optimus' memory errors are back so I found the exact DIMM model needed to replace: http://www.ebay.com/itm/Lot-of-10-Samsung-4GB-2Rx4-PC2-5300P-555-12-L0-M393T5160QZA-CE6-ECC-Memory-/201604698112?hash=item2ef0939000:g:EgEAAOSwqBJXWFZh I'm not sure what website would be the best for buying new DIMMs but this is the part we need: Samsung 4GB 2Rx4 PC2-5300P-555-12-L0 M393T5160QZA-CE6.
I replaced the suspected faulty DIMM earlier today (actually I replaced a pair of them as per the Sun Fire X4600 manual). I did things in the following sequence, which was the recommended set of steps according to the maintenance manual and also the set of graphics on the top panel of the unit:
I then checked for memory errors using edac-utils, and over the last couple of hours, found no errors (corrected or otherwise, see Praful's earlier elog for the error messages that we were getting prior to the DIMM swap)- I guess we will need to monitor this for a while more before we can say that the issue has been resolved.
Looking at dmesg after the reboot, I noticed the following error messages (not related to the memory issue I think):
[ 19.375865] k10temp 0000:00:18.3: unreliable CPU thermal sensor; monitoring disabled
[ 19.375996] k10temp 0000:00:19.3: unreliable CPU thermal sensor; monitoring disabled
[ 19.376234] k10temp 0000:00:1a.3: unreliable CPU thermal sensor; monitoring disabled
[ 19.376362] k10temp 0000:00:1b.3: unreliable CPU thermal sensor; monitoring disabled
[ 19.376673] k10temp 0000:00:1c.3: unreliable CPU thermal sensor; monitoring disabled
[ 19.376816] k10temp 0000:00:1d.3: unreliable CPU thermal sensor; monitoring disabled
[ 19.376960] k10temp 0000:00:1e.3: unreliable CPU thermal sensor; monitoring disabled
[ 19.377152] k10temp 0000:00:1f.3: unreliable CPU thermal sensor; monitoring disabled
I wonder if this could explain why the fans on Optimus often go into overdrive and make a racket? For the moment, the fan volume seems normal, comparable to the other SunFire X4600s we have running like megatron and FB...
I did apt-get update and then apt-get upgrade on optimus. All systems are nominal.
Assembled is the list of dead pressure gauges. Their locations are also circled in Attachment 1.
For replacements, I recommend we consider the Agilent FRG-700 Pirani Inverted Magnetron Gauge. It uses dual sensing techniques to cover a broad pressure range from 3e-9 torr to atmosphere in a single unit. Although these are more expensive, I think we would net save money by not having to purchase two separate gauges (Pirani + hot/cold cathode) for each location. It would also simplify the digital controls and interlocking to have a streamlined set of pressure readbacks.
For controllers, there are two options with either serial RS232/485 or Ethernet outputs. We probably want the Agilent XGS-600, as it can handle all the gauges in our system (up to 12) in a single controller and no new software development is needed to interface it with the slow controls.
Now that the new Agilent full-range gauges (FRGs) have been received, I'm putting together an installation plan. Since my last planning note in Sept. (ELOG 15577), two more gauges appear to be malfunctioning: CC2 and PAN. Those are taken into account, as well. Below are the proposed changes for all the sensors in the system.
Update to the gauge replacement plan (15692), based on Jordan's walk-through today. He confirmed:
Based on this info (and also info from Gautam that the PAN gauge is still working), I've updated the plan as follows. In summary, I now propose we install the fifth FRG in the TP1 foreline (PTP1 location) and leave P2 and P3 where they are, as they are no longer needed elsewhere. Any comments on this plan? I plan to order all the necessary gaskets, blanks, etc. tomorrow.
The 24 V Sorenson (2nd from bottom) in the small rack west of 1x2 was repurposed to 12V 600 mA, and was run to a terminal block on the north side of 1X1. Cables were routed underneath 1X1 and 1X2 to the terminal blocks. 12V was then routed to the PSL table and banana clip terminals were added.
For the 40m Upgrade, we plan to eliminate the Mach-Zehnder and replace it with a single EOM driven by all three modulation frequencies that we'll need: f1=11MHz, f2=5*f1=55MHz, fmc=29.5MHz.
A frequency generator will produce the three frequencies and with some other electronics we'll properly combine and feed them to the EOM.
The frequency generator will have two crystals to produce the f1 and fmc signals. The f2 modulation will be obtained by a frequency multiplier (5x) from the f1.
The frequency multiplier, for the way it works, will inevitably introduce some unwanted harmonics into the signals. These will show up as extra modulation frequencies in the EOM.
In order to quantify the effects of such unwanted harmonics on the interferometer and thus to let us set some limits on their amplitude, I ran some simulations with Optickle. The way the EOM is represented is by three RF modulators in series. In order to introduce the unwanted harmonics, I just added an RF modulator in series for each of them. I also made sure not to leave any space in between the modulators, so not to introduce phase shifts.
To check the effect at DC I looked at the sensing matrix and at the error signals. I considered the 3f error signals that we plan to use for the short DOFs and looked at how they depend on the CARM offset. I repeated the simulations for several possible amplitude of the unwanted harmonics. Some results are shown in the plots attached to this entry. 'ga' is the amplitude ratio of the unwanted harmonics relative to the amplitude of the 11 & 55 MHz modulations.
Comparing to the case where there are no unwanted harmonics (ga = 0), one can see that not considerable effect on the error signals for amplitudes 40dB smaller than that of the main sidebands. Above that value, the REFL31I signals, that we're going to use to control PRCL, will start to be distorted: gain and linearity range change.
So 40 dB of attenuation in the unwanted harmonics is probably the minimum requirement on the frequency multiplier, although 60dB would provide a safer margin.
I'm still thinking how to evaluate any AC effect on the IFO.
** TODO: Plot DC sweeps with a wider range (+/- 20 pm). Also plot swept sines to look for changes in TFs out to ~10 kHz.
I re-routed around the c1lsc machine this morning. I turned the crate off, and disconnected the transmission fiber from c1lsc (which went to the receiver on c1asc). I then took the receiving fiber from c1lsc and plugged it into the receiver on c1asc.
I pulled out the c1lsc computer from the VME crate and pulled out the RFM card, which I needed for the CDS upgrade. I then replaced the lsc card back in the crate and turned it back on. Since there hasn't been a working version of the LSC code on linux1 since I overwrote it with the new CDS lsc code, this shouldn't have any significant impact on the interferometer.
I've confirmed that the RFM network seems to be in a good state (the only red lights on the RFM timing and status medm screen are LSC, ASC, and ETMX). Fast channels can still be seen with dataviewer and fb40m appears to still be happy.
The RFM card has found its new home in the SUS IO Chassis. The short fiber that used to go between c1asc and c1lsc is now on the top shelf of the new 1X3 rack.
I just realized that an unfortunate casualty of this LSC work was the deletion of the slow controls for the LSC which we still use (some sort of AUX processor). For example, the modulation
depth slider for the MC is now in an unknown state.
If you're refering to just the medm screen, those can be restored from the SVN. As we're moving to a new directory structure, starting with /opt/rtcds/caltech/c1/, the old LSC screens can all be put back in the /cvs/cds/caltech/medm/c1/lsc directory if desired.
The slow lsc aux crate, c1iscaux2, is still working, and those channels are still available. I confirmed that one was still updating. As a quick test, I went to the SVN and pulled out the C1LSC_RFADJUST.adl file, renamed it to C1LSC_RFadjust.adl and placed it in /cvs/cds/caltech/medm/c1/lsc/, and checked it linked properly from the C1IOO_ModeCleaner.adl file. I haven't touched the modulation depths, as I didn't want to mess with the mode cleaner, but if I get an OK, we can test that today and confirm that modulation depth control is still working.
Fridge brought back inside.
It happened again. Defrosting required.
After Koji and I reset the transmission normalizations last Friday, he did some alignment work that increased the Yarm power. So, I had set the transmission normalization when we weren't really at full Yarm power. Today I reset the normalization so that instead of ~1.2, the Y transmission PDs read ~1.0.
OD = 0.0036" = 0.091 mm
Length = 46" = 1168.4 mm
Resistance = 33.3 Ohms
resistivity = R * pi * (OD/2)^2
----------------- = 1.85e-7 Ohm-meters
resistivity (Ohm-meter x 10^-7)
304 Stainless 7.2
316 Stainless 7.4
Cast Steel 1.6
Is it because of the change in the resonant frequency of the BS-PRM stack? How much the load on BS-PRM changed?
Or is it because of the change in the resonant frequency of PR2/PR3
I claim that neither of those things is plausible. We took out 1 PZT, and put in 1 active TT onto the BS table. There is no way the resonant frequency changed by an appreciable amount due to that switch.
I don't think that it is the resonant frequency of the TTs either. Here, I collate the data that we have on the resonant frequencies of our tip tilts. It appears that in elog 3425 I recorded results for TTs 2 and 3, but in elog 3447 I just noted that the measurements had been done, and never put them into the elog. Ooops.
Resonant frequency and Q of modes of passive tip tilts.
Notes: "Serial Number" of TTs here is based on the SN of the top suspension point block. This does not give info about which TT is where. Pitch modes were all too low of Q to be measured, although we tried.
Tip tilt mode measurements were taken with a HeNe and PD shadow sensor setup - the TT's optic holder ring was partially obscuring the beam.
[JC, Anchal, Yuta]
We are working on resonant frequency idendification from the free swing test done last weekend.
Table below is the resonant frequencies identified, and attached are the plots of peak identification for some of our new suspensions.
To identify the resonant frequencies, the kicks were done in each degrees of freedom so that we can assume, for example, SUSPOS will be mostly excited when kicked in POS and the heighest peak is at the POS resonant frequency.
For PR3, AS1 and ETMY, the resonant frequency idendification needs to be done in the order of POS, PIT, YAW, SIDE and identified frequencies need to be removed when finding a peak.
Other than that, the identification was done without any prior assumptions on the suspensions.
For ITMY, ETMY, PR2, PR3, AS1, AS4, yaw has lower resonant frequencies than pitch, as opposed to other suspensions.
For LO1, POS and PIT frequencies might be swapped because LLCOIL is not working (40m/16898) and POS/PIT kicks both might be excited SUSPOS/PIT.
LO1 coil output matrix was temporarily modified so that we use only two coils for POS/PIT/YAW excitation (Attachment #7), as we did for ITMY (40m/16899).
The scripts for the free swinging test and analysis live in /Git/40m/scripts/SUS/InMatCalc
POS PIT YAW SIDE
BS 0.990 0.748 0.794 0.959
ITMY 0.987 0.739 0.634 0.948 fPIT > fYAW
ETMY 0.979 0.816 0.649 0.954 fPIT > fYAW
ITMX 0.978 0.586 0.758 0.959
ETMX 0.962 0.725 0.847 1.000
PRM 0.939 0.541 0.742 0.990
PR3 1.019 0.885 0.751 0.989 fPIT > fYAW
PR2 0.996 0.816 0.724 0.999 fPIT > fYAW
SRM 0.969 0.533 0.815 0.985
SR2 0.978 0.720 0.776 0.997
LO1 0.926 1.011 0.669 0.993 POS AND PIT MIGHT BE SWAPPED
LO2 0.964 0.998 0.995 0.990 WRONG DUE TO STUCK (40m/16913)
AS1 1.028 0.832 0.668 0.988 fPIT > fYAW
AS4 1.015 0.800 0.659 0.991 fPIT > fYAW
MC1 0.967 0.678 0.797 0.995
MC2 0.968 0.748 0.815 0.990
MC3 0.978 0.770 0.841 0.969
Every so often things just work out. You do the calculations, you put the lenses on the bench, you manually adjust the pointing and fiddle with the lenses a bit, you get massive chunks of assistance from Kiwamu to get the alignment controls and monitors set up and after quite a bit of fiddling and tweaking the cavity mirror alignment you might get some nice TEM_00 -like shapes showing up on your Y-arm video monitors.
So. We have resonating green light in the Y-arm. The beam is horribly off-axis and the mode-matching, while close enough to give decent looking spots, has in no way been optimised yet. Things to do tomorrow - fix the off-cavity-axis problem and tweak up the mode-matching... then start looking at the locking...
Koji asked me to perform a simulation of the response of POP QPD DC signal to mirror motions, as a function of the CARM offset. Later than promised, here are the first round of results.
I simulated a double cavity, and the PRC is folded with parameters close to the 40m configuration. POP is extracted in transmission of PR2 (1ppm, forward beam). For the moment I just placed the QPD one meter from PR2, if needed we can adjust the Gouy phase. There are two QPDs in the simulation: one senses all the field coming out in POP, the other one is filtered to sense only the contribution from the carrier field. The difference can be used to compute what a POP_2F_QPD would sense. All mirrors are moved at 1 Hz and the QPD signals are simulated:
This shows the signal on the POP QPD when all fields (carrier and 55 MHz sidebands) are sensed. This is what a real DC QPD will see. As expected at low offset ETM is dominant, while at large offset the PRC mirrors are dominant. It's interesting to note that for any mirror, there is one offset where the signal disappears.
This is the contribution coming only from the carrier. This is what an ideal QPD with an optical low pass will sense. The contribution from the carrier increases with decreasing offset, as expected since there is more power.
Finally, this is what a 2F QPD will sense. The contribution is always dominated by the PRC mirrors, and the ETM is negligible.
The zeros in the real QPD signal is clearly coming from a cancellation of the contributions from carrier and sidebands.
The code is attached.
For several MICH offsets, I measured the response of REFL33Q, ASDC and the ratio ASDC/POPDC to a MICH EXC. It appears that there is no frequency-dependent effect. The plots for MICH_OFFSET = 0.0 and 2.0 are slightly lower in magnitude: the reason is they were the first measurements done, and after that a little realignment of BS was necessary, so probably that is the reason.
Amplitude of 29.485MHz input sine wave [dBm] | Value of channel C1:IOO-MC_DEMOD_LO
-------------------------------------------- | -----------------------------------
-10 | -0.000449867
-8 | -0.000449867
-6 | -0.000449867
-4 | 0.000384331
-2 | 0.00526733
0 | 0.0199163
2 | 0.0492143
4 | 0.0931613
6 | 0.161523
8 | 0.229885
10 | 0.293364
I restarted the frame builder at the following times
Tue Sep 13 14:53:49 PDT 2011
Tue Sep 13 16:46:32 PDT 2011
Tue Sep 13 17:24:16 PDT 2011
Restarted backup since fb40m was rebooted.
Elog restart script killed the elog, but didn't restart it. Restarted by hand.
I've noticed that it always takes running the script twice for it to actually work. I think there's something wrong with how it's doing it. I'll mess with it sometime the elog isn't getting much use.
with the script, as it was down.
Fb is now once again actually recording trends.
A section of the daqdrc file (located in /opt/rtcds/caltech/c1/target/fb/ directory) had been commented out by Alex and never uncommented. This section included the commands which actually make the fb record trends.
The section now reads as:
# comment out this block to stop saving data
#start frame-writer "22.214.171.124" broadcast="126.96.36.199" all;
The elog was dead this morning. I reanimated it. It is now undead.
Aka, from a hotel in Pisa.
Restarted Thu May 19 00:21:49 2011 to recover from Jenne's Italian terrorism.
Manasa let me know that the ELOG was down, and that she used the normal restart procedure, but then all entries were gone.
This is because the ELOG has been moved on nodus, so going to the old place and running the restart script starts up the old dysfunctional ELOG installation.
The proper place to restart the ELOG is now nodus:/export/home/elog/start-elog.csh
I'm updating the relevant 40m wiki page now.
Came in a little bit after 8 and found the MC unlocked and struggling to lock for the past 3 hours. Looking at the SUS overview, both MC1 and ITMX Watchdogs had tripped so we damped the suspensions and brought them back to a good state. The autolocker was still not able to catch lock, so we cleared the WFS filter history to remove large angular offsets in MC1 and after this the MC caught its lock again.
Looks like two EQs came in at around 4:45 AM (Pacific) suggested by a couple of spikes in the seismic rainbow, and this.
MC was unlocked and struggling to recover this morning due to misguided WFS offsets. In order to recover from this kind of issue, we
The MC is now restored and the plan is to let it run for a few hours so the offsets converge; then run the WFS relief script.
We found that megatron is unable to properly run scripts/MC/WFS/mcwfsoff and scripts/MC/WFS/mcwfson scripts. It fails cdsutils commands due to a library conflict. This meant that WFS loops were not turned off when IMC would get unlocked and they would keep integrating noise into offsets. The mcwfsoff script is also supposed to clear up WFS loop offsets, but that wasn't happening either. The mcwfson script was also not bringing back WFS loops on.
Gautam fixed these scripts temprorarily for running on megatron by using ezcawrite and ezcaswitch commands instead of cdsutils commands. Now these scripts are running normally. This could be the reason for wildly fluctuating WFS offsets that we have seen in teh past few months.
gautam: the problem here is that megatron is running Ubuntu18 - I'm not sure if there is any dedicated CDS group packaging for Ubuntu, and so we're using some shared install of the cdsutils (hosted on the shared chiara NFS drive), which is complaining about missing linked lib files. Depending on people's mood, it may be worth biting the bullet and make Megatron run Debian10, for which the CDS group maintains packages.
This would be a daily first task in the morning. We'll need to check the status of arm alignment and optimize it back to maximum every morning for the rest of the day's work.
Today, when I came, on openin gthe PSL shutter, IMC was aligned good, both arms were flashing but YARM maximum transmission count was around 0.7 (as opposed to 1 from yesterday) and XARM maximum transmission count was 0.5 (as opposed to 1 from yesterday). I did not change the input alignment to the interferometer. I only used ITMY-ETMY to regain flashing count of 1 on YARM and used BS and tehn ITMX-ETMX to regain flashing count of 0.9 to 1 in XARM.
Even thought the oplevs were centered yesterday, I found the oplev had drifted from the center and the optimal position also is different for all ooptics except EMTY and BS. It is worth nothign that in optimal position both PIT and YAW of ITMY and ITMX are off by 70-90 uradians and ETMX Pit oplev is off by 55 uradians.
Restored arm algiment to get 0.8 max flashing on YARM and 1 max flashing on XARM. I had to move input alignment using TT2-PR3 pair and realign YARM cavity axis using ITMY-ETMY pair.
I would like to advertise this useful tool that I've been using for moving cavity axis or input beam direction. It's a simple code that makes your terminal kind of videogame area. It moves two optics together (either in same direction or opposite direction) by arrow key strokes (left, right for YAW, up, down for PIT). Since it moves two optics together, you actually control the cavity axis or input beam angle or position depening on the mode.
I've been futzing with the common mode servo, trying to engage the AO path with POY for high bandwidth control of a single arm lock. I'm able to pull in the crossover and get a nice loop shape, but keep getting tripped up by the offset glitches from the CM board gain steps, so can't get much more than a 1kHz UGF.
As yutaro measured, these can be especially nasty at the major carrier transitions (i.e. something like 0111->1000). This happens at the +15->+16dB input gain step; the offset step is ~200x larger than the in-loop error signal RMS, so obviously there is no hope of keeping the loop engaged when recieving this kind of kick. Neither of the CM board inputs are immune from this, as I have empirically discovered. I can turn down the initial input gain to try and avoid this step occuring anywhere in the sequence, but then the SNR at high frequencies get terrible and I inject all kinds of crud into the mode cleaner, making the PC drive furious.
I think we're able to escape this when locking the full IFO because the voltages coming out of REFL11 are so much larger than the puny POY signals so the input-referred glitches aren't as bad. I think in the past, we used AS55 with a misaligned ITMX for this kind of single arm thing, which probably gives better SNR, but the whole point of this is to keep the X arm aligned and lock it to the Y-arm stabilized PSL.
It's been a while since I've attempted any locking, so tonight was mostly getting the various subsystems back together.