40m Log
ID Date Author Type Category Subject
4208   Wed Jan 26 12:04:31 2011   josephb   Update   CDS   Explanation of why c1sus and c1lsc models crash when the other one goes down

So apparently with the current Dolphin drivers, when one of the nodes goes down (say c1lsc), it causes all the other nodes to freeze for up to 20 seconds.

This 20 seconds can force a model over its 60-microsecond limit and is long enough to force the FE to time out.  Alex and Rolf have been working with the vendors to get this problem fixed, as having all your front ends go down because you rebooted a single computer is bad.

[40184.120912] c1rfm: sync error my=0x3a6b2d5d00000000 remote=0x0
[40184.120914] c1rfm: sync error my=0x3a6b2d5d00000000 remote=0x0
[44472.627831] c1pem: ADC TIMEOUT 0 7718 38 7782
[44472.627835] c1mcs: ADC TIMEOUT 0 7718 38 7782
[44472.627849] c1sus: ADC TIMEOUT 0 7718 38 7782
[44472.644677] c1rfm: cycle 1945 time 17872; adcWait 15; write1 0; write2 0; longest write2 0
[44472.644682] c1x02: cycle 7782 time 17849; adcWait 12; write1 0; write2 0; longest write2 0
[44472.646898] c1rfm: ADC TIMEOUT 0 8133 5 7941

The solution for the moment is either to start the computers at exactly the same time, so that Dolphin is up before the front ends, or to restart the models by hand once the computer is up and Dolphin is running, after they have timed out.  This is done by (with SYS replaced by the model name):

sudo rmmod c1SYSfe

sudo insmod /opt/rtcds/caltech/c1/target/c1SYS/bin/c1SYSfe.ko

Alex and Rolf have been working with the vendors to get this fixed, and we may simply need to update our Dolphin drivers.  I'm trying to get in contact with them and see if this is the case.
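
As a rough illustration of the manual restart above, the two commands can be generated per model. This is only a sketch: the helper name restart_commands is hypothetical, and it assumes that "SYS" in the commands stands for the model suffix (for example c1sus gives c1susfe). It builds the command lines and does not run them.

```python
import shlex

# Path prefix copied from the insmod command in the entry above.
TARGET_ROOT = "/opt/rtcds/caltech/c1/target"


def restart_commands(model):
    """Return the rmmod/insmod command lines for one model, e.g. 'c1sus'.

    Hypothetical helper: it only builds the argument lists, nothing is run.
    """
    module = f"{model}fe"
    ko_path = f"{TARGET_ROOT}/{model}/bin/{module}.ko"
    return [
        ["sudo", "rmmod", module],
        ["sudo", "insmod", ko_path],
    ]


if __name__ == "__main__":
    for cmd in restart_commands("c1sus"):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it easy to review them before handing them to a shell or to subprocess.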

15503   Tue Jul 28 13:55:11 2020   Hang   Update   BHD   Exploring bilinear SRCL->DARM coupling

We explore bilinear SRCL to DARM noise coupling mechanisms, and show two cases in which BHD readout can improve the noise performance. In the first case, the bilinear piece is due to residual DHARD motion (see also LHO:45823), and it matters mostly for the low-frequency (<100 Hz) part; in the second case, the bilinear piece is due to residual SRCL fluctuation, and it matters mostly around a few x 100 Hz. Details are below:

=================================================

General Model:

We can write the SRCL to DARM transfer function as (Evan Hall's thesis, eq. 2.29)

Z_s2d(f) = C_lf(f) * F^2 * x_D + C_hf(f) * F * dphi_S * x_D    ---- (1)

where

C_lf ~ 1/f^2 and C_hf ~ f are constants at each frequency unless there are major upgrades to the IFO,

F is the finesse of the arm cavity which depends on the alignment, spot position on the TMs, etc.,

dphi_S is the SRCL detuning (wrt the nominal 90 deg value),

x_D is the DC DARM offset.

The linear part of this can be removed with feedforward subtractions and it is the bilinear piece that matters, which reads

dZ_s2d = C_lf * <F>^2 * dx_D + C_hf * <F> * <dphi_S> * dx_D

+ 2C_lf * <F> * <x_D>  * dF + C_hf * <dphi_S> * <x_D> * dF

+ C_hf  * <F> * <x_D> * d(dphi_S).     ---- (2)

The first term in (2) is due to residual DARM motion dx_D. This term does not depend on the DC value of the DARM offset <x_D> and thus does not depend on doing BHD or DC readout. On the other hand, the typical residual DARM motion is 1 fm << 1 pm of DARM offset. Since the current feedforward reduction factor is about 10 (see both Den Martynov's thesis and Evan Hall's thesis), clearly we are not limited by the residual DARM motion.

The second term is due to the change in the arm finesse, which can be affected by, e.g., the alignment fluctuation (both increasing the loss due to scattering into 01/10 modes and affecting the spot position and hence changing the losses), and is likely to be the reason why we see the effect being modulated by DHARD.

The last term in (2) is due to the residual SRCL fluctuation and is important for the ~ a few x 100 Hz band.
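
The expansion in eq. (2) can be checked numerically against eq. (1). The sketch below is only a sanity check: the operating point and the size of the fluctuations are made-up numbers in arbitrary units, not interferometer values, and C_lf and C_hf are treated as plain constants at a single frequency.

```python
def z_s2d(c_lf, c_hf, finesse, dphi_s, x_d):
    """Eq. (1): SRCL->DARM coupling at a single frequency."""
    return c_lf * finesse**2 * x_d + c_hf * finesse * dphi_s * x_d


def dz_s2d(c_lf, c_hf, finesse, dphi_s, x_d, d_finesse, d_dphi_s, d_x_d):
    """Eq. (2): first-order change, grouped by which quantity fluctuates."""
    darm_term = (c_lf * finesse**2 + c_hf * finesse * dphi_s) * d_x_d
    finesse_term = (2 * c_lf * finesse * x_d + c_hf * dphi_s * x_d) * d_finesse
    srcl_term = c_hf * finesse * x_d * d_dphi_s
    return darm_term + finesse_term + srcl_term


# Made-up operating point and small fluctuations (arbitrary units).
point = dict(c_lf=1.0, c_hf=0.5, finesse=450.0, dphi_s=0.01, x_d=1.0)
delta = dict(d_finesse=0.045, d_dphi_s=1e-6, d_x_d=1e-4)

# Exact change of eq. (1) when all three quantities are perturbed at once.
exact = z_s2d(
    point["c_lf"],
    point["c_hf"],
    point["finesse"] + delta["d_finesse"],
    point["dphi_s"] + delta["d_dphi_s"],
    point["x_d"] + delta["d_x_d"],
) - z_s2d(**point)

# First-order prediction from eq. (2).
linear = dz_s2d(**point, **delta)
relative_error = abs(exact - linear) / abs(linear)

if __name__ == "__main__":
    print(f"exact={exact:.6f} linear={linear:.6f} rel.err={relative_error:.2e}")
```

The residual between the two is second order in the fluctuations, so it shrinks quadratically as the perturbations are made smaller.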

=================================================

DHARD effects.

As argued above, the DHARD affects the SRCL -> DARM coupling as it changes the finesse in the arm cavity (through scattering into 01/10 modes; in Finesse we cannot directly simulate the effects due to the spot hitting a rougher location).

Since in the second term of eq. (2) the LF part depends on the DARM DC offset <x_D>, this effect can be improved by going from DC readout to BHD.

To simulate it in Finesse, at a fixed DARM DC offset, we compute the SRCL->DARM transfer functions at different DHARD offsets, and then numerically compute the derivative \partial Z_s2d / \partial \theta_{DH}. Multiplying this derivative by the rms value of the DHARD fluctuation \theta_{DH} then gives the expected bilinear coupling piece.

The result is shown in the first attached plot. Here we have assumed a flat SRCL noise of 5e-16 m/rtHz for simplicity (see PRD 93, 112004, 2016). For now we do not account for the loop effects, which would further reduce the high-frequency components. The residual DHARD RMS is assumed to be 1 nrad.

In the first plot, from top to bottom we show the SRCL noise projection at DARM DC offsets of (0.1, 1, 10) pm. Since the DHARD alignment only affects the arm finesse starting at quadratic order, it matters what DC offset in DHARD we assume. In each panel, the blue trace is for no DC offset in DHARD and the orange one for a 5 nrad DC offset. As a reference, the A+ sensitivity is shown as the grey trace in each panel.

We can see that if there is a large DC offset in DHARD (a few nrad) and we still do DC readout with a few pm of DARM offset, then the bilinear piece of SRCL can still contaminate the sensitivity in the 10-100 Hz band (bottom panel; orange trace). On the other hand, if we do BHD, then the SRCL noise should be down by ~ x100 even compared with the top panel.

(A 5 nrad of DC offset in DHARD coupled with 1 nrad RMS would cause about 0.5% RIN in the arms. This is somewhat greater than the typically measured RIN which is more like <~ 0.2%. See the second plot).
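
The derivative-times-rms procedure used in this section can be sketched as follows. This is not the simulation used for the plots: toy_tf is a made-up, real-valued stand-in for the simulated coupling at one frequency, with a purely quadratic dependence on the DHARD offset, and THETA_SCALE_NRAD is an arbitrary number. It only illustrates why the assumed DC offset matters when the dependence starts at quadratic order.

```python
def central_difference(func, x0, step):
    """Symmetric finite-difference estimate of d(func)/dx at x0."""
    return (func(x0 + step) - func(x0 - step)) / (2.0 * step)


def bilinear_piece(tf_vs_offset, offset_dc, offset_rms, step):
    """|dZ/dtheta| evaluated at the DC offset, scaled by the rms fluctuation."""
    return abs(central_difference(tf_vs_offset, offset_dc, step)) * offset_rms


# Made-up curvature scale in nrad; not a measured or simulated value.
THETA_SCALE_NRAD = 100.0


def toy_tf(theta_nrad):
    """Hypothetical coupling magnitude vs. DHARD offset (arbitrary units)."""
    return 1.0 - (theta_nrad / THETA_SCALE_NRAD) ** 2


# 1 nrad rms, evaluated with no DC offset and with a 5 nrad DC offset.
no_offset = bilinear_piece(toy_tf, offset_dc=0.0, offset_rms=1.0, step=0.1)
with_offset = bilinear_piece(toy_tf, offset_dc=5.0, offset_rms=1.0, step=0.1)

if __name__ == "__main__":
    print(f"no DC offset: {no_offset:.3e}, 5 nrad DC offset: {with_offset:.3e}")
```

In the real study the callable would wrap a simulation run returning a complex transfer function at each frequency; the same finite difference applies to each frequency bin.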

=================================================

SRCL effect.

Similarly we can consider the SRCL->DARM coupling due to residual SRCL rms. The approach is very similar to what we did above for DHARD. I.e., we compute Z_s2d at fixed DARM offset and for different SRCL offsets, then we numerically evaluate \partial Z_s2d / \partial dphi_S. A residual SRCL rms of 0.1 nm is then used to generate the projection shown in the third figure.

Unlike the DHARD effect, the bilinear SRCL piece does not depend on the DC SRCL detuning (for the 50-500 Hz part). It does still depend on the DARM DC offset and therefore could be improved by BHD.

Since we do not include the LP of the SRCL loop in this plot, the HF noise at 1 kHz is artificial as it can be easily filtered out. However, the LP will not be very strong around 100-300 Hz for a SRCL UGF ~ 30 Hz, and thus doing BHD could still give some small improvement for this effect.

Attachment 1: SRCL_bilin_DHARD.pdf
Attachment 2: ARM_RIN.pdf
Attachment 3: SRCL_bilin_SRCL.pdf
16100   Thu Apr 29 17:43:48 2021   Anchal   Update   CDS   F2A Filters double check

I double checked today and the F2A filters in the output matrices of MC1, MC2 and MC3 in the POS column are ON. I do not get what SDF means? Did we need to add these filters elsewhere?

 Quote: The IMC suspension team should double check their filters are on again. I am not familiar with the settings and I don't think they've been added to the SDF.
Attachment 1: F2AFiltersON.png
16105   Fri Apr 30 00:20:30 2021   gautam   Update   CDS   F2A Filters double check

The SDF system is supposed to help with restoring the correct settings, complementary to burt. My personal opinion is that there is no need to commit these filters to SDF until we're convinced that they help with the locking / noise performance.

 Quote: I double checked today and the F2A filters in the output matrices of MC1, MC2 and MC3 in the POS column are ON. I do not get what SDF means? Did we need to add these filters elsewhere
6004   Thu Nov 24 20:22:42 2011   Mirko   Update   IOO   F2A filter for MC

I calculated the F2A filters for the input mode cleaner optics as described in T010140-01-D eq (4). On Rana's recommendation I added an s / ( w_0 * Q ) term to the numerator.

The used values are:

w_0 = 2pi / s
h= 0.0009
D= 2.46957E-2
Q=10

I put these filters into C1:SUS-MC1_TO_COIL_1_1 to _4_1. For convenience they are split into Z and P. Well, it doesn't work. After a few seconds the optic begins to swing wildly.

6006   Fri Nov 25 17:52:28 2011   rana   Update   IOO   F2A filter for MC

Woo. Pretty crazy. The numerators should only be ~10% larger than the denominator below 1 Hz. Let's try again.

6012   Fri Nov 25 23:25:24 2011   Mirko   Update   IOO   F2A filter for MC

 Quote: Woo. Pretty crazy. The numerators should only be ~10% larger than the denominator below 1 Hz. Let's try again.

[Rana, Mirko]

I redid this calculation. The idea behind it is to get rid of any pitch that is introduced by applying longitudinal feedback to the mirrors. This coupling happens because the center of percussion for pitch, which is identical with the point where the wires lift off of the mirror, is above the center of mass.

With the same values as before, just less faulty math and Q = 2 instead of 10 we end up with the following filters:

For the lower coils (red), compared to corresponding preexisting BS filters (black):

The upper coils' TF is just mirrored about the 0 dB magnitude axis, and has a corresponding frequency response.

I switched the F2a filters on for all MC mirrors. For convenience they are split into F2aZeros and F2aPoles. Everything seems fine. The F2a filters seem to be off for (all?) other mirrors.

6021   Mon Nov 28 10:54:40 2011   rana   Update   SUS   F2A filter for MC

Our approach to making the F2A or F2P filters for the MC is to use the measured resonant frequencies and then calculate the appropriate mechanical dimensions of each suspension. This is basically because we don't have optical levers with normal incidence on these optics, but the method should be fine.

To find the formulas, I asked Gaby for her old cheat sheet: it's now in the DCC. It's only for large optics, but you should be able to reconstruct the right ones for SOS by just changing the parameters.

989   Thu Sep 25 02:35:21 2008   rana   Summary   PSL   FAST is moving a lot
It looks like the FAST signal has started moving a lot - this is partly what inspired us to tune the SLOW loop.

Some of the spiking events happen when people go on the table or the MC loses lock. But at other times it just
spikes for no apparent reason. You can also see from the first plot (9 day 10-minute trend) that there is no
great change in DTEC so we shouldn't be worried about clogging in the NPRO head.

The second plot is a 1 day minute-trend.
Attachment 1: Untitled.png
Attachment 2: Untitled.png
1023   Fri Oct 3 15:09:58 2008   rob   Update   PSL   FAST/SLOW

Last night during locking, for no apparent reason (no common mode), the PSL FAST/SLOW loop started going just a little
nutz. Attached is a two day plot. The noisy period started around 11-ish last night.
Attachment 1: FASTSLOW.png
6639   Thu May 10 22:05:21 2012   Den   Update   CDS   FB

For the second time today, all computers lost connection to the framebuilder. When I ssh'd to the framebuilder, the DAQD process was not running. I started it:

controls@fb ~ 130$ sudo /sbin/init q

But I do not know what causes this problem. Maybe this is a memory issue. For FB

Mem:   7678472k total,  7598368k used,    80104k free

Practically all memory is used. If more is needed and swap is off, the DAQD process may die.

6640   Fri May 11 08:07:30 2012   Jamie   Update   CDS   FB

 Quote: Already for the second time today all computers loose connection to the framebuilder. When I ssh to framebuilder DAQD process was not running. I started it controls@fb ~ 130$ sudo /sbin/init q

Just to be clear, "init q" does not start the framebuilder.  It just tells the init process to reparse the /etc/inittab.  And since init is supposed to be configured to restart daqd when it dies, it restarted it after the reloading of /etc/inittab.  You and Alex must have forgotten to do that after you modified the inittab when you were trying to fix daqd last week.

daqd is known to crash without reason.  It usually just goes unnoticed because init always restarts it automatically.  But we've known about this problem for a while.

 Quote: But I do not know what causes this problem. May be this is a memory issue. For FB Mem:   7678472k total,  7598368k used,    80104k free Practically all memory is used. If more is needed and swap is off, DAQD process may die.

This doesn't really mean anything, since the computer always ends up using all available memory.  It doesn't indicate a lack of memory.  If the machine is really running out of memory you would see lots of ugly messages in dmesg.
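
To illustrate why a small "free" number is not alarming on Linux, memory held by buffers and the page cache can be counted as reclaimable. The sketch below parses /proc/meminfo-style text. MemTotal and MemFree are the figures quoted above; the Buffers and Cached values are invented for the example, since the entry does not report them, and reclaimable_kb is a hypothetical helper name.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def reclaimable_kb(info):
    """Free memory plus buffers and page cache, which the kernel can hand back."""
    return info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)


# MemTotal and MemFree come from the entry; Buffers and Cached are made up.
SAMPLE = """\
MemTotal:        7678472 kB
MemFree:           80104 kB
Buffers:          210000 kB
Cached:          6900000 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(f"free: {info['MemFree']} kB, reclaimable: {reclaimable_kb(info)} kB")
```

On the real machine the text would come from reading /proc/meminfo; as the entry notes, dmesg is the place to look for evidence of a genuine out-of-memory condition.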

13152   Mon Jul 31 15:13:24 2017   gautam   Update   CDS   FB ---> FB1

[jamie, gautam]

In order to test the new daqd config that Jamie has been working on, we felt it would be most convenient for the host name "fb" (martian network IP 192.168.113.202) to point to the physical machine "fb1" (martian network IP 192.168.113.201).

I made this change in /var/lib/bind/martian.hosts on chiara, and then ran sudo service bind9 restart. It seems to have done the job. So as things stand, both hostnames "fb" and "fb1" point to 192.168.113.201.

Now, when starting up DTT or dataviewer, the NDS server is automatically found.

More details to follow.

11076   Thu Feb 26 13:17:31 2015   ericq   Update   Computer Scripts / Programs   FB IO load

Over the past few days, I've occasionally been peeking at the framebuilder IO load to see if I could correlate anything with it, but it's usually been low when I looked. I.e. with daqd and all models running, the %wa time was a few percent at most.

Just now, I was seeing some EPICS sluggishness, and sure enough, the %wa was in the 50-60 range. I used iostat -xmh 5 on the framebuilder to see that /dev/sda, the /frames drive, was at 100% utilization, which means it was reading and writing as fast as it possibly could.

I ssh'd over to nodus, and with iotop found that an rsync job was running (rsync -am --exclude .*.gwf full 131.215.114.19::40m/full), and its IO rates corresponded very closely to the data read rates on the framebuilder from /frames.

I killed the rsync process on nodus, and the %wa time on the framebuilder dropped to near zero. The ASS striptools, where I had noticed the sluggishness, immediately started updating faster.

While rsync is supposed to play nice with a system's IO demands, maybe it only knows about nodus's IO usage, not fb which is the underlying NFS server where the frames live. I think it would be good to throttle the bandwidth of these jobs to a specific bandwidth. 50MB/s seemed like too much, so maybe 10MB/s is ok?

11077   Thu Feb 26 13:55:59 2015   jamie   Update   Computer Scripts / Programs   FB IO load
We should use "ionice" to throttle the rsync. Use something like "ionice -c 3 rsync ..." to set the priority such that the rsync process will only work when there is no other IO contention. See "man ionice" for other options.
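
The two throttling ideas in this thread (idle IO priority and a bandwidth cap) can be combined in one command line. The sketch below only assembles the arguments and does not run anything; throttled_rsync is a hypothetical helper name. It relies on "ionice -c 3" selecting the idle class and on rsync's --bwlimit option, which takes a rate in units of 1024 bytes per second when no suffix is given. Whether ionice on nodus relieves fb, which serves /frames over NFS, is not established by these entries, so the bandwidth cap may be the more direct knob.

```python
def throttled_rsync(src, dest, bwlimit_kib=None, extra_args=()):
    """Build an rsync command line wrapped in 'ionice -c 3' (idle IO class).

    bwlimit_kib is passed to rsync's --bwlimit (units of 1024 bytes/s).
    Nothing is executed here; the caller decides how to run the list.
    """
    cmd = ["ionice", "-c", "3", "rsync", *extra_args]
    if bwlimit_kib is not None:
        cmd.append(f"--bwlimit={bwlimit_kib}")
    cmd += [src, dest]
    return cmd


if __name__ == "__main__":
    # Source, destination and flags copied from the rsync job quoted above;
    # 10240 KiB/s corresponds to the ~10 MB/s suggested in the thread.
    cmd = throttled_rsync(
        "full",
        "131.215.114.19::40m/full",
        bwlimit_kib=10240,
        extra_args=("-am", "--exclude", ".*.gwf"),
    )
    print(" ".join(cmd))
```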
8374   Fri Mar 29 17:24:43 2013   Jamie   Update   Computers   FB RAID power supply replaced

Steve ordered a replacement power supply for the FB JetStor power supply that failed a couple weeks ago.  I just installed it and it looks fine.

12024   Sun Mar 6 15:24:05 2016   gautam   Update   CDS   FB down again

I came in to check the status of the nitrogen and noticed that the striptool panels in the control room were all blank.

• PMC was unlocked but I was able to relock it using the usual procedure
• FB seems to be down: I was unable to ssh into it (or any of the FEs for that matter). I checked the lights on the RAID array, they are all green. I am holding off on doing a hard reboot of FB in case there is some other debugging that can be done first
• None of the watchdogs were tripped, but judging by the green spots on the mirrors, all of them are moving quite a bit. I've shutdown the watchdogs on all the optics except the MC mirrors, but the ITMs and ETMs still seem to be moving quite a bit.

I am leaving things in this state for now. It is unclear why this should have happened, it doesn't seem like there was a power glitch?

Attachment 1: 58.png
12025   Mon Mar 7 20:40:02 2016   ericq   Update   CDS   FB down again

We went and looked at the monitor plugged into FB. All kinds of messages were being spammed to the screen (maybe RAM errors), and nothing could be done to interrupt. Sadly, a hard reboot of FB was necessary.

Video of error messages: https://youtu.be/7rea_kokhPY

After the reboot, it just took a couple of model restarts to get the CDS screen happy.

16294   Tue Aug 24 18:44:03 2021   Koji   Update   CDS   FB is writing the frames with a year old date

Dan Kozak pointed out that the new frame files of the 40m are being written with 2020 GPS times rather than 2021 GPS times.

Current GPS time is 1313890914 (or something like that), but the new files are written as C-R-1282268576-16.gwf

I don't know how this can happen but this may explain why we can't have the agreement between the FB gps time and the RTS gps time.

(dataviewer seems dependent on the FB GPS time and it indicates 2020 date. DTT/diaggui does not.)
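
The size of the offset can be checked from the two GPS times quoted above. This is a minimal sketch: it converts GPS seconds to UTC using the GPS epoch (1980-01-06 00:00:00 UTC) and a fixed GPS-UTC offset of 18 leap seconds, which holds for dates after 2017-01-01. The variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
LEAP_SECONDS = 18  # GPS minus UTC; valid for dates after 2017-01-01


def gps_to_utc(gps_seconds):
    """Convert GPS seconds to a UTC datetime (fixed leap-second offset)."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - LEAP_SECONDS)


current_gps = 1313890914  # approximate current GPS time quoted in the entry
frame_gps = 1282268576    # timestamp in the frame file name C-R-1282268576-16.gwf

offset_days = (current_gps - frame_gps) / 86400

if __name__ == "__main__":
    print("current:", gps_to_utc(current_gps).isoformat())
    print("frame:  ", gps_to_utc(frame_gps).isoformat())
    print(f"offset:  {offset_days:.3f} days")
```

The difference comes out very close to 366 days, which is one calendar year here because 2020 was a leap year.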

This is the way to check the gpstime on fb1. It's apparently a year off.

controls@fb1:~ 0$ cat /proc/gps
1282269402.89

Attachment 1: Screen_Shot_2021-08-24_at_18.46.24.png
16298   Wed Aug 25 17:31:30 2021   Paco   Update   CDS   FB is writing the frames with a year old date

[paco, tega, koji]

After invaluable assistance from Jamie in fixing this yearly offset in the gps time reported by cat /proc/gps, we managed to restart the real time system correctly (while still manually synchronizing the front end machine times). After this, we recovered the mode cleaner and were able to lock the arms with not much fuss.

Nevertheless, tega and I noticed some weird noise in the C1:LSC-TRX_OUT which was not present in the YARM transmission, and that is present even in the absence of light (we unlocked the arms and just saw it on the ndscope as shown in Attachment #1). It seems to affect the XARM and in general the lock acquisition...

We took some quick spectra with diaggui (Attachment #2) but they don't look normal; there seems to be broadband excess noise with a remarkable 1 kHz component. We will probably look into it in more detail.

Attachment 1: TRX_noise_2021-08-25_17-40-55.png
Attachment 2: TRX_TRY_power_spectra.pdf
16300   Thu Aug 26 10:10:44 2021   Paco   Update   CDS   FB is writing the frames with a year old date

[paco, ]

We went over to the X end to check what was going on with the TRX signal. We spotted that the ground terminal coming from the QPD was loosely touching the handle of one of the computers on the rack. When we detached it completely from the rack the noise was gone (attachment 1).

We taped this terminal so it doesn't touch anything accidentally. We don't know if this is the best solution since it probably needs a stable voltage reference. In the Y end those ground terminals are connected to the same point on the rack. The other ground terminals in the X end are just cut.

We also took the PSD of these channels (attachment 2). The noise seems to be gone but TRX is still a bit noisier than TRY. Maybe we should set up a proper ground for the X arm QPD?
We saw that the X end station ALS laser was off. We turned it on and also the crystal oven and reenabled the temperature controller. Green light immediately appeared. We are now working to restore the ALS lock. After running XARM ASS we were unable to lock the green laser so we went to the XEND and moved the piezo X ALS alignment mirrors until we maximized the transmission in the right mode. We then locked the ALS beams on both arms successfully. It very well could be that the PZT offsets were reset by the power glitch. The XARM ALS still needs some tweaking, its level is ~ 25% of what it was before the power glitch.

Attachment 1: Screenshot_from_2021-08-26_10-09-50.png
Attachment 2: TRXTRY_Spectra.pdf
9021   Sun Aug 18 16:04:07 2013   rana   Summary   CDS   FB lights all RED: mxstream restart

Sun Aug 18 15:52:50 2013

Found the FB lights (C1:FEC-NN_FB_NET_STATUS and C1:DAQ-DC0_C1XXX_STATUS) RED for everything on the CDS_FE_STATUS screen.

I used the (! mxstream restart) button to restart the mxstreams. Everything is green now.

PMC was out of lock- relocked it and the IMC locked itself as did the X & Y arms on IR. X was already green locked.

Attachment 1: IFO-Trend.png
9354   Wed Nov 6 15:12:01 2013   Jenne   Update   CDS   FB not talking to LSC?

Something funny is going on with the framebuilder's communication with the LSC machine. This is a different failure mode / error than I have seen before. It's not the type of problem that is solved by restarting the mxstreams (that is indicated by also the 2 blocks on top of one another, that are green on the lsc machine right now, being red), although I did try that, before I looked closer and realized that that wasn't the problem.

ssh-ing to c1lsc, and doing a "rtcds restart all" seems to be fixing the problem. Both c1oaf and c1cal needed another round of restarting, because they needed their BURT buttons pressed manually. All of the models on the lsc machine are running fine now, though.
Here's a screenshot of the CDS overview screen, with the error lights:

9357   Wed Nov 6 17:21:58 2013   Jamit   Update   CDS   FB not talking to LSC?

 Quote: Something funny is going on with the framebuilder's communication with the LSC machine. This is a different failure mode / error than I have seen before. It's not the type of problem that is solved by restarting the mxstreams (that is indicated by also the 2 blocks on top of one another, that are green on the lsc machine right now, being red), although I did try that, before I looked closer and realized that that wasn't the problem. ssh-ing to c1lsc, and doing a "rtcds restart all" seems to be fixing the problem. Both c1oaf and c1cal needed another round of restarting, because they needed their BURT buttons pressed manually. All of the models on the lsc machine are running fine now, though. Here's a screenshot of the CDS overview screen, with the error lights:

This definitely looks like a timing problem on the c1lsc front end computer. The red lights on the left mean that the timing synchronization is lost at the user model. I'm perplexed why it looks like the IOP is not seeing the same error, though, since it should originate at the ADC. The red lights to the right just mean the timing synchronization is lost with the DAQ, which is to be expected given a timing loss at the front end.

We'll have to take a closer look when this happens again.

8278   Tue Mar 12 12:06:22 2013   Jamie   Update   Computers   FB recovered, RAID power supply #1 dead

The framebuilder RAID is back online. The disk had been mounted read-only (see below) so daqd couldn't write frames, which was in turn causing it to segfault immediately, so it was constantly restarting.

The jetstor RAID unit itself has a dead power supply. This is not fatal, since it has three. It has three so it can continue to function if one fails. I have removed the bad supply and given it to Steve so he can get a suitable replacement.
Some recovery had to be done on fb to get everything back up and running again. I ran into issues trying to do it on the fly, so I eventually just rebooted. It seemed to come back ok, except for something going on with daqd. It was reporting the following error upon restart:

[Tue Mar 12 11:43:54 2013] main profiler warning: 0 empty blocks in the buffer

It was spitting out this message about once a second, until eventually the daqd died. When it restarted it seemed to come back up fine. I'm not exactly clear what those messages were about, but I think it has something to do with not being able to dump its data buffers to disk. I'm guessing that this was a residual problem from the unmounted /frames, which somehow cleared on its own.

Everything seems to be ok now.

 Quote: Manasa just went inside to recenter the AS beam on the camera after our Yarm spot centering exercises of the evening, and heard a loud beeping. We determined that it is the RAID attached to the framebuilder, which holds all of our frame data that is beeping incessantly. The top center power switch on the back (there are FOUR power switches, and 3 power cables, btw. That's a lot) had a red light next to it, so I power cycled the box. After the box came back up, it started beeping again, with the same front panel message: H/W monitor power #1 failed.

DO NOT DO THIS. This is what caused all the problems. The unit has three redundant power supplies, for just this reason. It was probably continuing to function fine. The beeping was just to tell you that there was something that needed attention. Rebooting the device does nothing to solve the problem.

Rebooting in an attempt to silence beeping is not a solution. Shutting off the RAID unit is basically the equivalent of ripping out a mounted external USB drive. You can damage the filesystem that way. The disk was still functioning properly. As far as I understand it the only problem was the beeping, and there were no other issues.
After you hard rebooted the device, fb lost its mounted disk and then went into emergency mode, which was to remount the disk read-only. It didn't understand what was going on, only that the disk seemed to disappear and then reappear. This was then what caused the problems. It was not the beeping, it was restarting the RAID that was mounted on fb.

Computers are not like regular pieces of hardware. You can't just yank the power on them. Worse yet is yanking the power on a device that is connected to a computer. DON'T DO THIS UNLESS YOU KNOW WHAT YOU'RE DOING. If the device is a disk drive, then doing this is a sure-fire way to damage data on disk.

9437   Wed Dec 4 12:02:39 2013   Koji   Update   CDS   FB restored

Now FB is fixed: daqd and nds are running

When I rebooted FB, I noticed that none of the nfs file systems were mounted. I started tracking down the issues from here.

I googled the common issues of the nfs mounting during the boot sequence.

- It is good to give "_netdev" option to fstab to mount the system after the network connection is established.
- "auto" option specifies that the file system is mounted when mount -a is run

Resulting /etc/fstab is this:

/dev/sdb1 / ext3 noatime 0 1
/swapfile none swap sw 0 0
shm /dev/shm tmpfs nodev,nosuid,noexec 0 0
/dev/sda1 /frames ext3 noatime 0 0
linux1:/home/cds/ /cvs/cds nfs _netdev,auto,rw,bg,soft 0 0
linux1:/home/cds/rtcds /opt/rtcds nfs _netdev,auto,rw,bg,soft 0 0
linux1:/home/cds/rtapps /opt/rtapps nfs _netdev,auto,rw,bg,soft 0 0
linux1:/home/cds/caltech/apps/linux /opt/apps nfs _netdev,auto,rw,bg,soft 0 0

But this didn't help mounting the nfs file systems at boot yet. I dug into google again and found a command "/sbin/rc-update". "/sbin/rc-update show" shows what services are activated at boot. It did not include "nfsmount".
So the following command was executed

> sudo /sbin/rc-update add nfsmount boot
> /sbin/rc-update show
 * Broken runlevel entry: /etc/runlevels/boot/portmap
bootmisc | boot
checkfs | boot
checkroot | boot
clock | boot
consolefont | boot
dcron | default
dhcpd | default
hostname | boot
in.tftpd | boot
keymaps | boot
local | default nonetwork
localmount | boot
modules | boot
monit | default
mx | default
net.eth0 | default
net.lo | boot
netmount | default
nfs | boot
nfsmount | boot
ntp-client | boot default
rmnologin | boot
rpc.statd | boot
sshd | boot
syslog-ng | boot
udev-postmount | default
urandom | boot
xinetd | default

After rebooting, I confirmed that the nfs file systems are correctly mounted and daqd and nds are automatically started.

This means that FB had never been configured to run correctly at boot. Shame on you!

10050   Tue Jun 17 17:04:26 2014   ericq   Update   Computer Scripts / Programs   FB troubles

 Quote: Also, the CDS FE status screen had red lights blinking as if it required an 'mxstream restart'. I did the same and it did not fix the problem. So I tried to restart fb using the usual 'telnet fb 8087'; but could not restart fb that way.

FB is acting strange. When ssh-ing in, certain commands cause an inescapable hang, which can't be ctrl-c'd out of. Telling it to reboot does nothing. This kind of situation was seen by me before, when we were getting all the front ends back, I eventually hard rebooted it, hoping it was a one time thing. Guess it's not.

Looking at the dmesg output, daqd seems to be segfault-ing all over the place. This may be related...
Here are some examples:

[451314.730502] daqd[17339]: segfault at 7ff589ae3b30 ip 00007ff589ae3b30 sp 00007ff49931dfb8 error 15 in libmyriexpress.so[7ff589ae3000+1000]
[530516.313238] daqd[18442] general protection ip:7f3f2ce73a6c sp:7f3e29949d50 error:0
[530516.313250] daqd[18420] general protection ip:7f3f2ce73a6c sp:7f3e2a19fd50 error:0 in libc-2.10.1.so[7f3f2ce3f000+14c000]
[530516.313262] in libc-2.10.1.so[7f3f2ce3f000+14c000]
[530516.327083] daqd[18412]: segfault at 3b04c9cd0 ip 00007f3f2ce73a6c sp 00007f3e2a4a7d50 error 4 in libc-2.10.1.so[7f3f2ce3f000+14c000]
[537695.364481] daqd[18489]: segfault at 12dbbcae0 ip 00007fa35a3b8a0a sp 00007fa298381af0 error 6 in libmyriexpress.so[7fa35a399000+28000]
[577316.821618] daqd[18758]: segfault at 7f5c4d3e9b30 ip 00007f5c4d3e9b30 sp 00007f5b5cc23fb8 error 15 in libmyriexpress.so[7f5c4d3e9000+1000]

I'm not inclined to go reboot it right now, but not sure how to address these problems...

9839   Tue Apr 22 01:39:57 2014   Jenne   Update   CDS   FB unhappy again

[Jenne, Q]

The frame builder (or something) is unhappy again. I know that we've seen this before, but I can't find the elog entry that relates to this particular problem.

Every few minutes, the fb status lights on the CDS_STATUS screen go white, and then come back green. It's annoying when it happens every hour or so (which is unfortunately typical), but it's pretty debilitating when it stops dataviewer and dtt every few minutes.

Just from the way the lights change, it looks like perhaps the daqd process is restarting itself periodically?

12151   Mon Jun 6 16:41:36 2016   ericq   Update   CDS   FB upgrade work

Barring objections, starting tomorrow morning, Jamie will be testing the new FB code. The IFO will not be available for other use while this is ongoing.

13312   Fri Sep 15 15:54:28 2017   gautam   Update   CDS   FB wiper script

A wiper script is not yet set up for our new Frame-Builder.
The disk usage is ~80% now, so I think we should start running a wiper script that manages overall disk usage and deletes old frame files to this end. From what I could find on the elog, the way this was done was by running a cron job on FB. There is a perl script, /opt/rtcds/caltech/c1/target/fb/wiper.pl, which from what I could understand, runs a bunch of du commands on different directories to determine if there is a need to delete any files. I copied this script over to /opt/rtcds/caltech/c1/target/daqd/wiper.pl. This is the directory in which all the new FB stuff resides. Conveniently, the script has a "dry-run" option, which I tried running on FB1. However, I get the following error message:

Fri Sep 15 15:44:45 PDT 2017
Dry run, will not remove any files!!!
You need to rerun this with --delete argument to really delete frame files
Directory disk usage:
/frames/trend/minute_rawk
Combined 0k or 0m or 0Gb
Illegal division by zero at ./wiper.pl line 98.

So it would seem that for some reason, the du commands aren't working. From what I could tell, there aren't any directory paths specific to the old FB machine that need to be changed. I believe the script was working prior to the FB disk crash - unfortunately it doesn't look like this script was under version control but I don't think any changes have been made to this script.

Before I go down a Perl rabbit hole, has anyone seen such an error or is aware of some reason why this might not work on the new FB? Am I even using the correct scripts?

13317   Mon Sep 18 17:17:49 2017   gautam   Update   CDS   FB wiper script

After trying to debug this issue using the Perl debugger, I concluded that the problem is in the part of the code that splits the output of the "du" command into directory and disk usage. For whatever reason, this isn't working.
The version of perl running on the new FB1 machine is 5.20.2, whereas I suspect the version running on the old FB machine was 5.14.2 (which is the version on all the Ubuntu 12 workstations and megatron). It is unclear whether downgrading the Perl version is the right way to go. The FB1 disk is now getting close to full; the usage is up to 85% today.

 Quote: Before I go down a Perl rabbit hole, has anyone seen such an error or is aware of some reason why this might not work on the new FB? Am I even using the correct scripts?

13318   Mon Sep 18 17:30:54 2017 ChrisUpdateCDSFB wiper script

Attached is the version of the wiper script we use on the CryoLab cymac. It works with perl v5.20.2. Is this different from what you have?

Attachment 1: wiper.pl
#!/usr/bin/perl

use File::Basename;

print "\n" . `date` . "\n";

# Dry run, do not delete anything
$dry_run = 1;

if ($ARGV[0] eq "--delete") { $dry_run = 0; }

print "Dry run, will not remove any files!!!\n" if $dry_run;

... 184 more lines ...

13319   Mon Sep 18 17:51:26 2017 gautamUpdateCDSFB wiper script

It is a little different: specifically, the way the output of the "du" command is split into disk usage and directory is different (see Attachment #1). Apart from this, some of the parameters (e.g. what percentage to keep free) are different. I changed the percentages to match what we had here, and edited a couple of other lines to print out the files that will be deleted. The dry run seemed to work okay; it produced the output below. Not sure why "df -h" reports a different use percentage though...

Since the script seems to be working now, I am going to set it up on FB1's crontab. Thanks Chris!

controls@fb1:/opt/rtcds/caltech/c1/target/daqd 0$ ./wiper.pl
Mon Sep 18 17:47:06 PDT 2017
Dry run, will not remove any files!!!
You need to rerun this with --delete argument to really delete frame files
Directory disk usage:
/frames/trend/minute_raw 47126124k
/frames/trend/minute 22900668k
/frames/trend/second 760359168k
/frames/full 19337278516k
Combined 20167664476k or 19694984m or 19233Gb
/frames size 25097525144k at 80.36%
/frames is below keep value of 85.00%
Will not delete any files
df reported usage 80.36%
controls@fb1:/opt/rtcds/caltech/c1/target/daqd 0$ df -h
Filesystem                        Size  Used Avail Use% Mounted on
/dev/sda4                         2.0T  1.7T  152G  92% /
udev                               10M     0   10M   0% /dev
tmpfs                              13G  177M   13G   2% /run
tmpfs                              32G     0   32G   0% /dev/shm
tmpfs                             5.0M     0  5.0M   0% /run/lock
tmpfs                              32G     0   32G   0% /sys/fs/cgroup
/dev/sda2                          19G  3.7G   14G  21% /var
/dev/sda1                         461M   65M  373M  15% /boot
/dev/sdb1                          24T   19T  3.5T  85% /frames
192.168.113.104:/home/cds/rtcds   2.0T  1.6T  291G  85% /opt/rtcds
192.168.113.104:/home/cds/rtapps  2.0T  1.6T  291G  85% /opt/rtapps
tmpfs                             6.3G     0  6.3G   0% /run/user/1001

 Quote: Attached is the version of the wiper script we use on the CryoLab cymac. It works with perl v5.20.2. Is this different from what you have?

Attachment 1: perlDiff.png

13320   Mon Sep 18 18:40:34 2017 gautamUpdateCDSFB wiper script

I did a further check on the wiper script by changing "percent_keep" from 85.0 to 75.0 and running the script in "dry_run" mode again. The script then printed to the console the names of all the files it would delete in order to free up the required amount of space (but didn't actually delete any files, as it was a dry run). The list seemed sensible.

To set up the cron job, I did the following on FB1:

• crontab -e opened up the crontab
• Copied over a script called "wiper.cron" from /opt/rtcds/caltech/c1/target/fb to /opt/rtcds/caltech/c1/target/daqd. This essentially contains a bunch of instructions to run the wiper script with the --delete flag, and write the console output to a log file.
• Added the following line: 33 3 * * * /opt/rtcds/caltech/c1/target/daqd/wiper.cron (so the cron job should be executed at 3:33 AM every day).
• The cron daemon seems to be running - sudo systemctl status cron.service yields the following output:
controls@fb1:~ 0$ sudo systemctl status cron.service
● cron.service - Regular background program processing daemon
   Loaded: loaded (/lib/systemd/system/cron.service; enabled)
   Active: active (running) since Mon 2017-09-18 18:16:58 PDT; 27min ago
     Docs: man:cron(8)
 Main PID: 30183 (cron)
   CGroup: /system.slice/cron.service
           └─30183 /usr/sbin/cron -f

Sep 18 18:16:58 fb1 cron[30183]: (CRON) INFO (Skipping @reboot jobs -- not system startup)
Sep 18 18:17:01 fb1 CRON[30205]: pam_unix(cron:session): session opened for user root by (uid=0)
Sep 18 18:17:01 fb1 CRON[30206]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Sep 18 18:17:01 fb1 CRON[30205]: pam_unix(cron:session): session closed for user root
Sep 18 18:25:01 fb1 CRON[30820]: pam_unix(cron:session): session opened for user root by (uid=0)
Sep 18 18:25:01 fb1 CRON[30821]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Sep 18 18:25:01 fb1 CRON[30820]: pam_unix(cron:session): session closed for user root
Sep 18 18:35:01 fb1 CRON[31515]: pam_unix(cron:session): session opened for user root by (uid=0)
Sep 18 18:35:01 fb1 CRON[31516]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Sep 18 18:35:01 fb1 CRON[31515]: pam_unix(cron:session): session closed for user root

• crontab -l on FB1 now shows the following:
controls@fb1:~ 0$ crontab -l
# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h  dom mon dow   command
33 3 * * * /opt/rtcds/caltech/c1/target/daqd/wiper.cron
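
As a quick sanity check on the schedule, the crontab entry can be split into its five time fields and the command. This is only an illustrative sketch in Python (not part of the FB1 setup); the helper name split_crontab_line is made up for this example, and it handles only plain five-field entries like the one added here.

```python
# Field names in crontab order: minute, hour, day of month, month, day of week.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_crontab_line(line):
    """Split one crontab entry into its five time fields and the command."""
    parts = line.split(None, 5)
    if len(parts) != 6:
        raise ValueError(f"expected 5 time fields and a command, got: {line!r}")
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    return schedule, parts[5]


# The entry added to the FB1 crontab in this log entry.
entry = "33 3 * * * /opt/rtcds/caltech/c1/target/daqd/wiper.cron"
schedule, command = split_crontab_line(entry)
print(schedule)
print(command)
```

Minute 33 and hour 3 with '*' in the remaining fields is what gives the daily 3:33 AM run.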

Let's see if this works.

 Quote: Since the script seems to be working now, I am going to set it up on FB1's crontab. Thanks Chris!
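
For reference, the usage check that the dry run prints can be reproduced roughly as follows. This is a sketch in Python, not a transcription of wiper.pl: the function names are hypothetical, the du lines are assumed to have the usual "size, whitespace, path" layout, and the numbers are copied from the dry-run output in this thread. It also shows one way to guard the division that failed when the du output was not split correctly.

```python
def parse_du_output(text):
    """Parse du-style lines of the form '<kilobytes><whitespace><path>'."""
    usage = {}
    for line in text.splitlines():
        fields = line.split(None, 1)  # split on the first run of whitespace
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # skip blank or malformed lines instead of counting them as 0
        usage[fields[1].strip()] = int(fields[0])
    return usage


def percent_used(used_k, size_k):
    """Return usage as a percentage, refusing a zero or negative size."""
    if size_k <= 0:
        raise ValueError("filesystem size must be positive")
    return 100.0 * used_k / size_k


# Sizes in kilobytes, copied from the dry-run output in this thread.
du_text = """\
47126124\t/frames/trend/minute_raw
22900668\t/frames/trend/minute
760359168\t/frames/trend/second
19337278516\t/frames/full
"""
frames_size_k = 25097525144
percent_keep = 85.0

usage = parse_du_output(du_text)
combined_k = sum(usage.values())
pct = percent_used(combined_k, frames_size_k)
needs_cleanup = pct > percent_keep
print(f"Combined {combined_k}k at {pct:.2f}%, cleanup needed: {needs_cleanup}")
```

With these inputs the combined size and the 80.36% figure match what the script reported, and the result stays below the 85% keep value, so nothing would be deleted.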

8274   Tue Mar 12 00:35:56 2013 JenneUpdateComputersFB's RAID is beeping

[Manasa, Jenne]

Manasa just went inside to recenter the AS beam on the camera after our Yarm spot centering exercises of the evening, and heard a loud beeping.  We determined that the source is the RAID attached to the framebuilder (which holds all of our frame data); it is beeping incessantly.  The top center power switch on the back (there are FOUR power switches, and 3 power cables, btw.  That's a lot) had a red light next to it, so I power cycled the box.  After the box came back up, it started beeping again, with the same front panel message:

H/W monitor power #1 failed.

Right now the fb is trying to stay connected to things, and we can kind of use dataviewer, but we lose our connection to the framebuilder every ~30 seconds or so.  This rough timing estimate comes from how often we see the fb-related lights on the frontend status screen cycle from green to white to red back to green (or, how long do the lights stay green before going white again).  We weren't having trouble before the RAID went down a few minutes ago, so I'm hopeful that once that's fixed, the fb will be fine.

In other news, just to make Jamie's day a little bit better, Dataviewer does not open on Pianosa or Rosalba.  The window opens, but it stays a blank grey box.  This has been going on for Pianosa for a few days, but it's new (to me at least) on Rosalba.  This is different from the lack of ability to connect to the fb that Rossa and Ottavia are seeing.

354   Tue Mar 4 00:42:51 2008 ranaUpdateComputersFB0 still down ?
The framebuilder is still down. I tried restarting the daqd task and resetting the RFM
switch like it says in the Wiki but it still doesn't work right. The computer itself is
running (I can ssh to it) and the daqd process is running but there's a red light for
it on the RFM screen and dataviewer won't connect to it.

If Alex isn't over by ~10 AM, we should call him and ask for help.
13396   Fri Oct 20 16:30:17 2017 gautamUpdateCDSFB1 installed on shelves

[steve, jamie, gautam]

The machine that now serves as our Frame Builder, FB1, was sitting on top of megatron. I decided that this wasn't ideal, and asked Steve to get some alternative mounting solution. Today, he procured some shelves to put FB1 on. Jamie suggested looking for the slider-rail that came with the machine, and using that instead, as it would allow us to slide FB1 out of the rack as we do megatron and the old FB. But as luck would have it, the distance between the rack vertical posts is 26 inches, while the rail is 27 inches. So we had to accept the less ideal solution of putting FB1 on two shelves, with no sliding option. Photo to be uploaded shortly.

For this work, I had to shutdown FB1 for about 1 hour between 3pm and 4pm. It seems to have come back up fine now.

6157   Tue Jan 3 15:45:04 2012 JenneUpdateComputersFB?

Is there a reason the framebuilder status light is red for all the front ends?

Also, I reenabled PRM watchdog.

12325   Fri Jul 22 03:02:37 2016 KojiUpdateCOCFC painting

[Koji Gautam]

We have worked on the FC painting on ITMX and ITMY. We also replaced the OSEM fixing screws with the ones with a hex knob.
This was done except for the SD OSEM as the new screw was not long enough. We left an allen-key version of the screw for the SD OSEM.

All the full-resolution photos can be found on g-photo.

ITMY

Attachment1: The barrel was pretty dusty. Some dust was observed on the HR face, but it was not so terrible. The barrel and the HR face were blown with ionized N2 and then wiped with IPA. The face wiping was done in a similar way to drag wiping.

Attachment2: FC was applied to the HR surface.

Attachment3: The AR surface was also painted with FC. The brush touched the coil holder.

Attachment4: The brush touched the coil holder. Another PEEK tab was applied to remove this FC stain on the metal holder.

Attachment5: This is the result of successful removal of the FC stain.

ITMX

Attachment6: The OSEM arrangement before removal. We confirmed that the OSEM arrangement was as described on Wiki.

Attachment7/8: ITMX was obviously a lot dirtier than ITMY. The barrel had accumulated dust.

Attachment9: This is a picture of the HR face with large dust particles on it.

Attachment10: The HR surface was painted with FC.

Attachment11: This is the AR surface with FC painted.

Attachment 1: ITMY_barrel_dust.jpg
Attachment 2: ITMY_HR_FC.jpg
Attachment 3: ITMY_AR_FC.jpg
Attachment 4: ITMY_drip_removal.jpg
Attachment 5: ITMY_drip_removed.jpg
Attachment 6: ITMX_OSEMS.jpg
Attachment 7: ITMX_barrel_dust1.jpg
Attachment 8: ITMX_barrel_dust2.jpg
Attachment 9: ITMX_HR_dusty.jpg
Attachment 10: ITMX_HR_FC.jpg
Attachment 11: ITMX_AR_FC.jpg
546   Thu Jun 19 20:22:03 2008 ranaUpdateGeneralFE Computer Status
I called Rolf (@LLO) who called Alex (@MIT) who suggested that we power cycle every crate
with an RFM connection as we did before (twice in the past year).

Rob and I followed Yoichi around the lab as he turned off and on everything. There
was no special order; he started at the Y-end and worked his way into the corner,
finishing at the X-End. Along the way we also reset the 2 RFM switches around fb0.

This cured the EPICS problem; the FEs could now boot and receive the EPICS data.

However, there are still some residual channel hopping-ish issues which Rob and Yoichi are
now working on.
459   Tue Apr 29 21:09:12 2008 ranaDAQCDSFE Filters
These are new FE filters for downsampling and upsampling. We will be going from native hardware sampling rates of 64k down to 32k, 16k, and 2k.

The attached plot shows these filters. They are 3dB ripple, 40 dB stopband, 4th order elliptic filters in which I have moved the zeros around
into good places (e.g. to the Nyquist frequency).

I'm also attaching the .txt file containing the filter coefficients and the design strings. The filters are called x2, x4, and x32, for the
D2, D4, and D32 downsampling, respectively.
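
The mapping between filter names and output rates can be checked with a few lines of arithmetic. This is a small sketch in Python that assumes "64k" means 65536 Hz (as the SAMPLING line in the attached file suggests) and that the other rates are the corresponding powers of two; the variable names are only for illustration.

```python
NATIVE_RATE_HZ = 65536  # assumed meaning of "64k"; matches the SAMPLING line in the attachment

# Filter name -> decimation factor, as named in the entry (x2/D2, x4/D4, x32/D32).
DECIMATION = {"x2": 2, "x4": 4, "x32": 32}

# Output sample rate after decimation, and the Nyquist frequency at that rate.
rates = {name: NATIVE_RATE_HZ // factor for name, factor in DECIMATION.items()}
nyquist = {name: rate / 2 for name, rate in rates.items()}

for name in DECIMATION:
    print(f"{name}: output rate {rates[name]} Hz, output Nyquist {nyquist[name]:.0f} Hz")
```

Signal content above the output Nyquist frequency folds back into the band after decimation, which is why each filter needs its stopband in place by that frequency.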
Attachment 1: fefilters.jpg
Attachment 2: fefilters.txt
# FILTERS FOR ONLINE SYSTEM
#
# Computer generated file: DO NOT EDIT
#
# MODULES ULYAW
#
################################################################################
### ULYAW                                                                    ###
################################################################################
# SAMPLING ULYAW 65536

... 28 more lines ...
7138   Fri Aug 10 09:47:19 2012 MashaUpdateComputersFE Status

The c1lsc and c1sus screens are red in the front-end status. I restarted the frame builder, and hit the "global diag reset" button, but to no avail. Yesterday, the only thing Den and I did to c1sus was install a new c1pem model. I got rid of the changes and switched to the old one (I ran rtcds build, install, restart), but the status is still the same.

7143   Fri Aug 10 11:08:26 2012 jamieUpdateComputersFE Status

 Quote: The c1lsc and c1sus screens are red in the front-end status. I restarted the frame builder, and hit the "global diag reset" button, but to no avail. Yesterday, the only thing Den and I did to c1sus was install a new c1pem model. I got rid of the changes and switched to the old one (I ran rtcds build, install, restart), but the status is still the same.

The issue you're seeing here is stalled mx_stream processes on the front ends.  On the troublesome front ends you can log in and restart the mx_streams with the "mxstreamrestart" command.
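
If several front ends are affected at once, the restart can be scripted. The following is a hedged sketch in Python, assuming the front ends are reachable over ssh by hostname and that "mxstreamrestart" is on the remote PATH; the helper names are made up for this example, and by default it only prints the commands.

```python
import subprocess


def mxstream_restart_commands(hosts, remote_command="mxstreamrestart"):
    """Build one ssh command line per front end (nothing is executed here)."""
    return [["ssh", host, remote_command] for host in hosts]


def run_all(commands, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# The two front ends reported red in the quoted entry.
commands = mxstream_restart_commands(["c1lsc", "c1sus"])
run_all(commands)  # dry run: prints the commands without contacting any host
```

Calling run_all(commands, dry_run=False) would actually run them, stopping at the first failure because of check=True.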

5211   Fri Aug 12 16:50:37 2011 YoichiConfigurationCDSFE Status screen rearranged
I rearranged the FE_STATUS.adl so that I have a space to add c1ffc in the screen.
So, please be aware that the FE monitors are no longer in their original positions
in the screen.
8483   Wed Apr 24 14:20:49 2013 KojiUpdateCDSFE Web view not updated?

The FE web view does not seem to be up to date (maybe not for a year).

https://nodus.ligo.caltech.edu:30889/FE/c1mcs_slwebview_files/index.html

8895   Mon Jul 22 22:06:18 2013 KojiUpdateCDSFE Web view was fixed

The FE Web view was broken for a long time. It is fixed now.

The problem was that path names were not fixed when we moved the models from the old local place to the SVN structure.

The auto updating script (/cvs/cds/rtcds/caltech/c1/scripts/AutoUpdate/update_webview.cron) is running on Mafalda.

Link to the web view: https://nodus.ligo.caltech.edu:30889/FE/

9364   Mon Nov 11 12:19:36 2013 ranaUpdateCDSFE Web view was fixed

 Quote: FE Web view was broken for a long time. It was fixed now. The problem was that path names were not fixed when we moved the models from the old local place to the SVN structure. The auto updating script (/cvs/cds/rtcds/caltech/c1/scripts/AutoUpdate/update_webview.cron) is running on Mafalda. Link to the web view: https://nodus.ligo.caltech.edu:30889/FE/

Seems partially broken again. Not updating for most of the FE. I've commented out the cron lines for this as well as the mostly broken MEDM Snapshots job. I'm in the process of adding them to the megatron cron (since that machine is at least running 64 bit Ubuntu 12, instead of 32-bit CentOS)

9366   Tue Nov 12 15:04:35 2013 ranaUpdateCDSFE Web view was fixed

 Quote: Seems partially broken again. Not updating for most of the FE. I've commented out the cron lines for this as well as the mostly broken MEDM Snapshots job. I'm in the process of adding them to the megatron cron (since that machine is at least running 64 bit Ubuntu 12, instead of 32-bit CentOS)

Seems to now be working. I made several fixes to the scripts to get it working again:

1. changed TCSH scripts to BASH. Used /usr/bin/env to find bash.
2. fixed stdout and stderr redirection so that we could see all error messages.
3. made the PERL scripts executable. most of the PERL errors are not being logged yet.
4. fixed paths for the MEDM screens to point to the right directories.
5. the screen cap only works on screens which pop open on the left monitor, so I edited the screens so that they open up there by default.
6. moved the CRON jobs from mafalda over to megatron. Mafalda no longer is running any crons.
7. op540m used to run the 3 projector StripTool displays and have its screen dumped for this web page. Now zita is doing it, but I don't know how to make zita dump her screen.
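
Fix 2 above (sending both stdout and stderr to the log so that all error messages are visible) can be illustrated as follows. This is a sketch in Python rather than the BASH used in the actual scripts; the function name run_logged and the log file name are hypothetical, and a small Python child process stands in for the real update job.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_logged(cmd, log_path):
    """Run cmd, appending both stdout and stderr to log_path; return the exit code."""
    with open(log_path, "a") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Demo with a stand-in child process that writes to both streams.
demo_log = Path(tempfile.mkdtemp()) / "update_webview.log"  # hypothetical log name
child = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"
exit_code = run_logged([sys.executable, "-c", child], demo_log)
log_text = demo_log.read_text()
print(exit_code, log_text)
```

The shell equivalent of this redirection is appending `>> logfile 2>&1` to the command in the cron script.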
1916   Mon Aug 17 02:12:53 2009 YoichiSummaryComputersFE bootfest
Rana, Yoichi

All the FE computers went red this evening.
We power cycled all of them.
They are all green now.

Not related to this, the CRT display of op540m has not been working since Friday night.
We are not sure if it is the failure of the display or the graphics card.
Rana started alarm handler on the LCD display as a temporary measure.
2627   Mon Feb 22 12:48:31 2010 josephb, alex, kojiUpdateComputersFE machines now coming up

Even after bringing up the fb40m, I was unable to get the front ends to come up, as they would error out with an RFM problem.

We proceeded to reboot everything I could get my hands on, although it's likely that daqawg and daqctrl were the issue: on the C0DAQ_DETAIL screen their status had been showing as 0xbad, but after the reboot it showed up as 0x0.  They had originally come up before the frame builder had been fixed, so this might have been the culprit.  In the course of rebooting, I also found that c1omc and c1lsc had been turned off, and turned them on.

After this set of reboots, we're now able to bring the front ends up one by one.
