ID   Date   Author   Type   Category   Subject
  4762   Mon May 23 18:10:41 2011   kiwamu   Update   LSC   f2p filters on PRM : not good

During the DRMI trial I noticed that the f2p filters on PRM are not very effective (i.e. pushing PRM in the POS direction causes misalignment).

I checked the f2p filters in an easy way. I pushed POS at 0.01 Hz with an amplitude of 1000 counts and looked at the oplev error signals with / without the f2p filters.

The picture below is a time series of the POS excitation, the oplev's PITCH and YAW error signals.

You can see there is still a big coupling from POS to YAW after the f2p filters were enabled. (It's supposed to be like this)

I will redo the f2p measurement on PRM.

f2p_PRM.png

  4719   Sun May 15 12:42:29 2011   kiwamu   Update   SUS   f2p ratio adjustment done for all the suspensions

The f2p adjustment is done for all the suspensions (except for MC1,2,3).

Attachment 1: f2p_summary.pdf
  9822   Thu Apr 17 11:00:54 2014   jamie   Update   CDS   failed attempt to get Dolphin working on c1ioo

I've been trying to get c1ioo on the Dolphin network, but have not yet been successful.

Background: if we can put the c1ioo machine on the fast Dolphin IPC network, we can essentially eliminate latencies between the c1als model and the c1lsc model, which are currently connected via a Rube Goldberg-esque c1lsc->dolphin->c1sus->rfm->c1ioo configuration.

Rolf gave us a Dolphin host adapter card, and we purchased a Dolphin fiber cable to run from the 1X2 rack to the 1X4 rack where the Dolphin switch is.

Yesterday I installed the Dolphin card into c1ioo.  Unfortunately, c1ioo, which is a Sun Fire X4600 and therefore different from the rest of the front end machines, doesn't seem to be recognizing the card.  The /etc/dolphin_present.sh script, which is supposed to detect the presence of the card by grep'ing for the string 'Stargen' in the lspci output, returns null.
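
The detection described above amounts to a substring search over the lspci output. Here is a minimal Python sketch of that idea, assuming only what the entry states (the string 'Stargen' identifies the card). The function name and the sample lspci lines are made up for illustration; this is not the contents of /etc/dolphin_present.sh.

```python
def dolphin_card_present(lspci_output: str, marker: str = "Stargen") -> bool:
    """Return True if any line of the lspci output mentions the marker string."""
    return any(marker in line for line in lspci_output.splitlines())


# Hypothetical lspci lines, for illustration only (not captured from c1ioo).
sample_with_card = (
    "00:1f.2 SATA controller: Example Corp\n"
    "0a:00.0 Bridge: Stargen Inc. Device 0101\n"
)
sample_without_card = "00:1f.2 SATA controller: Example Corp\n"

print(dolphin_card_present(sample_with_card))
print(dolphin_card_present(sample_without_card))
```

An empty result from the grep corresponds to the function returning False, which is the situation seen on c1ioo.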

I've tried moving the card to different PCIe slots, as well as swapping it out with another Dolphin host adapter that we have.  Neither worked.

I looked at the Dolphin host adapter installed in c1lsc and it's quite different, presumably a newer or older model.  Not sure if that has anything to do with anything.

I'm contacting Rolf to see if he has any other ideas.

  9899   Fri May 2 03:51:29 2014   rana   Update   LSC   farther into CM

Rana, Q

After some more matlab loopology (see Qlog), we turned on the AO path successfully. The key was to turn on the 300:80 filter in the MCL path so that it could cross stably with the AO. Then we ramp up the AO gain via the newly AC coupled AO path into the MC servo board.

The POY11 signal looks nice and smooth. For the final smoothness after the overall common gain is ramped up, I turned on a FM7 pole at 300 Hz so that the MC path would keep falling like 1/f^2 and not interfere with the AO path around 1 kHz.

There's not enough gain yet to be able to turn on the Boost. PCDRIVE is ~3 V. Earlier tonight we were maybe seeing the EOM saturation effect, but we re-allocated more of the gain to the front end and it's all fine now. I think we can get another ~10-15 dB of gain by using the POY whitening gain slider + the CM AO slider. Then we can get the Boost on and take some TFs with the SR785 (as long as GPIB allows).

Good Settings:

CM REFL1 = +31 dB, AO = +16 dB, MC IN2 = +16 dB. SUS-MC2_LSC = FM6, FM& ON

 

** Everything has been pretty stable tonight except some occasional MC/EOM locking oscillations. This means that it's been easy to keep trying some different CM steps since the Y-Arm relocks using MCL within seconds.

Attachment 1: MCkicked.png
  9908   Sun May 4 22:28:54 2014   ericq   Update   LSC   farther into CM

 [Rana, ericq]

Today, we got a ~2kHz bandwidth lock of the YARM with the AO path. We weren't able to turn any boosts on, due to POY noise. 

Rana and Koji have written scripts (/scripts/PRFPMI/cm_step and cm_down) that work very reliably. 

Here is an OLTF. (Violin filters were off; the crap around 600Hz goes away with them on)

 OLTF.png

My MATLAB modeling was useful in predicting the features of the loop shape, and the dependence on AO gain/crossover. Still, I need to check it out, because there is a nonzero discrepancy between reality and my model (this may be hiding in the non-flat MC AO response, i.e. the bump at ~35kHz. Alternatively, the crossover frequency is a free parameter...)

In any case, we have confidence that the CM board is mostly working predictably. We presume that our current obstacle is the very noisy nature of POY, and thus it's not worth spending more time in this configuration. 

Upcoming plans:

  • Use the CM board to control the Y arm coupled with the PRM. ("PRY"?)
  • Determine the game plan for high BW control of CARM. 

Next steps:

  • Check CM board boosts turn on politely (Transients, TFs)
  • Use fast spectrum analyzer to check MC loop gain out to a few MHz. (The bump in the tens of kHz should be fixed / moved higher)
  • Think about noise performance of, say, REFLDC, ASDC, RF AS signals, etc. in the PRY case, figure out which one to use first. 
  • We may want to first focus on directly locking the arm on an RF signal, figure out gains etc. and then figure out how to do DC->RF handoff nicely, or if high bandwidth DC signal control is even feasible. 

RXA: we should also use AS45 instead of POY11. It has better SNR and I think our whole problem is too little light on POY.

  9913   Tue May 6 03:17:15 2014   rana   Update   LSC   farther into CM

Yes, we still need to do these things, day team. Please tune up the MC loop first, before anything else.

Quote:

Next steps:

  • Check CM board boosts turn on politely (Transients, TFs)
  • Use fast spectrum analyzer to check MC loop gain out to a few MHz. (The bump in the tens of kHz should be fixed / moved higher)
  • Think about noise performance of, say, REFLDC, ASDC, RF AS signals, etc. in the PRY case, figure out which one to use first. 
  • We may want to first focus on directly locking the arm on an RF signal, figure out gains etc. and then figure out how to do DC->RF handoff nicely, or if high bandwidth DC signal control is even feasible.  

  9917   Tue May 6 17:58:44 2014   ericq   Update   LSC   farther into CM

 I took a look at the MC OLTF and AO path TFs with the fast agilent analyzer. 

I played with the relative gain of the EOM and PZT, but couldn't really change the MC OLTF shape much without making the PC Drive RMS angry. 

However, it turns out we have plenty of phase headroom to up the MC UGF from ~100kHz to ~180, with about 40 degrees of phase margin and ~7dB of gain margin. As I write this, PC drive RMS is around 1.1, and FSS Fast at 5.6, so I think the extra gain is fine for now. 

This pushes up and smoothens out the gain peaking in the AO path; see this figure:

AOTFs.pdf

(why does ELOG hate my python plots?! argggg)

Rana's rule of thumb was "We need at least +3dB MC loop gain at our CM servo UGF," so it looks like high tens of kHz bandwidth may be doable from the AO standpoint.

RXA: No, no, no, no, no, noooo. Rana said we need a gain of 3-10 at the CM UGF, not +3 dB.
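
To make the difference between "+3 dB" and "a gain of 3-10" concrete, here is a small Python sketch of the conversion. It assumes the usual amplitude convention (dB = 20 log10 of the gain ratio); the function names are just for illustration.

```python
import math


def gain_to_db(gain: float) -> float:
    """Convert an amplitude gain ratio to decibels (20*log10 convention)."""
    return 20.0 * math.log10(gain)


def db_to_gain(db: float) -> float:
    """Convert decibels back to an amplitude gain ratio."""
    return 10.0 ** (db / 20.0)


for g in (3, 10):
    print(f"gain of {g} = {gain_to_db(g):.1f} dB")
print(f"+3 dB = gain of {db_to_gain(3):.2f}")
```

Under that convention a gain of 3-10 is roughly 9.5-20 dB, while +3 dB is only a factor of about 1.4.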

  8681   Wed Jun 5 15:48:02 2013   Steve   Update   General   fast 2004 qpd

Thank you Ben Abbott for forwarding this information:

 QPD Amplifier D990272 https://dcc.ligo.org/cgi-bin/private/DocDB/ShowDocument?.submit=Number&docid=D990272&version= at the X-end.  It plugs into a Generic QPD Interface, D990692, https://dcc.ligo.org/cgi-bin/private/DocDB/ShowDocument?.submit=Number&docid=D990692&version= according to my drawings, that should be in 1x4-2-2A.

Attachment 1: D990272qpd.jpg
Attachment 2: interfaceD990692.jpg
  8683   Wed Jun 5 17:37:10 2013   rana   Update   General   fast 2004 qpd

Quote:

Thank you Ben Abbott for forwarding this information:

 QPD Amplifier D990272 https://dcc.ligo.org/cgi-bin/private/DocDB/ShowDocument?.submit=Number&docid=D990272&version= at the X-end.  It plugs into a Generic QPD Interface, D990692, https://dcc.ligo.org/cgi-bin/private/DocDB/ShowDocument?.submit=Number&docid=D990692&version= according to my drawings, that should be in 1x4-2-2A.

 Wrong: this is not an interface.

  9278   Thu Oct 24 12:00:11 2013   jamie   Update   CDS   fb acquisition of slow channels

Quote:

 

 While that would be good - it doesn't address the EDCU problem at hand. After some verbal emailing, Jamie and I find that the master file in target/fb/ actually doesn't point to any of the EDCU files created by any of the FE machines. It is only using the C0EDCU.ini as well as the *_SLOW.ini files that were last edited in 2011 !!!

So....we have not been adding SLOW channels via the RCG build process for a couple years. Tomorrow morning, Jamie will edit the master file and fix this unless I get to it tonight. There are a bunch of old .ini files in the daq/ dir that can be deleted too.

I took a look at the situation here so I think I have a better idea of what's going on (it's a mess, as usual):

The framebuilder looks at the "master" file

    /opt/rtcds/caltech/c1/target/fb/master

which lists a bunch of other files that contain lists of channels to acquire.  It looks like there might have been some notion to just use 

    /opt/rtcds/caltech/c1/chans/daq/C0EDCU.ini

as the master slow channels file.  Slow channels from all over the place have been added to this file, presumably by hand.  Maybe the idea was to just add slow channels manually as needed, instead of recording them all by default.  The full slow channels lists are in the

    /opt/rtcds/caltech/c1/chans/daq/C1EDCU_<model>.ini

files, none of which are listed in the fb master file.

There are also these old slow channel files, like

    /opt/rtcds/caltech/c1/chans/daq/SUS_SLOW.ini

There's a perplexing breakdown of channels spread out between these files and C0EDCU.ini:

controls@fb ~ 0$ grep MC3_URS /opt/rtcds/caltech/c1/chans/daq/C0EDCU.ini
[C1:SUS-MC3_URSEN_OVERFLOW]
[C1:SUS-MC3_URSEN_OUTPUT]
controls@fb ~ 0$ grep MC3_URS /opt/rtcds/caltech/c1/chans/daq/MCS_SLOW.ini
[C1:SUS-MC3_URSEN_INMON]
[C1:SUS-MC3_URSEN_OUT16]
[C1:SUS-MC3_URSEN_EXCMON]
controls@fb ~ 0$

Why some of these channels are in one file and some in the other I have no idea.  If the fb finds multiple copies of the same channel it will fail to start, so at least we've been diligent about keeping disparate lists in the different files.
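
Since duplicate channel names stop the fb from starting, it can help to check for them before editing the master file. Below is a rough Python sketch, assuming the .ini files list channels as bracketed section names (as in the grep output above) and that a `[default]` section may appear in every file and should be ignored. The function names are hypothetical, and the duplicated C1:SUS-MC3_URSEN_OUTPUT entry in the example is invented to show what a hit looks like.

```python
import re
from collections import defaultdict

CHANNEL_RE = re.compile(r"^\[(?P<name>[^\]]+)\]\s*$")
IGNORED_SECTIONS = {"default"}  # assumption: shared header section, not a channel


def channels_in(text):
    """Return the bracketed section names found in one .ini file's text."""
    names = []
    for line in text.splitlines():
        match = CHANNEL_RE.match(line.strip())
        if match and match.group("name") not in IGNORED_SECTIONS:
            names.append(match.group("name"))
    return names


def find_duplicates(files):
    """files maps filename -> contents; returns {channel: [files it appears in]}
    for every channel that is listed more than once."""
    seen = defaultdict(list)
    for filename, text in files.items():
        for name in channels_in(text):
            seen[name].append(filename)
    return {name: where for name, where in seen.items() if len(where) > 1}


# Example contents; the last line of MCS_SLOW.ini is a made-up duplicate.
example = {
    "C0EDCU.ini": "[default]\n[C1:SUS-MC3_URSEN_OVERFLOW]\n[C1:SUS-MC3_URSEN_OUTPUT]\n",
    "MCS_SLOW.ini": "[default]\n[C1:SUS-MC3_URSEN_INMON]\n[C1:SUS-MC3_URSEN_OUT16]\n"
                    "[C1:SUS-MC3_URSEN_EXCMON]\n[C1:SUS-MC3_URSEN_OUTPUT]\n",
}
print(find_duplicates(example))
```

In practice the dictionary would be filled by reading every file named in the fb master file.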

So I guess the question is if we want to automatically record all slow channels by default, in which case we add in the C1EDCU_<model>.ini files, or if we want to keep just adding them in by hand, in which case we keep the status quo.  In either case we should probably get rid of the *_SLOW.ini files (by maybe integrating their channels in C0EDCU.ini), since they're old and just confusing things.

In the meantime, I added C1:FEC-45_CPU_METER to C0EDCU.ini, so that we can keep track of the load there.

 

  9288   Fri Oct 25 01:46:33 2013   rana   Update   CDS   fb acquisition of slow channels

Rather than limp along with a broken SLOW channel system, I fixed it so that the EDCU files made during the RCG build actually get used and added to the channel list (and thereby available in DV and trends).

I first started by adding all of the EDCU files. This completely fails; daqd just doesn't start and gives some weird exceptions.

So I removed a bunch of them and it runs OK now with ~15000 channels. Previously we had ~1500 slow channels.

An in-between config tonight had ~58000 channels and was also running fine, but the connection to the FB would time out when using DV after several minutes. Possibly we can fix this by adding some more RAM to the FB (the DAQD process uses up 45% of the CPU and 39% of the 8 GB of RAM).

Another issue in getting this to work was that there were a bunch of channel name conflicts between the old C0EDCU.ini and the sub-system EDCU files that I was trying to add. I went through by hand and deleted all of the duplicates from the old file. The new frame files are 80 MB, the old ones were 66 MB.

I hope that /frames doesn't become full - not sure how that is wiped...

  3841   Mon Nov 1 19:32:08 2010   yuta   Summary   CDS   fb crashed? during c1ioo and c1mcs connection at ASC

Frame builder died again!!

Background:
  We want to do an angle-to-length (A2L) measurement to optimize the beam position and increase the visibility of MC locking.
  In order to do the A2L measurement, we need an excitation point, but AWG is currently not working.
  The better way is to use LOCKIN stuff like we had for OMC and put it into C1IOO WFS.
  A software oscillator in LOCKIN shakes the suspension and demodulates the length signal.
  We can choose whatever DOF to shake and whatever signal to demodulate. It would be useful not just for A2L.

What I did:

  I started to put C1IOO WFS signal into C1SUS MC suspension RT model, but after compiling new c1mcs, fb crashed.
  Looks like daqd and mx_streams are running, but DAQ is not working (red).
  I don't know how to restart in a new way!

  5315   Sun Aug 28 22:49:40 2011   Suresh   Update   CDS   fb down

I recompiled c1ioo after making some changes and restarted fb. (about 9:45 - 10PM PDT)  But it failed to restart.  It responds to ping, but does not allow a ssh or telnet. The screen output is:

allegra:~>ssh fb
ssh: connect to host fb port 22: Connection refused
allegra:~>telnet fb 8087
Trying 192.168.113.202...
telnet: connect to address 192.168.113.202: Connection refused
telnet: Unable to connect to remote host: Connection refused
allegra:~>
 

Nor am I able to connect to c1ioo....

 

 

  6854   Fri Jun 22 13:37:17 2012   Jenne   Update   Computers   fb lost connection

...Perhaps related to the fact that Jamie is copying a lot of stuff over the network to back up Ottavia before converting her to Ubuntu, perhaps totally independent. 

After restarting the daqd, c1lsc was the only computer whose mx_stream came up on its own.  I restarted c1sus, c1ioo, c1iscey, c1iscex by hand.

  3796   Wed Oct 27 12:32:53 2010   josephb   Update   CDS   fb rebooted to try and fix testpoints

Problem:

Test points were unavailable last night, even after reboots of c1sus and even restarting the daqd process on the frame builder.

Cause:

It's unclear at this time.  My guess is flaky fb and mx_stream codes.  At the moment, the daqd often requires several restarts as it segfaults within a minute or two of restarting it.

What we did (aka treating the symptoms):

We rebooted the frame builder machine.  I also added the daqd and nds processes to the inittab.  Now when these die, they will automatically be restarted.

Steps to add to the inittab on fb

0) If not on fb, ssh -X fb

1) cd /etc/

2) sudo vi inittab or sudo emacs inittab

3) Add a line like: id:runlevels:action:process

The id is a unique 2-4 letter and number identifier for the process

Run levels is the run level of linux that it will start at. 345 will cover the normal cases

action is what to do with the process. Respawn makes it run at startup and also restarts it every time it dies.

process is the command you want to run

See "man inittab" for more details

In this case we added

daq:345:respawn:/opt/rtcds/caltech/c1/target/fb/daqd -c /opt/rtcds/caltech/c1/target/fb/daqdrc > /opt/rtcds/caltech/c1/target/fb/daqd.log


nds:345:respawn:/opt/rtcds/caltech/c1/target/fb/nds pipe > /opt/rtcds/caltech/c1/target/fb/nds.log

4) Save.

5) Run "sudo /sbin/telinit q".  This forces init to re-examine the inittab file

daqd and nds will now automatically restart when they die.
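
As a quick sanity check of the id:runlevels:action:process layout described in step 3, here is a minimal Python sketch that splits one inittab line into its four fields. The class and function names are hypothetical; the only format assumption is the colon-separated layout given above, with the process field allowed to contain further colons.

```python
from typing import NamedTuple


class InittabEntry(NamedTuple):
    id: str
    runlevels: str
    action: str
    process: str


def parse_inittab_line(line: str) -> InittabEntry:
    """Split an inittab line into its four fields.

    Only the first three colons are separators, so any colon inside the
    process command is preserved.
    """
    ident, runlevels, action, process = line.strip().split(":", 3)
    return InittabEntry(ident, runlevels, action, process)


entry = parse_inittab_line(
    "nds:345:respawn:/opt/rtcds/caltech/c1/target/fb/nds pipe > "
    "/opt/rtcds/caltech/c1/target/fb/nds.log"
)
print(entry.id, entry.runlevels, entry.action)
print(entry.process)
```

Running it on the nds line added above gives the id "nds", runlevels "345" and action "respawn", with the rest of the line as the command.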

Continuing issues:

When the frame builder dies, the mx_stream processes on the front ends die as well.  These need to be restarted manually at the moment by using "sudo /etc/restart_streams" while on c1sus.

The framebuilder code shouldn't be this flaky.

  11645   Fri Sep 25 17:51:11 2015   jamie   Update   DAQ   fb replacement work update

Brief update about the fb replacement status.

The new hardware for fb is in the rack, temporarily sitting on top of megatron, and on the CDS network with the name 'fb1'.  I've installed an OS on it and have re-built daqd.

Earlier this week I swapped it into the network and tried to get it to acquire data from the front ends.  I was ultimately unsuccessful.  The problem seemed to be the mx_stream communication from the front ends to the new host.

The swap is sort of a pain because we only have one Myrinet fiber network adapter card that has to be moved between machines, which of course requires shutting down both machines and opening up their chassis.  I instructed Steve to order us a new Myrinet card for the new machine, which will allow us to swap daqd machines by just moving the fiber connection.  Once that's in place (early next week) I'll go back to trying to figure out what the issue is with the mx_streams.

If all else fails I'll take the repulsive last resort of either swapping or cloning the disk from the old fb.

  9879   Wed Apr 30 14:21:50 2014   manasa   Update   CDS   fb restarted

c1sus and c1iscey were not talking to fb. The usual mxstream restart did not help.

Restarted fb

>>ssh fb

>>telnet fb 8087
shutdown

All lights on the FE status screen are green now.

Note that Steve did an mxstream restart earlier today because the same machines, c1sus and c1iscey, were not talking to fb.

  5733   Tue Oct 25 01:19:17 2011   Suresh   Update   Computers   fb restarted and c1ioo model committed to svn

When I installed the new model I restarted the fb between 1 and 1:30 AM PDT Oct 25, 2011

  5416   Thu Sep 15 11:37:24 2011   Suresh   Update   Computer Scripts / Programs   fb restarted at Thu Sep 15 11:30:30 PDT 2011

I changed a filter bank name (C1IOO-WFS1_PIT) in c1ioo model reverting it to its earlier name.  Had to restart c1ioo model and the fb

  3696   Tue Oct 12 13:05:00 2010   josephb, alex   Update   CDS   fb still flaky, models time out fixed

Interesting information from Alex.  We're limited to 2 Megabytes per second per front end model.  Assuming all your channels are running at a 2kHz rate, we can have at most 256 channels being sent to the frame builder from the front end (assuming 4 byte data).  We're fine for the moment, but perhaps useful to keep in mind.
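
The 256-channel figure follows from dividing the per-model limit by the bytes each channel produces per second. A short Python sketch of that arithmetic, assuming "2 Megabytes" means 2 x 2^20 bytes and "2kHz" means 2048 samples per second (those binary interpretations are assumptions that reproduce the number quoted above):

```python
def max_channels(limit_bytes_per_s: int, sample_rate_hz: int, bytes_per_sample: int) -> int:
    """Largest number of channels that fits under the data-rate limit."""
    return limit_bytes_per_s // (sample_rate_hz * bytes_per_sample)


LIMIT_BYTES_PER_S = 2 * 1024 * 1024  # assumption: 2 MB/s = 2 * 2**20 bytes/s
SAMPLE_RATE_HZ = 2048                # assumption: "2kHz" = 2048 Hz
BYTES_PER_SAMPLE = 4                 # 4 byte data, as stated above

print(max_channels(LIMIT_BYTES_PER_S, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE))  # 256
```

The same function shows how quickly the budget shrinks at higher rates: at 16384 Hz only 32 channels would fit.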

I talked to Alex this morning and he said the frame builder is being flaky (it crashed on us twice this morning, but the third time seemed to stay up when requesting data).  I've added a new wiki page called "New Computer Restart Prodecures" under Computers and Scripts, found here. It includes all the codes that need to be running, and also a start order if things seem to be in a particularly bad state.  Unfortunately, there were no fixes done to the frame builder but it is on Alex's list of things to do.

In regards to the timing out of the front ends, Alex came over to the 40m this morning and we sat down debugging.  We tried several things, such as removing all filters from the C1MCS.txt file in the chans directory, and watching the timing as we pressed various medm control buttons. We traced it to a filter used by the DAC in the model talking to the IOP front end, which actually sends the data to the physical DAC card.  The filter is used when converting between sample rates, in this case between the 16 kHz of the front end model and the 64 kHz of the IOP.  Sending it raw zeros after having had real data seemed to cause this filter to eat up an unusually large amount of CPU time. 

We modified the /opt/rtcds/caltech/c1/core/advLigoRTS/src/include/drv/fm10Gen.c file.

We reverted a change that was done between version 908 and 929, where underflows (really small numbers) were dealt with by adding and then subtracting a very small number.  We left the adding and subtracting, but also restored the hard limits on the history.

So instead of relying on just:

input += 1e-16;
junk = input;
input -= 1e-16;

we also now use

if((new_hist < 1e-20) && (new_hist > -1e-20)) new_hist = new_hist<0 ? -1e-20: 1e-20;

Thus any filter value whose absolute value is less than 1e-20 will be clamped to -1e-20 or 1e-20.  On the bright side, we no longer crash the front ends when we turn something off.
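
The clamp can be restated in Python to make the boundary behaviour explicit. This is only an illustration of the one-line C test quoted above, not the fm10Gen.c code itself; the function name and the `floor` parameter are made up.

```python
def clamp_history(new_hist: float, floor: float = 1e-20) -> float:
    """Keep filter history values away from zero.

    Values strictly inside (-floor, +floor) are pushed out to -floor or
    +floor; negative values go to -floor, zero and positive values go to
    +floor. Everything else is returned unchanged.
    """
    if -floor < new_hist < floor:
        return -floor if new_hist < 0 else floor
    return new_hist


for value in (0.0, 1e-25, -1e-30, 0.5, -2.0):
    print(value, "->", clamp_history(value))
```

Note that an exact zero ends up at +1e-20, matching the sign choice in the C expression.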

 

  9567   Wed Jan 22 18:17:46 2014   Jenne   Update   CDS   fb timing was off

Since this morning, the fb's timing has been off.  Steve pointed it out to me earlier today, but I didn't have a chance to look at it until now. 

This was different from the more common problem of the mx stream needing to be restarted - that causes 3 red blocks per core, on all cores on a computer, but it doesn't have to be every computer.  This was only one red block per core in the CDS FE status screen, but it was on every core on every computer. 

The error message, when you click into the details of a single core, was 0x4000.  I elog searched for that, and found elog 6920, which says that this is a timing issue with the frame builder.  Since Jamie had already set things on nodus' config correctly, all I did was reconnect the fb to the ntp: 

fb$ sudo /etc/init.d/ntp-client restart

As in elog 6920, the daqd stopped, then restarted itself, and cleared the error message. It looks like everything is good again.

I suspect (without proof) that this may have to do with the campus network being down this morning, so the computers couldn't sync up with the outside world.

  9587   Thu Jan 30 11:59:03 2014   manasa   Update   CDS   fb timing was off

Quote:

Since this morning, the fb's timing has been off.  Steve pointed it out to me earlier today, but I didn't have a chance to look at it until now. 

This was different from the more common problem of the mx stream needing to be restarted - that causes 3 red blocks per core, on all cores on a computer, but it doesn't have to be every computer.  This was only one red block per core in the CDS FE status screen, but it was on every core on every computer. 

The error message, when you click into the details of a single core, was 0x4000.  I elog searched for that, and found elog 6920, which says that this is a timing issue with the frame builder.  Since Jamie had already set things on nodus' config correctly, all I did was reconnect the fb to the ntp: 

fb$ sudo /etc/init.d/ntp-client restart

As in elog 6920, the daqd stopped, then restarted itself, and cleared the error message. It looks like everything is good again.

I suspect (without proof) that this may have to do with the campus network being down this morning, so the computers couldn't sync up with the outside world.

The above timing problem has been repeating (a couple of times this week so far). It does not seem to be related to the campus network.

The same solution was applied.

  9679   Wed Feb 26 23:14:07 2014   Jenne   Update   CDS   fb timing was off

....fb timing issue happened again.

I thought that it was the thing that Koji and I saw the other day, where it was individual front end computers that had lost ntp sync, since it wasn't every core on every computer that was red, but reconnecting to the ntp server on c1lsc didn't do anything.  I then tried reconnecting to the ntp server on fb, and that fixed things right up.  Annoying.

  9683   Mon Mar 3 10:42:53 2014   Jenne   Update   CDS   fb timing was off

...yet again.

lsc and sus needed mxstream restarts after I restarted the ntp on fb.

  9684   Mon Mar 3 11:55:39 2014   Koji   Update   CDS   fb timing was off

We need to correctly set up crontab or rc.local for the frontend machines.

  9706   Mon Mar 10 11:42:36 2014   Jenne   Update   CDS   fb timing was off

fb timing was off again.

  9732   Mon Mar 17 12:31:58 2014   manasa   Update   CDS   fb timing was off

Off again. Restarted ntp on fb.

  3628   Thu Sep 30 16:29:35 2010   josephb, alex   Update   CDS   fb update

There currently seems to be a timing issue with  the frame builder.  We switched over to using a symmetricom card to get an IRIG-B signal into the fb machine, but the gps time stamp is way off (~80 years Alex said).

If there is a frame builder issue, it's currently often necessary to kill the associated mx_stream processes, since they don't seem to restart gracefully.  To fix it the following steps should be taken:

Kill frame builder, kill the two mx_stream processes, then /etc/restart_streams/, then restart the frame builder (usual daqd -c ./daqdrc >& ./daqd.log in /opt/rtcds/caltech/c1/target/fb).

To restart (or start after a boot) the nds server, you need to go to /opt/rtcds/caltech/c1/target/fb and type

./nds /opt/rtcds/caltech/c1/target/fb/pipe

At this time, testpoints are kind of working, but timing issues seem to be preventing useful work being done with it.  I'm leaving with Alex working on the code.

 

  3632   Fri Oct 1 10:56:30 2010   josephb, alex   Update   CDS   fb work continued

Alex fixed the time issue with the IRIG-B signal being far off; apparently the IRIG-B signal they use in Downs is different.  He simply corrected for the difference between the two signals in the code.

For debugging purposes we uncommented the following line in the feCodeGen.pl script (in /opt/rtcds/caltech/c1/advLigoRTS/src/epics/util/):

print EPICS "test_points ONE_PPS $dac_testpoint_names $::extraTestPoints\n" 

This is to make every ADC testpoint available from the IOP (such as c1x02).

  3635   Fri Oct 1 14:13:29 2010   josephb, alex   Update   CDS   fb work that still needs to be done

1) Need to check 1 PPS signal alignment

2) Figure out why 1PPS and ADC/DAC testpoints went away from feCodeGen.pl?

3) Fix 1PPS testpoint giving NaN data

4) Figure out why is daqd printing "making gps time correction" twice?

5) Need to investigate why mx_streams are still getting stuck

6) Epics channels should not go out on 114 network (seen messages when doing burt restore/save).

7) Dataviewer leaves test points hanging, daqd does not deallocate them (net_Writer.c shutdown_netwriter call)

8) Need to install wiper scripts on fb

9) Need to install newer kernel on fb to avoid loading myrinet firmware (avoid boot delay)

  4009   Fri Dec 3 15:37:10 2010   josephb   Update   CDS   fb, front ends fixed - tested RFM between c1ioo and c1iscex

Problem:

The front ends and fb computers were unresponsive this morning.

This was due to the fb machine having its ethernet cable plugged into the wrong input.   It should be plugged into the port labeled 0.

Since all the front end machines mount their root partition from fb, this caused them to also hang.

Solution:

The cable has been relabeled to "fb" on both ends, and plugged into the correct jack.  All the front ends were rebooted.

 

Testing RFM for green locking:

I tested the RFM connection between c1ioo and c1scx.  Unfortunately, on the first test, it turns out the c1ioo machine had its gps time off by 1 second compared to c1sus and c1iscex.  A second reboot seems to have fixed the issue.

However, it bothers me that the code didn't come up with the correct time on the first boot.

The test was done using the c1gcv model and by modifying the c1scx model.  At the moment, the MC_L channel is being passed the MC_L input of the ETMX suspension.  In the final configuration, this will be a properly shaped error signal from the green locking.

The MC_L signal is currently not actually driving the optic, as the ETMX POS MATRIX currently has a 0 for the MC_L component.

  16325   Tue Sep 14 15:57:05 2021   jamie   Frogs   CDS   fb1 /var full after reboot, caused all sorts of problems

/var on fb1 filled up today, which caused all sorts of CDS issues.  I found out about the problem by reading the logs of the services that were having trouble running, in which they complained about not being able to write to disk.  I looked at the filesystem status with 'df' and noticed that /var was full, which is where applications write temporary data, and will always cause problems if it's full.
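
Once 'df' shows a full filesystem, the next step is finding which files are eating the space. A rough Python sketch of that hunt follows, listing the largest files under a directory. The function name is hypothetical, and the demo runs on a throwaway temporary directory with made-up file names and sizes rather than on the real /var.

```python
import os
import tempfile


def largest_files(root, count=5):
    """Return up to `count` (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(full_path), full_path))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(sizes, reverse=True)[:count]


# Demo on a temporary directory with two made-up files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name, size in (("messages", 4096), ("daqd.log", 128)):
        with open(os.path.join(demo_dir, name), "wb") as handle:
            handle.write(b"\0" * size)
    demo_result = [(size, os.path.basename(path)) for size, path in largest_files(demo_dir)]

print(demo_result)
```

Pointed at a log directory (with sufficient permissions), the same function would surface oversized log files at the top of the list.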

I tracked the issue down to multiple multi-gigabyte log files: /var/log/messages and /var/log/messages.1.  They were full of lines like this one:

Aug 29 06:25:21 fb1 kernel: l called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called 
cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called cmd = 1gpstime iotcl called [... the same message repeats hundreds more times; output truncated ...]

Seems like something related to the gpstime kernel module?

Anyway, I deleted the log files for now, which cleared up the space on /var.  Things should be back to normal now, until the logs fill up again...
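Before deleting a runaway log, it can help to tally which message dominates it. The following is a minimal sketch, not what was run here: it assumes the traditional syslog line layout (month, day, time, host, then the message) with optional bracketed kernel timestamps, and `top_messages` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed line layout: "Sep 14 16:40:01 fb1 kernel: [  12.345678] message"
_SYSLOG_PREFIX = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+")
_KERNEL_STAMP = re.compile(r"\[\s*\d+\.\d+\]\s*")


def top_messages(lines, n=5):
    """Return the n most common messages after removing timestamps and host."""
    counts = Counter()
    for line in lines:
        msg = _SYSLOG_PREFIX.sub("", line.strip())
        msg = _KERNEL_STAMP.sub("", msg)
        if msg:
            counts[msg] += 1
    return counts.most_common(n)


# Made-up sample lines standing in for the contents of /var/log/messages.
sample = [
    "Sep 14 16:40:01 fb1 kernel: [  100.000001] gpstime iotcl called cmd = 1",
    "Sep 14 16:40:01 fb1 kernel: [  100.000002] gpstime iotcl called cmd = 1",
    "Sep 14 16:40:02 fb1 systemd[1]: Started Session 3 of user controls.",
]

for msg, count in top_messages(sample):
    print(f"{count:6d}  {msg}")
```

On a real system the same function could be fed an open file object instead of the sample list.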

  16327   Tue Sep 14 16:44:54 2021 jamieFrogsCDSfb1 /var full after reboot, caused all sorts of problems

Jonathan Hanks pointed me to this fix to the gpstime kernel module that was unfortunately put in after the 3.4 release that we're currently using:

https://git.ligo.org/cds/advligorts/-/commit/6f6d6e2eb1d3355d0cbfe9fe31ea3b59af1e7348

I hacked the source in place (/usr/src/gpstime-3.4/drv/gpstime/gpstime.c) to get the fix, and then rebuilt the kernel module with dkms:

sudo dkms uninstall gpstime/3.4
sudo dkms install gpstime/3.4

I then stopped daqd_dc, unloaded gpstime, reloaded it, restarted daqd_dc.  The messages are no longer showing up in /var/log/messages, so I think we're ok for the moment.
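The stop/unload/reload/restart sequence could be scripted roughly as follows. This is only a sketch: it assumes daqd_dc is managed as a systemd unit of that exact name and that the module loads with modprobe, neither of which is confirmed above. It defaults to a dry run that only lists the commands.

```python
import subprocess

# Assumptions: daqd_dc is a systemd unit with this exact name, and the
# gpstime module can be loaded with modprobe. Adjust to the real setup.
RELOAD_STEPS = [
    ["sudo", "systemctl", "stop", "daqd_dc"],
    ["sudo", "rmmod", "gpstime"],
    ["sudo", "modprobe", "gpstime"],
    ["sudo", "systemctl", "start", "daqd_dc"],
]


def reload_gpstime(dry_run=True):
    """Run the steps in order, stopping at the first failure.

    With dry_run=True (the default) nothing is executed; the commands are
    only returned as strings so they can be reviewed first.
    """
    planned = [" ".join(step) for step in RELOAD_STEPS]
    if not dry_run:
        for step in RELOAD_STEPS:
            # check=True raises CalledProcessError if a step fails
            subprocess.run(step, check=True)
    return planned


print("\n".join(reload_gpstime()))
```

Stopping the consumer before unloading matters because a module that is still in use cannot be removed.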

NOTE: the fix will be undone if we for some reason reinstall the advligorts-gpstime-dkms package.  There shouldn't be a need to do that, but we should be aware.  I'm discussing with Jonathan if we want to try to push out a new debian package to fix this issue...

  17250   Wed Nov 9 14:19:18 2022 TegaUpdateCDSfb1 OS migration

[Tega, Chris]

We migrated the fb1 OS from the teststand fb1 drive to the internal 2 TB RAID of fb1. We then rebooted twice to check that we no longer have the fb1 booting issue.

The next step is to set up software RAID and backup for chiara, which we plan to complete this week. We will then work on the nodus and workstation OS upgrades next week.

  2626   Mon Feb 22 11:46:55 2010 josephbUpdateComputersfb40m

I fixed the JetStor 416S RAID array IP address by plugging my laptop into its ethernet port, setting my IP to be on the same subnet, and using the web interface.  (I finally tracked down the password, and it is now in the usual place.)
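Whether a laptop address lands on the same subnet as a device can be checked with Python's standard ipaddress module. This is an illustrative sketch: the addresses and the /24 prefix are hypothetical, since the real JetStor address is not given in this entry.

```python
import ipaddress


def same_subnet(host_ip, device_ip, prefix_len):
    """True if both addresses fall in the same network of the given prefix length."""
    network = ipaddress.ip_interface(f"{device_ip}/{prefix_len}").network
    return ipaddress.ip_address(host_ip) in network


# Hypothetical addresses, for illustration only.
print(same_subnet("192.168.1.50", "192.168.1.100", 24))  # same /24
print(same_subnet("192.168.2.50", "192.168.1.100", 24))  # different /24
```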

After this change, I powered up the fb40m2 machine and rebooted the fb40m machine. This seems to have made all the associated lights green.

Data viewer is working: it has been recording since the point at which I fixed the JetStor RAID array and rebooted fb40m.  It can also go back in time to before the IP switchover.

  1554   Thu May 7 12:21:36 2009 josephb, alexConfigurationComputersfb40m

Having determined that Rana (the computer) was having too many issues with testing the new RAID array due to the age of the system, we proceeded to test on fb40m.

 

We brought it down and up several times between 11 and noon.  We eventually were able to daisy-chain the old RAID and the new RAID so that fb40m sees both.  At this time, the RAID arrays are still daisy-chained, but the computer is set up to run on just the original RAID while the full 14 TB array is initialized (16 drives, 1 hot spare; RAID level 5 means 14 TB out of the 16 TB are actually available).  We expect this to take a few hours, at which point we will copy the data from the old RAID to the new RAID (which I also expect to take several hours).  In the meantime, operations should not be affected.  If they are, contact one of us.
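The 14 TB figure follows from simple arithmetic: the hot spare holds no data, and RAID 5 spends one drive's worth of space on parity. A small sketch of that calculation (the function name is just for illustration; 1 TB per drive matches the 16 TB total quoted above):

```python
def raid5_usable_tb(total_drives, hot_spares, drive_tb):
    """Usable RAID 5 capacity: one drive's worth of space goes to parity."""
    active = total_drives - hot_spares
    if active < 3:
        raise ValueError("RAID 5 needs at least three active drives")
    return (active - 1) * drive_tb


# 16 drives of 1 TB each, 1 hot spare
print(raid5_usable_tb(16, 1, 1.0))
```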

 

 

  1555   Thu May 7 15:22:19 2009 josephb, albertoConfigurationComputersfb40m

Quote:

Having determined that Rana (the computer) was having too many issues with testing the new RAID array due to the age of the system, we proceeded to test on fb40m.

 

We brought it down and up several times between 11 and noon.  We eventually were able to daisy-chain the old RAID and the new RAID so that fb40m sees both.  At this time, the RAID arrays are still daisy-chained, but the computer is set up to run on just the original RAID while the full 14 TB array is initialized (16 drives, 1 hot spare; RAID level 5 means 14 TB out of the 16 TB are actually available).  We expect this to take a few hours, at which point we will copy the data from the old RAID to the new RAID (which I also expect to take several hours).  In the meantime, operations should not be affected.  If they are, contact one of us.

This afternoon the alignment script crashed after returning syntax errors. We found that the tpman wasn't running on the framebuilder, probably because it had failed to get restarted during one of the several reboots executed in the morning by Alex and Jo.

Restarting the tpman was then sufficient for the alignment scripts to get back to work.

  1746   Wed Jul 15 08:59:30 2009 steveUpdateComputersfb40m

The fb40m just went out of order with status indicator number 8

It recovered on its own five minutes later.

  1756   Thu Jul 16 09:49:52 2009 AlanUpdateComputersfb40m

Quote:

The fb40m just went out of order with status indicator number 8

It recovered on its own five minutes later.

 Backup script restarted, backup of trend frames and /cvs/cds is up-to-date.

 

  2385   Thu Dec 10 13:13:08 2009 JenneUpdateComputersfb40m backup restarted

The frame builder was power cycled during the morning bootfest.  I have restarted the backup script once more.

  1574   Mon May 11 12:25:03 2009 josephb,AlexUpdateComputersfb40m down for patching

The 40m frame builder is currently being patched to be able to utilize the full 14 TB of the new RAID array (as opposed to being limited to 2 TB).  This process is expected to take several hours, during which the frame builder will be unavailable.

  3600   Thu Sep 23 12:05:20 2010 josephb, alexUpdateCDSfb40m down, new fb in progress

Alex came over this morning and we began work on the frame builder change over.  This required fb40m be brought down and disconnected from the RAID array, so the frame builder is not available.

He brought a Netgear switch which we've installed at the top of the 1X7 rack.  This will eventually be connected, via Cat 6 cable, to all the front ends.  It is connected to the new fb machine via a 10G fiber.

Alex has gone back to Downs to pick up a Symmetricom card for getting timing information into the frame builder.  He will also be bringing back a hard drive with the necessary framebuilder software to be copied onto the new fb machine.

He said he'd also like to put a Gentoo boot server on the machine.  This boot server will not affect anything at the moment, but it's apparently the style the sites are moving towards: a single boot server and diskless front end computers running Gentoo.  For the moment we are sticking with our current CentOS real-time kernel (which is still compatible with the new frame builder code), but this would make a switchover to the new system possible in the future.

At the moment, the RAID array is doing a file system check, and is going slowly while it checks terabytes of data.  We will continue work after lunch. 

 Punchline: things still don't work.

  1831   Wed Aug 5 07:33:04 2009 steveDAQComputersfb40m is down
  1832   Wed Aug 5 09:25:57 2009 AlbertoDAQComputersfb40m is up

FB40m up and running again after restarting the DAQ.

  3602   Thu Sep 23 21:01:11 2010 josephb, alexUpdateCDSfb40m still down, new fb still in progress
Unfortunately, copying the data to the USB/SATA drive over at Downs took longer than expected for Alex. We will be installing the new code on the new fb machine tomorrow and running it. We will be running off of a timer on that machine until Monday. On Monday, a Symmetricom card will be arriving from LLO so that we can connect an IRIG-B timing signal into the frame builder and use a proper time signal. There is no running frame builder tonight, and thus there will be no trends until we get the new FB running tomorrow morning.
  2603   Sat Feb 13 18:58:31 2010 josephb, alexUpdateComputersfb40m testpoints fixed

I received an e-mail from Alex indicating he found the testpoint problem and fixed it today:

Quote from Alex: "After we swapped the frame builder computer it has reconfigured all device files and I needed to create some symlinks on /dev/ to make tpman work again. I test the testpoints and they do work now."

 

  1743   Tue Jul 14 14:54:19 2009 steveConfigurationComputersfb40m2 in 1Y6

Alex and Steve,

The SunFire x4600 (not MEGATRON 2; it is fb40m2) and the JetStor (16 x 1 TB drives) were installed on side rails at the bottom of 1Y6.

We also cleaned up the fibres and cabling in 1Y7.

  7267   Fri Aug 24 00:23:20 2012 DenUpdateModern Controlfeedback using LQG method

I did a simulation of a linear-quadratic-Gaussian (LQG) controller applied to local damping. The cost function was frequency shaped to have a peak at 1 Hz. This technique prevents the controller from adding sensor noise at high and very low frequencies.

Noise was simulated to have a 1/f spectrum (seismic) multiplied by the stack response, which has a resonance at 4 Hz with Q=5.
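For illustration, the noise shape described above can be sketched as a 1/f spectrum filtered by a single second-order resonance. The exact transfer function used in the simulation is not given here, so the form below (unity gain at DC, peak gain Q at the resonance) is an assumption, and the units are arbitrary.

```python
def stack_tf(f, f0=4.0, q=5.0):
    """Complex response of a single second-order resonance (assumed form)."""
    return f0**2 / complex(f0**2 - f**2, f * f0 / q)


def noise_asd(f, f0=4.0, q=5.0):
    """1/f input spectrum filtered by the stack resonance, arbitrary units."""
    return abs(stack_tf(f, f0, q)) / f


for f in (0.1, 1.0, 4.0, 40.0):
    print(f"{f:6.1f} Hz  {noise_asd(f):.4g}")
```

With this form the stack passes motion unchanged well below 4 Hz, amplifies it by Q at the resonance, and attenuates it as 1/f^2 above.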

model.png         feedback_lqg.png

 

 

  7732   Tue Nov 20 15:11:22 2012 SteveUpdateGeneralfew more sensing cards

New  Lumitek IR Sensor Cards are here. We got 2 pieces of Q-11-T (2" x 2"), 2 pieces of Q-11-T (0.75" x 0.75")  and one Q-11 (4" x 5")

  8985   Thu Aug 8 10:31:28 2013 SteveUpdateVACfew reminders of this vent

 1, Vacuum envelope grounds must be connected at all times!  After door removal, reconnect both cables immediately.

 2, The crane folding had a new issue: getting cut, as the picture shows.

 3, Too much oplev light is scattered. This picture was taken just before we put on the heavy door.

 4, We were unprepared to hold the smaller side chamber door (29" OD) of the IOC.

 5, Silicon bronze 1/2-13 nuts for chamber doors will be replaced. They do not turn smoothly.

 

Attachment 1: GROUND!.jpg
Attachment 2: bad_Folding.jpg
Attachment 3: toomuchred.jpg
Attachment 4: BETTERholder.jpg