40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 27 of 344  Not logged in ELOG logo
ID Date Author Typeup Category Subject
  94   Mon Nov 12 14:09:19 2007 robDAQGeneraltpman dead on fb40m
The testpoint manager was dead on fb40m. I know I re-started it sometime after the power outage, so something must have killed it. If you get an error from DTT like
"diagnostic kernel does not support: testpoints", then log into fb40m as root, check for the tpman with a ps -ef | grep tpman. If it's not there, then run /usr/controls/tpman & and close the terminal window.
  152   Fri Nov 30 21:27:24 2007 ranaDAQPEMweather / stacis / c1pem1
I was trying to add some Seis BLRMS channels to the c1pem1 processor so that we could have DMT trends.

Then I found that none of the Weather channels have been working for a year or so. I could also not
telnet into it. I tried resetting it but no luck. There was no entry in the Wiki for it so I added
a place holder.

Have the weather channels ever worked? Do we have those sensors? I think I've never actually looked
for this. Seems like a fine ugrad job.
  157   Mon Dec 3 00:10:42 2007 ranaDAQComputer Scripts / Programslinemon
I've started up one of our first Matlab based DMT processes as a test.

There's a matlab script running on Mafalda which is measuring the height of the 60 Hz peak
in the MC1 UL SENSOR and writing it to an unused EPICS channel (PZT1_PIT_OFFSET).

The purpose of this is just to see if such a thing is stable over long periods of time. Its
open on a terminal on linux3 so it can be killed at any time if it runs amok.

Right now the code just demods the channel and tracks the absolute value of the peak. The
next upgrade will have it track the actual frequency once per minute and then report that
as well. We also have to figure out how to make it a binary and then make a single script
that launches all of the binaries.

For now you can watch its progress on the StripTool on op540m; its cheap and easy DMT viewer.
  160   Mon Dec 3 19:06:49 2007 ranaDAQComputer Scripts / Programslinemon
I turned up my nose at Matlab's special tools. I modified the linetracker to use the
relationship phase = 2*pi*f*t to estimate the frequency each minute. The
code uses 'polyfit' to get the mean and trend of the unwrapped phase and then determines
how far the initial frequency estimate was off. It then uses the updated number as the
initial guess for the next minute.

I looked at a couple hours of data before letting it run. It looks like the phase of the
'60 Hz' peak varies at 20 second time scales but not much faster or rather anything faster
would be a glitch and not a monotonic frequency drift.

From the attached snapshot you can see that the amplitude (PZT1_PIT) varies by ~10 %
and the frequency by ~40 mHz in a couple hour span.
Attachment 1: spd64d1.jpg
spd64d1.jpg
  170   Wed Dec 5 19:25:07 2007 ranaDAQCDSDMF
I made a database file on C1AUX called dmf.db. It has 9 DMF EPICS channels which are also trended
so that one can now write data to those channels from a DMF Monitor and the data will be records.

New channels:
[C1:DMF-SEIS_1]
[C1:DMF-SEIS_2]
[C1:DMF-SEIS_3]
[C1:DMF-LINE_1]
[C1:DMF-LINE_2]
[C1:DMF-LINE_3]
[C1:DMF-MC_1]
[C1:DMF-MC_2]
[C1:DMF-MC_3]

I added these to C1AUX because it doesn't do much and can be booted without having much effect.
(it controls Mech Shutters, Video, and Illuminators. It used to also do the EO Shutter but I
removed that from its startup.cmd and it will no longer load those records).
  204   Wed Dec 19 20:28:27 2007 AndreyDAQPEMNames for all 6 accelerometers have been changed

I eventually changed the names for all 6 accelerometers (see my ELOG entry # 172 from Dec. 05 about my intent to do that).

I removed the word "BS" from their names,
and I changed the word combination "ACC_BS_EAST" in the old name for "ACC_ITMX" in the new name;
as well "ACC_BS_WEST" is now replaced by "ACC_ETMX".
(the reasoning behind such a change should become clear from my ELOG entry #172).

New accelerometer names are:
(note: there are no spaces (nowhere!) in the names of accelerometers, but ELOG replaces ": P" written without a space by a strange symbol Tongue)

C1 : PEM - ACC _ ETMX _ X ;
C1 : PEM - ACC _ ETMX _ Y ;
C1 : PEM - ACC _ ETMX _ Z ;
C1 : PEM - ACC _ ITMX _ X ;
C1 : PEM - ACC _ ITMX _ Y ;
C1 : PEM - ACC _ ITMX _ Z .

One can find them in "C1 : PEM - ACC" in Dataviewer.

  312   Tue Feb 12 16:34:07 2008 robDAQDMFseisBLRMS 1.1
The compiled version of seisBLRMS had been running ~2 weeks without crashing as of last night, when I killed it
so it wouldn't interfere with alignment scripts.  I added an EPICS channel C1:DMF-ENABLE, and updated the DMF
executables to check this channel while running.  So far it seems to work.  When you're running alignment acripts,
simply click the DISABLE button on the C1DMF_MASTER.adl screen, and then re-ENABLE when the scripts finish. 
It's not clear why this is necessary though.  Theories include the constant disk access is keeping the
framebuilder busy, reducing its ability to deal with ezcademod commands and DMF programs just flooding the
network with so much traffic that ezcademod-related packets run late and get ignored.

Also, for reasons of aesthetics, I changed the data delay from 6 minutes to 5 minutes.  We'll see if that's enough.
  316   Thu Feb 14 15:04:53 2008 robDAQDMFDMF delay

Sometime ago I edited seisBLRMS to keep of track of how long it was taking to write RMS data (that is, the delay between the accelerometer data and the write of the EPICS rms data). Here's a plot of that info, showing how the delay increases over time. I think this indicates a logical flaw in the timing of the seisBLRMS program, which sort of relies on everything running well consistently; this should not be difficult to fix. I'll maybe try increasing the delay to ~10 minutes, and making it relatively inflexible.
Attachment 1: DMFdelay.png
DMFdelay.png
  317   Thu Feb 14 15:05:18 2008 robDAQDMFseisBLRMS 1.1
> 
> Also, for reasons of aesthetics, I changed the data delay from 6 minutes to 5 minutes.  We'll see if that's enough.

5 minutes didn't work.  
  358   Tue Mar 4 23:22:32 2008 robDAQComputersc1susvme1&2 rebooted

I found that some channels from c1susvme1 and c1susvme2 were not being recording by the DAQ (and were not showing up in DV). I rebooted these processors, which fix the problem. If you see other cases of this (signal exactly zero, but not a testpoint problem), just reboot the corresponding processor.
  440   Wed Apr 23 22:39:54 2008 AndreyDAQComputer Scripts / ProgramsProblem with "get_data" and slow PEM channels

It turns out that I cannot read minute trends for the slow weather channels for more than 1000 seconds back (roughly more than 15 minutes ago) using "get_data" script.

For comparison, I tried MC1 slow channels, and similar problem did not arise there. Probably, something is wrong with the memory of slow weather channels. At the same time, I can see minute-trends in Dataviewer as long ago as I want.

In response to
>>get_data('C1: PEM-weather_outsideTemp', 'minute', gps('now') - 3690, 3600);
I get the error message:
"Warning: Missong C1: PEM-weather_outsideTemp M data at 893045156".
  456   Sun Apr 27 18:11:58 2008 robDAQComputersbr40m?

The testpoint manager (which runs on fb40m) crashed this afternoon. Upon re-starting it, I found there was a rogue dtt process on op440m and also a daqd daemon running on br40m. One or both of these caused the tpman to crash. br40m is the frame broadcaster, which is never used here as we don't run DMT. I killed the daqd process there.

The way to find if there is a rogue process is to watch the output to the console from the tpman when you start it:

Allocate new TP handle 56 by 131.215.113.203
Allocate new TP handle 57 by 131.215.113.203
Allocate new TP handle 58 by 131.215.113.203
Allocate new TP handle 59 by 131.215.113.203
Allocate new TP handle 60 by 131.215.113.203
Allocate new TP handle 61 by 131.215.113.203
Allocate new TP handle 62 by 131.215.113.203
Allocate new TP handle 63 by 131.215.113.203
Allocate new TP handle 64 by 131.215.113.203
Allocate new TP handle 65 by 131.215.113.203
Allocate new TP handle 66 by 131.215.113.203
Allocate new TP handle 67 by 131.215.113.203
Allocate new TP handle 68 by 131.215.113.203


If you see something like this, with a new TP handle being allocated every few seconds, you need to log in to the corresponding host and kill whatever process has run away.
  457   Sun Apr 27 22:57:15 2008 ajwDAQComputersbr40m?

Quote:

The testpoint manager (which runs on fb40m) crashed this afternoon. Upon re-starting it, I found there was a rogue dtt process on op440m and also a daqd daemon running on br40m. One or both of these caused the tpman to crash. br40m is the frame broadcaster, which is never used here as we don't run DMT. I killed the daqd process there.

The way to find if there is a rogue process is to watch the output to the console from the tpman when you start it:

Allocate new TP handle 56 by 131.215.113.203
Allocate new TP handle 57 by 131.215.113.203
Allocate new TP handle 58 by 131.215.113.203
Allocate new TP handle 59 by 131.215.113.203
Allocate new TP handle 60 by 131.215.113.203
Allocate new TP handle 61 by 131.215.113.203
Allocate new TP handle 62 by 131.215.113.203
Allocate new TP handle 63 by 131.215.113.203
Allocate new TP handle 64 by 131.215.113.203
Allocate new TP handle 65 by 131.215.113.203
Allocate new TP handle 66 by 131.215.113.203
Allocate new TP handle 67 by 131.215.113.203
Allocate new TP handle 68 by 131.215.113.203


If you see something like this, with a new TP handle being allocated every few seconds, you need to log in to the corresponding host and kill whatever process has run away.


I *think* Alex is responsible for the daqd daemon running on br40m (he set up some new stuff recently, a data concentrator and broadcaster); I'll make sure he sees this post.
  459   Tue Apr 29 21:09:12 2008 ranaDAQCDSFE Filters
These are new FE filters for downsampling and upsampling. We will be going from native hardware sampling rates of 64k down to 32k, 16k, and 2k.

The attached plot shows these filters. They are 3dB ripple, 40 dB stopband, 4th order elliptic filters in which I have moved the zeros around
into good places (e.g. to the Nyquist frequency).

I'm also attaching the .txt file containg the filter coefficients and the design strings. The filters are called x2, x4, and x32, for the
D2, D4, and D32 downsampling, respectively.
Attachment 1: fefilters.jpg
fefilters.jpg
Attachment 2: fefilters.txt
# FILTERS FOR ONLINE SYSTEM
#
# Computer generated file: DO NOT EDIT
#
# MODULES ULYAW
#
################################################################################
### ULYAW                                                                    ###
################################################################################
# SAMPLING ULYAW 65536
... 28 more lines ...
  583   Fri Jun 27 15:20:52 2008 robDAQLSC.ini file change

I removed C1:LSC-XARM_CTRL from the frames and added C1:LSC-CARM_ERR
  640   Mon Jul 7 13:58:37 2008 Eric, josephbDAQPEMUsing unused PEM channels to test camera code
Joe and I have taken control of the EPICS channels C1:PEM-Stacis_EEEX_geo and C1:PEM-Stacis_EEEY_geo since we heard that they are no longer in use.  We are currently 
using them to test the ability for the Snap camera code to read and write from EPICS channels.  Thus, the information being written to these channels is completely unrelated
to their names or previous use.  This is only temporary; we'll create our own channels for the camera code shortly (probably within the next couple of days).

- Eric
  660   Fri Jul 11 20:16:01 2008 EricDAQCamerasTaking data from the GC 750 Camera
Mafalda has been set up with a background process to constantly take data from the GC 750 camera (at the end of the x-arm) for the weekend. This camera will otherwise be inoperable until then.

In the small chance that this slows either Mafalda or the network to a crawl, the process to kill should have PID 26265.
  671   Tue Jul 15 10:09:42 2008 EricDAQCamerasDid anyone kill the picture taking process on Mafalda?
Did anyone kill the process on Mafalda that was taking pictures of the end mirror of the x-arm last Friday? I need to know whether or not it crashed of its own accord.
  673   Tue Jul 15 11:47:56 2008 JenneDAQPEMAccelerometer channels in ASS Adapt MEDM screen
Jenne, Sharon

We have traced which accelerometers correspond to which channels in the C1ASS_TOP MEDM screen.

Accelerometer Channel
------------- --------------------------
MC1-X C1:ASS-TOP_PEM_2_ADAPT_IN1
MC1-Y C1:ASS-TOP_PEM_3_ADAPT_IN1
MC1-Z C1:ASS-TOP_PEM_4_ADAPT_IN1
MC2-X C1:ASS-TOP_PEM_5_ADAPT_IN1
MC2-Y C1:ASS-TOP_PEM_6_ADAPT_IN1
MC2-Z C1:ASS-TOP_PEM_7_ADAPT_IN1

SEISMOMETER C1:ASS-TOP_PEM_1_ADAPT_IN1
  684   Wed Jul 16 17:36:51 2008 JohnDAQPSLFSS input offset
I changed the nominal FSS input offset to 0 from 0.3. Tolerance remains unchanged at +/-0.05.
  700   Fri Jul 18 19:43:55 2008 YoichiDAQComputersPSL fast channels cannot be read by dataviewer
At this moment only the PSL fast channels have trouble.
Rob restarted fb40m, c1IOVME, but no effect.
  826   Mon Aug 11 19:09:28 2008 JenneDAQPEMSeismometer DAQ is being funny
While looking at the Ranger seismometer's output to figure out what our max typical ground motion is, Rana and I saw that the DAQ output is at a weird level. It looks like even though the input to the DAQ channel is being saturated, the channel isn't outputing as many counts as expected to Dataviewer.

Sharon and I checked that the output of the seismometer looks reasonable - sinusoidal when I tap on the seismometer, and the the output of the SR560 (preamp) is also fine, and not clipping. If I stomp on the floor, the output of the SR560 goes above 2V (to about 3V ish), so we should be saturating the DAQ, and getting the max number of counts out. However, as you can see in the first figure, taken when I was tapping the seismometer, the number of counts at saturation is well beneath 32768counts. (16 bit machine, so the +-2V of the DAQ should have a total range of 65536. +2V should correspond to +32768counts.) The second figure shows 40 days of seismometer data. It looks like we saturate the DAQ regularly.

I did a check of the DAQ using an HP6236B power supply. I sent in 1V, 2V and 2.2V (measuring the output of the power supply with a 'scope), and measured the number of counts output on the DAQ.

Input Voltage [V]Counts on DataviewerExpected counts from 16 bit machine
11898316384
22933132768
2.22934732768


I'm not sure why the +1V output more than the expected number of counts (unless I mis-measured the output from the power supply).

Moral of the story is...when the DAQ is saturated, it is not outputting the expected number of counts. To be explored further tomorrow...
Attachment 1: SeisDAQ.png
SeisDAQ.png
Attachment 2: SeisData.png
SeisData.png
  917   Wed Sep 3 19:09:56 2008 YoichiDAQComputersc1iovme power cycled
When I tried to measure the sideband power of the FSS using the scan of the reference cavity, I noticed that the RC trans. PD signal was not
properly recorded by the frame builder.
Joe restarted c1iovme software wise. The medm screen said c1iovme is running fine, and actually some values were recorded by the FB.
Nonetheless, I couldn't see flashes of the RC when I scanned the laser frequency.
I ended up power cycling the c1iovme and run the restart script again. Now the signals recorded by c1iovme look fine.
Probably, the DAQ boards were not properly initialized only by the software reset.
I will re-try the sideband measurement tomorrow morning.
  1028   Mon Oct 6 12:45:41 2008 AlbertoDAQLSCC1LSC in coma
Alberto, Joe,

The C1LSC medm screen is frozen and the C1LSC computer is down. We tried to reboot and to restart it first turning off the power and then just rebooting remotely. None of them worked. We check whether any of the cable was unplugged but they were ok. Also all the led turned on to green after rebooting.
Trying to reboot we get the following error message: init_module: device or resource busy.

We called Alex who first suggested to check all the connection and then to swap the timing cable between two Pentek boards but the computer was still down.
It is possible that the board is dead. Alex and Rolf are going to look into this problem and for any spare board.

by now we can't lock any DOF of the IFO.
  1029   Mon Oct 6 16:41:33 2008 AlbertoDAQLSCC1LSC in coma

Quote:
Alberto, Joe,

The C1LSC medm screen is frozen and the C1LSC computer is down. We tried to reboot and to restart it first turning off the power and then just rebooting remotely. None of them worked. We check whether any of the cable was unplugged but they were ok. Also all the led turned on to green after rebooting.
Trying to reboot we get the following error message: init_module: device or resource busy.

We called Alex who first suggested to check all the connection and then to swap the timing cable between two Pentek boards but the computer was still down.
It is possible that the board is dead. Alex and Rolf are going to look into this problem and for any spare board.

by now we can't lock any DOF of the IFO.


Alex, Rob, Alberto,

Alex replaced the Pentek board used by C1LSC with a spare one that they had at the Wilson house. That fixed the DMA failure but since the board had a channel broken, one of the channels (POY) was still not available.

Looking at the wiring diagram of the ASC crate, we found that one of the Pentek boards in it was actually not used by anything and thus available to replace the bad one in LSC. We switched the two boards so that now the one that Alex installed is mounted in the ASC crate and it is connected to the cable labeled 1x2-ASC 6.

C1LSC is up again and all the channels in the C1LSC screen, including POY, now seem to be working properly.
  1066   Wed Oct 22 09:42:41 2008 AlbertoDAQComputersc1iool0 rebooted and MC autolocker restarted
This morning I found the MC unlocked. The MC-Down script didn't work because of network problems in communicating with scipe7, a.k.a. c1iool0. Telneting to the computer was also impossible so I power cycled it from its key switch. The first time it failed so I repeated it a second time and then it worked.
Yoichi then restarted c1iovme. It was also necessary to restart the MC autolocker script according to the following procedure:
- ssh into op440m
- from op440m, ssh into op340m
- restart /cvs/cds/caltech/scripts/scripts/MC/autolockMCmain40
  1114   Tue Nov 4 17:58:42 2008 AlbertoDAQPSLMC temperature sensor
I added a channel for the temperature sensor on the MC1/MC3 chamber: C1:PSL-MC_TEMP_SEN.
To do that I had to reboot the frame builder. The slow servo of the FSS had to get restarted, the reference cavity locked and so the PMC and MZ.
  1124   Fri Nov 7 18:38:19 2008 AlbertoDAQPSLMC temperature sensor hooked up
Alberto, Rana,
we found that the computer handling the signals from ICS-110B was C1IOVME so we restarted it. We changed the name of the channel to C1:PEM_TEMPS and the number to 16349. We tracked it up to the J14 connector of the DAQ.
We also observed the strange thing that both of the differential pairs on J13 are read by the channle. Also, if you connect a 50 Ohm terminator to one of the pairs, the signal even get amplified.

(The name of the channel is PEM-MC1_TEMPS)
  1130   Wed Nov 12 11:14:59 2008 CarynDAQPSLMC temp sensor hooked up incorrectly
MC Temperature sensor was not hooked up correctly. It turns out that for the 4 pin LEMO connections on the DAQ like J13, J14, etc. the channels correspond to horizontal pairs on the 4 pin LEMO. The connector we used for the temp sensor had vertical pairs connected to each BNC which resulted in both the differential pairs on J13 being read by the channel.
To check that a horizontal pair 4 pin LEMO2BNC connector actually worked correctly we unlocked the mode cleaner, and borrowed a connector that was hooked up to the MC servo (J8a). We applied a sine wave to each of the BNCs on the connector, checked the J13 signal and only one of the differential pairs on J13 was being read by the channel. So, horizontal pairs worked.
  1143   Tue Nov 18 13:28:08 2008 CarynDAQIOOnew channel for MC drum modes
Alberto has added a channel for the Mode Cleaner drum modes.
C1:IOO-MC_DRUM1
sample rate-2048
chnum-13648
  1228   Wed Jan 14 15:53:32 2009 steveDAQPSLMC temperature sensor

Quote:
I added a channel for the temperature sensor on the MC1/MC3 chamber: C1:PSL-MC_TEMP_SEN.
To do that I had to reboot the frame builder. The slow servo of the FSS had to get restarted, the reference cavity locked and so the PMC and MZ.


Where is this channel?
  1246   Thu Jan 22 14:38:41 2009 carynDAQPSLMC temperature sensor

Quote:

Quote:
I added a channel for the temperature sensor on the MC1/MC3 chamber: C1:PSL-MC_TEMP_SEN.
To do that I had to reboot the frame builder. The slow servo of the FSS had to get restarted, the reference cavity locked and so the PMC and MZ.


Where is this channel?


That's not the name of the channel anymore. The channel name is PEM-MC1_TEMPS. It's written in a later entry.
  1349   Tue Mar 3 11:39:50 2009 OsamuDAQComputers2 PCs in Martian

 Kiwamu and I brought 2 SUPER MICRO PCs from Willson house into 40m.

Both PCs are hooked up into Martian network. One is named as bscteststand for BSC which has been set up by Cds people and another one is named kami1 for temporary use for CLIO which is a bland new, no operating installed PC. This bland new PC will be returned Cds or 40m once another new PC which we will order within several days arrives.

IP address for each machine is 131.215.113.83 and 131.215.113.84 respectively.

We have installed CentOS5.2 into the new PC.

  1381   Mon Mar 9 23:55:38 2009 OsamuDAQComputersbscteststand and kami1 outside martian

This morning there was a confliction of tpman running on fb40m and kami1. Alex fixed it temporary but Rana suggested it was better to move both PCs outside martian. We moved both PCs physically to the control room and connected to general network with a local router. I believe it won't conflict anymore but if you guess these PC might have trouble please feel free to shutdown.

 

Today's work summary:

 *connected expansion chassis to bscteststand

 *obtained signals on dataviewer, dtt for both realtime and past data on bscteststand with 64kHz timing signal

 

Questions:

Excitation channels are not shown, only "other" is shown.

qts.mdl should run with 16kHz but 16kHz timing causes a slow speed on dataviewer and failing data aquisition on dtt. We are using 64kHz timing but is it really correct?

  1407   Mon Mar 16 15:19:52 2009 OsamuDAQElectronicsSR785

I borrowed SR785 to measure AA, AI noise and TF.

  1417   Sun Mar 22 23:16:41 2009 ranaDAQComputer Scripts / Programstpman restart
Could get testpoints but couldn't start excitations. Restarted tpman on daqawg. Works now.

Still no log file. Mad
  1528   Tue Apr 28 12:55:57 2009 CarynDAQPEMUnplugged Guralp channels

For the purpose of testing out the temperature sensors, I stole the PEM-SEIS_MC1X,Y,Z channels.

I unplugged Guralp NS1b, Guralp Vert1b, Guralp EW1b cables from the PEM ADCU(#10,#11,#12) near 1Y7 and put temp sensors in their place (temporarily).

  1540   Sat May 2 16:34:31 2009 carynDAQPEMGuralp channels plugged back in

I plugged the Guralp cables back into the PEM ADCU

       Guralp NS1b ---> #11

       Guralp Vert1b --->#10

       Guralp EW1b --->#12

  1643   Tue Jun 2 23:53:12 2009 peteDAQComputersreset c1susvme1

rob, alberto, rana, pete

we reset this computer, which was out of sync (16384 in the FE_SYNC field instead of 0)

  1661   Mon Jun 8 18:22:27 2009 AlbertoDAQLSCAdded PD11 I amd Q slow channels

I just added two slow channels to C0EDCUEPICS to monitor the input of PD11. The names are:

[C1:LSC-PD11_I_INMON]
[C1:LSC-PD11_Q_INMON]

  1733   Sun Jul 12 20:06:44 2009 JenneDAQComputersAll computers down

I popped by the 40m, and was dismayed to find that all of the front end computers are red (only framebuilder, DAQcontroler, PEMdcu, and c1susvmw1 are green....all the rest are RED).

 

I keyed the crates, and did the telnet.....startup.cmd business on them, and on c1asc I also pushed the little reset button on the physical computer and tried the telnet....startup.cmd stuff again.  Utter failure. 

 

I have to pick someone up from the airport, but I'll be back in an hour or two to see what more I can do.

  1735   Mon Jul 13 00:34:37 2009 AlbertoDAQComputersAll computers down

Quote:

I popped by the 40m, and was dismayed to find that all of the front end computers are red (only framebuilder, DAQcontroler, PEMdcu, and c1susvmw1 are green....all the rest are RED).

 

I keyed the crates, and did the telnet.....startup.cmd business on them, and on c1asc I also pushed the little reset button on the physical computer and tried the telnet....startup.cmd stuff again.  Utter failure. 

 

I have to pick someone up from the airport, but I'll be back in an hour or two to see what more I can do.

 I think the problem was caused by a failure of the RFM network: the RFM MEDM screen showed frozen values even when I was power recycling any of the FE computers. So I tried the following things:

- resetting the RFM switch
- power cycling the FE computers
- rebooting the framebuilder
 
but none of them worked.  The FEs didn't come back. Then I reset C1DCU1 and power cycled C1DAQCTRL.
 
After that, I could restart the FEs by power recycling them again. They all came up again except for C1DAQADW. Neither the remote reboot or the power cycling could bring it up.
 
After every attempt of restarting it its lights on the DAQ MEDM  screen turned green only for a fraction of a second and then became red again.
 
So far every attempt to reanimate it failed.
  1736   Mon Jul 13 00:53:50 2009 AlbertoDAQComputersAll computers down

Quote:

Quote:

I popped by the 40m, and was dismayed to find that all of the front end computers are red (only framebuilder, DAQcontroler, PEMdcu, and c1susvmw1 are green....all the rest are RED).

 

I keyed the crates, and did the telnet.....startup.cmd business on them, and on c1asc I also pushed the little reset button on the physical computer and tried the telnet....startup.cmd stuff again.  Utter failure. 

 

I have to pick someone up from the airport, but I'll be back in an hour or two to see what more I can do.

 I think the problem was caused by a failure of the RFM network: the RFM MEDM screen showed frozen values even when I was power recycling any of the FE computers. So I tried the following things:

- resetting the RFM switch
- power cycling the FE computers
- rebooting the framebuilder
 
but none of them worked.  The FEs didn't come back. Then I reset C1DCU1 and power cycled C1DAQCTRL.
 
After that, I could restart the FEs by power recycling them again. They all came up again except for C1DAQADW. Neither the remote reboot or the power cycling could bring it up.
 
After every attempt of restarting it its lights on the DAQ MEDM  screen turned green only for a fraction of a second and then became red again.
 
So far every attempt to reanimate it failed.

 

After Alberto's bootfest which was more successful than mine, I tried powercycling the AWG crate one more time.  No success.  Just as Alberto had gotten, I got the DAQ screen's AWG lights to flash green, then go back to red.  At Alberto's suggestion, I also gave the physical reset button another try.  Another round of flash-green-back-red ensued.

When I was in a few hours ago while everything was hosed, all the other computer's 'lights' on the DAQ screen were solid red, but the two AWG lights were flashing between green and red, even though I was power cycling the other computers, not touching the AWG at the time.  Those are the lights which are now solid red, except for a quick flash of green right after a reboot.

I poked around in the history of the curren and old elogs, and haven't found anything referring to this crazy blinking between good and bad-ness for the AWG computers.  I don't know if this happens when the tpman goes funky (which is referred to a lot in the annals of the elog in the same entries as the AWG needing rebooting) and no one mentions it, or if this is a new problem.  Alberto and I have decided to get Alex/someone involved in this, because we've exhausted our ideas. 

  1752   Wed Jul 15 17:18:24 2009 JenneDAQComputersDAQAWG gone, now back

Yet again, the DAQAWG flipped out for an unknowable reason.  In order of restart activities listed on the Wiki, I keyed the crate and nothing really happened, then I hit the physical reset button and nothing happened, and then I did the 'telnet....vmeBusReset', and a couple minutes later, it was all good again.

  1769   Tue Jul 21 17:01:18 2009 peteDAQDAQtemp channel PEM-PETER_FE

I added a temporary channel, to input 9 on the PEM ADCU.    Beware the 30, 31, and 32 inputs.  I tried 32 and it only gave noise.

 

 

  1831   Wed Aug 5 07:33:04 2009 steveDAQComputersfb40m is down
  1832   Wed Aug 5 09:25:57 2009 AlbertoDAQComputersfb40m is up

FB40m up and running again after restarting the DAQ.

  1836   Wed Aug 5 15:33:05 2009 rob, albertoDAQGeneralcan't get trends

We can't read minute trends from either Dataviewer or loadLIGOData from before 11am this morning. 

 

fb:/frames>du -skh minute-trend-frames/
 106G   minute-trend-frames

So the frames are still on the disk.  We just can't get them with our usual tools (NDS).

 

 Trying to read 60 days of minute trends from C1:PSL-PMC_TRANSPD yields:

Connecting to NDS Server fb40m (TCP port 8088)
Connecting.... done
258.0 minutes of trend displayed
read(); errno=9
read(); errno=9
T0=09-06-06-22-34-02; Length=5184000 (s)
No data output.

 

Trying to read 3 seconds of full data works.

Second trends are readable after about 4am UTC this morning, which is about 9 pm last night.

 


  1841   Thu Aug 6 09:22:17 2009 AlbertoDAQGeneralcan't get trends

Quote:

We can't read minute trends from either Dataviewer or loadLIGOData from before 11am this morning. 

 

fb:/frames>du -skh minute-trend-frames/
 106G   minute-trend-frames

So the frames are still on the disk.  We just can't get them with our usual tools (NDS).

 

 Trying to read 60 days of minute trends from C1:PSL-PMC_TRANSPD yields:

Connecting to NDS Server fb40m (TCP port 8088)
Connecting.... done
258.0 minutes of trend displayed
read(); errno=9
read(); errno=9
T0=09-06-06-22-34-02; Length=5184000 (s)
No data output.

 

Trying to read 3 seconds of full data works.

Second trends are readable after about 4am UTC this morning, which is about 9 pm last night.

 


 Yesterday Alex started transferring the data records to the new storage unit. That prevented us from accessing the trends for a fe hours.

The process had been completed and now we can read the trends again.

  2365   Tue Dec 8 10:20:33 2009 AlbertoDAQComputersBootfest succesfully completed

Alberto, Kiwamu, Koji,

this morning we found the RFM network and all the front-ends down.

To fix the problem, we first tried a soft strategy, that is, we tried to restart CODAQCTRL and C1DCUEPICS alone, but it didn't work.

We then went for a big bootfest. We first powered off fb40m, C1DCUEPICS, CODAQCTRL, reset the RFM Network switch. Then we rebooted them in the same order in which we turned them off.

Then we power cycled and restarted all the front-ends.

Finally we restored all the burt snapshots to Monday Dec 7th at 20:00.

ELOG V3.1.3-