  40m Log, Page 248 of 341
ID   Date   Author   Type   Category   Subject
  538   Wed Jun 18 16:07:57 2008   rob   Summary   Computers   RFM network down

The RFM network tripped off around noon today. It's still down. The problem appears to be with the EPICS interface (c1dcuepics). Trying to restart one of the end stations yields the error: No response from EPICS.

Possible causes include (but are not limited to): busted RFM card on c1dcuepics, busted PMC bus on c1dcuepics, busted fiber from c1dcuepics to the RFM switch. We need Alex.
  544   Wed Jun 18 18:50:09 2008   rana   Update   Computers   It can only be attributable to human error. (HAL - 2001)
There has been another one of "those" events and all of the front end machines are down.

We poked around and Rob determined that the FEs can't get the EPICS data from EPICS. The
dcuepics machine is hooked up and running and all of the epics binaries are running. We also
tried resetting its RFM switch as well as power cycling the box using the "poweroff" command.


Not a sausage.

Rob points out that although the Signal Detect lights are on on the cards, the 'Own Data' light
is not on on the dcuepics' card although it is on for some of the cards on the other boxes.


We have placed messages with the Russian. If anyone sees him, don't let him go without fixing things.
Also, make sure to follow him around with a notepad and possibly a camera to record what it is that
he does. If he's muttering, maybe try to use a sensitive hidden sound recorder.
  545   Thu Jun 19 15:52:06 2008   Alberto   Configuration   Computers   Measure of the current absorbed by the new Megatron Computer
Rich Abbot, Sam Abbot and I measured the current drawn by the new Megatron computer that we installed yesterday in the 1Y3 rack. The computer alone draws 8.1 A at startup and then settles to 5.9 A in steady state. The rest of the rack drew 5.2 A without the computer, so the whole rack needs 13.3 A at startup and 11.1 A in steady state.
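
The rack totals are just the sum of the two readings. A quick check in Python, using the measured values from this entry (the variable and function names are only for illustration):

```python
# Currents in amperes, as measured on the 1Y3 rack.
RACK_WITHOUT_MEGATRON = 5.2
MEGATRON_STARTUP = 8.1
MEGATRON_STEADY = 5.9


def rack_total(computer_current, rest_of_rack=RACK_WITHOUT_MEGATRON):
    """Total rack current, rounded to the 0.1 A resolution of the readings."""
    return round(computer_current + rest_of_rack, 1)


print("startup:", rack_total(MEGATRON_STARTUP), "A")
print("steady state:", rack_total(MEGATRON_STEADY), "A")
```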

We also measured the current for the 1Y6 rack, where another similar Sun machine has been installed as a temporary frame builder, and got 6.5 A.


Alberto, Rich and Sam Abbot
  586   Fri Jun 27 19:59:44 2008   John   Update   Computers   c1iovme
C1susvme2 and C1iovme crashed which sent the optics swinging and tripped the watchdogs.

Koji and I were able to restore c1susvme2 without any trouble.

We have been unable to revive c1iovme. We have tried telnetting in and running startup.cmd;
the process runs for a while, then hangs with "DAQ init failed -- exiting".

Resetting the board doesn't help. I didn't try keying the whole crate.

All optics are back to normal with damping restored.
  587   Sat Jun 28 03:10:25 2008   rob   Update   Computers   c1iovme

Quote:
C1susvme2 and C1iovme crashed which sent the optics swinging and tripped the watchdogs.

Koji and I were able to restore c1susvme2 without any trouble.

We have been unable to revive c1iovme. We have tried telnetting in and running startup.cmd;
the process runs for a while, then hangs with "DAQ init failed -- exiting".

Resetting the board doesn't help. I didn't try keying the whole crate.

All optics are back to normal with damping restored.


I tried keying the crate, then keying the DAQ controller & AWG, then powering down & restarting the framebuilder.
On coming up, the framebuilder doesn't start a daqd process, and I can't get one to start by hand (it just prints "652", and then stops).
No error messages and daqd doesn't appear in the prstat.

I then tried keying the DAQ controller again (after the fb0 reboot), which blew the watchdogs on all the suspensions. So then I went around and keyed all the crates.

Now, the suspension controllers are back online. Still no c1iovme, and now the framebuilder/DAQ/AWG are also hosed. We can try keying all the crates again, in the order that Yoichi did last week.

After some more poking around, I found the daqd log file. It's now complaining about

Jun 28 03:00:39 fb daqd[546]: [ID 355684 user.info] Fatal error: channel `C1:PSL-FSS_MIXERM_F' is duplicated 126

This is the second error message like this. It first complained about C1:PSL-FSS_FAST_F, so I commented that out of C1IOOF.ini and rebooted the framebuilder (note this is an actual reboot of the full Solaris machine). Eventually I discovered that C1IOOF.ini and C1IOO.ini are essentially identical. We will presumably keep getting these duplicate channel errors until one of them is completely removed.

C1IOO.ini has a modification time of seven PM on Friday night. Who did this and didn't elog it? I've now modified C1IOOF.ini, and I don't remember when it was last modified.
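
A check like the following could catch duplicated channels before daqd does. This is only a sketch: it assumes each channel is a bracketed section header such as [C1:PSL-FSS_FAST_F], that a [default] section may be present in every file, and that commented-out sections start with '#'. The function names are made up for illustration.

```python
from collections import defaultdict


def channel_names(ini_text):
    """Return the bracketed section names found in one .ini file's text."""
    names = []
    for line in ini_text.splitlines():
        line = line.strip()
        # Commented-out sections (e.g. "#[C1:...]") do not start with "[",
        # so they are skipped automatically.
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1].strip()
            if name.lower() != "default":  # assumed shared settings section
                names.append(name)
    return names


def find_duplicates(files):
    """Map each channel defined more than once to the files it appears in.

    `files` maps a file name to that file's text.
    """
    seen = defaultdict(list)
    for fname, text in files.items():
        for name in channel_names(text):
            seen[name].append(fname)
    return {name: where for name, where in seen.items() if len(where) > 1}
```

Feeding it the text of every .ini file the framebuilder loads would list each offending channel together with the files that define it.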
  588   Sat Jun 28 14:56:44 2008   John   Update   Computers   ini files
In short, I was editing the ini files yesterday evening and didn't e-log it. After some investigation this afternoon, it appears
that I am to blame for all the computer problems which followed.


I wanted to edit C0EDCU.ini and C1IOOF.ini to change C1:PSL-FSS_FAST to a fast channel, as C1:PSL-FSS_FAST_F
was dead.

I opened these files and made backups. It appears this is where it all went awry. My backup for C1IOOF
is called C1IOO.ini.090627, i.e. it is missing the F.

Later c1susvme2 and c1iovme crashed. After failing to bring c1iovme back I wondered if my edits had
caused the problems, so I restored the backup files. It appears that here I wrote over C1IOO with my backup of
C1IOOF (presumably because I had made a typo in the name).

To remedy the situation we could restore C1IOO from e.g. chans/archive/C1IOO_080618_160028.ini

No excuses for not e-logging this activity.
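
One way to make this kind of slip harder is to never type the backup name by hand. The sketch below derives the backup name from the original file name plus a yymmdd suffix, and derives the restore target by stripping that suffix again. The helper names are hypothetical, not an existing 40m script.

```python
import datetime
import shutil
import tempfile
from pathlib import Path


def make_backup(path, today=None):
    """Copy `path` to `<name>.<yymmdd>` next to it and return the new path."""
    src = Path(path)
    today = today or datetime.date.today()
    dst = src.with_name(f"{src.name}.{today:%y%m%d}")
    if dst.exists():
        raise FileExistsError(f"refusing to overwrite {dst}")
    shutil.copy2(src, dst)
    return dst


def restore_backup(backup):
    """Copy a `<name>.<yymmdd>` backup back over `<name>` and return that path."""
    bak = Path(backup)
    stem, _, suffix = bak.name.rpartition(".")
    if not (stem and len(suffix) == 6 and suffix.isdigit()):
        raise ValueError(f"{bak.name} does not look like a dated backup")
    dst = bak.with_name(stem)
    shutil.copy2(bak, dst)
    return dst


if __name__ == "__main__":
    # Demonstration in a scratch directory, not in the real chans area.
    with tempfile.TemporaryDirectory() as tmp:
        ini = Path(tmp) / "C1IOOF.ini"
        ini.write_text("[default]\n")
        bak = make_backup(ini, datetime.date(2008, 6, 27))
        print(bak.name, "->", restore_backup(bak).name)
```

Because the restore target is computed from the backup's own name, a backup of C1IOOF.ini can only ever be written back to C1IOOF.ini.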
  589   Sat Jun 28 23:23:50 2008   John   Update   Computers   Rebooting
All of the computers are now showing green lights.

Remaining problems:

Alignment scripts are failing with "ERROR: LDS - NDS server error #13"
I think this is a server transmission error.

Dataviewer shows all channels as zero.
  592   Sun Jun 29 14:53:02 2008   rob   Update   Computers   Rebooting

Quote:
All of the computers are now showing green lights.

Remaining problems:

Alignment scripts are failing with "ERROR: LDS - NDS server error #13"
I think this is a server transmission error.

Dataviewer shows all channels as zero.


Fixed. Just started the testpoint manager on fb40m.


su
/usr/controls/tpman &
  593   Sun Jun 29 18:58:43 2008   rana   Summary   Computers   1e20 is too big for AWG and/or IOVME
While testing out my matlab/awgstream based McWFS diagnostic script I accidentally put a
huge excitation into C1:IOO-WFS1_PIT_EXC. This went to 1e20 and then caused
some SUS to trip and c1susvme2 to go red. I tried booting it via the normal procedures
but it wouldn't come back, even after 2 crate power cycles. I also tried booting AWG
via the vmeBusReset, but that didn't do it. Then I booted c1iovme from the telnet prompt
and then I could restart c1susvme2 successfully.

The reason the excitation was so large is that the following filter command is unstable:
[b,a] = butter(4,[0.02 30]/1024);

The low-pass part is OK, but it looks like making such a low-frequency digital filter
is not. What a shame. On the bright side, the code now has some excitation amplitude
checking.
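
For anyone adding the same kind of guard elsewhere, here is a minimal Python sketch of an amplitude check. It is not the matlab code mentioned above; the function name and the `max_abs` limit are assumptions chosen for illustration.

```python
import math


def check_excitation(samples, max_abs):
    """Return `samples` unchanged, or raise ValueError if any sample is unsafe.

    A sample is rejected if it is NaN or infinite, or if its magnitude
    exceeds `max_abs` (a limit picked by the caller, in counts).
    """
    for i, x in enumerate(samples):
        if not math.isfinite(x):
            raise ValueError(f"sample {i} is not finite: {x!r}")
        if abs(x) > max_abs:
            raise ValueError(f"sample {i} = {x:g} exceeds limit {max_abs:g}")
    return samples
```

Calling this on the filtered waveform just before it is handed to the excitation channel would have stopped the 1e20 output at the script instead of at the suspensions.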
  606   Mon Jun 30 16:00:02 2008   josephb, sam   Configuration   Computers
Sam and I set up a Cat6 cable from Megatron to the 1Y6 Switch (131.215.113.252) and also connected the 1Y6 Hub to the control room switch.

While I was at it, I checked the configurations of the two switches now connected (one in 1X4 and one in 1Y6) to the martian network. For some reason, the 1X4 switch had switched to DHCP enabled and was using 131.215.113.105 as an IP address. I thought I had set it up correctly initially, so I am not sure what caused the change.

The easiest way I know of to check the setup is to use the Smartwizard discovery program from the Netgear install CD (in the equipment manual file cabinet of the control room) on a Windows machine. The passwords have been set to the controls password.

Megatron should now see and be accessible through the martian network.
  608   Tue Jul 1 09:26:33 2008   steve   Update   Computers   RFM network is down
  610   Tue Jul 1 11:53:38 2008   Yoichi   Update   Computers   RFM network back
I took a tour of the FE machines and power cycled all of them.
After executing the software restart procedures of those computers, the RFM network returned to its normal state.
For some reason, the computers requiring startup.cmd (like c1lsc) halt after running this command. Actually the computer is running ok, but the command freezes. Basically, what it does is simply to load a kernel module. I don't know what is wrong.
Anyway, I just closed the terminal after running startup.cmd and it seems fine for now.
  614   Tue Jul 1 13:34:29 2008   rob   Update   Computers   RFM network back

Quote:

For some reason, the computers requiring startup.cmd (like c1lsc) halt after running this command. Actually the computer is running ok, but the command freezes. Basically, what it does is simply to load a kernel module. I don't know what is wrong.
Anyway, I just closed the terminal after running startup.cmd and it seems fine for now.


This is normal. On the Linux RTFEs (Real-Time Front Ends), the real-time code totally hijacks the kernel, disallowing any interrupts. The system thus becomes totally unresponsive while the code is running, and communicates only through the RFM and the VME backplane.
  625   Wed Jul 2 17:19:03 2008   John   Summary   Computers   op440m - shutdown and restarted
After 160 days op440m was getting a little slow.
  631   Thu Jul 3 13:54:26 2008   rob   Configuration   Computers   mDV on rosalba

Does mDV work on rosalba? It can't find NDS_GetChannels. Looking on mafalda, I see that NDS_GetChannels is a mexglx. I think this means someone may need to compile it for 64-bit matlab before we can have mDV on rosalba. When that's done, we should get mDV running on megatron.
  636   Sun Jul 6 16:17:40 2008   tobin   HowTo   Computers   SVN
I was able to check out the 40m SVN here in Livingston using this command:

svn co svn+ssh://controls@nodus.ligo.caltech.edu/cvs/cds/caltech/svn/trunk/medm

As you might guess, this uses ssh in place of the web server (which we don't have yet).
  641   Mon Jul 7 14:02:05 2008   Yoichi   Update   Computers   SVN conversion progress
So far /cvs/cds/caltech/medm, /cvs/cds/caltech/chans and /cvs/cds/caltech/scripts have been converted to svn working copies.
Now /cvs/cds/caltech/target is being converted.
  644   Tue Jul 8 00:14:28 2008   John   Summary   Computers   Alarm handler
Rob thought it would be nice to have some alarms on the cpu loads and FE syncs.
I added all these channels to the alarm handler config file and wrote a script
which would set their values (HIHI, HIGH, etc.).

Ezcawrite allowed me to set the alarm levels (and ezcaread would give the correct
value), but no matter what I set the value to, the alarm wouldn't sound.

After experimenting with a few other channels it appears that the alarm handler will
not show alarms if the alarm levels are absent from the db file (even though ezca
gives a value).

I edited the following files so we can have alarms on the cpus.

In c1iscepics:
lsc40m.db
asc40m1.db

In c1losepics:
bs.db
etmx.db
etmy.db
itmx.db
itmy.db
mc1.db
mc2.db
mc3.db
prm.db
srm.db

I saved backups in the appropriate folders.

Next time we have a bootfest please also do c1iscepics and c1losepics so these changes
will be implemented.
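
For reference, in EPICS analog records a limit such as HIHI only raises an alarm if its severity field (HHSV) is set to something other than NO_ALARM, so a db entry normally carries both. The Python sketch below prints such field lines for pasting into a record. The numeric limits are placeholders, and I have not checked which fields were actually added to the files listed above.

```python
def alarm_fields(hihi, high, hhsv="MAJOR", hsv="MINOR"):
    """Return EPICS db `field(...)` lines for the upper alarm limits.

    HIHI/HIGH are the limits; HHSV/HSV are the severities raised when
    the value crosses them.
    """
    return [
        f'field(HIHI, "{hihi}")',
        f'field(HHSV, "{hhsv}")',
        f'field(HIGH, "{high}")',
        f'field(HSV, "{hsv}")',
    ]


# Example with made-up limits for a CPU load channel.
for line in alarm_fields(hihi=50, high=40):
    print("    " + line)
```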
  654   Thu Jul 10 13:47:12 2008   Yoichi   HowTo   Computers   svn access via https
You can now access the svn repository on nodus via https.
To perform a checkout, use the following command:

svn co --username svn40m https://nodus.ligo.caltech.edu:30889/svn/trunk/chans

This will check out the "chans" directory.
The password for svn40m is written in the usual place.
You can also open the URL in a web browser to see the repository in a very primitive way.
A nice web interface for browsing the repository is planned but not yet implemented.
  658   Fri Jul 11 00:30:24 2008   rob   Metaphysics   Computers   strange SUS controllers

rob, johnnieM

We were hampered early tonight by the fact that someone sneakily turned off the HP RF Amplifier on the AS table.

After that, we were hampered further by mode cleaner strangeness. It would occasionally spontaneously unlock & blow its watchdogs. It never made it through the ontoMCL script (putting DC-CARM onto the MCL). After some investigation, we found that c1susvme1 and c1susvme2 were running stochastically late (SYNC_FE != 0), even though their computation times never got above 61. Also, the end SUS controllers were never late.

Weird.

After rebooting the vertex SUS controllers and the c1lsc, things appear to be working again.
  667   Mon Jul 14 12:43:07 2008   John   Summary   Computers   Restarted fb40m, tpman and c1ass
  682   Wed Jul 16 16:28:14 2008   josephb   Configuration   Computers   Fixed IP address on Switch
Realized today that the change I made back on June 30th to the switch was to the wrong switch. I had disabled the DHCP setting and mislabeled the switch in the control room (which seems to not have affected anything).

I've turned DHCP back on and labeled it correctly using the Netgear "Smartwizard discovery" program.
  695   Fri Jul 18 17:06:20 2008   Jenne   Update   Computers   Computers down for most of the day, but back up now
[Sharon, Alex, Rob, Alberto, Jenne]

Sharon and I have been having trouble with the C1ASS computer the past couple of days. She has been corresponding with Alex, who has been rebooting the computers for us. At some point this afternoon, as a result of this work, or other stuff (I'm not totally sure which) about half of the computers' status lights on the MEDM screen were red. Alberto and Sharon spoke to Alex, who then fixed all of them except C1ASC. Alberto and I couldn't telnet into C1ASC to follow the restart procedures on the Wiki, so Rob helped us hook up a monitor and keyboard to the computer and restart it the old fashioned way.

It seems like C1ASC has some confusion as to what its IP address is, or some other computer is now using C1ASC's IP address.

As of now, all the computers are back up.
  700   Fri Jul 18 19:43:55 2008   Yoichi   DAQ   Computers   PSL fast channels cannot be read by dataviewer
At this moment only the PSL fast channels have trouble.
Rob restarted fb40m and c1iovme, but it had no effect.
  724   Wed Jul 23 16:31:02 2008   Alberto   Configuration   Computers   Megatron connected
Joe, Rana, Alberto,

We found out the password for Megatron, so we could log in and set a new one; it's now the same as that for controls.
The IP address is 131.215.113.59.

We had to switch to another LAN port to actually connect it.
  725   Wed Jul 23 17:19:48 2008   Alberto   Configuration   Computers   Megatron connected
We changed the IP address. The new one is 131.215.113.95.

Joe, Alberto


Quote:
Joe, Rana, Alberto,

We found out the password for Megatron, so we could log in and set a new one; it's now the same as that for controls.
The IP address is 131.215.113.59.

We had to switch to another LAN port to actually connect it.
  742   Sat Jul 26 15:09:57 2008   Aidan   Update   Computers   Reboot of op440m

I was reviewing the PSL Overview screen this afternoon and op440m completely froze when I center-clicked on the REF CAVITY TRANSMISSION indicator. It was unresponsive to any keyboard or mouse control. The moon button had no effect to shut the machine down.

Called Alberto in and we logged into op440m from rosalba. From there we logged in as 'root' and ran the shutdown command '/usr/sbin/shutdown -i S -g 1'. The medm screens started disappearing from the op440m display and we were eventually asked to enter System Maintenance Mode. From here we selected RUN LEVEL 5: "state 5: Shut the machine down so that it is safe to remove the power". Following this the machine turned itself off.

We powered it back on, logged back in as controls and restarted the medm screens. Everything seems to be running fine now.
Aidan.
  744   Sun Jul 27 20:49:21 2008   rana   Configuration   Computers   NTP
After Aidan did whatever he did on op440m, I had to restart ntpd. I noticed it didn't actually do
anything so I restarted it by hand with the '-l' option to make a logfile. Essentially, the
problem is that NTPD is not allowed access to the outside world's NTP servers by our NAT router;
this should be fixed.

So for now I set all of the .conf files to point to rana and nodus' IP addresses. According to the
log files, that is successful. Rosalba and Mafalda, however, seem to have correct time but are
looking at rhel.ntp.pool.org and time.nist.gov, respectively. Maybe these have special rules?

For reference, the linux machines' conf files are /etc/ntp.conf
and the solaris machines' conf files are /etc/inet/ntp.conf

I also logged into dcuepics (aka scipe25) and did as instructed.
  777   Thu Jul 31 16:11:22 2008   josephb   Configuration   Computers   Matlab on Megatron
Matlab now works on megatron.

I did a few things:

1) Added to the PATH environment variable. Did this in .bash_profile in the /home/controls directory by adding the line

PATH=$PATH:/cvs/cds/caltech/apps/linux64/matlab/bin/
export PATH

This should probably be set somewhere further up the line, but I was too lazy to figure it out.

2) Fixed a gateway mistake I had made earlier, so that megatron can use the NAT router and see the outside world, which lets yum work.

3) Removed the i386 based libXp and openmotif packages.

4) Installed the x86_64 based libXp and openmotif packages.

Edit: Forgot that I also added the following line to the /etc/fstab file in order to mount the shared code. This was stolen directly from Rosalba's /etc/fstab file. This was so that it could see the matlab code.
linux1:/home/cds/ /cvs/cds nfs rw,bg,soft 0 0
  779   Fri Aug 1 10:45:46 2008   josephb   Configuration   Computers   Megatron now running tcsh
At Rana's request, I've remotely switched Megatron over to using tcsh. I had to ssh -X in order to use the "/sbin/system-config-users" program, which is a graphical UI for modifying users. I had to go to preferences and uncheck "hide system users", which then allowed me to see the controls user (at the bottom of the list) and edit it.

I also created a .tcshrc file in the /home/controls directory and copied the information from the .bashrc file, and also moved the matlab path definition into the PATH environment variable.

Does anyone know if sourcing /cvs/cds/caltech/cshrc.40m would be usable on a 64 bit machine, or does a new one need to be made for Megatron and/or Rosalba?
  780   Fri Aug 1 11:51:15 2008   justing   Omnistructure   Computers   added /cvs/cds/site directory
I added a /cvs/cds/site directory. This is the same as is discussed here. Right now it just has the text file 'cit' in it, but eventually the other scripts should be added. I'll probably use it in the next version of mDV.
  815   Fri Aug 8 12:21:57 2008   josephb   Configuration   Computers   Switched X end ethernet connections over to new switch
In 1X4, I've switched the ethernet connections from c1iscex and c1auxex over to the new Prosafe 24 port switches. They also use the new cat6 cables, and are labeled.

At the moment, everything seems to be working as normally as it was before. In addition:

I can telnet into c1auxex (and can do the same to c1auxey which I didn't touch).
I can't telnet into c1iscex (but I couldn't do that before, nor can I telnet into c1iscey either, and I think these are computers which once running don't let you in).
  822   Mon Aug 11 11:36:11 2008   josephb, Steve   Configuration   Computers   c1susvme1 minor problems
Around 11 am c1susvme1 started having issues. Namely, C1:SUS-PRM_FE_SYNC was railing at some large value like 16384 (2^14). I presume this means the computer was running catastrophically late.

I turned off the BS and ITM watch dogs (the PRM was already off), tried hitting reset and sshing in, and running startup, but this didn't help. I then turned off the c1susvme2 associated watch dogs (MC1-3, SRM) and went out to do a hard reboot by switching the crate power off. c1susvme2 came back up fine, was restarted and associated watch dogs turned back on. However, c1susvme1 came back up without mounting /cvs/cds/.

As a test, I replaced the ethernet connection with a CAT6 cable to the Prosafe switch in 1Y6, and then ran reboot on c1susvme1. When it came back up, it had mounted properly, and I was able to run the ./startup.cmd file. At this point it seems to be happy. The new cable is in the trays coming in from the top of 1Y4 and 1Y6 and is appropriately labeled.

Edit: Apparently ITMX and ITMY became excited after the reboot (perhaps I turned the watchdogs back on too early? Although that was after the DAQ light was listed as green for c1susvme). Steve noticed this when the alarms went off again (I had turned them off after the reboot seemed successful), and he damped them. Interestingly, the BS remained unexcited.
  823   Mon Aug 11 12:42:04 2008   josephb   Configuration   Computers   Continuing saga of c1susvme1
Coming back after lunch around 12:30pm, c1susvme1's status was again red. After switching off watchdogs, a reboot (ssh, su, reboot) and restarting startup.cmd, c1susvme1 is still reporting a max sync value (16384), occasionally dropping down to about 16377. The error light cycles between green and red as well.

At this point, I'm under the impression further reboots are not going to solve the problem.

I am leaving the watchdogs associated with c1susvme1 off for the moment, at least until I get an idea of how to proceed.
  824   Mon Aug 11 13:59:23 2008   josephb   Configuration   Computers
While poking around the crate, I noticed an error light on one of the c1susvme2 related boards was lit, while the corresponding light on the c1susvme1 was not. This confuses me as the c1susvme1 is the one having problems.

As a quick sanity check, I unplugged the ethernet connection from the c1susvme1 labeled board, and confirmed I couldn't log into it, and then plugged it back in, restarted it, and re-ran the startup script. This time c1susvme1 seemed to come up fine. Re-enabling the watchdogs doesn't seem to kick anything, and in fact seems to be bringing everything into line properly.

However, the error light on the c1susvme2 clk drvr board is still on, and I'm not sure what that's trying to tell us. Open to suggestions.
  825   Mon Aug 11 15:07:49 2008   josephb   Configuration   Computers   Procyon aka fb40m switched to new switch
I've connected Procyon to the Prosafe 24 port switch with a new, labeled Cat6 cable. Quick tests with dataviewer show that it's working.
  827   Tue Aug 12 12:05:36 2008   Yoichi   Update   Computers   HP color printer is back
I restarted the HP printer server (a little box connected to the HP color laser) so that we can use the HP LaserJet 2550.
After this treatment, the printer spat out a bunch of pages from suspended jobs, many of which were black and white.
I think people should use the black-and-white printer for these kinds of jobs, because the color printer is slow and troublesome.
  852   Tue Aug 19 13:34:58 2008   josephb   Configuration   Computers   Switched c1pem1, c0daqawg, c0daqctrl over to new switches
Moved the Ethernet connections for c1pem1, c0daqawg, and c0daqctrl over to the Netgear Prosafe switch in 1Y6, using new cat6 cables.
  858   Wed Aug 20 11:42:49 2008   John   Summary   Computers   pdftk
I've installed pdftk on all the control room machines.

http://www.pdfhacks.com/pdftk/
  859   Wed Aug 20 11:50:10 2008   John   Summary   Computers   StripTools on op540m

To restart the striptools on op540m:

cd /cvs/cds/caltech/scripts/general/

./startstrip.csh
  889   Tue Aug 26 19:07:37 2008   Yoichi   HowTo   Computers   Reading data from Agilent 4395A analyzer through GPIB from *Linux* machine
I succeeded in reading data from the Agilent 4395A analyzer, whose floppy is crappy, through GPIB from a Linux machine using
an Agilent 82357B USB-GPIB interface.
I installed the Linux GPIB driver on one of the lab laptops (the silver DELL one currently sitting on the 4395A analyzer).
I wrote an initialization script for the USB-GPIB interface and a small python script for reading data from the analyzer.

[Usage]

1. Connect the USB-GPIB interface to the laptop and the analyzer.
2. Run /usr/local/bin/initGPIB command (it takes about 10sec to complete).
3. Run /usr/local/bin/getgpibdata.py > data.txt to save data from the analyzer to a text file.

The data format is explained in the comments of getgpibdata.py
This method is way faster than the unreliable floppy. The data is transferred in a few seconds.

I'm now writing a wiki page on this
http://lhocds.ligo-wa.caltech.edu:8000/40m/GPIB

I will install the same thing into the other DELL laptop soon.
Let me know if you have trouble with this.
  890   Wed Aug 27 10:55:35 2008   Yoichi   HowTo   Computers   Annoying behavior of the touch pads of the lab laptops is fixed
I was sick of the stupid touch pad behavior of the lab laptops, i.e. firefox goes back and forth in the history when the cursor is moved.
It was caused by firefox misinterpreting the horizontal scroll signal as a back/forward command.
I stopped it by going to about:config in firefox and setting mousewheel.horizscroll.withnokey.action to 0 and
mousewheel.horizscroll.withnokey.sysnumlines to true.
  894   Thu Aug 28 19:02:25 2008   rana, josephb, rob   Summary   Computers   big boot
This afternoon Joe did something with an .ini file (look for his detailed elog entry) and the computers went bad.
RFM network screen not active - filter modules not working.

We went around and booted every machine as has been done before. The correct order for a memory corruption
fixing big boot is the following:

    [1] RESET the RFM switches near the FB racks.
    [2] Power cycle c1dcuepics.
    [3] Power cycle all other crates with real time CPUs:
    c1iscey, daqctrl, daqawg, c1susvme1, c1susvme2, c1sosvme, c1iovme, c1lsc, c1asc, & c1iscex
    [4] Start up all FEs as described in Wiki.
    [5] Burt restore everyone (losepics, iscepics, assepics, omcepics?)
  897   Fri Aug 29 11:01:49 2008   josephb   Configuration   Computers   Attempt to change a channel gain in ICS-110B
As noted earlier by Rana, I was playing around with the /cvs/cds/caltech/chans/daq/C1IOOF.ini file with help from Rob. I had made a backup beforehand and saved it as C1IOOF.ini.Aug-28-2008. (I have since been informed that C1IOOF.ini.082808 would have been preferred as a name).

We had been trying to up the gain of C1:PSL-ISS_INMONPD_F in order to do a very low power PMC sweep, in an attempt to get clean modes for fitting. Initially we pressed the reconfig button on the C0DAQ_DETAIL screen, but all that seemed to do was change the Config File CRC. We proceeded to reboot fb40m remotely. However, any change to the ini file (even an extra space at the end of the file) caused a 0x2000 status for C1IOVME16k on the C0DAQ_DETAIL screen. At the time I presumed it was comparing the CRC of the ini file to something else.

Digging around in Alex's webspace at http://www.ligo.caltech.edu/~aivanov/ , I found the NDS Access page, which indicated that 0x2000 was a conflict between the front-end and frame builder .ini files.

"There is also status bit 0x2000 which gets added when the DCU configuration is different in front-end and frame builder. That is you can change and .ini file an then reload DAQ configuration with Epics button, which reconfigures the front-end, but leaves frame builders with invalid old configuration. They will detect this change and set the status to 0x2000 to indicate this condition. You will have to restart frame builders to pick up new .ini file and set status back to zero for the affected DCU."

It was when I was going to try resetting c1iovme via the C0DAQ_RFMNETWORK medm screen that we realized the EPICS controls were not responding properly. The .ini file was returned to its original form, and mass reboots commenced.
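
Since 0x2000 is described on the NDS Access page as a bit that "gets added" to the status, a script reading the DCU status should test for it with a bit mask rather than an equality check. A small sketch, with made-up function and constant names:

```python
# Status bit described on the NDS Access page quoted above: the DCU
# configuration differs between the front end and the frame builder.
CONFIG_MISMATCH = 0x2000


def has_config_mismatch(status):
    """True if the config-mismatch bit is set, whatever other bits are set."""
    return bool(status & CONFIG_MISMATCH)


def describe(status):
    """One-line, human-readable summary of a DCU status word."""
    if has_config_mismatch(status):
        return f"0x{status:04x}: .ini mismatch, restart the frame builder"
    return f"0x{status:04x}: no .ini mismatch flagged"


print(describe(0x2000))
print(describe(0x0000))
```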
  898   Fri Aug 29 11:05:11 2008   josephb   Summary   Computers   c1asc was down this morning
I had to manually reboot c1asc this morning, as for some unknown reason its status was red, and the fiber lights on the board were status:red, sig det:amber, own data: nothing. Shut the crate down, turned it back on, heard a beep, then followed wiki reboot instructions. Seems to be working now.
  899   Fri Aug 29 12:41:26 2008   josephb, Eric   Configuration   Computers   More front ends moved to new network
Used Cat6 cables to finish moving all the front ends in 1Y4 and 1Y5 over to the new GigE network switches, specifically to the switch in 1Y6. This included the ones labeled c1susvme2, c1sosvme, and c1dscl1epics0.
  900   Fri Aug 29 12:43:44 2008   josephb   Summary   Computers   c1susvme1 down
Around noon today, c1susvme1 was having problems. The C0DAQ_RFMNETWORK light was red. The status light was off, the sig det light was amber and the own data light was green. I could also ssh in, but could not run startup. I switched off the watchdogs for c1susvme2 (the watchdogs for c1susvme1 had already been tripped), and manually power cycled the crate.

However, when c1susvme1 came back up, it had not mounted the usual /cvs/cds/ directories. c1susvme2 did, however. c1susvme1 has been on the new network for a while, while c1susvme2 was switched over today. So apparently switching networks doesn't help with this particular problem.

I did a remote reboot of c1susvme1, and it came up with the correct files mounted. Both machines ran their appropriate startup.cmd files and are currently green.
  917   Wed Sep 3 19:09:56 2008   Yoichi   DAQ   Computers   c1iovme power cycled
When I tried to measure the sideband power of the FSS using the scan of the reference cavity, I noticed that the RC trans. PD signal was not
properly recorded by the frame builder.
Joe restarted c1iovme in software. The medm screen said c1iovme was running fine, and some values were actually recorded by the FB.
Nonetheless, I couldn't see flashes of the RC when I scanned the laser frequency.
I ended up power cycling c1iovme and running the restart script again. Now the signals recorded by c1iovme look fine.
Probably, the DAQ boards were not properly initialized only by the software reset.
I will re-try the sideband measurement tomorrow morning.
  922   Thu Sep 4 11:33:25 2008   josephb, Eric, Jenne   Configuration   Computers   Attempt to increase gain for C1:PSL-ISS_INMONPD_F via 110B
We were attempting to increase the gain on the channel C1:PSL-ISS_INMONPD_F in preparation to do a scan of the PMC at very low input power.

We started by adding a line to the C1IOOF.ini file in /cvs/cds/caltech/chans/daq/ under that channel that said "gain=10.0". Before touching anything, the channel was outputting around 4000 counts.

We hit the reconfig button for c1iovme16k, then rebooted c1iovme (which turned out to do nothing) and then the framebuilder, in a method consistent with the wiki. This put the channel in an odd state, where it was showing very rapid, random spikes, but still sitting at around 4000 counts. We returned the file to its original state, hit reconfig, and then rebooted the framebuilder. The channel, however, was still behaving in the same broken way.

After poking around the PSL table, looking at some direct outputs, we came back and rebooted c1iovme and the framebuilder again, which fixed the channel, such that it was reading out correctly. Taking this as a sign that maybe we should reboot the framebuilder, then c1iovme to get the channel to load changes, we changed the file again to have "gain=10.0". Upon reboot of the framebuilder, the channel was still reading out fine, but at the same level. So we continued with the reboot of c1iovme. This still had no effect on the channel output.

The ini file has been set back at this point, however since Yoichi is working, I'm holding off doing a reconfig and reboot on the framebuilder until later.
  925   Thu Sep 4 16:24:56 2008   rana   Configuration   Computers   Attempt to increase gain for C1:PSL-ISS_INMONPD_F via 110B

Quote:
We were attempting to increase the gain on the channel C1:PSL-ISS_INMONPD_F in preparation to do a scan of the PMC at very low input power.

According to the Wikipedia, certain esoteric mathematical
operations lead to the result that 4000 x 10 > 32768.
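
Spelled out in Python, assuming the channel is stored as signed 16-bit counts (which is what the 32768 above implies; the function names are only for illustration):

```python
INT16_MIN, INT16_MAX = -32768, 32767


def fits_int16(counts, gain):
    """True if counts * gain is still representable as a signed 16-bit value."""
    return INT16_MIN <= counts * gain <= INT16_MAX


def max_gain(counts):
    """Largest gain that keeps a signal of this size (in counts) on scale."""
    return INT16_MAX / abs(counts)


print(fits_int16(4000, 10.0))       # is a x10 gain safe at 4000 counts?
print(round(max_gain(4000), 2))     # largest usable gain at 4000 counts
```

With 4000 counts at the input, any gain much above 8 runs off the top of the range, so the input power would have to come down before a gain of 10 could help.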