40m Log
ID   Date   Author   Type   Category   Subject
  1201   Mon Dec 22 13:48:22 2008   Yoichi   Update   Computers   RFM network bypass box's power supply is dead
As a temporary fix, I cut the cable of the power supply and connected it to the +5V output of the Sorensen power supply on the rack.
Now the RFM bypass box is powered up, but some LEDs are red, which looks like a bad sign.
I restarted all the FE computers, but this time I got errors during the execution of the startup commands on the VxWorks machines.
The errors are "General Protection Fault" or "Invalid Opcode".
The Linux machines do not show errors, but their status lights in EPICS are still red.
We need Alex's help. He did not answer the phone, so Alberto left a voice mail.
  1202   Tue Dec 23 10:35:40 2008   Yoichi   Update   Computers   RFM network breakdown mostly fixed
Rana, Rolf, Alberto, Yoichi

The source of the problem was the RFM bypass box, as expected.
Rana pointed out that the long cable I used to bring the 5V from the Sorensen to the box
may cause a large voltage drop, considering that the box draws ~3A.
So we connected the cable to another power supply (5V/5A linear power supply).
Then the LEDs on the bypass box turned from red to green, and everything started to work.

A weird thing is that when I connected the cable to the wrong terminals of the power supply, which
have a lower current capability, the supply voltage dropped to 3V, but the LEDs on the bypass box still
turned green. This means the bypass box can live with 3V.
I noticed that there is a long cable from the Sorensen to the cross connect on the side of the rack, where I
connected my cable to the bypass box. Perhaps this long cable, which has a somewhat large resistance (1 or 2 Ohms), dropped
the supply voltage to less than 3V?
Anyway, the bypass box is now on a temporary power supply. Alberto was assigned the task of finding a replacement power
supply.
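The voltage-drop argument above is just Ohm's law, V_box = V_supply - I*R. A minimal sketch, assuming the 1 to 2 Ohm figure is the total (outgoing plus return) resistance of the cable and that the box keeps drawing a constant ~3 A:

```python
def voltage_at_load(supply_v: float, loop_ohms: float, load_amps: float) -> float:
    """Voltage left at the load after the I*R drop in the supply cable.

    loop_ohms is the combined resistance of the outgoing and return wires.
    Assumes a constant-current load; a real load draws less current as the
    voltage sags, so large drops are overestimated.
    """
    return supply_v - load_amps * loop_ohms


# ~3 A drawn from a 5 V supply through cables of increasing resistance.
for ohms in (0.1, 0.5, 1.0):
    print(f"{ohms:.1f} ohm cable -> {voltage_at_load(5.0, ohms, 3.0):.1f} V at the box")
```

Under these assumptions, about 0.67 Ohm of cable is already enough to pull a 5 V supply down to 3 V at 3 A.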

There are two remaining problems.
c1susvme1 often fails to start, claiming a DMA error on a Pentek. After several attempts you can start the machine,
but after a while (1 hour?) it fails again.
op340m is not responding to ssh login, although it responds to ping.
We hooked up a monitor and keyboard (USB, because the machine does not have a PS/2 port) to it and rebooted.
At boot, it briefly displays the message "No keyboard, try TTYa", but after that there is no display signal.
Steve found me a serial cable. I will try to log in to the machine using the serial port.

  1203   Wed Dec 24 10:33:24 2008   Yoichi   Update   Computers   Several fixes. Test point problem remains.
Yesterday, I fixed several remaining problems from the power failure.

I found that a LEMO cable connecting the timing board to the Penteks was loose on the c1susvme1 crate.
After I pushed it in, the DMA error has not occurred on c1susvme1.

I logged into op340m using a null modem cable.
The computer was failing to boot because the automatic fsck found unrecoverable disk errors.
I ran fsck manually and corrected some errors. After that, op340m booted normally, and now it is working fine.
Here are the serial communication parameters I used to communicate with op340m:
>kermit      (I used kermit command for serial communication.)
>set modem type none
>set line /dev/ttyS0     (ttyS0 should be the device name of your serial port)
>set speed 9600
>set parity none
>set stop-bits 1
>set flow-control none
>connect
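For scripted access, the same settings (9600 baud, no parity, 1 stop bit, no flow control) could be expressed with the third-party pyserial package. This is a sketch, not what was used here; the 8 data bits and the 1 s read timeout are assumptions that the kermit session above does not set explicitly.

```python
# Serial settings mirroring the kermit session: 9600 baud, no parity,
# 1 stop bit, no flow control. bytesize=8 and timeout are assumptions.
SERIAL_SETTINGS = {
    "baudrate": 9600,
    "bytesize": 8,
    "parity": "N",      # same value as serial.PARITY_NONE
    "stopbits": 1,      # same value as serial.STOPBITS_ONE
    "xonxoff": False,   # no software (XON/XOFF) flow control
    "rtscts": False,    # no hardware (RTS/CTS) flow control
    "timeout": 1.0,     # seconds read() waits before returning what it has
}


def open_console(port: str = "/dev/ttyS0"):
    """Open the serial console; requires pyserial (pip install pyserial)."""
    import serial  # imported here so the settings can be inspected without pyserial

    return serial.Serial(port, **SERIAL_SETTINGS)
```

Usage would be along the lines of con = open_console(); con.write(b"\r"); print(con.read(256)), with /dev/ttyS0 replaced by whatever the local serial port is called.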

After fixing op340m, the MC locked.
Then I reset the HV amplifiers for the steering PZTs.
Somehow, PZT1 PIT did not work, but after I moved the slider back and forth several times, it started to work.

I reset the mechanical shutters around the lab.

I went on to align the mirrors. The X-arm locked, but the alignment script did not improve
the arm power.
I found that test points were not available (diag said "test point management not available").
It looks like the test point manager is not running. I called Rolf but could not reach him.
I'm not even sure which machine the tp manager is supposed to run on.
Is it c0daqawg?
  1204   Wed Dec 24 12:46:54 2008   Yoichi   Update   Computers   Test points are back
Rob told me how to restart the test point manager.
It runs on fb40m, and there are actually instructions on how to do this on the Wiki.
http://lhocds.ligo-wa.caltech.edu:8000/40m/Computer_Restart_Procedures#fb40m

I couldn't find the page because when I typed a keyword in the search box in the upper right
corner of the Wiki page and hit "enter", it only searched titles. To do a full-text
search, you have to click the "Text" button.

Anyway, now the test points are back.
  1206   Mon Dec 29 21:38:57 2008   Yoichi   Update   Computers   Snapshots of MEDM screens
I wrote scripts to take snapshots of MEDM screens in the background.
These scripts work even on a computer without a physical display attached;
you don't need to have X running.
The scripts now run on nodus every 5 minutes from cron.
The screenshots are saved in /cvs/cds/caltech/statScreen/images/

There is a wiki page for the scripts.
http://lhocds.ligo-wa.caltech.edu:8000/40m/captureScreen.sh

Someone has to make a nice web page summarizing the captured images.
  1207   Mon Dec 29 21:51:02 2008   Yoichi   Configuration   Computers   Web server on nodus
Until now, the Apache server on nodus has served only the svn web access.
I changed the configuration so that all files under /cvs/cds/caltech/users/public_html/ can be seen under
https://nodus.ligo.caltech.edu:30889/

The page is not password protected, but you can add protection by putting an appropriate .htaccess file
in your directory.
For the standard LVC password, put the following in your .htaccess:
AuthType Basic
AuthName "LVC password"
AuthUserFile /cvs/cds/caltech/apache/etc/LVC.auth
Require valid-user
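With AuthType Basic, the browser sends the user name and password base64-encoded (not encrypted) in an Authorization header, which is why serving the pages over https matters; the AuthName ("LVC password") is the realm shown in the login prompt. A minimal Python sketch of how that header is formed; the credentials are the example pair from RFC 7617, not real ones:

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Authorization header value a client sends for HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


print(basic_auth_header("Aladdin", "open sesame"))
```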
  1221   Fri Jan 9 17:30:10 2009   Kakeru   Update   Computers   Snapshots of MEDM screens
I wrote a web page that shows the snapshots of MEDM screens generated by Yoichi's script (elog #1206).
https://nodus.ligo.caltech.edu:30889/medm/screenshot.html
This page refreshes itself automatically every 5 minutes.

The .html file is generated by /cvs/cds/caltech/statScreen/bin/genHtml.pl
Run from cron every 5 minutes, this script generates an .html file containing the snapshots listed in /cvs/cds/caltech/statScreen/etc/medmScreens.txt.
If you want to display other screens, please edit this .txt file and wait 5 minutes!


To make thumbnails, I wrote /cvs/cds/caltech/statScreen/bin/genThumbnail.pl
This script reads /cvs/cds/caltech/statScreen/etc/medmScreens.txt, too.
(Sometimes, it makes thumbnails that take up more storage than expected...)
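The page generator is a Perl script; as a rough Python sketch of the same idea, a page can reload itself every 5 minutes with a meta refresh tag. The one-name-per-line format of medmScreens.txt, the screen names, and the images/ and thumbs/ paths below are assumptions, not the actual layout used by genHtml.pl:

```python
import html


def read_screen_list(text):
    """Assumed format: one screen name per line; blank lines and # comments ignored."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


def gen_screenshot_page(names, refresh_seconds=300):
    """HTML page with one linked thumbnail per screen that reloads itself."""
    items = []
    for name in names:
        n = html.escape(name, quote=True)
        # Hypothetical layout: full-size images in images/, thumbnails in thumbs/.
        items.append(f'<a href="images/{n}.png"><img src="thumbs/{n}.png" alt="{n}"></a>')
    return (
        "<html><head>\n"
        f'<meta http-equiv="refresh" content="{refresh_seconds}">\n'
        "<title>MEDM screenshots</title>\n"
        "</head><body>\n" + "\n".join(items) + "\n</body></html>\n"
    )


# Hypothetical screen names, for illustration only.
page = gen_screenshot_page(read_screen_list("C1LSC_OVERVIEW\n# comment\n\nC1IOO_MC\n"))
```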


  1224   Tue Jan 13 11:10:42 2009   rob   Configuration   Computers   conlogger restarted
Unknown how long it had been down.
  1231   Fri Jan 16 11:28:54 2009   Yoichi   Update   Computers   Lab. laptop needs wireless lan driver update
One of the lab laptops (belladonna) cannot connect to the network now.
I guess this was caused by someone clicking the update icon and unknowingly updating the kernel, which made the wireless LAN driver malfunction.
It was using a Windows driver through ndiswrapper.
Someone has to fix it.
  1235   Fri Jan 16 18:33:54 2009   Yoichi   Summary   Computers   c1lsc rebooted to fix 16Hz glitches
Kakeru, Yoichi

There were 16Hz harmonics in the PD3 and PD4 channels even when no light was falling on the PDs.
In fact, even when the connection to the ADC was removed, the 16Hz noise was still there.

Rob suggested that this might be a digital problem, because data are sent to the DAQ computer every 1/16 of a second.

We restarted c1lsc and the problem went away.
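Rob's argument can be checked numerically: anything that repeats every 1/16 s shows up at 16 Hz and its harmonics, whatever its shape. A minimal sketch using a plain DFT on a synthetic glitch train; the 256 Hz sample rate and one-sample glitches are arbitrary choices for illustration, not the actual DAQ parameters:

```python
import cmath


def dft_magnitudes(x):
    """|X[k]| for k = 0..N/2 from a direct O(N^2) discrete Fourier transform."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def glitch_train(fs, duration, rate=16.0):
    """Zero signal with a one-sample glitch every 1/rate seconds."""
    n = int(fs * duration)
    period = int(fs / rate)  # samples between glitches
    return [1.0 if t % period == 0 else 0.0 for t in range(n)]


fs = 256.0  # Hz; chosen so that 1/16 s is an integer number of samples
x = glitch_train(fs, duration=1.0)
mags = dft_magnitudes(x)
freqs = [k * fs / len(x) for k in range(len(mags))]
peaks = [f for f, m in zip(freqs, mags) if f > 0 and m > 0.5 * max(mags)]
print(peaks)  # 16 Hz and every multiple of it up to Nyquist (128 Hz)
```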
  1238   Mon Jan 19 15:10:37 2009   Yoichi   HowTo   Computers   loadLIGOData a GUI for mDV
I installed loadLIGOData, the product of my weekend project, in /cvs/cds/caltech/apps/loadLIGOData.
This is a Matlab GUI for getting data from NDS servers. It uses a modified version of mDV to retrieve data.
You can choose and download LIGO data into Matlab quickly.
I also wrote a GUI to plot the downloaded data easily.
With this GUI, you can plot data from multiple channels in a single figure, which is useful for identifying the cause of a lock loss etc.
You can change the time axis labels to UTC or local time instead of GPS seconds.
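Converting GPS seconds to UTC means adding the GPS epoch (1980-01-06 00:00:00 UTC) and subtracting the accumulated leap seconds. A minimal sketch, not the code used in loadLIGOData; the fixed 15 s default offset is valid only from 2009-01-01 until the leap second at the end of June 2012, and a general converter needs a full leap-second table:

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)


def gps_to_utc(gps_seconds, leap_seconds=15):
    """UTC datetime for a GPS time; leap_seconds is GPS minus UTC at that date."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - leap_seconds)


print(gps_to_utc(914803215).isoformat())
```

For a local-time axis, the returned timezone-aware datetime can be converted with its astimezone() method.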

You can run it by typing loadLIGOData in a terminal of a linux machine.
A brief explanation of how to use it is written here:
http://lhocds.ligo-wa.caltech.edu:8000/40m/loadLIGOData

At the moment, data from test points cannot be retrieved properly. (Of course, there is no way to go back into the past for test points,
but we should still be able to get data in real time.) I'll try to find a solution.
Attachment 1: loadLIGOData.png
Attachment 2: plotLigoData.png
  1239   Mon Jan 19 18:21:41 2009   rana   Update   Computers   loadLIGOData a GUI for mDV
The tool is very nice; I looked at the seismic trend for 16 days (attached).
However, it gives some kind of error when trying to get Hanford or Livingston data.
Attachment 1: a.png
  1240   Tue Jan 20 15:28:42 2009   Yoichi   Update   Computers   loadLIGOData a GUI for mDV

I fixed it.
You have to click "Load channels" button when you select a new site.
I plotted one minute of MC_F signals from H1, H2, L1 and the 40m.
It looks like the L1 MC was swinging a lot.
Attachment 1: MC_F.png
  1255   Wed Jan 28 12:51:32 2009   Yoichi   Update   Computers   Megatron is dying
For the past three days, Megatron has been making a huge noise. It sounds like a fan is failing.
There is an LED with a "!" sign on the front panel. It is now orange, which looks like some kind of warning.
We can log in to the machine. "top" shows the CPU load is almost zero.
Shall we try rebooting it?
  1258   Thu Jan 29 16:50:53 2009   josephb, alberto   Configuration   Computers   Megatron fixed
The warning light on megatron and the fans at full speed were fixed not by just power cycling, but by completely unplugging megatron from power, waiting for a minute, and then reconnecting the power cables.

Apparently, the Sunfire X4600s at Hanford have also had intermittent problems, which required unplugging the machines completely. In their case, they were unable to access the machine normally, nor could they access the Integrated Lights Out Manager (ILOM).

Here, we could interact normally with megatron, but were unable to connect to the ILOM. We were able to get into the BIOS, but unable to change any of the settings for the ILOM connection. Since the ILOM is a separate processor and is effectively always on, even when the power light is off, a normal shutdown won't reset it. Hence the need to completely disconnect the power supply.
  1261   Fri Jan 30 17:30:31 2009   Alberto, Josephb   Configuration   Computers   New computer Ottavia set up
Alberto, Joseph,

Today we installed the computer that Joe bought some time ago for his GigE cameras. It was baptized "OTTAVIA".

Ottavia is black, weighs about 20 lbs and is the spitting image of her sister, Allegra (who also pays the price for our bad taste in picking names). She runs an Intel Core 2 Quad and has 4GB of RAM. We expect much from her.

Some typical post-natal operations were necessary.

1) Editing of the user ID
  • By means of the command "./usermod -u 1001 controls" we set the user ID of the user controls to 1001, as it is supposed to be.

2) Connection to the Martian network
  • Ottavia was given the IP address 131.215.113.97 by editing the file /etc/sysconfig/network-scripts/ifcfg-eth0 (we also edited the netmask and the gateway address as described in the Wiki)
  • On linux1, which serves as the name server, in the directory /var/named/chroot/var/named, we modified both the IP-to-name and name-to-IP zone files, 131.215.113.in-addr.arpa.zone and 131.215.11in-addr.martian.zone.
  • We set up the file /etc/resolv.conf so that the OS knows which machine is the name server.

3) Mounting of the /cvs/cds path
  • We created the empty local directory /cvs/cds
  • We edited the file /etc/fstab, adding the line "linux1:/home/cds /cvs/cds nfs rw,bg,soft 0 0"
  • We set up the common variables of the controls environment by sourcing cshrc.40m: in the file /home/controls/.cshrc we added the two lines "source /cvs/cds/caltech/cshrc.40m" and "setenv PATH ${PATH}:/cvs/cds/caltech/apps/linux64/matlab/bin/"
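The IP-to-name (reverse) records are looked up under the address written backwards in the in-addr.arpa tree, which is why they live in their own zone (131.215.113.in-addr.arpa) separate from the name-to-IP zone. Python's standard ipaddress module can show the reverse name and do a subnet check; the /24 netmask below is an assumption, since the actual netmask is only given on the Wiki:

```python
import ipaddress


def reverse_name(ip):
    """Name looked up in the reverse (IP-to-name) zone for this address."""
    return ipaddress.ip_address(ip).reverse_pointer


def on_subnet(ip, network="131.215.113.0/24"):  # /24 is an assumed netmask
    """True if ip lies inside network."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(network)


print(reverse_name("131.215.113.97"))
print(on_subnet("131.215.113.97"))
```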
  1268   Tue Feb 3 15:01:38 2009   Alberto   Frogs   Computers   megatron slow?

I have noticed that Megatron is slower than any other computer at running code that invokes Optickle or Looptickle (e.g. three times slower than Ottavia), even without the graphics.

Has anyone else experienced that?

  1275   Thu Feb 5 16:21:07 2009   Jenne   Frogs   Computers   Belladonna connects to the wireless Martian network again

Symptoms:  Belladonna could not (for a while) connect to the wireless network, since there was a driver problem for the wireless card.  This (I believe) started when Yoichi was doing updates on it a while back.

The system: Belladonna is a Dell Inspiron E1505 laptop, with a Broadcom Corporation Dell Wireless 1390 WLAN Mini-PCI Card (rev 01)

Result:  Belladonna now can talk to its wireless card, and is connected to the Martian network.  (MEDM and Dataviewer both work, so it must be on the network.)

What I did:

0.  Find a Linux forum post describing the following method:  http://www.thelinuxpimp.com/main/index.php?name=News&file=article&sid=749

The person who wrote this has the exact same laptop, with the same wireless card.

1.  Get a new(er) version of ndiswrapper, which "translates" the Windows Driver for the wireless card to Linux-ese.  Belladonna previously was using ndiswrapper-1.37.

$wget http://nchc.dl.sourceforge.net/sourceforge/ndiswrapper/ndiswrapper-1.42.tar.gz

2.  Put ndiswrapper in /home/controls/Drivers, and install it.

$ndiswrapper -i bcmwl5.inf

3.  Get the Windows driver and put it in /home/controls/Drivers/WiFi

$wget http://ftp.us.dell.com/network/R140747.EXE

4.  Unzip the driver

$unzip -a R140747.EXE

5.  Make Fedora use ndiswrapper

$ndiswrapper -m

$modprobe ndiswrapper

6.  Change some files to make everything work:

/etc/sysconfig/wpa_supplicant
    CHANGE FROM: DRIVERS="-Dndiswrapper"
    CHANGE TO:   DRIVERS="-Dwext"

/etc/sysconfig/network-scripts/ifcfg-wlan0
    CHANGE FROM: BOOTPROTO=none
    CHANGE TO:   BOOTPROTO=dhcp

/etc/rc.d/init.d/wpa_supplicant
    CHANGE FROM: daemon $prog -c $conf $INTERFACES $DRIVERS -B
    CHANGE TO:   daemon $prog -c$conf $INTERFACES $DRIVERS -B

7.  Restart things

$service wpa_supplicant restart
$service network restart

8.  Restart the computer (since it still wasn't working after steps 1-7, I gave a restart a try)

9.  Success!!!  MEDM and Dataviewer work without any wired network connection => the wireless card is all good again!
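The CHANGE FROM / CHANGE TO edits above are easy to get slightly wrong by hand. A hedged Python sketch (the helper names are hypothetical) that applies them only if each old string occurs exactly once, so a file that has already been edited, or that contains the string in more than one place, is reported instead of silently changed; it would need to run as root, and keeping backups of the files is wise:

```python
# The file edits from the step above, as (old, new) pairs per file.
EDITS = {
    "/etc/sysconfig/wpa_supplicant": [
        ('DRIVERS="-Dndiswrapper"', 'DRIVERS="-Dwext"'),
    ],
    "/etc/sysconfig/network-scripts/ifcfg-wlan0": [
        ("BOOTPROTO=none", "BOOTPROTO=dhcp"),
    ],
    "/etc/rc.d/init.d/wpa_supplicant": [
        ("daemon $prog -c $conf $INTERFACES $DRIVERS -B",
         "daemon $prog -c$conf $INTERFACES $DRIVERS -B"),
    ],
}


def apply_edit(text, old, new):
    """Replace old with new, refusing unless old occurs exactly once."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one {old!r}, found {count}")
    return text.replace(old, new)


def patch_file(path, edits):
    """Apply all (old, new) pairs to the file; nothing is written if any pair fails."""
    with open(path) as f:
        text = f.read()
    for old, new in edits:
        text = apply_edit(text, old, new)
    with open(path, "w") as f:
        f.write(text)
```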

  1286   Mon Feb 9 17:09:51 2009   Yoichi   Update   Computers   A bunch of updates for the network GPIB stuff.
During the work on the ISS, we noticed that netgpibdata.py is very unreliable with the SR785.
The problem was caused by flakiness of the "DUMP" command of SR785, which dumps the data from the analyzer to the client.
So I decided to use other GPIB commands to download data from SR785. The new method is a bit slower but much more reliable.

I also rewrote netgpibdata.py and related modules using a new class "netGPIB".
This class is provided by the netgpib.py module in the netgpibdata directory. If you use this class in your Python program, all the technical details and dirty tricks are hidden in the class methods, so you can concentrate on your job.
Since Python can also be used interactively, you can use this class for quick communication with a GPIB instrument.

Here is an example.
>ipython #start interactive python
>>import netgpib #Import the module
>>g=netgpib.netGPIB('teofila',10) #Create a netGPIB object. 'teofila' is the hostname of
#the GPIB-Ethernet converter. 10 is the GPIB address.
>>g.command('ACTD0') #Send a GPIB command "ACTD0". This is an SR785 command meaning "Change active display to 0".
>>ans=g.query('DFMT?') #If you expect a response from the instrument, use query command.
#For SR785, "DFMT?" will return the current display format (0 for single, 1 for dual).
>>g.close() #Close the connection when you are done.

Sometimes the SR785 gets stuck in a weird state, and netgpibdata.py may not work properly. I wrote the resetSR785.py command to reset it remotely.
Wait for 30 sec after you issue this command before doing anything.

I wrote two utility commands to perform measurements with SR785 automatically.
TFSR785.py commands SR785 to perform a transfer function measurement.
SPSR785.py will execute spectrum measurements.
You can control various parameters (bandwidth, resolution, window, etc) with command-line options.
Run those commands with '-h' for help.
It is recommended to use those commands even when you are in front of the analyzer, because they save various measurement parameters (input coupling, units, average number, etc) into a parameter file along with the measured data. Those parameters are useful but recording them for each measurement by hand is a pain.
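The idea of storing the measurement settings next to the data can be sketched as follows. This is not the file format that TFSR785.py or SPSR785.py actually write; the .dat/.par naming, the JSON parameter file, and the example parameter names are all hypothetical:

```python
import json


def save_measurement(basename, freqs, values, params):
    """Write data to <basename>.dat and the settings to <basename>.par (JSON)."""
    with open(basename + ".dat", "w") as f:
        for fr, v in zip(freqs, values):
            f.write(f"{fr:.6g} {v:.6g}\n")
    with open(basename + ".par", "w") as f:
        json.dump(params, f, indent=2, sort_keys=True)


def load_params(basename):
    """Read back the settings saved alongside a measurement."""
    with open(basename + ".par") as f:
        return json.load(f)
```

Appending the suffixes to the base name (rather than replacing an extension) keeps names like TF_2009.02.09 intact.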
  1294   Wed Feb 11 15:01:47 2009   josephb   Configuration   Computers   Allegra

After having broken Allegra by updating the kernel, I was able to get it running again by copying the xorg.conf.backup file over xorg.conf in /etc/X11.  So at this point, Allegra is running with generic video drivers, as opposed to the ATI-specific proprietary drivers.

  1303   Sat Feb 14 16:15:19 2009   rob   Configuration   Computers   c1susvme1

c1susvme1 is behaving weirdly.  I've restarted it several times, but its computation time is hovering around 260 usec, making it useless for suspension control and locking.  I also found a PS/2 keyboard plugged in, which doesn't work, so I unplugged it.  It needs to be plugged into a PS/2 keyboard/mouse Y-splitter cable.

  1307   Mon Feb 16 00:43:46 2009   rana   Update   Computers   medm directory wiped on nodus
I accidentally did an 'rm -rf' on the medm directory on nodus, instead of on my laptop as intended.

I then did an svn checkout. So everything should be current as of the last update, but I am sure that
we have not checked in all of the latest screen enhancements. So... we may have to revert to the
Sunday morning tar to get the latest changes back.
  1310   Mon Feb 16 15:54:07 2009   Yoichi   Update   Computers   medm directory wiped on nodus

Indeed, some changes I made to the medm directory were lost.
It was my fault for not checking in those changes.
I asked Alan to restore the directory from the daily rsync backup.
However, the backup job executed this morning had already overwritten the previous (good) backup with the current (bad) medm directory, which Rana restored from svn. Alan will ask Stuart and Phil whether an older backup still remains somewhere.

Anyway, I realized that whenever we think we have made a mistake in the /cvs/cds/ directory, we should stop the backup cron job to prevent unwanted overwriting.
The procedure is:
(1) Login to fb40m
(2) Type 'crontab -e'. Emacs will open up in the terminal.
(3) Comment out the backup job (insert # at the beginning of the line containing /cvs/cds/caltech/scripts/backup/rsync.backup ).
(4) Save the file (Ctrl-x Ctrl-s) and exit (Ctrl-x Ctrl-c).
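Step (3) can also be scripted, which avoids editing the wrong line in a hurry. A hedged Python sketch that comments out every active crontab line mentioning the backup script; the sample schedule and the second job are made up, and the result would have to be installed with 'crontab <file>' after saving the output of 'crontab -l':

```python
BACKUP_SCRIPT = "/cvs/cds/caltech/scripts/backup/rsync.backup"


def disable_backup(crontab_text, marker=BACKUP_SCRIPT):
    """Prefix '#' to each uncommented crontab line that mentions marker."""
    out = []
    for line in crontab_text.splitlines(keepends=True):
        if marker in line and not line.lstrip().startswith("#"):
            line = "#" + line
        out.append(line)
    return "".join(out)


# Hypothetical crontab contents, for illustration only.
sample = "0 5 * * * " + BACKUP_SCRIPT + "\n30 * * * * /some/other/job\n"
print(disable_backup(sample), end="")
```

Running it twice is harmless, since lines that are already commented out are left alone.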

I will post this information on the wiki.
  1311   Mon Feb 16 16:26:29 2009   rob   Update   Computers   medm directory wiped on nodus

We should change the rsync script so that it does not delete stuff. Maybe it can keep deleted stuff for 6 months or something.
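rsync itself can move files that --delete would remove into a separate directory (the --backup and --backup-dir=DIR options), so one way to keep deleted files for about 6 months is to give each night's run its own dated backup directory and prune old ones. A minimal sketch of the pruning step; the YYYY-MM-DD directory naming and the 183-day window are assumptions:

```python
from datetime import date, timedelta


def expired_snapshots(names, today, keep_days=183):
    """Names of dated backup directories (YYYY-MM-DD) older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        try:
            day = date.fromisoformat(name)
        except ValueError:
            continue  # not a dated snapshot directory; leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


print(expired_snapshots(["2008-08-01", "2009-02-15", "README"], today=date(2009, 2, 16)))
```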
  1318   Wed Feb 18 03:25:25 2009   Yoichi   Update   Computers   medm directory back
I restored the medm directory from the backup on tape.
The directory had the svn:ignore property set, and its value included *.snap and *.req.
As a result, those files were excluded from the repository.
I fixed this problem by changing the property of all the directories under /cvs/cds/caltech/medm.
After fixing several other svn problems, the current medm directory contents were checked in to the repository.
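svn:ignore holds glob patterns, one per line, and unversioned files whose names match them are hidden from svn status and skipped by a recursive svn add. Python's fnmatch uses similar glob rules, so a quick check of which files a property value would hide can be sketched like this (the file names are hypothetical):

```python
from fnmatch import fnmatchcase


def ignored_by(prop_value, filenames):
    """Files whose names match a pattern in an svn:ignore-style property value."""
    patterns = [p.strip() for p in prop_value.splitlines() if p.strip()]
    # fnmatchcase is case-sensitive regardless of the operating system.
    return [f for f in filenames if any(fnmatchcase(f, p) for p in patterns)]


print(ignored_by("*.snap\n*.req\n", ["C1LSC.adl", "C1LSC.snap", "burt.req"]))
```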
  1320   Wed Feb 18 19:13:20 2009   rana   Configuration   Computers   SVN & MEDM & old medm files
Allegra had a 2-year-old version of SVN installed, and CentOS (yum) couldn't upgrade it, so I did a 'yum remove subversion'
and then a 'yum install subversion' to get us up to the Dec '08 version (1.5.5), which is the latest stable release.

I also removed all of the old ASS medm directories without backing them up. There's a new RCG script version which is
fixed so that it no longer dumps these old medm directories in there; there's no need since there's already an
medm archive area.

I also removed the medm/old/ directory, did an svn remove, and then copied it back. This is the only way I know of
removing something from the repository without removing it from the working directory.
  1325   Thu Feb 19 16:29:43 2009   Yoichi   Update   Computers   Martian wireless router bad
The Martian wireless router is dead.
I rebooted it several times, but it hangs within a minute.
I will ask Steve to buy a new one.
  1338   Thu Feb 26 00:36:53 2009   Yoichi   Summary   Computers   C1:LSC-TRX_OUT broken (and fixed later).
Today, Kakeru tried to convert C1:LSC-TRX_OUT and C1:LSC-TRY_OUT to DAQ channels.
He edited C1LSC.ini in the chans/daq directory to add the channels, but it did not work.
Then he reverted the file back to the original.
But afterwards we still could not access these channels from dataviewer or the tds tools.
We restarted daqd and tpman on fb40m, but the problem persisted. Even rebooting the whole fb40m did not help.
After inspecting the log file of daqd, it was clear that tpman was failing to create test points for those channels.
I rebooted c1daqawg and then restarted tpman and daqd on fb40m again.
This time, the problem went away.
  1339   Thu Feb 26 01:24:44 2009   Yoichi   Update   Computers   Martian wireless is back
Today, a new wireless router arrived.
I configured and installed it. Now the martian wireless network is back.
I updated the wiki page about the wireless network.
http://lhocds.ligo-wa.caltech.edu:8000/40m/Network
  1342   Thu Feb 26 20:09:32 2009   Yoichi   HowTo   Computers   SR785 python scripts now produce plots
I updated the python scripts to remotely perform measurements with an SR785.
Now these scripts can plot the results immediately using python's matplotlib capability. The sample plots can be seen in my previous elog entry.
In addition to the transfer function (TFSR785.py) and spectrum measurement (SPSR785.py) scripts, I also wrote a script for time series measurements (TSSR785.py).
This is useful when you want to check the signal level flowing in the channels before determining the excitation amplitude.
TSSR785.py will measure and show the time series and histogram of the signal measured by the SR785.
More detailed usage is explained in this wiki page:
http://lhocds.ligo-wa.caltech.edu:8000/40m/netgpib_package
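Before choosing an excitation amplitude, the numbers that matter from a time series are its peak and RMS level, plus a histogram to spot clipping or glitches. A minimal standard-library sketch of that summary; it is not how TSSR785.py computes anything, and the 5 Hz, amplitude-2 test signal is invented:

```python
import math


def level_summary(samples, nbins=10):
    """Peak |x|, RMS, and histogram counts (nbins equal-width bins) of a time series."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / nbins or 1.0  # avoid zero width for a constant signal
    counts = [0] * nbins
    for s in samples:
        counts[min(int((s - lo) / width), nbins - 1)] += 1
    return peak, rms, counts


# Invented test signal: 5 full cycles of a sine with amplitude 2, 1000 samples.
x = [2.0 * math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]
peak, rms, counts = level_summary(x)
print(f"peak {peak:.3f}, rms {rms:.3f}")
print(counts)
```

For a pure sine the RMS is the amplitude divided by sqrt(2), and the histogram is U-shaped, with most samples near the extremes.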
  1349   Tue Mar 3 11:39:50 2009   Osamu   DAQ   Computers   2 PCs in Martian

Kiwamu and I brought 2 SUPER MICRO PCs from Willson house into the 40m.

Both PCs are hooked up to the Martian network. One, named bscteststand, is for the BSC and has been set up by the CDS people. The other, named kami1, is for temporary use for CLIO; it is a brand-new PC with no operating system installed. This brand-new PC will be returned to CDS or the 40m once another new PC, which we will order within several days, arrives.

The IP addresses of the two machines are 131.215.113.83 and 131.215.113.84, respectively.

We have installed CentOS 5.2 on the new PC.

  1356   Wed Mar 4 17:59:14 2009   Yoichi   Configuration   Computers   ezca tools and tds tools work around
Some of the ezca commands and tds commands sporadically fail with a segmentation fault on Linux machines.
As far as I know, ezcawrite, ezcastep, ezcaswitch, and tdswrite have this problem.
These are commands that write values into EPICS channels, so people usually do not check the exit status of these commands in their scripts.
This could cause incomplete execution of, for example, down scripts.
Ideally, this problem should be fixed in the source code of the problematic commands.
However, I don't have the patience to wait for that to happen, and I needed to fix these problems immediately for lock acquisition.
So I resorted to a hacky solution.
I renamed those commands to *.bin, e.g. ezcawrite -> ezcawrite.bin.
Then I wrote wrapper scripts that call those commands repeatedly until they succeed.
For example, ezcawrite now looks like,
#!/bin/csh -f
setenv POSIXLY_CORRECT
while (! { ezcawrite.bin $* })
      echo "Retry $0 $*"
end
So, when ezcawrite.bin fails, the wrapper retries it and shows the message "Retry ....".

If you need to call the original commands, you can always do so by adding ".bin" at the end of the command name.
Currently the following commands are wrapped.
ezcawrite, ezcaservo, ezcastep, ezcaswitch, tdswrite, tdssine.

Please let me know if you have any trouble with this.
  1360   Thu Mar 5 02:24:19 2009   rana   Configuration   Computers   yum.repos.d

I added the following repos, which I found on allegra, to megatron and then did a 'yum install sshfs' on both machines:

allegra:yum.repos.d>l
total 28
-rw-r--r-- 1 root root  428 Feb 12 16:47 rpmforge.repo
-rw-r--r-- 1 root root  684 Feb 12 16:47 mirrors-rpmforge
-rw-r--r-- 1 root root 1054 Feb 12 16:47 epel-testing.repo
-rw-r--r-- 1 root root  954 Feb 12 16:47 epel.repo
-rw-r--r-- 1 root root  626 Feb 12 16:47 CentOS-Media.repo
-rw-r--r-- 1 root root 1869 Feb 12 16:47 CentOS-Base.repo
-rw-r--r-- 1 root root  179 Feb 12 16:47 adobe-linux-i386.repo

This also required me to import the rpmforge GPG key:

sudo rpm --import http://dag.wieers.com/rpm/packages/RPM-GPG-KEY.dag.txt

  1362   Thu Mar 5 23:18:38 2009   Kakeru   Configuration   Computers   tdsdata doesn't work
I found that tdsdata doesn't work.

When I start tdsdata, it takes a few to ~10 seconds of data and then dies with the message "Segmentation fault".
I tried to get data for various times and channels, and this problem was observed every time.
I also tried tdsdata on allegra, op440m and mafalda, and it didn't work on any of them.

Yesterday, I got a new version of tdsdata (which fixed the problem in Message ID: 1328) and tried to build
it in my directory (/cvs/cds/caltech/users/kakeru.....).
This may be related to this problem.
  1366   Fri Mar 6 18:14:58 2009   Yoichi   Update   Computers   awg not working
Since this afternoon, the awg has not been working.
I rebooted the FE computers and c0daqawg, and restarted the tpman and daqd processes on fb40m, several times.
But the problem is still there.
I sent an email to Alex.
  1367   Fri Mar 6 18:22:42 2009   Yoichi   Summary   Computers   Scripts to restart the FE computers
While locking, the FE computers sometimes get overloaded and I have to reboot them.
Tired of logging into the FE computers one by one to start the front-end code, I wrote scripts to do this automatically.
The scripts are in /cvs/cds/caltech/scripts/FE/.
For example, you can restart c1lsc by typing
restartFE c1lsc
You can give multiple computer names to the restartFE command like,
restartFE c1lsc c1asc c1susvme1

To restart all the FE computers, type
restartFE all

For the scripts to work properly, the computers have to accept logins; i.e., you have to either power cycle the computers or push the "Reset" buttons on the RFMNETWORK medm screen before running the scripts.
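The argument handling described above (one or more machine names, or the keyword all) can be sketched as below. This is hypothetical: the real restartFE script has its own list of front-end machines, and only the three names mentioned in this entry are used here:

```python
# Hypothetical host list: the real restartFE script defines the full set of
# front ends; only the machines named in this entry are listed here.
ALL_FE = ["c1lsc", "c1asc", "c1susvme1"]


def expand_hosts(args, all_hosts=ALL_FE):
    """Expand restartFE-style arguments into an ordered, duplicate-free host list."""
    hosts = []
    for arg in args:
        for host in (all_hosts if arg == "all" else [arg]):
            if host not in hosts:
                hosts.append(host)
    return hosts


print(expand_hosts(["c1lsc", "all"]))
```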
  1368   Fri Mar 6 18:26:37 2009   Yoichi   Configuration   Computers   ezca tools and tds tools work around
I updated the wrapper scripts so that they do not retry more than 6 times.
Otherwise, the wrapper scripts loop forever when you give them wrong arguments.
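The csh wrapper's retry-until-success logic, with the retry cap, looks like this as a Python sketch. It is an illustration rather than a replacement for the installed wrappers; max_attempts=6 counts total runs, which is one reading of "not more than 6 times", and the channel name in the usage line is hypothetical:

```python
import subprocess


def run_with_retry(cmd, max_attempts=6):
    """Run cmd (a list of arguments) until it exits with status 0.

    Returns True on success, False after max_attempts failed runs.
    On POSIX a segmentation fault appears as returncode -11 (SIGSEGV).
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        if attempt < max_attempts:
            print(f"Retry {' '.join(cmd)} (attempt {attempt} exited with {result.returncode})")
    print(f"Giving up on {' '.join(cmd)} after {max_attempts} attempts")
    return False
```

For example, run_with_retry(["ezcawrite.bin", "C1:HYPOTHETICAL-CHANNEL", "0"]). Passing the command as a list keeps each argument intact, whereas the unquoted $* in the csh version re-splits arguments that contain spaces.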



  1369   Sat Mar 7 16:50:25 2009   Yoichi   Update   Computers   Not even data retrieval working
Now our digital system is really in trouble.
We can't even get data from tp channels.

I did another round of computer reboots, this time including the RFM bypass switch, c0daqctrl, c0dcu1 and fb40m itself.
But the problem still persists.

I guess there is nothing I can do until Alex comes in.
  1370   Sun Mar 8 23:09:26 2009   rana   Update   Computers   Not even data retrieval working
Although getting the regular DAQ data works, we can't get any testpoints.

I tried restarting tpman several times; there's no inittab on fb40m for this so we should get Alex to set one up when he comes.
I also tried various power cycles and reboots: daqawg, daqctrl, etc. I also noticed that Osamu's setup of new equipment is connected to
the same rack and power strips as all of our sensitive DAQ machines. We should find out whether any hardware was installed in the
last couple of days; it would be easy to accidentally unplug or damage one of our fibers.

I moved the old tpman.log over to tpman.log.090308. It starts out with a header and then just lists when each TP is requested.

When restarting tpman it puts the following into the terminal:
fb:controls>./tpman &
[1] 1037
fb:controls>VMIC RFM 5565 (0) found, mapped at 0x2868c90
VMIC RFM 5579 (1) found, mapped at 0x2868c90
Could not open 5565 reflective memory in /dev/daqd-rfm1
16 kHz system
Spawn testpoint manager
Channel list length for node 0 is 4168
Test point manager (31001001 / 1): node 0
which is probably OK; it's the same startup output that appears in the old log file. It would be nice if there were no error message about the RFM.
Requesting new testpoints via tdsdata, dtt, or the diag command line doesn't seem to work. tpman doesn't spit anything out although 'tp show 0'
does show that the TP is selected.

Once Alex fixes the 'tpman' issue, we should make sure to put an inittab or startup script in there so that tpman writes a log
file and also archives its old log files upon a restart.
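
The manual step above (moving tpman.log aside as tpman.log.090308) is easy to automate in whatever startup script ends up launching tpman. A minimal Python sketch, assuming the YYMMDD suffix convention used here; `archive_log` is a hypothetical helper, and the inittab or startup-script wiring that would call it is not shown.

```python
import datetime
import os
from typing import Optional


def archive_log(path: str, today: Optional[datetime.date] = None) -> Optional[str]:
    """Move an existing log aside as <path>.YYMMDD before a restart.

    If that name is already taken (two restarts on the same day), a numeric
    suffix is appended so no earlier archive is overwritten.
    Returns the archive path, or None if there was no log to archive.
    """
    if not os.path.exists(path):
        return None
    stamp = (today or datetime.date.today()).strftime("%y%m%d")
    target = f"{path}.{stamp}"
    n = 1
    while os.path.exists(target):
        target = f"{path}.{stamp}.{n}"
        n += 1
    os.rename(path, target)
    return target
```

For example, `archive_log("tpman.log", datetime.date(2009, 3, 8))` renames the file to `tpman.log.090308`, matching the name used by hand above.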
  1372   Mon Mar 9 10:59:05 2009   Alan   Omnistructure   Computers   ssh agent on fb40m restarted for backup

After the boot-fest, the nightly backup to Powell-Booth failed, and an automatic email was sent to me. I restarted the ssh agent, following the instructions in /cvs/cds/caltech/scripts/backup/000README.txt.

  1373   Mon Mar 9 11:09:33 2009   Alberto   Update   Computers   Re: Not even data retrieval working

Alex fixed the problem. It was caused by the awgtpman running on kami1.martian, which conflicted with the tpman on fb0.

Killing awgtpman on kami1 allowed the tpman on tp0 to work properly again.

Alex suggested tuning the GDS settings accordingly if more test points are needed.
I still have to understand what this actually means.
  1374   Mon Mar 9 12:04:18 2009   Yoichi   Update   Computers   TPs and AWG are back
I had to do one more reboot of tpman and daqd to get the TPs working.
I confirmed the alignment scripts run fine.

Now the oplevs of some optics are badly off-center. Alberto and I will center them after lunch.
  1378   Mon Mar 9 19:27:16 2009   rana   Configuration   Computers   Move of the CLIO Digital Controls test setup

Because of the network interference we've had from the CLIO system for the past 3-4 days, I asked the guys to remove
the test stand from the 40m lab area. It is now in the 40m control room. Since, for some reason, it needed an Ethernet connection to get out,
we've let them hook into GC. Also, instead of using a real timing signal slaved to the GPS, Jay suggested
just skipping it and having the Timing Slave talk to itself by looping back the fiber with the timing signal. Osamu will enter
more details; this is just a status update.

  1381   Mon Mar 9 23:55:38 2009   Osamu   DAQ   Computers   bscteststand and kami1 outside martian

This morning there was a conflict between the tpman processes running on fb40m and kami1. Alex fixed it temporarily, but Rana suggested it was better to move both PCs outside the martian network. We physically moved both PCs to the control room and connected them to the general network through a local router. I believe they won't conflict anymore, but if you suspect these PCs are causing trouble, please feel free to shut them down.


Today's work summary:
 * Connected the expansion chassis to bscteststand.
 * Obtained signals on dataviewer and dtt, for both real-time and past data, on bscteststand with a 64 kHz timing signal.

Questions:

Excitation channels are not shown; only "other" is shown.

qts.mdl should run at 16 kHz, but 16 kHz timing makes dataviewer slow and makes data acquisition in dtt fail. We are using 64 kHz timing instead, but is that really correct?

  1404   Sun Mar 15 21:50:29 2009   Kakeru, Kiwamu, Osamu   Update   Computers   Some computers are rebooted

We found that c1lsc, c1iscex, c1iscey, c1susvme, c1asc and c1sosvme were dead.
We turned off all the watchdogs and all the suspension locks.
Then I tried to reboot these machines from a terminal, but I couldn't log in to them.

So we physically turned the key switches of these machines off and on, then logged in to them to run the startup scripts.

Then we turned all the watchdogs back on and restored the IFO.

Now they look like they are working fine.

  1457   Tue Apr 7 21:39:57 2009   Yoichi   Configuration   Computers   LSC code recompiled with a fix for denormalization problem
This is not my work, but I am logging it for the record.

A few days ago, Rob recompiled the LSC code with Alex's fix for the denormalization problem.
Since then, the LSC code has been working fine, and I notice that c1lsc is now less loaded.

I believe Rob only recompiled the LSC code, so the problem may still exist in the suspension controllers.
  1460   Wed Apr 8 18:18:33 2009   rana   Configuration   Computers   LSC code recompiled with a fix for denormalization problem
Below is a link describing the anti-denormalization technique that Rolf and Alex implemented at the sites;
it was pointed out by Chris Wipf from MIT:

http://www.musicdsp.org/files/denormal.pdf
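
For background on why this matters: when the input of a recursive (IIR) filter goes to zero, its state decays geometrically, and once its magnitude falls below the smallest normal floating-point number the values become denormal (subnormal), which many CPUs process far more slowly. The Python sketch below only illustrates the numerics; it is not the FE code, and it does not claim to be the fix Rolf and Alex implemented. It counts how many steps a zero-input one-pole filter spends in the denormal range with no guard, with a flush-to-zero guard, and with a tiny DC offset added every step (two common, generic anti-denormal tricks). The filter, the `tiny` value of 1e-20 and the function names are illustrative assumptions, and Python floats are double precision, so the thresholds differ from single-precision code.

```python
import sys

# Smallest positive normal double; nonzero magnitudes below this are denormal.
MIN_NORMAL = sys.float_info.min


def is_denormal(x: float) -> bool:
    """True if x is a nonzero subnormal (denormal) double."""
    return x != 0.0 and abs(x) < MIN_NORMAL


def count_denormal_states(a: float, y0: float, steps: int,
                          guard: str = "none", tiny: float = 1e-20) -> int:
    """Run the zero-input one-pole filter y[n] = a*y[n-1] and count how many
    steps leave the state denormal.

    guard = "none":  plain recursion; the state decays through the denormal range
    guard = "flush": the state is forced to 0.0 once |y| < tiny (flush-to-zero)
    guard = "dc":    tiny is added every step, so the state settles near
                     tiny / (1 - a) instead of decaying toward zero
    """
    y = y0
    denormal_steps = 0
    for _ in range(steps):
        y = a * y
        if guard == "dc":
            y += tiny
        elif guard == "flush" and abs(y) < tiny:
            y = 0.0
        if is_denormal(y):
            denormal_steps += 1
    return denormal_steps


if __name__ == "__main__":
    for g in ("none", "flush", "dc"):
        print(g, count_denormal_states(0.5, 1.0, 2000, guard=g))
```

Running it prints a nonzero count for `none` and 0 for both guarded variants. The DC-offset guard costs one addition per sample and no branch, at the price of a tiny bias in the filter output; flush-to-zero avoids the bias but needs a comparison per sample.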
  1467   Fri Apr 10 01:24:08 2009   rana   Update   Computers   allegra update (sort of)

I tried to play an .avi file on allegra. In a normal universe this would be easy, but because it's Linux I was foiled.

The default video player (Totem) doesn't play .avi or .wmv files. The patches for this work in Suse but not Fedora, Kubuntu but not CentOS, etc. I also tried installing Kplayer, Kaffeine, mplayer, xine, Aktion, Realplay, Helix, etc. They all had compatibility issues with various things, but usually libdvdread or some gstreamer plugin.

So I pressed the BIG update button. The update has now started, and allegra may never recover. The auto update wouldn't work in default mode because of the libdvdread and gstreamer-ugly plugins, so I unchecked those boxes. I think we're going to have this problem as long as we use any kind of advanced gstreamer stuff for the GigE cameras (which is unavoidable).

 

  1474   Sun Apr 12 01:19:30 2009   Yoichi   Configuration   Computers   New FE codes for suspensions not successful
Alex recompiled the suspension FE codes for c1susvme1 and c1susvme2 to fix the denormalization problem.
The new modules are in
/cvs/cds/caltech/users/alex/cds/rts/src/fe/40m/losLinux1.o
/cvs/cds/caltech/users/alex/cds/rts/src/fe/40m/losLinux2.o

I tried them today, but c1susvme1 did not work with the new code while c1susvme2 seemed to run ok.
So I reverted the modules (losLinux1.o and losLinux2.o) to the original ones.
The original modules are also backed up as losLinux1.o.11Apr09 and losLinux2.o.11Apr09 in the corresponding target directories.

I reported the problem to Alex.
  1479   Mon Apr 13 18:57:03 2009   Alberto   Frogs   Computers   GPIB/ETH Interface Troubles

I really don't understand why the programs I used to use to get data from the HP Spectrum Analyzer and the Marconi frequency generator don't work anymore.

I spent hours trying to debug the code but I can't sort the problem out.

The main problem seems to be with the recv function from the socket library: somehow it can no longer get any data from the instruments. What I can't understand, though, is that when it is called directly from the Python terminal it works fine!

In particular the problem is with the following lines in my code:

netSock.send("mkpk;mka?\n")
netSock.send("++read eoi\n")
tmp = netSock.recv(1024)

I tried a lot of tinkering, but nothing worked.

I attach the two scripts I've been using. One (sweepfrequencyPRC.py) calls the other (HP8590PRC.py).

They worked perfectly for weeks in the past. I don't know what has changed since then.
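
One possible explanation (a guess, not a diagnosis): `recv(1024)` returns as soon as *some* bytes are available, or raises if the socket has a timeout set, so a reply can arrive split across several calls, and any stale bytes left over from an earlier exchange come back first. Typing at the Python prompt adds seconds of delay between the `++read eoi` request and `recv`, which would hide such timing problems. The Python 3 sketch below accumulates chunks until the line terminator arrives or a timeout expires. `read_line`, `query`, the 2 s timeout and the newline terminator are assumptions, and treating `++read eoi` as a read-until-EOI request assumes a Prologix-style GPIB-Ethernet controller, as the original commands suggest.

```python
import socket
import time


def read_line(sock: socket.socket, timeout: float = 2.0,
              terminator: bytes = b"\n") -> str:
    """Collect recv() chunks until the reply ends with terminator.

    A single recv() returns as soon as some bytes are available, so one
    instrument reply can be split across several calls. Raises TimeoutError
    if no complete line arrives within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    buf = b""
    while not buf.endswith(terminator):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"incomplete reply after {timeout} s: {buf!r}")
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(1024)
        except socket.timeout:
            continue  # the loop re-checks the deadline and raises
        if not chunk:
            raise ConnectionError("connection closed by the controller")
        buf += chunk
    return buf.decode("ascii").strip()


def query(sock: socket.socket, command: str, timeout: float = 2.0) -> str:
    """Send one instrument command, ask the controller to read back, wait for the line."""
    sock.sendall((command.rstrip("\n") + "\n").encode("ascii"))
    sock.sendall(b"++read eoi\n")  # Prologix-style read-until-EOI request (assumed)
    return read_line(sock, timeout)
```

In the script, the `send`/`send`/`recv` sequence quoted above would become something like `tmp = query(netSock, "mkpk;mka?")`. A timeout then shows up as an explicit error containing whatever partial reply did arrive, which should help tell "no data at all" apart from "data arriving late or in pieces".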

Attachment 1: sweepfrequencyPRC.py
## sweepfrequency.py [-f filename] [-i ip_address] [-a startFreq] [-z endFreq] [-s stepFreq] [-m numAvg]
#
## This script sweeps the frequency of a Marconi local oscillator over the range
## delimited by startFreq and endFreq, with a step set by stepFreq. An arbitrary
## signal is monitored on an HP8590 spectrum analyzer, and the script records the
## amplitude of the spectrum at the frequency currently injected by the Marconi.

## The GPIB address of the Marconi is assumed to be 17, that of the HP Spectrum Analyzer to be 18

## Alberto Stochino, October 2008
... 53 more lines ...
Attachment 2: HP8590PRC.py
# This function measures the peak amplitude on the HP8590 spectrum analyzer
# while sweeping the excitation frequency on the function generator.
#
# Alberto Stochino 2008

import re
import sys
import math
from optparse import OptionParser
from socket import *
... 70 more lines ...