ID | Date | Author | Type | Category | Subject |
1294 | Wed Feb 11 15:01:47 2009 | josephb | Configuration | Computers | Allegra |
So after breaking Allegra by updating the kernel, I was able to get it running again by copying the xorg.conf.backup file over xorg.conf in /etc/X11. So at this point in time, Allegra is running with generic video drivers, as opposed to the ATI-specific proprietary drivers. |
1303 | Sat Feb 14 16:15:19 2009 | rob | Configuration | Computers | c1susvme1 |
c1susvme1 is behaving weirdly. I've restarted it several times but its computation time is hanging out around 260 usec, making it useless for suspension control and locking. I also found a PS/2 keyboard plugged in, which doesn't work, so I unplugged it. It needs to be plugged into a PS/2 keyboard/mouse Y-splitter cable. |
1307 | Mon Feb 16 00:43:46 2009 | rana | Update | Computers | medm directory wiped on nodus |
I accidentally did an 'rm -rf' on the medm directory in nodus, instead of on my laptop as was intended.
I then did an svn checkout. So everything should be current as of the last update, but I am sure that
we have not done a checkin on all of the latest screen enhancements. So...we may have to revert to the
Sunday morning tar to get the latest changes back. |
1310 | Mon Feb 16 15:54:07 2009 | Yoichi | Update | Computers | medm directory wiped on nodus |
Quote: | I accidentally did an 'rm -rf' on the medm directory in nodus, instead of on my laptop as was intended.
I then did an svn checkout. So everything should be current as of the last update, but I am sure that
we have not done a checkin on all of the latest screen enhancements. So...we may have to revert to the
Sunday morning tar to get the latest changes back. |
Indeed, some changes to the medm directory I made were lost.
It was my fault not to check in those changes.
I asked Alan to restore the directory from the daily rsync backup.
However, the backup job executed this morning has already overwritten the previous (good) backup with the current (bad) medm directory, which Rana restored from the svn. Alan will ask Stuart and Phil if there is still an older backup remaining somewhere.
Anyway, I realized that we should stop the backup cron job whenever you think you have made a mistake in the /cvs/cds/ directory, to prevent unwanted overwriting.
The procedure is:
(1) Login to fb40m
(2) Type 'crontab -e'. Emacs will open up in the terminal.
(3) Comment out the backup job (insert # at the beginning of the line containing /cvs/cds/caltech/scripts/backup/rsync.backup ).
(4) Save the file (Ctrl-x Ctrl-s) and exit (Ctrl-x Ctrl-c).
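For illustration, the crontab line before and after step (3) would look roughly like this (the schedule fields and output redirection shown here are only an example, not necessarily what is currently in the fb40m crontab):
30 03 * * * /cvs/cds/caltech/scripts/backup/rsync.backup < /dev/null > /cvs/cds/caltech/scripts/backup/rsync.backup.log 2>&1
#30 03 * * * /cvs/cds/caltech/scripts/backup/rsync.backup < /dev/null > /cvs/cds/caltech/scripts/backup/rsync.backup.log 2>&1
The first line is the active job; the second is the same line after step (3), which cron ignores until the # is removed again.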
I will post this information on the wiki. |
1311 | Mon Feb 16 16:26:29 2009 | rob | Update | Computers | medm directory wiped on nodus |
Quote: |
Quote: | I accidentally did an 'rm -rf' on the medm directory in nodus, instead of on my laptop as was intended.
I then did an svn checkout. So everything should be current as of the last update, but I am sure that
we have not done a checkin on all of the latest screen enhancements. So...we may have to revert to the
Sunday morning tar to get the latest changes back. |
Indeed, some changes to the medm directory I made were lost.
It was my fault not to check in those changes.
I asked Alan to restore the directory from the daily rsync backup.
However, the backup job executed this morning has already overwritten the previous (good) backup with the current (bad) medm directory, which Rana restored from the svn. Alan will ask Stuart and Phil if there is still an older backup remaining somewhere.
Anyway, I realized that we should stop the backup cron job whenever you think you have made a mistake in the /cvs/cds/ directory, to prevent unwanted overwriting.
The procedure is:
(1) Login to fb40m
(2) Type 'crontab -e'. Emacs will open up in the terminal.
(3) Comment out the backup job (insert # at the beginning of the line containing /cvs/cds/caltech/scripts/backup/rsync.backup ).
(4) Save the file (Ctrl-x Ctrl-s) and exit (Ctrl-x Ctrl-c).
I will post this information on the wiki. |
We should change the rsync script so that it does not delete stuff. Maybe it can keep deleted stuff for 6 months or something. |
1318 | Wed Feb 18 03:25:25 2009 | Yoichi | Update | Computers | medm directory back |
I restored the medm directory from the backup on the tape.
The directory had an svn property svn:ignore set and the value of the property included *.snap and *.req.
This resulted in the exclusion of those files from the repository.
I fixed this problem by changing the property of all the directories under /cvs/cds/caltech/medm.
After fixing several other svn problems, the current medm directory contents were checked in to the repository. |
1320 | Wed Feb 18 19:13:20 2009 | rana | Configuration | Computers | SVN & MEDM & old medm files |
Allegra had a two-year-old version of SVN installed and CentOS (yum) couldn't upgrade it, so I did a 'yum remove subversion'
and then a 'yum install subversion' to get us up to the Dec '08 version (1.5.5), which is the latest stable release.
I also removed all of the old ASS medm directories without backing them up. There's a new RCG script version which is
fixed so that it no longer dumps these old medm directories in there; there's no need, since there's already a
medm archive area.
I also removed the medm/old/ directory, did an svn remove, and then copied it back. This is the only way I know of
removing something from the repository without removing it from the working directory. |
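For the record, that remove-and-copy-back trick goes roughly like this (a sketch only; the stash location is arbitrary and the actual commands were not logged):
cd /cvs/cds/caltech/medm
cp -r old /tmp/medm_old_keep        # stash a copy outside version control
svn remove old                      # schedule old/ for deletion from the repository
svn commit -m "remove medm/old from the repository"
cp -r /tmp/medm_old_keep old        # put the files back as an unversioned directory
Newer svn (1.5 and later) also has 'svn remove --keep-local', which should do the same thing in one step.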
1325 | Thu Feb 19 16:29:43 2009 | Yoichi | Update | Computers | Martian wireless router bad |
The Martian wireless router is dead.
I rebooted it several times, but it hangs up within a minute.
I will ask Steve to buy a new one. |
1338 | Thu Feb 26 00:36:53 2009 | Yoichi | Summary | Computers | C1:LSC-TRX_OUT broken (and fixed later). |
Today, Kakeru tried to convert C1:LSC-TRX_OUT and C1:LSC-TRY_OUT to DAQ channels.
He edited C1LSC.ini in the chans/daq directory to add the channel but it did not work.
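For reference, adding a channel to one of those .ini files means adding a stanza of roughly this form (the field names and values below are an assumption from memory, not copied from the actual C1LSC.ini edit):
[C1:LSC-TRX_OUT]
acquire=1
datarate=2048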
Then he reverted the file back to the original one.
But afterwards we still could not access these channels from dataviewer or the tds tools.
We restarted daqd and tpman on fb40m, but the problem persisted. Even rebooting the whole fb40m did not help.
After inspecting the log file of daqd, it was clear that tpman was failing to create test points for those channels.
I rebooted c1daqawg and then restarted tpman and daqd on fb40m again.
This time, the problem went away. |
1339 | Thu Feb 26 01:24:44 2009 | Yoichi | Update | Computers | Martian wireless is back |
Today, a new wireless router arrived.
I configured and installed it. Now the martian wireless network is back.
I updated the wiki page about the wireless network.
http://lhocds.ligo-wa.caltech.edu:8000/40m/Network |
1342 | Thu Feb 26 20:09:32 2009 | Yoichi | HowTo | Computers | SR785 python scripts now produce plots |
I updated the python scripts to remotely perform measurements with an SR785.
Now these scripts can plot the results immediately using python's matplotlib capability. The sample plots can be seen in my previous elog entry.
In addition to the transfer function (TFSR785.py) and spectrum measurement (SPSR785.py) scripts, I also wrote a script for time series measurements (TSSR785.py).
This is useful when you want to check the signal level flowing in the channels before determining the excitation amplitude.
TSSR785.py will measure and show the time series and histogram of the signal measured by the SR785.
More detailed usage is explained in this wiki page:
http://lhocds.ligo-wa.caltech.edu:8000/40m/netgpib_package |
1349 | Tue Mar 3 11:39:50 2009 | Osamu | DAQ | Computers | 2 PCs in Martian |
Kiwamu and I brought 2 SUPER MICRO PCs from Wilson house into the 40m.
Both PCs are hooked up to the martian network. One is named bscteststand, for the BSC work, and has been set up by the CDS people; the other is named kami1, for temporary CLIO use, and is a brand-new PC with no operating system installed. This brand-new PC will be returned to CDS or the 40m once another new PC, which we will order within several days, arrives.
The IP addresses of the two machines are 131.215.113.83 and 131.215.113.84, respectively.
We have installed CentOS 5.2 on the new PC. |
1356 | Wed Mar 4 17:59:14 2009 | Yoichi | Configuration | Computers | ezca tools and tds tools work around |
Some of the ezca commands and tds commands sporadically fail with a segmentation fault on linux machines.
As far as I know, ezcawrite, ezcastep, ezcaswitch, and tdswrite have this problem.
These are commands to write values into epics channels, so usually people do not check the exit status of those commands in their scripts.
This could cause incomplete execution of, for example, down scripts.
Ideally, this problem should be fixed in the source code of the problematic commands.
However, I don't have the patience to wait for that to happen, and I needed to fix these problems immediately for the lock acquisition.
So I resorted to a hacky solution.
I renamed those commands to *.bin, e.g. ezcawrite -> ezcawrite.bin.
Then wrote wrapper scripts to repeatedly call those commands until it succeeds.
For example, ezcawrite now looks like,
#!/bin/csh -f
setenv POSIXLY_CORRECT
while (! { ezcawrite.bin $* })
echo "Retry $0 $*"
end
So, when ezcawrite.bin fails, the command retries it and shows a message "Retry ....".
If you need to call the original commands, you can always do so by adding ".bin" at the end of the command name.
Currently the following commands are wrapped.
ezcawrite, ezcaservo, ezcastep, ezcaswitch, tdswrite, tdssine.
Please let me know if you have any trouble with this. |
1360 | Thu Mar 5 02:24:19 2009 | rana | Configuration | Computers | yum.repos.d |
I added the following repos (which I found on allegra) to megatron and then did a 'yum install sshfs' on both machines:
allegra:yum.repos.d>l
total 28
-rw-r--r-- 1 root root 428 Feb 12 16:47 rpmforge.repo
-rw-r--r-- 1 root root 684 Feb 12 16:47 mirrors-rpmforge
-rw-r--r-- 1 root root 1054 Feb 12 16:47 epel-testing.repo
-rw-r--r-- 1 root root 954 Feb 12 16:47 epel.repo
-rw-r--r-- 1 root root 626 Feb 12 16:47 CentOS-Media.repo
-rw-r--r-- 1 root root 1869 Feb 12 16:47 CentOS-Base.repo
-rw-r--r-- 1 root root 179 Feb 12 16:47 adobe-linux-i386.repo
This also required me to import the rpmforge GPG key:
sudo rpm --import http://dag.wieers.com/rpm/packages/RPM-GPG-KEY.dag.txt |
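A minimal sketch of the equivalent command sequence (the scp step is my assumption of how the .repo files got copied over; only the rpm import and the yum install are from this entry):
# copy the repo definitions over from allegra (transfer method assumed)
scp 'allegra:/etc/yum.repos.d/*.repo' /etc/yum.repos.d/
# import the rpmforge signing key, then install sshfs from the new repos
sudo rpm --import http://dag.wieers.com/rpm/packages/RPM-GPG-KEY.dag.txt
sudo yum install sshfs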
1362 | Thu Mar 5 23:18:38 2009 | Kakeru | Configuration | Computers | tdsdata doesn't work |
I found that tdsdata doesn't work.
When I start tdsdata, it takes a few to ~10 seconds of data and then dies with a "Segmentation fault" message.
I tried this several times and on several channels, and the problem showed up every time.
I also tried tdsdata on allegra, op440m and mafalda, and it didn't work on any of them.
Yesterday, I got a new version of tdsdata (which fixes the problem of Message ID: 1328) and tried to build
it in my directory (/cvs/cds/caltech/users/kakeru.....)
This may be related to the problem. |
1366 | Fri Mar 6 18:14:58 2009 | Yoichi | Update | Computers | awg not working |
Starting this afternoon, the awg is not working.
I rebooted the FE computers and c0daqawg, and restarted the tpman and daqd processes on fb40m, several times.
But the problem is still there.
I sent an email to Alex. |
1367 | Fri Mar 6 18:22:42 2009 | Yoichi | Summary | Computers | Scripts to restart the FE computers |
While doing locking, the FE computers sometimes get overloaded and I have to reboot them.
Being sick of logging into the FE computers one by one to start front end codes, I wrote scripts to do this automatically.
The scripts are in /cvs/cds/caltech/scripts/FE/.
For example, you can restart c1lsc by typing
restartFE c1lsc
You can give multiple computer names to the restartFE command like,
restartFE c1lsc c1asc c1susvme1
To restart all the FE computers, type
restartFE all
For the scripts to work properly, the computers have to accept login, i.e. you either have to power cycle the computers or push "Reset" buttons on the RFMNETWORK medm screen prior to running the scripts. |
1368 | Fri Mar 6 18:26:37 2009 | Yoichi | Configuration | Computers | ezca tools and tds tools work around |
I updated the wrapper scripts so that they do not retry more than 6 times.
Otherwise, the wrapper scripts would loop forever when you give them wrong arguments.
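The updated wrapper presumably looks something like this (a sketch based on the original wrapper quoted below, with an assumed retry counter; not copied from the installed script):
#!/bin/csh -f
setenv POSIXLY_CORRECT
set n = 0
while (! { ezcawrite.bin $* })
    @ n++
    # assumed cap: give up after 6 attempts so bad arguments don't loop forever
    if ($n >= 6) break
    echo "Retry $0 $*"
end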
Quote: | Some of the ezca commands and tds commands sporadically fail with a segmentation fault on linux machines.
As far as I know, ezcawrite, ezcastep, ezcaswitch, and tdswrite have this problem.
These are commands to write values into epics channels, so usually people do not check the exit status of those commands in their scripts.
This could cause incomplete execution of, for example, down scripts.
Ideally, this problem should be fixed in the source code of the problematic commands.
However, I don't have the patience to wait for that to happen, and I needed to fix these problems immediately for the lock acquisition.
So I resorted to a hacky solution.
I renamed those commands to *.bin, e.g. ezcawrite -> ezcawrite.bin.
Then wrote wrapper scripts to repeatedly call those commands until it succeeds.
For example, ezcawrite now looks like,
#!/bin/csh -f
setenv POSIXLY_CORRECT
while (! { ezcawrite.bin $* })
echo "Retry $0 $*"
end
So, when ezcawrite.bin fails, the command retries it and shows a message "Retry ....".
If you need to call the original commands, you can always do so by adding ".bin" at the end of the command name.
Currently the following commands are wrapped.
ezcawrite, ezcaservo, ezcastep, ezcaswitch, tdswrite, tdssine.
Please let me know if you have any trouble with this. |
|
1369 | Sat Mar 7 16:50:25 2009 | Yoichi | Update | Computers | Not even data retrieval working |
Now our digital system is really in trouble.
We can't even get data from tp channels.
I did another round of computer reboots, this time including the RFM bypass switch, c0daqctrl, c0dcu1 and fb40m itself.
But the problem still persists.
I guess there is nothing I can do until Alex comes in. |
1370 | Sun Mar 8 23:09:26 2009 | rana | Update | Computers | Not even data retrieval working |
Although getting the regular DAQ data works, we can't get any testpoints.
I tried restarting tpman several times; there's no inittab on fb40m for this so we should get Alex to set one up when he comes.
I also tried various power cycles and reboots: daqawg, daqctrl, etc. I also notice that Osamu's setup of new stuff is connected to
the same rack and power strips as all of our sensitive DAQ machines. We should find out if there was any hardware installed in the
last couple of days; it would be easy to accidentally unplug or damage one of our fibers.
I moved the old tpman.log over to tpman.log.090308. It starts out with a header and then just lists when each TP is requested.
When restarting tpman it puts the following into the terminal:
fb:controls>./tpman &
[1] 1037
fb:controls>VMIC RFM 5565 (0) found, mapped at 0x2868c90
VMIC RFM 5579 (1) found, mapped at 0x2868c90
Could not open 5565 reflective memory in /dev/daqd-rfm1
16 kHz system
Spawn testpoint manager
Channel list length for node 0 is 4168
Test point manager (31001001 / 1): node 0
which is OK; it's the same startup output that is in the old log file. It would be nice if there were no error message about the RFM, though.
Requesting new testpoints via tdsdata, dtt, or the diag command line doesn't seem to work. tpman doesn't spit anything out although 'tp show 0'
does show that the TP is selected.
Once Alex fixes the 'tpman' issue, we should make sure to put an inittab or startup script in there so that tpman writes a log
file and also archives its old log files upon a restart. |
1372 | Mon Mar 9 10:59:05 2009 | Alan | Omnistructure | Computers | ssh agent on fb40m restarted for backup |
After the boot-fest, the nightly backup to Powell-Booth failed, and an automatic email got sent to me. I restarted the ssh agent, following the instructions in /cvs/cds/caltech/scripts/backup/000README.txt . |
1373 | Mon Mar 9 11:09:33 2009 | Alberto | Update | Computers | Re: Not even data retrieval working |
Quote: | Although getting the regular DAQ data works, we can't get any testpoints.
I tried restarting tpman several times; there's no inittab on fb40m for this so we should get Alex to set one up when he comes.
I also tried various power cycles and reboots: daqawg, daqctrl, etc. I also notice that Osamu's setup of new stuff is connected to
the same rack and power strips as all of our sensitive DAQ machines. We should find out if there was any hardware installed in the
last couple of days; it would be easy to accidentally unplug or damage one of our fibers.
I moved the old tpman.log over to tpman.log.090308. It starts out with a header and then just lists when each TP is requested.
When restarting tpman it puts the following into the terminal:
fb:controls>./tpman &
[1] 1037
fb:controls>VMIC RFM 5565 (0) found, mapped at 0x2868c90
VMIC RFM 5579 (1) found, mapped at 0x2868c90
Could not open 5565 reflective memory in /dev/daqd-rfm1
16 kHz system
Spawn testpoint manager
Channel list length for node 0 is 4168
Test point manager (31001001 / 1): node 0
which is OK; it's the same startup output that is in the old log file. It would be nice if there were no error message about the RFM, though.
Requesting new testpoints via tdsdata, dtt, or the diag command line doesn't seem to work. tpman doesn't spit anything out although 'tp show 0'
does show that the TP is selected.
Once Alex fixes the 'tpman' issue, we should make sure to put an inittab or startup script in there so that tpman writes a log
file and also archives its old log files upon a restart. |
Alex fixed the problem. It was caused by the awgtpman running on kami1.martian, which conflicted with the tpman on fb0.
Killing awgtpman on kami1 allowed the tpman on fb0 to work properly again.
If more test points are needed, Alex suggested tuning the GDS settings accordingly.
What this actually means, I still have to understand. |
1374 | Mon Mar 9 12:04:18 2009 | Yoichi | Update | Computers | TPs and AWG are back |
I had to do one more reboot of tpman and daqd to get the TPs working.
I confirmed the alignment scripts run fine.
Now the oplevs of some optics are largely mis-centered. Alberto and I will center them after lunch. |
1378 | Mon Mar 9 19:27:16 2009 | rana | Configuration | Computers | Move of the CLIO Digital Controls test setup |
Because of the network interference we've had from the CLIO system for the past 3-4 days, I asked the guys to remove
the test stand from the 40m lab area. It is now in the 40m control room. Since it needed an ethernet connection to get out
for some reason, we've let them hook into GC. Also, instead of using a real timing signal slaved to the GPS, Jay suggested
just skipping it and having the Timing Slave talk to itself by looping back the fiber with the timing signal. Osamu will enter
more details, but this is just to give a status update. |
1381 | Mon Mar 9 23:55:38 2009 | Osamu | DAQ | Computers | bscteststand and kami1 outside martian |
This morning there was a conflict between the tpman running on fb40m and the one on kami1. Alex fixed it temporarily, but Rana suggested it was better to move both PCs outside martian. We physically moved both PCs to the control room and connected them to the general network with a local router. I believe they won't conflict anymore, but if you suspect these PCs might be causing trouble, please feel free to shut them down.
Today's work summary:
*connected expansion chassis to bscteststand
*obtained signals on dataviewer, dtt for both realtime and past data on bscteststand with 64kHz timing signal
Questions:
Excitation channels are not shown, only "other" is shown.
qts.mdl should run with 16kHz, but 16kHz timing causes slow updates on dataviewer and failing data acquisition on dtt. We are using 64kHz timing, but is it really correct? |
1404 | Sun Mar 15 21:50:29 2009 | Kakeru, Kiwamu, Osamu | Update | Computers | Some computers are rebooted |
We found c1lsc, c1iscex, c1iscey, c1susvme, c1asc and c1sosvme are dead.
We turned off all the watchdogs and turned off the locks of all the suspensions.
Then I tried to reboot these machines from a terminal, but I couldn't log in to any of them.
So we physically keyed these machines off and on, then logged in to them to run the startup scripts.
Then we turned all the watchdogs back on and restored the IFO.
Now they look like they are working fine.
|
1457 | Tue Apr 7 21:39:57 2009 | Yoichi | Configuration | Computers | LSC code recompiled with a fix for denormalization problem |
This is not my work, but I will put it here for the record.
A few days ago, Rob recompiled the LSC code with the fix for the denormalization problem provided by Alex.
Since then, the LSC code has been working fine. I can see that c1lsc is now less loaded.
I believe Rob only recompiled the LSC code, so there could still be the problem in the suspension controllers. |
1460 | Wed Apr 8 18:18:33 2009 | rana | Configuration | Computers | LSC code recompiled with a fix for denormalization problem |
Below is the link to the anti-denormalization technique that Rolf and Alex implemented at the sites,
that was pointed out by Chris Wipf from MIT:
http://www.musicdsp.org/files/denormal.pdf |
1467 | Fri Apr 10 01:24:08 2009 | rana | Update | Computers | allegra update (sort of) |
I tried to play an .avi file on allegra. In a normal universe this would be easy, but because it's linux I was foiled.
The default video player (Totem) doesn't play .avi or .wmv format. The patches for this work in Suse but not Fedora, Kubuntu but not CentOS, etc. I also tried installing Kplayer, Kaffeine, mplayer, xine, Aktion, Realplay, Helix, etc. They all had compatibility issues with various things, but usually libdvdread or some gstreamer plugin. So I pressed the BIG update button. This has now started and allegra may never recover. The auto update wouldn't work in default mode because of the libdvdread and gstreamer-ugly plugins, so I unchecked those boxes. I think we're going to have this problem as long as we use any kind of advanced gstreamer stuff for the GigE cameras (which is unavoidable).
|
1474 | Sun Apr 12 01:19:30 2009 | Yoichi | Configuration | Computers | New FE codes for suspensions not successful |
Alex recompiled the suspension FE codes for c1susvme1 and c1susvme2 to fix the denormalization problem.
The new modules are in
/cvs/cds/caltech/users/alex/cds/rts/src/fe/40m/losLinux1.o
/cvs/cds/caltech/users/alex/cds/rts/src/fe/40m/losLinux2.o
I tried them today, but c1susvme1 did not work with the new code while c1susvme2 seemed to run ok.
So I reverted the modules (losLinux1.o and losLinux2.o) to the original ones.
The original modules are also backed up as losLinux1.o.11Apr09 and losLinux2.o.11Apr09 in the corresponding target directories.
I reported the problem to Alex. |
1479 | Mon Apr 13 18:57:03 2009 | Alberto | Frogs | Computers | GPIB/ETH Interface Troubles |
I really don't understand why the programs that I used to use to get data from the HP Spectrum Analyzer and the Marconi frequency generator don't work anymore.
I spent hours trying to debug the code but I can't sort the problem out.
The main problem seems to be with the recv function from the socket library. Somehow it can no longer get any data from the instruments. The thing I can't understand, though, is that if called directly from the python terminal it works fine!
In particular the problem is with the following lines in my code:
netSock.send("mkpk;mka?\n")
netSock.send("++read eoi\n")
tmp = netSock.recv(1024)
Tried a lot of tinkering but it didn't work.
I attach the two scripts I've been using. One (sweepfrequencyPRC.py) calls the other (HP8590PRC.py).
They worked beautifully for weeks in the past. Don't know what happened since then. |
Attachment 1: sweepfrequencyPRC.py
|
## sweepfrequency.py [-f filename] [-i ip_address] [-a startFreq] [-z endFreq] [-s stepFreq] [-m numAvg]
#
## This script sweeps the frequency of a Marconi local oscillator, within the range
## delimited by startFreq and endFreq, with a step set by stepFreq. An arbitrary
## signal is monitored on an HP8590 spectrum analyzer and the script records the
## amplitude of the spectrum at the frequency injected by the Marconi at the moment.
## The GPIB address of the Marconi is assumed to be 17, that of the HP Spectrum Analyzer to be 18
## Alberto Stochino, October 2008
... 53 more lines ...
|
Attachment 2: HP8590PRC.py
|
# This function provides the measurement of the peak amplitude on the HP8590
# spectrum analyzer while sweeping the excitation frequency on the function generator.
#
# Alberto Stochino 2008
import re
import sys
import math
from optparse import OptionParser
from socket import *
... 70 more lines ...
|
1481 | Tue Apr 14 12:10:11 2009 | Alberto | Frogs | Computers | GPIB/ETH Interface Troubles |
Quote: |
I really don't understand why the programs that I used to use to get data from the HP Spectrum Analyzer and the Marconi frequency generator don't work anymore.
I spent hours trying to debug the code but I can't sort the problem out.
The main problem seems to be with the recv function from the socket library. Somehow it can no longer get any data from the instruments. The thing I can't understand, though, is that if called directly from the python terminal it works fine!
In particular the problem is with the following lines in my code:
netSock.send("mkpk;mka?\n")
netSock.send("++read eoi\n")
tmp = netSock.recv(1024)
Tried a lot of tinkering but it didn't work.
I attach the two scripts I've been using. One (sweepfrequencyPRC.py) calls the other (HP8590PRC.py).
They worked beautifully for weeks in the past. Don't know what happened since then.
|
This morning Joe looked at my code and pointed out that for some reason the query to the Spectrum Analyzer made by netSock.recv(1024) contained two answers. It was as if the buffer contained the answers to two different queries.
After some experimenting I found that basically the GPIB interface wasn't switching from the "auto 1" to the "auto 0" mode as it should. I rewrote part of the code and that seems to have solved the problem.
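A minimal sketch of the corrected query pattern, assuming a Prologix-style GPIB-Ethernet converter like the one these scripts talk to (the host IP, TCP port, and helper function below are illustrative, not copied from the fixed script):
import socket

def gpib_query(host, cmd, port=1234, timeout=5.0):
    # open a fresh connection to the GPIB-Ethernet converter
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    s.connect((host, port))
    s.send("++auto 0\n")       # converter must not read back automatically
    s.send(cmd + "\n")         # e.g. "mkpk;mka?" for the HP analyzer
    s.send("++read eoi\n")     # now explicitly ask for exactly one answer
    answer = s.recv(1024)
    s.close()
    return answer

# usage (address is a placeholder): print gpib_query("131.215.x.x", "mkpk;mka?")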
Still don't understand why it used to work in the past and then it stopped. |
1483 | Wed Apr 15 02:18:42 2009 | rana | Configuration | Computers | nodus vfstab changed for rigel |
nodus was hanging because it was trying to mount the cit40m account from rigel and rigel was not responding.
Neither I nor Yoichi can recall what the cit40m account does for us on nodus anymore and so I commented it out of the nodus /etc/vfstab.
nodus may still need a boot to make it pay attention. I was unable to do a 'umount' to get rid of the rigel parasite. But mainly I don't want anything in
the 40m to depend on the LIGO GC system if at all possible. |
1508 | Thu Apr 23 13:55:43 2009 | josephb, peter | Update | Computers | RCG example |
We successfully compiled and installed the Real-time Code Generator "Hello World" example (which is a skeleton for the ETMX suspension controller) on megatron. In order to get it to compile, we had to add a flag indicating the computer is standalone and not using a myrinet card at the moment. This was done by adding the shmem_daq = 1 flag to the cdsParameters module. The symptom was that it was unable to find gm.h (and there is no installed /opt/gm directory).
It is called "sam". It was installed to /cvs/cds/caltech/target/sam, and produced medm screens in /cvs/cds/caltech/medm/c1/sam. As nothing points to these, I figure it won't harm any of the current configuration, but lets us play around a bit. If for some strange reason these do cause problems, feel free to remove them. |
1551 | Wed May 6 16:56:35 2009 | rana, alex, joe | Configuration | Computers | daqd log, cron, etc. |
While Alex was over, we investigated the log file problems with DAQD and NDS on FB0. There was a lot of
the standard puzzling and mumbling, but eventually we saw that it doesn't create its log file and so it
doesn't write to it. The log file is /usr/controls/main_daqd.log. The other files called daqd.log.DATE
in the logs/ directory are actually not written to. It's awesome.
We also have put in a fix for the overflowing jobs/ directory. It gets a file written to it every time
you make an NDS request, and our seisBLRMS has been overloading it. There's now a cron job for it in the fb0
crontab which cleans out week-old files at 6:30 AM every day.
We also changed the time of the daily backup from 3:30 AM (when people are still working) to 5:50 AM
(by which time the seismic has ramped up and interferometerists should be asleep). I didn't like the
idea of a bandwidth hog nailing the framebuilder during the peak of interferometer work.
#
# Script to backup via rsync the most recent 40m minute trends and
# any changes to the /cvs/cds filesystem.
#
50 05 * * * /cvs/cds/caltech/scripts/backup/rsync.backup < /dev/null > /cvs/cds/caltech/scripts/\
backup/rsync.backup.log 2>&1
30 06 * * * find /usr/controls/jobs -mtime +7 -exec /bin/rm -f {} \;
seisBLRMS.m restarted on mafalda. |
1554 | Thu May 7 12:21:36 2009 | josephb, alex | Configuration | Computers | fb40m |
Having determined that Rana (the computer) was having too many issues with testing the new RAID array due to the age of the system, we proceeded to test on fb40m.
We brought it down and up several times between 11 and noon. We eventually were able to daisy chain the old RAID and the new RAID so that fb40m sees both. At this time, the RAID arrays are still daisy chained, but the computer is set up to run on just the original RAID while the full 14 TB array is initialized (16 drives of 1 TB each: one hot spare plus one drive's worth of RAID-5 parity leaves 14 TB of the 16 TB actually available). We expect this to take a few hours, at which point we will copy the data from the old RAID to the new RAID (which I also expect to take several hours). In the meantime, operations should not be affected. If they are, contact one of us.
|
1555 | Thu May 7 15:22:19 2009 | josephb, alberto | Configuration | Computers | fb40m |
Quote: |
Having determined that Rana (the computer) was having too many issues with testing the new RAID array due to the age of the system, we proceeded to test on fb40m.
We brought it down and up several times between 11 and noon. We eventually were able to daisy chain the old RAID and the new RAID so that fb40m sees both. At this time, the RAID arrays are still daisy chained, but the computer is set up to run on just the original RAID while the full 14 TB array is initialized (16 drives of 1 TB each: one hot spare plus one drive's worth of RAID-5 parity leaves 14 TB of the 16 TB actually available). We expect this to take a few hours, at which point we will copy the data from the old RAID to the new RAID (which I also expect to take several hours). In the meantime, operations should not be affected. If they are, contact one of us.
|
This afternoon the alignment script crashed after returning syntax errors. We found that the tpman wasn't running on the framebuilder, because it had probably failed to get restarted in one of the several reboots executed in the morning by Alex and Joe.
Restarting the tpman was then sufficient for the alignment scripts to get back to work. |
1564 | Fri May 8 10:05:40 2009 | Alan | Omnistructure | Computers | Restarted backup since fb40m was rebooted |
Restarted backup since fb40m was rebooted. |
1574 | Mon May 11 12:25:03 2009 | josephb, Alex | Update | Computers | fb40m down for patching |
The 40m frame builder is currently being patched to be able to utilize the full 14 TB of the new RAID array (as opposed to being limited to 2 TB). This process is expected to take several hours, during which the frame builder will be unavailable. |
1589 | Fri May 15 14:05:14 2009 | Dmass | HowTo | Computers | How To: Crash the Elog |
The Elog started crashing last night. It turns out I was the culprit, and whenever I tried to upload a certain 500kb .png picture, it would die. It has happened both when choosing "upload" of a picture, and when choosing "submit" after successfully uploading a picture. Both culprits were ~500kb .png files. |
1622 | Fri May 22 17:05:24 2009 | rob, pete | Update | Computers | hard reboot of vertex suspension controllers |
we did a hard reboot of c1susvme1, c1susvme2, c1sosvme, and c1susaux. We are hoping this will fix some of the weird suspension issues we've been having (MC3 side coil, ITMX alignment). |
1623 | Sun May 24 11:24:08 2009 | rob | Update | Computers | elog restarted |
I just restarted the elog. It was crashed for unknown reasons. The restarting instructions are in the wiki. |
1634 | Sat May 30 12:36:52 2009 | rob | Update | Computers | c1susvme2, c1iscex running late |
c1susvme2 has been running just a bit late for about a week. I rebooted it.
The plot shows SRM_FE_SYNC, which is the number of times in the last second that c1susvme2 was late for the 16k cycle. Similarly for ETMX.
|
Attachment 1: srmsync.jpg |
Attachment 2: etmxsync.jpg |
1635 | Mon Jun 1 13:25:00 2009 | rob | Update | Computers | c1susvme2, c1iscex running late |
Quote: |
c1susvme2 has been running just a bit late for about a week. I rebooted it.
The plot shows SRM_FE_SYNC, which is the number of times in the last second that c1susvme2 was late for the 16k cycle. Similarly for ETMX.
|
The reboot appears to have worked. |
Attachment 1: doublesync.jpg |
1642 | Tue Jun 2 23:12:08 2009 | rob | Configuration | Computers | ntp on op440m |
I restarted ntpd on op440m to solve a "synchronization error" that we were having in DTT. I also edited the config file (/etc/inet/ntp.conf) to remove the lines referring to rana as an ntp server; now only nodus is listed.
To do this:
log in as root
/usr/local/bin/ntpd -c /etc/inet/ntp.conf |
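For reference, the server section of /etc/inet/ntp.conf should now contain just something like the following (a sketch; the exact remaining lines were not recorded here):
# rana removed as a time source; nodus is the only listed server
server nodus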
1643 | Tue Jun 2 23:53:12 2009 | pete | DAQ | Computers | reset c1susvme1 |
rob, alberto, rana, pete
we reset this computer, which was out of sync (16384 in the FE_SYNC field instead of 0) |
1657 | Fri Jun 5 16:45:28 2009 | rob, pete | HowTo | Computers | tdsavg failure in cm_step script |
Quote: |
Quote: |
the command
tdsavg 5 C1:LSC-PD4_DC_IN1
was causing grievous woe in the cm_step script. It turned out to fail intermittently at the command line, as did other LSC channels. (But non-LSC channels seem to be OK.) So we power cycled c1lsc (we couldn't ssh).
Then we noticed that computers were out of sync again (several timing fields said 16383 in the C0DAQ_RFMNETWORK screen). We restarted c1iscey, c1iscex, c1lsc, c1susvme1, and c1susvme2. The timing fields went back to 0. But the tdsavg command still intermittently said "ERROR: LDAQ - SendRequest - bad NDS status: 13".
The channel C1:LSC-SRM_OUT16 seems to work with tdsavg every time.
Let us know if you know how to fix this.
|
Did you try restarting the framebuilder?
What you type is in bold:
op440m> telnet fb40m 8087
daqd> shutdown
|
Restarting the framebuilder didn't work, but the problem now appears to be fixed.
Upon reflection, we also decided to try killing all open DTT and Dataviewer windows. This also involved liberal use of ps -ef to seek out and destroy all diag's, dc3's, framer4's, etc.
That may have worked, but it happened simultaneously to killing the tpman process on fb40m, so we can't be sure which is the actual solution.
To restart the testpoint manager:
what you type is in bold:
rosalba> ssh fb40m
fb40m~> pkill tpman
The tpman is actually immortal, like Voldemort or the Kurgan or the Cylons in the new BG. Truly slaying it requires special magic, so the pkill tpman command has the effect of restarting it.
In the future, we should make it a matter of policy to close DTTs and Dataviewers when we're done using them, and killing any unattended ones that we encounter.
|
1668 | Thu Jun 11 14:54:18 2009 | josephb, alberto | Update | Computers | Wireless network |
After poking around for a few minutes several facts became clear:
1) At least one GPIB interface has a hard ethernet connection (and does not currently go through the wireless).
2) The wireless on the laptop works fine, since it can connect to the router.
3) The rest of the martian network cannot talk to the router.
This led to me replugging the ethernet cord back into the wireless router, which at some point in the past had been unplugged. The computers now seem to be happy and can talk to each other.
|
1682 | Wed Jun 17 01:07:50 2009 | rob | Configuration | Computers | matapps on /cvs/cds |
I checked out a copy of matapps into /cvs/cds/caltech/apps/lscsoft so that I could find the matlab function strassign.m, which is necessary for some old mDV commands to run. I don't know why it became necessary or why it disappeared if it did. |
1683 | Wed Jun 17 01:09:47 2009 | rob | Update | Computers | /cvs/cds 91% full |
In /cvs/cds/caltech
1.6M 2008-8-15.pdf
2.9M 40mUpgradeOpticalLayoutPlan01.pdf
2.4M alh
19M apache
18G apps
11M archive
4.0K authorized_keys2
8.0K backup.notes
8.0K backup.notes~
1.9G build
62G burt
47M cds
13M cds40m
37M chans
70G conlog
52K crontab
12K cshrc.40m
12K cshrc.40m~
36M diag
1.4G dmt
8.2M framecpp-0.2.0
1.7M free_080730.pdf
57M gds
9.8G home
60K hooks
8.0K hosts.40m
4.0K id_rsa
10M iscmodeling
110M ldg-4.7
648M libs
4.0K log2.txt
224K logs
0 log.txt
238M medm
344M NB
148M NB_080304
211M NB_080307
401M NB40
1.2G noisebudget.071109
837M noisebudget.bak.20060623
3.5M oldtarget
123M root
5.7M savesets
208K schematics
655M scripts
13G scripts_archive
1.1M state
3.7G svn
4.0K svn-commit.tmp
7.3G target
295M target_archive
6.7M test
72K test.png
4.0K tmp
8.0K typescript
35G users
205M wind
|
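The per-directory sizes above were presumably produced with something like the following (a guess at the exact commands, noted for anyone repeating the check):
cd /cvs/cds/caltech
du -sh *
df -h /cvs/cds     # shows the 91% figure in the subject line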