ID | Date | Author | Type | Category | Subject |
7259
|
Thu Aug 23 17:17:49 2012 |
Masha | Summary | Computers | Code Folder Status |
I cleaned up my directory (/users/masha) today. A lot of the files are just code that I experimented with, but the important files for training the classification neural network are in "neural_network_classification". The "EarthquakeData" subdirectory contains my entire dataset. Files of the form "GenerateRNNInput" are used to create input vector sets for the network, while files of the form "*NeuralNetworkClassification*" actually run the code that generates the neural network vectors for the classification code block in the c1pem model.
Also, the folder "feed_c", which can also be found in Den's directory, contains the neural network controller code we played around with. |
7362
|
Fri Sep 7 15:31:52 2012 |
Mike J. | Update | Computers | Sensoray back up |
Video Capture with the Sensoray works again. Pianosa just needed mplayer installed for it to play properly. |
Attachment 1: output_5.mp4
|
7364
|
Fri Sep 7 17:24:16 2012 |
Mike J. | Update | Computers | Sensoray Video Capture |
To capture video with the Sensoray, open the GUI (python ./demo.py), simply press "Save," enter a filename, and hit "Stop" when you wish to stop recording. If you want to change the video format, there is a dropdown menu labelled "Format." I recommend MP4 for standard video, and nv12 for RAW video. |
7365
|
Fri Sep 7 17:34:53 2012 |
Jenne | Update | Computers | Sensoray Video Capture |
Quote: |
To capture video with the Sensoray, open the GUI (python ./demo.py), simply press "Save," enter a filename, and hit "Stop" when you wish to stop recording. If you want to change the video format, there is a dropdown menu labelled "Format." I recommend MP4 for standard video, and nv12 for RAW video.
|
I also installed mplayer on rossa, so we can play the videos there.
Even though Mike won't admit it, the video stuff is all in /users/sensoray/ . I opened the demo.py from there, and it also works. |
7495
|
Sun Oct 7 12:11:00 2012 |
Aidan | Update | Computers | Rebooted cymac0 |
I rebooted cymac0 a couple of times. When I first got here it was just frozen. I rebooted it and then ran a model (x1ios). The machine froze the second time I ran ./killx1ios. I've rebooted it again. |
7498
|
Mon Oct 8 09:45:28 2012 |
jamie | Update | Computers | Rebooted cymac0 |
Quote: |
I rebooted cymac0 a couple of times. When I first got here it was just frozen. I rebooted it and then ran a model (x1ios). The machine froze the second time I ran ./killx1ios. I've rebooted it again.
|
For context, there's a stand-alone cymac test system running at the 40m. It's not hooked up to anything, except for just being on the martian network (it's not currently mounting any 40m CDS filesystems, for instance). The machine is temporarily between the 1Y4 and 1Y5 racks. |
7552
|
Mon Oct 15 22:24:45 2012 |
Jenne | Update | Computers | Lots of new White :( |
Evan and I are starting to lock, and there is lots of new, unfortunate white stuff on several different screens.
C1:TIM-PACIFIC_STRING is gone, C1:IFO-STATE (MC state) is gone, C1:LSC-PZT..._requests are gone (all 4 of them), C1:PSL-FSS_FASTSWEEPTEST from the FSS screen is gone (although I'm not sure that that one is newly gone), lots of the WF AA lights on the LSC screen are gone.
Those are the things I find in a few minutes of not really looking around.
EDIT: IPPOS is also gone, so I can't see how my current alignment relates to old alignments. |
7570
|
Wed Oct 17 19:35:58 2012 |
Koji | Update | Computers | Re: Lots of new White :( |
Solved. The power cord of c1iscaux was loose.
Has anyone worked around the back side of 1Y3?
I looked into the problem. I went through the channel lists for each of the slow machines and found that the variables are supported by c1iscaux.
controls@pianosa:/cvs/cds/caltech/target/c1iscaux 0$ cd /cvs/cds/caltech/target/c1iscaux
controls@pianosa:/cvs/cds/caltech/target/c1iscaux 0$ grep C1:IF *
C1IFO_STATE.db:grecord(ai,"C1:IFO-STATE")
It seemed that the machine was not responding to ping. I went to 1Y3 and found the crate was off. Actually this is not correct.
The key was on but the power was off. I looked at the back and found the power cord was loose from its inlet.
Once the cord was pushed in and the crate was keyed, the white boxes came back online.
Just in case I burtrestored these slow channels by the snapshot at 6:07am on Sunday. |
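The grep used above to find which slow machine serves a channel can be sketched in Python. This is a minimal sketch, assuming the database files declare records in the `grecord(type,"NAME")` form shown in the grep output; the function name `find_records` and the regular expression are illustrative and not part of any existing 40m script.

```python
import re
from pathlib import Path

# Matches lines such as: grecord(ai,"C1:IFO-STATE")
# Assumption: records are declared as record(...) or grecord(...).
RECORD_RE = re.compile(r'g?record\(\s*(\w+)\s*,\s*"([^"]+)"\s*\)')


def find_records(target_dir, prefix):
    """Return (file name, record type, channel) for channels starting with prefix."""
    hits = []
    for db_file in sorted(Path(target_dir).glob("*.db")):
        for line in db_file.read_text(errors="replace").splitlines():
            match = RECORD_RE.search(line)
            if match and match.group(2).startswith(prefix):
                hits.append((db_file.name, match.group(1), match.group(2)))
    return hits
```

Calling `find_records("/cvs/cds/caltech/target/c1iscaux", "C1:IF")` would then list the matching records in that target directory, one tuple per record.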
7574
|
Thu Oct 18 08:00:40 2012 |
jamie | Update | Computers | Re: Lots of new White :( |
Quote: |
Solved. The power cord of c1iscaux was loose.
Has anyone worked around the back side of 1Y3?
I looked into the problem. I went through the channel lists for each of the slow machines and found that the variables are supported by c1iscaux.
controls@pianosa:/cvs/cds/caltech/target/c1iscaux 0$ cd /cvs/cds/caltech/target/c1iscaux
controls@pianosa:/cvs/cds/caltech/target/c1iscaux 0$ grep C1:IF *
C1IFO_STATE.db:grecord(ai,"C1:IFO-STATE")
It seemed that the machine was not responding to ping. I went to 1Y3 and found the crate was off. Actually this is not correct.
The key was on but the power was off. I looked at the back and found the power cord was loose from its inlet.
Once the cord was pushed in and the crate was keyed, the white boxes came back online.
Just in case I burtrestored these slow channels by the snapshot at 6:07am on Sunday.
|
I was working around 1Y2 and 1Y3 when I wired the DAC in the c1lsc IO chassis in 1Y3 to the tip-tilt electronics in 1Y2. I had to mess around in the back of 1Y3 to get it connected. I obviously did not intend to touch anything else, but it's certainly possible that I did. |
7577
|
Fri Oct 19 00:55:35 2012 |
Jenne | Update | Computers | c1lsc is down (at least all of the models) |
When Evan and I were dithering the BS and ITMY (see his elog), I noticed that c1lsc was acting weird. The IOP was the only one with the blinky heartbeat. The IOP was all green lights, but all the other models had red for the fb connection, as well as the rightmost indicator (I don't know what that one is for). I logged on to c1lsc and ran 'rtcds restart all'. The script didn't get anywhere beyond saying it was beginning to stop the 1st model (sup, the bottom one on the lsc list). Then all of the cpus went white. I can still ping c1lsc, but I can't ssh to it.
I'm not sure what to do here Jamie. Heelp. |
7746
|
Mon Nov 26 18:56:34 2012 |
Jenne | HowTo | Computers | Data logging suggestions |
We've been talking for a while about how we want to store data. I'm not in love with keeping it on the elog, although I think we should always be able to reference and go back and forth between the elogs and the data.
I have made a new folder: /data EDIT: nevermind. I want it to be on the file system just like /users, but I don't know how to do that. Right now the folder is just on Ottavia. Jamie will help me tomorrow.
In this folder, we will save all of the data which goes into the elog.
I propose that we should have a common format for the names of the data files, so that we can easily find things.
My proposal is that one begins one's elog regarding the data to be saved, and submits it immediately after putting in the first ~sentence or so. One should then make a new folder inside the data folder with a title "elog#####_Anything_Else_You_Want". Then, data (which was originally saved in one's own users folder) should be copied into the /data/elog#####_AnythingElse/ folder. Also in that folder should be any Matlab scripts used to create the plots that you post in the elog. One should then edit the elog to continue making a regular, very thorough elog, including the path to the data. The elog should include all of the information about the measurement, state of the IFO (or whatever you were measuring), etc.
Riju will be alpha-testing this procedure tonight. EDIT: nevermind...see previous edit. |
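The naming convention proposed above could be automated with a small helper. This is only a sketch under stated assumptions: the function names `data_folder_name` and `archive_data` are hypothetical, the elog number is written without zero padding, and the `/data` root is the proposed location, which did not yet exist on the shared file system when this entry was written.

```python
import re
import shutil
from pathlib import Path


def data_folder_name(elog_id, description):
    """Build a folder name of the form elog#####_Anything_Else_You_Want."""
    # Replace runs of characters that are awkward in paths with underscores.
    tag = re.sub(r"[^A-Za-z0-9]+", "_", description).strip("_")
    return "elog{}_{}".format(int(elog_id), tag)


def archive_data(elog_id, description, files, data_root="/data"):
    """Copy measurement files and plotting scripts into the elog data folder."""
    dest = Path(data_root) / data_folder_name(elog_id, description)
    dest.mkdir(parents=True, exist_ok=True)
    for src in files:
        shutil.copy2(str(src), str(dest))
    return dest
```

The returned path is what would be pasted into the elog entry so that readers can go from the entry to the data and back.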
7749
|
Tue Nov 27 00:26:00 2012 |
jamie | Omnistructure | Computers | Ubuntu update seems to have broken html input to elog on firefox |
After some system updates this evening, firefox can no longer handle the html input encoding for the elog. I'm not sure what happened. You can still use the "ELCode" or "plain" input encodings, but "HTML" won't work. The problem seems to be firefox 17. ottavia and rosalba were upgraded, while rossa and pianosa have not yet been.
I've installed chromium-browser (debranded chrome) on all the machines as a backup. Hopefully the problem will clear itself up with the next update. In the mean time I'll try to figure out what happened.
To use chromium: Applications -> Internet -> Chromium |
7757
|
Wed Nov 28 17:40:28 2012 |
jamie | Omnistructure | Computers | elog working again on firefox 17 |
Koji and I figured out what the problem is. Apparently firefox 17.0 (specifically its user-agent string) breaks fckeditor, which is the javascript toolbox the elog uses for the wysiwyg text editor. See https://support.mozilla.org/en-US/questions/942438.
The suspect line was in elog/fckeditor/editor/js/fckeditorcode_gecko.js. I hacked it up so that it stopped whatever crappy conditional user agent crap it was doing. It seems to be working now.
Edit by Koji: In order to make this change work, I needed to clear the cache of firefox from Tool/Clear Recent History menu. |
7786
|
Tue Dec 4 20:38:51 2012 |
jamie | Omnistructure | Computers | new (beta) version of nds2 installed on control room machines |
I've installed the new nds2 packages on the control room machines.
These new packages include some new and improved interfaces for python, matlab, and octave that were not previously available. See the documentation in:
/usr/share/doc/nds2-client-doc/html/index.html
for details on how to use them. They all work something like:
conn = nds2.connection('fb', 8088)
chans = conn.findChannels()
buffers = conn.fetch(t1, t2, {c1,...})
data = buffers(1).getData()
NOTE: the new interface for python is distinct from the one provided by pynds. The old pynds interface should continue to work, though.
To use the new matlab interface, you have to first issue the following command:
javaaddpath('/usr/lib/java')
I'll try to figure out a way to have that included automatically.
The old Matlab mex functions (NDS*_GetData, NDS*_GetChannel, etc.) are now provided by a new and improved package. Those should now work "out of the box". |
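As a rough illustration of the call pattern listed above, here is a Python sketch that wraps a fetch and returns the samples by channel name. It assumes the Python binding accepts `fetch(start, stop, channels)` in the same form as the generic example above and that each returned buffer exposes its samples as a `data` attribute; both are assumptions to be checked against the documentation at the path given in this entry. The helper name `fetch_data` is hypothetical.

```python
def fetch_data(conn, gps_start, gps_stop, channels):
    """Fetch data for each channel and return {channel name: samples}.

    `conn` is expected to behave like an nds2 connection, for example
    conn = nds2.connection('fb', 8088) as in the entry above.
    """
    channels = list(channels)
    buffers = conn.fetch(gps_start, gps_stop, channels)
    # Assumption: buffers come back in the same order as the request.
    return {name: buf.data for name, buf in zip(channels, buffers)}
```

Passing the connection in as an argument keeps the helper independent of how the connection was made, so it can be exercised with a stand-in object when no server is reachable.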
7788
|
Tue Dec 4 23:08:46 2012 |
Den | Omnistructure | Computers | new (beta) version of nds2 installed on control room machines |
Quote: |
I've installed the new nds2 packages on the control room machines.
|
I've tried the new nds2 Java interface in Matlab. Using the findChannels method of the connection class I see only slow, DQ and trend channels. I could even download data online using the iterate method. When will it be possible to do the same with fast non-DQ channels?
>> conn = nds2.connection('fb', 8088);
>> conn.iterate({'C1:LSC-XARM_OUT'})
??? Java exception occurred:
java.lang.RuntimeException: No such channel.
at nds2.nds2JNI.connection_iterate__SWIG_0(Native Method)
at nds2.connection.iterate(connection.java:91)
|
7791
|
Wed Dec 5 09:42:46 2012 |
rana | Omnistructure | Computers | new (beta) version of NDS2 installed on control room machines |
NDS2 is not designed for non DQ channels - it gets data from the frames, not through NDS1.
For getting the non-DQ stuff, I would just continue using our NDS1 compatible NDS mex files (this is what is used in mDV). |
7793
|
Wed Dec 5 16:54:29 2012 |
jamie | Omnistructure | Computers | new (beta) version of NDS2 installed on control room machines |
Quote: |
NDS2 is not designed for non DQ channels - it gets data from the frames, not through NDS1.
For getting the non-DQ stuff, I would just continue using our NDS1 compatible NDS mex files (this is what is used in mDV).
|
The NDS2 protocol is not for non-DQ, but the NDS2 client is capable of talking both the NDS1 and NDS2 protocols.
fb:8088 is an NDS1 server, so the client is talking NDS1 to fb. It should therefore be capable of getting online data.
It doesn't seem to be seeing the online channels, though, so I'll work with Leo to figure out what's going on there.
The old mex functions, which like I said are now available, aren't capable of getting online data. |
7805
|
Mon Dec 10 16:28:13 2012 |
jamie | Omnistructure | Computers | progressive retrieval of online data now possible with the new NDS2 client |
Leo fixed an issue with the new nds2-client packages that was preventing it from retrieving online data. It's working now from matlab, python, and octave.
Here's an example of a dataviewer-like script in python:
#!/usr/bin/python
import sys
import nds2
from pylab import *

# channels are command line arguments
channels = sys.argv[1:]
conn = nds2.connection('fb', 8088)
fig = figure()
fig.show()
for bufs in conn.iterate(channels):
    fig.clf()
    for buf in bufs:
        plot(buf.data)
    draw()
|
|
7859
|
Wed Dec 19 20:18:51 2012 |
rana | Update | Computers | We are Changing the Passwerdz next week---- |
Be Prepared
http://xkcd.com/936/ |
7920
|
Sat Jan 19 15:05:37 2013 |
Jenne | Update | Computers | All front ends but c1lsc are down |
Message I get from dmesg of c1sus's IOP:
[ 44.372986] c1x02: Triggered the ADC
[ 68.200063] c1x02: Channel Hopping Detected on one or more ADC modules !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[ 68.200064] c1x02: Check GDSTP screen ADC status bits to id affected ADC modules
[ 68.200065] c1x02: Code is exiting ..............
[ 68.200066] c1x02: exiting from fe_code()
Right now, c1x02's max cpu indicator reads 73,000 microseconds. c1x05 is 4,300 usec, and c1x01 seems totally fine, except that it has the 0x2bad.
c1x02 has 0xbad (not 0x2bad). All other models on c1sus, c1ioo, c1iscex and c1iscey all have 0x2bad.
Also, no models on those computers have 'heartbeats'.
C1x02 has "NO SYNC", but all other IOPs are fine.
I've tried rebooting c1sus, restarting the daqd process on fb, all to no avail. I can ssh / ping all of the computers, but not get the models running. Restarting the models also doesn't help.

c1iscex's IOP dmesg:
[ 38.626001] c1x01: Triggered the ADC
[ 39.626001] c1x01: timeout 0 1000000
[ 39.626001] c1x01: exiting from fe_code()
c1ioo's IOP has the same ADC channel hopping error as c1sus'.
|
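Checking several front ends for the messages quoted above can be sketched as a small parser over saved dmesg text. This is a minimal sketch, assuming lines have the `[ time] model: message` shape seen in the excerpts; the keyword list and the function name `find_problems` are illustrative choices, not an exhaustive catalogue of front-end errors.

```python
import re

# Assumed line shape, taken from the dmesg excerpts above:
#   [   68.200063] c1x02: Channel Hopping Detected on one or more ADC modules
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(\w+):\s+(.*)$")

# Illustrative keyword list, not a complete set of front-end failures.
PROBLEM_PATTERNS = ("Channel Hopping Detected", "timeout", "exiting from fe_code")


def find_problems(dmesg_text):
    """Return (timestamp, model, message) for lines that match a problem pattern."""
    problems = []
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp, model, message = match.groups()
        if any(pattern in message for pattern in PROBLEM_PATTERNS):
            problems.append((float(stamp), model, message))
    return problems
```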
7922
|
Sat Jan 19 18:23:31 2013 |
rana | Update | Computers | All front ends but c1lsc are down |
After sshing into several machines and doing 'sudo shutdown -r now', some of them came back and ran their processes.
After hitting the reset button on the RFM switch, their diagnostic lights came back. After restarting the Dolphin task on fb:
"sudo /etc/init.d/dis_networkmgr restart"
the Dolphin diagnostic lights came up green on the FE status screen.
iscex still wouldn't come up. The awgtpman tasks on there keep trying to start but then stop due to not finding ADCs.
Then power cycled the IO Chassis for EX and then the awgtpman log files changed, but still no green lights. Then tried a soft reboot on fb and now it's not booting correctly.
Hardware lights are on, but I can't telnet into it. Tried power cycling it once or twice, but no luck.
Probably Jamie will have to hook up a keyboard and monitor to it, to find out why its not booting.

P.S. The snapshot scripts in the yellow button don't work and the MEDM screen itself is missing the time/date string on the top. |
7963
|
Wed Jan 30 13:50:27 2013 |
Jenne | Update | Computers | c1iscex still down |
[Koji, Jenne]
We noticed that the iscex computer is still down, but the IOP is (was) running. When we sat down to look at it, c1x01 was 'breathing', had a non-zero CPU_METER time, and the error was 0x4000, which I've never seen before. The fb connection was still red though. Also, it is claiming that its sync source is 1pps, not TDS like it usually is.
Since things were different, Koji restarted the 2 other models running on iscex, with no resulting change. We then did a 'rtcds restart all', and the IOP is no longer breathing, and the error message has changed to 0xbad. The sync source is still 1pps.
Moral of the story: c1iscex is still down, but temporarily showed signs of life that we wanted to record. |
7970
|
Thu Jan 31 10:23:39 2013 |
Jamie | Update | Computers | c1iscex still down |
Quote: |
[Koji, Jenne]
We noticed that the iscex computer is still down, but the IOP is (was) running. When we sat down to look at it, c1x01 was 'breathing', had a non-zero CPU_METER time, and the error was 0x4000, which I've never seen before. The fb connection was still red though. Also, it is claiming that its sync source is 1pps, not TDS like it usually is.
Since things were different, Koji restarted the 2 other models running on iscex, with no resulting change. We then did a 'rtcds restart all', and the IOP is no longer breathing, and the error message has changed to 0xbad. The sync source is still 1pps.
Moral of the story: c1iscex is still down, but temporarily showed signs of life that we wanted to record.
|
There's definitely a timing issue with this machine. I looked at it a bit yesterday. I'll try to get to it by the end of the week. |
8036
|
Fri Feb 8 12:43:26 2013 |
yuta | Update | Computers | videocapture.py now supports movie capturing |
I updated /opt/rtcds/caltech/c1/scripts/general/videoscripts.py so that it supports movie capturing. It saves captured images (bmp) and movies (mp4) in /users/sensoray/SensorayCaptures/ directory.
I also updated /opt/rtcds/caltech/c1/scripts/pylibs/pyndslib.py because /usr/bin/lalapps_tconvert is not working and now /usr/bin/tconvert works.
However, tconvert doesn't run on ottavia, so I need Jamie to fix it.
videocapture.py -h:
Usage:
videocapture.py [cameraname] [options]
Example usage:
videocapture.py MC2F -s 320x240 -t off
(Captures image of MC2F with the size of 320x240, without timestamp on the image. MUST RUN ON PIANOSA!)
videocapture.py AS -m 10
(Captures 10 sec movie of AS with the size of 720x480. MUST RUN ON PIANOSA!)
Options:
-h, --help show this help message and exit
-s SIZE specify image size [default: 720x480]
-t TIMESTAMP_ONOFF timestamp on or off [default: on]
-m MOVLENGTH specify movie length (in sec; takes movie if specified) [default: 0] |
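For readers who want to script around these options, the option list above can be mirrored with a small parser. This is a hypothetical reconstruction built only from the help text, not the actual code of videocapture.py; the destination names (`size`, `timestamp`, `movlength`) and the helper `parse_size` are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical argparse equivalent of the option list shown above."""
    parser = argparse.ArgumentParser(prog="videocapture.py")
    parser.add_argument("cameraname", help="camera to capture, e.g. MC2F or AS")
    parser.add_argument("-s", dest="size", default="720x480",
                        help="image size as WIDTHxHEIGHT")
    parser.add_argument("-t", dest="timestamp", choices=("on", "off"),
                        default="on", help="timestamp on or off")
    parser.add_argument("-m", dest="movlength", type=int, default=0,
                        help="movie length in seconds; 0 captures a still image")
    return parser


def parse_size(size):
    """Split '320x240' into the integer pair (320, 240)."""
    width, height = size.lower().split("x")
    return int(width), int(height)
```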
8062
|
Mon Feb 11 18:44:34 2013 |
Jamie | Update | Computers | passwerdz changed |
Password for nodus and all control room workstations has been changed. Look for new one in usual place.
We will try to change the password on all the RTS machines soon. For the moment, though, they remain with the old passwerd. |
8088
|
Fri Feb 15 15:21:07 2013 |
Jamie | Update | Computers | c1iscex IO-chassis dead |
It appears that the c1iscex IO-chassis is either dead or in a very bad state. The PCIe interface card in the IO-chassis is showing four red lights, where it's supposed to be showing a dozen or so green lights. Obviously this is going to prevent anything from running.
We've had power issues with this chassis before, so possibly that's what we're running into now. I'll pull the chassis and diagnose asap.
|
8140
|
Fri Feb 22 20:28:17 2013 |
Jamie | Update | Computers | linux1 dead, then undead |
At around 2:30pm today something brought down most of the martian network. All control room workstations, nodus, etc. were unresponsive. After poking around for a bit I finally figured it had to be linux1, which serves the NFS filesystem for all the important CDS stuff. linux1 was indeed completely unresponsive.
Looking closer I noticed that the Fibrenetix FX-606-U4 SCSI hardware RAID device connected to linux1 (see #1901), which holds the CDS network filesystem, was showing "IDE Channel #4 Error Reading" on its little LCD display. I assumed this was the cause of the linux1 crash.
I hard shutdown linux1, and powered off the Fibrenetix device. I pulled the disk from slot 4 and replaced it with one of the spares we had in the control room cabinets. I powered the device back up and it beeped for a while. Unfortunately the device was requiring a password to access it from the front panel, and I could find no manual for the device in the lab, nor does the manufacturer offer the manual on its web site.
Eventually I was able to get linux1 fully rebooted (after some fscks) and it seemed to mount the hardware RAID (as /dev/sdc1) fine. That brought the NFS back. I had to reboot nodus to get it recovered, but all the control room and front-end linux machines seemed to recover on their own (although the front-ends did need an mxstream restart).
The remaining problem is that the linux1 hardware RAID device is still currently inaccessible, and it's not clear to me that it has actually synced the new disk that I put in it. In other words I have very little confidence that we actually have an operational RAID for /opt/rtcds. I've contacted the LDAS guys (i.e. Dan Kozak) who are managing the 40m backup to confirm that the backup is legit. In the meantime I'm going to spec out some replacement disks onto which to copy /opt/rtcds, and also so that we can get rid of this old SCSI RAID thing. |
Attachment 1: FX-606-U4_1205.pdf
|
|
8141
|
Sat Feb 23 00:34:28 2013 |
yuta | Update | Computers | crontab in op340m deleted and restored (maybe) |
I accidentally overwrote crontab in op340m with an empty file.
By checking /var/cron in op340m, I think I restored it.
But somehow, autolockMCmain40m does not work in cron job, so it is currently running by nohup.
What I did:
1. I ssh-ed op340m to edit crontab to change MC autolocker to usual power mode. I used "crontab -e", but it did not show anything. I exited emacs and op340m.
2. Rana found that the file size of crontab went 0 when I did "crontab -e".
3. I found my elog #6899 and added one line to crontab
55 * * * * /opt/rtcds/caltech/c1/scripts/general/scripto_cron /opt/rtcds/caltech/c1/scripts/MC/autolockMCmain40m >/cvs/cds/caltech/logs/scripts/mclock.cronlog 2>&1
4. It didn't run correctly, so Rana used his hidden power "nohup" to run autolockMCmain40m in background.
5. Koji's hidden magic "/var/cron/log" gave me inspiration about what was in crontab. So, I made a new crontab in op340m like this;
34 * * * * /opt/rtcds/caltech/c1/scripts/general/scripto_cron /opt/rtcds/caltech/c1/scripts/MC/autolockMCmain40m >/cvs/cds/caltech/logs/scripts/mclock.cronlog 2>&1
55 * * * * /opt/rtcds/caltech/c1/scripts/general/scripto_cron /opt/rtcds/caltech/c1/scripts/PSL/FSS/RCthermalPID.pl >/cvs/cds/caltech/logs/scripts/RCthermalPID.cronlog 2>&1
07 * * * * /opt/rtcds/caltech/c1/scripts/general/scripto_cron /opt/rtcds/caltech/c1/scripts/PSL/FSS/FSSSlowServo >/cvs/cds/caltech/logs/scripts/FSSslow.cronlog 2>&1
00 * * * * /opt/rtcds/caltech/c1/burt/autoburt/burt.cron >> /opt/rtcds/caltech/c1/burt/burtcron.log
13 * * * * /cvs/cds/caltech/conlog/bin/check_conlogger_and_restart_if_dead
14,44 * * * * /opt/rtcds/caltech/c1/scripts/SUS/rampdown.pl > /dev/null 2>&1
6. It looks like some of them started running, but I haven't checked if they are working or not. We need to look into them.
Moral of the story:
crontab needs backup. |
8144
|
Sat Feb 23 14:04:07 2013 |
Koji | Update | Computers | apache restarted (Re: linux1 dead, then undead) |
apache has been restarted.
How to: search "apache" on the 40m wiki
Quote: |
I had to reboot nodus to get it recovered
|
|
8146
|
Sat Feb 23 15:26:26 2013 |
yuta | Update | Computers | crontab in op340m updated |
I found some daily cron jobs for op340m I missed last night. Also, I edited the timings of the hourly jobs to maintain consistency with the past. Some of them look old, but I will leave them as they are for now.
At least burt, FSSSlowServo and autolockMCmain40m seem to be working now.
If you notice something is missing, please add it to crontab.
07 * * * * /opt/rtcds/caltech/c1/burt/autoburt/burt.cron >> /opt/rtcds/caltech/c1/burt/burtcron.log
13 * * * * /opt/rtcds/caltech/c1/scripts/general/scripto_cron /opt/rtcds/caltech/c1/scripts/PSL/FSS/FSSSlowServo >/cvs/cds/caltech/logs/scripts/FSSslow.cronlog 2>&1
14,44 * * * * /cvs/cds/caltech/conlog/bin/check_conlogger_and_restart_if_dead
15,45 * * * * /opt/rtcds/caltech/c1/scripts/SUS/rampdown.pl > /dev/null 2>&1
55 * * * * /opt/rtcds/caltech/c1/scripts/general/scripto_cron /opt/rtcds/caltech/c1/scripts/MC/autolockMCmain40m >/cvs/cds/caltech/logs/scripts/mclock.cronlog 2>&1
59 * * * * /opt/rtcds/caltech/c1/scripts/general/scripto_cron /opt/rtcds/caltech/c1/scripts/PSL/FSS/RCthermalPID.pl >/cvs/cds/caltech/logs/scripts/RCthermalPID.cronlog 2>&1
00 0 * * * /var/scripts/ntp.sh > /dev/null 2>&1
00 4 * * * /opt/rtcds/caltech/c1/scripts/RGA/RGAlogger.cron >> /cvs/cds/caltech/users/rward/RGA/RGAcron.out 2>&1
00 6 * * * /cvs/cds/scripts/backupScripts.pl
00 7 * * * /opt/rtcds/caltech/c1/scripts/AutoUpdate/update_conlog.cron |
8147
|
Sat Feb 23 15:46:16 2013 |
rana | Update | Computers | crontab in op340m updated |
According to Google, you can add a line in the crontab to backup the crontab by having the cronback.py script be in the scripts/ directory. It needs to save multiple copies, or else when someone makes the file size zero it will just write a zero size file onto the old backup. |
8181
|
Wed Feb 27 11:22:54 2013 |
yuta | Update | Computers | backup crontab |
I made a simple script to backup crontab (/opt/rtcds/caltech/c1/scripts/crontab/backupCrontab).
#!/bin/bash
crontab -l > /opt/rtcds/caltech/c1/scripts/crontab/crontab_$(hostname).$(date '+%Y%m%d%H%M%S')
I put this script into op340m crontab.
00 8 * * * /opt/rtcds/caltech/c1/scripts/crontab/backupCrontab
It took me 30 minutes to write and check this one line script. I hate shell scripts. |
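The concern raised earlier, that a backup taken after the crontab has been emptied is useless, can be handled by refusing to write an empty backup. Below is a minimal Python sketch of that idea; it is not the script installed above, the function names are hypothetical, and the file naming simply mirrors the `crontab_<hostname>.<timestamp>` pattern used by the shell script.

```python
import subprocess
import time
from pathlib import Path


def save_crontab_backup(crontab_text, backup_dir, hostname, stamp=None):
    """Write a timestamped copy of the crontab; skip empty input.

    Returns the path written, or None when the crontab is empty, so an
    accidentally emptied crontab never produces a new backup file.
    """
    if not crontab_text.strip():
        return None
    stamp = stamp or time.strftime("%Y%m%d%H%M%S")
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    path = backup_dir / "crontab_{}.{}".format(hostname, stamp)
    path.write_text(crontab_text)
    return path


def read_crontab():
    """Return the output of `crontab -l` (empty string if it fails)."""
    result = subprocess.run(["crontab", "-l"], stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL, universal_newlines=True)
    return result.stdout if result.returncode == 0 else ""
```

Because every backup gets its own timestamped name, older copies are never overwritten, and the empty-input check means a wiped crontab leaves the most recent good copy as the newest file.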
8266
|
Mon Mar 11 10:20:36 2013 |
Max Horton | Summary | Computers | Attempted Smart UPS 2200 Battery Replacement |
Attempted Battery Replacement on Backup Power Supply in the Control Room:
I tried to replace the batteries in the Smart UPS 2200 with new batteries purchased by Steve. However, the power port wasn't compatible with the batteries. The battery cable's plug was too tall to fit properly into the Smart UPS port. New batteries must be acquired. Steve has pictures of the original battery (gray) and the new battery (blue) plugs, which look quite different (even though the company said the battery would fit).
The Correct battery connector is GRAY : APC RBC55 |
Attachment 1: upsB.jpg
|
|
Attachment 2: upsBa.jpg
|
|
8274
|
Tue Mar 12 00:35:56 2013 |
Jenne | Update | Computers | FB's RAID is beeping |
[Manasa, Jenne]
Manasa just went inside to recenter the AS beam on the camera after our Yarm spot centering exercises of the evening, and heard a loud beeping. We determined that it is the RAID attached to the framebuilder, which holds all of our frame data that is beeping incessantly. The top center power switch on the back (there are FOUR power switches, and 3 power cables, btw. That's a lot) had a red light next to it, so I power cycled the box. After the box came back up, it started beeping again, with the same front panel message:
H/W monitor power #1 failed.
Right now the fb is trying to stay connected to things, and we can kind of use dataviewer, but we lose our connection to the framebuilder every ~30 seconds or so. This rough timing estimate comes from how often we see the fb-related lights on the frontend status screen cycle from green to white to red back to green (or, how long do the lights stay green before going white again). We weren't having trouble before the RAID went down a few minutes ago, so I'm hopeful that once that's fixed, the fb will be fine.
In other news, just to make Jamie's day a little bit better, Dataviewer does not open on Pianosa or Rosalba. The window opens, but it stays a blank grey box. This has been going on for Pianosa for a few days, but it's new (to me at least) on Rosalba. This is different from the lack of ability to connect to the fb that Rossa and Ottavia are seeing. |
8278
|
Tue Mar 12 12:06:22 2013 |
Jamie | Update | Computers | FB recovered, RAID power supply #1 dead |
The framebuilder RAID is back online. The disk had been mounted read-only (see below) so daqd couldn't write frames, which was in turn causing it to segfault immediately, so it was constantly restarting.
The jetstor RAID unit itself has a dead power supply. This is not fatal, since it has three. It has three so it can continue to function if one fails. I have removed the bad supply and gave it to Steve so he can get a suitable replacement.
Some recovery had to be done on fb to get everything back up and running again. I ran into issues trying to do it on the fly, so I eventually just rebooted. It seemed to come back ok, except for something going on with daqd. It was reporting the following error upon restart:
[Tue Mar 12 11:43:54 2013] main profiler warning: 0 empty blocks in the buffer
It was spitting out this message about once a second, until eventually the daqd died. When it restarted it seemed to come back up fine. I'm not exactly clear what those messages were about, but I think it has something to do with not being able to dump its data buffers to disk. I'm guessing that this was a residual problem from the unmounted /frames, which somehow cleared on its own. Everything seems to be ok now.
Quote: |
Manasa just went inside to recenter the AS beam on the camera after our Yarm spot centering exercises of the evening, and heard a loud beeping. We determined that it is the RAID attached to the framebuilder, which holds all of our frame data that is beeping incessantly. The top center power switch on the back (there are FOUR power switches, and 3 power cables, btw. That's a lot) had a red light next to it, so I power cycled the box. After the box came back up, it started beeping again, with the same front panel message:
H/W monitor power #1 failed.
|
DO NOT DO THIS. This is what caused all the problems. The unit has three redundant power supplies, for just this reason. It was probably continuing to function fine. The beeping was just to tell you that there was something that needed attention. Rebooting the device does nothing to solve the problem. Rebooting in an attempt to silence beeping is not a solution. Shutting off the RAID unit is basically the equivalent of ripping out a mounted external USB drive. You can damage the filesystem that way. The disk was still functioning properly. As far as I understand it the only problem was the beeping, and there were no other issues. After you hard rebooted the device, fb lost its mounted disk and then went into emergency mode, which was to remount the disk read-only. It didn't understand what was going on, only that the disk seemed to disappear and then reappear. This was then what caused the problems. It was not the beeping, it was the restarting of the RAID that was mounted on fb.
Computers are not like regular pieces of hardware. You can't just yank the power on them. Worse yet is yanking the power on a device that is connected to a computer. DON'T DO THIS UNLESS YOU KNOW WHAT YOU'RE DOING. If the device is a disk drive, then doing this is a sure-fire way to damage data on disk.
|
8280
|
Tue Mar 12 14:51:00 2013 |
Steve | Update | Computers | buy warranty or not ? |
Details of the warranties are posted on wiki power supply cost, warranty described, cost
.......I’ve also attached a warranty renewal quote. A 1 year warranty renewal is usually $.... per year, but we gave you special pricing of $.... / year if you renew both units. This pricing is also special due to the fact that both warranties expired awhile ago. We usually require that the warranty renewal begin on the date of expiration, but we will waive this for you this time if both are renewed.
JetStor SATA 416S, SN: SB09040111A3 – expired 04/24/2012 (3 years old)
JetStor SATA 516F, SN: SB09080016P – expired on 08/21/2012........
Are we keeping it for another 2 years? Buy warranty or buy better storage?
|
8324
|
Thu Mar 21 10:29:12 2013 |
Manasa | Update | Computers | Computers down since last night |
I'm trying to figure out what went wrong last night. But the morning status...the computers are down.

|
Attachment 1: down.png
|
|
8325
|
Thu Mar 21 12:04:05 2013 |
Manasa | Update | Computers | Fixed |
All FE computers are back.
Restart procedure:
0a. Restart the frame builder: telnet fb 8087, then type shutdown
0b. Restart mx_stream from the FE overview screen
1. I ssh'ed to each computer (c1lsc, c1ioo, c1iscex, c1iscey).
2. I ran 'sudo shutdown -r (computername)' on each one. They came back ON.
3. While c1ioo was rebooting, c1sus shut down (for reasons I don't know). I could not ping or ssh to c1sus after this.
4. I went in and switched the c1sus computer OFF and back ON, after which I could ssh to it.
5. I did the same reboot procedure for c1sus.
6. I had to restart some of the models individually.
(i) ssh to the computer running the model
(ii) rtcds restart 'model name'
7. All computers are back now.

|
8326
|
Thu Mar 21 12:33:51 2013 |
rana | Update | Computers | Fixed |
Please stop power cycling computers. This is not an acceptable operation (as Jamie already wrote before). When you don't know what to do besides power cycling the computer, just stop and do something else or call someone who knows more. Every time you kill the power to a computer you are taking a chance on damaging it or corrupting some hard drive.
In this case, the right thing to do would be to hook up the external keyboard and monitor directly to c1sus to diagnose things.
NO MORE TOUCHING THE POWER BUTTON. |
8334
|
Mon Mar 25 09:52:22 2013 |
Jenne | Update | Computers | c1lsc mxstream won't restart |
Most of the front ends' mx streams weren't running, so I did the old mxstreamrestart on all machines (see elog 6574....the dmesg on c1lsc right now, at the top, has similar messages). Usually this mxstream restart works flawlessly, but today c1lsc isn't working. Usually to the right side of the terminal window I get an [ok] when things work. For the lsc machine today, I get [!!] instead.
After having learned from recent lessons, I'm waiting to hear from Jamie. |
8335
|
Mon Mar 25 11:42:45 2013 |
Jamie | Update | Computers | c1lsc mx_stream ok |
I'm not exactly sure what the problem was here, but I think it had to do with a stuck mx_stream process that wasn't being killed properly. I manually killed the process and it seemed to come up fine after that. The regular restart mechanisms should work now.
No idea what caused the process to hang in the first place, although I know the newer RCG (2.6) is supposed to address some of these mx_stream issues. |
8366
|
Thu Mar 28 10:44:30 2013 |
Manasa | Update | Computers | c1lsc down |
c1lsc was down this morning.
I restarted fb and c1lsc based on elog
Everything but c1oaf came back. I tried to restart c1oaf individually, but it didn't work.
Before:

After:

|
8367
|
Thu Mar 28 12:50:52 2013 |
Jenne | Update | Computers | c1lsc is fine |
Manasa told me that she did things in a different order than her old elog.
She had
(1) ssh'ed to c1lsc and did a remote shutdown / restart,
(2) restarted fb,
(3) restarted the mxstream on c1lsc,
(4) restarted each model individually in some order that I forgot to ask.
However, with the situation as in her "before" screenshot, all that needed to be done was restart the mxstream process on c1lsc.
Anyhow, when I looked at the OAF model, it was complaining of "no sync", so I restarted the model, and it came back up fine. All is well again. |
8374
|
Fri Mar 29 17:24:43 2013 |
Jamie | Update | Computers | FB RAID power supply replaced |
Steve ordered a replacement for the FB JetStor power supply that failed a couple of weeks ago. I just installed it and it looks fine. |
8394
|
Tue Apr 2 20:52:35 2013 |
rana | Update | Computers | iMac bashed |
I changed the default shell on our control room iMac to bash. Since we're really, really using bash as the shell for LIGO, we might as well get used to it. As we do this for the workstations, some things will fail, but we can adopt Jamie's private .bashrc to get started and then fix it up later. |
8398
|
Wed Apr 3 01:32:04 2013 |
Jenne | Update | Computers | updated EPICS database (channels selected for saving) |
I modified /opt/rtcds/caltech/c1/chans/daq/C0EDCU.ini to include the C1:LSC-DegreeOfFreedom_TRIG_MON channels. These are the same channels that cause the LSC screen trigger indicators to light up.
I vaguely followed Koji's directions in elog 5991, although I didn't add new grecords, since these channels are already included in the .db file as a result of EpicsOut blocks in the simulink model. So really, I only did Step 2. I still need to restart the framebuilder, but locking (attempt at locking) is happening.
The idea here is that we should be able to search through this channel, and when we get a trigger, we can go back and plot useful signals (PDs, error signals, control signals, ...), and try to figure out why we're losing lock.
Rana tells me that this is similar to an old LockAcq script that would run DTT and get data.
EDIT: I restarted the daqd on the fb, and I now see the channel in dataviewer, but I can only get live data, no past data, even though it says that it is (16,float). Here's what Dataviewer is telling me:
Connecting to NDS Server fb (TCP port 8088)
Connecting.... done
read(); errno=0
LONG: DataRead = -1
No data found
read(); errno=9
read(); errno=9
T0=13-03-29-08-59-43; Length=432010 (s)
No data output.
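The search described above (find the times at which a TRIG_MON channel changes state, then look at other signals around those times) could be sketched as follows. This is only an illustration, not the script we use: the function names are hypothetical, and it assumes the trigger monitor reads about 1 when triggered and about 0 otherwise, sampled at the 16 Hz rate that Dataviewer reports for this channel.

```python
from typing import List, Sequence, Tuple


def find_trigger_edges(samples: Sequence[float],
                       threshold: float = 0.5) -> List[Tuple[int, str]]:
    """Return (sample_index, 'rise' or 'fall') for every threshold crossing.

    Assumption: the trigger monitor is ~1 when triggered and ~0 otherwise.
    """
    if not samples:
        return []
    edges = []
    prev = samples[0] > threshold
    for i, value in enumerate(samples[1:], start=1):
        now = value > threshold
        if now and not prev:
            edges.append((i, "rise"))   # trigger turned on (lock acquired)
        elif prev and not now:
            edges.append((i, "fall"))   # trigger turned off (lock lost)
        prev = now
    return edges


def edge_times(edges: Sequence[Tuple[int, str]],
               t0_gps: float,
               rate_hz: float = 16.0) -> List[Tuple[float, str]]:
    """Convert sample indices to GPS times, given the start time and rate."""
    return [(t0_gps + index / rate_hz, kind) for index, kind in edges]


if __name__ == "__main__":
    # Made-up data for illustration only, not real channel data.
    data = [0, 0, 1, 1, 1, 0, 0, 1]
    for gps, kind in edge_times(find_trigger_edges(data), t0_gps=1049050000):
        print(gps, kind)
```

The 'fall' times are the lock losses, so those are the GPS times around which one would fetch the PD, error, and control signals.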
|
8400
|
Wed Apr 3 14:45:34 2013 |
Jamie | Update | Computers | updated EPICS database (channels selected for saving) |
Quote: |
I modified /opt/rtcds/caltech/c1/chans/daq/C0EDCU.ini to include the C1:LSC-DegreeOfFreedom_TRIG_MON channels. These are the same channels that cause the LSC screen trigger indicators to light up.
I vaguely followed Koji's directions in elog 5991, although I didn't add new grecords, since these channels are already included in the .db file as a result of EpicsOut blocks in the simulink model. So really, I only did Step 2. I still need to restart the framebuilder, but locking (attempt at locking) is happening.
The idea here is that we should be able to search through this channel, and when we get a trigger, we can go back and plot useful signals (PDs, error signals, control signals, ...), and try to figure out why we're losing lock.
Rana tells me that this is similar to an old LockAcq script that would run DTT and get data.
EDIT: I restarted the daqd on the fb, and I now see the channel in dataviewer, but I can only get live data, no past data, even though it says that it is (16,float). Here's what Dataviewer is telling me:
Connecting to NDS Server fb (TCP port 8088)
Connecting.... done
read(); errno=0
LONG: DataRead = -1
No data found
read(); errno=9
read(); errno=9
T0=13-03-29-08-59-43; Length=432010 (s)
No data output.
|
I seem to be able to retrieve these channels ok from the past:
controls@pianosa:/opt/rtcds/caltech/c1/scripts 0$ tconvert 1049050000
Apr 03 2013 18:46:24 UTC
controls@pianosa:/opt/rtcds/caltech/c1/scripts 0$ ./general/getdata -s 1049050000 -d 10 --noplot C1:LSC-PRCL_TRIG_MON
Connecting to server fb:8088 ...
nds_logging_init: Entrynds_logging_init: Exit
fetching... 1049050000.0
Hit any key to exit:
controls@pianosa:/opt/rtcds/caltech/c1/scripts 0$
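For reference, the GPS-to-UTC conversion that tconvert performs in the session above can be checked by hand. The sketch below is not how tconvert is implemented; it assumes a fixed GPS-UTC offset of 16 leap seconds, which holds only for dates between 2012-07-01 and 2015-06-30, and the function name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# GPS time counts seconds from 1980-01-06 00:00:00 UTC and ignores leap seconds.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

# Assumption: GPS was ahead of UTC by 16 s from 2012-07-01 through 2015-06-30.
# Dates outside that window need a different offset.
LEAP_SECONDS_2013 = 16


def gps_to_utc(gps_seconds: float,
               leap_seconds: int = LEAP_SECONDS_2013) -> datetime:
    """Convert a GPS timestamp to a timezone-aware UTC datetime."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - leap_seconds)


if __name__ == "__main__":
    print(gps_to_utc(1049050000).strftime("%b %d %Y %H:%M:%S UTC"))
    # Prints: Apr 03 2013 18:46:24 UTC
```

That agrees with the tconvert output, so the getdata request was for a time a few hours before this entry was posted.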
Maybe DTT just needed to be reloaded/restarted? |
8444
|
Thu Apr 11 11:58:21 2013 |
Jenne | Update | Computers | LSC whitening c-code ready |
The big hold-up with getting the LSC whitening triggering ready has been a problem with running the c-code on the front end models. That problem has now been solved (Thanks Alex!), so I can move forward.
The background:
We want the RFPD whitening filters to be OFF while in acquisition mode, but after we lock, we want to turn the analog whitening (and the digital compensation) ON. The difference between this and the other DoF and filter module triggers is that we must parse the input matrix to see which PD is being used for locking at that time. It is the c-code that parses this matrix that has been causing trouble. I have been testing this code on the c1tst.mdl, which runs on the Y-end computer. Every time I tried to compile and run the c1tst model, the entire Y-end computer would crash.
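To make the matrix-parsing step concrete, here is a rough sketch of the logic in Python. It is not the front-end c-code. The layout (one row per degree of freedom, one column per PD) and the rule for a PD shared between degrees of freedom (whitening is requested only when every degree of freedom using that PD is triggered) are assumptions made for illustration, and all names are hypothetical.

```python
from typing import List, Sequence


def whitening_requests(input_matrix: Sequence[Sequence[float]],
                       dof_triggered: Sequence[bool]) -> List[bool]:
    """Decide, per PD, whether whitening should be requested ON.

    input_matrix[dof][pd] is the (assumed) gain from PD `pd` into degree of
    freedom `dof`; a nonzero entry means that PD is used for that DoF.
    dof_triggered[dof] is True once that DoF's lock trigger has fired.
    """
    n_pds = len(input_matrix[0]) if input_matrix else 0
    requests = []
    for pd in range(n_pds):
        # Which degrees of freedom currently take signal from this PD?
        users = [dof for dof, row in enumerate(input_matrix) if row[pd] != 0.0]
        # Assumed rule: ON only if the PD is in use and all its users are locked.
        requests.append(bool(users) and all(dof_triggered[d] for d in users))
    return requests


if __name__ == "__main__":
    # Two DoFs, three PDs: DoF 0 uses PD 0, DoF 1 uses PD 1, PD 2 is unused.
    matrix = [[1.0, 0.0, 0.0],
              [0.0, 0.5, 0.0]]
    print(whitening_requests(matrix, dof_triggered=[True, False]))
```

In this example only the first PD gets whitening, since its degree of freedom is the only one that has triggered.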
The solution:
Alex came over to look at things with Jamie and me. In the 2.5 version of the RCG (which we are still using), there is an optimization flag "-O3" in the make file. This optimization, while it can make models run a little faster, has been known in the past to cause problems. Here at the 40m, our make files had an if-statement, so that the c1pem model would compile using the "-O" optimization flag instead, so clearly we had seen the problem here before, probably when Masha was here and running the neural network code on the pem model. In the RCG 2.6 release, all models are compiled using the "-O" flag. We tried compiling the c1tst model with this "-O" optimization, and the model started and the computer is just fine. This solved the problem.
Since we are going to upgrade to RCG 2.6 in the near-ish future anyway, Alex changed our make files so that all models will now compile with the "-O" flag. We should monitor other models when we recompile them, to make sure none of them start running long with the different optimization.
The future:
Implement LSC whitening triggering! |
8479
|
Tue Apr 23 22:10:54 2013 |
rana | Update | Computers | Nancy |
controls@rosalba:/users/rana/docs 0$ svn resolve --accept working nancy
Resolved conflicted state of 'nancy' |
8529
|
Sat May 4 00:21:00 2013 |
rana | Configuration | Computers | workstation updates |
Koji and I went into "Update Manager" on several of the Ubuntu workstations and unselected the "check for updates" button. This is to prevent the machines from asking to be upgraded so frequently - I am concerned that someone might be tempted to upgrade the workstations to Ubuntu 12.
We didn't catch them all, so please take a moment to check that this is the case on all the laptops you are using and make it so. We can then apply the updates in a controlled manner once every few months. |