40m Log, Page 76 of 344
ID Date Author Type Category Subject
10793   Fri Dec 12 19:38:49 2014   diego   Update   Computer Scripts / Programs   Status of the new nodus

 Quote: [Diego, Steve] We ran a Cat 6+ Ethernet cable from the 1X7 rack (where the new nodus is located) to the fast GC switch in the control room rack; now I will learn how to set up the 'outside world' network, iptables, and the like. As a reminder, the current hardware/software status is posted in elog 10697; if additions or corrections are needed, let me know. After I check a couple of things, we can use the new nodus (which is currently known in the martian network as rosalba) as a local test to see that everything is working. After that (and, mostly, after I have the network working), we will sync the data from the old nodus to the new one and make the switch.

[Diego, EricQ]

Update: work is almost completed; the old nodus is still online, as I don't feel confident making the switch and leaving it on its own for the weekend. However, the new nodus is online with the IP address 131.215.114.87, so everyone can check that everything works. From my tests I can say that:

Once everything is in place, I will save every reasonably important configuration file of nodus into the svn.

Reminder: every change made while accessing the 131.215.114.87 machine will be purged during the sync and switch.

10797   Mon Dec 15 12:53:13 2014   ericq   Update   Computer Scripts / Programs   Status of the new nodus

Nodus (solaris) is dead, long live Nodus (ubuntu).

Diego and I are smoothing out the kinks as they appear, but the ELOG is running smoothly on our new machine.

SVN is working, but your checkouts may complain because they expect https, and we haven't turned SSL on yet...

10798   Mon Dec 15 16:27:57 2014   diego   Update   Computer Scripts / Programs   Status of the new nodus

 Quote: Nodus (solaris) is dead, long live Nodus (ubuntu). Diego and I are smoothing out the kinks as they appear, but the ELOG is running smoothly on our new machine.  SVN is working, but your checkouts may complain because they expect https, and we haven't turned SSL on yet...

[Diego, EricQ]

SSL, https and backups are now working too!

A backup of nodus's configuration (with some explaining) will be done soon.

10805   Tue Dec 16 20:49:25 2014   diego   Update   Computer Scripts / Programs   Status of the new nodus

Quote:

 Quote: Nodus (solaris) is dead, long live Nodus (ubuntu). Diego and I are smoothing out the kinks as they appear, but the ELOG is running smoothly on our new machine.  SVN is working, but your checkouts may complain because they expect https, and we haven't turned SSL on yet...

[Diego, EricQ]

SSL, https and backups are now working too!

A backup of nodus's configuration (with some explaining) will be done soon.

Nodus should be visible again from outside the Caltech Network; I added some basic configuration for postfix and smartmontools; configuration files and instructions for everything are in the svn in the nodus_config folder

10815   Thu Dec 18 15:41:30 2014   ericq   Update   Computer Scripts / Programs   Offsite backups of /cvs/cds going again

Since the Nodus switch, the offsite backup scripts (scripts/backup/rsync.backup) had not been running successfully. I tracked it down to the weird NFS file ownership issues we've been seeing since making Chiara the fileserver. Since the backup script uses rsync's "archive" mode, which preserves ownership, permissions, modification dates, etc., not seeing the proper ownership made everything wacky.

Despite 99% of the searches you do about this problem saying you just need to match your user's uid and gid on the NFS client and server, it turns out NFSv4 doesn't use this mechanism at all, opting instead for some ID mapping service (idmapd), which I have no inclination to figure out at this time.

Thus, I've configured /etc/fstab on Nodus (and the control room machines) to use NFSv3 when mounting /cvs/cds. Now, all the file ownerships show up correctly, and the offsite backup of /cvs/cds is churning along happily.

10816   Thu Dec 18 16:21:08 2014   ericq   Update   Computer Scripts / Programs   scripts not being backed up!

I just stumbled upon this while poking around:

Since the great crash of June 2014, the scripts backup script has not been working on op340m. For some reason, it's only grabbing the PRFPMI folder, and nothing else.

Megatron seems to be able to run it. I've moved the job to megatron's crontab for now.

10817   Fri Dec 19 14:25:48 2014   diego   Update   Computer Scripts / Programs   elog restarted

elog was not responding, for unknown reasons, even though the elogd process on nodus was alive; anyway, I restarted it.

10819   Fri Dec 19 16:39:25 2014   ericq   Update   Computer Scripts / Programs   elog autostart

I've set up nodus to start the ELOG on boot, through /etc/init/elog.conf. Also, thanks to this, we don't need to use the start-elog.csh script any more. We can now just do:

controls@nodus:~  sudo initctl restart elog

I also tweaked some of the ELOG settings, so that image thumbnails are produced at higher resolution and quality.

10820   Fri Dec 19 16:59:32 2014   ericq   Update   Computer Scripts / Programs   FSS Slow servo moved to megatron

Given that op340m showed some undesired behavior, and that the FSS slow seems prone to railing lately, I've moved the FSS slow servo job over to megatron in the same way I did for the MC autolocker.

Namely, there is an upstart configuration (megatron:/etc/init/FSSslow.conf) that invokes the slow servo. The log file is in the same old place (/cvs/cds/caltech/logs/scripts), and the servo can be (re)started by running:

controls@megatron|~ > sudo initctl start FSSslow

Maybe this won't really change the behavior. We'll see.

10824   Fri Dec 19 20:44:23 2014   Jenne   Update   Computer Scripts / Programs   FSS Slow servo moved to megatron

Today Q moved the FSS slow servo over to some init thing on megatron, and some time ago he did the same thing to the MC auto locker script. It isn't working though. Even though megatron was rebooted, neither script started up automatically.

As Diego mentioned in elog 10823, we ran sudo initctl start MCautolocker and sudo initctl start FSSslow, and the blinky lights for both of the scripts started. However, that seems to be the only thing that the scripts are doing. The MC auto locker is not detecting locklosses, and is not resetting things to allow the MC to relock. The MC is happy to lock if I do it by hand though. Similarly, the blinky light for the FSS is on, but the PSL temperature is moving a lot faster than normal. I expect that it will hit one of the rails in under an hour or so.

The MC autolocker and the FSS loop were both running earlier today, so maybe Q had some magic that he used when he started them up, that he didn't include in the elog instructions?

10825   Sat Dec 20 00:00:03 2014   ericq   Update   Computer Scripts / Programs   FSS Slow servo moved to megatron

I ssh'd in, and was able to run each script manually successfully. I ran the initctl commands, and they started up fine too. We've seen this kind of behavior before, generally after reboots; see ELOGS 10247 and 10572.

10840   Tue Dec 23 18:43:33 2014   diego   Update   Computer Scripts / Programs   FSS Slow servo moved to megatron

 Quote: I ssh'd in, and was able to run each script manually successfully. I ran the initctl commands, and they started up fine too. We've seen this kind of behavior before, generally after reboots; see ELOGS 10247 and 10572.

The plot shows the behaviour of the PSL-FSS_SLOWDC signal during the last week; the blue rectangle marks an approximate estimate of the time when the scripts were moved to megatron. Apart from the bad things that happened on Friday during the big crash, and the work ongoing since yesterday, it seems that something is not working well. The scripts on megatron are actually running, but I'll try and have a look at it.

10844   Fri Dec 26 18:20:42 2014   rana   Update   Computer Scripts / Programs   FSS Slow servo thresh change

 Quote: The plot shows the behaviour of the PSL-FSS_SLOWDC signal during the last week; the blue rectangle marks an approximate estimate of the time when the scripts were moved to megatron. Apart from the bad things that happened on Friday during the big crash, and the work ongoing since yesterday, it seems that something is not working well. The scripts on megatron are actually running, but I'll try and have a look at it.

I guessed that what was happening was that the SLOW servo settings were not restored to the right values after the code movements / reboots. The ON threshold for the servo was set at +6 counts and the channel is MC TRANS. Since the ADC noise on that channel is ~50 counts, this means that the servo keeps pushing the laser temperature off in some direction when the MC is unlocked.

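A rough numerical sketch of why a +6 count threshold misbehaves. It assumes, purely for illustration, that the noise on the unlocked MC TRANS channel is zero-mean Gaussian with an RMS of 50 counts; the function name and sample count are made up for this example and are not part of the servo code.

```python
import random


def fraction_above(threshold, noise_rms=50.0, n_samples=100_000, seed=0):
    """Fraction of pure-noise samples that exceed `threshold` (in counts).

    Assumption: the noise is zero-mean Gaussian with RMS `noise_rms`.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples) if rng.gauss(0.0, noise_rms) > threshold)
    return hits / n_samples


if __name__ == "__main__":
    for thresh in (6, 6666):
        frac = fraction_above(thresh)
        print(f"threshold {thresh:>5} counts: servo enabled {frac:.1%} of the time")
```

Under these assumptions a threshold of a few counts sits well inside the noise, so the servo would be enabled for a large fraction of the time with the MC unlocked, while a threshold of several thousand counts is never crossed by noise alone.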
I reset the threshold to +6666 counts (the aligned MC transmission is ~16000 for the TEM00 mode) so that it only turns on when we're in a good locked state.

10877   Thu Jan 8 03:40:50 2015   ericq   Update   Computer Scripts / Programs   ELOG 3.0

I've installed the very fresh ELOG 3.0, for nothing else than the new built in text editor which has a LATEX capable equation editor built right in.

Check out this sweet limerick:

$\int_{1}^{\sqrt[3]{3}}t^2 dt\, \textbf{cos}(\frac{3\pi}{9}) = \textbf{ln}(\sqrt[3]{e})$

10878   Thu Jan 8 09:24:40 2015   jamie   Update   Computer Scripts / Programs   ELOG 3.0

 Quote: I've installed the very fresh ELOG 3.0, for nothing else than the new built in text editor which has a LATEX capable equation editor built right in. Check out this sweet limerick: $\int_{1}^{\sqrt[3]{3}}t^2 dt\, \textbf{cos}(\frac{3\pi}{9}) = \textbf{ln}(\sqrt[3]{e})$

$\int \omega \epsilon \varepsilon \Gamma$

10889   Tue Jan 13 01:58:16 2015   ericq   Update   Computer Scripts / Programs   Cdsutils upgraded to 382

I've upgraded our cdsutils installation to v382; there have been some changes to pydv which will allow me to implement the auto y-scaling on our lockloss plots.

After some brief testing, things seem to still work...

10897   Tue Jan 13 18:47:20 2015   Chris   Configuration   Computer Scripts / Programs   instafoton setup

To use instafoton, right click an MEDM screen, open the Execute menu, and choose "Foton". Then click on the EPICS channel of a filter module as displayed on the screen.

Here's how it was set up:

• Install instafoton.py in /opt/rtcds/caltech/c1/scripts; edit paths to localize for the 40m
• Add instafoton to the MEDM_EXEC_LIST environment variable, newly defined in /ligo/cdscfg/workstationrc.sh:

export MEDM_EXEC_LIST="Edit this screen;medm &A &:Probe;probe &P &:Foton (Pick filter PV);/opt/rtcds/caltech/c1/scripts/instafoton.py &P &"

10898   Tue Jan 13 23:17:57 2015   Chris   Frogs   Computer Scripts / Programs   medm time machine

After recompiling medm with a patch for dumping screens (attached), I added a time machine to the right-click Execute menu. It's installed under /cvs/cds/caltech/users/wipf/src/medm_time_machine.

Dependencies include the python CA server module (pcaspy) and the latest nds2-client 0.11.2. These were also installed under my users directory, to avoid interfering with other tools.

10905   Thu Jan 15 18:06:34 2015   Jenne   Update   Computer Scripts / Programs   Installed kerberos on Rossa

I have installed kerberos on Rossa, so that I don't have to type my name and password every time I do an svn checkin, since I'm making some modifications and want to be sure that everything is checked in before and afterwards.

I ran sudo apt-get install krb5-user. I didn't put in a default_realm when it prompted me to during install, so I went into the /etc/krb5.conf file and changed the default_realm line to read

default_realm = LIGO.ORG

Now we can use kinit, but we must (as usual) remember to kdestroy our credentials when we're done.

As a reminder, to use:

> kinit albert.einstein
Password for albert.einstein@LIGO.ORG: (type your pw here)

When you're finished, run

> kdestroy

The end.

10906   Thu Jan 15 18:10:19 2015   jamie   Update   Computer Scripts / Programs   Installed kerberos on Rossa

 Quote: I have installed kerberos on Rossa, so that I don't have to type my name and password every time I do an svn checkin, since I'm making some modifications and want to be sure that everything is checked in before and afterwards. I ran sudo apt-get install krb5-user.
I didn't put in a default_realm when it prompted me to during install, so I went into the /etc/krb5.conf file and changed the default_realm line to read default_realm = LIGO.ORG. Now we can use kinit, but we must (as usual) remember to kdestroy our credentials when we're done. As a reminder, to use: > kinit albert.einstein Password for albert.einstein@LIGO.ORG: (type your pw here) When you're finished, run > kdestroy The end.

WARNING: since the workstations are all shared user, if you forget to kdestroy the next user can commit under your user ID. It might be good to set the timeout to be something much shorter than 24 hours, like maybe 1 or 2.

10907   Thu Jan 15 18:30:18 2015   Jenne   Update   Computer Scripts / Programs   Installed kerberos on Rossa

 Quote: WARNING: since the workstations are all shared user, if you forget to kdestroy the next user can commit under your user ID. It might be good to set the timeout to be something much shorter than 24 hours, like maybe 1 or 2.

Good call. I added a line ticket_lifetime = 3600, which should make it destroy the credentials after an hour.

10990   Mon Feb 9 17:23:17 2015   diego   Update   Computer Scripts / Programs   New laptops

I forgot to elog about these ones, my bad... The new/updated laptops are giada, viviana and paola; paola is already in the lab, while giada and viviana are in the control room waiting for a new home. The Pool of Names Wiki page has already been updated to reflect the changes.

11015   Thu Feb 12 15:21:37 2015   ericq   Update   Computer Scripts / Programs   netgpib updates

I've fixed the gpib scripts for the SR785 and AG4395A to output data in the same format as expected by older scripts when called by them.

In addition, there are now some easier modes of operation through the measurement scripts SRmeasure and AGmeasure. These are on the PATH for the main control room machines, and live in scripts/general/netgpib

Case 1: I manually set up a measurement on the analyzer, and just want to download / plot the data.

Make sure you have a yellow prologix box plugged in, and can ping the address it is labeled with (e.g. 'vanna'). Then, in the directory where you want to save the data, run:

SRmeasure -i vanna -f mydata --getdata --plot

This saves mydata_(datetime).txt and mydata_(datetime).pdf in the current directory.

In all cases, AGmeasure has the identical syntax. If the GPIB address is something other than 10, specify it with -a, but this is rarely the case.

Case 2: I want to remotely specify a measurement

Rather than a series of command line arguments, which may get lost to the mists of time, I've set the scripts up to use parameter files that serve as arguments to the scripts.

Get the templates for spectrum and TF measurements in your current directory by running

SRmeasure --template

Set the parameters with your text editor of choice, such as frequency span, filename output, whether to create a plot or not, then run the measurement:

SRmeasure SR785template.yml

Case 3: I want to compare my data with previous measurements

In the template parameter files, there is an option 'plotRefs', that will automatically plot the data from files whose filenames start with the same string as the current measurement.

If, in the "#" commented out header of the data file, there is a line that contains "memo:" or "timestamp:", it will include the text that follows in the plot legend.
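The header lookup described above can be sketched as follows. This is a hypothetical helper for illustration only, not the code in netgpibdata; the function name `legend_label` and the example file contents are assumptions.

```python
def legend_label(lines, keys=("memo:", "timestamp:")):
    """Build a legend string from '#'-commented header lines.

    For every header line containing one of `keys`, keep the text that
    follows the key. Scanning stops at the first non-comment line.
    """
    parts = []
    for line in lines:
        if not line.startswith("#"):
            break  # header is over, data starts here
        for key in keys:
            if key in line:
                parts.append(line.split(key, 1)[1].strip())
    return ", ".join(parts)


if __name__ == "__main__":
    # Made-up file contents, only to exercise the function.
    example = [
        "# SR785 measurement",
        "# memo: OLTF after gain change",
        "# timestamp: 2015-02-12 15:00",
        "1.0 0.5",
    ]
    print(legend_label(example))
```

A file without either key simply yields an empty label, so older data files would still plot, just without the extra legend text.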

There are also methods to remotely trigger an already configured measurement, or remotely reset an unresponsive instrument. Options can be perused by looking at the help in SRmeasure -h

I've tested, debugged, and used them for a bit, but wrinkles may remain. They've been svn40m committed, and I also set up a separate git repository for them at github.com/e-q/netgpibdata

11065   Wed Feb 25 11:01:05 2015   manasa   Update   Computer Scripts / Programs   New screen for FOL PID loop

Created a new medm screen C1ALS_FOL_PID.adl for FOL PID loop control in /medm/als/master/

This is not currently linked to the sitemap screen.

11076   Thu Feb 26 13:17:31 2015   ericq   Update   Computer Scripts / Programs   FB IO load

Over the past few days, I've occasionally been peeking at the framebuilder IO load to see if I could correlate anything with it, but it's usually been low when I looked. I.e. with daqd and all models running, the %wa time was in the few percents at most.

Just now, I was seeing some EPICS sluggishness, and sure enough, the %wa was in the 50-60 range. I used iostat -xmh 5 on the framebuilder to see that /dev/sda, the /frames drive, was at 100% utilization, which means it was reading and writing as fast as it possibly could.

I ssh'd over to nodus, and with iotop found that an rsync job was running (rsync -am --exclude .*.gwf full 131.215.114.19::40m/full), and its IO rates corresponded very closely to the data read rates on the framebuilder from /frames.

I killed the rsync process on nodus, and the %wa time on the framebuilder dropped to near zero. The ASS striptools, where I had noticed the sluggishness, immediately started updating faster.

While rsync is supposed to play nice with a system's IO demands, maybe it only knows about nodus's IO usage, not fb which is the underlying NFS server where the frames live. I think it would be good to throttle the bandwidth of these jobs to a specific bandwidth. 50MB/s seemed like too much, so maybe 10MB/s is ok?

11077   Thu Feb 26 13:55:59 2015   jamie   Update   Computer Scripts / Programs   FB IO load

We should use "ionice" to throttle the rsync. Use something like "ionice -c 3 rsync ..." to set the priority such that the rsync process will only work when there is no other IO contention. See "man ionice" for other options.
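A minimal sketch of how the two throttling ideas in this thread (ionice idle class plus an rsync bandwidth cap) could be combined when launching the job from a script. The helper name `throttled_rsync_argv` is hypothetical; `ionice -c 3` and rsync's `--bwlimit` are real options, but whether the idle class helps here depends on the IO scheduler of the machine it runs on, and it does not by itself limit load on a remote NFS server.

```python
import shlex


def throttled_rsync_argv(src, dest, bwlimit_kbps=None, extra_args=()):
    """Return an argv list that runs rsync in the idle IO scheduling class.

    bwlimit_kbps, if given, is passed to rsync's --bwlimit (units of 1024 bytes/s).
    """
    argv = ["ionice", "-c", "3", "rsync", "-a"]
    if bwlimit_kbps is not None:
        argv.append(f"--bwlimit={bwlimit_kbps}")
    argv.extend(extra_args)
    argv.extend([src, dest])
    return argv


if __name__ == "__main__":
    # Source and destination taken from the rsync command quoted in elog 11076;
    # 10240 KB/s corresponds to the 10 MB/s suggested there.
    argv = throttled_rsync_argv("full", "131.215.114.19::40m/full", bwlimit_kbps=10240)
    print(" ".join(shlex.quote(a) for a in argv))
    # To actually run it: subprocess.run(argv, check=True)
```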

11084   Fri Feb 27 11:20:49 2015   ericq   Update   Computer Scripts / Programs   iPython Notebook for LSC Sensing Matrix

 Quote: ** along the way, I noticed that the reason this notebook hasn't been working since last night is that someone sadly installed a new anaconda python distro today without telling anyone by ELOG. This new distro didn't have all the packages of the previous one. I've updated it with astropy and uncertainties packages.

Yesterday, I was trying to install a package with anaconda's package manager, conda, but it was crashing in some weird way. I wasn't able to fix it, which led me to create a fresh installation.

11161   Mon Mar 23 19:30:36 2015   rana   Update   Computer Scripts / Programs   rsync frames to LDAS cluster

The rsync job to sync our frames over to the cluster has been on a 20 MB/s BW limit for a while now.

Dan Kozak has now set up a cronjob to do this at 10 min after the hour, every hour. Let's see how this goes.

You can find the script and its logfile name by doing 'crontab -l' on nodus.

11162   Mon Mar 23 22:56:54 2015   ericq   Update   Computer Scripts / Programs   Nodus web things

Back when Diego and I were getting all of the web services running up on the new nodus, we inexplicably were not able to get the hosting of the public_html directory and wikis to share the same port of 30889. In ELOG 10793, we stated that public_html was hosted on a new port, 30888, though we didn't really bring much attention to that new fact.

Unbeknownst to us at the time, this broke other links/bookmarks/sites that people had been using. Koji pointed this out to me the other day, but I have not made any sort of resolution. For now, the public_html directory, and the sites therein, have been taken offline.

In other nodus news, Jamie has set up Nodus' apache service with a certificate for SSL goodness. We want to extend this to the ELOG, which uses a built-in webserver, rather than apache.

He set up a proxy at the https address which will later host the secured elog: https://nodus.ligo.caltech.edu:8081/

When we make the switch to running the ELOG with HTTPS on by default, living on port 8081, we will set up apache to point 8080 at 8081, to preserve all of the old links.

I.e. this change should effectively be invisible to ELOG users if we implement it right.

11166   Tue Mar 24 15:22:12 2015   manasa   Update   Computer Scripts / Programs   New slow channels for FOL

[Koji, Manasa]

I have created new slow channels for FOL. To do so, I have edited the fcreadout.db file in Domenica and the C0EDCU.ini file in /chans/daq

Domenica and frame builder were restarted after the edits.

Koji has moved the following files from /opt/rtcds/caltech/c1/chans/daq/ to /opt/rtcds/caltech/c1/chans/daq/trash  as they are not being used anymore.

C0EDCU1.ini C1EDCU_X00.ini C1EDCU_X10.ini C1EDCU_X14.ini C1X00.ini C1X10.ini C1X99.ini

11189   Wed Apr 1 11:42:30 2015   manasa   Update   Computer Scripts / Programs   PID script in python

Since none of us here are experts in Perl, I have put together a python script for a simple PID controller. This can be imported into any main script that will run the actual PID loop. The script, PID.py, exists in /scripts/general/
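For readers unfamiliar with the idea, here is a minimal sketch of what an importable PID controller can look like. It is an illustration only and is not the contents of the lab's PID.py; the class name, method names, and gains below are assumptions.

```python
class PID:
    """Minimal discrete PID controller (illustrative sketch)."""

    def __init__(self, kp, ki, kd, setpoint=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self._integral = 0.0
        self._prev_error = None

    def update(self, measurement, dt):
        """Return the control output for one time step of length dt (seconds)."""
        error = self.setpoint - measurement
        self._integral += error * dt
        if self._prev_error is None:
            derivative = 0.0  # no derivative term on the very first sample
        else:
            derivative = (error - self._prev_error) / dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


if __name__ == "__main__":
    # Toy first-order plant (time constant 1 s), only to show the call pattern.
    pid = PID(kp=0.8, ki=0.4, kd=0.0, setpoint=1.0)
    x, dt = 0.0, 0.1
    for _ in range(200):
        u = pid.update(x, dt)
        x += (u - x) * dt
    print(f"plant output after 20 s: {x:.3f}")
```

A main script would import the class, read the error signal from its channel on each iteration, and write the returned value to the actuator; integrator limits and output clamping are deliberately left out of this sketch.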

11220   Wed Apr 15 15:14:18 2015   ericq   Update   Computer Scripts / Programs   CDSutils upgraded to v474

CDSutils has been updated to the newest version, 474; there are some matrix interface methods that will make our locking scripts easier to read, modify, and maintain.

I've tested the ALS and CARM down scripts, and the LSC offsets script, and they all work fine.

11221   Wed Apr 15 20:54:18 2015   Jenne   Update   Computer Scripts / Programs   CDSutils upgrade bad

The SUS align/misalign scripts don't work after the new CDS utils upgrade.

I don't know if it's looking for the _SWSTAT channel to confirm that the offset has been turned on/off, or if it is trying to set that channel, to do the switching, but either way, the script is failing.  Recall that our version of the RCG still has _SW1R and _SW2R, rather than the newer _SWSTAT for the filter banks.

ezca.ezca.EzcaConnectError: Could not connect to channel (timeout=2s): C1:SUS-PRM_OL_PIT_SWSTAT

11223   Wed Apr 15 23:29:08 2015   Jenne   Update   Computer Scripts / Programs   CDSutils upgrade undone

Q remotely reverted this change.  Scripts seem to work again.

 Quote: The SUS align/misalign scripts don't work after the new CDS utils upgrade.  I don't know if it's looking for the _SWSTAT channel to confirm that the offset has been turned on/off, or if it is trying to set that channel, to do the switching, but either way, the script is failing.  Recall that our version of the RCG still has _SW1R and _SW2R, rather than the newer _SWSTAT for the filter banks.  ezca.ezca.EzcaConnectError: Could not connect to channel (timeout=2s): C1:SUS-PRM_OL_PIT_SWSTAT Q, can you please (please, please, pretty please) undo this upgrade, and then hold off on any further changes to the system for a few weeks?

11240   Thu Apr 23 21:05:23 2015   rana   Update   Computer Scripts / Programs   CDSutils upgrade undone

Q: please update this Wiki page with the go-back procedure:

11252   Sun Apr 26 00:56:21 2015   rana   Summary   Computer Scripts / Programs   problems with new restart procedures for elogd and apache

Since the nodus upgrade, Eric/Diego changed the old csh restart procedures to be more UNIX standard. The instructions are in the wiki.

After doing some software updates on nodus today, apache and elogd didn't come back OK. Maybe because of some race condition, elog tried to start but didn't get apache. Apache couldn't start because it found that someone was already binding the ELOGD port. So I killed ELOGD several times (because it kept trying to respawn). Once it stopped trying to come back I could restart Apache using the Wiki instructions. But the instructions didn't work for ELOGD, so I had to restart that using the usual .csh script way that we used to use.

11263   Wed Apr 29 18:12:42 2015   rana   Update   Computer Scripts / Programs   nodus update

Installed libmotif3 and libmotif4 on nodus so that we can run dataviewer on there.

Also, the lscsoft stuff wasn't installed for apt-get, so I did so following the instructions on the DASWG website:

Then I installed libmetaio1, libfftw3-3. Now, rather than complain about missing libraries, diaggui just silently dies.

Then I noticed that the awggui error message tells us to use 'ssh -Y' instead of 'ssh -X'. Using that I could run DTT on nodus from my office.

11267   Fri May 1 20:33:31 2015   rana   Summary   Computer Scripts / Programs   problems with new restart procedures for elogd and apache

Same thing again today. So I renamed the /etc/init/elog.conf so that it doesn't keep respawning bootlessly. Until then restart elog using the start script in /cvs/cds/caltech/elog/ as usual.

I'll let EQ debug when he gets back - probably we need to pause the elog respawn so that it waits until nodus is up for a few minutes before starting.

 Quote: Since the nodus upgrade, Eric/Diego changed the old csh restart procedures to be more UNIX standard. The instructions are in the wiki. After doing some software updates on nodus today, apache and elogd didn't come back OK. Maybe because of some race condition, elog tried to start but didn't get apache. Apache couldn't start because it found that someone was already binding the ELOGD port. So I killed ELOGD several times (because it kept trying to respawn). Once it stopped trying to come back I could restart Apache using the Wiki instructions. But the instructions didn't work for ELOGD, so I had to restart that using the usual .csh script way that we used to use.

11273   Tue May 5 10:40:05 2015   ericq   HowTo   Computer Scripts / Programs   How to get a web page running on Nodus

# How to get your own web page running on Nodus

1. On any martian machine, put your stuff in /users/public_html/$MYPAGE/
2. On Nodus, run: ln -s /users/public_html/$MYPAGE /export/home/
3. Your site is now available at https://nodus.ligo.caltech.edu:30889/$MYPAGE/
4. If you want to allow straight up directory listing to the entire internet, on Nodus run: sudoedit /etc/sites-available/nodus, and add the following lines towards the bottom:

<Directory /export/home/$MYPAGE>
    Options +Indexes
</Directory>

11277   Sun May 10 13:54:41 2015   rana   HowTo   Computer Scripts / Programs   summary page URL change

Also, EQ gave us a better (and not pwd protected) URL for the summary pages. Please replace your previous links with this new one:

https://nodus.ligo.caltech.edu:30889/detcharsummary/

11278   Mon May 11 01:28:33 2015   rana   HowTo   Computer Scripts / Programs   summary page URL change

Like Steve pointed out, the summary pages show that the y-arm transmission drifts a lot when locked. The OL summary page shows that this is all due to ITMY yaw.

Could be either that the coil driver / DAC is bad or that the suspension is poorly built. We need to dig into ITMY OL trends over the long term to see if this is new or not.

Also, weather station needs a reboot. And does anyone know what the MC_F calibration is?

11288   Wed May 13 09:17:28 2015   rana   Update   Computer Scripts / Programs   rsync frames to LDAS cluster

Still seems to be running without causing FB issues. One thought is that we could look through the FB status channel trends and see if there is some excess of FB problems at 10 min after the hour, to see if it's causing problems.

I also looked into our minute trend situation. Looks like the files are compressed and have checksum enabled. The size changes sometimes, but it's roughly 35 MB per hour. So 840 MB per day.

According to the wiper.pl script, it's trying to keep the minute-trend directory below some fixed fraction of the total /frames disk. The comment in the script says 0.005%, but I'm dubious since that's only 13TB*5e-5 = 650 MB, and that would only keep us for about a day. Maybe the comment should read 0.5% instead...
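The back-of-the-envelope estimate above can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 MB = 1e6 bytes, 1 TB = 1e12 bytes) and uses the approximate figures quoted in the entry; the function name is made up for the example.

```python
MB = 1e6   # assumption: decimal megabytes
TB = 1e12  # assumption: decimal terabytes


def retention_days(disk_bytes, fraction, bytes_per_day):
    """Days of data that fit in `fraction` of the disk."""
    return disk_bytes * fraction / bytes_per_day


# ~35 MB of minute trends per hour -> 840 MB per day
minute_trend_per_day = 35 * MB * 24

if __name__ == "__main__":
    for percent in (0.005, 0.5):
        frac = percent / 100.0
        budget = 13 * TB * frac
        days = retention_days(13 * TB, frac, minute_trend_per_day)
        print(f"{percent}% of 13 TB = {budget / MB:.0f} MB -> {days:.1f} days of minute trends")
```

With these numbers, 0.005% of the disk is roughly 650 MB, which holds less than one day of minute trends, while 0.5% is about 65 GB, or on the order of 77 days.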

 Quote: The rsync job to sync our frames over to the cluster has been on a 20 MB/s BW limit for awhile now. Dan Kozak has now set up a cronjob to do this at 10 min after the hour, every hour. Let's see how this goes. You can find the script and its logfile name by doing 'crontab -l' on nodus.

11299   Mon May 18 14:22:05 2015   ericq   Update   Computer Scripts / Programs   rsync frames to LDAS cluster

 Quote: Still seems to be running without causing FB issues.

I'm not so sure. I was just experiencing some severe network latency / EPICS channel freezes that were alleviated by killing the rsync job on nodus. It started a few minutes after ten past the hour, when the rsync job started.

Unrelated to this, for some odd reason, there is some weirdness going on with ssh'ing to martian machines from the control room computers. I.e. on pianosa, ssh nodus fails with a failure to resolve hostname message, but ssh nodus.martian succeeds.

11307   Tue May 19 11:15:09 2015   ericq   Update   Computer Scripts / Programs   Chiara Backup Hiccup

Starting on the 14th (five days ago) the local chiara rsync backup of /cvs/cds to an external HDD has been failing:

caltech/c1/scripts/backup/rsync_chiara.backup.log:

2015-05-13 07:00:01,614 INFO       Updating backup image of /cvs/cds
2015-05-13 07:49:46,266 INFO       Backup rsync job ran successfully, transferred 6504 files.
2015-05-14 07:00:01,826 INFO       Updating backup image of /cvs/cds
2015-05-14 07:50:18,709 ERROR      Backup rysnc job failed with exit code 24!
2015-05-15 07:00:01,385 INFO       Updating backup image of /cvs/cds
2015-05-15 08:09:18,527 ERROR      Backup rysnc job failed with exit code 24!
...

Code 24 apparently means "Partial transfer due to vanished source files."

Manually running the backup command on chiara worked fine, returning a code of 0 (success), so we are backed up. For completeness, the command is controls@chiara: sudo rsync -av --delete --stats /home/cds/ /media/40mBackup
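One way a backup wrapper could distinguish these outcomes is sketched below. This is a hypothetical helper, not the logic of the existing backup script; exit codes 0 and 24 are rsync's documented values, while treating 24 as a mere warning is a policy choice that may or may not be appropriate here.

```python
RSYNC_OK = 0
RSYNC_VANISHED = 24  # "Partial transfer due to vanished source files"


def classify_rsync_exit(code, tolerate_vanished=False):
    """Map an rsync exit code to 'ok', 'warning' or 'error'."""
    if code == RSYNC_OK:
        return "ok"
    if code == RSYNC_VANISHED and tolerate_vanished:
        return "warning"
    return "error"


if __name__ == "__main__":
    for code in (0, 24, 23):
        print(code, classify_rsync_exit(code, tolerate_vanished=True))
```

Tolerating code 24 only hides the symptom; files that vanish mid-run are still missing from that day's backup image.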

Are the summary page jobs moving files around at this time of day? If so, one of the two should be rescheduled to not conflict.

11308   Tue May 19 11:24:44 2015   ericq   Update   Computer Scripts / Programs   Notification Scheme

Given some of the things we've been facing lately, it occurs to me that we could be better served by having some sort of unified human-alerting scheme in place, for things like:

• Local/offsite backup failures
• Vacuum system problems
• HDD status for things like /frames/ and /cvs/cds/, whether the disks are full, or their SMART status indicates imminent mechanical failure

Currently, many of these things are just checked sporadically when it occurs to someone to do so, or when debugging random issues. Smoother IFO operation and peace of mind could be gained if we're confident that the relevant people are notified in a timely manner.

Thoughts? Suggestions on other things to monitor, like maybe frontend/model crashes?
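As one concrete piece of such a scheme, a disk-fullness check could look like the sketch below. It is only an illustration under assumptions: the function name and the 90% limit are made up, it covers only the "disks are full" item (not SMART status or backup results), and how the resulting messages get delivered to people is left open.

```python
import shutil


def disk_alerts(paths, max_used_fraction=0.9):
    """Return one warning string per path that is too full or unreadable."""
    alerts = []
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except OSError as exc:
            alerts.append(f"{path}: cannot check disk usage ({exc})")
            continue
        if usage.total == 0:
            continue  # pseudo-filesystem, nothing meaningful to report
        used_fraction = usage.used / usage.total
        if used_fraction > max_used_fraction:
            alerts.append(f"{path}: {used_fraction:.0%} full")
    return alerts


if __name__ == "__main__":
    # Paths named in the entry above; on other machines they will simply
    # be reported as unreadable.
    for message in disk_alerts(["/frames", "/cvs/cds"]):
        print(message)
```

Run from cron, the returned list could be mailed out whenever it is non-empty.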

11321   Fri May 22 18:09:58 2015   ericq   Update   Computer Scripts / Programs   ifoCoupling

I've started working on a general routine to measure noise couplings in our interferometers. Often this is done with swept sine measurements, but this misses the nonlinear part of the coupling, especially if the linear part is already reduced through some compensation or feedforward scheme. Rana suggested using a series of narrow band-limited noise injections.

The structure I'm working on is a python script that uses the AWG interface written by Chris W. to create the excitations. Afterwards, I calculate a series of PSD estimates from the data (i.e. a spectrogram), and apply a two-sample, unequal variance, t-test to test for statistically significant increases in the noise spectra to try and evaluate the nonlinear contributions to the noise. I've started a git repository at github.com/e-q/ifoCoupling with the code.
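The two-sample, unequal-variance (Welch) t-test mentioned above can be sketched per frequency bin as follows. This is not the ifoCoupling code: it uses only the standard library, the function names are assumptions, and it flags bins with a fixed cut on the t statistic instead of computing a p-value, which is a simplification.

```python
import math
import statistics


def welch_t(baseline, excited):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Positive t means `excited` has the larger mean. Both samples need at
    least two values and must not both have zero variance.
    """
    n1, n2 = len(baseline), len(excited)
    v1 = statistics.variance(baseline) / n1  # squared standard errors
    v2 = statistics.variance(excited) / n2
    t = (statistics.mean(excited) - statistics.mean(baseline)) / math.sqrt(v1 + v2)
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, dof


def excess_noise_bins(baseline_psds, excited_psds, t_threshold=3.0):
    """Indices of frequency bins whose excited PSD is significantly higher.

    Each argument is a spectrogram stored row by row: one list of bin
    values per time segment.
    """
    flagged = []
    for k in range(len(baseline_psds[0])):
        base = [row[k] for row in baseline_psds]
        exc = [row[k] for row in excited_psds]
        t, _ = welch_t(base, exc)
        if t > t_threshold:
            flagged.append(k)
    return flagged


if __name__ == "__main__":
    # Made-up spectrogram rows: bin 0 unchanged, bin 1 clearly elevated.
    quiet = [[1.0, 1.0], [1.1, 1.2], [0.9, 0.8], [1.0, 1.1]]
    driven = [[1.0, 5.0], [1.05, 5.2], [0.95, 4.9], [1.0, 5.1]]
    print(excess_noise_bins(quiet, driven))
```

A proper significance level would convert each (t, dof) pair to a p-value using the Student t distribution, for example with a statistics library.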

So far, I've tested one such injection of noise coupling from the ETMX oplev error point to the single arm length error signal. It's completely missing the user interface and structure to do a general series of measurements, but this is just organizational; I'm trying to get the math/science down first.

Here's a result from today:

Median, instead of the usual mean, PSDs are used throughout, to reject outliers/glitches.

The linear part of the coupling can be estimated using the coherence / spectrum height in the excitation band, but I'm not sure what the best way to present/parameterize the nonlinear parts of each individual excitation band's result is.

Also, I anticipate being able to write an excitation auto-leveling routine, gradually increasing the excitation level until the excited spectrum is some amount noisier than the baseline spectrum, up to some maximum amount configurable by the user.
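A possible shape for such an auto-leveling loop, sketched under assumptions: `measure_ratio` is a hypothetical user-supplied callable that injects at a given amplitude and returns the excited-to-baseline spectrum ratio in the band of interest, and the geometric step and default target are arbitrary choices made for this example.

```python
def auto_level(measure_ratio, start_amp, max_amp, target_ratio=2.0, step=1.5):
    """Raise the excitation amplitude until the excited/baseline spectrum
    ratio reaches `target_ratio`, never exceeding `max_amp`.

    Returns (amplitude, ratio) for the last amplitude tried.
    """
    if start_amp <= 0 or step <= 1:
        raise ValueError("need start_amp > 0 and step > 1 for the loop to terminate")
    amp = min(start_amp, max_amp)
    while True:
        ratio = measure_ratio(amp)
        if ratio >= target_ratio or amp >= max_amp:
            return amp, ratio
        amp = min(amp * step, max_amp)


if __name__ == "__main__":
    # Stand-in for a real measurement: a purely quadratic coupling.
    def fake_measurement(amp):
        return 1.0 + amp ** 2

    amp, ratio = auto_level(fake_measurement, start_amp=0.1, max_amp=10.0)
    print(f"settled at amplitude {amp:.3f} with spectrum ratio {ratio:.2f}")
```

Stepping up from a small amplitude keeps the injection as quiet as possible, at the cost of several short measurements per band.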

The excitation shaping could probably be improved, too. It's currently an elliptic + butterworth bandpass for a sharp edge and rolloff.

I'm open to any thoughts and/or suggestions anyone may have!

11325   Tue May 26 19:57:11 2015   rana   Update   Computer Scripts / Programs   ifoCoupling

Looks like a very handy code, especially with the real statistical tests.

I would make sure to use much smaller excitation amplitudes. Since the coupling is nonlinear, we expect that it's only a good noise budget estimator when the excitation amplitude is less than a factor of 3 above the quiescent excitation.

11327   Wed May 27 15:20:54 2015   ericq   Update   Computer Scripts / Programs   Chiara Backup Hiccup

The local chiara backups are still failing due to vanished source files. I've emailed Max about the summary page jobs, since I think they're running remotely.

11336   Fri May 29 11:28:42 2015   ericq   Update   Computer Scripts / Programs   Chiara Backup Hiccup

I've changed the chiara local backup script to read a folder exclusion file, and excluded /users/public_html/detcharsummary, and things are working again.

This was necessary because the summary pages are being updated every half hour, which is faster than the time it takes for the backup script to run, so the file index that it builds at the start becomes invalid later on in the process.

Thinking about chiara's disk, it strikes me that when we went from the linux1 RAID to a single HDD on chiara, we may have tightened a bottleneck on our NFS latency, i.e. we are limited to that single hard drive's IO rates. This of course isn't the culprit for the more recent dramatic slowdowns, but in addition to fixing whatever has happened more recently, we may want to consider some kind of setup with higher IO capability for the NFS filesystem.

11337   Fri May 29 12:49:53 2015   Koji   Update   Computer Scripts / Programs   Chiara Backup Hiccup

In fact, the file access is supposed to be WAY faster now than in the RAID case.

As noted in ELOG 9511, it was SCSI-2 (or 3?) that had ~6MB/s throughput. Previously the backup took ~2 hours.
This was improved to 30 min by the SATA HDD on linux1.

I am looking at /opt/rtcds/caltech/c1/scripts/backup/rsync.backup.cumlog

In fact, this "30-min backup" was true until the end of March. After that the backup is taking 1h~1.5h.

This could be related to the recent NFS issue?
