40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 76 of 344  Not logged in ELOG logo
ID Date Author Type Categoryup Subject
  10820   Fri Dec 19 16:59:32 2014 ericqUpdateComputer Scripts / ProgramsFSS Slow servo moved to megatron

Given that op340m showed some undesired behavior, and that the FSS slow seems prone to railing lately, I've moved the FSS slow servo job over to megatron in the same way I did for the MC autolocker. 

Namely, there is an upstart configuration (megatron:/etc/init/FSSslow.conf), that invokes the slow servo. Log file is in the same old place (/cvs/cds/caltech/logs/scripts), and the servo can be (re)started by running: 

controls@megatron|~ > sudo initctl start FSSslow

Maybe this won't really change the behavior. We'll see

  10824   Fri Dec 19 20:44:23 2014 JenneUpdateComputer Scripts / ProgramsFSS Slow servo moved to megatron

Today Q moved the FSS slow servo over to some init thing on megatron, and some time ago he did the same thing to the MC auto locker script.  It isn't working though.

Even though megatron was rebooted, neither script started up automatically.  As Diego mentioned in elog 10823, we ran sudo initctl start MCautolocker and sudo initctl start FSSslow, and the blinky lights for both of the scripts started.  However, that seems to be the only thing that the scripts are doing.  The MC auto locker is not detecting lockloses, and is not resetting things to allow the MC to relock.  The MC is happy to lock if I do it by hand though.  Similarly, the blinky light for the FSS is on, but the PSL temperature is moving a lot faster than normal.  I expect that it will hit one of the rails in under an hour or so. 

The MC autolocker and the FSS loop were both running earlier today, so maybe Q had some magic that he used when he started them up, that he didn't include in the elog instructions?

  10825   Sat Dec 20 00:00:03 2014 ericqUpdateComputer Scripts / ProgramsFSS Slow servo moved to megatron

I ssh'd in, and was able to run each script manually successfully. I ran the initctl commands, and they started up fine too. 

We've seen this kind of behavior before, generally after reboots; see ELOGS 10247 and 10572

  10840   Tue Dec 23 18:43:33 2014 diegoUpdateComputer Scripts / ProgramsFSS Slow servo moved to megatron

Quote:

I ssh'd in, and was able to run each script manually successfully. I ran the initctl commands, and they started up fine too. 

We've seen this kind of behavior before, generally after reboots; see ELOGS 10247 and 10572

In the plot it is shown the behaviour of the PSL-FSS_SLOWDC signal during the last week; the blue rectangle marks an approximate estimate of the time when the scripts were moved to megatron. Apart from the bad things that happened on Friday during the big crash, and the work ongoing since yesterday, it seems that something is not working well. The scripts on megatron are actually running, but I'll try and have a look at it.

  10844   Fri Dec 26 18:20:42 2014 ranaUpdateComputer Scripts / ProgramsFSS Slow servo thresh change

Quote:

 

In the plot it is shown the behaviour of the PSL-FSS_SLOWDC signal during the last week; the blue rectangle marks an approximate estimate of the time when the scripts were moved to megatron. Apart from the bad things that happened on Friday during the big crash, and the work ongoing since yesterday, it seems that something is not working well. The scripts on megatron are actually running, but I'll try and have a look at it.

I guessed that what was happening was that the SLOW servo settings were not restored to the right values after the code movements / reboots. The ON threshold for the servo was set at +6 counts and the channel is MC TRANS. Since the ADC noise on that channel is ~50 counts, this means that the servo keeps pushing the laser temperature off in some direction when the MC is unlocked.

I reset the threshold to +6666 counts (the aligned MC transmission is ~16000 for the  TEM00 mode) so that it only turns on when we're in a good locked state.

  10877   Thu Jan 8 03:40:50 2015 ericqUpdateComputer Scripts / ProgramsELOG 3.0

I've installed the very fresh ELOG 3.0, for nothing else than the new built in text editor which has a LATEX capable equation editor built right in. 

Check out this sweet limerick: 

\int_{1}^{\sqrt[3]{3}}t^2 dt\, \textbf{cos}(\frac{3\pi}{9}) = \textbf{ln}(\sqrt[3]{e})

  10878   Thu Jan 8 09:24:40 2015 jamieUpdateComputer Scripts / ProgramsELOG 3.0
Quote:

I've installed the very fresh ELOG 3.0, for nothing else than the new built in text editor which has a LATEX capable equation editor built right in. 

Check out this sweet limerick: 

\int_{1}^{\sqrt[3]{3}}t^2 dt\, \textbf{cos}(\frac{3\pi}{9}) = \textbf{ln}(\sqrt[3]{e})

\int \omega \epsilon \varepsilon \Gamma

  10889   Tue Jan 13 01:58:16 2015 ericqUpdateComputer Scripts / ProgramsCdsutils upgraded to 382

I've upgraded our cdsutils installation to v382; there have been some changes to pydv which will allow me to implement the auto y-scaling on our lockloss plots. 

After some brief testing, things seem to still work...

  10897   Tue Jan 13 18:47:20 2015 ChrisConfigurationComputer Scripts / Programsinstafoton setup

To use instafoton, right click an MEDM screen, open the Execute menu, and choose "Foton".  Then click on the EPICS channel of a filter module as displayed on the screen.

Here's how it was set up:

  • Install instafoton.py in /opt/rtcds/caltech/c1/scripts; edit paths to localize for the 40m
  • Add instafoton to the MEDM_EXEC_LIST environment variable, newly defined in /ligo/cdscfg/workstationrc.sh:
    export MEDM_EXEC_LIST="Edit this screen;medm &A &:Probe;probe &P &:Foton (Pick filter PV);/opt/rtcds/caltech/c1/scripts/instafoton.py &P &"
  10898   Tue Jan 13 23:17:57 2015 ChrisFrogsComputer Scripts / Programsmedm time machine

After recompiling medm with a patch for dumping screens (attached), I added a time machine to the right-click Execute menu.  It's installed under /cvs/cds/caltech/users/wipf/src/medm_time_machine. Dependencies include the python CA server module (pcaspy) and the latest nds2-client 0.11.2.  These were also installed under my users directory, to avoid interfering with other tools.

 

Attachment 1: tm.png
tm.png
Attachment 2: dump_medm_screen.patch
--- /ligo/apps/epics-3.14.12_long/extensions/src/medm/medm/utils.c.orig	2015-01-13 18:56:44.867720104 -0800
+++ /ligo/apps/epics-3.14.12_long/extensions/src/medm/medm/utils.c	2015-01-13 22:49:56.636820963 -0800
@@ -4156,6 +4156,37 @@
 	timeOffset = time900101 - time700101;
 }
 
+#if((2*MAX_TRACES)+2) > MAX_PENS
+#define MAX_COUNT 2*MAX_TRACES+2
+#else
+#define MAX_COUNT MAX_PENS
... 75 more lines ...
  10905   Thu Jan 15 18:06:34 2015 JenneUpdateComputer Scripts / ProgramsInstalled kerberos on Rossa

I have installed kerberos on Rossa, so that I don't have to type my name and password every time I do an svn checkin, since I'm making some modifications and want to be sure that everything is checked in before and afterwards. 

I ran sudo apt-get install krb5-user.  I didn't put in a default_realm when it prompted me to during install, so I went into the /etc/krb5.conf file and changed the default_realm line to read default_realm = LIGO.ORG

Now we can use kinit, but we must (as usual) remember to kdestroy our credentials when we're done.

As a reminder, to use:

> kinit albert.einstein

Password for albert.einstein@LIGO.ORG: (type your pw here)

When you're finished, run

> kdestroy

The end.

  10906   Thu Jan 15 18:10:19 2015 jamieUpdateComputer Scripts / ProgramsInstalled kerberos on Rossa
Quote:

I have installed kerberos on Rossa, so that I don't have to type my name and password every time I do an svn checkin, since I'm making some modifications and want to be sure that everything is checked in before and afterwards. 

I ran sudo apt-get install krb5-user.  I didn't put in a default_realm when it prompted me to during install, so I went into the /etc/krb5.conf file and changed the default_realm line to read default_realm = LIGO.ORG

Now we can use kinit, but we must (as usual) remember to kdestroy our credentials when we're done.

As a reminder, to use:

> kinit albert.einstein

Password for albert.einstein@LIGO.ORG: (type your pw here)

When you're finished, run

> kdestroy

The end.

WARNING: since the workstations are all shared user, if you forget to kdestroy the next user can commit under your user ID.  It might be good to set the timeout to be something much shorter than 24 hours, like maybe 1, or 2.

  10907   Thu Jan 15 18:30:18 2015 JenneUpdateComputer Scripts / ProgramsInstalled kerberos on Rossa

 

Quote:

WARNING: since the workstations are all shared user, if you forget to kdestroy the next user can commit under your user ID.  It might be good to set the timeout to be something much shorter than 24 hours, like maybe 1, or 2.

Good call.  I added a line ticket_lifetime = 3600, which should make it destroy the credentials after an hour.

  10990   Mon Feb 9 17:23:17 2015 diegoUpdateComputer Scripts / ProgramsNew laptops

I forgot to elog about these ones, my bad... The new/updated laptops are giada, viviana and paola; paola is already in the lab, while giada and viviana are in the control room waiting for a new home. The Pool of Names Wiki page has already been updated to reflect the changes.

  11015   Thu Feb 12 15:21:37 2015 ericqUpdateComputer Scripts / Programsnetgpib updates

I've fixed the gpib scripts for the SR785 and AG4395A to output data in the same format as expected by older scripts when called by them. In addition, there are now some easier modes of operation through the measurement scripts SRmeasure and AGmeasure. These are on the $PATH for the main control room machines, and live in scripts/general/netgpib


Case 1: I manually set up a measurement on the analyzer, and just want to download / plot the data. 

Make sure you have a yellow prologix box plugged in, and can ping the address it is labeled with. (i.e. 'vanna'). Then, in the directory you want to save the data, run:

SRmeasure -i vanna -f mydata --getdata --plot

This saves mydata_(datetime).txt and mydata_(datetime).pdf in the current directory. 

In all cases, AGmeasure has the identical syntax. If the GPIB address is something other than 10, specifiy it with -a, but this is rarely the case.


Case 2: I want to remotely specify a measurement

Rather than a series of command line arguments, which may get lost to the mists of time, I've set the scripts up to use parameter files that serve as arguments to the scripts. 

Get the templates for spectrum and TF measurements in your current directory by running

SRmeasure --template

Set the parameters with your text editor of choice, such as frequency span, filename output, whether to create a plot or not, then run the measurement:

SRmeasure SR785template.yml


Case 3: I want to compare my data with previous measurements

In the template parameter files, there is an option 'plotRefs', that will automatically plot the data from files whose filenames start with the same string as the current measurement.

If, in the "#" commented out header of the data file, there is a line that contains "memo:" or "timestamp:", it will include the text that follows in the plot legend. 


There are also methods to remotely trigger an already configured measurement, or remotely reset an unresponsive instrument. Options can be perused by looking at the help in SRmeasure -h

I've tested, debugged, and used them for a bit, but wrinkles may remain. They've been svn40m committed, and I also set up a separate git repository for them at github.com/e-q/netgpibdata

  11065   Wed Feb 25 11:01:05 2015 manasaUpdateComputer Scripts / ProgramsNew screen for FOL PID loop

Created a new medm screen C1ALS_FOL_PID.adl for FOL PID loop control in /medm/als/master/

This is not currently linked to the sitemap screen.

  11076   Thu Feb 26 13:17:31 2015 ericqUpdateComputer Scripts / ProgramsFB IO load

Over the past few days, I've occasionally been peeking at the framebuilder IO load to see If I could correlate anything with it, but it's usually been low when I looked. I.e. with daqd and all models running, the %wa time was in the few percents at most.

Just now, I was seeing some EPICS sluggishness, and sure enough, the %wa was in the 50-60 range. I used iostat -xmh 5 on the framebuilder to see that /dev/sda, the /frames drive, was at 100% utilization, which means it was reading and writing as fast as it possibliy could. 

I ssh'd over to nodus, and with iotop found that an rsync job was running (rsync -am --exclude .*.gwf full 131.215.114.19::40m/full), and its IO rates corresponded very closely to the data read rates on the framebuilder from /frames. 

I killed the rsync process on nodus, and the %wa time on the framebuilder dropped to near zero. The ASS striptools, where I had noticed the sluggishness, immediately started updating faster.

While rsync is supposed to play nice with a system's IO demands, maybe it only knows about nodus's IO usage, not fb which is the underlying NFS server where the frames live. I think it would be good to throttle the bandwidth of these jobs to a specific bandwidth. 50MB/s seemed like too much, so maybe 10MB/s is ok?

  11077   Thu Feb 26 13:55:59 2015 jamieUpdateComputer Scripts / ProgramsFB IO load
We should use "ionice" to throttle the rsync. Use something like "ionice -c 3 rsync ..." to set the priority such that the rsync process will only work when there is no other IO contention. See "man ionice" for other options.
  11084   Fri Feb 27 11:20:49 2015 ericqUpdateComputer Scripts / ProgramsiPython Notebook for LSC Sensing Matrix
Quote:

** along the way, I noticed that the reason this notebook hasn't been working since last night is that someone sadly installed a new anaconda python distro today  without telling anyone by ELOG. This new distro didn't have all the packages of the previous one.no I've updated it with astropy and uncertainties packages.

My bad, sorry! 

Yesterday, I was trying to install a package with anaconda's package manager, conda, but it was crashing in some weird way. I wasn't able to fix it, which led me to create a fresh installation. 

  11161   Mon Mar 23 19:30:36 2015 ranaUpdateComputer Scripts / Programsrsync frames to LDAS cluster

The rsync job to sync our frames over to the cluster has been on a 20 MB/s BW limit for awhile now.

Dan Kozak has now set up a cronjob to do this at 10 min after the hour, every hour. Let's see how this goes.

You can find the script and its logfile name by doing 'crontab -l' on nodus.

  11162   Mon Mar 23 22:56:54 2015 ericqUpdateComputer Scripts / ProgramsNodus web things

Back when Diego and I were getting all of the web services running up on the new nodus, we inexplicably were not able to get the hosting of the public_html directory and wikis to share the same port of 30889. In ELOG 10793, we stated that public_html was hosted on a new port, 30888, though we didn't really bring much attention to that new fact. 

Unbeknowst to us at the time, this broke other links/bookmarks/sites that people had been using. Koji pointed this out to me the other day, but I have not made any sort of resolution. For now, the public_html directory, and the sites therein, have been taken offline. 


In other nodus news, Jamie has set Nodus' apache service with a certificate for SSL goodness. We want to extend this to the ELOG, which uses a built in webserver, rather than apache. 

He set up a proxy at the https address which will later host the secured elog: https://nodus.ligo.caltech.edu:8081/

When we make the switch to running the ELOG with HTTPS on by default, living on port 8081, we will set up apache to point 8080 at 8081, to preserve all of the old links. 

I.e. this change should effectively be invisible to ELOG users if we implement it right. 

  11166   Tue Mar 24 15:22:12 2015 manasaUpdateComputer Scripts / ProgramsNew slow channels for FOL

[Koji, Manasa]

I have created new slow channels for FOL. To do so, I have edited the fcreadout.db file in Domenica and the C0EDCU.ini file in /chans/daq

Domenica and frame builder were restarted after the edits.

Koji has moved the following files from /opt/rtcds/caltech/c1/chans/daq/ to /opt/rtcds/caltech/c1/chans/daq/trash  as they are not being used anymore.

C0EDCU1.ini
C1EDCU_X00.ini
C1EDCU_X10.ini
C1EDCU_X14.ini
C1X00.ini
C1X10.ini
C1X99.ini

  11189   Wed Apr 1 11:42:30 2015 manasaUpdateComputer Scripts / ProgramsPID script in python

Since none of us here are experts in pearl, I have put together a python script for a simple PID controller. This can be imported into any main scripts that will run the actual PID loop. The script, PID.py, exists in /scripts/general/

  11220   Wed Apr 15 15:14:18 2015 ericqUpdateComputer Scripts / ProgramsCDSutils upgraded to v474

CDSutlils has been updated to the newest version, 474; there are some matrix interface methods that will make our locking scripts easier to read, modify, and maintain.

I've tested the ALS and CARM down scripts, and the LSC offsets script, and they all work fine. 

  11221   Wed Apr 15 20:54:18 2015 JenneUpdateComputer Scripts / ProgramsCDSutils upgrade bad

The SUS align/misalign scripts don't work after the new CDS utils upgrade. 

I don't know if it's looking for the _SWSTAT channel to confirm that the offset has been turned on/off, or if it is trying to set that channel, to do the switching, but either way, the script is failing.  Recall that our version of the RCG still has _SW1R and _SW2R, rather than the newer _SWSTAT for the filter banks. 

ezca.ezca.EzcaConnectError: Could not connect to channel (timeout=2s): C1:SUS-PRM_OL_PIT_SWSTAT

Q, can you please (please, please, pretty please) undo this upgrade, and then hold off on any further changes to the system for a few weeks?

  11223   Wed Apr 15 23:29:08 2015 JenneUpdateComputer Scripts / ProgramsCDSutils upgrade undone

Q remotely reverted this change.  Scripts seem to work again.

Quote:

The SUS align/misalign scripts don't work after the new CDS utils upgrade. 

I don't know if it's looking for the _SWSTAT channel to confirm that the offset has been turned on/off, or if it is trying to set that channel, to do the switching, but either way, the script is failing.  Recall that our version of the RCG still has _SW1R and _SW2R, rather than the newer _SWSTAT for the filter banks. 

ezca.ezca.EzcaConnectError: Could not connect to channel (timeout=2s): C1:SUS-PRM_OL_PIT_SWSTAT

Q, can you please (please, please, pretty please) undo this upgrade, and then hold off on any further changes to the system for a few weeks?

 

  11240   Thu Apr 23 21:05:23 2015 ranaUpdateComputer Scripts / ProgramsCDSutils upgrade undone

Q: please update this Wiki page with the go-back procedure:

https://wiki-40m.ligo.caltech.edu/CDSutils_Upgrade_Procedure

  11252   Sun Apr 26 00:56:21 2015 ranaSummaryComputer Scripts / Programsproblems with new restart procedures for elogd and apache

Since the nodus upgrade, Eric/Diego changed the old csh restart procedures to be more UNIX standard. The instructions are in the wiki.

After doing some software updates on nodus today, apache and elogd didn't come back OK. Maybe because of some race condition, elog tried to start but didn't get apache. Apache couldn't start because it found that someone was already binding the ELOGD port. So I killed ELOGD several times (because it kept trying to respawn). Once it stopped trying to come back I could restart Apache using the Wiki instructions. But the instructions didn't work for ELOGD, so I had to restart that using the usual .csh script way that we used to use.

  11263   Wed Apr 29 18:12:42 2015 ranaUpdateComputer Scripts / Programsnodus update

Installed libmotif3 and libmotif4 on nodus so that we can run dataviewer on there.

Also, the lscsoft stuff wasn't installed for apt-get, so I did so following the instructions on the DASWG website:

https://www.lsc-group.phys.uwm.edu/daswg/download/repositories.html#debian

Then I installed libmetaio1, libfftw3-3. Now, rather than complain about missing librarries, diaggui just silently dies.

Then I noticed that the awggui error message tells us to use 'ssh -Y' instead of 'ssh -X'. Using that I could run DTT on nodus from my office.

  11267   Fri May 1 20:33:31 2015 ranaSummaryComputer Scripts / Programsproblems with new restart procedures for elogd and apache

Same thing again todaysad. So I renamed the /etc/init/elog.conf so that it doesn't keep respawning bootlessly. Until then restart elog using the start script in /cvs/cds/caltech/elog/ as usual.

I'll let EQ debug when he gets back - probably we need to pause the elog respawn so that it waits until nodus is up for a few minutes before starting.

Quote:

Since the nodus upgrade, Eric/Diego changed the old csh restart procedures to be more UNIX standard. The instructions are in the wiki.

After doing some software updates on nodus today, apache and elogd didn't come back OK. Maybe because of some race condition, elog tried to start but didn't get apache. Apache couldn't start because it found that someone was already binding the ELOGD port. So I killed ELOGD several times (because it kept trying to respawn). Once it stopped trying to come back I could restart Apache using the Wiki instructions. But the instructions didn't work for ELOGD, so I had to restart that using the usual .csh script way that we used to use.

 

  11273   Tue May 5 10:40:05 2015 ericqHowToComputer Scripts / ProgramsHow to get a web page running on Nodus

How to get your own web page running on Nodus

  1. On any martian machine, put your stuff in /users/public_html/$MYPAGE/
  2. On Nodus, run: ln -s /users/public_html/$MYPAGE /export/home/
  3. Your site is now available at https://nodus.ligo.caltech.edu:30889/$MYPAGE/
  4. If you want to allow straight up directory listing to the entire internet, on Nodus run: sudoedit /etc/sites-available/nodus, and add the following lines towards the bottom:
<Directory /export/home/$MYPAGE>
    Options +Indexes
</Directory>
  11277   Sun May 10 13:54:41 2015 ranaHowToComputer Scripts / Programssummary page URL change

Also, EQ gave us a better (and not pwd protected) URL for the summary pages. Please replace your previous links with this new one:

https://nodus.ligo.caltech.edu:30889/detcharsummary/

  11278   Mon May 11 01:28:33 2015 ranaHowToComputer Scripts / Programssummary page URL change

Like Steve pointed out, the summary pages show that the y-arm transmission drifts a lot when locked. The OL summary page shows that this is all due to ITMY yaw.

Could be either that they coil driver / DAC is bad or that the suspension is poorly built. We need to dig into ITMY OL trends over long term to see if this is new or now.

Also, weather station needs a reboot. And does anyone know what the MC_F calibration is?

  11288   Wed May 13 09:17:28 2015 ranaUpdateComputer Scripts / Programsrsync frames to LDAS cluster

Still seems to be running without causing FB issues. One thought is that we could look through the FB status channel trends and see if there is some excess of FB problems at 10 min after the hour to see if its causing problems.

I also looked into our minute trend situation. Looks like the files are comrpessed and have checksum enabled. The size changes sometimes, but its roughly 35 MB per hour. So 840 MB per day.

According to the wiper.pl script, its trying to keep the minute-trend directory to below some fixed fraction of the total /frames disk. The comment in the scripts says 0.005%,

but I'm dubious since that's only 13TB*5e-5 = 600 MB, and that would only keep us for a day. Maybe the comment should read 0.5% instead...

Quote:

The rsync job to sync our frames over to the cluster has been on a 20 MB/s BW limit for awhile now.

Dan Kozak has now set up a cronjob to do this at 10 min after the hour, every hour. Let's see how this goes.

You can find the script and its logfile name by doing 'crontab -l' on nodus.

 

  11299   Mon May 18 14:22:05 2015 ericqUpdateComputer Scripts / Programsrsync frames to LDAS cluster
Quote:

Still seems to be running without causing FB issues.

I'm not so sure. I just was experiencing some severe network latency / EPICS channel freezes that was alleviated by killing the rsync job on nodus. It started a few minutes after ten past the hour, when the rysnc job started. 

Unrelated to this, for some odd reason, there is some weirdness going on with ssh'ing to martian machines from the control room computers. I.e. on pianosa, ssh nodus fails with a failure to resolve hostaname message, but ssh nodus.martian succeeds. 

  11307   Tue May 19 11:15:09 2015 ericqUpdateComputer Scripts / ProgramsChiara Backup Hiccup

Starting on the 14th (five days ago) the local chiara rsync backup of /cvs/cds to an external HDD has been failing:

caltech/c1/scripts/backup/rsync_chiara.backup.log:

2015-05-13 07:00:01,614 INFO       Updating backup image of /cvs/cds
2015-05-13 07:49:46,266 INFO       Backup rsync job ran successfully, transferred 6504 files.
2015-05-14 07:00:01,826 INFO       Updating backup image of /cvs/cds
2015-05-14 07:50:18,709 ERROR      Backup rysnc job failed with exit code 24!
2015-05-15 07:00:01,385 INFO       Updating backup image of /cvs/cds
2015-05-15 08:09:18,527 ERROR      Backup rysnc job failed with exit code 24!
...
 

Code 24 apparently means "Partial transfer due to vanished source files."

Manually running the backup command on chiara worked fine, returning a code of 0 (success), so we are backed up. For completeness, the command is controls@chiara: sudo rsync -av --delete --stats /home/cds/ /media/40mBackup

Are the summary page jobs moving files around at this time of day? If so, one of the two should be rescheduled to not conflict. 

  11308   Tue May 19 11:24:44 2015 ericqUpdateComputer Scripts / ProgramsNotification Scheme

Given some of the things we've facing lately, it occurs to me that we could be better served by having some sort of unified human-alerting scheme in place, for things like:

  • Local/offsite backup failures
  • Vaccumm system problems
  • HDD status for things like /frames/ and /cvs/cds/, whether the disks are full, or their SMART status indicates imminent mechanical failure

Currently, many of these things are just checked sporadically when it occurs to someone to do so, or when debugging random issues. Smoother IFO operation and peace of mind could be gained if we're confident that the relevant people are notified in a timely manner. 

Thoughts? Suggestions on other things to monitor, like maybe frontend/model crashes?

  11321   Fri May 22 18:09:58 2015 ericqUpdateComputer Scripts / ProgramsifoCoupling

I've started working on a general routine to measure noise couplings in our interferometers. Often this is done with swept sine measurements, but this misses the nonlinear part of the coupling, especially if the linear part is alreay reduced through some compensation or feedforward scheme. Rana suggested using a series of narrow band-limited noise injections. 

The structure I'm working on is a python script that uses the AWG interface written by Chris W. to create the excitations. Afterwards, I calculate a series of PSD estimates from the data (i.e. a spectrogram), and apply a two-sample, unequal variance, t-test to test for statisically significant increases in the noise spectra to try and evaluate the nonlinear contriubutions to the noise. I've started a git repository at github.com/e-q/ifoCoupling with the code. 

So far, I've tested one such injection of noise coupling from the ETMX oplev error point to the single arm length error signal. It's completely missing the user interface and structure to do a general series of measurements, but this is just organizational; I'm trying to get the math/science down first. 

Here's a result from today:

Median, instead of the usual mean, PSDs are used throughout, to reject outliers/glitches.

The linear part of the coupling can be estimated using the coherence / spectrum height in the excitation band, but I'm not sure what the best what to present/paramerize the nonlinear parts of each individaul excitation band's result is.

Also, I anticipate being able to write an excitation auto-leveling routine, gradually increasing the exctiation level until the excited spectrum is some amount  noisier than the baseline spectrum, up to some maximum amount configurable by the user. 

The excitation shaping could probably be improved, too. It's currently and elliptic + butterworth bandpass for a sharp edge and rolloff. 

I'm open to any thoughts and/or suggestions anyone may have!

Attachment 1: ETMX_PIT_L_coupling.png
ETMX_PIT_L_coupling.png
  11325   Tue May 26 19:57:11 2015 ranaUpdateComputer Scripts / ProgramsifoCoupling

Looks like a very handy code, especially with the real statistical tests.

I would make sure to use much smaller excitation amplitudes. Since the coupling is nonlinear, we expect that its only a good noise budget estimator when the excitation amplitude is less than a factor of 3 above the quiesscent excitation.

  11327   Wed May 27 15:20:54 2015 ericqUpdateComputer Scripts / ProgramsChiara Backup Hiccup

The local chiara backups are still failing due to vanished source files. I've emailed Max about the summary page jobs, since I think they're running remotely. 

  11336   Fri May 29 11:28:42 2015 ericqUpdateComputer Scripts / ProgramsChiara Backup Hiccup

I've changed the chiara local backup script to read a folder exclusion file, and excluded /users/public_html/detcharsummary, and things are working again. 

This was neccesary because the summary pages are being updated every half hour, which is faster than the time it takes for the backup script to run, so the file index that it builds at the start becomes invalid later on in the process. 


Thinking about chiara's disk, it strikes me that when we went from the linux1 RAID to a single HDD on chiara, we may have tightened a bottleneck on our NFS latency, i.e. we are limited to that single hard drive's IO rates. This of course isn't the culprit for the more recent dramatic slowdowns, but in addition to fixing whatever has happened more recently, we may want to consider some kind of setup with higher IO capability for the NFS filesystem. 

  11337   Fri May 29 12:49:53 2015 KojiUpdateComputer Scripts / ProgramsChiara Backup Hiccup

In fact, the file access is supposed to be WAY faster now than in the RAID case.

As noted in ELOG 9511, it was SCSI-2(or 3?) that had ~6MB/s thruput. Previously the backup took ~2hours.
This was improved to 30min by SATA HDD on llinux1.

I am looking at /opt/rtcds/caltech/c1/scripts/backup/rsync.backup.cumlog

In fact, this "30-min backup" was true until the end of March. After that the backup is taking 1h~1.5h.

This could be related to the recent NFS issue?

  11338   Fri May 29 15:12:39 2015 KojiUpdateComputer Scripts / ProgramsChiara Backup Hiccup

Actual data

Attachment 1: backup_hours.pdf
backup_hours.pdf
  11366   Fri Jun 19 16:54:20 2015 JenneUpdateComputer Scripts / ProgramsWiener scripts in scripts directory

I have put the Wiener filter scripts into  /opt/rtcds/caltech/c1/scripts/Wiener/  .  They are under version control. 

The idea is that you should copy ParameterFile_Example.m into your own directory, and modify parameters at the top of the file, and then when you run that script, it will output fitted filters ready to go into Foton.  (Obviously you must check before actually implementing them that you're happy with the efficacy and fits of the filters). 

Things to be edited in the ParameterFile include:

  • Channel names for the witness sensors (which should each have a corresponding .txt file with the raw data)
  • Channel name for the target
  • Folder where this raw data is saved
  • Folder to save results
  • 1 or 0 to determine if need to load and downsample the raw data, or if can use pre-downsampled data
    • This should probably be changed to just look to see if the pre-downsampled data already exists, and if not, do the downsampling
  • 1 or 0 to determine if should use actuator pre-weighting
  • Data folder for measured actuator TFs (only if using actuator pre-weighting)
    • Actuator TFs can be many different exported text files from DTT, and they will be stitched together to make one set of measurements, where all points have coherence above some quantity (that you set in the ParameterFile)
  • Coherence threshold for actuator data (only use data points with coherence above this amount)
  • Fit order for actuator transfer function's vectfit
  • 1 or 0 to decide if should use preweighting filter
  • zeros and poles for preweighting filters
  • 1 or 0 to decide if should use lowpass after Wiener filters (will be provided corresponding SOS coefficients for this filter, if you say yes)
  • Lowpass filter parameters: cuttoff freq, order and ripple for the Cheby filter
  • New sample rate for the data
  • Number of Wiener filter taps
  • Decide if use brute force matrix inversion or Levinson method
  • Calibrations for witnesses and target
  • Fit order for each of the Wiener filters

I think that's everything that is required.

  •  
  11481   Thu Aug 6 01:38:19 2015 ericqUpdateComputer Scripts / ProgramsChiara gets new Ethernet card

Since Chiara's onboard ethernet card has a reputation to be flaky in Linux, Koji suggested we could just buy a new ethernet card and throw it in there, since they're cheap. 

I've installed a Intel EXPI9301CT ethernet card in Chiara, which detected it without problems. I changed over the network settings in /etc/networking/interfaces to use eth1 instead of eth0, restarted nfs and bind9, and everything looked fine. 

Sadly, EPICS/network slowdowns are still happening. :(

  11498   Wed Aug 12 14:35:46 2015 ericqUpdateComputer Scripts / ProgramsPDFs in ELOG

I've tweaked the ELOG code to allow uploading of PDFs by drag-and-drop into the main editor window. Once again we can bask in the glory of 

(You may have to clear your browser's cache to load the new javascript)

Attachment 1: smooth.pdf
smooth.pdf
  11572   Fri Sep 4 04:12:05 2015 ericqUpdateComputer Scripts / ProgramsMATLAB down on all workstations

There seems to be something funny going on with MATLAB's license authentication on the control room workstations. Earlier today, I was able to start MATLAB on pianosa, but now attempting to run /cvs/cds/caltech/apps/linux64/matlab/bin/matlab -desktop results in the message: 

License checkout failed.
License Manager Error -15
MATLAB is unable to connect to the license server. 
Check that the license manager has been started, and that the MATLAB client machine can communicate
with the license server.

Troubleshoot this issue by visiting: 
http://www.mathworks.com/support/lme/R2013a/15

Diagnostic Information:
Feature: MATLAB 
License path: /home/controls/.matlab/R2013a_licenses:/cvs/cds/caltech/apps/linux64/matlab/licenses/license.dat:/cv
s/cds/caltech/apps/linux64/matlab/licenses/network.lic 
Licensing error: -15,570. System Error: 115

  11580   Mon Sep 7 16:30:56 2015 ranaHowToComputer Scripts / Programsincrease of window border size on Rossa

Frustrated by the single pixel width of the windows and how hard that makes it to drag things around, I explored StackExchange:

which showed how there is a .xml file which can be edited to increase this. I've changed the border size to 4 pixels on Rossa - its nice.devil

  11615   Thu Sep 17 19:58:06 2015 gautamSummaryComputer Scripts / ProgramsFrequency counting algorithm

I made some changes to the c1tst model running on c1iscey in order to test my algorithm for frequency counting. I followed the steps listed in elog 8909 to make, install and start the model. 

I need to debug a few things and run some more diagnostics so I am leaving the model in its edited version (Eric had committed it to the svn before I made any changes). 

  11618   Fri Sep 18 09:06:26 2015 ranaFrogsComputer Scripts / Programsremote data access: volume 1, Inferno

Trying to download some data using matlab today, I found that my ole mDV stuff doesn't work because its MEX files were built for AMD64...

Tried to rebuild the NDS1 MEX according to 7 year old instructions didn't work; our GCC is 'too' new.

From the Remote Data Access wiki (https://wiki.ligo.org/RemoteAccess/MatlabTools) I got the new 'get_data.m' and 'GWdata.m'. These didn't run, so I updated the nds2-client and matlab-nds2-client on Donatella.

Still doesn't run to get 40m data. It recognizes that we're C1, but throws some java exception error. Maybe it doesn't work on the NDS1 protocol of our framebuilder?

So then I noticed that our NDS2 server on megatron is no longer running...thought it was supposed to run via init.d. Found that the nds2 binary doesn't run because it can't find libframecpp.so.5; maybe this was blown away in some recent upgrade? We do have versions 3, 4, 6, 7, & 8 of this library installed.

So now, after an hour or two, I'm upgrading the nds2 server on megatron (plus a hundred dependencies) as well as getting a newer version of matlab to see if there's some kind of java version issue there.

Of course python still works to get data, but doesn't have any of the wiener filter calculating code that matlab has...

ELOG V3.1.3-