40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 74 of 339  Not logged in ELOG logo
ID Date Author Type Categoryup Subject
  10759   Fri Dec 5 15:41:45 2014 JenneUpdateComputer Scripts / ProgramsNodus on fast GC network

Apparently, some time ago Larry Wallace installed a new, fast ethernet switch in the old nodus rack. Q and I have just now moved nodus' GC ethernet cable over to the new switch.  Dan Kozak is going to use this faster connection to make the data flow over to the cluster not so lag-y.

  10772   Wed Dec 10 14:22:37 2014 diegoUpdateComputer Scripts / ProgramsStatus of the new nodus

[Diego, Steve]

We ran a Cat 6+ Ethernet cable from the 1X7 rack (where the new nodus is located) to the fast GC switch in the control room rack; now I will learn how to setup the 'outside world' network, iptables, and the like.

 

I remind that the current hardware/software status is posted in elog 10697 ; if additions or corrections are needed, let me know.

 

After I check a couple of things, we can use the new nodus (which is currently known in the martian network as rosalba) as a local test to see that everything is working. After that (and, mostly, after I'll have the network working), we will sync the data from the old nodus to the new one and make the switch.

  10779   Thu Dec 11 12:39:31 2014 diegoUpdateComputer Scripts / ProgramsFrequency Offset Locking scripts status

 I finished the polishing in the scripts/FOL directory, this is the current status and this post replaces my two previous posts on the subject:

  • the Raspberry Pi operates locally: everything is in its /opt/FOL directory, which is a mirror of /opt/rtcds/caltech/c1/scripts/FOL/ ; some backup/sync scripts should be set up, tell me what kind (sync direction, place to call the script from, etc..) is recommended and I'll set it up;

 

  • the /opt/FOL directory contains:
    • ADC_interface                 : Akhil's ADC interface software and dependencies;
    • akhilTestCodes                : Akhil's work directory with his programs and data;
    • backup                             : two zip files with a full backup of the FOL stuff for both chiara and domenica at 2014/12/08, before my work on the directory;
    • fcreadoutApp                   : the EPICS app compiled on domenica. I didn't modify anything in particular here, as I don't know much about EPICS Apps; I'm not even sure if it is used by now, as I launch EPICS manually by just giving him a .db file (see below).
    • armFC*                        : it is the single program that constantly fetches data for the channels: it takes as arguments the RPi device (/dev/hidraw0 for the X arm, and /dev/hidraw1 for the Y arm) and the value to write into the frequency counter (0x3 for initialization and 0x2 for actual use); hence, there is no more need for recompilation!
    • epics_channels.py :  this is the new version of the old codetorun.py script; it handles the initialization and the availability of the two beatnote channels;
    • fcreadout / freqCountIOC : these are the binaries of the EPICS apps that I found on chiara/domenica; they are not used as of now, but could be useful;
    • fcreadout.db           : it is the database file that is loaded by EPICS to handle the channels;
    • FOLPID.pl              : the Perl PID controller; it is still the old version, we will work on this one later on (see Manasa's schedule at elog 10760 for info)

 

  • Domenica's environment:
    • as I said, everything runs locally from /opt/FOL;
    • in particular, I added in /etc/inittab two lines that launch EPICS and the python script for the channels; respawn is supported so these processes should always be available. For this to happen, DO NOT MOVE armFC, epics_channels.py and fcreadout.db from the /opt/FOL directory on domenica!
    • I added a udev rule in /etc/udev/rules.d/98-hidraw-permissions.rules to let the controls user access the /dev/hidraw* devices without having to sudo all the time;
    • I updated the ~/.bashrc and /opt/epics/epics-user-env.sh files to fix syntax errors and add some aliases we usually use.
  10783   Thu Dec 11 17:45:54 2014 ericqUpdateComputer Scripts / ProgramsLockloss plotting

 With some advice from Jamie, I've gotten the lock loss plotting script that is used at LHO working on our machines. The other night, I modified the ALSwatch.py script to log lockloss times. Tying it together, I've written a small wrapper script that grabs the last time from the lockloss log, and plots it. 

It is: scripts/LSC/LocklossData/lastlock.sh

Jamie's going to make an adjustment to the pydv codebase that will let me implement the auto y-scaling that we like. We also will need to get a feel for the right timing window, once we see what kind of delay in the ALSwatch script is typical. 

Here's an example of the output, with the window of [-10,+2] seconds from the logged GPS time:

figure.png

 

  10784   Thu Dec 11 18:00:43 2014 ericqUpdateComputer Scripts / ProgramsMC error signal monitoring

The other day, I hooked up the agilent analyzer to OUT2 of the MC board, which is currently set to output the MC refl error signal.  I've written a GPIB-based program that continuously polls the analyzer, and plots the live spectrum, an exponentially weighted running mean, and the first measured spectrum. 

The intended use case is to see if the FSS or MC loops are going crazy when we're locking. Sometimes the GPIB interface hangs/loses its connection, and the script needs a restart.

The script lives in scripts/MC/MCerrmon

 

 

  10793   Fri Dec 12 19:38:49 2014 diegoUpdateComputer Scripts / ProgramsStatus of the new nodus

Quote:

[Diego, Steve]

We ran a Cat 6+ Ethernet cable from the 1X7 rack (where the new nodus is located) to the fast GC switch in the control room rack; now I will learn how to setup the 'outside world' network, iptables, and the like.

 

I remind that the current hardware/software status is posted in elog 10697 ; if additions or corrections are needed, let me know.

 

After I check a couple of things, we can use the new nodus (which is currently known in the martian network as rosalba) as a local test to see that everything is working. After that (and, mostly, after I'll have the network working), we will sync the data from the old nodus to the new one and make the switch.

[Diego, EricQ]

Update: work is almost completed; the old nodus is still online, as I don't feel confident to make the switch and leave it on its own for the weekend. However, the new nodus is online with the IP address 131.215.114.87, so everyone can check that everything works. From my tests I can say that:

After everything will be in place, I will save every reasonably important configuration file of nodus into the svn.

 

I remind that every change made while accessing the 131.215.114.87 machine will be purged during the sync&switch

 

  10797   Mon Dec 15 12:53:13 2014 ericqUpdateComputer Scripts / ProgramsStatus of the new nodus

 Nodus (solaris) is dead, long live Nodus (ubuntu).

Diego and I are smoothing out the Kinks as they appear, but the ELOG is running smoothly on our new machine. 

SVN is working, but your checkouts may complain because they expect https, and we haven't turned SSL on yet...

  10798   Mon Dec 15 16:27:57 2014 diegoUpdateComputer Scripts / ProgramsStatus of the new nodus

Quote:

 Nodus (solaris) is dead, long live Nodus (ubuntu).

Diego and I are smoothing out the Kinks as they appear, but the ELOG is running smoothly on our new machine. 

SVN is working, but your checkouts may complain because they expect https, and we haven't turned SSL on yet...

 [Diego, EricQ]

SSL, https and backups are now working too!

A backup of nodus's configuration (with some explaining) will be done soon.

  10805   Tue Dec 16 20:49:25 2014 diegoUpdateComputer Scripts / ProgramsStatus of the new nodus

Quote:

Quote:

 Nodus (solaris) is dead, long live Nodus (ubuntu).

Diego and I are smoothing out the Kinks as they appear, but the ELOG is running smoothly on our new machine. 

SVN is working, but your checkouts may complain because they expect https, and we haven't turned SSL on yet...

 [Diego, EricQ]

SSL, https and backups are now working too!

A backup of nodus's configuration (with some explaining) will be done soon.

 Nodus should be visible again from outside the Caltech Network; I added some basic configuration for postfix and smartmontools; configuration files and instructions for everything are in the svn in the nodus_config folder

  10815   Thu Dec 18 15:41:30 2014 ericqUpdateComputer Scripts / ProgramsOffsite backups of /cvs/cds going again

Since the Nodus switch, the offsite backup scripts (scripts/backup/rsync.backup) had not been running successfully. I tracked it down to the weird NFS file ownership issues we've been seeing since making Chiara the fileserver. Since the backup script uses rsync's "archive" mode, which preserves ownership, permissions, modification dates, etc, not seeing the proper ownership made everything wacky. 

Despite 99% of the searches you do about this problem saying you just need to match your user's uid and gid on the NFS client and server, it turns out NFSv4 doesn't use this mechanism at all, opting instead for some ID mapping service (idmapd), which I have no inclination of figuring out at this time. 

Thus, I've configured /etc/fstab on Nodus (and the control room machines) to use NFSv3 when mounting /cvs/cds. Now, all the file ownerships show up correctly, and the offsite backup of /cvs/cds is churning along happily. 

  10816   Thu Dec 18 16:21:08 2014 ericqUpdateComputer Scripts / Programsscripts not being backed up!

 I just stumbled upon this while poking around:

Since the great crash of June 2014, the scripts backup script has not been workingon op340m. For some reason, it's only grabbing the PRFPMI folder, and nothing else. 

Megatron seems to be able to run it. I've moved the job to megatron's crontab for now.

  10817   Fri Dec 19 14:25:48 2014 diegoUpdateComputer Scripts / Programselog restarted

elog was not responding for unknown reasons, since the elogd process on nodus was alive; anyway, I restarted it. 

  10819   Fri Dec 19 16:39:25 2014 ericqUpdateComputer Scripts / Programselog autostart

I've set up nodus to start the ELOG on boot, through /etc/init/elog.conf. Also, thanks to this, we don't need to use the start-elog.csh script any more. We can now just do:

controls@nodus:~  $ sudo initctl restart elog

I also tweaked some of the ELOG settings, so that image thumbnails are produced at higher resolution and quality. 

  10820   Fri Dec 19 16:59:32 2014 ericqUpdateComputer Scripts / ProgramsFSS Slow servo moved to megatron

Given that op340m showed some undesired behavior, and that the FSS slow seems prone to railing lately, I've moved the FSS slow servo job over to megatron in the same way I did for the MC autolocker. 

Namely, there is an upstart configuration (megatron:/etc/init/FSSslow.conf), that invokes the slow servo. Log file is in the same old place (/cvs/cds/caltech/logs/scripts), and the servo can be (re)started by running: 

controls@megatron|~ > sudo initctl start FSSslow

Maybe this won't really change the behavior. We'll see

  10824   Fri Dec 19 20:44:23 2014 JenneUpdateComputer Scripts / ProgramsFSS Slow servo moved to megatron

Today Q moved the FSS slow servo over to some init thing on megatron, and some time ago he did the same thing to the MC auto locker script.  It isn't working though.

Even though megatron was rebooted, neither script started up automatically.  As Diego mentioned in elog 10823, we ran sudo initctl start MCautolocker and sudo initctl start FSSslow, and the blinky lights for both of the scripts started.  However, that seems to be the only thing that the scripts are doing.  The MC auto locker is not detecting lockloses, and is not resetting things to allow the MC to relock.  The MC is happy to lock if I do it by hand though.  Similarly, the blinky light for the FSS is on, but the PSL temperature is moving a lot faster than normal.  I expect that it will hit one of the rails in under an hour or so. 

The MC autolocker and the FSS loop were both running earlier today, so maybe Q had some magic that he used when he started them up, that he didn't include in the elog instructions?

  10825   Sat Dec 20 00:00:03 2014 ericqUpdateComputer Scripts / ProgramsFSS Slow servo moved to megatron

I ssh'd in, and was able to run each script manually successfully. I ran the initctl commands, and they started up fine too. 

We've seen this kind of behavior before, generally after reboots; see ELOGS 10247 and 10572

  10840   Tue Dec 23 18:43:33 2014 diegoUpdateComputer Scripts / ProgramsFSS Slow servo moved to megatron

Quote:

I ssh'd in, and was able to run each script manually successfully. I ran the initctl commands, and they started up fine too. 

We've seen this kind of behavior before, generally after reboots; see ELOGS 10247 and 10572

In the plot it is shown the behaviour of the PSL-FSS_SLOWDC signal during the last week; the blue rectangle marks an approximate estimate of the time when the scripts were moved to megatron. Apart from the bad things that happened on Friday during the big crash, and the work ongoing since yesterday, it seems that something is not working well. The scripts on megatron are actually running, but I'll try and have a look at it.

  10844   Fri Dec 26 18:20:42 2014 ranaUpdateComputer Scripts / ProgramsFSS Slow servo thresh change

Quote:

 

In the plot it is shown the behaviour of the PSL-FSS_SLOWDC signal during the last week; the blue rectangle marks an approximate estimate of the time when the scripts were moved to megatron. Apart from the bad things that happened on Friday during the big crash, and the work ongoing since yesterday, it seems that something is not working well. The scripts on megatron are actually running, but I'll try and have a look at it.

I guessed that what was happening was that the SLOW servo settings were not restored to the right values after the code movements / reboots. The ON threshold for the servo was set at +6 counts and the channel is MC TRANS. Since the ADC noise on that channel is ~50 counts, this means that the servo keeps pushing the laser temperature off in some direction when the MC is unlocked.

I reset the threshold to +6666 counts (the aligned MC transmission is ~16000 for the  TEM00 mode) so that it only turns on when we're in a good locked state.

  10877   Thu Jan 8 03:40:50 2015 ericqUpdateComputer Scripts / ProgramsELOG 3.0

I've installed the very fresh ELOG 3.0, for nothing else than the new built in text editor which has a LATEX capable equation editor built right in. 

Check out this sweet limerick: 

\int_{1}^{\sqrt[3]{3}}t^2 dt\, \textbf{cos}(\frac{3\pi}{9}) = \textbf{ln}(\sqrt[3]{e})

  10878   Thu Jan 8 09:24:40 2015 jamieUpdateComputer Scripts / ProgramsELOG 3.0
Quote:

I've installed the very fresh ELOG 3.0, for nothing else than the new built in text editor which has a LATEX capable equation editor built right in. 

Check out this sweet limerick: 

\int_{1}^{\sqrt[3]{3}}t^2 dt\, \textbf{cos}(\frac{3\pi}{9}) = \textbf{ln}(\sqrt[3]{e})

\int \omega \epsilon \varepsilon \Gamma

  10889   Tue Jan 13 01:58:16 2015 ericqUpdateComputer Scripts / ProgramsCdsutils upgraded to 382

I've upgraded our cdsutils installation to v382; there have been some changes to pydv which will allow me to implement the auto y-scaling on our lockloss plots. 

After some brief testing, things seem to still work...

  10897   Tue Jan 13 18:47:20 2015 ChrisConfigurationComputer Scripts / Programsinstafoton setup

To use instafoton, right click an MEDM screen, open the Execute menu, and choose "Foton".  Then click on the EPICS channel of a filter module as displayed on the screen.

Here's how it was set up:

  • Install instafoton.py in /opt/rtcds/caltech/c1/scripts; edit paths to localize for the 40m
  • Add instafoton to the MEDM_EXEC_LIST environment variable, newly defined in /ligo/cdscfg/workstationrc.sh:
    export MEDM_EXEC_LIST="Edit this screen;medm &A &:Probe;probe &P &:Foton (Pick filter PV);/opt/rtcds/caltech/c1/scripts/instafoton.py &P &"
  10898   Tue Jan 13 23:17:57 2015 ChrisFrogsComputer Scripts / Programsmedm time machine

After recompiling medm with a patch for dumping screens (attached), I added a time machine to the right-click Execute menu.  It's installed under /cvs/cds/caltech/users/wipf/src/medm_time_machine. Dependencies include the python CA server module (pcaspy) and the latest nds2-client 0.11.2.  These were also installed under my users directory, to avoid interfering with other tools.

 

Attachment 1: tm.png
tm.png
Attachment 2: dump_medm_screen.patch
--- /ligo/apps/epics-3.14.12_long/extensions/src/medm/medm/utils.c.orig	2015-01-13 18:56:44.867720104 -0800
+++ /ligo/apps/epics-3.14.12_long/extensions/src/medm/medm/utils.c	2015-01-13 22:49:56.636820963 -0800
@@ -4156,6 +4156,37 @@
 	timeOffset = time900101 - time700101;
 }
 
+#if((2*MAX_TRACES)+2) > MAX_PENS
+#define MAX_COUNT 2*MAX_TRACES+2
+#else
+#define MAX_COUNT MAX_PENS
... 75 more lines ...
  10905   Thu Jan 15 18:06:34 2015 JenneUpdateComputer Scripts / ProgramsInstalled kerberos on Rossa

I have installed kerberos on Rossa, so that I don't have to type my name and password every time I do an svn checkin, since I'm making some modifications and want to be sure that everything is checked in before and afterwards. 

I ran sudo apt-get install krb5-user.  I didn't put in a default_realm when it prompted me to during install, so I went into the /etc/krb5.conf file and changed the default_realm line to read default_realm = LIGO.ORG

Now we can use kinit, but we must (as usual) remember to kdestroy our credentials when we're done.

As a reminder, to use:

> kinit albert.einstein

Password for albert.einstein@LIGO.ORG: (type your pw here)

When you're finished, run

> kdestroy

The end.

  10906   Thu Jan 15 18:10:19 2015 jamieUpdateComputer Scripts / ProgramsInstalled kerberos on Rossa
Quote:

I have installed kerberos on Rossa, so that I don't have to type my name and password every time I do an svn checkin, since I'm making some modifications and want to be sure that everything is checked in before and afterwards. 

I ran sudo apt-get install krb5-user.  I didn't put in a default_realm when it prompted me to during install, so I went into the /etc/krb5.conf file and changed the default_realm line to read default_realm = LIGO.ORG

Now we can use kinit, but we must (as usual) remember to kdestroy our credentials when we're done.

As a reminder, to use:

> kinit albert.einstein

Password for albert.einstein@LIGO.ORG: (type your pw here)

When you're finished, run

> kdestroy

The end.

WARNING: since the workstations are all shared user, if you forget to kdestroy the next user can commit under your user ID.  It might be good to set the timeout to be something much shorter than 24 hours, like maybe 1, or 2.

  10907   Thu Jan 15 18:30:18 2015 JenneUpdateComputer Scripts / ProgramsInstalled kerberos on Rossa

 

Quote:

WARNING: since the workstations are all shared user, if you forget to kdestroy the next user can commit under your user ID.  It might be good to set the timeout to be something much shorter than 24 hours, like maybe 1, or 2.

Good call.  I added a line ticket_lifetime = 3600, which should make it destroy the credentials after an hour.

  10990   Mon Feb 9 17:23:17 2015 diegoUpdateComputer Scripts / ProgramsNew laptops

I forgot to elog about these ones, my bad... The new/updated laptops are giada, viviana and paola; paola is already in the lab, while giada and viviana are in the control room waiting for a new home. The Pool of Names Wiki page has already been updated to reflect the changes.

  11015   Thu Feb 12 15:21:37 2015 ericqUpdateComputer Scripts / Programsnetgpib updates

I've fixed the gpib scripts for the SR785 and AG4395A to output data in the same format as expected by older scripts when called by them. In addition, there are now some easier modes of operation through the measurement scripts SRmeasure and AGmeasure. These are on the $PATH for the main control room machines, and live in scripts/general/netgpib


Case 1: I manually set up a measurement on the analyzer, and just want to download / plot the data. 

Make sure you have a yellow prologix box plugged in, and can ping the address it is labeled with. (i.e. 'vanna'). Then, in the directory you want to save the data, run:

SRmeasure -i vanna -f mydata --getdata --plot

This saves mydata_(datetime).txt and mydata_(datetime).pdf in the current directory. 

In all cases, AGmeasure has the identical syntax. If the GPIB address is something other than 10, specifiy it with -a, but this is rarely the case.


Case 2: I want to remotely specify a measurement

Rather than a series of command line arguments, which may get lost to the mists of time, I've set the scripts up to use parameter files that serve as arguments to the scripts. 

Get the templates for spectrum and TF measurements in your current directory by running

SRmeasure --template

Set the parameters with your text editor of choice, such as frequency span, filename output, whether to create a plot or not, then run the measurement:

SRmeasure SR785template.yml


Case 3: I want to compare my data with previous measurements

In the template parameter files, there is an option 'plotRefs', that will automatically plot the data from files whose filenames start with the same string as the current measurement.

If, in the "#" commented out header of the data file, there is a line that contains "memo:" or "timestamp:", it will include the text that follows in the plot legend. 


There are also methods to remotely trigger an already configured measurement, or remotely reset an unresponsive instrument. Options can be perused by looking at the help in SRmeasure -h

I've tested, debugged, and used them for a bit, but wrinkles may remain. They've been svn40m committed, and I also set up a separate git repository for them at github.com/e-q/netgpibdata

  11065   Wed Feb 25 11:01:05 2015 manasaUpdateComputer Scripts / ProgramsNew screen for FOL PID loop

Created a new medm screen C1ALS_FOL_PID.adl for FOL PID loop control in /medm/als/master/

This is not currently linked to the sitemap screen.

  11076   Thu Feb 26 13:17:31 2015 ericqUpdateComputer Scripts / ProgramsFB IO load

Over the past few days, I've occasionally been peeking at the framebuilder IO load to see If I could correlate anything with it, but it's usually been low when I looked. I.e. with daqd and all models running, the %wa time was in the few percents at most.

Just now, I was seeing some EPICS sluggishness, and sure enough, the %wa was in the 50-60 range. I used iostat -xmh 5 on the framebuilder to see that /dev/sda, the /frames drive, was at 100% utilization, which means it was reading and writing as fast as it possibliy could. 

I ssh'd over to nodus, and with iotop found that an rsync job was running (rsync -am --exclude .*.gwf full 131.215.114.19::40m/full), and its IO rates corresponded very closely to the data read rates on the framebuilder from /frames. 

I killed the rsync process on nodus, and the %wa time on the framebuilder dropped to near zero. The ASS striptools, where I had noticed the sluggishness, immediately started updating faster.

While rsync is supposed to play nice with a system's IO demands, maybe it only knows about nodus's IO usage, not fb which is the underlying NFS server where the frames live. I think it would be good to throttle the bandwidth of these jobs to a specific bandwidth. 50MB/s seemed like too much, so maybe 10MB/s is ok?

  11077   Thu Feb 26 13:55:59 2015 jamieUpdateComputer Scripts / ProgramsFB IO load
We should use "ionice" to throttle the rsync. Use something like "ionice -c 3 rsync ..." to set the priority such that the rsync process will only work when there is no other IO contention. See "man ionice" for other options.
  11084   Fri Feb 27 11:20:49 2015 ericqUpdateComputer Scripts / ProgramsiPython Notebook for LSC Sensing Matrix
Quote:

** along the way, I noticed that the reason this notebook hasn't been working since last night is that someone sadly installed a new anaconda python distro today  without telling anyone by ELOG. This new distro didn't have all the packages of the previous one.no I've updated it with astropy and uncertainties packages.

My bad, sorry! 

Yesterday, I was trying to install a package with anaconda's package manager, conda, but it was crashing in some weird way. I wasn't able to fix it, which led me to create a fresh installation. 

  11161   Mon Mar 23 19:30:36 2015 ranaUpdateComputer Scripts / Programsrsync frames to LDAS cluster

The rsync job to sync our frames over to the cluster has been on a 20 MB/s BW limit for awhile now.

Dan Kozak has now set up a cronjob to do this at 10 min after the hour, every hour. Let's see how this goes.

You can find the script and its logfile name by doing 'crontab -l' on nodus.

  11162   Mon Mar 23 22:56:54 2015 ericqUpdateComputer Scripts / ProgramsNodus web things

Back when Diego and I were getting all of the web services running up on the new nodus, we inexplicably were not able to get the hosting of the public_html directory and wikis to share the same port of 30889. In ELOG 10793, we stated that public_html was hosted on a new port, 30888, though we didn't really bring much attention to that new fact. 

Unbeknowst to us at the time, this broke other links/bookmarks/sites that people had been using. Koji pointed this out to me the other day, but I have not made any sort of resolution. For now, the public_html directory, and the sites therein, have been taken offline. 


In other nodus news, Jamie has set Nodus' apache service with a certificate for SSL goodness. We want to extend this to the ELOG, which uses a built in webserver, rather than apache. 

He set up a proxy at the https address which will later host the secured elog: https://nodus.ligo.caltech.edu:8081/

When we make the switch to running the ELOG with HTTPS on by default, living on port 8081, we will set up apache to point 8080 at 8081, to preserve all of the old links. 

I.e. this change should effectively be invisible to ELOG users if we implement it right. 

  11166   Tue Mar 24 15:22:12 2015 manasaUpdateComputer Scripts / ProgramsNew slow channels for FOL

[Koji, Manasa]

I have created new slow channels for FOL. To do so, I have edited the fcreadout.db file in Domenica and the C0EDCU.ini file in /chans/daq

Domenica and frame builder were restarted after the edits.

Koji has moved the following files from /opt/rtcds/caltech/c1/chans/daq/ to /opt/rtcds/caltech/c1/chans/daq/trash  as they are not being used anymore.

C0EDCU1.ini
C1EDCU_X00.ini
C1EDCU_X10.ini
C1EDCU_X14.ini
C1X00.ini
C1X10.ini
C1X99.ini

  11189   Wed Apr 1 11:42:30 2015 manasaUpdateComputer Scripts / ProgramsPID script in python

Since none of us here are experts in pearl, I have put together a python script for a simple PID controller. This can be imported into any main scripts that will run the actual PID loop. The script, PID.py, exists in /scripts/general/

  11220   Wed Apr 15 15:14:18 2015 ericqUpdateComputer Scripts / ProgramsCDSutils upgraded to v474

CDSutlils has been updated to the newest version, 474; there are some matrix interface methods that will make our locking scripts easier to read, modify, and maintain.

I've tested the ALS and CARM down scripts, and the LSC offsets script, and they all work fine. 

  11221   Wed Apr 15 20:54:18 2015 JenneUpdateComputer Scripts / ProgramsCDSutils upgrade bad

The SUS align/misalign scripts don't work after the new CDS utils upgrade. 

I don't know if it's looking for the _SWSTAT channel to confirm that the offset has been turned on/off, or if it is trying to set that channel, to do the switching, but either way, the script is failing.  Recall that our version of the RCG still has _SW1R and _SW2R, rather than the newer _SWSTAT for the filter banks. 

ezca.ezca.EzcaConnectError: Could not connect to channel (timeout=2s): C1:SUS-PRM_OL_PIT_SWSTAT

Q, can you please (please, please, pretty please) undo this upgrade, and then hold off on any further changes to the system for a few weeks?

  11223   Wed Apr 15 23:29:08 2015 JenneUpdateComputer Scripts / ProgramsCDSutils upgrade undone

Q remotely reverted this change.  Scripts seem to work again.

Quote:

The SUS align/misalign scripts don't work after the new CDS utils upgrade. 

I don't know if it's looking for the _SWSTAT channel to confirm that the offset has been turned on/off, or if it is trying to set that channel, to do the switching, but either way, the script is failing.  Recall that our version of the RCG still has _SW1R and _SW2R, rather than the newer _SWSTAT for the filter banks. 

ezca.ezca.EzcaConnectError: Could not connect to channel (timeout=2s): C1:SUS-PRM_OL_PIT_SWSTAT

Q, can you please (please, please, pretty please) undo this upgrade, and then hold off on any further changes to the system for a few weeks?

 

  11240   Thu Apr 23 21:05:23 2015 ranaUpdateComputer Scripts / ProgramsCDSutils upgrade undone

Q: please update this Wiki page with the go-back procedure:

https://wiki-40m.ligo.caltech.edu/CDSutils_Upgrade_Procedure

  11252   Sun Apr 26 00:56:21 2015 ranaSummaryComputer Scripts / Programsproblems with new restart procedures for elogd and apache

Since the nodus upgrade, Eric/Diego changed the old csh restart procedures to be more UNIX standard. The instructions are in the wiki.

After doing some software updates on nodus today, apache and elogd didn't come back OK. Maybe because of some race condition, elog tried to start but didn't get apache. Apache couldn't start because it found that someone was already binding the ELOGD port. So I killed ELOGD several times (because it kept trying to respawn). Once it stopped trying to come back I could restart Apache using the Wiki instructions. But the instructions didn't work for ELOGD, so I had to restart that using the usual .csh script way that we used to use.

  11263   Wed Apr 29 18:12:42 2015 ranaUpdateComputer Scripts / Programsnodus update

Installed libmotif3 and libmotif4 on nodus so that we can run dataviewer on there.

Also, the lscsoft stuff wasn't installed for apt-get, so I did so following the instructions on the DASWG website:

https://www.lsc-group.phys.uwm.edu/daswg/download/repositories.html#debian

Then I installed libmetaio1, libfftw3-3. Now, rather than complain about missing librarries, diaggui just silently dies.

Then I noticed that the awggui error message tells us to use 'ssh -Y' instead of 'ssh -X'. Using that I could run DTT on nodus from my office.

  11267   Fri May 1 20:33:31 2015 ranaSummaryComputer Scripts / Programsproblems with new restart procedures for elogd and apache

Same thing again todaysad. So I renamed the /etc/init/elog.conf so that it doesn't keep respawning bootlessly. Until then restart elog using the start script in /cvs/cds/caltech/elog/ as usual.

I'll let EQ debug when he gets back - probably we need to pause the elog respawn so that it waits until nodus is up for a few minutes before starting.

Quote:

Since the nodus upgrade, Eric/Diego changed the old csh restart procedures to be more UNIX standard. The instructions are in the wiki.

After doing some software updates on nodus today, apache and elogd didn't come back OK. Maybe because of some race condition, elog tried to start but didn't get apache. Apache couldn't start because it found that someone was already binding the ELOGD port. So I killed ELOGD several times (because it kept trying to respawn). Once it stopped trying to come back I could restart Apache using the Wiki instructions. But the instructions didn't work for ELOGD, so I had to restart that using the usual .csh script way that we used to use.

 

  11273   Tue May 5 10:40:05 2015 ericqHowToComputer Scripts / ProgramsHow to get a web page running on Nodus

How to get your own web page running on Nodus

  1. On any martian machine, put your stuff in /users/public_html/$MYPAGE/
  2. On Nodus, run: ln -s /users/public_html/$MYPAGE /export/home/
  3. Your site is now available at https://nodus.ligo.caltech.edu:30889/$MYPAGE/
  4. If you want to allow straight up directory listing to the entire internet, on Nodus run: sudoedit /etc/sites-available/nodus, and add the following lines towards the bottom:
<Directory /export/home/$MYPAGE>
    Options +Indexes
</Directory>
  11277   Sun May 10 13:54:41 2015 ranaHowToComputer Scripts / Programssummary page URL change

Also, EQ gave us a better (and not pwd protected) URL for the summary pages. Please replace your previous links with this new one:

https://nodus.ligo.caltech.edu:30889/detcharsummary/

  11278   Mon May 11 01:28:33 2015 ranaHowToComputer Scripts / Programssummary page URL change

Like Steve pointed out, the summary pages show that the y-arm transmission drifts a lot when locked. The OL summary page shows that this is all due to ITMY yaw.

Could be either that they coil driver / DAC is bad or that the suspension is poorly built. We need to dig into ITMY OL trends over long term to see if this is new or now.

Also, weather station needs a reboot. And does anyone know what the MC_F calibration is?

  11288   Wed May 13 09:17:28 2015 ranaUpdateComputer Scripts / Programsrsync frames to LDAS cluster

Still seems to be running without causing FB issues. One thought is that we could look through the FB status channel trends and see if there is some excess of FB problems at 10 min after the hour to see if its causing problems.

I also looked into our minute trend situation. Looks like the files are comrpessed and have checksum enabled. The size changes sometimes, but its roughly 35 MB per hour. So 840 MB per day.

According to the wiper.pl script, its trying to keep the minute-trend directory to below some fixed fraction of the total /frames disk. The comment in the scripts says 0.005%,

but I'm dubious since that's only 13TB*5e-5 = 600 MB, and that would only keep us for a day. Maybe the comment should read 0.5% instead...

Quote:

The rsync job to sync our frames over to the cluster has been on a 20 MB/s BW limit for awhile now.

Dan Kozak has now set up a cronjob to do this at 10 min after the hour, every hour. Let's see how this goes.

You can find the script and its logfile name by doing 'crontab -l' on nodus.

 

  11299   Mon May 18 14:22:05 2015 ericqUpdateComputer Scripts / Programsrsync frames to LDAS cluster
Quote:

Still seems to be running without causing FB issues.

I'm not so sure. I just was experiencing some severe network latency / EPICS channel freezes that was alleviated by killing the rsync job on nodus. It started a few minutes after ten past the hour, when the rysnc job started. 

Unrelated to this, for some odd reason, there is some weirdness going on with ssh'ing to martian machines from the control room computers. I.e. on pianosa, ssh nodus fails with a failure to resolve hostaname message, but ssh nodus.martian succeeds. 

  11307   Tue May 19 11:15:09 2015 ericqUpdateComputer Scripts / ProgramsChiara Backup Hiccup

Starting on the 14th (five days ago) the local chiara rsync backup of /cvs/cds to an external HDD has been failing:

caltech/c1/scripts/backup/rsync_chiara.backup.log:

2015-05-13 07:00:01,614 INFO       Updating backup image of /cvs/cds
2015-05-13 07:49:46,266 INFO       Backup rsync job ran successfully, transferred 6504 files.
2015-05-14 07:00:01,826 INFO       Updating backup image of /cvs/cds
2015-05-14 07:50:18,709 ERROR      Backup rysnc job failed with exit code 24!
2015-05-15 07:00:01,385 INFO       Updating backup image of /cvs/cds
2015-05-15 08:09:18,527 ERROR      Backup rysnc job failed with exit code 24!
...
 

Code 24 apparently means "Partial transfer due to vanished source files."

Manually running the backup command on chiara worked fine, returning a code of 0 (success), so we are backed up. For completeness, the command is controls@chiara: sudo rsync -av --delete --stats /home/cds/ /media/40mBackup

Are the summary page jobs moving files around at this time of day? If so, one of the two should be rescheduled to not conflict. 

  11308   Tue May 19 11:24:44 2015 ericqUpdateComputer Scripts / ProgramsNotification Scheme

Given some of the things we've facing lately, it occurs to me that we could be better served by having some sort of unified human-alerting scheme in place, for things like:

  • Local/offsite backup failures
  • Vaccumm system problems
  • HDD status for things like /frames/ and /cvs/cds/, whether the disks are full, or their SMART status indicates imminent mechanical failure

Currently, many of these things are just checked sporadically when it occurs to someone to do so, or when debugging random issues. Smoother IFO operation and peace of mind could be gained if we're confident that the relevant people are notified in a timely manner. 

Thoughts? Suggestions on other things to monitor, like maybe frontend/model crashes?

ELOG V3.1.3-