ID | Date | Author | Type | Category | Subject |
11240 | Thu Apr 23 21:05:23 2015 | rana | Update | Computer Scripts / Programs | CDSutils upgrade undone |
Q: please update this Wiki page with the go-back procedure:
https://wiki-40m.ligo.caltech.edu/CDSutils_Upgrade_Procedure |
11252 | Sun Apr 26 00:56:21 2015 | rana | Summary | Computer Scripts / Programs | problems with new restart procedures for elogd and apache |
Since the nodus upgrade, Eric/Diego changed the old csh restart procedures to be more UNIX standard. The instructions are in the wiki.
After doing some software updates on nodus today, apache and elogd didn't come back OK. Maybe because of some race condition, elog tried to start but didn't get apache. Apache couldn't start because it found that someone was already binding the ELOGD port. So I killed ELOGD several times (because it kept trying to respawn). Once it stopped trying to come back I could restart Apache using the Wiki instructions. But the instructions didn't work for ELOGD, so I had to restart that using the usual .csh script way that we used to use. |
11263 | Wed Apr 29 18:12:42 2015 | rana | Update | Computer Scripts / Programs | nodus update |
Installed libmotif3 and libmotif4 on nodus so that we can run dataviewer on there.
Also, the lscsoft stuff wasn't installed for apt-get, so I did so following the instructions on the DASWG website:
https://www.lsc-group.phys.uwm.edu/daswg/download/repositories.html#debian
Then I installed libmetaio1 and libfftw3-3. Now, rather than complain about missing libraries, diaggui just silently dies.
Then I noticed that the awggui error message tells us to use 'ssh -Y' instead of 'ssh -X'. Using that I could run DTT on nodus from my office. |
11267 | Fri May 1 20:33:31 2015 | rana | Summary | Computer Scripts / Programs | problems with new restart procedures for elogd and apache |
Same thing again today. So I renamed /etc/init/elog.conf so that it doesn't keep respawning bootlessly. Until then, restart elog using the start script in /cvs/cds/caltech/elog/ as usual.
I'll let EQ debug when he gets back - probably we need to pause the elog respawn so that it waits until nodus has been up for a few minutes before starting.
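A sketch of what such a delayed-respawn stanza might look like (the keywords are standard upstart, but the apache job name, the delay, and the elogd invocation are all assumptions - the real /etc/init/elog.conf will differ):

```
# hypothetical /etc/init/elog.conf sketch -- job names, paths, and timings are guesses
description "elog daemon"
start on started apache2          # don't race apache for the port
respawn
respawn limit 5 60                # stop retrying after 5 rapid failures
pre-start exec sleep 120          # let nodus settle after boot before starting elogd
exec /cvs/cds/caltech/elog/elogd  # actual binary location/flags unknown
```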
Quote: |
Since the nodus upgrade, Eric/Diego changed the old csh restart procedures to be more UNIX standard. The instructions are in the wiki.
After doing some software updates on nodus today, apache and elogd didn't come back OK. Maybe because of some race condition, elog tried to start but didn't get apache. Apache couldn't start because it found that someone was already binding the ELOGD port. So I killed ELOGD several times (because it kept trying to respawn). Once it stopped trying to come back I could restart Apache using the Wiki instructions. But the instructions didn't work for ELOGD, so I had to restart that using the usual .csh script way that we used to use.
|
|
11273 | Tue May 5 10:40:05 2015 | ericq | HowTo | Computer Scripts / Programs | How to get a web page running on Nodus |
How to get your own web page running on Nodus
- On any martian machine, put your stuff in
/users/public_html/$MYPAGE/
- On Nodus, run:
ln -s /users/public_html/$MYPAGE /export/home/
- Your site is now available at https://nodus.ligo.caltech.edu:30889/$MYPAGE/
- If you want to allow straight up directory listing to the entire internet, on Nodus run: sudoedit /etc/sites-available/nodus, and add the following lines towards the bottom:
<Directory /export/home/$MYPAGE>
Options +Indexes
</Directory>
|
11277 | Sun May 10 13:54:41 2015 | rana | HowTo | Computer Scripts / Programs | summary page URL change |
Also, EQ gave us a better (and not pwd protected) URL for the summary pages. Please replace your previous links with this new one:
https://nodus.ligo.caltech.edu:30889/detcharsummary/ |
11278 | Mon May 11 01:28:33 2015 | rana | HowTo | Computer Scripts / Programs | summary page URL change |
As Steve pointed out, the summary pages show that the Y-arm transmission drifts a lot when locked. The OL summary page shows that this is all due to ITMY yaw.
Could be either that the coil driver / DAC is bad or that the suspension is poorly built. We need to dig into the ITMY OL trends over the long term to see if this is new or not.
Also, weather station needs a reboot. And does anyone know what the MC_F calibration is? |
11288 | Wed May 13 09:17:28 2015 | rana | Update | Computer Scripts / Programs | rsync frames to LDAS cluster |
Still seems to be running without causing FB issues. One thought: we could look through the FB status channel trends and see if there is some excess of FB problems at 10 min after the hour, to see if it's causing problems.
I also looked into our minute trend situation. Looks like the files are compressed and have checksums enabled. The size changes sometimes, but it's roughly 35 MB per hour, so 840 MB per day.
According to the wiper.pl script, it's trying to keep the minute-trend directory below some fixed fraction of the total /frames disk. The comment in the script says 0.005%,
but I'm dubious, since that's only 13TB * 5e-5 = 650 MB, and that would only keep us for a day. Maybe the comment should read 0.5% instead...
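A quick check of that arithmetic, using the numbers above:

```python
# Sanity-check the wiper.pl retention comment with the figures quoted above.
disk_bytes = 13e12          # /frames is ~13 TB
trend_per_day = 840e6       # ~35 MB/hour of minute trends -> 840 MB/day

for label, frac in [("0.005%", 5e-5), ("0.5%", 5e-3)]:
    budget = disk_bytes * frac
    days = budget / trend_per_day
    print("%s -> %.0f MB budget, ~%.1f days of minute trends" % (label, budget / 1e6, days))
```

So the 0.005% reading keeps under a day of trends, while 0.5% keeps about two and a half months, which is consistent with suspecting a typo in the comment.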
Quote: |
The rsync job to sync our frames over to the cluster has been on a 20 MB/s BW limit for awhile now.
Dan Kozak has now set up a cronjob to do this at 10 min after the hour, every hour. Let's see how this goes.
You can find the script and its logfile name by doing 'crontab -l' on nodus.
|
|
11299 | Mon May 18 14:22:05 2015 | ericq | Update | Computer Scripts / Programs | rsync frames to LDAS cluster |
Quote: |
Still seems to be running without causing FB issues.
|
I'm not so sure. I was just experiencing some severe network latency / EPICS channel freezes that were alleviated by killing the rsync job on nodus. It started a few minutes after ten past the hour, when the rsync job started.
Unrelated to this, for some odd reason, there is some weirdness going on with ssh'ing to martian machines from the control room computers. For example, on pianosa, ssh nodus fails with a failure to resolve hostname message, but ssh nodus.martian succeeds. |
11307 | Tue May 19 11:15:09 2015 | ericq | Update | Computer Scripts / Programs | Chiara Backup Hiccup |
Starting on the 14th (five days ago) the local chiara rsync backup of /cvs/cds to an external HDD has been failing:
caltech/c1/scripts/backup/rsync_chiara.backup.log:
2015-05-13 07:00:01,614 INFO Updating backup image of /cvs/cds
2015-05-13 07:49:46,266 INFO Backup rsync job ran successfully, transferred 6504 files.
2015-05-14 07:00:01,826 INFO Updating backup image of /cvs/cds
2015-05-14 07:50:18,709 ERROR Backup rysnc job failed with exit code 24!
2015-05-15 07:00:01,385 INFO Updating backup image of /cvs/cds
2015-05-15 08:09:18,527 ERROR Backup rysnc job failed with exit code 24!
...
Code 24 apparently means "Partial transfer due to vanished source files."
Manually running the backup command on chiara worked fine, returning a code of 0 (success), so we are backed up. For completeness, the command is controls@chiara: sudo rsync -av --delete --stats /home/cds/ /media/40mBackup
Are the summary page jobs moving files around at this time of day? If so, one of the two should be rescheduled to not conflict. |
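One way to make the backup wrapper tolerate this (a sketch, not the actual backup script - the severity mapping is my assumption) is to classify rsync's exit code before logging:

```python
# Hypothetical classifier for the backup wrapper: rsync exit code 24
# ("partial transfer due to vanished source files") is usually benign when
# files change mid-transfer, so log it as a warning rather than a failure.
def classify_rsync_exit(code):
    if code == 0:
        return "INFO"       # clean run
    if code == 24:
        return "WARNING"    # source files vanished; everything else was copied
    return "ERROR"          # any other nonzero code is a real failure

for code in (0, 24, 23):
    print(code, classify_rsync_exit(code))
```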
11308 | Tue May 19 11:24:44 2015 | ericq | Update | Computer Scripts / Programs | Notification Scheme |
Given some of the things we've been facing lately, it occurs to me that we could be better served by having some sort of unified human-alerting scheme in place, for things like:
- Local/offsite backup failures
- Vacuum system problems
- HDD status for things like /frames/ and /cvs/cds/, whether the disks are full, or their SMART status indicates imminent mechanical failure
Currently, many of these things are just checked sporadically when it occurs to someone to do so, or when debugging random issues. Smoother IFO operation and peace of mind could be gained if we're confident that the relevant people are notified in a timely manner.
Thoughts? Suggestions on other things to monitor, like maybe frontend/model crashes? |
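As a strawman for such a scheme (the function names, thresholds, and the idea of collecting alert strings for a cron-driven email are all mine, not an existing tool):

```python
# Strawman checker: each check returns None if healthy, else an alert string.
# A cron job could collect the strings and mail them to the relevant people.
import shutil

def check_disk(path, max_frac=0.90):
    """Alert if a filesystem is more than max_frac full."""
    usage = shutil.disk_usage(path)
    frac = usage.used / usage.total
    if frac > max_frac:
        return "%s is %.0f%% full" % (path, 100 * frac)
    return None

def run_checks(checks):
    """Run all checks; return the list of alert strings (empty if all OK)."""
    return [msg for msg in (check() for check in checks) if msg is not None]

# e.g. alerts = run_checks([lambda: check_disk("/frames"), lambda: check_disk("/cvs/cds")])
```

Backup-log parsing and SMART queries would just be more callables in the same list.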
11321 | Fri May 22 18:09:58 2015 | ericq | Update | Computer Scripts / Programs | ifoCoupling |
I've started working on a general routine to measure noise couplings in our interferometers. Often this is done with swept sine measurements, but this misses the nonlinear part of the coupling, especially if the linear part is already reduced through some compensation or feedforward scheme. Rana suggested using a series of narrow band-limited noise injections.
The structure I'm working on is a python script that uses the AWG interface written by Chris W. to create the excitations. Afterwards, I calculate a series of PSD estimates from the data (i.e. a spectrogram), and apply a two-sample, unequal-variance t-test to test for statistically significant increases in the noise spectra, to try and evaluate the nonlinear contributions to the noise. I've started a git repository at github.com/e-q/ifoCoupling with the code.
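A minimal numpy sketch of the statistical core - per-segment PSD estimates and a per-bin unequal-variance (Welch's) t statistic (synthetic data; the segment sizes and the t > 3 threshold are illustrative, and this is not the actual ifoCoupling code):

```python
# Per-frequency-bin Welch's t-test between "quiet" and "excited" PSD estimates.
import numpy as np

def segment_psds(x, nfft, fs):
    """One periodogram per non-overlapping segment (rows = segments)."""
    segs = x[: len(x) // nfft * nfft].reshape(-1, nfft)
    return np.abs(np.fft.rfft(segs, axis=1)) ** 2 / (fs * nfft)

def welch_tstat(a, b):
    """Two-sample, unequal-variance t statistic, computed per column."""
    ma, mb = a.mean(axis=0), b.mean(axis=0)
    va, vb = a.var(axis=0, ddof=1), b.var(axis=0, ddof=1)
    return (ma - mb) / np.sqrt(va / a.shape[0] + vb / b.shape[0])

rng = np.random.default_rng(0)
fs, nfft, nseg = 2048, 256, 64
quiet = segment_psds(rng.normal(size=nseg * nfft), nfft, fs)
excited = segment_psds(rng.normal(scale=2.0, size=nseg * nfft), nfft, fs)

t = welch_tstat(excited, quiet)
significant = t > 3.0   # bins with a statistically significant noise increase
```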
So far, I've tested one such injection of noise coupling from the ETMX oplev error point to the single arm length error signal. It's completely missing the user interface and structure to do a general series of measurements, but this is just organizational; I'm trying to get the math/science down first.
Here's a result from today:
Median, instead of the usual mean, PSDs are used throughout, to reject outliers/glitches.
The linear part of the coupling can be estimated using the coherence / spectrum height in the excitation band, but I'm not sure what the best way to present/parameterize the nonlinear part of each individual excitation band's result is.
Also, I anticipate being able to write an excitation auto-leveling routine, gradually increasing the excitation level until the excited spectrum is some amount noisier than the baseline spectrum, up to some maximum configurable by the user.
The excitation shaping could probably be improved, too. It's currently an elliptic + butterworth bandpass for a sharp edge and rolloff.
I'm open to any thoughts and/or suggestions anyone may have! |
Attachment 1: ETMX_PIT_L_coupling.png
|
|
11325 | Tue May 26 19:57:11 2015 | rana | Update | Computer Scripts / Programs | ifoCoupling |
Looks like a very handy code, especially with the real statistical tests.
I would make sure to use much smaller excitation amplitudes. Since the coupling is nonlinear, we expect that it's only a good noise budget estimator when the excitation amplitude is less than a factor of 3 above the quiescent excitation. |
11327 | Wed May 27 15:20:54 2015 | ericq | Update | Computer Scripts / Programs | Chiara Backup Hiccup |
The local chiara backups are still failing due to vanished source files. I've emailed Max about the summary page jobs, since I think they're running remotely. |
11336 | Fri May 29 11:28:42 2015 | ericq | Update | Computer Scripts / Programs | Chiara Backup Hiccup |
I've changed the chiara local backup script to read a folder exclusion file, and excluded /users/public_html/detcharsummary, and things are working again.
This was necessary because the summary pages are being updated every half hour, which is faster than the time it takes the backup script to run, so the file index it builds at the start becomes invalid later in the process.
Thinking about chiara's disk, it strikes me that when we went from the linux1 RAID to a single HDD on chiara, we may have tightened a bottleneck on our NFS latency, i.e. we are limited to that single hard drive's IO rates. This of course isn't the culprit for the more recent dramatic slowdowns, but in addition to fixing whatever has happened more recently, we may want to consider some kind of setup with higher IO capability for the NFS filesystem. |
11337 | Fri May 29 12:49:53 2015 | Koji | Update | Computer Scripts / Programs | Chiara Backup Hiccup |
In fact, file access is supposed to be WAY faster now than in the RAID case.
As noted in ELOG 9511, it was SCSI-2 (or 3?) that had ~6 MB/s throughput. Previously the backup took ~2 hours.
This was improved to 30 min by the SATA HDD on linux1.
I am looking at /opt/rtcds/caltech/c1/scripts/backup/rsync.backup.cumlog
In fact, this "30-min backup" was true until the end of March. Since then the backup has been taking 1-1.5 h.
This could be related to the recent NFS issue? |
11338 | Fri May 29 15:12:39 2015 | Koji | Update | Computer Scripts / Programs | Chiara Backup Hiccup |
Actual data |
Attachment 1: backup_hours.pdf
|
|
11366 | Fri Jun 19 16:54:20 2015 | Jenne | Update | Computer Scripts / Programs | Wiener scripts in scripts directory |
I have put the Wiener filter scripts into /opt/rtcds/caltech/c1/scripts/Wiener/ . They are under version control.
The idea is that you should copy ParameterFile_Example.m into your own directory, and modify parameters at the top of the file, and then when you run that script, it will output fitted filters ready to go into Foton. (Obviously you must check before actually implementing them that you're happy with the efficacy and fits of the filters).
Things to be edited in the ParameterFile include:
- Channel names for the witness sensors (which should each have a corresponding .txt file with the raw data)
- Channel name for the target
- Folder where this raw data is saved
- Folder to save results
- 1 or 0 to determine if need to load and downsample the raw data, or if can use pre-downsampled data
- This should probably be changed to just look to see if the pre-downsampled data already exists, and if not, do the downsampling
- 1 or 0 to determine if should use actuator pre-weighting
- Data folder for measured actuator TFs (only if using actuator pre-weighting)
- Actuator TFs can be many different exported text files from DTT, and they will be stitched together to make one set of measurements, where all points have coherence above some quantity (that you set in the ParameterFile)
- Coherence threshold for actuator data (only use data points with coherence above this amount)
- Fit order for actuator transfer function's vectfit
- 1 or 0 to decide if should use preweighting filter
- zeros and poles for preweighting filters
- 1 or 0 to decide if should use lowpass after Wiener filters (will be provided corresponding SOS coefficients for this filter, if you say yes)
- Lowpass filter parameters: cutoff freq, order, and ripple for the Cheby filter
- New sample rate for the data
- Number of Wiener filter taps
- Decide if use brute force matrix inversion or Levinson method
- Calibrations for witnesses and target
- Fit order for each of the Wiener filters
I think that's everything that is required.
|
11481 | Thu Aug 6 01:38:19 2015 | ericq | Update | Computer Scripts / Programs | Chiara gets new Ethernet card |
Since Chiara's onboard ethernet card has a reputation for being flaky in Linux, Koji suggested we could just buy a new ethernet card and throw it in there, since they're cheap.
I've installed an Intel EXPI9301CT ethernet card in Chiara, which was detected without problems. I changed the network settings in /etc/network/interfaces to use eth1 instead of eth0, restarted nfs and bind9, and everything looked fine.
Sadly, EPICS/network slowdowns are still happening. :( |
11498 | Wed Aug 12 14:35:46 2015 | ericq | Update | Computer Scripts / Programs | PDFs in ELOG |
I've tweaked the ELOG code to allow uploading of PDFs by drag-and-drop into the main editor window. Once again we can bask in the glory of
(You may have to clear your browser's cache to load the new javascript) |
Attachment 1: smooth.pdf
|
|
11572 | Fri Sep 4 04:12:05 2015 | ericq | Update | Computer Scripts / Programs | MATLAB down on all workstations |
There seems to be something funny going on with MATLAB's license authentication on the control room workstations. Earlier today, I was able to start MATLAB on pianosa, but now attempting to run /cvs/cds/caltech/apps/linux64/matlab/bin/matlab -desktop results in the message:
License checkout failed.
License Manager Error -15
MATLAB is unable to connect to the license server.
Check that the license manager has been started, and that the MATLAB client machine can communicate
with the license server.
Troubleshoot this issue by visiting:
http://www.mathworks.com/support/lme/R2013a/15
Diagnostic Information:
Feature: MATLAB
License path: /home/controls/.matlab/R2013a_licenses:/cvs/cds/caltech/apps/linux64/matlab/licenses/license.dat:/cvs/cds/caltech/apps/linux64/matlab/licenses/network.lic
Licensing error: -15,570. System Error: 115
|
11580 | Mon Sep 7 16:30:56 2015 | rana | HowTo | Computer Scripts / Programs | increase of window border size on Rossa |
Frustrated by the single-pixel width of the window borders and how hard that makes it to drag things around, I explored StackExchange, which showed that there is a .xml file which can be edited to increase this. I've changed the border size to 4 pixels on Rossa - it's nice. |
11615 | Thu Sep 17 19:58:06 2015 | gautam | Summary | Computer Scripts / Programs | Frequency counting algorithm |
I made some changes to the c1tst model running on c1iscey in order to test my algorithm for frequency counting. I followed the steps listed in elog 8909 to make, install and start the model.
I need to debug a few things and run some more diagnostics so I am leaving the model in its edited version (Eric had committed it to the svn before I made any changes). |
11618 | Fri Sep 18 09:06:26 2015 | rana | Frogs | Computer Scripts / Programs | remote data access: volume 1, Inferno |
Trying to download some data using matlab today, I found that my ole mDV stuff doesn't work because its MEX files were built for AMD64...
Trying to rebuild the NDS1 MEX according to the 7-year-old instructions didn't work either; our GCC is 'too' new.
From the Remote Data Access wiki (https://wiki.ligo.org/RemoteAccess/MatlabTools) I got the new 'get_data.m' and 'GWdata.m'. These didn't run, so I updated the nds2-client and matlab-nds2-client on Donatella.
Still doesn't run to get 40m data. It recognizes that we're C1, but throws some java exception error. Maybe it doesn't work on the NDS1 protocol of our framebuilder?
So then I noticed that our NDS2 server on megatron is no longer running...thought it was supposed to run via init.d. Found that the nds2 binary doesn't run because it can't find libframecpp.so.5; maybe this was blown away in some recent upgrade? We do have versions 3, 4, 6, 7, & 8 of this library installed.
So now, after an hour or two, I'm upgrading the nds2 server on megatron (plus a hundred dependencies) as well as getting a newer version of matlab to see if there's some kind of java version issue there.
Of course python still works to get data, but doesn't have any of the wiener filter calculating code that matlab has... |
11623 | Fri Sep 18 19:19:49 2015 | rana | Frogs | Computer Scripts / Programs | remote data access: volume 1, Inferno |
NDS2 restarted after hours long upgrade process; testing has begun. Let's try to get some long stretches of MC locked with MCL FF ON this weekend so's I can test out the angular FF idea. |
11628 | Mon Sep 21 18:31:06 2015 | gautam | Summary | Computer Scripts / Programs | Frequency counting algorithm |
I have been working on setting up a frequency counting module that can give us a readout of the beat frequency, divided by a factor of 2^14 using the Wenzel frequency dividers as described here. This is a summary of what I have thus far.
The algorithm, and simulink model
The basic idea is to pass the digitized signal through a Schmitt trigger (existing RCG module), which provides some noise immunity, and should in theory output a clean square wave with the same frequency as the input. The output of the Schmitt trigger module is either 0 (for input < lower threshold value) and 1 (for input greater than the high threshold value). By differencing this between successive samples, we can detect a "zero-crossing", and by measuring the time interval between successive zero crossings, we can take the reciprocal to get the frequency. The last bit of this operation (i.e. measuring the interval) is done using a piece of custom C code. Initially, I was trying to use the part "GPS" from CDS_PARTS to get the current GPS time and hence measure intervals between successive zero-crossings, but this didn't work out because the output of GPS is in seconds, and that doesn't give me the required precision to count frequency. I tried implementing some more precision timing using the clock_gettime() function, which is capable of giving nanosecond precision, but this didn't work for me. So I am now using a more crude way of measuring the interval, by using a counter variable that is incremented each time a zero-crossing is NOT detected, and then converting this to time using the FE_RATE macro (=16384). In any case, the ADC sampling rate limits the resolution of frequency counting using zero-crossing detection (more on this later). Attachment 1 shows the SIMULINK block diagram for this entire procedure.
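The counting scheme described above can be mimicked offline in a few lines of python (an illustration of the algorithm only - the real implementation is the custom C code in the model; the thresholds and amplitude follow the test signal used below):

```python
# Offline mock-up of the zero-crossing frequency counter: Schmitt trigger
# with +-1000 count thresholds, rising-edge detection, then
# frequency = FE_RATE / (samples between successive rising edges).
import numpy as np

FE_RATE = 16384  # model sampling rate, samples/s

def schmitt(x, lo=-1000.0, hi=1000.0):
    """0/1 output with hysteresis: set when x > hi, cleared when x < lo."""
    out = np.zeros(len(x), dtype=int)
    state = 0
    for i, v in enumerate(x):
        if v > hi:
            state = 1
        elif v < lo:
            state = 0
        out[i] = state
    return out

def count_frequency(x):
    """One frequency estimate per interval between rising edges."""
    edges = np.flatnonzero(np.diff(schmitt(x)) == 1)
    counters = np.diff(edges)       # the "counter variable" per interval
    return FE_RATE / counters

# 2 kHz square wave with amplitude well above the Schmitt thresholds
t = np.arange(2 * FE_RATE) / FE_RATE
f = count_frequency(4096 * np.sign(np.sin(2 * np.pi * 2000 * t)))
```

For a 2 kHz input, the counter alternates between 8 and 9 samples (the true period is 8.192 samples), so the instantaneous readout can only take the discrete values 16384/8 and 16384/9, showing the quantization discussed below.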
Testing the model
I implemented all of this on c1tst, and followed the steps listed here to get the model up and running. I then used one of the DB37 breakout boards to send a signal to the ADC using the DS345 function generator. Attachment 2 shows some diagnostic plots - input signal was a 2.5Vpp (chosen to match the output from the Wenzel dividers) square wave at 2kHz:
- Bottom left: digitized version of the input signal - I used this to set the upper and lower thresholds on the Schmitt trigger at +1000 counts and -1000 counts respectively.
- Top left: Schmitt trigger output (red trace) and the difference between successive samples of the Schmitt trigger output (blue trace - this variable is used to detect a zero crossing)
- Top right: Counter variable used to measure intervals between successive zero crossings, and hence, the frequency. The frequency output is held until the next zero crossing is detected, at which time counter is reset
- Bottom right: frequency output in Hz.
The right column pointed me to the limitations of frequency counting using this method - even though the input frequency was constant (2kHz), the counter variable, and hence the frequency readout, was neither accurate nor precise. But this was to be expected, given the limitations imposed by ADC sampling: we only get information about the state of the input signal once per sampling interval, and hence we cannot know if a zero crossing has occurred until the next sampling interval. Moreover, we can only count frequency in discrete steps. In attachments 3 and 4, I've plotted these discrete frequencies which can be measured - the error bars indicate the error in the frequency readout if the counter variable is 1 more or less than the "true" value - this can (and does) happen if the high and low times of the Schmitt trigger are not equal over time (see the top left plot in Attachment 2; it's not very obvious, but not all the "low" times are equal, and so the interval between detected zero crossings is not constant). This becomes a problem for small values of the counter variable, i.e. at high input frequencies. I had a look at the elogs Aidan wrote some years ago for a different digital frequency counting approach, and I guess the conclusion there was similar - for high input frequencies, the error is large.
I further did two frequency sweeps using the DS345, to see if I could recover this in the frequency readout. Attachments 5 and 6 show the results of these sweeps. For low frequencies, i.e. 100-500 Hz, the jitter in the readout is small (though this will be multiplied by a factor of 2^14), but by the time the input frequency gets up to 2kHz, the jitter in the readout is pretty bad (and gets worse for even higher frequencies).
Bottom line
Some refinements can be made to the algorithm, perhaps by introducing some averaging (i.e. not reading out frequency for every pair of zero crossings, but every 5) which may improve the jitter in the readout, but I would think that the current approach is not very useful above 2kHz (corresponding to ~30MHz of pre-divider frequency), because of the limitations shown in attachments 3 and 4. |
Attachment 1: Simulink_model.pdf
|
|
Attachment 2: diagnostic_plots.pdf
|
|
Attachment 3: Error_high_frequency.pdf
|
|
Attachment 4: Error_low_frequency.pdf
|
|
Attachment 5: Frequency_sweep_100_500_Hz.pdf
|
|
Attachment 6: Frequency_sweep_100_2000_Hz.pdf
|
|
11629 | Mon Sep 21 23:18:55 2015 | ericq | Summary | Computer Scripts / Programs | Frequency counting algorithm |
I definitely think lowpassing the output is the way to go. Since this frequency readback will be used for slow control of the beatnote frequency via auxiliary laser temperature, even lowpassing at tens of Hz is fine. The jitter doesn't mean it's useless, though.
If we lowpass at 16 Hz, we're effectively averaging over 1024 samples, bringing, for example, a +-2 kHz jitter of a 6 kHz signal as you posted down to 2 kHz/sqrt(1024) ~ 60 Hz, which is 1% of the carrier. This seems OK to me. |
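A quick check of that arithmetic (assuming the 16384 Hz model rate from the entries above):

```python
# Averaging gain from a 16 Hz lowpass on a 16384 Hz readout stream.
import math

fs = 16384                 # model rate, samples/s
f_lp = 16                  # lowpass corner, Hz
n_avg = fs // f_lp         # effective number of samples averaged
jitter_in = 2000.0         # +-2 kHz raw readout jitter
jitter_out = jitter_in / math.sqrt(n_avg)
# n_avg = 1024, jitter_out = 62.5 Hz -> about 1% of a 6 kHz beat
```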
11631 | Tue Sep 22 02:11:17 2015 | rana | Summary | Computer Scripts / Programs | Frequency counting algorithm |
I was going to suggest using a software PLL, but perhaps averaging gives the same result. The same ADC signal can be fed to multiple blocks with different averaging times and we can just use whichever ones seems the most useful. |
11640 | Thu Sep 24 17:01:37 2015 | ericq | Update | Computer Scripts / Programs | Freeing up some space on /cvs/cds |
I noticed that Chiara's backup HD (which has a capacity of 1.8 TB, vs. the main drive's 2 TB) was close to getting full, meaning that we would soon be without a local backup.
I freed up ~200GB of space by compressing the autoburt snapshots from 2012, 2013, 2014. Nothing is deleted, I've just compressed text files into archives, so we can still dig out the data whenever we want. |
11688 | Wed Oct 14 15:59:06 2015 | rana | Update | Computer Scripts / Programs | nodus web apache simlinks too soft |
None of the links here seem to work. I forgot what the story is with our special apache redirect.
https://wiki-40m.ligo.caltech.edu/Core_Optics |
11694 | Thu Oct 15 14:39:58 2015 | ericq | Update | Computer Scripts / Programs | nodus web apache simlinks too soft |
The story is: we currently don't expose the whole /users/public_html folder. Instead, we symlink folders from public_html into /export/home/ on nodus, which is where apache looks for things.
So, I fixed the links on the Core Optics page by running:
controls@nodus|~ > ln -sfn /users/public_html/40m_phasemap /export/home/
|
11784 | Wed Nov 18 20:49:05 2015 | rana | Update | Computer Scripts / Programs | nodus boot getting full |
controls@nodus|~ > df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/nodus2--vg-root 355G 69G 269G 21% /
udev 5.9G 4.0K 5.9G 1% /dev
tmpfs 1.2G 308K 1.2G 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 5.9G 0 5.9G 0% /run/shm
/dev/sda1 236M 210M 14M 94% /boot
chiara:/home/cds 2.0T 1.5T 459G 77% /cvs/cds
fb:/frames 13T 11T 1.6T 88% /frames |
11786 | Wed Nov 18 23:18:07 2015 | ericq | Update | Computer Scripts / Programs | nodus /boot cleared up |
The /boot partition was filling up with old kernels. Nodus has automatic security updates turned on, so new kernels roll in and the old ones don't get removed.
I ran apt-get autoremove, which removed several old kernels. (apt is configured by default to keep two previous kernels around when autoremoving, so this isn't so risky.)
Now: /dev/sda1 236M 94M 130M 42% /boot
In principle, one should be able to change a setting in /etc/apt/apt.conf.d/50unattended-upgrades that would do this cleanup automatically, but this mechanism has a bug whose fix hasn't propagated out yet (link). So, I've added a line to nodus' root crontab to autoremove once a week, on Sunday morning. |
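For reference, such a crontab line might look like this (the schedule and log path here are illustrative, not the exact line on nodus):

```
# weekly cleanup of old kernels, run from root's crontab (illustrative)
0 6 * * 0  apt-get -y autoremove >> /var/log/apt-autoremove.log 2>&1
```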
11799 | Mon Nov 23 14:45:39 2015 | ericq | Update | Computer Scripts / Programs | New software |
COMSOL 5.1 has been installed at: /cvs/cds/caltech/apps/linux64/comsol51/bin/comsol
MATLAB 2015b has been installed at: /cvs/cds/caltech/apps/linux64/matlab15b/bin/matlab
This has not replaced the default matlab on the workstations, which remains at 2013a. If some testing reveals that the upgrade is ok, we can rename the folders to switch. |
11824 | Mon Nov 30 12:19:38 2015 | yutaro | Update | Computer Scripts / Programs | image capture |
On VIDEO.adl, Image Capture and Video Capture did not seem to work and gave me some errors, so I fixed the following two things:
1. I plugged one end of a USB cable into Pianosa; the other end was connected to the Sensoray. I don't know why, but this was unconnected.
2. I slightly fixed /users/sensoray/sdk_2253_1.2.2_linux/imsub/display-image.py as follows:
L52: pix[j, i] = R, G, B -> pix[j, i] = int(R), int(G), int(B)
It seems to work, at least for some cameras including ETMYF and ITMYF. |
11837 | Wed Dec 2 15:08:41 2015 | ericq | Update | Computer Scripts / Programs | Donatella sudo problem resolved |
Somehow, the controls user account on donatella lost its membership to the sudoers group, which meant doing anything that needs root authentication was impossible.
I fixed this by booting up from a Linux install USB drive, mounting the HD, and running useradd controls sudo |
11855 | Mon Dec 7 10:40:09 2015 | yutaro | Update | Computer Scripts / Programs | Added 1 line to UNFREEZE_DITHER.py |
I added one line to one of the ASS scripts, UNFREEZE_DITHER.py:
L29> ez.cawrite('C1:ASS-'+dof+'_GAIN', 0)
The reason I added this is: without this line, C1:ASS-'+dof+'_GAIN becomes larger than 1.0 (the nominal value) if you UNFREEZE DITHER when the dither is already running or C1:ASS-'+dof+'_GAIN is not 0.0. |
11861 | Tue Dec 8 11:24:45 2015 | yutaro | Summary | Computer Scripts / Programs | Scripts for loss map measurement |
Here I explain the usage of my scripts for loss map measurement. There are 7 script files in the same directory, /opt/rtcds/caltech/c1/scripts/lossmap_scripts. With these scripts, the round trip loss of an arm cavity is measured with the beam spot on one mirror shifted to 5x5 (option: 3x3) points. You can choose which cavity you measure, which mirror's beam spot you shift, and the maximum shift of the beam spot in the vertical and horizontal directions.
To start measurement from the beginning
Run the following command in an arbitrary directory and you will get several text files including the result of loss map measurement:
> python /opt/rtcds/caltech/c1/scripts/lossmap_scripts/lossmap.py [maximum shift in mm (PIT)] [maximum shift in mm (YAW)] [arm name (XorY)] [mirror name (E or I)]
Optionally, you can add "AUTO" at the end of the above command. Without "AUTO", you will be asked if the dithering has already settled down or not after each shift of the beam spot and you can let the scripts wait until the dithering settles down sufficiently. If you add "AUTO", it will be judged if the dithering has settled down or not according to some criteria, and the measurement will continue without your response to the terminal.
The files to be created in the current directory by the scripts are:
- lossmapETMX1-1.txt # [POX power (locked)] / [POX power (misaligned)]
- lossmapETMX1-2.txt # standard deviation of [POX power (locked)] / [POX power (misaligned)]
- lossmapETMX1-3.txt # TRX
- lossmapETMX1-1_converted.txt # round trip loss (ppm) calculated from lossmapETMX1-1.txt
- lossmapETMX1-1_converted_sigma.txt # standard deviation of round trip loss calculated from 1-1.txt and 1-2.txt
- lossmapETMX_result.txt # round trip loss and its error in a clear form.
The name of the files would be "lossmapITMY1-1.txt" etc. depending on which mirror you have chosen.
To restart measurement from a certain point
Run the following command in a directory containing "lossmap(mirror name)1-1.txt", "lossmap(mirror name)1-2.txt" and "lossmap(mirrorname)1-3.txt" which are created by previous not-completed measurement:
> python /opt/rtcds/caltech/c1/scripts/lossmap_scripts/lossmap.py [maximum shift in mm (PIT)] [maximum shift in mm (YAW)] [arm name (XorY)] [mirror name (E or I)] [restart point (PIT)] [restart point (YAW)]
You can also add "AUTO".
How to designate the restart point:
Each matrix element of the measurement output is labeled by a pair of numbers, as shown below.
(-1,-1) -> (-1,-0.5) -> (-1,0) -> (-1,0.5) -> (-1,1)
v
(-0.5,1) <- (-0.5,0.5) <- (-0.5,0) <- (-0.5,-0.5) <- (-0.5,-1)
v
(0,-1) -> (0,-0.5) -> (0,0) -> (0,0.5) -> (0,1)
v
(0.5,1) <- (0.5,0.5) <- (0.5,0) <- (0.5,-0.5) <- (0.5,-1)
v
(1,-1) -> (1,-0.5) -> (1,0) -> (1,0.5) -> (1,1)
Give the pair of numbers corresponding to the matrix element at which you want to restart. The arrows show the measurement sequence. For the correspondence between the matrix elements and the physical positions on ETMY and ETMX, see elogs 11818 and 11857, respectively.
This script overwrites the existing files (~1-1.txt etc.), so it is safer to back them up before running it.
Some notes on the scripts and measurement
- Calibration has been done only for the ETMs, i.e. for the ITMs the unit of [maximum shift] is not mm; instead, the values given as [maximum shift] equal the maximum offsets added just after demodulation in the ASS loop (e.g. C1:ASS-YARM_ITM_PIT_L_DEMOD_I_OFFSET).
- Before starting a measurement, check that the following parameters are correct:
POXzero (L47 in lossmapx.py and L52 in lossmapx_resume.py): the value of C1:LSC-POXDC_OUTPUT when no light is incident on the POX PD.
POYzero (L45 in lossmapy.py and L50 in lossmapy_resume.py): the value of C1:LSC-POYDC_OUTPUT when no light is incident on the POY PD.
mmr (L11 in lossmap_convert.py): (mode-matched carrier power)/(total power)
Tf (L12 in lossmap_convert.py): transmissivity of the ITM
Tetm (L13 in lossmap_convert.py): transmissivity of the ETM in ppm
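The exact conversion is implemented in lossmap_convert.py; as a hedged sketch only, the standard Fabry-Perot relation that such a conversion is typically based on looks like the following. The function names and the bisection bracket are my own, not taken from the script, and all quantities here are dimensionless (so Tetm in absolute units, not ppm):

```python
import math

def cavity_power_refl(loss, Titm, Tetm):
    """On-resonance power reflectivity of a two-mirror cavity with
    round-trip loss `loss` (all quantities dimensionless)."""
    r1 = math.sqrt(1.0 - Titm)
    r2 = math.sqrt((1.0 - Tetm) * (1.0 - loss))  # ETM reflectivity incl. loss
    rc = (r1 - r2) / (1.0 - r1 * r2)
    return rc * rc

def power_ratio(loss, Titm, Tetm, mmr):
    """Model of [POX power (locked)] / [POX power (misaligned)]: the
    mode-mismatched fraction (1 - mmr) reflects fully, while the
    mode-matched fraction mmr sees the cavity reflectivity."""
    return (1.0 - mmr) + mmr * cavity_power_refl(loss, Titm, Tetm)

def loss_from_ratio(ratio, Titm, Tetm, mmr, lo=0.0, hi=1e-3):
    """Invert power_ratio for the round-trip loss by bisection; for an
    overcoupled cavity the ratio decreases monotonically with loss."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if power_ratio(mid, Titm, Tetm, mmr) > ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, with Titm = 1.4e-2, Tetm = 15e-6 and mmr = 0.9, a measured ratio of about 0.9835 corresponds to roughly 50 ppm of round-trip loss.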
- Changing n (L50 in lossmap.py) from 5 to 3 switches the grid from the default 5x5 to 3x3. In the 3x3 case, the matrix elements are labeled
(-1,-1) -> (-1,0) -> (-1,1)
v
(0,1) <- (0,0) <- (0,-1)
v
(1,-1) -> (1,0) -> (1,1)
similarly to the case of 5x5.
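As a cross-check of the orderings above, the serpentine measurement sequence for either grid size can be generated like this (a standalone illustration, not code from lossmap.py):

```python
def lossmap_order(n=5):
    """Return the (PIT, YAW) labels in measurement order for an n x n grid,
    following the serpentine pattern shown above (labels run from -1 to 1)."""
    vals = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    points = []
    for i, pit in enumerate(vals):
        # even-numbered rows sweep YAW left-to-right, odd rows reverse
        row = vals if i % 2 == 0 else list(reversed(vals))
        points.extend((pit, yaw) for yaw in row)
    return points
```

lossmap_order(5)[:6] gives (-1,-1) through (-1,1) followed by (-0.5,1), matching the first two rows of the 5x5 diagram, and lossmap_order(3) reproduces the 3x3 diagram.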
- You can copy the lossmap_scripts directory anywhere in the controls account and use it from there. The scripts will work as long as all 7 of them are in the same directory. |
11862
|
Tue Dec 8 15:18:29 2015 |
ericq | Update | Computer Scripts / Programs | Nodus security |
I've done a couple of things to try to make nodus a little more secure. Some have worried that nodus may be susceptible to being drafted into a botnet, slowing down our operations.
1. I configured the ssh server settings to disallow logins as root. Ubuntu doesn't enable the root account by default anyways, but it doesn't hurt.
2. I installed fail2ban. Its function: if some IP address fails to authenticate an ssh connection 3 times, it is banned from connecting for 10 minutes. This is mostly for thwarting mass brute-force attacks. Looking at /var/log/auth.log doesn't indicate any activity of this kind in the past week, at least.
3. I set up and enabled ufw (uncomplicated firewall) to only allow incoming traffic for:
- ssh
- ELOG
- Nodus apache stuff (svn, wikis, etc.)
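The fail2ban behavior described in item 2 corresponds roughly to a jail configuration like the following (a hypothetical /etc/fail2ban/jail.local fragment; the jail section name and log path vary with fail2ban version and distribution):

```ini
[sshd]
enabled  = true
port     = ssh
logpath  = /var/log/auth.log
# ban after 3 failed authentications within findtime
maxretry = 3
findtime = 600
# ban duration: 10 minutes (in seconds)
bantime  = 600
```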
I don't think there are any other ports we need open, but I could be wrong. Let me know if I broke something you need! |
11869
|
Wed Dec 9 23:16:13 2015 |
rana | Update | Computer Scripts / Programs | Nodus security |
NDS2 and the usual ports so that we can use optimus as a comsol server.
Quote: |
I don't think there are any other ports we need open, but I could be wrong. Let me know if I broke something you need!
|
|
11899
|
Wed Dec 23 03:27:04 2015 |
rana | Update | Computer Scripts / Programs | LHO EPICS slow down |
https://alog.ligo-wa.caltech.edu/aLOG/index.php?callRep=24321
This LHO log indicates that EPICS slow down could be due to NFS activity. Could we make some trend of NFS activity on Chiara and then see if it correlates with EPICS flatlines?
I wonder if our EPICS issues frequency is correlated to the Chiara install. |
11905
|
Mon Jan 4 14:45:41 2016 |
rana, eq, koji | Configuration | Computer Scripts / Programs | nodus pwd change |
We changed the password for controls on nodus this afternoon. We also zeroed out the authorized_keys file and then added back in the couple that we want in there for automatic backups / detchar.
Also did the recommended Ubuntu updates on there. Everything seems to be going OK so far. We think nothing on the interferometer side cares about the nodus password.
We also decided to dis-allow personal laptops on the new Martian router (to be installed soon). |
12244
|
Tue Jul 5 18:44:39 2016 |
Praful | Update | Computer Scripts / Programs | Working 40m Summary Pages |
After hardware errors prevented me from using optimus, I switched my generation of summary pages back to the clusters. A day's worth of data is still too much to process on one computer, but I have successfully made summary pages for timescales of a couple of hours on this site: https://ldas-jobs.ligo.caltech.edu/~praful.vasireddy/
Currently, I'm working on learning the current plot-generation code so that it can eventually be modified to include an interactive component (e.g., hovering over a point on a timeseries would display the GPS time). Also, the 40m summary pages have been down for the past 3 weeks but should be up and working soon as the clusters are now alive. |
12252
|
Wed Jul 6 11:02:41 2016 |
Praful | Update | Computer Scripts / Programs | VMon Tab on Summary Pages |
I've added a new tab for VMon under the SUS parent tab. I'm still working out the scale and units, but let me know if you think this is a useful addition. Here's a link to my summary page that has this tab: https://ldas-jobs.ligo.caltech.edu/~praful.vasireddy/1151193617-1151193917/sus/vmon/
I'll have another tab with VMon BLRMS up soon.
Also, the main summary pages should be back online soon after Max fixed a bug. I'll try to add the SUS/VMon tab to the main pages as well. |
12254
|
Wed Jul 6 17:17:22 2016 |
Praful | Update | Computer Scripts / Programs | New Tabs and Working Summary Pages |
The main C1 summary pages are back online now thanks to Max and Duncan, with a gap in pages from June 8th to July 4th. Also, I've added my new VMon and Sensors tabs to the SUS parent tab on the main pages. These new tabs are now up and running on the July 7th summary page.
Here's a link to the main nodus pages with the new tabs: https://nodus.ligo.caltech.edu:30889/detcharsummary/day/20160707/sus/vmon/
And another to my ldas page with the tabs implemented: https://ldas-jobs.ligo.caltech.edu/~praful.vasireddy/1150848017-1150848317/sus/vmon/
Let me know if you have any suggestions or see anything wrong with these additions, I'm still working on getting the scales to be right for all graphs. |
12257
|
Wed Jul 6 21:05:36 2016 |
Koji | Update | Computer Scripts / Programs | New Tabs and Working Summary Pages |
I started to receive emails from cron every 15min. Is the email related to this? And is it normal? I never received these cron emails before when the sum-page was running. |
Attachment 1: cron_mail.txt.zip
|
12258
|
Wed Jul 6 21:09:09 2016 |
not Koji | Update | Computer Scripts / Programs | New Tabs and Working Summary Pages |
I don't know much about how the cron job runs; I'll forward this to Max.
Quote: |
I started to receive emails from cron every 15min. Is the email related to this? And is it normal? I never received these cron emails before when the sum-page was running.
|
Max says it should be fixed now. Have the emails stopped? |
12259
|
Wed Jul 6 21:16:17 2016 |
Max Isi | Update | Computer Scripts / Programs | New Tabs and Working Summary Pages |
This should be fixed now—apologies for the spam.
Quote: |
I don't know much about how the cron job runs, I'll forward this to Max.
Quote: |
I started to receive emails from cron every 15min. Is the email related to this? And is it normal? I never received these cron emails before when the sum-page was running.
|
|
|
12260
|
Wed Jul 6 21:50:21 2016 |
Koji | Update | Computer Scripts / Programs | New Tabs and Working Summary Pages |
It seemed something has been done. And I got cron emails.
Then, it seemed something has been done. And the emails stopped. |
12277
|
Fri Jul 8 19:33:16 2016 |
Praful | Update | Computer Scripts / Programs | MEDM Tab on Summary Pages |
A new MEDM tab has been added to the summary pages (https://nodus.ligo.caltech.edu:30889/detcharsummary/day/20160708/medm/), although some of the screens are not updated when /cvs/cds/projects/statScreen/cronjob.sh is run. In /cvs/cds/projects/statScreen/log.txt, the following error is given for those files: import: unable to read X window image `0x20011f': Resource temporarily unavailable @ error/xwindow.c/XImportImage/5027. If anyone has seen this error before or knows how to fix it, please let me know.
In the meantime, I'll be working on creating an archive of MEDM screens for every hour to be displayed on the summary pages. |