Jamie has informed me of numpy's savetxt() function, which is exactly what I want for this situation (human-readable text storage of an array). So, I will now be using:
# outfile is the name of the .png graph; data is the array with our desired data.
import numpy
numpy.savetxt(outfile + '.dat', data)
to save the data. I can later retrieve it with numpy.loadtxt().
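A minimal round-trip sketch of the idea (the array contents and file name here are placeholders for the real variables above):

import numpy

data = numpy.random.rand(100, 3)        # placeholder for the real data array
outfile = 'graph_name'                  # placeholder for the .png graph's name

numpy.savetxt(outfile + '.dat', data)   # write the array as human-readable text
restored = numpy.loadtxt(outfile + '.dat')
assert numpy.allclose(data, restored)   # the values survive the round trip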
Graph Limits: The limits on graphs have been problematic. They often span too large a range of values, usually because of dropouts in data collection, so the important information is washed out by the large limits and the plots convey little. For example, the graph below shows data over an unnecessarily large range because of the dropout in the 300-1000 Hz pressure values.
The limits on the graphs can be modified using the config file found in /40m-summary/share/c1_summary_page.ini. In the entry for the appropriate graph, change the amplitude-lim=y1,y2 line, setting y1 to the desired lower limit and y2 to the desired upper limit. For example, I changed the amplitude limits on the above graph to amplitude-lim=.001,1 and obtained the following graph.
The limits could be tightened further to improve clarity; this is easily done by modifying the config file. I modified the config file for all the 2D plots to improve the bounds. However, on some plots I wasn't sure what bounds were appropriate or what range of values we were interested in, so I will have to ask someone to find out.
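For illustration, such an entry might look like the following in context (the section name here is hypothetical; only the amplitude-lim line is the option described above):

[pem-weather]           ; hypothetical section name
amplitude-lim = .001,1  ; y1 = lower limit, y2 = upper limit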
Next: I now want to fix all the funny little problems with the site, such as scroll bars appearing where they should not and graphs only plotting until 6 PM. To do this most effectively, I need to restructure the code and factor it into several files. Otherwise the code will not only be much harder to edit, but will become more and more confusing as I add to it, compounding the problems we currently have (i.e., that this code isn't well documented and nobody knows how it works). We need specific documentation of what exactly is happening before too many changes are made. Take the config files, for example: someone put a lot of work into them, but we need a README specifying which options are supported for which types of graphs, etc. As it stands, progress is slow because I have to figure out what is going on before making even small changes.
To fix this, I will divide the code into three main sectors (a minimal sketch of the split follows the list). The division of labor will be:
- Sector 1: Figure out what the user wants (i.e. read config files, create a ConfigParser, etc...)
- Sector 2: Process the data and generate the plots based on what the user wants
- Sector 3: Generate the HTML
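A minimal sketch of how this split might look (module layout and function names are hypothetical, not the actual summary-page code):

from configparser import ConfigParser

# Sector 1: figure out what the user wants
def read_user_config(path):
    """Parse a summary-page .ini file into a ConfigParser object."""
    config = ConfigParser()
    config.read(path)
    return config

# Sector 2: process the data and generate the plots
def generate_plots(config):
    """Fetch channel data and draw each plot the config asks for."""
    for section in config.sections():
        pass  # read options such as amplitude-lim, fetch data, plot

# Sector 3: generate the HTML
def generate_html(config):
    """Write the summary web pages that embed the generated plots."""
    pass

The point of the split is that each sector can then be documented and tested on its own.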
Duncan Macleod (the original author of the summary pages) has an updated version that I would like to import and work on. The code and installation instructions are found below.
I am not sure where we want to host this. I could put it in a new folder in /users/public_html/ on megatron, for example. Duncan appears to have simply included the summary page code in the pylal repository. Should I reimport the whole repository? I'm not sure whether this would break other things on megatron that use pylal. I am talking to Rana and Jamie to see what is best.
I am following the instructions here:
But there was an error when I ran the ./00boot command near the beginning. I have asked Duncan Macleod about this and am waiting to hear back.
For now, I am putting things into /home/controls on allegra. My understanding is that this directory is not shared, so I don't have a chance of messing up anyone else's work. I have been moving slowly and being extra cautious about what I do, because I don't want to accidentally nuke anything.
I installed the new version of LAL on allegra. I don't think it has interfered with the existing version, but if anyone has problems, let me know. The old version on allegra was 6.9.1; the new code requires a newer release. To use it, add . /opt/lscsoft/lal/etc/lal-user-env.sh to the end of the .bashrc file (this is the simplest way, since it will automatically pull in the new version).
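In other words, something like this one-liner will do it (assuming bash and that ~/.bashrc is the file in use):

echo ". /opt/lscsoft/lal/etc/lal-user-env.sh" >> ~/.bashrc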
I am having a little trouble resolving some other unmet dependencies for the summary pages, such as the new lalframe, but I am working on it.
Once I have it working on allegra and know that I can install it without disturbing the current versions of LAL, I will repeat the process on megatron so I can test and edit the new version of the summary pages.
LALFrame was successfully installed. Allegra had unmet dependencies with some of the library tools. I tried to install LALMetaIO, but there were unmet dependencies with other LSC software; even after updating the LSC software, the problem persisted. I will try some more and ask Duncan if I'm not successful.
Installing these packages is rather time-consuming; it would be nice if there were a way to do it all at once.
I am now working on megatron, installing in /home/controls/lal. I am having some unmet dependency issues that I have asked Duncan about.
I have figured out all the issues, and successfully installed the new versions of the LAL software. I am now going to get the summary pages set up using the new code.
There was an issue with running the new summary pages because laldetchar was not included (the website I used for instructions doesn't mention that it is needed for the summary pages). I figured out how to include it with help from Duncan. There appear to be other needed dependencies, though; I have emailed Duncan to ask how these are imported into the code base. I am making a list of all the packages/dependencies I needed that weren't mentioned on the website, so this will be easier if/when it has to be done again.
Most dependencies are met. The next issue is that the matplotlib basemap toolkit is not installed, because it is not available for our version of Python. We need to update Python on megatron to fix this.
Replaced the batteries successfully in the control room. We just had to switch the clips from the old batteries to the new one, which we didn't know was possible until now.
The summary pages have been down due to incompatibilities with a software update and problems with the LDAS cluster. I'm working at the moment to fix the former and the LDAS admins are looking into the latter. Overall, we can expect the pages will be fully functional again by Monday.
The pages are live again. Please allow some time for the system to catch up and process missed days. If there are any further issues, please let me know.
URL reminder: https://nodus.ligo.caltech.edu:30889/detcharsummary/
It was brought to my attention that the "Code status" page (https://nodus.ligo.caltech.edu:30889/detcharsummary/status.html) had been stuck showing "Unknown status" for a while.
This was due to a sync error with LDAS and has now been fixed. Let me know if the issue returns.
The summary pages are currently unstable due to priority issues on the cluster*. The plots had been empty ever since the CDS update started anyway. This issue will (presumably) disappear once the jobs are moved to the new 40m shared LDAS account by the end of next week.
*Namely, the jobs are put on hold (more precisely, given status "idle") because we have low priority in the processing queue, making the usual 30 min latency impossible.
Bad syntax errors in the c1sus.ini config file were causing the summary pages to crash: a plot type had not been indicated for plots 5 and 6, so I've made these "timeseries."
In the future, please remember to always specify a plot type, e.g.:
5 = C1:SUS-ETMX_SUSPIT_INMON.mean,timeseries
By the way, the pages will continue to be unavailable while I transfer them to the new shared account.
A shared LIGO Data Grid (LDG) account was created for use by the 40m lab. The purpose of this account is to provide access to the LSC computer cluster resources for 40m-specific projects that may benefit from increased computational power and are not linked to any user in particular (e.g. the summary pages).
For further information, please see https://wiki-40m.ligo.caltech.edu/40mLDASaccount
The summary pages are now generated from the new 40m LDAS account. The nodus URL (https://nodus.ligo.caltech.edu:30889/detcharsummary/) is the same and there are no changes to the way the configuration files work. However, the location on LDAS has changed to https://ldas-jobs.ligo.caltech.edu/~40m/summary/ and the config files are no longer version-controlled on the LDAS side (this was redundant, as they are under VCS in nodus).
I have posted a more detailed description of the summary page workflow, as well as instructions to run your own jobs and other technical minutiae, on the wiki: https://wiki-40m.ligo.caltech.edu/DailySummaryHelp
For the past couple of days, the summary pages have shown minute trend data disappearing at 12:00 UTC (5:00 AM local time). This seems to be the case for all channels that we plot; see e.g. https://nodus.ligo.caltech.edu:30889/detcharsummary/day/20150724/ioo/. Using Dataviewer, Koji has checked that the frames indeed seem to have disappeared from disk. The data come back at 24:00 UTC (5:00 PM local). Any ideas why this might be?
A new type of plot is now available for use in the summary pages, based on EricQ's 2D histogram plots (elog 11210). I have added an example of this to the SandBox tab (https://nodus.ligo.caltech.edu:30889/detcharsummary/day/20151119/sandbox/). The usage is straightforward: the name to be used in config files is histogram2d; the first channel corresponds to the x-axis and the second one to the y-axis; the options accepted are the same as those of numpy.histogram2d and pyplot.pcolormesh (besides plot limits, titles, etc.). The default colormap is inferno_r and the shading is flat.
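For the curious, here is a minimal standalone sketch of what this plot type does, per the description above (synthetic data stands in for the two configured channels):

import numpy as np
import matplotlib.pyplot as plt

# synthetic stand-ins for the two channels (first = x-axis, second = y-axis)
x = np.random.randn(10000)
y = 0.5 * x + 0.1 * np.random.randn(10000)

# bin the data; 'bins' is a numpy.histogram2d option
counts, xedges, yedges = np.histogram2d(x, y, bins=100)

# draw with the summary-page defaults: inferno_r colormap, flat shading
plt.pcolormesh(xedges, yedges, counts.T, cmap='inferno_r', shading='flat')
plt.colorbar(label='counts')
plt.show()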
I have added a new cron job on pcdev1 at CIT under the 40m shared account. It will run the /home/40m/DetectorChar/bin/cleanarchive script one minute past midnight on the first of every month. The script removes GWsumm archive files older than one month.
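For reference, the corresponding crontab entry presumably looks like this (fields: minute, hour, day of month, month, day of week, command):

1 0 1 * * /home/40m/DetectorChar/bin/cleanarchive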
I have modified the c1summary.ini and c1lsc.ini configuration files slightly to avoid overloading the system and to remove the errors that were preventing plots from being updated after a certain time of day.
The changes made are the following:
1- all high-resolution spectra from the Summary and LSC tabs are now computed for each state (X-arm locked, Y-arm locked, IFO locked, all);
2- I've removed MICH, PRCL & SRCL from the summary spectrum (those can still be found in the LSC tab);
3- I've split LSC into two subtabs.
The reason for these changes is that having high resolution (raw channels, 16kHz) spectra for multiple (>3) channels on a single tab requires a *lot* of memory to process. As a result, those jobs were failing in a way that blocked the queue, so even other "healthy" tabs could not be updated.
My changes, in effect from May 25 on, should fix this. As always, feel free to reorganize the ini files to make the pages more useful to you, but keep in mind that we cannot support multiple high-resolution spectra on a single tab, as explained above.
This should be fixed now—apologies for the spam.
I don't know much about how the cron job runs; I'll forward this to Max.
I started to receive emails from cron every 15min. Is the email related to this? And is it normal? I never received these cron emails before when the sum-page was running.
Summary pages are down today due to scheduled LDAS cluster maintenance. The pages will be back automatically once the servers are back (by tomorrow).
The system is back from maintenance and the pages for last couple of days will be filled retroactively by the end of the week.
I've re-submitted the Condor job; pages should be back within the hour.
The pages have been non-functional for 3 weeks. Has anyone else noticed this? Images have been missing since ~Sep 21.
Summary pages will be unavailable today due to LDAS server maintenance. This is unrelated to the issue that Rana reported.
The summary pages were not successfully generated for a long period of time at the end of 2016 due to syntax errors in the PEM and Weather configuration files.
These errors caused the INI parser to crash and brought down the whole gwsumm system. It seems that changes in the configuration of the Condor daemon at the CIT clusters may have made our infrastructure less robust against these kinds of problems (which would explain why there wasn't a better error message/alert), but this requires further investigation.
In any case, the solution was as simple as correcting the typos in the config files (on the nodus side) and restarting the cron jobs (on the cluster side, by doing `condor_rm 40m && condor_submit DetectorChar/condor/gw_daily_summary.sub`). Producing pages for the missing days will take some time (how to do so for a particular day is explained in the wiki: https://wiki-40m.ligo.caltech.edu/DailySummaryHelp).
RXA: later, Max sent us this secret note:
However, I realize it might not be clear from the page which are the key steps. These are just running:
1) ./DetectorChar/bin/gw_daily_summary --day YYYYMMDD --file-tag some_custom_tag
to create pages for day YYYYMMDD (the --file-tag option is not strictly necessary, but it will prevent conflicts with other instances of the code running simultaneously);
2) ./DetectorChar/bin/pushnodus 20160701 20160702
to sync those days back to nodus, e.g. for July 1-2, 2016.
This must all be done from the cluster using the 40m shared account.
System-wide CIT LDAS cluster maintenance may cause disruptions to summary pages today.
LDAS has not recovered from maintenance, causing the pages to remain unavailable until further notice.
> System-wide CIT LDAS cluster maintenance may cause disruptions to summary pages today.
FYI this issue has still not been solved, but the pages are working because I got the software running on an alternative headnode (pcdev2). This may cause unexpected behavior (or not).
> LDAS has not recovered from maintenance causing the pages to remain unavailable until further notice.
> > System-wide CIT LDAS cluster maintenance may cause disruptions to summary pages today.
There has been a change in the default output format of the condor_q command at the CIT clusters. This could be problematic for the summary page status monitor, so I have disabled the new default behavior in favor of the old one. Specifically, I ran the following commands from the 40m shared account:
mkdir -p ~/.condor
echo "CONDOR_Q_DASH_BATCH_IS_DEFAULT=False" >> ~/.condor/user_config
This should have no effect on the pages themselves.
UPDATE: It turned out that the pair of 0.3 OD ND filters we used were not matched, so we replaced them with new 0.5 OD NENIR05A-C filters from Thorlabs. Now both photodiodes give similar counts.
The DC power incident on the PDs is 74 mW, which may cause saturation. We attenuated the beam going to the BHD_DC photodiodes using an ND filter of OD 0.3, which gives an attenuation of 0.5.
[Mayank, Radhika, Paco]
We locked PRMI for a solid hour and controlled LO phase angle using BH55_Q at higher power.
After Radhika aligned the IFO for us, and recovered the PRMI flashing (using REFLDC), we attempted a PRMI lock. After a few trials we succeeded.
Control parameters: see Attachment #1; basically REFL11_I to PRCL and AS55_Q to MICH (error points), with actuation as in previous locks: PRCL to PRM and MICH to 0.5 * BS - 0.33 * PRM.
The gains are slightly different; in particular, the PRCL gain was increased from -0.07 to -0.09 after an OLTF measurement suggested the UGF could be pushed to > 120 Hz (Attachment #2 shows the measured OLTF). Do note that we ended up disabling FM1 (a boost) on the PRCL LSC filter bank, because we thought the loop was unstable when it got triggered ON. Finally, we took a quick noise spectrum of PRMI; we have yet to calibrate it.
We also managed to reduce the AS_DC level from 0.4 to 0.1. We first tried adding an offset to the MICH error point, but the trick was to align the ITMX/ITMY differential yaw.
Lock stretch (GPS): 1367107965 --> 1367111565
While PRMI was locked, we quickly locked the homodyne angle using BH55_Q. For this, the demod angle was optimized from -60 deg to 55.374 deg. The lock was acquired using FM5 and FM8 with a gain of -0.75. Attachment #3 shows the "calibrated" noise budget of the LO phase under this configuration. The main difference with respect to the previous budget is in the "RIN", which we now realize is not relative; hence the increase in this budget. We will revisit this calibration later.
- Next steps
We removed the PSL controller's internal (broken) fans after it tripped due to overheating.
While aligning the Xarm, I noticed a sudden loss of beam. On Radhika's suggestion we checked the PSL and found that the PSL controller was off (no lights on the front or back panel). We restored the situation by unplugging and replugging the power cord. The PSL worked fine for a few minutes (~30) and then tripped again. This time the front panel OFF light was on; see the attached image (Attachment #1).
[Paco, Mayank, Koji-remote]
We disconnected the PSL controller and took this opportunity to investigate its internal cooling mechanism. After disassembling the top panel of the chassis, we saw that there are two SUNON KD1205PHB2 fans, meant to run at 12 VDC (1.7 W), connected to the bottom PCB inside the controller. After disconnecting them from this board, we tested them with an externally supplied DC voltage and confirmed that they no longer work. We noted that the cooling mechanism is based on a long aluminum heat sink to which most ICs are attached; the fans are meant to provide airflow towards the rear aperture of the chassis. We followed Koji's suggestion and, for now, removed the damaged components (detailed pictures of this operation have been posted in a google photos album elsewhere) to allow heat to flow out more easily. We reassembled the controller chassis and reinstalled it with the external fan providing the necessary airflow to prevent the unit from tripping again due to overheating. Then we turned on the PSL and recovered the PMC and IMC locks.
We took a C1:IOO-MC_F_DQ trace after this work to confirm our earlier findings; the trace is attached as Attachment #2. The noise bumps are present as expected. This is still not a desirable configuration, so the next step would be to replace the external fan or, even better, find the appropriate spares for the internal units and get rid of the external one.
Restarted the NDS2 script on Megatron following the instructions here
1) SSH to megatron: ssh megatron
2) Switch to nds2mgr: sudo su nds2mgr
3) Stop and restart the service:
sudo /etc/init.d/nds2 stop
sudo /etc/init.d/nds2 start
We aligned ETMX using the reflection of the green beam from ETMX.
ITMX was aligned using the Michelson fringe.
We could see the flashing of the green HOMs and the flashing of IR. However, there was no signal at TRX. This is because we have not connected the high-gain PD yet. In order to use the QPD signal instead of the high-gain PD, we set a negative threshold for the PD selection.
We were able to see the flashing on ndscope and adjusted the ITMX and ETMX alignment to improve the transmission.
Even with high transmission, we were not able to get any lock with ETMX actuation. We switched to ITMX actuation and doubled the gain to lock the Xarm.
To explore this asymmetry between ETMX and ITMX actuation, we measured the transfer functions from C1:SUS-ITMX_LSC_EXC to C1:LSC-XARM_IN1 and from C1:SUS-ETMX_LSC_EXC to C1:LSC-XARM_IN1 (Attachment #1). The TFs look different.
This could be because the anti-dewhitening has not yet been updated for the new ETMX coil driver.
After lots of burt restores and alignment by Koji, the interferometer is live again.
The X/Y green beams are resonating. The X/Y arms were locked with IR. We saw the MI fringe on the POP. However, the AS spot is not visible. We need to check the optical table.
The auto-restoration process did not bring back all the settings (incomplete restoration), and we had to restore them manually. It took us some time to realize this.