Quick Note on Multiprocessing: The multiprocessing was plugged into the codebase on March 4. Since then, the various pages that appear when you click on certain tabs (such as the page found here: https://nodus.ligo.caltech.edu:30889/40m-summary/archive_daily/20130304/ifo/dc_mon/ from clicking the 'IFO' tab) don't display graphs. But, the graphs are being generated (if you click here or here, you will find the two graphs that are supposed to be displayed). So, for some reason, the multiprocessing is preventing these graphs from appearing, even though they are being generated. I rolled back the multiprocessing changes temporarily, so that the newly generated pages look correct until I find the cause of this.
Fixing Plot Limits: The plots generated by the summary_pages.py script have a few problems, one of which is: the graphs don't choose their boundaries in a very useful way. For example, in these pressure plots, the dropout 0 values 'ruin' the graph in the sense that they cause the plot to be scaled from 0 to 760, instead of a more useful range like 740 to 760 (which would allow us to see details better).
The call to the plotting functions begins in process_data() of summary_pages.py, around line 972, with a call to plot_data(). This function takes in a data list (which represents the x-y data values, as well as a few other fields such as axes labels). The easiest way to fix the plots would be to "cleanse" the data list before calling plot_data(). In doing so, we would remove dropout values and obtain a more meaningful plot.
To observe the data list that is passed to plot_data(), I added the following code:
# outfile is a string that represents the name of the .png file that will be generated by the code.
print_verbose("Saving data into a file.")
outfile_mch = open(outfile + '.dat', 'w')
# at this point in process_data(), data is an array that should contain the desired data values.
if (data == ):
print >> outfile_mch, data
When I ran this in the code midday, it gave a human-readable array of values that appeared to match the plots of pressure (i.e. values between 740 and 760, with a few dropout 0 values). However, when I let the code run overnight, instead of observing a nice list in 'outfile.dat', I observed:
[('Pressure', array([ 1.04667840e+09, 1.04667846e+09, 1.04667852e+09, ...,
1.04674284e+09, 1.04674290e+09, 1.04674296e+09]), masked_array(data = [ 744.11076965 744.14254761 744.14889221 ..., 742.01931356 742.05930208
mask = False,
fill_value = 1e+20)
I.e. there was an ellipsis (...) instead of actual data, for some reason. Python does this when printing lists in a few specific situations. The most common of which is that the list is recursively defined. For example:
a = 
It doesn't seem possible that the definitions for the data array become recursive (especially since the test worked midday). Perhaps the list becomes too long, and python doesn't want to print it all because of some setting.
Instead, I will use cPickle to save the data. The disadvantage is that the output is not human readable. But cPickle is very simple to use. I added the lines:
cPickle.dump(data, open(outfile + 'pickle.dat', 'w'))
This should save the 'data' array into a file, from which it can be later retrieved by cPickle.load().
There are other modules I can use that will produce human-readable output, but I'll stick with cPickle for now since it's well supported. Once I verify this works, I will be able to do two things:
1) Cut out the dropout data values to make better plots.
2) When the process_data() function is run in its current form, it reprocesses all the data every time. Instead, I will be able to draw the existing data out of the cPickle file I create. So, I can load the existing data, and only add new values. This will help the program run faster.