Overview: To make the code more maintainable, I need to refactor it into well-documented classes. To do this carefully and rigorously, I need to run tests after every change. Since the program's runtime is currently quite high, I will reduce the runtime before refactoring: faster runs mean faster tests, which will let me refactor more quickly. So, my current goal is to reduce the runtime as much as possible.
Multiprocessing Implementation:
I implemented a simple form of multiprocessing in the summary_pages.py file. Here is an example: the code contains a process_data() function, which is called 75 times, and each call takes a long time to run. I created multiple processes to run these calls concurrently, as follows:
Original Code: (around line 7840)
for sec in datasections:
    for run in run_opts:
        run_opt = 'run_%s_time' % run
        if hasattr(tabs[sec], run_opt) and getattr(tabs[sec], run_opt):
            process_data(cp, ifo, start, end, tabs[sec],
                         cache=datacache, segcache=segcache, run=run,
                         veto_def_table=veto_table[run], plots=do['plots'],
                         subplots=do['subplots'], html_only=do['html_only'])
            #
            # free data memory
            #
            keys = globvar.data.keys()
            for ch in keys:
                del globvar.data[ch]
The weakness of this code is that process_data() is called many times in sequence, so it never takes advantage of megatron's multiple cores. I changed the code to:
Modified Code: (around line 7840)
import multiprocessing

if do['dataplot']:
    ... etc... (same nested loops as before)
        if hasattr(tabs[sec], run_opt) and getattr(tabs[sec], run_opt):
            # create a child process for this call instead of running it inline;
            # the keyword arguments are forwarded through kwargs= so the call
            # signature of process_data() is unchanged from the original
            p = multiprocessing.Process(target=process_data,
                    args=(cp, ifo, start, end, tabs[sec]),
                    kwargs={'cache': datacache, 'segcache': segcache,
                            'run': run, 'veto_def_table': veto_table[run],
                            'plots': do['plots'], 'subplots': do['subplots'],
                            'html_only': do['html_only']})
            # add the process to the list of processes to start later
            plist += [p]
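To test this pattern in isolation, here is a minimal, self-contained sketch; the worker function, its arguments, and the tags are hypothetical stand-ins for process_data() and its inputs:

import multiprocessing

def worker(tag, scale=1):
    # hypothetical stand-in for process_data()
    print('processing %s at scale %d' % (tag, scale))

if __name__ == '__main__':
    plist = []
    for tag in ['H1', 'L1', 'V1']:
        # keyword arguments are forwarded through kwargs=, just as in the
        # modified summary_pages.py code above
        p = multiprocessing.Process(target=worker, args=(tag,),
                                    kwargs={'scale': 2})
        plist.append(p)
    for p in plist:
        p.start()
    for p in plist:
        p.join()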
Then, I run the processes in groups of size "numconcur", as follows:
numconcur = 8
curlist = []
for i in range(len(plist)):
    curlist += [plist[i]]
    # launch a batch once it is full, or once the last process is queued
    # (the second test ensures a final partial batch is not skipped)
    if (i % numconcur == (numconcur - 1)) or (i == len(plist) - 1):
        for item in curlist:
            item.start()
        for item in curlist:
            item.join()
            item.terminate()
        # free data memory before starting the next batch
        keys = globvar.data.keys()
        for ch in keys:
            del globvar.data[ch]
        curlist = []
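As an aside, the standard library's multiprocessing.Pool implements this batching pattern with dynamic scheduling: a worker that finishes early immediately picks up the next task instead of waiting for the slowest member of its batch. A sketch of what that could look like here (untested against summary_pages.py; the run_one() wrapper and the arglist of prepared argument tuples are my assumptions):

import multiprocessing

def run_one(argtuple):
    # hypothetical wrapper: unpack one prepared (args, kwargs) pair
    # and run process_data() on it
    args, kwargs = argtuple
    process_data(*args, **kwargs)

pool = multiprocessing.Pool(processes=numconcur)
pool.map(run_one, arglist)   # arglist: the prepared (args, kwargs) pairs
pool.close()
pool.join()

One trade-off: since Pool never pauses between batches, the per-batch freeing of globvar.data would have to move into the worker function.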
The value of numconcur (which defines how many processes megatron will run concurrently) greatly affects the speed of the program! With numconcur = 8, the program runs in ~45% of the time of the original code. This is the optimal value, since megatron has 8 cores. Several other values were tested: numconcur = 4 and numconcur = 6 had almost the same performance as numconcur = 8, but numconcur = 1 (which is essentially the same as the unmodified code) performed much worse.
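Rather than hard-coding numconcur = 8, the core count could be queried from the standard library. A small sketch (this is not in the current code):

import multiprocessing

# derive the batch size from the host's core count instead of hard-coding it
numconcur = multiprocessing.cpu_count()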
Improvement Cap:
Why does numconcur = 4 have almost the same performance as numconcur = 8? I monitored megatron's available memory, and it is quickly consumed during these runs. I believe that once 4 or more processes are running, the fact that the data cannot all fit in megatron's memory (which was completely filled during these trials) cancels out the benefit of the additional cores.
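For reference, here is a minimal sketch of how free memory can be polled from Python during a run (this assumes a Linux host, since the field name comes from /proc/meminfo):

def free_memory_kb():
    # parse /proc/meminfo and return the MemFree value in kilobytes
    with open('/proc/meminfo') as f:
        for line in f:
            if line.startswith('MemFree:'):
                return int(line.split()[1])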
Summary of Improvements:
Original runtime of all process_data() calls (approximate): 8400 sec
Runtime with 8 concurrent processes (approximate): 3842 sec
This is roughly a 55% reduction in runtime for this particular section of the program (not for the program as a whole): 8400 − 3842 ≈ 4600 seconds (~1.3 hours) saved per run. Note that these values are approximate; since other processes were running on megatron during my tests, they may be inflated or deflated by some margin of error.
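For completeness, here is a sketch of the kind of instrumentation that can produce such timings (the exact measurement code used is not shown here):

import time

t0 = time.time()
# ... run the process_data() batches ...
print('elapsed: %.0f sec' % (time.time() - t0))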
Next Time:
This same optimization will be applied to every repetitive section of the code with a reasonably large runtime.