Message ID: 8194     Entry time: Wed Feb 27 22:46:53 2013     Reply to this: 8195
Author: Max Horton 
Type: Update 
Category: Summary Pages 
Subject: Multiprocessing Implementation 

Overview: In order to make the code more maintainable, I need to factor it into different well-documented classes.  To do this carefully and rigorously, I need to run tests every time I make changes to the code.  The code's runtime is currently quite high, so I will work on reducing it before factoring the code into classes; this minimizes testing time and lets me factor more quickly.  So, my current goal is to improve the runtime as much as possible.

Multiprocessing Implementation:

I implemented a simple form of multiprocessing in the summary_pages.py file.  Here is an example: the code contains a process_data() function, which is called 75 times and takes quite a long time to run in total.  I created multiple processes to run these calls concurrently, as follows:

Original Code: (around line 7840)

for sec in datasections:
  for run in run_opts:
    run_opt = 'run_%s_time' % run
    if hasattr(tabs[sec], run_opt) and getattr(tabs[sec], run_opt):

      process_data(cp, ifo, start, end, tabs[sec],
                   cache=datacache, segcache=segcache, run=run,
                   veto_def_table=veto_table[run], plots=do['plots'],
                   subplots=do['subplots'], html_only=do['html_only'])

      #
      # free data memory
      #
      keys = globvar.data.keys()
      for ch in keys:
        del globvar.data[ch]

The weakness of this code is that process_data() is called many times in sequence, so it doesn't take advantage of megatron's multiple threads.  I changed the code to:

Modified Code: (around line 7840)
  import multiprocessing

  plist = []   # collects one Process per process_data() call

  if do['dataplot']:
    ... etc... (same as before)
      if hasattr(tabs[sec], run_opt) and getattr(tabs[sec], run_opt):

        # Create the process instead of calling process_data() directly
        p = multiprocessing.Process(target=process_data,
                                    args=(cp, ifo, start, end, tabs[sec],
                                          datacache, segcache, run,
                                          veto_table[run], do['plots'],
                                          do['subplots'], do['html_only']))
        # Add the process to the list of processes, to be started later
        plist += [p]

Then, I run the processes in groups of size "numconcur", as follows:
    numconcur = 8
    curlist = []
    for i in range(len(plist)):
      curlist += [plist[i]]
      # Launch a group once it is full, and also at the end of the list
      # so the final partial group of processes still gets run.
      if i % numconcur == (numconcur - 1) or i == len(plist) - 1:
        for item in curlist:
          item.start()
        for item in curlist:
          item.join()
          item.terminate()
        # free data memory before starting the next group
        keys = globvar.data.keys()
        for ch in keys:
          del globvar.data[ch]
        curlist = []
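
For comparison only (this is not what is currently in summary_pages.py): the same concurrency could be obtained more idiomatically with multiprocessing.Pool, which hands out jobs as workers free up instead of waiting for the slowest process in each group of numconcur before starting the next group.  A minimal sketch, assuming the argument tuples pickle cleanly and that process_data is the existing function in summary_pages.py; the run_with_pool and _run_one names are hypothetical:

  import multiprocessing

  def _run_one(args):
    # Unpack one tuple of process_data() arguments; these are the same
    # tuples passed as args= to multiprocessing.Process above.
    return process_data(*args)

  def run_with_pool(task_args, nworkers=8):
    # A fixed-size pool of nworkers keeps every core busy; jobs are
    # handed out as workers finish, rather than in lock-step groups.
    pool = multiprocessing.Pool(processes=nworkers)
    try:
      pool.map(_run_one, task_args)
    finally:
      pool.close()
      pool.join()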

 

The value of numconcur (which sets how many processes megatron runs concurrently) greatly affects the speed of the program!  With numconcur = 8, the program runs in ~45% of the time of the original code.  This is the optimal value, since megatron has 8 threads.  Several other values were tested: numconcur = 4 and numconcur = 6 had almost the same performance as numconcur = 8, but numconcur = 1 (which is essentially the same as the unmodified code) performed much worse.
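
As an aside, numconcur = 8 is hard-coded to match megatron's thread count; if this is ever reused on another machine, the number of available cores can be queried instead.  A minimal sketch of that generalization (an assumption, not something the current code does):

  import multiprocessing

  # Use one worker per available core/hardware thread instead of
  # hard-coding numconcur = 8 for megatron.
  numconcur = multiprocessing.cpu_count()
  print('will run %d process_data() calls at a time' % numconcur)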

Improvement Cap:

Why does numconcur = 4 have almost the same performance as numconcur = 8?  I monitored megatron's available memory, and it is quickly consumed during these runs.  I believe that once 4 or more cores are in use, the fact that the data can't all fit in megatron's memory (which was entirely filled during these trials) cancels out the benefit of additional threads.
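
For reference, the memory observation was made by watching megatron's free memory during a run.  A minimal sketch of how available memory can be sampled from Python on Linux (this is an assumption about the monitoring method, not part of summary_pages.py; free_mem_kb is a hypothetical name):

  import time

  def free_mem_kb():
    # Read the MemFree field (in kB) out of /proc/meminfo.
    with open('/proc/meminfo') as f:
      for line in f:
        if line.startswith('MemFree:'):
          return int(line.split()[1])

  # Print free memory every 10 seconds while the plotting run is going.
  while True:
    print('%s  free memory: %d kB' % (time.strftime('%H:%M:%S'), free_mem_kb()))
    time.sleep(10)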

Summary of Improvements:

Original runtime of all process_data() calls (approximate): 8400 sec

Runtime with 8 processes (approximate): 3842 sec

This is roughly a 54% reduction in runtime for this particular section of the code (not in the overall runtime of the entire program), saving about 4600 seconds (~1.3 hours) per run.  Note that these values are approximate, since other processes were running on megatron during my tests, so they may be inflated or deflated by some margin of error.

Next Time:

This same optimization method will be applied to all repetitive processes with reasonably large runtimes.
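
One way to do that cleanly is to pull the chunked start/join pattern above into a single helper that any repetitive loop can call.  A minimal sketch (run_in_chunks is a hypothetical name, not something already in summary_pages.py):

  import multiprocessing

  def run_in_chunks(target, arg_list, numconcur=8):
    # Run target(*args) for every tuple in arg_list, at most numconcur
    # processes at a time, joining each chunk before starting the next.
    plist = [multiprocessing.Process(target=target, args=args)
             for args in arg_list]
    for i in range(0, len(plist), numconcur):
      chunk = plist[i:i + numconcur]
      for p in chunk:
        p.start()
      for p in chunk:
        p.join()

It would be called as run_in_chunks(process_data, list_of_argument_tuples); any per-chunk cleanup, like freeing globvar.data above, would go inside the loop right after the joins.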
