Entry  Wed Jan 11 19:20:23 2017, Max Isi, Update, Summary Pages, December outage 
    Reply  Thu Jan 12 23:22:34 2017, rana, Update, Summary Pages, December outage 
       Reply  Fri Jan 13 14:33:00 2017, MAX (not Rana), Update, Summary Pages, December outage 
Message ID: 12703     Entry time: Wed Jan 11 19:20:23 2017     Reply to this: 12709
Author: Max Isi 
Type: Update 
Category: Summary Pages 
Subject: December outage 

The summary pages were not successfully generated for a long period of time at the end of 2016 due to syntax errors in the PEM and Weather configuration files.

These errors crashed the INI parser, which brought down the whole gwsumm system. Changes in the configuration of the Condor daemon on the CIT cluster may have made our infrastructure less robust against this kind of problem (which would explain why there was no better error message or alert), but this requires further investigation.
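Because one malformed INI file was enough to take the whole system down, checking edited configs for syntax errors before deploying them could catch such typos early. The following is a minimal sketch, not part of gwsumm, assuming the PEM and Weather configs are plain INI files that Python's standard `configparser` can read. `check_ini_files` and `main` are hypothetical helper names, and the standard parser's defaults (for example, rejecting duplicate sections or options) may be stricter or looser than the parser gwsumm actually uses.

```python
"""Pre-flight syntax check for INI configuration files.

Sketch only: assumes the configs are readable by the standard-library
configparser; gwsumm's own parser may differ in details.
"""
import configparser
import sys


def check_ini_files(paths):
    """Try to parse each file; return a dict mapping path -> error message.

    An empty dict means every file parsed cleanly.
    """
    errors = {}
    for path in paths:
        # A fresh parser per file, so one bad file cannot mask another.
        parser = configparser.ConfigParser(interpolation=None)
        try:
            with open(path, encoding="utf-8") as handle:
                parser.read_file(handle, source=path)
        except (configparser.Error, OSError) as exc:
            # Covers missing section headers, malformed lines,
            # duplicate sections/options, and unreadable files.
            errors[path] = str(exc)
    return errors


def main(argv):
    """Report problems on stderr; return 0 if all files parse, else 1."""
    problems = check_ini_files(argv)
    for path, message in problems.items():
        print(f"{path}: {message}", file=sys.stderr)
    return 1 if problems else 0
```

A small wrapper could call `sys.exit(main(sys.argv[1:]))` with the edited config files as arguments, so that a non-zero exit status stops a deployment step before the broken file reaches the cluster.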

In any case, the fix was simply to correct the typos in the configuration files (on the nodus side) and restart the cron jobs (on the cluster side, by running `condor_rm 40m && condor_submit DetectorChar/condor/gw_daily_summary.sub`). Producing pages for the missing days will take some time (how to do so for a particular day is explained in the wiki: https://wiki-40m.ligo.caltech.edu/DailySummaryHelp).
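To know which days need regenerating, one option is to compare the full date range of the outage against the days that already have pages. This is a minimal sketch with a hypothetical helper, `missing_days`; it assumes you have already collected the existing days as YYYYMMDD strings (for example, from directory names in the summary output area, whose actual layout is not described in this entry).

```python
"""List days in a date range that still lack summary pages.

Sketch only: existing_days is assumed to contain YYYYMMDD strings for days
whose pages already exist; how to collect them is left open.
"""
from datetime import date, timedelta


def missing_days(start, end, existing_days):
    """Return YYYYMMDD strings for every day in [start, end] not in existing_days."""
    existing = set(existing_days)
    missing = []
    day = start
    while day <= end:  # inclusive of both endpoints
        stamp = day.strftime("%Y%m%d")
        if stamp not in existing:
            missing.append(stamp)
        day += timedelta(days=1)
    return missing
```

For example, `missing_days(date(2016, 12, 1), date(2017, 1, 11), existing)` returns, in chronological order, the days that still need to be processed one by one.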

RXA: Later, Max sent us this secret note:

However, I realize it might not be clear from the page which steps are the key ones. They are just:

1) Run `./DetectorChar/bin/gw_daily_summary --day YYYYMMDD --file-tag some_custom_tag` to create pages for day YYYYMMDD (the `--file-tag` option is not strictly necessary, but it prevents conflicts with other instances of the code running simultaneously).

2) Sync those days back to nodus, e.g.: `./DetectorChar/bin/pushnodus 20160701 20160702`

This must all be done from the cluster using the 40m shared account.
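For a longer backlog, the two steps above can be scripted. This is a minimal sketch, not an existing 40m tool, with these assumptions: it is started from the directory containing `DetectorChar/`, `pushnodus` accepts the days as positional arguments as in the two-day example (check the script before relying on this), and the `backfill_<day>` file tag and the helper names `backfill_commands` and `run` are hypothetical. It defaults to a dry run that only prints the commands.

```python
"""Backfill summary pages for a list of days, then push them to nodus.

Sketch of steps 1) and 2); run from the cluster under the 40m shared account.
Assumes pushnodus takes the days as positional arguments.
"""
import subprocess

SUMMARY = "./DetectorChar/bin/gw_daily_summary"
PUSH = "./DetectorChar/bin/pushnodus"


def backfill_commands(days, tag_prefix="backfill"):
    """Return the argv lists to run, in order, for the given YYYYMMDD days."""
    days = list(days)  # allow any iterable, and reuse it below
    # Step 1: one page-generation run per day; a distinct --file-tag per day
    # avoids clashes with other instances running at the same time.
    commands = [
        [SUMMARY, "--day", day, "--file-tag", f"{tag_prefix}_{day}"]
        for day in days
    ]
    # Step 2: sync the regenerated days back to nodus (skipped if no days).
    if days:
        commands.append([PUSH, *days])
    return commands


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, check=True)
```

Inspect the output of `run(backfill_commands(["20161225", "20161226"]))` first, then repeat with `dry_run=False`. The commands run one after another here; the per-day `--file-tag` is what would make it safe to split the days across several simultaneous runs instead.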
