40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log  Not logged in ELOG logo
Entry  Mon Nov 23 17:53:08 2009, Jenne, Update, Computers, 40m frame builder backup acting funny 
    Reply  Tue Nov 24 16:06:45 2009, Jenne, Update, Computers, 40m frame builder backup acting funny 
       Reply  Wed Nov 25 11:10:05 2009, Jenne, Update, Computers, 40m frame builder backup acting funny 
Message ID: 2315     Entry time: Mon Nov 23 17:53:08 2009     Reply to this: 2322
Author: Jenne 
Type: Update 
Category: Computers 
Subject: 40m frame builder backup acting funny 

As part of the fb40m restart procedure (Sanjit and I were restarting it to add some new channels so they can be read by the OAF model), I checked up on how the backup has been going.  Unfortunately the answer is: not well.

Alan imparted to me all the wisdom of frame builder backups on September 28th of this year.  Except for the first 2 days of something having gone wrong (which was fixed at that time), the backup script hasn't thrown any errors, and thus hasn't sent any whiny emails to me.  This is seen by opening up /caltech/scripts/backup/rsync.backup.cumlog , and noticing that  after October 1, 2009, all of the 'errorcodes' have been zero, i.e. no error (as opposed to 'errorcode 2' when the backup fails).  

However, when you ssh to the backup server to see what .gwf files exist, the last one is at gps time 941803200, which is Nov 9 2009, 11:59:45 UTC.  So, I'm not sure why no errors have been thrown, but also no backups have happened. Looking at the rsync.backup.log file, it says 'Host Key Verification Failed'.  This seems like something which isn't changing the errcode, but should be, so that it can send me an email when things aren't up to snuff.  On Nov 10th (the first day the backup didn't do any backing-up), there was a lot of Megatron action, and some adding of StochMon channels.  If the fb was restarted for either of these things, and the backup script wasn't started, then it should have had an error, and sent me an email.  Since any time the frame builder's backup script hasn't been started properly it should send an email, I'm going to go ahead and blame whoever wrote the scripts, rather than the Joe/Pete/Alberto team.

Since our new raid disk is ~28 days of local storage, we won't have lost anything on the backup server as long as the backup works tonight (or sometime in the next few days), because the backup is an rsync, so it copies anything which it hasn't already copied.  Since the fb got restarted just now, hopefully whatever funny business (maybe with the .agent files???) will be gone, and the backup will work properly. 

I'll check in with the frame builder again tomorrow, to make sure that it's all good.

ELOG V3.1.3-