As part of the fb40m restart procedure (Sanjit and I were restarting it to add some new channels so they can be read by the OAF model), I checked up on how the backup has been going. Unfortunately the answer is: not well.
Alan imparted to me all the wisdom of frame builder backups on September 28th of this year. Except for the first 2 days of something having gone wrong (which was fixed at that time), the backup script hasn't thrown any errors, and thus hasn't sent any whiny emails to me. This is seen by opening up /caltech/scripts/backup/rsync.backup.cumlog , and noticing that after October 1, 2009, all of the 'errorcodes' have been zero, i.e. no error (as opposed to 'errorcode 2' when the backup fails).
However, when you ssh to the backup server to see what .gwf files exist, the last one is at gps time 941803200, which is Nov 9 2009, 11:59:45 UTC. So, I'm not sure why no errors have been thrown, but also no backups have happened. Looking at the rsync.backup.log file, it says 'Host Key Verification Failed'. This seems like something which isn't changing the errcode, but should be, so that it can send me an email when things aren't up to snuff. On Nov 10th (the first day the backup didn't do any backing-up), there was a lot of Megatron action, and some adding of StochMon channels. If the fb was restarted for either of these things, and the backup script wasn't started, then it should have had an error, and sent me an email. Since any time the frame builder's backup script hasn't been started properly it should send an email, I'm going to go ahead and blame whoever wrote the scripts, rather than the Joe/Pete/Alberto team.
Since our new raid disk is ~28 days of local storage, we won't have lost anything on the backup server as long as the backup works tonight (or sometime in the next few days), because the backup is an rsync, so it copies anything which it hasn't already copied. Since the fb got restarted just now, hopefully whatever funny business (maybe with the .agent files???) will be gone, and the backup will work properly.
I'll check in with the frame builder again tomorrow, to make sure that it's all good. |