Message ID: 1213     Entry time: Tue Dec 14 17:02:56 2010
Author: Alastair, Zach, Frank (and absolutely definitely no help from Joe) 
Type: Computing 
Category: Computing 
Subject: FB0 status update 

We started having a look at the problems with the frame builder in the ATF today.  As Zach had said, the previous attempts ended at the point where FB0 was rebooted and one of the drives kept coming up with some problem.  The main issue before the reboot was that daqd kept restarting.  It seems that the two problems are unrelated.

1) The first issue, with the hard drive, has been solved.  The drive that was causing the problem was sdc1.  FB0 has four drives installed, one of which was labelled "full" and a second one "trend".  We removed these two drives, backed up the fstab file and removed their entries from fstab.  The computer was then able to boot with no problems.  We identified sdc1 as being "full", mounted in position 3 in the computer casing.  I got a temporary 1TB replacement from Larry and have installed that in its place.  After checking the device name in the /dev directory (didn't want to format the wrong one...) I partitioned it using fdisk /dev/sdc and created the filesystem using mkfs.ext3 /dev/sdc1
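For anyone repeating the fstab step, here is a minimal sketch of disabling entries by mount point. It is not what we actually ran (we edited the file by hand and deleted the lines), and it comments entries out rather than removing them so they are easy to restore. The function name and the sample fstab text are hypothetical; it only works on text passed in, so nothing on disk is touched.

```python
def comment_out_fstab_entries(fstab_text, mount_points):
    """Return fstab text with entries for the given mount points commented out.

    Hypothetical helper: it operates on a string, not on /etc/fstab itself.
    """
    out = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        fields = stripped.split()
        # fstab fields: device, mount point, fs type, options, dump, pass
        is_entry = bool(stripped) and not stripped.startswith("#") and len(fields) >= 2
        if is_entry and fields[1] in mount_points:
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Made-up sample; the real fstab on fb0 will look different.
sample = (
    "/dev/sda1 / ext3 defaults 1 1\n"
    "/dev/sdc1 /frames/full ext3 defaults 0 2\n"
)
edited = comment_out_fstab_entries(sample, {"/frames/full"})
```

Keeping a copy of the original file, as we did, is still the safer way to get back to the old configuration.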

I then went back and copied the old version of fstab back to its original place and put the fourth drive, sdd, back in place.  The machine was rebooted and I checked that the new drive was mounting correctly as /frames/full.  I noticed that the ownership of /frames/full was not the same as /frames/trend (the new drive was owned by root) so I used chown controls:controls /frames/full to make controls the owner.  Inside /frames/full there is a lost+found directory that I also chown'ed to be controls.  Everything seems happy with this now.
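A quick way to double-check ownership after a chown like this is to walk the tree and list anything not owned by the expected user. The following is a sketch only; the function name is made up, and it takes a numeric uid so that it does not depend on any particular account existing.

```python
import os


def paths_not_owned_by(root, uid):
    """Return paths under root (including root itself) not owned by uid."""
    mismatched = []
    if os.lstat(root).st_uid != uid:
        mismatched.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat so that symlinks are checked themselves, not their targets
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return mismatched
```

On fb0 the uid for controls could be looked up with pwd.getpwnam("controls").pw_uid and the function pointed at /frames/full; an empty list would mean everything is owned by controls.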

I've ordered a new 1.5TB drive and a second one to keep as a backup.  They were only $59.99 from Tiger Direct - Bargain!!!

2a) The issue with daqd restarting may be a bit more involved.  The first thing that was noticed was that fb1 exhibits the same problem of daqd restarting.  Since fb0 was out of commission at this point, we checked on fb1 to see when the last data in /frames/full was written.  The date was Nov 13th.  Also, the user name changes on the 8th, corresponding to this elog posting that Rana put up.  He had been trying to change over all the machines to have the same group user id.  It seems, however, that fb1 is still on controls oldcon as the user and group.  At some point when the front-end was restarted it no longer had write permission.

b) However, having checked out fb0, it appears that this is controls controls, and that it should not have the same problem.  I looked in /frames/trend/second and there is data in there from after the Nov 13th date.  However, the owner of the data is root, which I'm guessing is bad.  I wonder whether the permissions on the /frames/full folder may have been set such that daqd was unable to write to it, or perhaps it was just that the old drive was failing.  It seems that the easiest way to check is to start the front-end again and see if it will record full data.

I ran startatf, then opened the front-end medm panel and set the Burt restore entry to 1.  The front-end came back up just fine, and the gyro MEDM screens are working.  I kept checking back on the processes running until daqd started, with a time of 16:52.  I then went into the /frames/full folder on the new disk and checked for new data.  There was a new data folder which, just like the trend data, is owned by root.  I'm going to leave it for ten minutes to see if any data is recorded that I can access.  I'll come back and update this elog entry........
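Rather than eyeballing the folder, the "is anything new being written?" check could be scripted roughly as below. This is only a sketch: the function names and the age threshold are my own choices, and it just looks at file modification times, not at whether the frame files are actually readable.

```python
import os
import time


def newest_file(root):
    """Return (path, mtime) of the most recently modified file under root, or None."""
    newest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if newest is None or mtime > newest[1]:
                newest = (path, mtime)
    return newest


def is_recording(root, max_age_s, now=None):
    """True if some file under root was modified within the last max_age_s seconds."""
    now = time.time() if now is None else now
    found = newest_file(root)
    return found is not None and (now - found[1]) <= max_age_s
```

For the ten-minute wait above, something like is_recording("/frames/full", 600) would be the equivalent check.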

.....23:35 and daqd is still up and running on fb0 since this afternoon.  So replacing the hard drive seems to have fixed that problem.  Need to look at dataviewer again tomorrow because I can't seem to pull any data on dataviewer without it coming up with errors.
