40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 244 of 341  Not logged in ELOG logo
ID Date Authorup Type Category Subject
  973   Fri Sep 19 11:21:45 2008 josephbConfigurationComputersNameserver and Rosalba
I tried modifying the nsswitch.conf file to include going to dns in addition to local files for everything (services, network, etc) and then moving the /etc/hosts file to /etc/hosts.bak. Unfortunately, this still didn't allow front-ends to reboot properly. So I'm not sure what is using the hosts file, but whatever it is, is apparently important. After the test I placed the hosts file back and reverted the nsswitch.conf file.

I also noticed that Rosalba was having problems connecting to the network. This apparently was because I had shut down the dhcp server on the NAT router, as had been discussed at the meeting on Wednesday.

To fix this, I modified the /etc/sysconfig/network-scripts/ifcfg-eth1 file to fix rosalba's ip as 131.215.113.24 (which doesn't seem to be in use). I also updated rosalba's /etc/resolv.conf file to point at linux1's name server, and two additional name servers as well, and added the "search martian" line. I modified the /etc/sysconfig/network-scripts/ifcfg-eth0 file so the built in network card doesn't come up automatically, since its currently not plugged into anything. Lastly, I added rosalba and its IP to linux1's name server files.
  992   Thu Sep 25 14:03:08 2008 josephbConfigurationComputers 

Quote:

We (Joe) need to buy a GigE card for linux1 and to also set up conlog and conlogger to run on Nodus.


A spare Intel Pro 1000/GT desktop adapter (gigabit ethernet card) has been added to Linux1 and is now using that card to connect to the network.

This was after a slight scare when I somehow reset the bios on Linux1 during the first reboot after adding the card.
After some debugging and discussion with Yoichi, the bios was fixed and the computer works again, with its new faster network connection.

Although we both noted that Linux1 is a rather old machine, with only half a gig of Ram and reaching about 80% capacity on its 58 gigabyte hard drive (raid). Might be worth upgrading in general.

Need to figure out how to install conlog/conlogger programs next...
  1006   Mon Sep 29 13:33:39 2008 josephbConfigurationComputersGigabit network finished and conlog available on Nodus
The last 100 Mb unmounted hub has been removed (or at least of the ones I could find). We should be on a fully gigabit network with Cat6 cables and lots and lots of labels.

In other news, the pearl script that runs the web interface on linux1 for the conlog has been copied to /cvs/cds/caltech/apache/cgi-bin/ and is now being pointed to by the apache server on Nodus.

https://nodus.ligo.caltech.edu:30889/cgi-bin/conlog_web.pl
  1033   Wed Oct 8 12:35:56 2008 josephbConfigurationComputersNew Network diagram for the 40m
Attached is a pdf of the new network diagram for the 40m after having removed all of the old hubs.
  1067   Wed Oct 22 12:37:47 2008 josephbUpdateComputersNetwork spreadsheet
Attached in open office format as well as excel format is spreadsheet containing all the devices with IP addresses at the 40m. Please contact me with any corrections.
  1098   Tue Oct 28 12:01:01 2008 josephbConfigurationComputerslinux2

Quote:
I have removed linux2 and its cables from the control room and put it into 1Y3 along with op340m.

When Joe next comes in we can ask him to Cat6 it to the rest of the world, although it already
seems to me that the CDS hub/switch next Alberto's desk is too full and that we need to purchase
a 48 port device for there.


Note I still need to remove a fair bit of cabling no longer in use from the Martian network switch next to Alberto's desk. There's actually about 8-10 cables there which show no connectivity and are not being used. So there's really about 33% of the ports open in the control room hub, it just doesn't look like it.

As for linux2, I'll probably just connect it to the 1Y2 or 1Y6 Hubs when I get the chance.
  1175   Thu Dec 4 16:29:20 2008 josephbConfigurationComputersError message on Frame Builder Raid Array
The Fibrenetix FX-606-U4 RAID connected to the frame builder in 1Y7 is showing the following error message: IDE Channel #4 Error Reading
  1253   Mon Jan 26 14:51:54 2009 josephbConfigurationVAC 
We need a new RS-232 to Ethernet bridge in order to interface properly with the new RGA. The RGA has a fixed baud rate of 28.8k, and the current bridge (which used to work with the old RGA) doesn't have that baud rate as an option. Currently looking into purchasing a new bridge, and trying to make sure it can meet the communications requirements of the RGA.
  1294   Wed Feb 11 15:01:47 2009 josephbConfigurationComputersAllegra

So after having broke Allegra by updating the kernel, I was able to get it running again by copying the xorg.conf.backup file over xorg.conf in /etc/X11.  So at this point in time, Allegra is running with generic video drivers, as opposed to the ATI specific and proprietary drivers.

  1333   Mon Feb 23 16:42:08 2009 josephbConfigurationCamerasCamera Beta Testing

I've setup the GC650 camera (ID 32223) to look at the mode cleaner transmission.  I've also added an alias to the camera server and client for this camera.

To use, type: "pserv1 &"on the machine you want to run the server on and "pcam1 &" on the machine you want to actually view the video.  At the moment, this only works for the 64-bit Centos 5 machines, Rosalba, Allegra and Ottavia.

Note, you will generally want to start the client first (pcam1 or pcam2) to see if a server is already running somewhere.  The server will complain that it can't connect to a camera if it already is in use.

I've setup the GC750 camera (ID 44026) to look at the the right most analog quad TV.  This can be run by using "pserv2 &" and "pcam2 &". 

If the image stops playing you can try starting and stoping the server to see if will start back up. 

You can also try increasing or decreasing the exposure, to see if that helps.  The increase and decrease buttons change the exposure by a factor of 2 for each press.

  Lastly, the button "Read Epic Channel" reads in the current value from the channel: "C1:PEM-stacis_EEEX_geo" and uses it as the exposure value, in microseconds (in principle 10 to 1000000 should work).

For example, to exposre for 10000 microseconds, use "ezcawrite C1:PEM-stacis_EEEX_geo 10000" and press the "Read Epic Channel" button.

  1355   Wed Mar 4 17:20:04 2009 josephbUpdateCamerasCamera code upgrades

I've updated the digital camera python code as well as changed the network topology.

At the moment, both cameras are connected to a small gigabit switch which only talks to Ottavia.  This means all camera servers must be run on Ottavia, allow camera output is still UDP multicast so any machine capable of running gstreamer can pick up the images.

The server and client programs now have the ability to read a configuration file for the setup of the cameras.  They default to pcameraSettings.ini, but this can default can be changed with a -c or --config option

For example, "serverV3.py --config pcam1.ini" will run the server using the pcam1.ini settings file.  Similarly, "client.py --config pcam1.ini" will also take the IP settings from the config file so that it knows at which port and IP to listen.

These programs and .ini files have been placed in /cvs/cds/caltech/apps/linux64/python/pcamera/

I've updated the cshrc.40m aliases so that it uses the new configuration file options, so now pcam1 calls "client.py -c pcam1.ini" in the above directory.

So to start a client use pcam1 or pcam2 (for the 32223 camera in PSL looking at MC trans or 44026 looking at an analog moniter in the control room respectively).  These can be run on Allegra, Rosalba or Ottavia at the moment.

To start a server, use pserv1 or pserv2.  These *must* be run on Ottavia.

I've also added a -n or --no-gui option at Yoichi's request, one which just starts up and plays, with no graphical gui.

Lastly, I've made some changes to the base pcamerasrc.py file, which should make display more robust.  After a failed transmission of an image from the camera to Ottavia, it should re-attempt up to 10 times before giving up. I'm hoping this will make it more robust against packet loss.  The change in network topology has also helped this, allowing 640x480 to be transmitted on both cameras before tens of minutes before a packet loss causes a stop.

  1385   Wed Mar 11 11:30:15 2009 josephbConfigurationCameras 

I modified the Video.db file used by c1aux located in  /cvs/cds/caltech/target/c1aux.

I added the following channels to the db file, intended for either read in or read out by the digital camera scripts.

C1:VID-ETMY_X_COM
C1:VID-ETMY_Y_COM

C1:VID-ETMY_X_STDEV

C1:VID-ETMY_Y_STDEV

C1:VID-ETMY_XY_COVAR

C1:VID-ETMY_EXPOSURE

C1:VID-ETMY_GAIN

C1:VID-ETMY_X_UL

C1:VID-ETMY_Y_UL

C1:VID_ETMY_X_SIZE

C1:VID_ETMY_Y_SIZE

A better naming scheme can probably be devised, but these will do for now.

  1496   Sun Apr 19 11:34:33 2009 josephbHowToCamerasUSB Frame Grabber - How to

To use the Sensoray 2250 USB frame grabber:

Ensure you have the following packages installed: build-essential, libusb-dev

Download the Linux manual and linux SDK from the Sensoray website at:

http://www.sensoray.com/products/2250data.htm

Go to the Software and Manual tab near the bottom to find the links.  The software can also be found on the 40m computers at /cvs/cds/caltech/users/josephb/sensoray/

The files are Manual2250LinuxV120.pdf and s2250_v120.tar.gz

Run the following commands in the directory where you have the files.

tar -xvf s2250_v120.tar.gz

cd s2250_v120

make

cd ezloader

make

sudo make modules_install

cd ..

At this point plug in the 2250 frame grabber.

sudo modprobe s2250_ezloader

Now you can run the demo with

./sraydemo or ./sraydemo64

Options will show up on screen.  A simple set to start with is "encode 0", which sets the recording type, "recvid test.mpg", which starts the recording in file test.mpg, and "stop", which stops recording.  Note there is no on screen playback.  One needs an installed mpeg player to view the saved file, such as Totem (which can screen cap to .png format) or mplayer.

All these instructions are on the first few pages of the Manual2250LinuxV120 pdf.

 

 

  1497   Sun Apr 19 11:51:05 2009 josephbUpdateCamerasMafalda may need an update

I tried installing libusb-dev on mafalda in order to try getting the usb frame grabber to work on it, but could not as it could not download the package.

I then tried to do a sudo apt-get update, which failed completely, as the repository seems to have ceased existing.  Basically I had all 404 Not Found errors.

Turns out Mafalda is still running Ubuntu 7.04, whose support ended late 2008.  So there's a couple things that can be done:

1) Ignore it, and simply not update Mafalda anymore.  This also means some newer software and hardware simply won't work with it (like the usb frame grabber)

2) Try to find another, unofficial repository which still has all of the Ubuntu 7.04 packages.

3) Upgrade to a newer, still supported Ubuntu, such as 7.10, 8.04, or 8.10.

I'd personally lean towards the 3rd option, and go to the 8.04 long term support version.  If people agree with it, I could do the upgrade sometime Monday or Tuesday.

 

 

  1581   Wed May 13 12:41:14 2009 josephbUpdateCamerasTiming and stability tests of GigE Camera code

At the request of people down at LLO I've been trying to work on the reliability and speed of the GigE camera code.  In my testing, after several hours, the code would tend to lock up on the camera end.  It was also reported at LLO after several minutes the camera display would slow down, but I haven't been able to replicate that problem.

I've recently added some additional error checking and have updated to a more recent SDK which seems to help.  Attached are two plots of the frames per second of the code.  In this case, the frames per second  are measured as the time between calls to the C camera code for a new frame for gstreamer to encode and transmit.  The data points in the first graph are actually the averaged time for sets of 1000 frames.  The camera was sending 640x480 pixel frames, with an exposure time of 0.01 seconds.  Since the FPS was mostly between 45 and 55, it is taking the code roughly 0.01 second to process, encode, and transmit a frame.

During the test, the memory usage by the server code was roughly 1% (or 40 megabytes out of 4 gigabytes) and 50% of a CPU (out a total of  CPUs).

  1590   Fri May 15 16:47:44 2009 josephbUpdateCamerasImproved camera code

At Rob's request I've added the following features to the camera code.

The camera server, which can be started on Ottavia by just typing pserv1 (for camera 1) or pserv2 (for camera 2), now has the ability to save individual jpeg snap shots, as well as taking a jpeg image every X seconds, as defined by the user.

The first text box is for the file name (i.e. ./default.jpg will save the file to the local directory and call it default.jpg).  If the camera is running (i.e. you've pressed start), prsessing "Take Snapshot to" will take an image immediately and save it.  If the camera is not running, it will take an image as soon as you do start it.

If you press "Start image capture every X seconds", it will do exactly that.  The file name is the same as for the first button, but it appends a time stamp to the end of the file.

There is also a viedo recording client now.  This is access by typing "pcam1-mov" or "pcam2-mov".  The text box is for setting the file name.  It is currently using the open source Theora encoder and Ogg format (.ogm).  Totem is capable of reading this format (and I also believe vlc).  This can be run on any of the Linux machines.

The viewing client is still accessed by "pcam1" or "pcam2".

I'll try rolling out these updates to the sites on Monday.

The configuration files for camera 1 and camera 2 can be found by typing in camera (which is aliased to cd /cvs/cds/caltech/apps/linux64/python/pcamera) and are called pcam1.ini, pcam2.ini, etc.

 

  1901   Fri Aug 14 10:39:50 2009 josephbConfigurationComputersRaid update to Framebuilder (specs)

The RAID array servicing the Frame builder was finally switched over to JetStor Sata 16 Bay raid array. Each bay contains a 1 TB drive.  The raid is configured such that 13 TB is available, and the rest is used for fault protection.

The old Fibrenetix FX-606-U4, a 5 bay raid array which only had 1.5 TB space, has been moved over to linux1 and will be used to store /cvs/cds/.

This upgrade provides an increase in look up times from 3-4 days for all channels out to about 30 days.  Final copying of old data occured on August 5th, 2009, and was switched over on that date.

  1904   Fri Aug 14 15:20:42 2009 josephbSummaryComputersLinux1 now has 1.5 TB raid drive

Quote:

Quote:

nodus was rebooted by Alex at Fri Aug 14 13:53. I launched elogd.

cd /export/elog/elog-2.7.5/
./elogd -p 8080 -c /export/elog/elog-2.7.5/elogd.cfg -D

 It looks like Alex also rebooted all of the control room computers.  Or something.  The alarm handler and strip tool aren't running.....after I fix susvme2 (which was down when I got in earlier today), I'll figure out how to restart those.

 Alex switched the mount point for /cvs/cds on Linux1 to the 1.5 TB RAID array after he finished copying the data from old drives.  This required a reboot of linux1, with all the resulting /cvs/cds mount points on the other computers becoming stale.  Easiest way to fix that he found was to do a reboot of all the control room machines.  In addition, a reboot fest should probably happen in the near futuer for all the front end machines since they will also have stale mount points as well from linux1.

The 1.5 TB RAID array mount is now mounted on /home of linux1, which was the old mount point of the ~300 GB drive.  The old drive is now at /oldhome on linux1.

 

  1967   Fri Sep 4 16:09:26 2009 josephbSummaryVACRebooted RGA computer and reset RGA settings

Steve noticed the RGA was not working today.  It was powered on but no other lights were lit.

Turns out the c0rga machine had not been rebooted when the file system on linux1 was moved to the raid array, and thus no longer had a valid mount to /cvs/cds/.  Thus, the scripts that were run as a cron could not be called.

We rebooted c0rga, and then ran ./RGAset.py to reset all the RGA settings, which had been reset when the RGA had lost power (and thus was the reason for only the power light being lit).

 

Everything seems to be working now.  I'll be adding c0rga to the list of computers to reboot in the wiki.

  2068   Thu Oct 8 11:37:59 2009 josephbUpdateComputersReboot of dcuepics helped, c1susvme1 having problems

Power cycling c1dcuepics seems to have fixed the EPICs channel problems, and c1lsc, c1asc, and c1iovme are talking again.

I burt restored c1iscepics and c1Iosepics from the snapshot at 6 am this morning.

However, c1susvme1 never came back after the last power cycle of its crate that it shared with c1susvme2.  I connected a monitor and keyboard per the reboot instructions.  I hit ctrl-x, and it proceeded to boot, however, it displays that there's a media error, PXE-E61, suggests testing the cable, and only offers an option to reboot.  From a cursory inspection of the front, the cables seem to look okay.  Also, this machine had eventually come back after the first power cycle and I'm pretty sure no cables were moved in between.

 

  2183   Thu Nov 5 16:41:14 2009 josephbConfigurationComputersMegatron's personal network

In investigating why megatron wouldn't talk to the network, I re-discovered the fact that it had been placed on its own private network to avoid conflicts with the 40m's test point manager.  So I moved the linksys router (model WRT310N V2) down to 1Y9, plugged megatron into a normal network port, and connected its internet port to the rest of the gigabit network. 

Unfortunately, megatron still didn't see the rest of the network, and vice-versa.  I brought out my laptop and started looking at the settings.  It had been configured with the DMZ zone on for 192.168.1.2, which was Megatron's IP, so communications should flow through the router. Turns out it needs the dhcp server on the gateway router (131.215.113.2) to be on for everyone to talk to each other.  However, this may not be the best practice.  It'd probably be better to set the router IP to be fixed, and turn off the dhcp server on the gateway.  I'll look into doing this tomorrow.

Also during this I found the DNS server running on linux1 had its IP to name and name to IP files in disagreement on what the IP of megatron should be.  The IP to name claimed 131.215.113.95 while the name to IP claimed 131.215.113.178.  I set it so both said 131.215.113.178.  (These are in /var/named/chroot/var/ directory on linux1, the files are 113.215.131.in-addr.arpa.zone  and martian.zone - I modified the 113.215.131.in-addr.arpa.zone file).  This is the dhcp served IP address from the gateway, and in principle could change or be given to another machine while the dhcp server is on.

  2192   Fri Nov 6 10:35:56 2009 josephbUpdateComputersRFM reboot fest and re-enabled ITMY coil drivers

As noted by Steve, the RFM network was down this morning.  I noticed that c1susvme1 sync counter was pegged at 16384, so I decided to start with reboots in that viscinity.

After power cycling crates containing c1sosvme, c1susvme1, and c1susvme2 (since the reset buttons didn't work) only c1sosvme and c1susvme2 came back normally.  I hooked up a monitor and keyboard to c1susvme1, but saw nothing.  I power cycled the c1susvme crate again, and this time I watched it boot properly.  I'm not sure why it failed the first time.

The RFM network is now operating normally.  I have re-enabled the watchdogs again after having turned them off for the reboots.  Steve and I also re-enabled the ITMY coil drivers when I noticed them not damping once the watch dogs were re-enabled.  The manual switches had been set to disabled, so we re-enabled them.

  2195   Fri Nov 6 17:04:01 2009 josephbConfigurationComputersRFM and Megatron

I took the RFM 5565 card dropped off by Jay and installed it into megatron.  It is not very secure, as it was too tall for the slot and could not be locked down.  I did not connect the RFM fibers at this point, so just the card is plugged in.

Unfortunately, on power up, and immediately after the splash screen I get "NMI EVENT!" and "System halted due to fatal NMI". 

The status light on the RFM light remains a steady red as well.  There is a distinct possibility the card is broken in some way.

The card is a VMIPMC-5565 (which is the same as the card used by the ETMY front end machine).  We should get Alex to come in and look at it on Monday, but we may need to get a replacement.

  2196   Fri Nov 6 18:02:22 2009 josephbUpdateComputersElog restarted

While I was writing up an elog entry, the elog died again, and I restarted it.  Not sure what caused it to die since no one was uploading to it at the time.

  2197   Fri Nov 6 18:13:34 2009 josephbUpdateComputersMegatron woes

I have removed the RFM card from Megatron and left it (along with all the other cables and electronics) on the trolly in front of the 1Y9 rack.

Megatron proceeded to boot normally up until it started loading Centos 5.  During the linux boot process it checks the file systems.  At this point we have an error:

 

/dev/VolGroup00/LogVol00 contains a file system with errors, check forced

Error reading block 28901403 (Attempt to read block from filesystem resulted short read) while doing inode scan.

/dev/VolGroup00/LogVol00 Unexpected Inconsistency; RUN fsck MANUALLY

 

So I ran fsck manually, to see if I get some more information.  fsck reports back it can't read block 28901403 (due to a short read), and asks if you want to ignore(y)?.  I ignore (by hitting space), and unfortunately touch it an additional time.  The next question it asks is force rewrite(y)?  So I apparently forced a rewrite of that block.  On further ignores (but no forced rewrites) I continue seeing short read errors at 28901404, *40, *41,*71, *512, *513, etc.  So not totally continugous.  Each iteration takes about 5-10 seconds.  At this point I reboot, but the same problem happens again, although it starts 28901404 instead of 28901403.  So apparently the force re-write fixed something, but I don't know if this is the best way of going about this.  I just wondering if there's any other tricks I can try before I just start rewriting random blocks on the hard drive.  I also don't know how widespread this problem is and how long it might take to complete (if its a large swath of the hard drive and its take 10 seconds for each block that wrong, it might take a while).

So for the moment, megatron is not functional.  Hopefully I can get some advice from Alex on Monday (or from anyone else who wants to chime in).  It may wind up being easiest to just wipe the drive and re-install real time linux, but I'm no expert at that.

 

  2264   Fri Nov 13 09:47:18 2009 josephbUpdateComputersMegatron status lights lit

Megatron's top fan, rear ps, and temperature front panel lights were all lit amber this morning.  I checked the service manual, found at :

http://docs.sun.com/app/docs/prod/sf.x4600m2?l=en&a=view

According to the manual, this means a front fan failed, a voltage event occured, and we hit a high temperature threshold.  However, there were no failure light on any of the individual front fans (which should have been the case given the front panel fan light).  The lights remained on after I shutdown megatron.  After unplugging, waiting 30 seconds, and replugging the power cords in, the lights went off and stayed off.  Megatron seems to come up fine.

I unplugged the IO chassis from megatron, rebooted, and tried to start Peter's plant model.  However, it still prints that its starting, but really doesn't.  One thing I forgot to mention in the previous elog on the matter, is that on the local monitor it prints "shm_open(): No such file or directory" every time we try to start one of these programs.

  2265   Fri Nov 13 09:54:14 2009 josephbConfigurationComputersMegatron switched to tcsh

I've changed megatron's controls account default shell to tcsh (like it was before).  It now sources cshrc.40m in /cvs/cds/caltech/ correctly at login, so all the usual aliases and programs work without doing any extra work.

  2273   Mon Nov 16 15:13:25 2009 josephbUpdateComputersezcaread updated to Yoichi style ezcawrite

In order to get the gige camera code running robustly here at the 40m, I created a "Yoichi style" ezcaread, which is now the default, while the original ezcaread is located in ezcaread.bin.  This tries 5 times before failing out of a read attempt.

  2275   Mon Nov 16 15:58:02 2009 josephbConfigurationGeneralAdded Gige camera to AP table, added some screens

I placed a GC750 gige camera looking at a pickoff of the AS port, basically next to the analog camera, on the AP table.

I've modified the main sitemap to include a CAM button, for the digital cameras.  There's a half done screen associated with it.  At the moment, it reports on the X and Y center of mass calculation, the exposure setting, and displays a little graph with a dot indicating the COM of mass location.  Currently this screen is associated a GC750 camera looking at pickoff of the AS port.  I'm having some issues with getting shell scripts to run from it, as well as a slider having limits other than 0 and 0.

  2276   Mon Nov 16 17:24:28 2009 josephbConfigurationComputersCamera medm functionality improved

Currently the Camera medm screen (now available from the sitemap), includes a server and client script buttons.  The server has two options.  One which starts a server, the second which (for the moment) kills all copies of the server running on Ottavia.  The client button simply starts a video screen with the camera image.  The slider on this screen changes the exposure level.  The snap shot button saves a jpeg image in the /cvs/cds/caltech/cam/c1asport directory with a date and time stamp on it (up to the second).  For the moment, these buttons only work on Linux machines.

All channels were added to C0EDCU.ini, and should be being recorded for long term viewing.

Feel free to play around with it, break it, and let me know how it works (or doesn't).

  2279   Tue Nov 17 10:09:57 2009 josephbUpdateEnvironmentFumes

The smell of diesel is particularly bad this morning.  Its concentrated enough to be causing me a headache.  I'm heading off to Millikan and will be working remotely on Megatron.

  2299   Thu Nov 19 09:55:41 2009 josephbUpdateComputersTrying to get testpoints on megatron

This is a continuation from last night, where Peter, Koji, and I were trying to get test point channels working on megatron and with the TST module.

Things we noticed last night:

We could run starttst, and ./daqd -c daqdrc, which allowed us to get some channels in dataviewer.  The default 1k channel selection works, but none of the testpoints do. 

However, awgtpman -s tst does appear in the processes running list.

The error we get from dataviewer is:

Server error 861: unable to create thread
Server error 23328: unknown error
datasrv: DataWriteRealtime failed: daq_send: Illegal seek

Going to DTT, it starts with no errors in this configuration.  Initially it listed both MDC and TST channels.  However, as a test, I moved the tpchn_C4.par , tpchn_M4.par and tpchn_M5.par files to the directory backup, in /cvs/cds/caltech/target/gds/param.  This caused only the TST channels to show up (which is what we want when not running the mdc module.

We had changed the daqdrc file in /cvs/cds/caltech/target/fb, several times to get to this state.  According to the directions in the RCG manual written by Rolf, we're supposed to "set cit_40m=1" in the daqdrc file, but it was commented out.  However, when we uncommented it, it started causing errors on dtt startup, so we put it back.  We also tried adding lines:

set dcu_rate 13 = 16384;
set dcu_rate 14 = 16384;

But this didn't seem to help.  The reason we did this is we noticed dcuid = 13 and dcuid = 14 in the /cvs/cds/caltech/target/gds/param/tpchn_C1.par file.  We also edited the testpoint.par file so that it correctly corresponded to the tst module, and not the mdc and mdp modules.  We basically set:

[C-node1]
hostname=192.168.1.2
system=tst

in that file, and commented everything else out.

At this point, given all the things we've changed, I'm going to try a rebuild of the tst and daq and see if that solves things.

 

  2300   Thu Nov 19 10:19:04 2009 josephbUpdateComputersMegatron tst status

I did a full make clean and make uninstall-daq-tst, then rebuilt it.  I copied a good version of filters to C1TST.txt in /cvs/cds/caltech/chans/ as well as a good copy of screens to /cvs/cds/caltech/medm/c1/tst/.

Test points still appear to be broken.  Although for a single measurement in dtt, I was somehow able to start, although the output in the results page didn't seem to have any actual data in the plots, so I'm not sure what happened there - after that it just said unable to select test points.  It now says that when starting up as well.  The tst channels are the only ones showing up.  However, the 1k channels seem to have disappeared from Data Viewer, and now only 16k channels are selectable, but they don't actually work.  I'm not actually sure where the 1k channels were coming from earlier now that I think about it.  They were listed like C1:TST-ETMY-SENSOR_UL and so forth.

RA: Koji and I added the SENSOR channels by hand to the .ini file last night so that we could have data stored in the frames ala c1susvme1, etc.

  2301   Thu Nov 19 11:33:15 2009 josephbConfigurationComputersMegatron

I tried rebooting megatron, to see if that might help, but everything still acts the same. 

I tried using daqconfig and changed channels from deactiveated to activated.  I learned by activating them all, that the daq can't handle that, and eventually aborts from an assert checking a buffer size being too small.  I also tried activating 2 and looking at those channels, and it looks like the _DAQ versions of those channels work, or at least I get 0's out of C1:TST-ETMY_ASCPIT_OUT_DAQ (which is set in C1TST.ini file).

I've added the SENSOR channels back to the /csv/cds/caltech/chans/daq/C1TST.ini file, and those are again working in data viewer.

At this point, I'm leaving megatron roughly in the same state as last night, and am going to wait on a response from Alex.

  2371   Wed Dec 9 10:53:41 2009 josephbUpdateCamerasCamera client wasn't able to talk to server on port 5010, reboot fixed it.

I finally got around to taking a look at the digital camera setup today.  Rob had complained the client had stopped working on Rosalba.

After looking at the code start up and not complain, yet not produce any window output, it looks like it was a network problem.  I tried rebooting Rosalba, but that didn't fix anything.

Using netstat -an, I looked for the port 5010 on both rosalba and ottavia, since that is the port that was being used by the camera.  Ottavia was saying there were 6 established connections after Rosalba had rebooted (rosalba is 131.215.113.103).  I can only presume 6 instances of the camera code had somehow shutdown in such a way they had not closed the connection.

[root@ottavia controls]#netstat -an | grep 5010
tcp        0      0 0.0.0.0:5010                0.0.0.0:*                   LISTEN     
tcp        0      0 131.215.113.97:5010         131.215.113.103:57366       ESTABLISHED
tcp        0      0 131.215.113.97:5010         131.215.113.103:58417       ESTABLISHED
tcp        1      0 131.215.113.97:46459        131.215.113.97:5010         CLOSE_WAIT 
tcp        0      0 131.215.113.97:5010         131.215.113.103:57211       ESTABLISHED
tcp        0      0 131.215.113.97:5010         131.215.113.103:57300       ESTABLISHED
tcp        0      0 131.215.113.97:5010         131.215.113.103:57299       ESTABLISHED
tcp        0      0 131.215.113.97:5010         131.215.113.103:57315       ESTABLISHED

 

I switched the code to use port 5022 which worked fine.  However, I'm not sure what would have caused the original connection closure failures, as I test several close methods (including the kill command on the server end used by the medm screen), and none seemed to generate this broken connection state.  I rebooted Ottavia, and this seemed to fix the connections, and allowed port 5010 to work.  I also tried creating 10 connections, which all seem to run fine simultaneously.  So its not someone overloading that port with too many connections which caused the problem.  Its like the the port stopped working somehow, which froze the connection status, but how or why I don't know at this point.

  2509   Tue Jan 12 11:34:26 2010 josephbUpdatePEMAllegra dataviewer

Quote:

So that we can use both Guralps for Adaptive stuff, and so that I can look at the differential ground motion spectra, I've reconnected the Guralp Seismometers to the PEM ADCU, instead of where they've been sitting for a while connected to the ASS ADC.  I redid the ASS.mdl file, so that the PEM and PEMIIR matricies know where to look for the Gur2 data.  I followed the 'make ass' procedure in the wiki.  The spectra of the Gur1 and Gur2 seismometers look pretty much the same, so everything should be all good.

There's a problem with DataViewer though:  After selecting signals to plot, whenever I hit the "Start" button for the realtime plots, DataViewer closes abruptly. 

When I open dataviewer in terminal, I get the following output:

allegra:~>dataviewer
Warning: communication protocol revision mismatch: expected 11.3, received 11.4
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: communication protocol revision mismatch: expected 11.3, received 11.4
msgget: No space left on device
allegra:~>framer4 msgget:msqid: No space left on device

Does anyone have any inspiration for why this is, or what the story is?  I have GR class, but I'll try to follow up later this afternoon.

 This problem seems to be restricted to allegra.  Dataviewer works fine on Rosalba  and op440m, as well as using ssh -X into rosalba to run dataviewer remotely.  DTT seems to work fine on allegra.  The disk usage seems no where near full on allegra.  Without knowing which "device" its refering to (although it must be a device local to allegra), I'm not sure what to look at now. 

I'm going to do a reboot of allegra and see if that helps.

Update:  The reboot seems to have fixed the issue.

 

 

 

 

  2514   Thu Jan 14 11:44:06 2010 josephbSummaryComputersMemory locations for TST model for ITMY

The main communications data structure is RFM_FE_COMMS, from the rts/src/include/iscNetDsc40m.h file.  The following comments regard sub-structures inside it.  I'm looking at all the files in /rts/src/fe/40m to determine how the structures are used, or if they seem to be unnecessary.

The dsccompad structure is used in the lscextra.c file.  I am assuming I don't need to add anything fo the model for these.  They cover from 0x00000040 to 0x00001000.

FE_COMMS_DATA is used twice, once for dataESX (0x00001000 to 0x00002000), and once for dataESY (0x00002000 to 0x00003000).

Inside FE_COMMS_DATA we have:

status and cycle which look to be initialized then never changed (although they are compared to).

ascETMoutput[P,Y], ascQPDinput are all set to 0 then never used.

qpdGain is used, and set by asc40m, but not read by anything.  It is offset 114, so in dataESX its 4210 (0x00001072), and in dataESY its (0x00002072)

All the other parts of this substructure seem to be unused.

daqTest, dgsSet, low1megpad,mscomms seem unused.

dscPad is referenced, but doesn't seem to be set.

pCoilDriver is a structure of type ALL_CD_INFO, inside a union called suscomms, inside FE_COMMS_Data, and is used.  In this structure, we have:

extData[16], an array of DSC_CD_PPY structures, which is used.  Inside extData we have for each optic (ETMY has an offset of 9 inside the extData array):

Pos is set in sos40m.c via the line pRfm->suscomms.pCoilDriver.extData[jj].Pos = dsp[jj].data[FLT_SUSPos].filterInput;   Elsewhere, Pos seems to be set to 1.0

Similarly, Pit and Yaw are set in sos40m, except with FLT_SUSPitch and FLT_SUSYaw, and being set elsewhere to 1.1, 1.2.  However, these are never applied to the ETMX and ETMY optics (it goes through offests 0 through 7 inclusive). 

Side is set 1.3 or 1.0 only, not used.

ascPit , ascYaw, lscPos are read by the losLinux.c code, and is updated by the sos40m.c code. For ETMY, their respective addresses are: 0x11a1c0, 0x11a1c4, 0x11a1c8.

lscTpNum, lscExNum, seem to be initialized, and read by the losLinux.c, and set by sos40m.c.

modeSwitch is read, but looks to be used for turning dewhitening on and off. Similarly dewhiteSW1R is read and used. 

This ends the DSC_CD_PPY structure.

lscCycle, which is used, although it seems to be an internal check.

dum is unused.

losOpLev is a substructure that is mostly unused.  Inside losOpLev, opPerror, opYerror, opYout seem to be unused, and opPout only seems ever to be set to 0.

Thats the end of ALL_CD_INFO and pCoilDriver.

After we have itmepics, itmfmdata, itmcoeffs, rmbsepics,...etymyepics, etmyfmdata,etmycoeffs which I don't see in use.

We have substructure asc inside mcasc, with epics, filt, and coeff char arrays. These seem to be asc and iowfsDrv specific.

lscIpc, lscepics, and lscla seems lsc specific,

The there is lscdiag struct, which contains v struct, which includes cpuClock, vmeReset, nSpob, nPtrx, nPtry don't seem to be used by the losLinux.c.

The lscfilt structure contains the FILT_MOD dspVME, which seems to be used only by lsc40m.

The lsccoeff structure contrains the VME_COEF pRfmCoeff, which again seems to only interact in the lsc code.

Then we have aciscpad, ascisc, ascipc, ascinfo, and mscepics which do not seem to be used.

ascepics and asccoeff are used in asc.c, but does not seem to be referenced elsewhere.

hepiepics , hepidsp, hepicoeff, hepists do not appear to be used.

 

 

 

 

 

 

  2515   Fri Jan 15 11:21:05 2010 josephbUpdateComputersMegatron and tst model for ETMY

The tst model wasn't compiling this morning due to some incorrect lines in the RfmIOfloat.pm file located in /home/controls/cds/advLIgo/src/epics/util/lib file on megatron. 

The error was "Undefined subroutine &CDS::RfmIOfloat::partType called at lib/Parser2.pm line 354, <IN> line 3363."

I changed RfmIO to RfmIOfloat on lines 1 and 6.
Basically the first 6 lines are now

package CDS::RfmIOfloat;
use Exporter;
@ISA = ('Exporter');

sub partType {
        return RfmIOfloat;
}

The tst now compiles.  At the moment, I believe we should be able to switch megatron in for ETMY and attempt to lock the arm.  The whitening/dewhitening filters are still not automatically synced in software and hardware, but I don't think that should prevent locking.

  2530   Tue Jan 19 10:30:29 2010 josephbUpdateComputersBoot fest to restart the computer and c1iscey not responding.

Quote:

Thi afternoon I found that the RFM network in trouble. The frontends sync counters had railed to 16384 counts and some of the computers were not responding. I went for a bootfest, but before I rebooted c1dcu epics. I did it twice. Eventually it worked and I could get the frontends back to green.

Although trying to burtrestore to snapshots taken at any time after last wednesday till today would make the RFM crash again. Weird.

Also, c1iscey seems in a coma and doesn't want to come back. Power cycling it didn't work. I don't know how to be more persuasive with it.

During the testing of Megatron as the controller for ETMY, c1iscey had been disconnected from the ethernet hub.  Apparently we forgot to reconnect it after the test.  This prevented it from mounting the nfs directory from linux1, and thus prevented it from coming up after being shutdown.  It has been reconnected, restarted, and is working properly now.

  2544   Mon Jan 25 11:42:24 2010 josephbUpdateComputersMegatron and BO board

I was talking with Vladimir on Friday discussing the Binary Output board, we looked at the manual for it (Contec DO-32L-PE) and we realized its an opto-coupler isolated open-collector output.  He mentioned they had the right kind of 50 channel breakout board for testing in Riccardo's lab.

This morning I borrowed the 50 channel breakout board from Riccardo's lab, and along with some resistor loads, test the BO board.  It seems to be working, and I can control the output channel on/off state.

  2558   Tue Feb 2 10:38:30 2010 josephbUpdateComputersMegatron BO test

Last night, I connected megatron's BO board to the analog dewhitening board.  However, I was unable to lock the y arm (although once I disconnected the cable and reconnected it back the xy220 the yarm did lock).

So either A) I've got the wrong cable, or B) I've got the wrong logic being sent to the analog dewhitening filters.

During testing, I ran into an odd continuous on/off cycle on one of the side filer modules (on megatron).  Turns out I had forgotten to use a ground input to the control filer bank (which allows you to both set switches as well as read them out), and it was reading a random variable.  Grounding it in the model fixed the problem (after re-making).

 

 

  2570   Thu Feb 4 12:29:04 2010 josephbUpdateComputersMartian IP switch over notes

What is the change:

We will be moving the 131.215.113.XXX ip addresses of the martian network over to a 192.168.XXX.YYY scheme.

What will users notice:

Computer names (i.e. linux1, scipe25/c1dcuepics) will not change.  The domain name martian, will not change (i.e. linux1.martian.).  What will change is the underlying IP address associated with the host names.  Linux1 will no longer be 131.215.113.20 but something like 192.168.0.20.  If everything is done properly, that should be it.  There should be no impact or need to change anything in EPICS for example.  However, if there are custom scripts with hard coded IP addresses rather than hostnames, those would need to be updated, if they exist.

What needs to be done:

Each computer and router will need to either be accessed remotely, or directly, and the configuration files controlling the IP address (and/or dns lookup locations) be modified.  Then it needs to be rebooted so the configuration changes take effect. I'll be making an updated list of computers this week (tracked down via their physical ethernet cables), and next week, probably on Thursday, and then we simply go down the list one by one.

LINUX

For a linux machine, this means checking the /etc/hosts file and making sure it doesn't have old information.  It should look like:

127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6

Then change the /etc/sysconfig/network-scripts/ifcfg-eth0 file (or ethX file depending on the ethernet card in question).  The IPADDR, NETWORK, and GATEWAY lines will need to be changed.  You can change the hostname (although I don't plan on it) by modifying the /etc/sysconfig/network file.

The /etc/resolv.conf file will need to be updated with the new DNS server location (i.e. 131.215.113.20 to 192.168.0.20 for example).

SOLARIS

Simlarly to linux, the /etc/hosts file will need to be updated and/or simplified.  The /etc/defaultrouter file will need to be updated to the new router ip.  /etc/defaultdomain will need to be updated.  The /etc/resolv.conf will need to be updated with the new dns server.

vxWorks

Looking at the vxWorks machines, the command bootChange can be used to view or edit the IP configuration.

The following is an example from c1iscey.

-> bootChange

'.' = clear field;  '-' = go to previous field;  ^D = quit

boot device          : eeE0
processor number     : 0
host name            : linux1
file name            : /cvs/cds/vw/pIII_7751/vxWorks
inet on ethernet (e) : 131.215.113.79:ffffff00
inet on backplane (b):
host inet (h)        : 131.215.113.20
gateway inet (g)     :
user (u)             : controls
ftp password (pw) (blank = use rsh):
flags (f)            : 0x0
target name (tn)     : c1iscey
startup script (s)   :
other (o)            :

value = 0 = 0x0

By updating the the host (name of machine where its mounting /cvs/cds from - i.e. linux1), inet on ethernet (the IP of c1iscey) and host inet (linux1's ip address), we should be able to change all the vxWorks machines.

LINUX1

The DNS server running on linux1 will need to be updated with the new IPs and domain information.  The host file on linux1 will also need to be updated for all the new IP addresses as well.

This will need to be handled carefully as the last time I tried getting away without the host file on linux1, it broke NFS mounting from other machines.  However, as long as the host on linux1 is kept in sync with the dns server files everything should work.

  2580   Mon Feb 8 17:00:36 2010 josephbUpdateComputersMegatron ETMY model updated (tst.mdl)

I've added the control logic for the outputs going to the Contec Digital Output board.  This includes outputs from the QPD filters (2 filters per quadrant, 8 in total), as well as outputs going to the coil input sensor whitening.

At this point, the ETMY controls should have everything the end station FE normally does.  I'm hoping to do some testing tomorrow afternoon with this with a fully locked IFO.

  2583   Tue Feb 9 17:18:45 2010 josephbSummaryComputersLocking Y arm successful with fully replaced front-end for ITMY

We were able to lock the Y-arm using Megatron and the RCG generated code, with nothing connected to c1iscey.

All relevant cables were disconnected from c1iscey and plugged into the approriate I/O ports, including the digital output.  Turns out the logic for the digital output is opposite what I expected and added XOR bitwise operators in the tst.mdl model just before it went out to DO board.  Once that was added, the Y arm locked within 10 seconds or so.  (Compared to the previous 30 minutes trying to figure out why it wouldn't lock).

  2594   Fri Feb 12 11:44:11 2010 josephbUpdateComputersStatus of the IP change over

Quote:

After Joe left:

  1. Turned on op440m and returned him his keyboard and mouse.
  2. Damped MC2.
  3. Opened PSL shutter - locked PMC, FSS,
  4. Started StripTool displays on op540m.
  5. op340m doesn't respond to ping from anyone.
  6. started FSS  SLOW and RCPID scripts on op540 - need to kill and restart on op430m.
  7. ASS wouldn't come up - it doesn't know who linux1 is.
  8. MC autolocker wouldn't run on op540m because of a perl module issue, started it on op440m - it needs to be killed and restarted on op430m.
  9. probably mafalda, linux2, and op430m need some attention - they are all in the same rack.

As of 7:18 PM, the MC is locked and the PSL seems normal + all suspensions are damped and the ELOG is back up as well as the SVN.

5) op340m has had its hosts table and other network files updated.  I also removed its outdated hosts.deny file which was causing some issues with ssh.

6) On op340m I started FSSSlowServo, with "nohup ./FSSSlowServo", after killing it on op540m.

I also kill RCthermalPID.pl, and started with "nohup ./RCthermalPID.pl" on op540m.

7) c1ass is fixed now.  There was a typo in the resolv.conf file (namerserver -> nameserver) which has been fixed.  It is now using the DNS server running on linux1 for all its host name needs.

8) I killed the autlockMCmain40m process running on op440m, modified the script to run on op340m, logged into op340m, went to /cvs/cds/caltech/scripts/MC and ran nohup ./autolockMCmain40m

9) Linux2 does not look like it has not been connected for awhile and its wasn't connected when we started the IP change over yesterday.  Is it supposed to still be in use?  If so, I can hook it up fairly easily.  op340m, as noted earlier, has been switched over.  Mafalda has been switched over.

10) c0rga has now been switched over. 

11) aldabella, the vacuum laptop has had its starting environment variables tweaked (in the /home/controls/.cshrc file) so that it looks on the 192.168.113 network instead of the 131.215.113.  This should mean Steve will not have any more trouble starting up his vacuum control screen.

12) Ottavia has been switched over.

13) At this time, only the GPIB devices and a few laptops remain to get switched over.

  2595   Fri Feb 12 11:56:02 2010 josephbUpdateComputersNodus slow ssh fixed

Koji pointed out that logging into Nodus was being abnormally slow.  I tracked it down to the fact we had forgotten to update the address for the DNS server running on linux1 in the resolv.conf file on nodus.  Basically it was looking for a DNS server which didn't exit, and thus was timing out before going to the next one.  SSHing into nodus should be more responsive.

  2597   Fri Feb 12 13:56:16 2010 josephbUpdateComputersFinishing touches on IP switch over

The GPIB interfaces have been updated to the new 192.168.113.xxx addresses, with Alberto's help.

Spare ethernet cables have been moved into a cabinet halfway down the x-arm.

The illuminators have a white V error on the alarm handler, but I'm not sure why.  I can turn them on and off using the video screen controls (except for the x arm, which has no computer control, just walk out and turn it on).

There's a laptop or two I haven't tracked down yet, but that should be it on IPs. 

At some point, a find and replace on 131.215.xxx.yyy addresses to 192.168.xxx.yyy should be done on the wiki.  I also need to generate an up to date ethernet IP spreadsheet and post it to the wiki.

 

  2599   Fri Feb 12 15:59:16 2010 josephbUpdateComputersTestpoints not working

Non-testpoint channels seem to be working in data viewer, however testpoints are not.  The tpman process is not running on fb40m.  My rudimentary attempts to start it have failed. 

# /usr/controls/tpman &
13929
# VMIC RFM 5565 (0) found, mapped at 0x2868c90
VMIC RFM 5579 (1) found, mapped at 0x2868c90
Could not open 5565 reflective memory in /dev/daqd-rfm1
16 kHz system
Spawn testpoint manager
no test point service registered
Test point manager startup failed; -1

It looks like it may be an issue with the reflected memory (although the cables are plugged in and I see the correct lights lit on the RFM card in back of fb40m.)

The fact that this is a RFM error is confirmed by /usr/install/rfm2g_solaris/vmipci/sw-rfm2g-abc-005/util/diag/rfm2g_util and entering 3 (which should be the device number).

Interestingly, the device number 4 works, and appears to be the correct RFM network (i.e. changing ETMY lscPos offset changes to the corresponding value in memory).

So, my theory is that when Alex put the cards back in, the device number (PCI slot location?) was changed, and now the tpman code doesn't know where to look for it.

Edit: Doesn't look like PCI slot location is it, given there's 4 slots and its in #3 currently (or 2 I suppose, depending on which way you count).  Neither seems much the number 4.  So I don't know how that device number gets set.

 

 

  2610   Wed Feb 17 12:45:19 2010 josephbUpdateComputersUpdated Megatron and its firewall

I updated the IP address on the Cisco Linksys wireless N router, which we're using to keep megatron separated from the rest of the network.  I then went in and updated megatrons resolv.conf and host files.  It is now possible to ssh into megatron again from the control machines.

  2622   Mon Feb 22 09:45:34 2010 josephbUpdateGeneralPrep for Power Supply Stop

Quote:

Autoburts have not been working since the network changeover last Thursday.

Last snapshot was around noon on Feb 11...  


It turns out this happened when the IP address got switched from 131.... to 192.... Here's the horrible little piece of perl code which was failing:

$command = "/usr/sbin/ifconfig -a > $temp";
   system($command);

   open(TEMP,$temp) || die "Cannot open file $temp\n";
   $site = "undefined";
   #                                                                                                     
   # this is a horrible way to determine site location                                                   
   while ($line = <TEMP>) {
     if ($line =~ /10\.1\./) {
       $site = "lho";
     } elsif ($line =~ /10\.100\./) {
       $site = "llo";
     } elsif ($line =~ /192\.168\./) {
       $site = "40m";
     }
   }
   if ($site eq "undefined") {
     die "Cannot Determine Which LIGO Observatory this is\n";

I've now put in the correct numbers for the 40m...and its now working as before. I also re-remembered how the autoburt works:

1) op340m has a line in its crontab to run /cvs/cds/caltech/burt/autoburt/burt.cron (I've changed this to now run at 7 minutes after the hour instead of at the start of the hour).

2) burt.cron runs /cvs/cds/scripts/autoburt.pl (it was using a perl from 1999 to run this - I've now changed it to use the perl 5.8 from 2002 which was already in the path).

3) autoburt.pl looks through every directory in 'target' and tries to do a burt of its .req file.

Oh, and it looks like Joe has fixed the bug where only op440m could ssh into op340m by editing the host.allow or host.deny file (+1 point for Joe).

But he forgot to elog it (-1 point for Joe).®

I knew there was going to be a script somewhere with a hard coded IP address.  My fault for missing it.  However, in regards to the removal of op340m's host.deny file, I did elog it here.  Item number 5.

ELOG V3.1.3-