40m Log, Page 81 of 341
ID   Date   Author   Type   Category   Subject
  963   Thu Sep 18 12:16:01 2008   Yoichi   Update   Computers   EPICS BACK

Quote:

Somehow the EPICS system got hosed tonight. We're pretty much dead in the water till we can get it sorted.


The problem was caused by Joe's installation of a DNS server on linux1.
Joe removed the /etc/hosts file after starting the DNS server (bind). This somehow prevented the frontend computers
from booting properly.
Joe and I confirmed that putting the /etc/hosts file back resolved the problem.
Right now, the DNS server is also running on linux1.

We are not sure why the /etc/hosts file is still necessary. My guess is that the NFS server somehow reads /etc/hosts
when it decides which computers to allow to mount. We will check this later.

Anyway, now the computers are mostly running fine. The X-arm locks.
The Y-arm doesn't, because one of the digital filters for the Y-arm lock fails to be loaded to the frontend.
I'm working on it now.
  964   Thu Sep 18 13:05:05 2008   Yoichi   Update   Computers   EPICS BACK

Quote:

The Y-arm doesn't, because one of the digital filters for the Y-arm lock fails to be loaded to the frontend.
I'm working on it now.


Rob told me that the filter "3^2:20^2" is switched on/off dynamically by the front end code for the LSC.
Therefore, the failure to manually load it was not actually a problem.
The Y-arm did not lock just because the alignment was bad.
Now the Y-arm alignment is ok and the arm locks.
  965   Thu Sep 18 14:36:54 2008   josephb   Configuration   Computers   Name server and Epics
The problems Rob was experiencing last night were due to part of the setup (or rather, testing of the setup) of the new nameserver running on linux1.

The name server was set up on linux1 by doing the following:

1) Installed xorg-x11-xauth via yum which was necessary to get remote x windows to work in linux1

2) Installed xorg-x11-fonts-Type1 in order to get the gui system-config-* programs to work

3) Ran system-config-bind, which created a default set of nameserver files. I unfortunately didn't understand the gui all that well, so I manually edited and added files to these base ones. The base files were generated in /var/named/chroot/etc/ and /var/named/chroot/var/named.

4) I added martian.zone, 113.215.131.in-addr.arpa.zone and named.conf.local, and edited named.conf so that it loads named.conf.local. The martian.zone file acts as a forward lookup (i.e. give it a name and it returns an IP number like 131.215.113.20). The 113.215.131.in-addr.arpa.zone file acts as a reverse lookup (i.e. give it an IP number like 131.215.113.20 and it tells you the name). The file named.conf.local merely points to these two files.

Note: One can add or change IP lookup by simply updating these two files. The format should be obvious from the files.

5) I specifically ssh'd in as root to linux1 (using su wasn't sufficient) and then typed "service named start" (without quotes). You can also use "restart" or "stop" instead of "start". This started the name server, giving an [Ok] message.

6) I edited the /etc/resolv.conf file on linux1 so that it pointed to itself first ("nameserver 127.0.0.1" at the top of the file). I also added the line "search martian", which allows one to simply use linux1 as opposed to linux1.martian.

I also edited the /etc/resolv.conf file on linux2, and it seems to resolve names fine.

7) And here is where I broke things. As a test, I moved /etc/hosts to /etc/hosts.bak, and then tested to see if names were being resolved correctly. By using the command host, I determined they were in fact working. I also tested with ssh.

However, something basic didn't like me moving the hosts file. Apparently, when a front-end machine needed to reboot, it wouldn't come back up, and there was no way to SSH or telnet into it.

Yoichi and I did quite a bit of debugging this morning and determined that the nameserver itself isn't conflicting; the lack of the hosts file was the source of the problem. One theory is that services don't know to go to DNS to resolve host names. I think that modifying the /etc/nsswitch.conf file to include dns as an option for services and other programs might let things work without the hosts file. However, I'm going to leave that until tomorrow morning, when it is less likely to interfere with current operations.

As it stands, things are working with the nameserver running and the host file in place.
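
The forward and reverse zones described in step 4 can be cross-checked with a short script. This is only a minimal sketch, not something that was run at the time: the function name check_round_trip is made up for illustration, and it goes through the system resolver, so it will also be satisfied by entries in /etc/hosts (which is exactly what made the original test misleading).

```python
import socket


def check_round_trip(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name -> IP -> name and report whether the two lookups agree.

    `forward` and `reverse` default to the system resolver; they are
    parameters only so the logic can be exercised without a network.
    """
    ip = forward(name)
    reverse_name = reverse(ip)[0]  # gethostbyaddr returns (hostname, aliases, addresses)
    # Compare short host names so "linux1" and "linux1.martian" count as a match.
    consistent = reverse_name.split(".")[0] == name.split(".")[0]
    return ip, reverse_name, consistent


if __name__ == "__main__":
    for host in ["linux1", "linux2"]:
        try:
            print(host, check_round_trip(host))
        except OSError as err:  # socket.gaierror and socket.herror are OSError subclasses
            print(host, "lookup failed:", err)
```

Running it once with the hosts file in place and once without would at least show whether the name server alone answers both directions, although it would not explain what the front-ends themselves depend on.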
  966   Thu Sep 18 18:38:14 2008   Yoichi   HowTo   Computers   How to compile an SNL code for VxWorks
Dave Barker guided me through how to compile an SNL code into a Motorola 162 CPU object.

Here is the procedure:

(1) You need an account at LHO and a password for ops account at LHO. Contact Dave if you don't have these.

(2) Copy your code (say Particle.st) to the LHO gateway machine.
scp Particle.st username@lhocds.ligo-wa.caltech.edu:/cvs/cds/lho/target/t0sandbox0
(3) Login to lhocds.ligo-wa.caltech.edu
ssh username@lhocds.ligo-wa.caltech.edu
(4) Login to control0
ssh ops@control0
(5) Change directory to the sandbox dir.
cd /cvs/cds/lho/target/t0sandbox0
(6) Prepare for the compilation
setup epics
(7) Edit makefile in the directory. You have to modify a few lines at the end of the file.
There are comments for how to do it in the file.

(8) Compile
make Particle.o
(9) Copy the object file to the 40m target directory
scp Particle.o controls@nodus.ligo.caltech.edu:/cvs/cds/caltech/target/c1psl/

That is it.
  969   Fri Sep 19 00:18:14 2008   rana   Update   Computers   svn is old
linux2:mDV>ssh nodus
Password:
Last login: Fri Sep 19 00:11:44 2008 from gwave-69.ligo.c
Sun Microsystems Inc.   SunOS 5.9       Generic May 2002
nodus:~>c
nodus:caltech>cd apps/
nodus:apps>cd mDV
nodus:mDV>svn update
svn: This client is too old to work with working copy '.'; please get a newer Subversion client
nodus:mDV>whoami
controls
nodus:mDV>uname -a
SunOS nodus 5.9 Generic_118558-39 sun4u sparc SUNW,A70 Solaris
nodus:mDV>pwd
/cvs/cds/caltech/apps/mDV
nodus:mDV>
  972   Fri Sep 19 09:49:42 2008   Yoichi   Update   Computers   svn is old
The problem below is fixed now.
The cause was that .svn/entries and .svn/format had the wrong version number, "9", where it should have been "8".
I changed those files in all the sub-directories. Now svn up runs fine.
I don't know how this version discrepancy happened.
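
For anyone who hits the same error, a read-only scan can list which working-copy directories carry which format number before anything is edited by hand. This is a minimal sketch under one assumption taken from the description above: that the format number is the first line of both .svn/format and .svn/entries. The function name find_wc_formats is hypothetical, and the script changes nothing on disk.

```python
import os


def find_wc_formats(root):
    """Return {path: first_line} for every .svn/format and .svn/entries under root."""
    found = {}
    for dirpath, dirnames, _filenames in os.walk(root):
        if os.path.basename(dirpath) != ".svn":
            continue
        dirnames[:] = []  # do not descend further inside .svn itself
        for name in ("format", "entries"):
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                with open(path, "r", errors="replace") as handle:
                    # Assumption: the working-copy format number is the first line.
                    found[path] = handle.readline().strip()
    return found


if __name__ == "__main__":
    for path, version in sorted(find_wc_formats(".").items()):
        if version != "8":
            print(version, path)
```

Run from the top of the working copy (e.g. /cvs/cds/caltech/apps/mDV), it prints only the files whose first line is not "8".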



Quote:
linux2:mDV>ssh nodus
Password:
Last login: Fri Sep 19 00:11:44 2008 from gwave-69.ligo.c
Sun Microsystems Inc.   SunOS 5.9       Generic May 2002
nodus:~>c
nodus:caltech>cd apps/
nodus:apps>cd mDV
nodus:mDV>svn update
svn: This client is too old to work with working copy '.'; please get a newer Subversion client
nodus:mDV>whoami
controls
nodus:mDV>uname -a
SunOS nodus 5.9 Generic_118558-39 sun4u sparc SUNW,A70 Solaris
nodus:mDV>pwd
/cvs/cds/caltech/apps/mDV
nodus:mDV>
  973   Fri Sep 19 11:21:45 2008   josephb   Configuration   Computers   Nameserver and Rosalba
I tried modifying the nsswitch.conf file to include going to dns in addition to local files for everything (services, network, etc) and then moving the /etc/hosts file to /etc/hosts.bak. Unfortunately, this still didn't allow front-ends to reboot properly. So I'm not sure what is using the hosts file, but whatever it is, is apparently important. After the test I placed the hosts file back and reverted the nsswitch.conf file.

I also noticed that Rosalba was having problems connecting to the network. This apparently was because I had shut down the dhcp server on the NAT router, as had been discussed at the meeting on Wednesday.

To fix this, I modified the /etc/sysconfig/network-scripts/ifcfg-eth1 file to fix rosalba's IP as 131.215.113.24 (which doesn't seem to be in use). I also updated rosalba's /etc/resolv.conf file to point at linux1's name server and two additional name servers, and added the "search martian" line. I modified the /etc/sysconfig/network-scripts/ifcfg-eth0 file so that the built-in network card doesn't come up automatically, since it's currently not plugged into anything. Lastly, I added rosalba and its IP to linux1's name server files.
  974   Fri Sep 19 11:48:14 2008   steve   Update   Computers   old hubs can make one happy
Joseph finds a XIX century bottle neck hub: CentreCOM 3624TR 10Base-T
and happily replaces it with Netgear GS724T 1000Base-T
Attachment 1: P1020934.jpg
  977   Mon Sep 22 16:51:27 2008   Yoichi   HowTo   Computers   Network GPIB
I was able to make the wireless connected GPIB interface work with SR785.
Now you can download data from SR785 through network, wherever it is located.
Say good bye to floppy disks.

I wrote an installation note in the wiki.
http://lhocds.ligo-wa.caltech.edu:8000/40m/GPIB

I wrote a new script called "netgpibdata.py" which works similarly to "getgpibdata.py".
It is in the 40m svn. Instructions on how to use it are on the above-mentioned wiki page.
  983   Tue Sep 23 00:47:24 2008   Yoichi   HowTo   Computers   Network GPIB

Quote:

I wrote a new script called "netgpibdata.py" which works similarly to "getgpibdata.py".
It is in the 40m svn. Instructions on how to use it are on the above-mentioned wiki page.


netgpibdata.py is now installed on the controls machines (/cvs/cds/caltech/scripts/general/netgpibdata/netgpibdata.py).
You can use it like,
netgpibdata.py -i 131.215.113.106 -d AG4395A -a 10 -f spectrum01

In this example, data from Agilent 4395A analyzer at GPIB address 10 connected to the GPIB-LAN box with the IP address 131.215.113.106
is downloaded and saved to spectrum01.dat. The measurement parameters are saved to spectrum01.par.
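
If several traces need to be pulled in a row, the same command line can be assembled from a script. The following is a minimal sketch, assuming only the -i, -d, -a and -f options shown in the example above; the helper names build_command and fetch are made up for illustration, and by default it only prints the command instead of running it.

```python
import subprocess

# Install location quoted in the entry above.
SCRIPT = "/cvs/cds/caltech/scripts/general/netgpibdata/netgpibdata.py"


def build_command(ip, device, gpib_address, basename):
    """Assemble the argument list for one netgpibdata.py download."""
    return [SCRIPT, "-i", ip, "-d", device, "-a", str(gpib_address), "-f", basename]


def fetch(ip, device, gpib_address, basename, dry_run=True):
    """Print the command, or actually run it when dry_run is False."""
    cmd = build_command(ip, device, gpib_address, basename)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    # Same analyzer and GPIB-LAN box as in the example, three numbered output files.
    for n in range(1, 4):
        fetch("131.215.113.106", "AG4395A", 10, "spectrum%02d" % n)
```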
  990   Thu Sep 25 03:12:13 2008   rana   Summary   Computers   conlog and linux1
It would be nice to have conlog from outside. Right now it's on linux1 and so it's unavailable. To
test it for speed we ran the command line conlog on linux1, linux2, and nodus.

It was slightly faster on nodus than linux1, implying that it's not a network speed issue. It was
phenomenally slower on linux2.

I used the command '/sbin/lspci -vvv' to check what network cards are installed where. As it turns
out, linux2 has a GigE card, but linux1, our NFS server, has only a 100 Mbit card:
01:08.0 Ethernet controller: Intel Corporation 82562EZ 10/100 Ethernet Controller (rev 01)
        Subsystem: Intel Corporation Unknown device 304a
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min, 14000ns max), Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 209
        Region 0: Memory at ff8ff000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at bc00 [size=64]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-

We (Joe) need to buy a GigE card for linux1 and to also set up conlog and conlogger to run on Nodus.
  992   Thu Sep 25 14:03:08 2008   josephb   Configuration   Computers

Quote:

We (Joe) need to buy a GigE card for linux1 and to also set up conlog and conlogger to run on Nodus.


A spare Intel Pro 1000/GT desktop adapter (gigabit ethernet card) has been added to Linux1, which is now using that card to connect to the network.

This was after a slight scare when I somehow reset the BIOS on Linux1 during the first reboot after adding the card.
After some debugging and discussion with Yoichi, the BIOS was fixed and the computer works again, with its new faster network connection.

We both noted, though, that Linux1 is a rather old machine, with only half a gig of RAM and about 80% of its 58 gigabyte hard drive (RAID) in use. It might be worth upgrading in general.

Need to figure out how to install conlog/conlogger programs next...
  997   Fri Sep 26 14:10:21 2008   Yoichi   Configuration   Computers   Lab laptops maintenance
The linux laptops were unable to write to the NFS mounted directories.
That was because the UID of the controls account on those computers was different from that on linux1 and the other control room computers.
I changed the UID of the controls account on the laptops. Of course, this required not only editing /etc/passwd but also dealing with
numerous errors caused by the sudden change of the UID. I had to chown all the files/directories in /home/controls.
I also had to remove /tmp/gconf-controls because it was owned by the old UID.

Whenever we add a new machine, we have to make sure the controls account has the same UID/GID as other machines, that is 1001/1001.
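
A quick check along these lines can be run on a new machine before it mounts anything over NFS. It is a minimal sketch using Python's standard pwd module; the function name check_account and its lookup parameter are illustrative only, and the 1001/1001 defaults come from the note above.

```python
import pwd


def check_account(name="controls", uid=1001, gid=1001, lookup=pwd.getpwnam):
    """Return (ok, message) describing whether the account has the expected UID/GID.

    `lookup` defaults to pwd.getpwnam; it is a parameter only so the
    function can be tested without touching the real password database.
    """
    try:
        entry = lookup(name)
    except KeyError:  # pwd.getpwnam raises KeyError for unknown accounts
        return False, "no such account: %s" % name
    if (entry.pw_uid, entry.pw_gid) != (uid, gid):
        return False, "%s has UID/GID %d/%d, expected %d/%d" % (
            name, entry.pw_uid, entry.pw_gid, uid, gid)
    return True, "%s has UID/GID %d/%d" % (name, uid, gid)


if __name__ == "__main__":
    ok, message = check_account()
    print("OK:" if ok else "MISMATCH:", message)
```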


I did some cleanups of the laptop environment.
I made dataviewer work on the laptops *locally*. We no longer have to ssh -X to other computers to run dataviewer.
The trick was to install grace using the Fedora package:
sudo yum install grace
Then I modified /usr/local/stow_pkgs/dataviewer/dataviewer to change the option to dc3 from "-s fb" to "-s fb40m".
  1006   Mon Sep 29 13:33:39 2008   josephb   Configuration   Computers   Gigabit network finished and conlog available on Nodus
The last 100 Mb unmounted hub has been removed (or at least the last one I could find). We should be on a fully gigabit network with Cat6 cables and lots and lots of labels.

In other news, the Perl script that runs the web interface on linux1 for the conlog has been copied to /cvs/cds/caltech/apache/cgi-bin/ and is now being pointed to by the apache server on Nodus.

https://nodus.ligo.caltech.edu:30889/cgi-bin/conlog_web.pl
  1015   Wed Oct 1 12:05:58 2008   Alberto   Configuration   Computers   "StochMon" added to the Alarm Handler
John, Alberto,

We added the four channels of the RF Amplitude Monitor (aka StochMon) to the Alarm Handler. First we modified the 40m.alh file, just copying some lines and switching the channel names to the ones we wanted. Then we also added a few lines to the database file ioo.db in order to define the alarm levels. So far I have used just test values for the thresholds of the green, yellow and red states, and I need to update them to some reasonable ones. To do that I need to calibrate those EPICS channels. I have the old data saved and I'm now trying to figure out how to properly change the database file.
  1016   Wed Oct 1 12:09:25 2008   Alberto   Configuration   Computers   "StochMon" added to the Alarm Handler

Quote:
John, Alberto,

We added the four channels of the RF Amplitude Monitor (aka StochMon) to the Alarm Handler. So far I have used just test values for the thresholds of the green, yellow and red states, and I need to update them to some reasonable ones. To do that I need to calibrate those EPICS channels. I have the old data saved and I'm now trying to figure out how to properly change the database file.


John, Yoichi, Alberto

We restarted the C1iool0 computer both directly with the main key and remotely via telnet. We had to do it a couple of times, and on one occasion the computer didn't restart properly and had problems connecting to the network. We had to call Alex, who did just the same thing but used the CTRL+X command to reboot. It worked, and the Alarm Handler now includes the StochMon.
  1031   Tue Oct 7 12:17:57 2008   Alberto   Configuration   Computers   Time reset on MEDM
Yoichi, Alberto

I noticed the MEDM screen time was about 7 minutes ahead of the right time. The time on MEDM is read from channel C0:TIM-PACIFIC_STRING, which takes it from the C1VCU-EPICS computer. Yoichi found that that computer did not have the right time because one of the startup scripts in the directory /etc/init.d/, ntpd, for some reason did not start. Restarting it by typing ./ntpd start updated the time on that computer and fixed the problem.
  1033   Wed Oct 8 12:35:56 2008   josephb   Configuration   Computers   New Network diagram for the 40m
Attached is a pdf of the new network diagram for the 40m after having removed all of the old hubs.
Attachment 1: 40m_network_10-07-08.pdf
  1038   Fri Oct 10 00:34:52 2008   rob   Omnistructure   Computers   FEs are down

The front-end machines are all down. Another cosmic-ray in the RFM, I suppose. Whoever comes in first in the morning should do the all-boot described in the wiki.
  1039   Fri Oct 10 10:20:42 2008   Alberto   Omnistructure   Computers   FEs are down

Quote:

The front-end machines are all down. Another cosmic-ray in the RFM, I suppose. Whoever comes in first in the morning should do the all-boot described in the wiki.


Yoichi and I went along the arms turning off and on all the FE machines. Then, from the control room we rebooted them all following the procedures in the wiki. Everything is now up again.

I restored the full IFO, re-locked the mode cleaner.
  1040   Fri Oct 10 13:57:33 2008   Alberto   Omnistructure   Computers   Problems in locking the X arm
This morning, for some reason that I didn't clearly understand, I could not lock the X arm. The Y arm was not a problem and the Restore and Align scripts worked fine.

Looking at the LSC medm screen, something strange was happening on the ETMX output. Even if the Input switch for c1:LSC-ETMX_INMON was open, there still was some random output going into c1:LSC-ETMX_INMON, and it was not a residual of the restore script running. Probably something bad happened this morning when we rebooted all the FE computers after the RFM network crash that we had last night.

Restarting the LSC computer didn't solve the problem, so I decided to reboot the scipe25 computer, corresponding to c1dcuepics, which controls the LSC channels.

Somehow rebooting that machine erased all the parameters on almost all medm screens. In particular, the mode cleaner mirrors got a kick and took a while to stop. I then burtrestored all the medm screen parameters to yesterday, Thursday October 9, at 16:00. After that everything came back to normal. I had to re-lock the PMC and the MC.

Burtrestoring c1dcuepics.snap required editing the .snap file because of a bug in burtrestore for that computer which adds an extra return before the final quote symbol in the file. That bug should be fixed sometime.

The rebooting apparently fixed the problem with ETMX on the LSC screen. The strange output is not present anymore and I was able to easily lock the X arm. I then ran the Align and the Restore full IFO scripts.
  1041   Fri Oct 10 20:03:35 2008   Yoichi   Configuration   Computers   medm, dataviewer, dtt on 64 bit linux
I compiled EPICS (base, medm and ezca) and dataviewer for 64 bit linux.
These are installed in /cvs/cds/caltech/apps/linux64/.
I also configured cshrc.40m to make it possible to run the 32bit dtt on 64bit machines.
64bit ligotools is also installed to /cvs/cds/caltech/apps/linux64/ligotools although I haven't tested it extensively.

With those essential tools available for 64bit linux, Joe and I decided to install 64bit CentOS to the new linux machine.
It is named allegra.
Now, medm, dataviewer, dtt, awg, foton and ezca commands all work on rosalba and allegra.
I put some notes on how to make things work on 64bit in the wiki.
http://lhocds.ligo-wa.caltech.edu:8000/40m/Building_LIGO_softwares_for_64_bit_linux

I compiled dtt (actually the whole GDS tree) for 64bit linux and the build process finished normally.
But somehow dtt does not work properly. It starts on my laptop but does not retrieve data. It crashes on rosalba.
So I had to retreat to 32bit.
  1047   Tue Oct 14 19:18:18 2008   Yoichi   Update   Computers   BootFest
Rana, Yoichi

Most of the FE computers failed around lunchtime.
We power cycled those machines and now all of them are up and running.
I confirmed that both arms lock.
Now the IFO is in "Restore last auto-alignment" status.
  1055   Fri Oct 17 16:43:10 2008   Yoichi   Configuration   Computers   mcup/down scripts on linux
I made some changes to /cvs/cds/caltech/medm/c1/ioo/cmd/medmMCup and medmMCdown so that those can be run from medm on linux machines.
  1059   Mon Oct 20 15:02:00 2008   Yoichi   Configuration   Computers   /cvs/cds restored
I moved missing files in /cvs/cds restored by Alan and Stuart to the original locations.
I confirmed autoburt runs, and dtt, which had also been having trouble running, runs ok now.

I found an interesting piece of evidence on allegra, our new 64bit linux machine.
In the Trash of controls Desktop on that machine, there is /cvs/cds/vw/ directory.
I remember that when I last emptied the trash bin on the machine (yesterday), it took a rather long time.
Too bad that I did not pay attention to what was actually in the Trash, but now I have a feeling that in the Trash were
missing /cvs/cds/* directories.
While emptying the Trash, I encountered several errors saying permission denied or something like that, and skipped those files.
Sometimes, when you move something from NFS mounted directories to the Trash, you get this kind of error.
So my guess is that someone accidentally (or intentionally) moved /cvs/cds/* except for "caltech" to the Trash of allegra.
And I completely removed them carelessly.
  1060   Mon Oct 20 16:18:00 2008   Alan   Configuration   Computers   /cvs/cds restored

Quote:
I moved missing files in /cvs/cds restored by Alan and Stuart to the original locations.
I confirmed autoburt runs, and dtt, which had also been having trouble running, runs ok now.

I found an interesting piece of evidence on allegra, our new 64bit linux machine.
In the Trash of controls Desktop on that machine, there is /cvs/cds/vw/ directory.
I remember that when I last emptied the trash bin on the machine (yesterday), it took a rather long time.
Too bad that I did not pay attention to what was actually in the Trash, but now I have a feeling that in the Trash were
missing /cvs/cds/* directories.
While emptying the Trash, I encountered several errors saying permission denied or something like that, and skipped those files.
Sometimes, when you move something from NFS mounted directories to the Trash, you get this kind of error.
So my guess is that someone accidentally (or intentionally) moved /cvs/cds/* except for "caltech" to the Trash of allegra.
And I completely removed them carelessly.


In the meantime, I have re-started the nightly backup for /frames/minute-trends
but NOT YET for /cvs/cds ,
since I fear that we'll find another problem and will need to go back to the June 27 backup.
Let's wait a few days for the dust to settle,
and if everyone feels confident that /cvs/cds is ok,
I'll restart the backup of that.

How I restored the files, for the record:

Stuart mounted /archive/backup onto an accessible computer (garrak.ligo.caltech.edu ) and I logged on to controls@nodus and ran this command:

/cvs/cds/caltech/scripts/backup/rsync --rsync-path=/usr/bin/rsync --rsh=/usr/bin/ssh --compress --verbose --archive --hard-links --exclude=caltech/ ajw@garrak.ligo.caltech.edu:/backup/40m/cvs /cvs/cds/recover_20081020

I had to type in my GC password, and it ran for ~20 minutes (would have been much longer had I asked for /cvs/cds/caltech as well!).

You can view the backups by logging on to garrak.ligo.caltech.edu with your GC account:
/backup/40m/cvs/cds/
/archive/frames/trend/minute-trend/40m
  1065   Tue Oct 21 18:19:42 2008   Yoichi   Configuration   Computers   LISO and Eagle installed
I installed LISO, a circuit simulation software, into the control room linux machines.
I also installed a PCB CAD called Eagle to serve as a graphical editor for LISO.
I put a brief explanation in the wiki.
http://lhocds.ligo-wa.caltech.edu:8000/40m/LISO

As a demonstration, I made a model of the FSS PC path and did a stability analysis of the op-amps.

The first attachment is the schematic of the model.
You can find the model in /cvs/cds/caltech/apps/linux/eagle/projects/liso-examples/FSS

The second attachment shows the stability analysis plot of the first two op-amps when AD829s are used.
The op-amp model is for the uncompensated AD829. The graph includes the bode plots of the open-loop transfer function of each op-amp.
If the phase delay is more than 360deg (in the plot it is 0 deg because the phase is wrapped within +/-180 deg) at the unity gain frequency,
the op-amp is unstable.
It is clear from the plot that this circuit is unstable. This is consistent with what I experienced when I replaced the chips with AD829s without
compensation.
Unfortunately, I don't have an op-amp model for phase compensated AD829. So I can't make a plot with compensation caps.
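
The stability criterion used above can be illustrated numerically. The sketch below is not the LISO model: it assumes a generic loop gain made of a DC gain and real poles, and every number in it is a hypothetical placeholder rather than an AD829 or AD797 parameter. It also uses the phase-margin convention in which the 180 deg of the negative feedback is left out, so the "more than 360 deg of delay" condition in the text corresponds to a phase margin at or below zero here.

```python
import math


def loop_gain(f, dc_gain, poles):
    """Magnitude and unwrapped phase (deg) of dc_gain / prod(1 + j f / p)."""
    mag = dc_gain
    phase = 0.0
    for p in poles:
        mag /= math.sqrt(1.0 + (f / p) ** 2)
        phase -= math.degrees(math.atan(f / p))  # summing avoids +/-180 deg wrapping
    return mag, phase


def unity_gain_frequency(dc_gain, poles, f_lo=1.0, f_hi=1e12):
    """Bisect in log frequency for |loop gain| = 1.

    Assumes the gain is above 1 at f_lo and below 1 at f_hi; the magnitude
    of an all-pole model falls monotonically, so there is a single crossing.
    """
    for _ in range(200):
        f_mid = math.sqrt(f_lo * f_hi)
        if loop_gain(f_mid, dc_gain, poles)[0] > 1.0:
            f_lo = f_mid
        else:
            f_hi = f_mid
    return math.sqrt(f_lo * f_hi)


def phase_margin(dc_gain, poles):
    """Return (unity gain frequency, phase margin in deg)."""
    f_u = unity_gain_frequency(dc_gain, poles)
    return f_u, 180.0 + loop_gain(f_u, dc_gain, poles)[1]


if __name__ == "__main__":
    # Hypothetical three-pole loop; values are placeholders, not op-amp data.
    f_u, pm = phase_margin(1e5, [1e2, 1e6, 1e7])
    print("unity gain at %.3g Hz, phase margin %.1f deg" % (f_u, pm))
    print("unstable" if pm <= 0 else "stable")
```

Because the phase is accumulated pole by pole instead of being read back from a wrapped plot, a margin that has gone negative shows up directly as a negative number.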

The third attachment is the stability analysis of the same circuit with AD797. It also shows that the circuit is unstable at 200MHz, though I
observed oscillation at 50MHz.

Finally, I did an estimate of frequency noise contribution from the noise of AD829.
First I estimated the voltage noise at the output of the board caused by the first AD829 using LISO's noise command.
Then I converted it into the input equivalent noise at the stage right after the mixer by calculating the transfer function
of the circuit using LISO.
Within the control bandwidth of the FSS, this input equivalent noise appears at the mixer output with the opposite sign.
Since we know the calibration factor from the mixer output voltage to the frequency noise, we can convert this into the frequency noise.
The final attachment is the estimated contribution of the AD829 to the frequency noise. As expected, it is negligible.
Attachment 1: FSS_PC_Path.pdf
Attachment 2: AD829Stability.png
Attachment 3: AD797Stability.png
Attachment 4: FreqNoiseByAD829.png
  1066   Wed Oct 22 09:42:41 2008   Alberto   DAQ   Computers   c1iool0 rebooted and MC autolocker restarted
This morning I found the MC unlocked. The MC-Down script didn't work because of network problems in communicating with scipe7, a.k.a. c1iool0. Telneting to the computer was also impossible so I power cycled it from its key switch. The first time it failed so I repeated it a second time and then it worked.
Yoichi then restarted c1iovme. It was also necessary to restart the MC autolocker script according to the following procedure:
- ssh into op440m
- from op440m, ssh into op340m
- restart /cvs/cds/caltech/scripts/scripts/MC/autolockMCmain40
  1067   Wed Oct 22 12:37:47 2008   josephb   Update   Computers   Network spreadsheet
Attached, in OpenOffice format as well as Excel format, is a spreadsheet containing all the devices with IP addresses at the 40m. Please contact me with any corrections.
Attachment 1: 40m_network_10-15-08.ods
Attachment 2: 40m_network_10-15-08.xls
  1070   Wed Oct 22 20:50:30 2008   Alberto   Omnistructure   Computers   GPS
Today I measured the GPS clock frequency at the output of CLOCK_MON on a board in the same crate where the c1iool0 computer is located. The monitor was connected with a BNC cable to the 10MHz reference input of the frequency counter on top of that rack, where it was used to check the 166MHz coming from one of the Marconis.

The frequency was supposed to be 10MHz but I actually measured 8 MHz. I traced the GPS input cable to the board and it turned out to come from the 1Y7 rack. There it was connected to a board with a display that was showing corrupted digits, and some LEDs on the front panel were red.

I'm not sure the GPS reference is working properly.
  1076   Thu Oct 23 18:51:19 2008   Alberto   Metaphysics   Computers   eLog
I checked, and the latest version of the elog software, 2.7.5 (we have 2.6.5), has, among other nice new features, the very useful ability to fit entries to the screen width without showing kilometric lines like the ones we see now. Should we upgrade?
  1088   Fri Oct 24 20:54:41 2008   rana   Configuration   Computers   linux2
I have removed linux2 and its cables from the control room and put it into 1Y3 along with op340m.

When Joe next comes in we can ask him to Cat6 it to the rest of the world, although it already
seems to me that the CDS hub/switch next to Alberto's desk is too full and that we need to purchase
a 48 port device for there.
  1098   Tue Oct 28 12:01:01 2008   josephb   Configuration   Computers   linux2

Quote:
I have removed linux2 and its cables from the control room and put it into 1Y3 along with op340m.

When Joe next comes in we can ask him to Cat6 it to the rest of the world, although it already
seems to me that the CDS hub/switch next to Alberto's desk is too full and that we need to purchase
a 48 port device for there.


Note I still need to remove a fair bit of cabling no longer in use from the Martian network switch next to Alberto's desk. There's actually about 8-10 cables there which show no connectivity and are not being used. So there's really about 33% of the ports open in the control room hub, it just doesn't look like it.

As for linux2, I'll probably just connect it to the 1Y2 or 1Y6 Hubs when I get the chance.
  1101   Thu Oct 30 11:07:25 2008   Yoichi   Update   Computers   Wireless bridges arrived
Five wireless bridges for the GPIB-Ethernet converters arrived.
One of them had a broken AC adapter. We have to send it back.
I configured the rest of the bridges for the 40MARS wireless network.
One of them was installed to the SR785.
I put the remaining ones in the top drawer of the cabinet, on which the label printers are sitting.
You can use those to connect any network device with a LAN port to the 40MARS network.
  1119   Thu Nov 6 22:07:56 2008   rana   Configuration   Computers   ELOG compile on Solaris
From the ELOG web pages:

Solaris:

Martin Huber reports that under Solaris 7 the following command line is needed to compile elog:

gcc -L/usr/lib/ -ldl -lresolv -lm -ldl -lnsl -lsocket elogd.c -o elogd

With some combinations of Solaris servers and client-side browsers there have also been problems with ELOG's keep-alive feature. In such a case you need to add the "-k" flag to the elogd command line to turn keep-alives off.
  1126   Mon Nov 10 11:32:49 2008   rob   Update   Computers   c1iscex rebooted

it was running a few cycles late
  1157   Fri Nov 21 21:28:32 2008   rana   Summary   Computers   c0daqawg restart
A few minutes after restarting fb0 for the Guralp channels, the DAQAWG lights went red on the DAQ screens.
Why?? I chose revival procedure #3 for c0daqawg from the Wiki and it came back in a couple minutes.
  1159   Mon Nov 24 16:43:34 2008   rana   Configuration   Computers   Alex and Jay took away some computers from the racks
I was over at Wilson house and saw Jay and Alex bring in 3 rackmount computers. One was a Sun 4600 and
then there were 2 3U black boxes. I got the impression that these were the data concentrators or
data collectors or framebuilder test boxes. They said that they got these from the 40m and no one was
in the lab to oppose them except for Bob and he didn't put up much of a fight.

Everything looks green on the DAQ Detail and RFM network screens so perhaps everything is OK. Beware.
  1170   Wed Dec 3 12:49:11 2008   jenne   Update   Computers   something sketchy with NDS ... or something
Never mind...I had forgotten that you have to run mdv_config every time you open matlab, not just every time you boot a computer.

I am not able to get channels using get_data from the mDV toolbox on Allegra, Megatron or Rosalba.

The error I get while running the "hello_world" test program is:
hello_world
setting up configuration...
added paths for nds
added paths for qscan
couldn't add path for matapps_SDE
couldn't add path for matapps_path
couldn't add path for framecache
couldn't add path for ligotools_matlab
added paths for home_pwd
fetching channels for C...
Warning: get_channel_list() failed.
??? Error using ==> NDS_GetChannels
Failed to get channel list.

Error in ==> fetch_nds at 47
eval(['CONFIG.chl.' server ' = NDS_GetChannels(ab);']);

Error in ==> get_data at 100
out = fetch_nds(channels,dtype,start_time,duration);

Error in ==> hello_world at 6
aa = get_data('C1:LSC-DARM_ERR', 'raw', gps('now - 1 hour'), 32);
  1175   Thu Dec 4 16:29:20 2008   josephb   Configuration   Computers   Error message on Frame Builder Raid Array
The Fibrenetix FX-606-U4 RAID connected to the frame builder in 1Y7 is showing the following error message: IDE Channel #4 Error Reading
  1177   Fri Dec 5 01:41:33 2008 YoichiConfigurationComputersMEDM screen snapshot now works on linux machines
As a part of my "make everything work on linux" project, I modified the 'updatesnap' script so that linux machines can update MEDM screen snapshots.
Now, all 'updatesnap' scripts in the subsystem directories (like medm/c1/lsc/cmd/updatesnap) are sym-links to /cvs/cds/caltech/medm/c1/cmd/updatesnap.
This script takes a window snapshot to a PNG file, and moves the old snapshot to the archive folders with date information added to the filename.
For compatibility, it also saves a JPEG snapshot. Right now, most of the 'view snapshot' menus in MEDM screens call the 'sdtimage' command, which cannot display PNG files. I installed ImageMagick on op440m. We should change the MEDM files to use the 'display' command instead of 'sdtimage' so that they can show PNG files.
I've already changed some MEDM screens, but there are many remaining to be modified.
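
The archiving step can be sketched as follows. This is not the actual 'updatesnap' script, only a minimal Python illustration of the idea; the function name, the timestamp format, and the directory layout are all assumptions.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def archive_old_snapshot(snapshot: Path, archive_dir: Path) -> Optional[Path]:
    """Move an existing snapshot into archive_dir with a date in its filename.

    Returns the archived path, or None if there was no old snapshot to move.
    (Hypothetical helper; names and timestamp format are illustrative.)
    """
    if not snapshot.exists():
        return None
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Use the old file's modification time for the date stamp.
    mtime = time.localtime(snapshot.stat().st_mtime)
    stamp = time.strftime("%Y%m%d_%H%M%S", mtime)
    target = archive_dir / f"{snapshot.stem}_{stamp}{snapshot.suffix}"
    shutil.move(str(snapshot), str(target))
    return target
```

A new snapshot would then be written to the original path after the old one has been moved away.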

PNG is better than JPEG for crisp images like screen shots. JPEG performs a sort of spatial Fourier transformation and low-pass filtering to compress the information. If it is used with sharp edges like the boundaries of buttons on an MEDM screen, it naturally produces ringing artifacts (ghost images) next to the edges.
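
The ghost images next to sharp edges can be reproduced in one dimension. The sketch below is not JPEG itself (JPEG works on 8x8 pixel blocks with a discrete cosine transform and quantization); it only low-pass filters a sharp edge with a plain DFT to show the overshoot and ripples that appear. N and KEEP are arbitrary illustrative values.

```python
import cmath

N = 64     # number of samples along one "scan line"
KEEP = 8   # keep only spatial frequencies with |k| <= KEEP

# A sharp edge: dark (0.0) on the left half, bright (1.0) on the right half.
edge = [0.0] * (N // 2) + [1.0] * (N // 2)


def dft(x):
    """Plain O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft(X):
    """Inverse DFT, returning the real part of each sample."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)).real / n
            for m in range(n)]


spectrum = dft(edge)
# Low-pass filter: zero every coefficient above the cutoff.
# Index k and index N - k carry the same |frequency|.
filtered = [c if min(k, N - k) <= KEEP else 0.0 for k, c in enumerate(spectrum)]
smoothed = idft(filtered)

print(f"original range: {min(edge):.3f} .. {max(edge):.3f}")
print(f"filtered range: {min(smoothed):.3f} .. {max(smoothed):.3f}")
```

The filtered signal goes above 1 and below 0 close to the edge, which is what shows up as faint ghost lines around buttons in a JPEG screen shot.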

I also created several sym-links in the apps/linux/bin directory to mimic the Solaris-only commands, such as 'sdtimage', 'nedit' and 'dtterm'.
For example, nedit is symbolically linked to gedit. Many MEDM buttons/menus, which used to be incompatible with linux, now work fine on the linux machines.
  1181   Fri Dec 5 20:40:38 2008 YoichiHowToComputersElog multi-keyword search
The current Elog search allows you to look for only one keyword in the text.
You cannot search for two keywords by simply separating them with a white space.
That is, a search term "abc def" matches a literal "abc def", not a text containing "abc" and "def".
This is extremely annoying. However, there are still some ways to search for multiple keywords.
The Elog search fields are treated as regular expressions.
In order to match a text containing "abc" and "def", you can use a search term "abc.*def".
A period (.) means "any character", and an asterisk (*) means "any number of repetitions of the preceding character".
Therefore, ".*" matches "any number of any character" i.e. anything.
The search term "abc.*def" works fine when you know "abc" appears first in the text you are looking for.
If you don't know the order of appearance of the keywords, you have two choices: use either
"(abc.*def)|(def.*abc)"
or
"(abc|def).*(abc|def)"
The vertical bar (|) means "or". Parentheses are used for grouping.
The first example does exactly what you want. However, you have to list all the permutations of your keywords
separated by |. If you have more than two keywords, it can be a very, very long search word.
(There are n! permutations of n keywords, each listing all n of them, so the length of the search word grows as n*n!.)
In the second example, the length of the search word grows only as n^2 (n groups, each listing all n keywords). However, it can also match a text containing two "abc".
This means the search results may contain some garbage (entries containing only "abc").
I guess in most cases we can tolerate this.
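
The difference between the three search terms can be checked quickly with Python's re module. This is only an illustration: the test strings are made up, and the Elog's regular expression engine may differ from Python's in details.

```python
import re

entries = [
    "abc appears before def here",
    "def appears before abc here",
    "abc twice: abc, but not the other keyword",
    "neither keyword",
]

ordered = re.compile(r"abc.*def")                       # fixed order only
all_permutations = re.compile(r"(abc.*def)|(def.*abc)")  # first choice
repeated_alternation = re.compile(r"(abc|def).*(abc|def)")  # second choice

# One row per entry: (ordered, all_permutations, repeated_alternation)
results = [
    (
        bool(ordered.search(text)),
        bool(all_permutations.search(text)),
        bool(repeated_alternation.search(text)),
    )
    for text in entries
]

for row, text in zip(results, entries):
    print(row, text)
```

The third entry is the "garbage" case: it contains "abc" twice and no "def", so only the second form matches it.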

To automatically construct a multiple keyword search term for the Elog, I wrote a bash script called elogkeywd,
which is installed on the control room machines.
You can type
elogkeywd keyword1 keyword2 keyword3
to generate a regular expression for searching a text containing "keyword1", "keyword2" and "keyword3".
The generated expression is of the second type shown above. You can then copy-and-paste the result to
the Elog search field.
The script takes any number of keywords. However, there seems to be a limit on the number of characters you can type
into the search field of the Elog. I found the practical limit is about 3 keywords.
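
For reference, constructing both kinds of search term can be sketched in a few lines. This is not the elogkeywd script itself (which is written in bash); it is a Python illustration, and the function names are made up.

```python
import re
from itertools import permutations


def any_order_loose(keywords):
    """Second form: (k1|k2|...).*(k1|k2|...)..., one group per keyword.

    Short, but also matches texts that repeat one keyword n times.
    """
    group = "(" + "|".join(re.escape(k) for k in keywords) + ")"
    return ".*".join([group] * len(keywords))


def any_order_exact(keywords):
    """First form: every ordering of the keywords, joined with |."""
    escaped = [re.escape(k) for k in keywords]
    return "|".join("(" + ".*".join(p) + ")" for p in permutations(escaped))


if __name__ == "__main__":
    words = ["keyword1", "keyword2", "keyword3"]
    print(any_order_loose(words))
    print(len(any_order_loose(words)), len(any_order_exact(words)))
```

Printing the two lengths makes it easy to see how quickly the exact form outgrows the character limit of the search field.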
  1200   Sun Dec 21 14:18:04 2008 YoichiUpdateComputersRFM network bypass box's power supply is dead
I restarted the front-end computers by power cycling them one-by-one.
After issuing startup commands, most of them started normally, at least judging by
the output from telnet/ssh.
However, the status monitors of the FE computers on the EPICS screen are still red.
I noticed that all the LEDs on the VMIC 5594 RFM network bypass box are off.
According to the labels, fb40m, c0daqctrl, c0dcu are connected to the box.
This means (I believe) c1dcuepics cannot access the RFM network. So we have no control over
the FE computers through EPICS.

I pushed the reset button on the box, power cycled it, but nothing changed.
I checked the fuse and it was OK. Then I found that the power supply was dead.
It is a small AC adapter supplying +5VDC with a 5-pin DIN like connector.
We have to find a replacement.
  1201   Mon Dec 22 13:48:22 2008 YoichiUpdateComputersRFM network bypass box's power supply is dead
As a temporary fix, I cut the cable of the power supply and connected it to the Sorensen power supply +5V on the rack.
Now, the RFM bypass box is powered up, but some LEDs are red, which looks like a bad sign.
I restarted all the FE computers, but this time I got errors during the execution of the startup commands in the VxWorks machines.
The errors are "General Protection Fault" or "Invalid Opcode".
The linux machines do not show errors but still the status lights in EPICS are red.
We need Alex's help. He did not answer the phone, so Alberto left a voice mail.
  1202   Tue Dec 23 10:35:40 2008 YoichiUpdateComputersRFM network breakdown mostly fixed
Rana, Rolf, Alberto, Yoichi

The source of the problem was the RFM bypass box, as expected.
Rana pointed out that the long cable I used to bring the 5V from the Sorensen to the box
may cause a large voltage drop considering that the box is sucking ~3A.
So we connected the cable to another power supply (5V/5A linear power supply).
Then the LEDs on the bypass box turned green from red, and everything started to work.

A weird thing is that when I connected the cable to the wrong terminals of the power supply, which
have lower current supply capabilities, the supply voltage dropped to 3V, but the LEDs on the bypass box still
turned green. This means the bypass box can live with 3V.
I noticed that there is a long cable from the Sorensen to the cross connect on the side of the rack, where I
connected my cable to the bypass box. Perhaps this long cable had a somewhat large resistance (1 or 2 Ohms) and dropped
the supply voltage to less than 3V?
Anyway, the bypass box is now on a temporary power supply. Alberto was assigned a task to find a replacement power
supply.
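
The voltage-drop argument is just Ohm's law, and a rough estimate can be written down as below. This is a sketch under simplifying assumptions: a nominal 5 V supply and a constant draw of about 3 A (in reality the current drawn by the box will change with the voltage it sees). The function names are made up.

```python
SUPPLY_V = 5.0   # nominal supply voltage
LOAD_A = 3.0     # approximate current drawn by the bypass box (~3 A)


def voltage_at_box(cable_ohms, supply_v=SUPPLY_V, load_a=LOAD_A):
    """Voltage left at the box after the drop across the cable (V = V0 - I*R)."""
    return supply_v - load_a * cable_ohms


def max_cable_ohms(min_box_v, supply_v=SUPPLY_V, load_a=LOAD_A):
    """Largest total cable resistance that still leaves min_box_v at the box."""
    return (supply_v - min_box_v) / load_a


if __name__ == "__main__":
    print(voltage_at_box(1.0))   # cable resistance of 1 Ohm
    print(max_cable_ohms(3.0))   # resistance budget to keep at least 3 V
```

Under these assumptions, 1 Ohm of cable already costs 3 V at 3 A, and keeping at least 3 V at the box requires a total cable resistance below about 0.67 Ohm.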

There are two remaining problems.
c1susvme1 often fails to start, claiming a DMA error on a Pentek. After several attempts, you can start the machine,
but after a while (1 hour ?) it fails again.
op340m is not responding to ssh login. It responds to ping.
We hooked up a monitor and keyboard (USB because the machine does not have a PS/2 port) to it and rebooted.
At boot, it briefly displays a message "No keyboard, try TTYa", but after that there is no display signal.
Steve found me a serial cable. I will try to log in to the machine using the serial port.

  1203   Wed Dec 24 10:33:24 2008 YoichiUpdateComputersSeveral fixes. Test point problem remains.
Yesterday, I fixed several remaining problems from the power failure.

I found that a LEMO cable connecting the timing board to the Penteks was loose on the c1susvme1 crate.
After I pushed it in, the DMA error has not occurred on c1susvme1.

I logged into op340m using a Null Modem Cable.
The computer was failing to boot because the automatic fsck found disk errors that it could not recover.
I ran fsck manually and corrected some errors. After that, op340m booted normally and now it is working fine.
Here are the serial communication parameters I used to communicate with op340m:
>kermit      (I used kermit command for serial communication.)
>set modem type none
>set line /dev/ttyS0     (ttyS0 should be the device name of your serial port)
>set speed 9600
>set parity none
>set stop-bits 1
>set flow-control none
>connect

After fixing op340m, the MC locked.
Then I reset the HV amps. for the steering PZTs.
Somehow, the PZT1 PIT did not work. But after moving the slider back and forth several times, it started to work.

I reset the mechanical shutters around the lab.

I went ahead to align the mirrors. The X-arm locked but the alignment script did not improve
the arm power.
I found that test points are not available. (diag said test point management not available).
It looks like the test point manager is not running. I called Rolf, but could not reach him.
I'm not even sure which machine the tp manager is supposed to run on.
Is it c0daqawg ?
  1204   Wed Dec 24 12:46:54 2008 YoichiUpdateComputersTest points are back
Rob told me how to restart the test point manager.
It runs on fb40m and actually there is an instruction on how to do that in the Wiki.
http://lhocds.ligo-wa.caltech.edu:8000/40m/Computer_Restart_Procedures#fb40m

I couldn't find the page because when I put a keyword in the search box on the upper right
corner of the Wiki page and hit "enter", it only searches for titles. To do a full text
search, you have to click on the "Text" button.

Anyway, now the test points are back.
  1206   Mon Dec 29 21:38:57 2008 YoichiUpdateComputersSnapshots of MEDM screens
I wrote scripts to take snapshots of MEDM screens in the background.
These scripts work even on a computer without a physical display attached.
You don't need to have X running.
So now the scripts run on nodus every 5 minutes from cron.
The screen shots are saved in /cvs/cds/caltech/statScreen/images/

There is a wiki page for the scripts.
http://lhocds.ligo-wa.caltech.edu:8000/40m/captureScreen.sh

Someone has to make a nice web page summarizing the captured images.
  1207   Mon Dec 29 21:51:02 2008 YoichiConfigurationComputersWeb server on nodus
Until now, the apache on nodus has been serving only the svn web access.
I changed the configuration so that all files under /cvs/cds/caltech/users/public_html/ can be seen under
https://nodus.ligo.caltech.edu:30889/

The page is not password protected, but you can add protection by putting an appropriate .htaccess
in your directory.
For the standard LVC password, put the following in your .htaccess
AuthType Basic
AuthName "LVC password"
AuthUserFile /cvs/cds/caltech/apache/etc/LVC.auth
Require valid-user
  1221   Fri Jan 9 17:30:10 2009 KakeruUpdateComputersSnapshots of MEDM screens
I wrote a web page which shows snapshots of MEDM screens generated by Yoichi's script (e-log #1206).
https://nodus.ligo.caltech.edu:30889/medm/screenshot.html
This page refreshes itself every 5 minutes automatically.

The .html file is generated by /cvs/cds/caltech/statScreen/bin/genHtml.pl
This script runs from cron every 5 minutes and generates the .html file containing the snapshots listed in /cvs/cds/caltech/statScreen/etc/medmScreens.txt.
When you want to display other screens, please edit this .txt file and wait 5 minutes!
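
The page generation can be sketched roughly as follows. This is not genHtml.pl (which is a Perl script); it is a Python illustration, and the function name, the image file naming, and the idea that the list holds one screen name per entry are all assumptions.

```python
from html import escape


def gen_html(screen_names, image_dir="images", refresh_seconds=300):
    """Return an HTML page with one snapshot image per screen name.

    The meta refresh tag makes the browser reload the page by itself
    (300 s = 5 minutes). All names and paths here are hypothetical.
    """
    rows = "\n".join(
        f'<p>{escape(name)}<br>'
        f'<img src="{escape(image_dir)}/{escape(name)}.png" alt="{escape(name)}"></p>'
        for name in screen_names
    )
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{refresh_seconds}">'
        "<title>MEDM screen snapshots</title></head>\n"
        f"<body>\n{rows}\n</body></html>\n"
    )


if __name__ == "__main__":
    print(gen_html(["C1LSC", "C1SUS"]))
```

Adding a screen then only means adding a name to the list that is passed in.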


To make thumbnails, I wrote /cvs/cds/caltech/statScreen/bin/genThumbnail.pl
This script reads /cvs/cds/caltech/statScreen/etc/medmScreens.txt, too.
(Sometimes, the thumbnails it makes take up more storage than the original images...)


Quote:
I wrote scripts to take snapshots of MEDM screens in the background.
These scripts work even on a computer without a physical display attached.
You don't need to have X running.
So now the scripts run on nodus every 5 minutes from cron.
The screen shots are saved in /cvs/cds/caltech/statScreen/images/

There is a wiki page for the scripts.
http://lhocds.ligo-wa.caltech.edu:8000/40m/captureScreen.sh

Someone has to make a nice web page summarizing the captured images.