ID   Date   Author   Type   Category   Subject
  2891   Thu May 6 19:23:54 2010   Frank   Summary   Computers   svn problems

I tried to commit something this afternoon and got the following error message:

Command: Commit 
Adding: C:\Caltech\Documents\40m-svn\nodus\frank 
Error: Commit failed (details follow): 
Error: Server sent unexpected return value (405 Method Not Allowed) in response to  
Error: MKCOL request for '/svn/!svn/wrk/d2523f8e-eda2-d847-b8e5-59c020170cec/trunk/frank' 
Finished!:  

Has anyone seen this before? What's wrong?

  2971   Fri May 21 16:41:38 2010   Alberto, Jo   Update   Computers   It's a boy!

Today the new Dell computer for the GSCS (General SURF Computing Side) arrived.

We put it together and hooked it up to a monitor. And guess what? It works!

I'm totally impressed by how the Windows get blurred on Windows 7 when you move them around. Good job Microsoft! Totally worth 5 years of R&D.

  2998   Thu May 27 08:22:57 2010   Aidan   Update   Computers   Restarted the elog this morning
  3061   Wed Jun 9 21:05:44 2010   rana   Summary   Computers   op540m is not to be used

This is a reminder (mainly for Steve, who somehow doesn't believe these things) that op540m is not to be used for your general pleasure.

No web, no dataviewer, no DTT. Using these things often makes the graphical X-Windows crash. I have had to restart the StripTool, our seismic BLRMS and our Alarms many times because someone uses op540m, makes it crash, and then does not restart the processes.

Stop breaking op540m, Steve!

  3080   Wed Jun 16 11:31:19 2010   josephb   Summary   Computers   Removed scaling fonts from medm on Allegra

Because it was driving me crazy while working on the new medm screens for the simulated plant, I went and removed the aliased font entries in /usr/share/X11/fonts/misc/fonts.alias that are associated with medm.  Specifically I removed the lines  starting with widgetDM_.  I made a backup in the same directory called fonts.alias.bak with the old lines.

Medm now behaves the same on op440m, rosalba, and allegra - i.e. it can't find the widgetDM_ scalable fonts and defaults to a legible fixed font.
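
For anyone repeating this on another workstation, here is a minimal Python sketch of the same edit. It is not the procedure used above (that was done by hand); the function name and the sample font names are made up for illustration, and only the widgetDM_ prefix and the idea of keeping the removed lines in a backup come from this entry.

```python
def strip_font_aliases(text, prefix="widgetDM_"):
    """Split fonts.alias text into (kept, removed) lines by alias-name prefix."""
    kept, removed = [], []
    for line in text.splitlines():
        # Each alias line is "<alias-name> <real-font-name>"; test the alias name only.
        if line.lstrip().startswith(prefix):
            removed.append(line)
        else:
            kept.append(line)
    return kept, removed


# Hypothetical file contents; the real entries on allegra may look different.
sample = """\
! comment line
fixed        -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso8859-1
widgetDM_10  -misc-fixed-medium-r-normal--10-100-75-75-c-60-iso8859-1
widgetDM_12  -misc-fixed-medium-r-normal--13-120-75-75-c-70-iso8859-1
"""

kept, removed = strip_font_aliases(sample)
print(len(kept), "lines kept;", len(removed), "lines would go to the backup file")
```

In real use one would write the original text to fonts.alias.bak first, then write the kept lines back as root.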

  3081   Wed Jun 16 18:12:16 2010   nancy   Configuration   Computers   40MARS

I added my laptop's MAC address to the martian network at port 13 today.

 

  3083   Wed Jun 16 18:44:07 2010   Alberto   Configuration   Computers   40MARS

Quote:

I added my laptop's MAC address to the martian network at port 13 today.

 

No personal laptops are allowed on the martian network. Only access to the General Computing Side is permitted.

Please disconnect it.

  3106   Wed Jun 23 15:15:53 2010   josephb   Summary   Computers   40m computer security issue from last night and this morning

The following is not 100% accurate, but represents my understanding of the events currently.  I'm trying to get a full description from Christian and will hopefully be able to update this information later today.

 

Last night around 7:30 pm, Caltech detected evidence of a computer virus located behind a Linksys router with a MAC address matching our NAT router, at the IP 131.215.114.177. We did not initially recognize the MAC address as the router's because the labeled MAC address was off by a digit, so we were looking for another old router for a while. In addition, pings to 131.215.114.177 were not working from inside or outside of the martian network, but the router was clearly working.

However, about 5 minutes after Christian and Mike left, I found I could ping the address.  When I placed the address into a web browser, the address brought us to the control interface for our NAT router (but only from the martian side, from the outside world it wasn't possible to reach it).

They turned on logging on the router (which had been off by default) and started monitoring the traffic for a short time. Some unusual IP addresses showed up, and Mike mentioned a warning coming up about someone attempting IP spoofing. Something about a file sharing port showing up was briefly mentioned as well.

The outside IP address was changed to 131.215.115.189, and DHCP, which apparently was on, was turned off. The password was changed and is in the usual place we keep router passwords.

Update: Christian said Mike has written up a security report and that he'll talk to him tomorrow and forward the relevant information to me. He notes there is possibly an infected laptop/workstation still at large. This could also be a personal laptop that was accidentally connected to the martian network. Since the router was found to be set to DHCP, it's possible a laptop was connected to the wrong side and the user might not have realized this.

 

  3115   Thu Jun 24 13:02:59 2010   Jenne   Update   Computers   Some lunchtime reboots

[Jenne, Megan, Frank]

We rebooted c1iovme, c1susvme1, and c1susvme2 during lunch.  Frank is going to write a thrilling elog about why c1iovme needed some attention.

C1susvme 1&2 have had their overflow numbers on the DAQ_RFMnetwork screen red at 16384 for the past few days.  While we were booting computers anyway, we booted the suses.  Unfortunately, they're still red.  I'm too hungry right now to deal with it....more to follow.

  3120   Fri Jun 25 12:09:27 2010   kiwamu   Update   Computers   GPIB controller of HP8591E

I've just stolen a GPIB controller, a small yellow box, from the spectrum analyzer HP8591E.

The controller is going to be used for driving the old spectrum analyzer HP3563A for a while.

Gopal and I will be developing and testing GPIB program code for the HP3563A via the controller.

Once we get a new GPIB controller, it will go back to its original place, i.e. the HP8591E.

 

--- GPIB controller ----

name: teofila

address: 131.215.113.106

  3159   Tue Jul 6 17:05:30 2010   Megan and Joe   Update   Computers   c1iovme reboot

We rebooted c1iovme because the lines stopped responding to inputs on C1:IOO-MC_DRUM1. This fixed the problem.

  3172   Wed Jul 7 22:22:49 2010   Jenne   Update   Computers   Some channels not being recorded!!!

[Rana, Jenne]

We discovered to our great dismay that several important channels (namely C1:IOO-MC_L, but also everything on c1susvme2) are not being recorded, and haven't been since May 17th.  This corresponds to the same day that some other upgrade computers were installed.  Coincidence?

We've rebooted pretty much every FE computer and the FrameBuilder and DAQ_CONTROL approximately 18 times each (plus or minus some number).  No matter what we do, or what channels we comment out of the C1SUS2.ini file, we get a Status on the DAQ_Detail screen for c1susvme2 of 0x1000.  Except sometimes it is 0x2000.  Anyhow, it's bad, and we can't make it good again. 

I have emailed Joe about fixing this (with some assistance from Alberto, since we all know how much he likes doing the Nuclear Reboot option for the computers :)

  3176   Thu Jul 8 14:11:16 2010   josephb   Update   Computers   Some channels not being recorded!!!

This has been fixed, thanks to some help from Alex. It doesn't correspond to new computers being put in, but rather corresponds to a dcu_id change I had made in the new LSC model.

The fundamental problem goes way back: when I built the new LSC model using "lsc" as the name instead of something like "tst", I forgot to go to the current frame builder master file (/cvs/cds/caltech/chans/daq/master) and comment out the C1LSC.ini line. Initially there was no conflict with c1susvme, because the new model's dcu_id was initially 13. The dcu_id was eventually changed from 13 to 10, and that's when it conflicted with the c1susvme2 dcu_id, which was also 10. I checked it against wiki edits to my dcu_id list page, and I apparently updated the list on May 20th when it changed from 13 to 10, so the time frame fits. Apparently it was previously conflicting with C0GDS.ini or C1EXC.ini, which both seem to have dcu_id = 13 set, although the C1EXC file is all commented out. The C0GDS.ini file seems to be LSC and ASC test points only.

The solution was to comment out the C1LSC.ini file line in the /cvs/cds/caltech/chans/daq/master file and restart the framebuilder with the fixed file.
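
A check like the following could catch this kind of clash before the framebuilder is restarted. This is only a rough Python sketch under stated assumptions: that the master file lists one .ini path per line with "#" marking a commented-out line, and that each .ini file sets its id on a line like "dcuid=10" or "dcu_id = 10". Neither the exact key spelling nor the helper names below are confirmed by this entry.

```python
import re
from collections import defaultdict

# Assumed key spelling: matches "dcuid=10" and "dcu_id = 10" at the start of a line.
DCU_RE = re.compile(r"^\s*dcu_?id\s*=\s*(\d+)", re.IGNORECASE | re.MULTILINE)


def active_ini_files(master_text):
    """List the .ini entries in a master file, skipping commented-out lines."""
    files = []
    for line in master_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and line.endswith(".ini"):
            files.append(line)
    return files


def dcu_conflicts(master_text, ini_contents):
    """Map each dcu_id claimed by more than one active file to those files."""
    owners = defaultdict(list)
    for path in active_ini_files(master_text):
        match = DCU_RE.search(ini_contents.get(path, ""))
        if match:
            owners[int(match.group(1))].append(path)
    return {dcu: paths for dcu, paths in owners.items() if len(paths) > 1}


# Toy stand-ins for the real files (bare file names, hypothetical contents).
master = "C1SUS2.ini\nC1LSC.ini\nC0GDS.ini\nC1EXC.ini\n"
inis = {
    "C1SUS2.ini": "[default]\ndcuid=10\n",
    "C1LSC.ini": "[default]\ndcuid=10\n",
    "C0GDS.ini": "[default]\ndcuid=13\n",
    "C1EXC.ini": "#[default]\n#dcuid=13\n",  # everything commented out
}
fixed_master = master.replace("C1LSC.ini", "#C1LSC.ini")

print("before:", dcu_conflicts(master, inis))
print("after: ", dcu_conflicts(fixed_master, inis))
```

With the C1LSC.ini line commented out in the master file, the toy example reports no remaining conflict, which mirrors the fix described above.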

Quote:

[Rana, Jenne]

We discovered to our great dismay that several important channels (namely C1:IOO-MC_L, but also everything on c1susvme2) are not being recorded, and haven't been since May 17th.  This corresponds to the same day that some other upgrade computers were installed.  Coincidence?

We've rebooted pretty much every FE computer and the FrameBuilder and DAQ_CONTROL approximately 18 times each (plus or minus some number).  No matter what we do, or what channels we comment out of the C1SUS2.ini file, we get a Status on the DAQ_Detail screen for c1susvme2 of 0x1000.  Except sometimes it is 0x2000.  Anyhow, it's bad, and we can't make it good again. 

I have emailed Joe about fixing this (with some assistance from Alberto, since we all know how much he likes doing the Nuclear Reboot option for the computers :)

 

  3178   Thu Jul 8 15:19:27 2010   josephb, koji   Configuration   Computers   Added Zonet camera to IP table on linux1

We gave the Zonet camera the IP 192.168.113.26 and the name Zonet1.

We did this by modifying the /var/named/chroot/var/named/113.168.192.in-addr.arpa.zone and martian.zone files on linux1 as root.
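
For reference, the two zone files need a matching forward (A) and reverse (PTR) record. The Python sketch below only illustrates what such a pair looks like; the domain name "martian" and the record layout are assumptions based on the file names, not a copy of what is actually in the files on linux1.

```python
import ipaddress


def zone_records(hostname, ip, domain="martian"):
    """Return (forward A line, reverse PTR line, full PTR owner name)."""
    addr = ipaddress.ip_address(ip)
    forward = f"{hostname}\tIN\tA\t{addr}"
    # Inside the 113.168.192.in-addr.arpa zone only the last octet is written.
    last_octet = str(addr).rsplit(".", 1)[1]
    reverse = f"{last_octet}\tIN\tPTR\t{hostname}.{domain}."
    return forward, reverse, addr.reverse_pointer


forward, reverse, ptr_name = zone_records("Zonet1", "192.168.113.26")
print(forward)   # line for martian.zone
print(reverse)   # line for the reverse zone file
print(ptr_name)  # fully qualified reverse name
```

After editing real zone files, the zone serial number normally has to be bumped and named reloaded before the new name resolves.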

  3179   Thu Jul 8 15:43:58 2010   rana   Update   Computers   Some channels not being recorded!!!

Quote:

This has been fixed, thanks to some help from Alex. It doesn't correspond to new computers being put in, but rather corresponds to a dcu_id change I had made in the new LSC model.

Just as I expected, since these hunuman didn't actually check MC_L after doing this stuff, MC_L was only recording ZERO. Joe and I reset and restarted c1susvme2 and then verified (for real this time) that the channel was visible in both the Dataviewer real time display as well as in the trend.

3-monkeys-ComicPosition.gif

The lesson here is that you NEVER trust that the problem has been fixed until you check for yourself. Also, we must always specify a very precise test that must be used when we ask for help debugging some complicated software problem.
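
As one example of what a "very precise test" could look like, here is a small Python sketch that classifies a stretch of recorded samples. It is purely illustrative: how the samples are fetched from the frames is left out, and the function name and categories are invented for this note.

```python
def channel_health(samples, tol=0.0):
    """Classify recorded data as 'missing', 'zero', 'flat', or 'ok'."""
    samples = list(samples)
    if not samples:
        return "missing"          # nothing was recorded at all
    if all(abs(x) <= tol for x in samples):
        return "zero"             # recorded, but only zeros (the MC_L case)
    if max(samples) - min(samples) <= tol:
        return "flat"             # stuck at a constant non-zero value
    return "ok"


print(channel_health([0.0, 0.0, 0.0]))
print(channel_health([0.12, -0.40, 0.31]))
```

A channel showing up in the channel list is not enough to pass; only "ok" on real data should count as fixed.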

 

  3185   Fri Jul 9 11:09:14 2010   josephb   Update   Computers   Fb40m and a few other machines turned off briefly just before 11am

I turned off fb40m2 and fb40m temporarily while we added an extra power strip to the (new) 1X6 rack at the bottom in the back. This is to allow for the addition of the 4600 computer given to us by Rolf (which needs a good name) into the rack above the fb machine. The fb40m2 was unfortunately plugged into the main power connectors, so we unplugged two of its cables and put them into the new strip. While trying to undo some of the rat's nest of cables in the back, I also briefly powered down and unplugged the c0dcu1, the pem crate, and the myrinet bypass box.

I am in the process of bringing those machines back up and restoring the network.

Also this morning, Megatron was moved from the end station into the (new) 1X3 rack, along with its router.  This is to allow for the installation of the new end computer and IO chassis.

 

  3226   Thu Jul 15 11:58:50 2010   josephb   Update   Computers   Added channel to ADCU_PEM (C0DCU1)

I modified the C1ADCU_PEM.ini file in /cvs/cds/caltech/chans/daq/ (after making a backup), and added a temporary channel called C1:PEM-TEMP_9, the 9 corresponding to the channel labeled 9 on the front of the BNC breakout in the 1Y7 rack. The chnnum it was set to is 15008 (it was commented out and called C1:PEM-PETER_FE). I also set the data rate to 2048.

I then ran telnet fb40m 8087 and issued shutdown, and also hit the blue reconfig button on the DAQ status screen for the C0DCU1 machine. The framebuilder came back up. I confirmed that the temporary channel, as well as the Guralp channels, were still working from C0DCU1.
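
To double-check an edit like this before restarting the framebuilder, the stanza can be read back programmatically. The Python sketch below assumes the .ini file uses one bracketed section per channel with "chnnum" and "datarate" keys; that layout is my reading of this entry, not a verified description of the file, and the real file likely carries more keys than shown.

```python
import configparser

# Hypothetical minimal stanza built from the values quoted in this entry.
STANZA = """\
[C1:PEM-TEMP_9]
chnnum=15008
datarate=2048
"""


def read_channels(ini_text):
    """Return {channel name: {'chnnum': int, 'datarate': int}} from ini text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(ini_text)
    channels = {}
    for name, section in parser.items():
        if name == "DEFAULT":  # configparser's built-in pseudo-section
            continue
        channels[name] = {
            "chnnum": int(section["chnnum"]),
            "datarate": int(section["datarate"]),
        }
    return channels


channels = read_channels(STANZA)
print(channels)
```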

We have strung a cable in the cable trays from the SP table to the 1Y7 rack, which has been labeled as "Phasecam PD".  This will be used to record the output of an additional photodiode.

 

  3237   Fri Jul 16 15:57:19 2010   josephb, kiwamu   Update   Computers   New X end FE and IO chassis work

We finished setting up the new X end front end machine (still temporarily called c1scx), and attached it to its IO chassis. We're preparing for a test tomorrow, where we redirect the Lemo breakout box to the new front end and IO chassis, so Kiwamu can test getting some green locking channels into his controls model.

We strung a pair of blue fibers from the timing master to the new X end (and labeled them), so we have a timing signal for the IO chassis.  I also labeled the orange fiber Alex had repurposed from the RFM to timing for the new Y end when I noticed he had not actually labelled it at the timing master.

  3238   Fri Jul 16 16:07:14 2010   josephb   Update   Computers   Possible solution for the last ADC

After talking with Jenne, I realized the ADC card in the c1ass machine was currently going unused. As we are short an ADC card, a possible solution is to press that card into service. Unfortunately, it's currently on a PMC to PCI adapter, rather than a PMC to PCIe adapter. The two options I have are to try to find a different adapter board (I was handed 3 for RFM cards, so it's possible there's another spare over in Downs; unfortunately I missed Jay when I went over at 2:30 to check), or to put the card directly into a computer. The only candidate for that is megatron, as the other machines don't have a full-length PCI slot.

I'm still waiting to hear back from Alex (who is in Germany for the next 10 days) whether I can connect both in the computer as well as with the IO chassis.

So to that end, I briefly turned off the c1ass machine, and pulled the card.  I then turned it back on, restarted all the code as per the wiki instructions, and had Jenne go over how it looked with me, to make sure everything was ok.

There is something odd with some of the channels reading 1e20 from the RFM network. I believe this is related to those particular channels not being refreshed by their source (which is other suspension front end machines), so it's just sitting at a default until the channel value actually changes.

 

 

  3239   Fri Jul 16 16:12:31 2010   Alberto   Configuration   Computers   c1susvme1/2 rebooted

Today I noticed that the FE SYNC counters of c1susvme1/2 on the RFM network screen were stuck at 16384. I tried to reboot the machines to fix the problem but it didn't work.

The BS watchdog tripped off when I did that, because I had forgotten to disable it. I had to wait for a few minutes before it settled down again.

Later I also re-locked the mode cleaner. But before I could do it, Rana had to reduce the MC_L offset for me.

  3257   Wed Jul 21 12:20:29 2010   josephb, kiwamu   Update   Computers   Megatron temporarily disconnected, c1iscex firewalled, green FE test

We are moving towards a first test of getting Kiwamu's green locking signals into the new front end at the new X end, as well as sending signal out to the green laser temperature control.

Towards that end, we borrowed the router which we were using as a firewall for megatron. At the moment, megatron is not connected to the network. The router (a Linksys wireless-N router) was moved to the new X end and set up to act as a firewall for the c1iscex machine.

At this point, we need to figure out which channels of the DAC correspond to which outputs of the anti-imaging board (D000186) and coil driver outputs. Ideally, we'd like to simply take a spare output from that board and bring it to the laser temperature control. The watchdogs will be disabled when testing to avoid any unfortunate mis-sent signals to the coils. It looks like channels 6, 7, 8 should be free, although I'm not positive whether that's the correct mapping or if there's an n*8 + 6,7,8 mapping.
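
To make the two hypotheses concrete, here is a tiny Python sketch that lists the candidate free channels under each one. It assumes a 16-channel DAC with channels numbered from 1; both of those are guesses for illustration and still need to be checked against the hardware.

```python
def candidate_channels(n_channels=16, block=8, offsets=(6, 7, 8), repeat=False):
    """List candidate free DAC channels (1-based) for the two mapping guesses.

    repeat=False: only channels 6, 7, 8 are free.
    repeat=True:  the pattern repeats per block, i.e. n*block + 6, 7, 8.
    """
    blocks = range(n_channels // block) if repeat else range(1)
    found = {n * block + off for n in blocks for off in offsets}
    return sorted(ch for ch in found if ch <= n_channels)


print("single set:", candidate_channels())
print("repeating: ", candidate_channels(repeat=True))
```

Driving each candidate in turn with a small test signal (watchdogs disabled, as noted above) would settle which mapping is real.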

The ADC should be much easier to determine,  since we only have a single 16 channel set coming from the lemo breakout box.  Once we've determined channels, we should be all set to do a test with the green system.

  3308   Wed Jul 28 12:53:32 2010   channa   Update   Computers   nds data listener

For the sake of writing it down: /cvs/cds/caltech/apps/linux64/rockNDS

  3310   Wed Jul 28 14:34:29 2010   channa   Update   Computers   installation on allegra

I have done the following on allegra and rosalba:

[root@allegra caltech]# yum install glade2

On rosalba, matplotlib was out of date with respect to allegra. I have no idea how version 0.98 got onto allegra, but I left it. However, I updated rosalba to the epel version:

  1. yum remove python-numpy
  2. yum install python-matplotlib numpy scipy --enablerepo=epel --disablerepo=rpmforge

 

This is all to support the LIGO data listener, which now has a shortcut on rosalba's and allegra's desktops. It seems to work (in live mode) right now.
 

 

  3419   Fri Aug 13 09:41:00 2010   nancy   Omnistructure   Computers   Charger for dell laptop

I have taken the charger for the dark gray Dell laptop from its station, and have left a label there saying so.

I will put it back tonight.

  3433   Wed Aug 18 12:02:29 2010   Alastair   Configuration   Computers   elog had crashed again...

...I restarted it.

  3441   Thu Aug 19 09:52:51 2010   Alastair   Update   Computers   Elog down

 I restarted it using start-elog-nodus and this worked out fine - even though I did it from Pete's on my phone ;-)

  3442   Thu Aug 19 11:38:48 2010   Alastair   Update   Computers   ATF wiki

The ATF wiki page doesn't seem to be working any more.  Does anyone know where this is held so we can try to get it back online?  Thanks

  3443   Thu Aug 19 12:06:07 2010   Alastair   Update   Computers   ATF wiki

Quote:

The ATF wiki page doesn't seem to be working any more.  Does anyone know where this is held so we can try to get it back online?  Thanks

 I phoned Phil Ehrens and found out that all these wikis have been moved to a new wiki site

The ATF wiki can now be found here

I have updated the link from the 40m wiki to reflect this

  3476   Fri Aug 27 11:24:13 2010   Jenne   Omnistructure   Computers   op540m dead

I think op540m has finally bitten the dust. I noticed that both of its screens were black, so I assumed that it had crashed due to known graphics card issues or something. But upon closer inspection, it is way more dead than that. I checked that it does have power (at least the power cable is securely plugged in at both ends, and the power strip it's on is successfully powering several other computers), but I can't make any lights or anything come on by pressing the power button on the front of the computer tower.

Immediate consequences of op540 not being operational are the lack of DMT, and the lack of Alarms. 

Joe is doing an autopsy right now to see if it's really dead, or only 'mostly dead'.

EDIT: Joe says maybe it's the power supply for the computer.  But he can't turn it on either.

  3485   Sun Aug 29 21:18:00 2010   rana   Update   Computers   kallo -> rossa

We changed the name of the new control room computer from kallo to rossa (since it's red).

I also tried to install the nVidia graphics driver, but failed. I downloaded the one for the GeForce 310 for x86_64 from the nVidia website and installed it, but then X windows wouldn't start. I've left it running a basic VESA driver.

Kiwamu updated the host tables to reflect the name change. We found that both rossa and allegra were set up to look at the old 131.* DNS computers, so they were not resolving correctly. We set them up the new way.

  3486   Mon Aug 30 11:41:34 2010   kiwamu   Update   Computers   disable sendmail and isdn

 {Rana and Kiwamu}

Yesterday we disabled the sendmail daemon and the isdn daemon on allegra because we don't need these daemons always running.

 

-   How to disable/enable daemons:

sudo ntsysv

  3517   Thu Sep 2 21:22:31 2010   Sanjit, Koji   Configuration   Computers   rossa nvidia driver and dual monitor configuration

 

Simple steps (but don't try these on a working computer without getting some experience on a spare one, you may find it difficult to restore the system if something goes wrong):

  1. download the appropriate driver from NVIDIA website for this computer
    • we did: NVIDIA GeForce 310 64bit Linux, version: 256.53, release date 2010.08.31
  2. keep/move the driver in /root (use "sudo" or "su")
  3. reboot the computer in "single user" mode
    • in the GRUB screen edit the boot command by pressing the appropriate key listed in the screen
    • in the boot command line, add " single" at the end (no other change is normally needed); don't save
    • press ENTER and the system will reboot to a root shell (# prompt)
  4. cd /root
  5. run the NVIDIA driver script
  6. exit the shell (ctrl-d), let the system reboot
    • it should flash a (mostly green) "nvidia" screen before starting X
    • in case of problems run system-config-display and revert to vesa driver
  7. login as "root" and run "nvidia-settings" from command line or GUI menu to add/configure display

 

  3518   Thu Sep 2 23:35:33 2010   rana   Configuration   Computers   rossa nvidia driver and dual monitor configuration

Why are we running CentOS 4.8 instead of 5.5 ?    What runs at LLO?     What runs in Downs?

  3521   Fri Sep 3 11:23:16 2010   josephb   Configuration   Computers   rossa nvidia driver and dual monitor configuration

At LLO the machines are running CentOS 5.5. A quick login confirms this. Specifically, the kernel release is 2.6.18-194.3.1.el5.

Quote:

Why are we running CentOS 4.8 instead of 5.5 ?    What runs at LLO?     What runs in Downs?

 

  3524   Sun Sep 5 21:35:41 2010   rana   Configuration   Computers   rossa notes

**** Deleted
apps/emacs
apps/linux64/firefoxold
apps/linux64/comsol      (old v. 3.5)

* running up2date on rossa

* rossa needs to be able to move windows between monitors: Xinerama?

* there are permissions problems: controls on rossa can't
make and delete directories made by 'controls' elsewhere.
Some sort of user# or group issue?

  3526   Mon Sep 6 10:08:10 2010   Alberto   Configuration   Computers   Netgear Network Switch fan broken.

The Netgear Network Switch in the top shelf of Nodus' rack has a broken fan. It is the one interfaced to the Martian network.

The fan must have broken and it has now started to produce a loud noise. It's like a truck was parked in the room with the engine running.

Also the other network switch, just below the Netgear, has one of its two fans broken. It is the one interfaced with the General Computer Side.

I tried to knock them to make the noise stop, but nothing happened.

We should consider trying to fix them. Although that would mean disconnecting all the computers.

  3531   Tue Sep 7 10:50:53 2010   josephb   Configuration   Computers   rossa notes

The controls account gets user ID 500 by default on most new machines. Unfortunately, the user ID used across the already existing machines is 1001. One method of doing this switch is in this elog. You can also change the controls ID by becoming root and using the graphical command system-config-users. This will let you change the user ID and group ID for controls to 1001. This graphical interface also lets you change the login shell.

Unfortunately, I had some minor difficulty and I ended up removing the old controls and creating a new controls account with the correct values and using tcsh.  The .cshrc file has been recreated to source cshrc.40m.  The controls account now has correct permissions, although some of the preferences such as background will need to be reset.
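
A quick way to confirm that a machine's controls account has the expected IDs is sketched below in Python. The 1001 values come from this entry; the function name, the injectable lookup argument, and the fake "fresh install" account (with gid 500 assumed to match the default uid) are only there for illustration.

```python
import pwd


def id_mismatch(user="controls", expected_uid=1001, expected_gid=1001,
                lookup=pwd.getpwnam):
    """Return a list of problems; an empty list means the account matches."""
    try:
        entry = lookup(user)
    except KeyError:
        return [f"no such account: {user}"]
    problems = []
    if entry.pw_uid != expected_uid:
        problems.append(f"uid is {entry.pw_uid}, expected {expected_uid}")
    if entry.pw_gid != expected_gid:
        problems.append(f"gid is {entry.pw_gid}, expected {expected_gid}")
    return problems


def fresh_install(name):
    """Fake lookup imitating a new machine where the account got uid/gid 500."""
    return pwd.struct_passwd((name, "x", 500, 500, "", "/home/" + name, "/bin/tcsh"))


print(id_mismatch(lookup=fresh_install))
print(id_mismatch() or "controls matches 1001/1001 on this machine")
```

Files on the NFS mount are owned by number, not by name, which is why a "controls" account with the wrong numeric ID cannot modify directories created by "controls" elsewhere.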

 

 

Quote:

**** Deleted
apps/emacs
apps/linux64/firefoxold
apps/linux64/comsol      (old v. 3.5)

* running up2date on rossa

* rossa needs to be able to move windows between monitors: Xinerama?

* there are permissions problems: controls on rossa can't
make and delete directories made by 'controls' elsewhere.
Some sort of user# or group issue?

 

  3535   Tue Sep 7 15:57:07 2010   Alberto   Configuration   Computers   Nodus connection not working. Fixed

[Joe, Alberto]

The Nodus connection to the Martian network stopped working after someone switched cables on the Netgear router. Apparently that router doesn't like to have the 23 and 24 ports connected at the same time.

Joe fixed the connection by freeing either the 23 or the 24 port.

  3537   Tue Sep 7 22:21:17 2010   Alberto   Configuration   Computers   elog restarted
  3539   Tue Sep 7 23:17:45 2010   sanjit   Configuration   Computers   rossa notes

Quote:

* rossa needs to be able to move windows between monitors: Xinerama?

 Xinerama support has been enabled on rossa using nvidia-settings.

  3540   Tue Sep 7 23:34:15 2010   Kiwamu, Sanjit   Configuration   Computers   e-log

The e-log was repeatedly hanging, and several attempts to start the daemon failed.

The problem was solved after clearing the (Firefox) browser cache, cookies, everything!!

 

  3541   Tue Sep 7 23:49:08 2010   sanjit   Configuration   Computers   aldabella network configuration

 

added name server 192.168.113.20 as the first entry in /etc/resolv.conf

changed the host IPs in /etc/hosts to 192.168.xxx.yyy

made:

127.0.0.1 localhost.localdomain localhost

::1 localhost6.localdomain6 localhost6

as the first two lines of /etc/hosts

 

/cvs/cds mounts

On ethernet, DNS look-up works without the explicit host definitions in /etc/hosts, but those entries are needed for a wifi-only connection.
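
To see at a glance which names a given /etc/hosts would resolve without DNS, something like this Python sketch can be used. The parsing rules (whitespace-separated fields, "#" comments, first entry wins) follow the usual hosts-file conventions; the sample text is made up apart from the two loopback lines, and the Zonet1 address is borrowed from elog 3178 purely as an example.

```python
def parse_hosts(text):
    """Return {name: address} from hosts-file text; the first entry for a name wins."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, address)
    return table


sample_hosts = """\
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.113.26 Zonet1   # example martian host
"""

table = parse_hosts(sample_hosts)
for name in ("localhost", "localhost6", "Zonet1"):
    print(name, "->", table[name])
```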

 

  3557   Sat Sep 11 03:16:51 2010   rana   Configuration   Computers   rossa notes

I wiped out the old CentOS install on rossa and installed CentOS 5.5 on there. The DVDs are on a spindle in the control room; there were 2 ISOs, but I only needed the first to install most things.

It still needs to get all of the usual stuff (java, flash, nvidia) installed, as well as setting up the .cshrc and the NFS mount of /cvs/cds. The userID and groupID are set to 1001 as before. Whoever sees Sanjit first should steer him towards this elog entry.

 

  3559   Sun Sep 12 22:36:03 2010   rana   Configuration   Computers   rossa notes

I found Sanjit's instructions for doing the Nvidia settings too complicated and so I followed these instructions from Facebook:

http://www.facebook.com/notes/centos-howtos/installing-nvidia-display-drivers-on-centos-55/399295987425

After installation, the monitors were autodetected and the Xinerama effect is working.

  3588   Mon Sep 20 10:33:21 2010   josephb   Bureaucracy   Computers   Larry stopped by - GC machine had conflicting IP

Larry stopped by today and had to disconnect the m25 machine (this is the 1st GC machine on the left as you walk into the control room) because its IP was conflicting with a machine over in Downs. Do not use 131.215.115.125 as the IP on this machine, as this is already assigned to someone else. They couldn't figure out the root password to change it, which is why it is not currently plugged into the network, and it is not to be until an appropriate IP is assigned.

They've asked whoever set the machine up to please contact them (extension 2974).

  3595   Wed Sep 22 22:22:12 2010   Koji   Configuration   Computers   Netgear Network Switch fan broken.

Net switch mumbo-jumbo:

Although Rana is going to buy a replacement for the Netgear switch for the martian network, I opened the lid of the Netgear, as its fan has already stopped working.
The lid of the other network switch for GC (the black one) was also opened, as it has a broken fan and a noisy half-broken fan.

I have asked Steve to buy replacement fans. These would also be the replacement of the replacement.

During the work, it seems that I accidentally toggled the power supply of linux1. This led to a lengthy fsck of the storage.
This is why all of the machines which rely on linux1 froze. linux1 is back and the machines look happy now.

If you find any machine disconnected from the network, please consult with me.

Quote:

The Netgear Network Switch in the top shelf of Nodus' rack has a broken fan. It is the one interfaced to the Martian network.

The fan must have broken and it has now started to produce a loud noise. It's like a truck was parked in the room with the engine running.

Also the other network switch, just below the Netgear, has one of its two fans broken. It is the one interfaced with the General Computer Side.

I tried to knock them to make the noise stop, but nothing happened.

We should consider trying to fix them. Although that would mean disconnecting all the computers.

 

  3597   Thu Sep 23 02:45:30 2010   Koji   Summary   Computers   nodus gracefully rebooted

Zach> Nodus seemed to be working fine again, and I was browsing the elog with no
Zach> problem. I tried making an entry, but when I started uploading a file it
Zach> became unresponsive. Tried SSHing, but I get no prompt after the welcome
Zach> blurb. ^C gives me some kind of tcsh prompt (">"), which only really
Zach> responds to ^D (logout). Don't know what else to do, but I assume someone
Zach> knows what's going on.

By gracefully rebooting nodus, the problem was solved.


It (">") actually was the tcsh prompt, but any command relying on shared or dynamically linked libraries appeared non-functional.

I could use
    cd /.../...
and
    echo *
to browse the directory tree. The main mounted file systems like /, /usr, /var, /cvs/cds/caltech looked fine.
I was afraid that the important library files were damaged.

I tried
    umountall
and
    mountall
in order to flush the file systems.
These should run even without the libraries as mount must properly work even before /usr is mounted.

They indeed did something to the system. Once I relaunched a login shell, the prompt was still ">"
but now I could use most of the commands.

I have rebooted by usual sudo-ing and now the services on nodus are back to the functional state again.

# nodus was working in the evening at around 9pm. I even made an e-log entry about that.
# So I'd like to assume this is not directly related to the linux1 incident. Something else could have happened.

  3598   Thu Sep 23 10:34:20 2010   rana   Frogs   Computers   nodus gracefully rebooted

SVN down

mafalda down

I am guessing that the NFS file system hangup may have caused some machines to get into an awkward state. We may be best off doing a controlled power cycle of everything...

  3599   Thu Sep 23 11:15:20 2010   Koji   Frogs   Computers   nodus gracefully rebooted

svn is back after starting apache on nodus.

http://lhocds.ligo-wa.caltech.edu:8000/40m/ApacheOnNodus

Quote:

SVN down

mafalda down

I am guessing that the NFS file system hangup may have caused some machines to get into an awkward state. We may be best off doing a controlled power cycle of everything...

 

  3601   Thu Sep 23 13:16:57 2010   Koji   Frogs   Computers   nodus gracefully rebooted

mafalda is up now.

I found that the cable for mafalda (the sole red cable) had a broken latch.
The cable was about to fall out of the switch. As first aid, I used this technique to put on a new latch, and plugged it back into the switch.

Now I can log in to it. I did not reboot it.

Quote:

SVN down

mafalda down

I am guessing that the NFS file system hangup may have caused some machines to get into an awkward state. We may be best off doing a controlled power cycle of everything...

 
