  40m Log, Page 262 of 344
ID Date Author Type Category Subject
  12854   Tue Feb 28 01:28:52 2017   johannes   Update   Computers   c1psl un-bootable

It turned out the 'ringing' was caused by the respective other ETM still being aligned. For these reflection measurements both test masses of the other arm need to be misaligned. For the ETM it's sufficient to use the Misalign button in the medm screens, while the ITM has to be manually misaligned to move the reflected beam off the PD.

I did another round of armloss measurements today. I encountered some problems along the way:

  • Some time today (around 6pm) most of the front end models had crashed and needed to be restarted. GV: actually it was only the models on c1lsc that had crashed. I noticed this on Friday too.
  • ETMX keeps getting kicked up seemingly randomly. However, it settles fast into its original position.

General Stuff:

  • Oscilloscope should sample both MC power (from MC2 transmitted beam) and AS signal
  • Channel data can only be loaded from the scope one channel at a time, so 'stop' scope acquisition and then grab the relevant channels individually
  • Averaging needs to be restarted every time the mirrors are moved; triggering stop and run remotely via the http interface scripts does this.

Procedure:

  1.     Run LSC Offsets
  2.     With the PSL shutter closed measure scope channel dark offsets, then open shutter
  3.     Align all four test masses with dithering to make sure the IFO alignment is in a known state
  4.     Pick an arm to measure
  5.     Turn the other arm's dither alignment off
  6.     'Misalign' that arm's ETM using medm screen button
  7.     Misalign that arm's ITM manually after disabling its OpLev servos, watching the AS port camera to make sure the reflected beam doesn't hit the PD anymore.
  8.     Disable dithering for primary arm
  9.     Record MC and AS time series from (paused) scope
  10.     Misalign primary ETM
  11.     Repeat scope data recording

Each pair of readings gives the reflected power at the AS port normalized to the IMC stored power:

\widehat{P}=\frac{P_{AS}-\overline{P}_{AS}^\mathrm{dark}}{P_{MC}-\overline{P}_{MC}^\mathrm{dark}}

which is then averaged. The loss is calculated from the ratio of reflected power in the locked (L) vs misaligned (M) state from

\mathcal{L}=\frac{T_1}{4\gamma}\left[1-\frac{\overline{\widehat{P}_L}}{\overline{\widehat{P}_M}} +T_1\right ]-T_2

Acquiring data this way yielded P_L/P_M=1.00507 +/- 0.00087 for the X arm and P_L/P_M=1.00753 +/- 0.00095 for the Y arm. With \gamma_x=0.832 and \gamma_y=0.875 (from m1=0.179, m2=0.226 and 91.2% and 86.7% mode matching in X and Y arm, respectively) this yields round trip losses of:

\mathcal{L}_X=21\pm4\,\mathrm{ppm}  and  \mathcal{L}_Y=13\pm4\,\mathrm{ppm}, which is assuming a generalized 1% error in test mass transmissivities and modulation indices. As we discussed, this seems a little too good to be true, but at least the numbers are not negative.
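
The two formulas above can be written out as a short Python sketch for cross-checking the arithmetic. The function names are made up for illustration, and the test mass transmissivities T_1 and T_2 are not quoted in this entry, so they are left as inputs rather than filled in.

```python
def normalized_power(p_as, p_mc, dark_as, dark_mc):
    """AS power normalized to MC power, both with dark offsets subtracted."""
    return (p_as - dark_as) / (p_mc - dark_mc)


def round_trip_loss(ratio_locked_over_misaligned, gamma, t1, t2):
    """Loss from L = T1/(4*gamma) * (1 - P_L/P_M + T1) - T2.

    ratio_locked_over_misaligned: averaged P-hat (locked) / P-hat (misaligned)
    gamma: the factor called gamma in the entry (0.832 for X, 0.875 for Y)
    t1, t2: power transmissivities T1 and T2 (inputs, not given in the entry)
    """
    return t1 / (4.0 * gamma) * (1.0 - ratio_locked_over_misaligned + t1) - t2
```

For the X arm this would be called as `round_trip_loss(1.00507, 0.832, t1, t2)` with the lab's own values for `t1` and `t2`.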

  12943   Thu Apr 13 21:01:20 2017   rana   Configuration   Computers   LG UltraWide on Rossa

We installed a new curved 34" doublewide monitor on Rossa, but it seems like it has a defective dead pixel region in it. Unless it heals itself by morning, we should return it to Amazon. Please don't throw out the packing materials.

Steve 8am next morning: it is still bad. The monitor is cracked. It got kicked while traveling. Its box is damaged in the same place.

Shipped back 4-17-2017

Attachment 1: LG34c.jpg
Attachment 2: crack.jpg
  12965   Wed May 3 16:12:36 2017   johannes   Configuration   Computers   catastrophic multiple monitor failures

It seems we lost three monitors basically overnight.

The main (landscape, left) displays of Pianosa, Rossa and Allegra are all broken with the same failure mode:

their backlights failed. Gautam and I confirmed that there is still an image displayed on all three, just incredibly faint. While Allegra hasn't been used much, we can narrow down that Pianosa's and Rossa's monitors must have failed within 5 or 6 hours of each other, last night.

One could say ... they turned to the dark side.

Quick edit: There was a functioning Dell 24" monitor next to the iMac that we used as a replacement for Pianosa's primary display. Once the new curved display is paired with Rossa we can use its old display for Donatella or Allegra.

  12966   Wed May 3 16:46:18 2017   Koji   Configuration   Computers   catastrophic multiple monitor failures

- Is there any machine that can handle 4K? I have one 4K LCD that is not in use.
- I also can donate one 24" Dell

  12971   Thu May 4 09:52:43 2017   rana   Configuration   Computers   catastrophic multiple monitor failures

That's a new failure mode. Probably we can't trust the power to be safe anymore.

Need Steve to order a couple of surge suppressing power strips for the monitors. The computers are already on the UPS, so they don't need it.

  12978   Tue May 9 15:23:12 2017   Steve   Configuration   Computers   catastrophic multiple monitor failures

Gautam and Steve,

Surge protective power strip was installed on Friday, May 5 in the Control Room.

Computers not connected to the UPS are plugged into Isobar12ultra.

Quote:

That's a new failure mode. Probably we can't trust the power to be safe anymore.

Need Steve to order a couple of surge suppressing power strips for the monitors. The computers are already on the UPS, so they don't need it.

 

Attachment 1: Trip-Lite.jpg
  12993   Mon May 15 20:43:25 2017   rana   Configuration   Computers   catastrophic multiple monitor failures

This is not the right one; this Ethernet-controlled strip is what we want in the racks for remote control.

Buy some of these for the MONITORS.

Quote:

Surge protective power strip was installed on Friday, May 5 in the Control Room.

Computers not connected to the UPS are plugged into Isobar12ultra.

Quote:

That's a new failure mode. Probably we can't trust the power to be safe anymore.

Need Steve to order a couple of surge suppressing power strips for the monitors. The computers are already on the UPS, so they don't need it.

 

  13037   Sun Jun 4 14:19:33 2017   rana   Frogs   Computers   Network slowdown: Martians are behind a waterwall

A few weeks ago we did some internet speed tests and found a dramatic difference between our general network and our internal Martian network in terms of access speed to the outside world.

As you can see, the speed from nodus is consistent with a Gigabit connection. But the speeds from any machine on the inside are ~100x slower. We need to take a look at our router / NAT setup to see if it's an old hardware problem or just something in the software firewall. By comparison, my home internet download speed test is ~48 Mbit/s; ~6x faster than our CDS computers.


controls@megatron|~> speedtest
/usr/local/bin/speedtest:5: UserWarning: Module dap was already imported from None, but /usr/lib/python2.7/dist-packages is being added to sys.path
  from pkg_resources import load_entry_point
Retrieving speedtest.net configuration...
Testing from Caltech (131.215.115.189)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Race Communications (Los Angeles, CA) [29.63 km]: 6.52 ms
Testing download speed................................................................................
Download: 6.35 Mbit/s
Testing upload speed................................................................................................
Upload: 5.10 Mbit/s
controls@megatron|~> exit
logout
Connection to megatron closed.
controls@nodus|~ > speedtest
Retrieving speedtest.net configuration...
Testing from Caltech (131.215.115.52)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Phyber Communications (Los Angeles, CA) [29.63 km]: 2.196 ms
Testing download speed................................................................................
Download: 721.92 Mbit/s
Testing upload speed................................................................................................
Upload: 251.38 Mbit/s
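
The "~100x" figure can be checked directly from the two speedtest printouts above. The following Python sketch is only an illustration; `parse_speedtest` is a made-up helper that assumes the `Download:` / `Upload:` line format shown in this entry.

```python
import re


def parse_speedtest(text):
    """Return (download, upload) in Mbit/s from speedtest-style output."""
    down = re.search(r"^Download:\s*([\d.]+)\s*Mbit/s", text, re.MULTILINE)
    up = re.search(r"^Upload:\s*([\d.]+)\s*Mbit/s", text, re.MULTILINE)
    if down is None or up is None:
        raise ValueError("no Download/Upload lines found")
    return float(down.group(1)), float(up.group(1))


# Result lines copied from the two sessions above.
megatron = "Download: 6.35 Mbit/s\nUpload: 5.10 Mbit/s\n"
nodus = "Download: 721.92 Mbit/s\nUpload: 251.38 Mbit/s\n"

meg_down, meg_up = parse_speedtest(megatron)
nod_down, nod_up = parse_speedtest(nodus)
print(f"download ratio nodus/megatron: {nod_down / meg_down:.0f}x")
print(f"upload ratio nodus/megatron: {nod_up / meg_up:.0f}x")
```

With the numbers above the download ratio comes out at roughly 114x and the upload ratio at roughly 49x.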

Attachment 1: Screen_Shot_2017-06-04_at_1.47.47_PM.png
Attachment 2: Screen_Shot_2017-06-04_at_1.44.42_PM.png
  13044   Mon Jun 5 21:53:55 2017   rana   Update   Computers   rossa: ubuntu 16.04

With the network config, mounting, and symlinks set up, rossa can be used as a workstation for dataviewer and MEDM. For DTT, no luck, since there is so far no lscsoft support past the Ubuntu14 stage.

  13050   Wed Jun 7 15:41:51 2017   Steve   Update   Computers   window laptop scanned

Randy Trudeau scanned our Windows laptop (Dell 13" Vostro) and Steve's memory stick for viruses. Nothing was found. The search continues...

Rana thinks that I'm creating these virus beasts by taking pictures with Dino Capture and/or Data Ray on the Windows machine...

 

 

  13065   Thu Jun 15 14:24:48 2017   Kaustubh, Jigyasa   Update   Computers   Ottavia Switched On

Today, Jigyasa and I connected Ottavia to one of the unused monitor screens, Donatella's. The Ottavia CPU had a label saying 'SMOKED'. One of the past elogs, 11091, dated March 2015, by Jenne had an update regarding Ottavia smelling 'burny'. It has been working fine for about 2 hours now. Once it is connected to the Martian Network we can test it further. The Donatella screen we used seems to have a graphics problem: some damage to the display screen. It's a minor issue and does not affect the display that much, but perhaps it would be better to use another screen if we plan to use Ottavia in the future. We will power it down if there is an issue with it.

  13067   Thu Jun 15 19:49:03 2017   Kaustubh, Jigyasa   Update   Computers   Ottavia Switched On

It has been working fine the whole day (we didn't do much testing on it though). We are leaving it on for the night.

Quote:

Today, Jigyasa and I connected Ottavia to one of the unused monitor screens, Donatella's. The Ottavia CPU had a label saying 'SMOKED'. One of the past elogs, 11091, dated March 2015, by Jenne had an update regarding Ottavia smelling 'burny'. It has been working fine for about 2 hours now. Once it is connected to the Martian Network we can test it further. The Donatella screen we used seems to have a graphics problem: some damage to the display screen. It's a minor issue and does not affect the display that much, but perhaps it would be better to use another screen if we plan to use Ottavia in the future. We will power it down if there is an issue with it.

 

  13068   Fri Jun 16 12:37:47 2017   Kaustubh, Jigyasa   Update   Computers   Ottavia Switched On

Ottavia had been left running overnight and it seems to work fine. There has been no smell or any noticeable problem in its operation. This morning Gautam, Kaustubh and I connected Ottavia to the Martian Network through the Netgear switch in the 40m lab area. We were able to SSH into Ottavia through Pianosa and access directories. On Ottavia itself we were able to run ipython and access the internet. Since it seems to work fine, Kaustubh and I are going to enable the ethernet connection to Ottavia and secure the wiring now.

Quote:

It has been working fine the whole day (we didn't do much testing on it though). We are leaving it on for the night.

Quote:

Today, Jigyasa and I connected Ottavia to one of the unused monitor screens, Donatella's. The Ottavia CPU had a label saying 'SMOKED'. One of the past elogs, 11091, dated March 2015, by Jenne had an update regarding Ottavia smelling 'burny'. It has been working fine for about 2 hours now. Once it is connected to the Martian Network we can test it further. The Donatella screen we used seems to have a graphics problem: some damage to the display screen. It's a minor issue and does not affect the display that much, but perhaps it would be better to use another screen if we plan to use Ottavia in the future. We will power it down if there is an issue with it.

 

 

  13071   Fri Jun 16 23:27:19 2017   Kaustubh, Jigyasa   Update   Computers   Ottavia Connected to the Netgear Box

I just connected the Ottavia to the Netgear box and it's working just fine. It'll remain switched on over the weekend.

Quote:

Kaustubh and I are going to enable the ethernet connection to Ottavia and secure the wiring now.  

 

  13154   Mon Jul 31 20:35:42 2017   Koji   Summary   Computers   Chiara backup situation summary

Summary
- CDS shared file system: backed up
- Chiara system itself: not backed up


controls@chiara|~> df -m
Filesystem     1M-blocks    Used Available Use% Mounted on
/dev/sda1         450420   11039    416501   3% /
udev               15543       1     15543   1% /dev
tmpfs               3111       1      3110   1% /run
none                   5       0         5   0% /run/lock
none               15554       1     15554   1% /run/shm
/dev/sdb1        2064245 1718929    240459  88% /home/cds
/dev/sdd1        1877792 1426378    356028  81% /media/fb9bba0d-7024-41a6-9d29-b14e631a2628
/dev/sdc1        1877764 1686420     95960  95% /media/40mBackup

/dev/sda1 : System boot disk
/dev/sdb1 : main cds disk file system 2TB partition of 3TB disk (1TB vacant)
/dev/sdc1 : Daily backup of /dev/sdb1 via a cron job (/opt/rtcds/caltech/c1/scripts/backup/localbackup)

/dev/sdd1 : 2014 snap shot of cds. Not actively used. USB

https://nodus.ligo.caltech.edu:8081/40m/11640

 

  13159   Wed Aug 2 14:47:20 2017   Koji   Summary   Computers   Chiara backup situation summary

I further compressed the burt snapshot directories, following ELOG 11640. This freed up an additional ~130GB. This will eventually help to give more space to the local backup (/dev/sdc1).

controls@chiara|~> df -m
Filesystem     1M-blocks    Used Available Use% Mounted on
/dev/sda1         450420   11039    416501   3% /
udev               15543       1     15543   1% /dev
tmpfs               3111       1      3110   1% /run
none                   5       0         5   0% /run/lock
none               15554       1     15554   1% /run/shm
/dev/sdb1        2064245 1581871    377517  81% /home/cds
/dev/sdd1        1877792 1426378    356028  81% /media/fb9bba0d-7024-41a6-9d29-b14e631a2628
/dev/sdc1        1877764 1698489     83891  96% /media/40mBackup
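
As a quick check of the "~130GB" figure, the two `df -m` listings (this one and the one from ELOG 13154) can be compared programmatically. This is only a sketch: `parse_df` is a hypothetical helper, it assumes mount points without spaces, and only the two relevant rows from each listing are copied in.

```python
def parse_df(text):
    """Map mount point -> (used_MB, available_MB, use_percent) from `df -m` rows."""
    table = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Data rows have six fields: device, 1M-blocks, used, available, use%, mount.
        # The header row splits into seven fields and is skipped here.
        if len(fields) != 6 or not fields[4].endswith("%"):
            continue
        table[fields[5]] = (int(fields[2]), int(fields[3]), int(fields[4].rstrip("%")))
    return table


# Rows copied from the Jul 31 and Aug 2 listings.
before = parse_df("""
/dev/sdb1        2064245 1718929    240459  88% /home/cds
/dev/sdc1        1877764 1686420     95960  95% /media/40mBackup
""")
after = parse_df("""
/dev/sdb1        2064245 1581871    377517  81% /home/cds
/dev/sdc1        1877764 1698489     83891  96% /media/40mBackup
""")

freed_mb = before["/home/cds"][0] - after["/home/cds"][0]
nearly_full = [mount for mount, (_, _, pct) in after.items() if pct >= 90]
print(f"freed on /home/cds: {freed_mb} MB")
print(f"mounts at or above 90%: {nearly_full}")
```

The difference in the Used column for /home/cds is 137058 MB, consistent with the quoted ~130GB, and the backup disk is the only mount above 90%.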

 

 

  13160   Wed Aug 2 15:04:15 2017   gautam   Configuration   Computers   control room workstation power distribution

The 4 control room workstation CPUs (Rossa, Pianosa, Donatella and Allegra) are now connected to the UPS.

The 5 monitors are connected to the recently acquired surge-protecting power strips.

Rack-mountable power strip + spare APC Surge Arrest power strip have been stored in the electronics cabinet.

Quote:

This is not the right one; this Ethernet-controlled strip is what we want in the racks for remote control.

Buy some of these for the MONITORS.

 

  13227   Thu Aug 17 22:54:49 2017   ericq   Update   Computers   Trying to access JetStor RAID files

The JetStor RAID unit that we had been using for frame writing before the fb meltdown has some archived frames from DRFPMI locks that I want to get at. I spent some time today trying to mount it on optimus, with no success.

The unit was connected to fb via a SCSI cable to a SCSI-to-PCI card inside of fb. I moved the card to optimus, and attached the cable. However, no mountable device corresponding to the RAID seems to show up anywhere.

The RAID unit can tell that it's hooked up to a computer, because when optimus restarts, the RAID event log says "Host Channel 0 - SCSI Bus Reset."

The computer is able to get some sort of signals from the RAID unit, because when I change the SCSI ID, the syslog will say 'detected non-optimal RAID status'.

The PCI card is ID'd fine in lspci as "06:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev c1)"

'lsscsi' does not list anything related to the unit.

Using 'mpt-status -p', which is somehow associated with this kind of thing, returns the disheartening output:

Checking for SCSI ID:0
Checking for SCSI ID:1
Checking for SCSI ID:2
Checking for SCSI ID:3
Checking for SCSI ID:4
Checking for SCSI ID:5
Checking for SCSI ID:6
Checking for SCSI ID:7
Checking for SCSI ID:8
Checking for SCSI ID:9
Checking for SCSI ID:10
Checking for SCSI ID:11
Checking for SCSI ID:12
Checking for SCSI ID:13
Checking for SCSI ID:14
Checking for SCSI ID:15
Nothing found, contact the author
 
I don't know what to try at this point.
  13239   Tue Aug 22 15:17:19 2017   ericq   Update   Computers   Old frames accessible again

It turns out the problem was just a bent pin on the SCSI cable, likely from having to stretch things a bit to reach optimus from the RAID unit.

I hooked it up to megatron, and it was automatically recognized and mounted.

I had to turn off the new FB machine and remove it from the rack to be able to access megatron though, since it was just sitting on top. FB needs a rail to sit on!

At a cursory glance, the filesystem appears intact. I have copied over the archived DRFPMI frame files to my user directory for now, and Gautam is going to look into getting those permanently stored on the LDAS copy of 40m frames, so that we can have some redundancy.

Also, during this time, one of the HDDs in the RAID unit failed its SMART tests, so the RAID unit wanted it replaced. There were some spare drives in a little box directly under the unit, so I've installed one and am currently incorporating it back into the RAID.

There are two more backup drives in the box. We're running a RAID 5 configuration, so we can only lose one drive at a time before data is lost.
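
The single-drive tolerance of RAID 5 mentioned above comes from parity: the XOR of the surviving blocks and the parity block reproduces the missing block. Here is a toy Python sketch of that idea only; it is not how the JetStor unit is implemented, and a real array stripes data and rotates parity across the drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "data drives" and one parity block computed from them.
data = [b"DRFP", b"MI f", b"rame"]
parity = xor_blocks(data)

# Pretend the second drive failed: rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

If a second drive fails before the rebuild finishes, there are two unknown blocks and only one parity equation, which is why the data would then be lost.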

  13240   Tue Aug 22 15:40:06 2017   gautam   Update   Computers   Old frames accessible again

[jamie, gautam]

I had some trouble getting the daqd processes up and running again using Jamie's instructions.

With Jamie's help however, they are back up and running now. The problem was that the mx infrastructure didn't come back up on its own. So prior to running sudo systemctl restart daqd_*, Jamie ran sudo systemctl start mx. This seems to have done the trick.

c1iscey was still showing red fields on the CDS overview screen so Jamie did a soft reboot. The machine came back up cleanly, so I restarted all the models. But the indicator lights were still red. Apparently the mx processes weren't running on c1iscey. The way to fix this is to run sudo systemctl start mx_stream. Now everything is green.
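
The ordering described above (mx first, then daqd on the frame builder, and mx_stream on a front end whose indicators stay red) can be captured in a small Python sketch. The function names and the dry-run wrapper are made up for illustration; only the three `systemctl` command lines are taken from this entry. The `daqd_*` pattern is passed to systemctl unexpanded, which assumes systemctl resolves the glob itself.

```python
import subprocess


def framebuilder_restart_commands():
    """Commands for the frame builder, in the order used above."""
    return [
        ["sudo", "systemctl", "start", "mx"],
        ["sudo", "systemctl", "restart", "daqd_*"],
    ]


def frontend_stream_commands():
    """Command for a front end whose mx_stream did not come back up."""
    return [["sudo", "systemctl", "start", "mx_stream"]]


def run_all(commands, dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failure


run_all(framebuilder_restart_commands())
run_all(frontend_stream_commands())
```

The dry-run default only prints the commands, so the sketch can be read or tested without touching any services.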

Now we are going to work on trying the fix Rolf suggested on c1iscex.

Quote:

It turns out the problem was just a bent pin on the SCSI cable, likely from having to stretch things a bit to reach optimus from the RAID unit.

I hooked it up to megatron, and it was automatically recognized and mounted.

I had to turn off the new FB machine and remove it from the rack to be able to access megatron though, since it was just sitting on top. FB needs a rail to sit on!

At a cursory glance, the filesystem appears intact. I have copied over the archived DRFPMI frame files to my user directory for now, and Gautam is going to look into getting those permanently stored on the LDAS copy of 40m frames, so that we can have some redundancy.

Also, during this time, one of the HDDs in the RAID unit failed its SMART tests, so the RAID unit wanted it replaced. There were some spare drives in a little box directly under the unit, so I've installed one and am currently incorporating it back into the RAID.

There are two more backup drives in the box. We're running a RAID 5 configuration, so we can only lose one drive at a time before data is lost.

 

  13242   Tue Aug 22 17:11:15 2017   gautam   Update   Computers   c1iscex model restarts

[jamie, gautam]

We tried to implement the fix that Rolf suggested in order to solve (perhaps among other things) the inability of some utilities like dataviewer to open testpoints. The problem isn't wholly solved yet - we can access actual testpoint data (not just zeros, as was the case) using DTT, and if DTT is used to open a testpoint first, then dataviewer, but DV itself can't seem to open testpoints.

Here is what was done (Jamie will correct me if I am mistaken).

  1. Jamie checked out branch 3.4 of the RCG from the SVN.
  2. Jamie recompiled all the models on c1iscex against this version of RCG.
  3. I shutdown ETMX watchdog, then ran rtcds stop all on c1iscex to stop all the models, and then restarted them using rtcds start <model> in the order c1x01, c1scx and c1asx. 
  4. Models came back up cleanly. I then restarted the daqd_dc process on FB1. At this point all indicators on the CDS overview screen were green.
  5. Tried getting testpoint data with DTT and DV for ETMX Oplev Pitch and Yaw IN1 testpoints. Conclusion as above.

So while we are in a better state now, the problem isn't fully solved. 

Comment: seems like there is an in-built timeout for testpoints opened with DTT - if the measurement is inactive for some time (unsure how much exactly but something like 5mins), the testpoint is automatically closed.

  13243   Tue Aug 22 18:36:46 2017   gautam   Update   Computers   All FE models compiled against RCG3.4

After getting the go ahead from Jamie, I recompiled all the FE models against the same version of RCG that we tested on the c1iscex models.

To do so:

  • I did rtcds make and rtcds install for all the models.
  • Then I ssh-ed into the FEs and did rtcds stop all, followed by rtcds start <model> in the order they are listed on the CDS overview MEDM screen (top to bottom).
  • During the compilation process (i.e. rtcds make), for some of the models, I got some compilation warnings. I believe these are related to models that have custom C code blocks in them. Jamie tells me that it is okay to ignore these warnings and that they will be fixed at some point.
  • c1lsc FE crashed when I ran rtcds stop all - had to go and do a manual reboot.
  • Doing so took down the models on c1sus and c1ioo that were running - but these FEs themselves did not have to be rebooted.
  • Once c1lsc came back up, I restarted all the models on the vertex FEs. They all came back online fine.
  • Then I ssh-ed into FB1, and restarted the daqd processes - but c1lsc and c1ioo CDS indicators were still red.
  • Looks like the mx_stream processes weren't started automatically on these two machines. Reasons unknown. Earlier today, the same was observed for c1iscey.
  • I manually restarted the mx_stream processes, at which point all CDS indicator lights became green (see Attachment #1).

IFO alignment needs to be redone, but at least we now have an (admittedly roundabout) way of getting testpoints. Did a quick check for "nan-s" on the ASC screen, saw none. So I am re-enabling watchdogs for all optics.

GV 23 August 9am: Last night, I re-aligned the TMs for single arm locks. Before the model restarts, I had saved the good alignment on the EPICS sliders, but the gain of x3 on the coil driver filter banks has to be manually turned on at the moment (i.e. the safe.snap file has them off). ALS noise looked good for both arms, so just for fun, I tried transitioning control of both arms to ALS (in the CARM/DARM basis as we do when we lock DRFPMI, using the Transition_IR_ALS.py script), and was successful.

Quote:

[jamie, gautam]

We tried to implement the fix that Rolf suggested in order to solve (perhaps among other things) the inability of some utilities like dataviewer to open testpoints. The problem isn't wholly solved yet - we can access actual testpoint data (not just zeros, as was the case) using DTT, and if DTT is used to open a testpoint first, then dataviewer, but DV itself can't seem to open testpoints.

Here is what was done (Jamie will correct me if I am mistaken).

  1. Jamie checked out branch 3.4 of the RCG from the SVN.
  2. Jamie recompiled all the models on c1iscex against this version of RCG.
  3. I shutdown ETMX watchdog, then ran rtcds stop all on c1iscex to stop all the models, and then restarted them using rtcds start <model> in the order c1x01, c1scx and c1asx. 
  4. Models came back up cleanly. I then restarted the daqd_dc process on FB1. At this point all indicators on the CDS overview screen were green.
  5. Tried getting testpoint data with DTT and DV for ETMX Oplev Pitch and Yaw IN1 testpoints. Conclusion as above.

So while we are in a better state now, the problem isn't fully solved. 

Comment: seems like there is an in-built timeout for testpoints opened with DTT - if the measurement is inactive for some time (unsure how much exactly but something like 5mins), the testpoint is automatically closed.

 

Attachment 1: CDS_Aug22.png
  13277   Wed Aug 30 22:15:47 2017   rana   Omnistructure   Computers   USB flash drives moved

I have moved the USB flash drives from the electronics bench back into the middle drawer of the cabinet next to the AC which is west of the fridge. Drawer re-enlabeled.

  13287   Fri Sep 1 16:55:27 2017   gautam   Update   Computers   Testpoints now accessible again

Thanks to Jonathan Hanks, it appears we can now access test-points again using dataviewer.

I haven't done an exhaustive check just yet, but I have loaded a few testpoints in dataviewer, and ran a script that uses testpoint channels (specifically the ALS phase tracker UGF setting script); all seems good.

So if I remember correctly, the major CDS fix now required is to solve the model unloading issue.

Thanks to Jamie/Jonathan Hanks/KT for getting us back to this point! Here are the details:

After reading logs and code, it was a simple daqdrc config change.

The daqdrc should read something like this:

...
set master_config=".../master";
configure channels begin end;
tpconfig ".../testpoint.par";
...


What had happened was tpconfig was put before the configure channels
begin end.  So when daqd_rcv went to configure its test points it did
not have the channel list configured and could not match test points to
the right model & machine.  Dave and I suspect that this is so that it
can do a request directly to the correct front end instead of a general
broadcast to all awgtpman instances.

Simply reordering the config fixes it.

I tested by opening a test point in dataviewer and verifying that
testpoints had opened/closed by using diag -l.  Xmgr/grace didn't seem
to be able to keep up with the test point data over a remote connection.

You can find this in the logs by looking for entries like the following
while the daqd is starting up.  When we looked we saw that there was an
entry for every model.

Unable to find GDS node 35 system c1daf in INI fiels
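
The ordering constraint described above (tpconfig must come after "configure channels begin end") is easy to check mechanically. The following Python sketch is only an illustration: `tpconfig_after_channels` is a made-up helper, the sample text mirrors the elided daqdrc fragment quoted in this entry, and treating `#` as a comment marker is an assumption.

```python
def tpconfig_after_channels(daqdrc_text):
    """Return True if the tpconfig line follows 'configure channels begin end'."""
    configure_at = None
    tpconfig_at = None
    for number, raw in enumerate(daqdrc_text.splitlines()):
        line = raw.strip()
        if line.startswith("#"):  # assumption: '#' starts a comment line
            continue
        if configure_at is None and line.startswith("configure channels begin end"):
            configure_at = number
        elif tpconfig_at is None and line.startswith("tpconfig"):
            tpconfig_at = number
    if configure_at is None or tpconfig_at is None:
        raise ValueError("daqdrc is missing the configure or tpconfig line")
    return tpconfig_at > configure_at


good = '''set master_config=".../master";
configure channels begin end;
tpconfig ".../testpoint.par";
'''
bad = '''set master_config=".../master";
tpconfig ".../testpoint.par";
configure channels begin end;
'''
print(tpconfig_after_channels(good), tpconfig_after_channels(bad))
```

The second sample reproduces the broken order, in which the test points are configured before the channel list exists.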
  13323   Wed Sep 20 15:49:26 2017   rana   Omnistructure   Computers   new internet

Larry Wallace hooked up a new switch (Brocade FWS 648G) today which is our 40m lab interface to the outside world internet. It's faster.

He then, just now, switched over the cables which were going to the old one into the new one, including NODUS and the NAT Router. CDS machines can still connect to the outside world.

In the next week or two, he'll install a new NAT for us so that we can have high speed comm from CDS to the world.

  13405   Sun Oct 29 16:40:17 2017   rana   Summary   Computers   disk cleanup

Backed up all the wikis. They're in wiki_backups/*.tar.xz (because xz -9e gives better compression than gzip or bzip2).
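
The compression claim can be checked on any sample file with the Python standard library. This is a sketch, not the procedure used for the backups: `compressed_sizes` is a made-up helper, and the library presets are assumed to correspond to the `gzip -9`, `bzip2 -9` and `xz -9e` command-line settings.

```python
import bz2
import gzip
import lzma


def compressed_sizes(payload):
    """Compressed size in bytes of the same payload under three compressors."""
    return {
        "gzip -9": len(gzip.compress(payload, compresslevel=9)),
        "bzip2 -9": len(bz2.compress(payload, compresslevel=9)),
        # PRESET_EXTREME is the library counterpart of the 'e' in 'xz -9e'.
        "xz -9e": len(lzma.compress(payload, preset=9 | lzma.PRESET_EXTREME)),
    }


# Stand-in payload; replace with the bytes of a real wiki dump to compare.
sample = b"== 40m wiki page ==\nsome repetitive wiki markup\n" * 2000
for name, size in compressed_sizes(sample).items():
    print(f"{name}: {size} bytes")
```

Which compressor wins depends on the data, so the comparison is worth running on an actual wiki tarball rather than on the toy payload here.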

Moved old user directories into /users/OLD/.

  13434   Fri Nov 17 16:31:11 2017   aaron   Omnistructure   Computers   Acromag wired up

Acromag Wireup Update

I finished wiring up the Acromags to replace the VME boxes on the x arm. I still need to cut down the bar and get them all tidy in the box, but I wanted to post the wiring maps I made.
I wanted to note specifically that a few of the connections were assigned to VME boxes but are no longer assigned in this Acromag setup. We should be sure that we actually do not need to use the following channels:

Channels no longer in use

  • From the VME analog output (VMIVME 4116) to the QPD Whitening board (no DCC number on the front), 3 channels are no longer in use
  • From the anti-image filter (D000186) to the ADC (VMIVME 3113A) 5 channels are no longer in use (these are the only channels from the anti-image filter, so this filter is no longer in use at all?)
  • From the universal dewhitening filter (D000183) to a binary I/O adapter (channels 1-16), 4 channels are no longer in use. These are the only channels from the dewhitening filter
  • From a second universal dewhitening filter (D000183) to another binary I/O adapter (channels 1-16), one channel is no longer in use (this was the only channel from this dewhitening filter).
  • From the opti-lever (D010033) to the VME ADC (VMIVME 3113A), 7 channels are no longer in use (this was all of the channels from the opti lever)
  • From the SUS PD Whitening/Interface board (D000210) to a binary I/O adapter (channels 1-16), 5 channels are no longer in use. 
  • Note that none of the binary I/O adapter channels are in use.

 

Attachment 1: AcromagWiringMaps.pdf
  13435   Fri Nov 17 17:10:53 2017   rana   Omnistructure   Computers   Acromag wired up

Exactly: you'll have to list explicitly what functions those channels had so that we know what we're losing before we make the switch.

  13440   Tue Nov 21 17:51:01 2017   Koji   Configuration   Computers   nodus post OS migration admin

The post OS migration admin for nodus about apache, elogd, svn, iptables, etc can be found in https://wiki-40m.ligo.caltech.edu/NodusUpgradeNov2017

Update: The svn dump from the old svn was done, and it was imported to the new svn repository structure. Now the svn command line and (simple) web interface is running. And "websvn" was also implemented.

  13442   Tue Nov 21 23:47:51 2017   gautam   Configuration   Computers   nodus post OS migration admin

I restored the nodus crontab (copied over from the Nov 17 backup of the same at /opt/rtcds/caltech/c1/scripts/crontab/crontab_nodus.20171117080001). There wasn't a crontab, so I made one using sudo crontab -e.

This crontab is supposed to execute some backup scripts, send pizza emails, check chiara disk usage, and backup the crontab itself.

I've commented out the backup of nodus' /etc and /export for now, while we get back to fully operational nodus (though we also have a backup of /cvs/cds/caltech/nodus_backup on the external LaCie drive), they can be re-enabled by un-commenting the appropriate lines in the crontab.

Quote:

The post OS migration admin for nodus about apache, elogd, svn, iptables, etc can be found in https://wiki-40m.ligo.caltech.edu/NodusUpgradeNov2017

Update: The svn dump from the old svn was done, and it was imported to the new svn repository structure. Now the svn command line and (simple) web interface is running. "websvn" is not installed.

 

  13443   Wed Nov 22 00:54:18 2017   johannes   Omnistructure   Computers   Slow DAQ replacement computer progress

I got the SuperMicro 1U server box from Larry W on Monday and set it up in the CryoLab for initial testing.

The specs: https://www.supermicro.com/products/system/1U/5015/SYS-5015A-EHF-D525.cfm

The processor is an Intel D525 dual core atom processor with 1.8 GHz (i386 architecture, no 64-bit support). The unit has a 250GB SSD and 4GB RAM.

I installed Debian Jessie on it without any problems and compiled the most recent stable versions of EPICS base (3.15.5), asyn drivers (4-32), and modbus module (2-10-1). EPICS and asyn each took about 10 minutes, and modbus about 1 minute.

I copied the database files and port driver definitions for the cryolab from cryoaux, whose modbus services I suspended, and initialized the EPICS modbus IOC on the SuperMicro machine instead. It's working flawlessly so far, but admittedly the box is not under heavy load in the cryolab, as the framebuilder there is logging only the 16 analog channels.

I have recently worked out some kinks in the port driver and channel definitions, most importantly:

  • modbus IOC initialization is performed automatically by systemd on reboot
  • If the IOC crashes or a system reboot is required the Acromag units freeze in their last current state. When the IOC is started a single read operation of all A/D registers is performed and the result taken as the initial value of the corresponding channel, causing no discontinuity in generated voltage EVER (except of course for the rare case when the Acromags themselves have to be restarted)
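The restart behaviour described in the second bullet can be illustrated with a toy model. This is only a sketch: FakeAcromag and SoftChannel are hypothetical stand-ins written for this illustration, not the EPICS/asyn/modbus API, and the register values are arbitrary.

```python
# Toy model only: FakeAcromag and SoftChannel are hypothetical stand-ins,
# not the EPICS/asyn/modbus API.

class FakeAcromag:
    """Holds its last commanded output registers while the IOC is down."""

    def __init__(self, registers):
        self.registers = dict(registers)

    def read(self, address):
        return self.registers[address]

    def write(self, address, value):
        self.registers[address] = value


class SoftChannel:
    """A channel whose initial value comes from a one-time hardware readback."""

    def __init__(self, device, address):
        self.device = device
        self.address = address
        # Single read on startup: adopt whatever the hardware is already outputting.
        self.value = device.read(address)

    def put(self, value):
        self.value = value
        self.device.write(self.address, value)


def start_ioc(device):
    """Create one channel per register without writing anything to the device."""
    return {addr: SoftChannel(device, addr) for addr in device.registers}


unit = FakeAcromag({0: 1234, 1: -250})   # arbitrary register counts
channels = start_ioc(unit)               # simulated IOC (re)start
print({addr: ch.value for addr, ch in channels.items()})
```

The point of the design is that startup only reads: nothing is written to the unit until someone explicitly changes a channel, so the output stays where it was.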

Aaron and I set 12/4 as a tentative date when we will be ready to attempt a swap. Until then the cabling needs to be finished and a channel database file needs to be prepared.

  13445   Wed Nov 22 11:51:38 2017 gautamConfigurationComputersnodus post OS migration admin

Confirmed that this crontab is running - the daily backup of the crontab seems to have successfully executed, and there is now a file crontab_nodus.ligo.caltech.edu.20171122080001 in the directory quoted below. The $HOSTNAME is now "nodus.ligo.caltech.edu" whereas it was previously just "nodus", so the file names are a bit longer now, but I guess that's fine...
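For reference, the naming pattern of these backup files can be reproduced as below. This is a sketch inferred from the two file names in this thread; the actual backup script is not shown here, and crontab_backup_name is a hypothetical helper.

```python
from datetime import datetime


def crontab_backup_name(hostname, when):
    """Return crontab_<hostname>.<YYYYMMDDhhmmss>, the pattern seen in the backup directory."""
    return f"crontab_{hostname}.{when:%Y%m%d%H%M%S}"


# New-style name (full $HOSTNAME) and old-style name (short hostname).
print(crontab_backup_name("nodus.ligo.caltech.edu", datetime(2017, 11, 22, 8, 0, 1)))
print(crontab_backup_name("nodus", datetime(2017, 11, 17, 8, 0, 1)))
```

Anything that globs for crontab_nodus.2* would miss the new files, so scripts that look up old backups may need the longer prefix.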

Quote:

I restored the nodus crontab (copied over from the Nov 17 backup of the same at /opt/rtcds/caltech/c1/scripts/crontab/crontab_nodus.20171117080001). There wasn't a crontab, so I made one using sudo crontab -e.

This crontab is supposed to execute some backup scripts, send pizza emails, check chiara disk usage, and backup the crontab itself.

I've commented out the backup of nodus' /etc and /export for now, while we get back to fully operational nodus (though we also have a backup of /cvs/cds/caltech/nodus_backup on the external LaCie drive), they can be re-enabled by un-commenting the appropriate lines in the crontab.

 

 

  13458   Wed Nov 29 21:40:30 2017 johannesOmnistructureComputersSlow DAQ replacement computer progress

[Aaron, Johannes]

We configured the AtomServer for the Martian network today. Hostname is c1auxex2, IP is 192.168.113.49. Remote access over SSH is enabled.

There will be 6 acromag units served by c1auxex2.

Hostname          Type   IP Address
c1auxex-xt1221a   1221   192.168.113.130
c1auxex-xt1221b   1221   192.168.113.131
c1auxex-xt1221c   1221   192.168.113.132
c1auxex-xt1541a   1541   192.168.113.133
c1auxex-xt1541b   1541   192.168.113.134
c1auxex-xt1111a   1111   192.168.113.135
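
A small sanity check on the table above, sketched in Python: it confirms that every address is unique and on the Martian subnet, and prints host-table style lines. The /24 prefix is an assumption, and hosts_lines is a hypothetical helper.

```python
import ipaddress

# (hostname, Acromag model, IP) as listed in the table above.
ACROMAGS = [
    ("c1auxex-xt1221a", "1221", "192.168.113.130"),
    ("c1auxex-xt1221b", "1221", "192.168.113.131"),
    ("c1auxex-xt1221c", "1221", "192.168.113.132"),
    ("c1auxex-xt1541a", "1541", "192.168.113.133"),
    ("c1auxex-xt1541b", "1541", "192.168.113.134"),
    ("c1auxex-xt1111a", "1111", "192.168.113.135"),
]

MARTIAN = ipaddress.ip_network("192.168.113.0/24")  # assumed prefix length


def hosts_lines(units, network):
    """Return '<ip>\t<hostname>' lines, rejecting duplicate or off-subnet addresses."""
    seen = set()
    lines = []
    for hostname, _model, ip in units:
        addr = ipaddress.ip_address(ip)
        if addr not in network:
            raise ValueError(f"{hostname}: {ip} is not in {network}")
        if addr in seen:
            raise ValueError(f"{hostname}: duplicate address {ip}")
        seen.add(addr)
        lines.append(f"{ip}\t{hostname}")
    return lines


print("\n".join(hosts_lines(ACROMAGS, MARTIAN)))
```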

Some hardware to assemble the Acromag box and adapter PCBs is still missing, and the wiring and channel definitions have to be finalized. The port driver initialization instructions and channel definitions are currently stored locally in /home/controls/modbusIOC/ and will eventually be migrated to a shared location, but we need to decide how exactly we want to set up this infrastructure.

  • Should the new machines have the same hostnames as the ones they're replacing? For the transition we simply named it c1auxex2.
  • Because the communication of the server machine with the DAQ modules is happening over TCP/IP and not some VME backplane bus we could consolidate machines, particularly in the vertex area.
  • It would be good to use the fact that these SuperMicro servers have 2+ ethernet ports to separate CDS EPICS traffic from the modbus traffic. That would also keep the 30+ IPs for the Acromag thingies off the Martian host tables.
  13461   Sun Dec 3 05:25:59 2017 gautamConfigurationComputerssendmail installed on nodus

Pizza mail didn't go out last weekend - looking at logfile, it seems like the "sendmail" service was missing. I installed sendmail following the instructions here: https://tecadmin.net/install-sendmail-server-on-centos-rhel-server/

The one difference: to start the sendmail service I used systemctl rather than init.d, i.e. I ran systemctl start sendmail.service (as root). Test email to myself works. Let's see if it works this weekend. Of course this isn't so critical, more important are the maintenance emails that may need to go out (e.g. disk usage alert on chiara / N2 pressure check, which look like nodus' responsibilities).

  13462   Sun Dec 3 17:01:08 2017 KojiConfigurationComputerssendmail installed on nodus

An email has come at 5PM on Dec 3rd.

 

  13463   Mon Dec 4 22:06:07 2017 johannesOmnistructureComputersAcromag XEND progress

I wired up the power distribution and ethernet cables in the Acromag chassis today. For the time being it's all kind of loose in there but tomorrow the last parts should arrive from McMaster to put everything in its place. I had to unplug some of the wiring that Aaron had already done but labeled everything before I did so. I finalized the IP configuration via USB for all the units, which are now powered through the chassis and active on the network.

I started transcribing the database file ETMXaux.db that is loaded by c1auxex in the format required by the Acromags and made sure that the new c1auxex2 properly functions as a server, which it does.

ToDo-list:

  • Need to calibrate the +/- 10V swing of the analog channels via the USB utility, but that requires wiring the channels to the connectors and should probably be done once the unit sits in the rack
  • Need to wire power from the Sorensens into the chassis. There are +/- 5V, +/- 15V and +/- 20V present. The Acromags need only +12V-32V, for which I plan to use the +20V, and an excitation voltage for the binary channels, for which I'm going to wire the +5V. Should do this through the fuse rails on the side.
  • The current slow binary channels are sinking outputs, same as the XT1111 16-channel module we have. The additional 4 binary outputs of the XT1541 are sourcing, and I'm currently not sure if we can use them with the sos driver and whitening vme boards that get their binary control signals from the slow system.
  • Confirm switching of binary channels (haven't used model XT1111 before, but I assume the definitions are identical to XT1121)
  • Setup remaining essential EPICS channels and confirm that dimensions are the same (as in both give the same voltage for the same requested value)
  • Disconnect DIN cables, attach adapter boards + DSUB cables
  • Testing

 

Quote:

[Aaron, Johannes]

We configured the AtomServer for the Martian network today. Hostname is c1auxex2, IP is 192.168.113.49. Remote access over SSH is enabled.

There will be 6 acromag units served by c1auxex2.

Hostname          Type   IP Address
c1auxex-xt1221a   1221   192.168.113.130
c1auxex-xt1221b   1221   192.168.113.131
c1auxex-xt1221c   1221   192.168.113.132
c1auxex-xt1541a   1541   192.168.113.133
c1auxex-xt1541b   1541   192.168.113.134
c1auxex-xt1111a   1111   192.168.113.135

Some hardware to assemble the Acromag box and adapter PCBs is still missing, and the wiring and channel definitions have to be finalized. The port driver initialization instructions and channel definitions are currently stored locally in /home/controls/modbusIOC/ and will eventually be migrated to a shared location, but we need to decide how exactly we want to set up this infrastructure.

  • Should the new machines have the same hostnames as the ones they're replacing? For the transition we simply named it c1auxex2.
  • Because the communication of the server machine with the DAQ modules is happening over TCP/IP and not some VME backplane bus we could consolidate machines, particularly in the vertex area.
  • It would be good to use the fact that these SuperMicro servers have 2+ ethernet ports to separate CDS EPICS traffic from the modbus traffic. That would also keep the 30+ IPs for the Acromag thingies off the Martian host tables.
  13468   Thu Dec 7 22:24:04 2017 johannesOmnistructureComputersAcromag XEND progress

 

Quote:
 
  • Need to calibrate the +/- 10V swing of the analog channels via the USB utility, but that requires wiring the channels to the connectors and should probably be done once the unit sits in the rack
  • Need to wire power from the Sorensens into the chassis. There are +/- 5V, +/- 15V and +/- 20V present. The Acromags need only +12V-32V, for which I plan to use the +20V, and an excitation voltage for the binary channels, for which I'm going to wire the +5V. Should do this through the fuse rails on the side.
  • The current slow binary channels are sinking outputs, same as the XT1111 16-channel module we have. The additional 4 binary outputs of the XT1541 are sourcing, and I'm currently not sure if we can use them with the sos driver and whitening vme boards that get their binary control signals from the slow system.
  • Confirm switching of binary channels (haven't used model XT1111 before, but I assume the definitions are identical to XT1121)
  • Setup remaining essential EPICS channels and confirm that dimensions are the same (as in both give the same voltage for the same requested value)
  • Disconnect DIN cables, attach adapter boards + DSUB cables
  • Testing

Getting the chassis ready took a little longer than anticipated, mostly because I had not looked into the channel list myself before and forgot about Lydia's post which mentions that some of the switching controls have to be moved from the fast to the slow DAQ. We would need a total of 5+5+4+8=22 binary outputs. With the existing Acromag units we have 16 sinking outputs and 8 sourcing outputs. I looked through all the Eurocrate modules and confirmed that they all use the same switch topology which has sourcing inputs.

While one can use a pull-down resistor to control a sourcing input with a sourcing output, pulling down the MAX333A input (datasheet says logic low is <0.8V) requires something like 100 Ohms for the pull-down resistor, which would require ~150mA of current PER CHANNEL, which is unreasonable. Instead, I asked Steve to buy a second XT1111 and modified the chassis to accommodate more Acromag units.
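
The arithmetic behind this decision can be sketched as follows. The channel counts and the 100 Ohm value come from the text above; SUPPLY_V = 15 V is an assumption, inferred only from the quoted ~150mA through 100 Ohms and not stated in the entry. Both helper functions are hypothetical.

```python
# Values from the entry, except SUPPLY_V, which is an assumption (see lead-in).
NEEDED = 5 + 5 + 4 + 8      # binary outputs required
SINKING_AVAILABLE = 16      # the one XT1111 on hand
XT1111_OUTPUTS = 16         # outputs per additional XT1111
PULLDOWN_OHM = 100.0        # pull-down needed to reach logic low
SUPPLY_V = 15.0             # ASSUMED voltage the sourcing output drives into the resistor


def extra_xt1111_units(needed, available, per_unit):
    """Number of additional units required to cover the shortfall (ceiling division)."""
    shortfall = max(0, needed - available)
    return -(-shortfall // per_unit)


def pulldown_current_ma(supply_v, resistor_ohm):
    """Current in mA that a sourcing output must deliver into the pull-down (I = V / R)."""
    return 1000.0 * supply_v / resistor_ohm


print("outputs needed:", NEEDED)
print("extra XT1111 units:", extra_xt1111_units(NEEDED, SINKING_AVAILABLE, XT1111_OUTPUTS))
print("pull-down current per channel [mA]:", pulldown_current_ma(SUPPLY_V, PULLDOWN_OHM))
```

With 22 outputs needed and only 16 sinking outputs available, the shortfall of 6 is covered by a single extra XT1111, which avoids the pull-down current entirely.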

I have now finished wiring the chassis (except for 8 remaining bypass controls to the whitening board which need the second XT1111), calibrated all channels in use, confirmed all pin locations via the existing breakout boards and DCC drawings for the eurocrate modules, and today Steve and I added more fuses to the DIN rail power distribution for +20V and +15V.

There was not enough contingent free space in the XEND rack to mount the chassis, so for now I placed it next to it.

c1auxex2 is currently hosting all original physical c1auxex channels (not yet calc records) under their original name with an _XT added at the end to avoid duplicate channel names. c1auxex is still in control of ETMX. All EPICS channels hosted by c1auxex2 are in dimensions of Volts. The plan for tomorrow is to take c1auxex off the grid, rename the c1auxex2 hosted channels and transfer ETMX controls to it, provided we can find enough 37pin DSub cables (8). I made 5 adapter boards for the 5 Eurocrate modules that need to talk to the slow DAQ through their backplane connector.

  13469   Fri Dec 8 12:06:59 2017 johannesOmnistructureComputersc1auxex2 ready - but need more cables

The new slow machine c1auxex2 is ready to deploy. Unfortunately we don't have enough 37pin DSub cables to connect all channels. In fact, we need a total of 8, and I found only three male-male cables and one gender changer. I asked Steve to buy more.

Over the past week I have transferred all EPICS records - soft channels and physical ones - from c1auxex to c1auxex2, making changes where needed. Today I started the in-situ testing:

  1. Unplugged ETMX's satellite box
  2. Unplugged the eurocrate backplane DIN cables from the SOS Driver and QPD Whitening filter modules (the ones that receive ao channels)
  3. Measured output voltages on the relevant pins for comparison after the swap
  4. Turned off c1auxex by key, removed ethernet cable
  5. Started the modbus ioc on c1auxex2
  6. Slow machine indicator channels came online, ETMX Watchdog was responsive (but didn't have anything to do due to missing inputs) and reporting. PIT/YAW sliders function as expected
  7. Restoring the previous settings gives output voltages close to the previous values, in fact the exact values requested (due to fresh calibration)
  8. Last step is to go live with c1auxex2 and confirm the remaining channels work as expected.

I copied the relevant files to start the modbus server to /cvs/cds/caltech/target/c1auxex2, although kept local copies in /home/controls/modbusIOC/ from which they're still run.

I wonder what's the best practice for this. Probably to store the database files centrally and load them over the network on server start?

  13487   Mon Dec 18 17:48:09 2017 ranaUpdateComputersrossa: SL7.3 upgrade continues

Following instructions from LLO-CDS for the rossa upgrade. Last time there were some issues with not being able to access the LLO EPEL repos, but this time it seems to be working fine.

After adding font aliases, need to run 'sudo xset fp rehash' to get the new aliases to take hold. Afterwards, am able to use MEDM and sitemap just fine.

But diaggui won't run because of a lib-sasl error. Try 'sudo yum install gds-all'.

diaggui: error while loading shared libraries: libsasl2.so.2: cannot open shared object file: No such file or directory (have contacted LLO CDS admins)

X-windows keeps crashing with SL7 and this big monitor. Followed instructions on the internet to remove the generic 'Nouveau' driver and install the proprietary NVIDIA drivers by dropping to run level 3 and running some command line hoodoo to modify the X-files. Now I can even put the mouse on the left side of the screen and it doesn't crash.

  13504   Fri Jan 5 17:50:47 2018 ranaConfigurationComputersmotif on nodus

I had to do 'sudo yum install motif' on nodus so that we could get libXm.so.4 so that we could run MEDM. Works now.

  13539   Fri Jan 12 12:31:04 2018 gautamConfigurationComputerssendmail troubles on nodus

I'm having trouble getting the sendmail service going on nodus since the Christmas day power failure - for some reason, it seems like the mail server that sendmail uses to send out emails on nodus (mx1.caltech.iphmx.com, IP=68.232.148.132) is on a blacklist! Not sure how exactly to go about remedying this.

Running sudo systemctl status sendmail.service -l also shows a bunch of suspicious lines:

Jan 12 10:15:27 nodus.ligo.caltech.edu sendmail[6958]: STARTTLS=client, relay=cluster6a.us.messagelabs.com., version=TLSv1/SSLv3, verify=FAIL, cipher=DHE-RSA-AES256-GCM-SHA384, bits=256/256
Jan 12 10:15:45 nodus.ligo.caltech.edu sendmail[6958]: w0A7QThE032091: to=<umakant.rapol@iiserpune.ac.in>, ctladdr=<controls@nodus.ligo.caltech.edu> (1001/1001), delay=2+10:49:16, xdelay=00:00:39, mailer=esmtp, pri=5432408, relay=cluster6a.us.messagelabs.com. [216.82.251.230], dsn=4.0.0, stat=Deferred: 421 Service Temporarily Unavailable
Jan 12 11:15:23 nodus.ligo.caltech.edu sendmail[10334]: STARTTLS=client, relay=cluster6a.us.messagelabs.com., version=TLSv1/SSLv3, verify=FAIL, cipher=DHE-RSA-AES256-GCM-SHA384, bits=256/256
Jan 12 11:15:31 nodus.ligo.caltech.edu sendmail[10334]: w0A7QThE032091: to=<umakant.rapol@iiserpune.ac.in>, ctladdr=<controls@nodus.ligo.caltech.edu> (1001/1001), delay=2+11:49:02, xdelay=00:00:27, mailer=esmtp, pri=5522408, relay=cluster6a.us.messagelabs.com. [216.82.251.230], dsn=4.0.0, stat=Deferred: 421 Service Temporarily Unavailable
Jan 12 12:15:25 nodus.ligo.caltech.edu sendmail[13747]: STARTTLS=client, relay=cluster6a.us.messagelabs.com., version=TLSv1/SSLv3, verify=FAIL, cipher=DHE-RSA-AES256-GCM-SHA384, bits=256/256
Jan 12 12:15:42 nodus.ligo.caltech.edu sendmail[13747]: w0A7QThE032091: to=<umakant.rapol@iiserpune.ac.in>, ctladdr=<controls@nodus.ligo.caltech.edu> (1001/1001), delay=2+12:49:13, xdelay=00:00:33, mailer=esmtp, pri=5612408, relay=cluster6a.us.messagelabs.com. [216.82.251.230], dsn=4.0.0, stat=Deferred: 421 Service Temporarily Unavailable

 

Why is nodus attempting to email umakant.rapol@iiserpune.ac.in?
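
To answer questions like this more quickly, the delivery lines can be pulled apart into fields. A minimal sketch, assuming the "prefix: queue id: key=value, key=value, ..." layout seen in the lines above; parse_delivery and is_deferred are hypothetical helpers, not part of sendmail.

```python
SAMPLE = (
    "Jan 12 10:15:45 nodus.ligo.caltech.edu sendmail[6958]: w0A7QThE032091: "
    "to=<umakant.rapol@iiserpune.ac.in>, ctladdr=<controls@nodus.ligo.caltech.edu> (1001/1001), "
    "delay=2+10:49:16, xdelay=00:00:39, mailer=esmtp, pri=5432408, "
    "relay=cluster6a.us.messagelabs.com. [216.82.251.230], dsn=4.0.0, "
    "stat=Deferred: 421 Service Temporarily Unavailable"
)


def parse_delivery(line):
    """Split a sendmail delivery line into a dict of fields; return None for other lines."""
    parts = line.split(": ", 2)
    if len(parts) < 3 or not parts[2].startswith("to="):
        return None  # e.g. the STARTTLS lines, which carry no queue id
    record = {"queue_id": parts[1]}
    for field in parts[2].split(", "):
        key, sep, value = field.partition("=")
        if sep:
            record[key] = value
    return record


def is_deferred(record):
    """True if the parsed record reports a deferred (temporary failure) delivery."""
    return record is not None and record.get("stat", "").startswith("Deferred")


rec = parse_delivery(SAMPLE)
print(rec["queue_id"], rec["to"], rec["stat"])
```

All three deferred lines above share the queue id w0A7QThE032091, i.e. it is one queued message being retried hourly rather than three separate emails.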

  13540   Fri Jan 12 16:01:27 2018 KojiConfigurationComputerssendmail troubles on nodus

I personally don't like the idea of having sendmail (or something similar like postfix) on a personal server as it requires a lot of maintenance cost (like security updates, configuration, etc). If we can use an external mail service (like gmail) via the gmail API in python, that would ease our worries, I thought.

  13542   Fri Jan 12 18:22:09 2018 gautamConfigurationComputerssendmail troubles on nodus

Okay I will port awade's python mailer stuff for this purpose.

gautam 14Jan2018 1730: Python mailer has been implemented: see here for the files. On shared drive, the files are at /opt/rtcds/caltech/c1/scripts/general/pizza/pythonMailer/

gautam 11Feb2018 1730: The python mailer had never once worked successfully in automatically sending the message. I realized this may be because I had put the script on the root user's crontab, but had set up the authentication keyring with the password for the mailer on the controls user. So I have now set up a controls user crontab, which for now just runs the pizza mailing. Let's see if this works next Sunday...
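
The suspected failure mode can be sketched like this. This is not the actual mailer in pythonMailer/: the `store` dict stands in for a per-user keyring, the addresses are placeholders, and the SMTP-over-SSL route to smtp.gmail.com:465 is an assumed transport.

```python
import getpass
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Compose a plain-text email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def lookup_password(store, user=None):
    """Per-user credential lookup; `store` is a hypothetical stand-in for a keyring."""
    user = user or getpass.getuser()
    try:
        return store[user]
    except KeyError:
        raise LookupError(f"no mailer credential stored for user {user!r}") from None


def send(msg, password, host="smtp.gmail.com", port=465):
    """Send via SMTP over SSL (assumed transport; not called in this sketch)."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(msg["From"], password)
        server.send_message(msg)


store = {"controls": "app-password-placeholder"}   # credential saved as 'controls' only
msg = build_message("sender@example.com", "list@example.com", "Pizza", "Reminder")

for user in ("controls", "root"):
    try:
        lookup_password(store, user)
        print(user, "-> credential found")
    except LookupError as err:
        print(user, "->", err)
```

A job in root's crontab runs as root, so a credential stored under the controls account is simply not visible to it, which matches the symptom of manual runs working and scheduled runs failing.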

Quote:

I personally don't like the idea of having sendmail (or something similar like postfix) on a personal server as it requires a lot of maintenance cost (like security updates, configuration, etc). If we can use an external mail service (like gmail) via the gmail API in python, that would ease our worries, I thought.

 

  13545   Sat Jan 13 02:36:51 2018 ranaConfigurationComputerssendmail troubles on nodus

I think sendmail is required on nodus since that's how the dokuwiki works. That's why the dokuwiki was trying to send an email to Umakant.

  13546   Sat Jan 13 03:20:55 2018 KojiConfigurationComputerssendmail troubles on nodus

I know it, and I don't like it. DokuWiki seems to allow us to use an external server for notification emails. That would be the way to go.

  13681   Tue Mar 13 20:03:16 2018 johannesConfigurationComputersc1auxex replacement

I assembled the rack-mount server that will long-term replace c1auxex, so we can return the borrowed unit to Larry.

SUPERMICRO SYS-5017A-EP Specs:

  • Intel Atom N2800 (2 cores, 1.8GHz, 1MB, 64-bit)
  • 4GB (2x2GB) DDR3 RAM
  • 128 GB SSD


I installed a standard Debian Jessie distribution, with option LXDE for minimal resource usage. Steps taken after fresh install

  1. Give controls sudo permission: usermod -aG sudo controls
  2. mkdir /cvs/cds
  3. apt-get install nfs-common
  4. Added line "chiara:/home/cds              /cvs/cds        nfs     rw,bg,nfsvers=3" to end of /etc/fstab
  5. Configured network adapter in /etc/network/interfaces
            iface eth0 inet static
            address 192.168.113.48
            netmask 255.255.255.0
            gateway 192.168.113.2
            dns-nameservers 192.168.113.104 131.215.125.1 131.215.139.100
            dns-search martian

    I first assigned the IP 192.168.113.59 of the original c1auxex, but for some reason my ssh connections kept failing mid-session. After I switched to a different IP the disruption no longer happened.
  6. Add lines "search martian" and "nameserver 192.168.113.104" to /etc/resolv.conf
  7. apt-get install openssh-server
    At this point the unit was ready for remote connections on the martian network, and I moved it to the XEND.
  8. Added lines to /home/controls/.bashrc to set paths and environment variables:
    export PATH=/cvs/cds/rtapps/epics-3.14.12.2_long/base/bin/linux-x86_64:/cvs/cds/rtapps/epics-3.14.12.2_long/extensions/bin/linux-x86_64:$PATH
    export HOST_ARCH=linux-x86_64
    export EPICS_HOST_ARCH=linux-x86_64
    export RPN_DEFNS=~/.defns.rpn
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/cvs/cds/rtapps/epics-3.14.12.2_long/base/lib/linux-x86_64:/cvs/cds/rtapps/epics-3.14.12.2_long/modules/modbus/lib/linux-x86_64/:/cvs/cds/rtapps/epics-3.14.12.2_long/modules/asyn/lib/linux-x86_64
  9. apt-get install libmotif-common libmotif4 libxp6 (required to run burtwb utility)
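
The exports in step 8 all derive from one EPICS root and one architecture string, which can be made explicit as below. A sketch only: epics_env is a hypothetical helper, and RPN_DEFNS is left out because it does not depend on the EPICS root.

```python
EPICS_ROOT = "/cvs/cds/rtapps/epics-3.14.12.2_long"
ARCH = "linux-x86_64"


def epics_env(root, arch, path="", ld_library_path=""):
    """Build the environment from step 8; existing PATH/LD_LIBRARY_PATH values are optional."""
    bin_dirs = [f"{root}/base/bin/{arch}", f"{root}/extensions/bin/{arch}"]
    lib_dirs = [
        f"{root}/base/lib/{arch}",
        f"{root}/modules/modbus/lib/{arch}/",
        f"{root}/modules/asyn/lib/{arch}",
    ]
    return {
        # New bin directories go in front of the existing PATH ...
        "PATH": ":".join(bin_dirs + ([path] if path else [])),
        "HOST_ARCH": arch,
        "EPICS_HOST_ARCH": arch,
        # ... while the library directories are appended to LD_LIBRARY_PATH.
        "LD_LIBRARY_PATH": ":".join(([ld_library_path] if ld_library_path else []) + lib_dirs),
    }


for name, value in epics_env(EPICS_ROOT, ARCH).items():
    print(f"export {name}={value}")
```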

The server is ready to take over for c1auxex2 and does not need any local epics compiled, since it can run the 3.14.12.2_long binaries in /cvs/cds.

Attachment 1: IMG_20180313_105154890.jpg
Attachment 2: IMG_20180313_133031002.jpg
  13682   Wed Mar 14 23:58:30 2018 johannesConfigurationComputersc1auxex replacement

I replaced the borrowed server with the permanent one today. Before removing the current server, I performed several additional preparations:

  • Updated Chiara host tables to IP 192.168.113.48 for c1auxex
  • apt-get install procserv
  • copied ETMXaux2.* files in /cvs/cds/caltech/target/c1auxex2 to ETMXaux.* and changed references from /opt/rtcds/epics (which was a local directory on c1auxex2) to /cvs/cds/rtapps/epics-3.14.12.2_long in the copied files
  • Added instruction
    Environment="LD_LIBRARY_PATH=/cvs/cds/rtapps/epics-3.14.12.2_long/base/lib/linux-x86_64:/cvs/cds/rtapps/epics-3.14.12.2_long/modules/modbus/lib/linux-x86_64/:/cvs/cds/rtapps/epics-3.14.12.2_long/modules/asyn/lib/linux-x86_64"
    to /etc/systemd/system/modbusIOC.service  (required for burtwb dependencies)

Then I replaced the server:

  1. IFO was in LSC mode with both arms locked
  2. Backed up ETMX alignment using save feature in IFOalign screen
  3. Disengaged LSC mode
  4. Shut down ETMX watchdog
  5. Disconnected ETMX satellite box
  6. Shut down c1auxex2 and c1auxex
  7. Performed the server swap
  8. Booted c1auxex
  9. Made sure EPICS channels were back online and channel defaults were restored
  10. Reconnected satellite box
  11. Turned on watchdog
  12. Turned on OpLevs
  13. Engaged LSC mode -> both arms were instantly locked

I returned c1auxex2 to Larry, who needed it back asap because of some hardware failure

Steve: Acromag XT1221 ordered 3-15-18

  13683   Thu Mar 15 16:00:25 2018 Larry WallaceSummaryComputersCert renewal for NODUS

The cert for nodus has been renewed for another 2 years.

The following is the basic procedure for getting a new cert: (Note certs are only good for two years as of 2018)
openssl req -sha256 -nodes -newkey rsa:2048 -keyout nodus.ligo.caltech.edu.key -out nodus.ligo.caltech.edu.csr
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:California
Locality Name (eg, city) []:Pasadena
Organization Name (eg, company) [Internet Widgits Pty Ltd]:California Institute of Technology
Organizational Unit Name (eg, section) []:LIGO
Common Name (eg, YOUR name) []:nodus.ligo.caltech.edu

Leave the e-mail address, challenge password and optional company name blank. A new private key will be generated.
chown root nodus.ligo.caltech.edu.key
chgrp root nodus.ligo.caltech.edu.key
chmod 0600 nodus.ligo.caltech.edu.key

The nodus.ligo.caltech.edu.csr file is what is sent in for the cert.
This file should be sent to either ryan@ligo.caltech.edu or security@caltech.edu and copy wallace_l@ligo.caltech.edu.

A URL link with the new cert to be downloaded will be sent to the requestor.

Once the new cert and intermediate cert are downloaded, they can be copied and renamed.

The PEM-encoded host certificate by itself is saved at:

  /etc/httpd/ssl/nodus.ligo.caltech.edu.crt

The nodus.ligo.caltech.edu.key file should be in the same directory or whichever directory is indicated in the ssl.conf located in /etc/httpd/conf.d/  directory.

httpd will need to be restarted in order for it to see the new cert.
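
The chmod 0600 step in the procedure above can be checked programmatically before restarting httpd. A minimal sketch using a temporary file in place of the real key; lock_down_key and is_private are hypothetical helpers, and the chown/chgrp to root is omitted because it requires root.

```python
import os
import stat
import tempfile


def lock_down_key(path):
    """Apply chmod 0600 and return the resulting permission bits."""
    os.chmod(path, 0o600)
    return stat.S_IMODE(os.stat(path).st_mode)


def is_private(path):
    """True if neither group nor others have any access to the file."""
    return (stat.S_IMODE(os.stat(path).st_mode) & 0o077) == 0


# Demonstration on a throwaway file standing in for the .key file.
with tempfile.NamedTemporaryFile() as key:
    os.chmod(key.name, 0o644)
    print("private before:", is_private(key.name))
    print("mode after:", oct(lock_down_key(key.name)))
    print("private after:", is_private(key.name))
```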

 

  13687   Mon Mar 19 14:39:09 2018 johannesConfigurationComputersc1auxex replacement

[gautam, johannes]

The temperature control output channel for the XEND seismometer wasn't working properly. The EPICS channel existed, could be written to and read from, but no physical voltage was observed on the (confirmed properly) wired connector.

The Acromag DAC that outputs this channel was completely spare in the original scheme and does not serve any other channels at the moment. We found it to be unresponsive to ping from the host machine (reminder: the Acromags are on their own subnet with IPs 192.168.114.xxx connected to the secondary ethernet adapter of c1auxex), while all others returned the ping just fine. The modules have daisy-chained ethernet connections, and the one Acromag unit behind the unresponsive one in the chain was still responding to ping and its channels were working, so it couldn't have been a problem with the (ethernet) cabling.

Gautam and I power-cycled the chassis and server, which resolved the issue. The channel is now outputting the requested voltage on the Out1 BNC connector of the chassis (front). When I was setting up the whole system and did frequent rebooting and IP-redefinitions I have seen network issues arise between server and Acromags. In particular, when changing the network settings server-side, the Acromags needed to reboot occasionally. So this whole problem was probably due to the recent server-swap, as the chassis had not been power-cycled since.
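
The ping check we did by hand could be scripted along these lines. A sketch, assuming the Linux iputils `ping` flags (-c count, -W timeout in seconds); the unit names in the example are placeholders, since the actual 192.168.114.xxx addresses are not listed here.

```python
import subprocess


def ping(host, timeout_s=1):
    """Return True if a single ICMP echo to `host` succeeds (Linux `ping` flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unresponsive(hosts, is_alive=ping):
    """Return the hosts that fail the liveness check, preserving input order."""
    return [h for h in hosts if not is_alive(h)]


# Dry run with a fake liveness check; pass real hostnames/IPs and omit
# is_alive to use the actual ping.
chain = ["unit-a", "unit-b", "unit-c"]            # placeholder names
print(unresponsive(chain, is_alive=lambda h: h != "unit-b"))
```

Because the units are daisy-chained, a dead unit in the middle of the list with live units after it points away from the cabling, as was the case here.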

 

During the debugging we also found that the c1psl2 channels were not working. This was because I had neglected to update the epics environment variables for the modbus path defined in /cvs/cds/caltech/target/c1psl2/npro_config.cmd from the local installation /opt/epics/ (which doesn't exist on the new server anymore) to the network location /cvs/cds/rtapps/epics-3.14.12.2_long/. This has been fixed and the slow diagnostic PSL channels are recording again.

  13761   Wed Apr 18 17:15:35 2018 ranaConfigurationComputersNODUS: no xmgrace for dataviewer

Turns out, there is no RPM for XmGrace on Scientific Linux 7. Since this is the graphic output of dataviewer, we can't use dataviewer through X windows until this gets fixed. CDS is looking into an XmGrace replacement, but it would be better if we can hijack an alt RH repo to steal a temporary XmGrace RPM. KT has been pinged.
