ID   Date   Author   Type   Category   Subject
  12724   Mon Jan 16 22:03:30 2017   jamie   Configuration   Computers   Megatron update
Quote:
 

We should consider upgrading a few of our workstations to Ubuntu 14 LTS to see how painful it is to run our scripts and DTT and DV. Better to upgrade a bit before we are forced to by circumstance.

I would recommend upgrading the workstations to one of the reference operating systems, either SL7 or Debian squeeze, since that's what the sites are moving towards.  If you do that you can just install all the control room software from the supported repos, and not worry about having to compile things from source anymore.

  12849   Thu Feb 23 15:48:43 2017   johannes   Update   Computers   c1psl un-bootable

Using the PDA520 detector on the AS port I tried to get some better estimates for the round-trip loss in both arms. While setting up the measurement I noticed some strange output on the scope I'm using to measure the amount of reflected light.

The interferometer was aligned using the dither scripts for both arms. Then, ITMY was grossly misaligned in pitch AND yaw such that the PD reading did not change anymore. Thus, only light reflected from the XARM was incident on the AS PD. The scope was showing strange oscillations (Channel 2 is the AS PD signal):

For the measurement we compare the DC level of the reflection with the ETM aligned (and the arm locked) vs a misaligned ETM (only ITM reflection). This ringing could be observed in both states, and was qualitatively reproducible with the other arm. It did not show up in the MC or ARM transmission. I found that changing the pitch of the 'active' ITM (=of the arm under investigation) either way by just a couple of ticks made it go away and settle roughly at the lower bound of the oscillation:

In this configuration the PD output follows the mode cleaner transmission (Channel 3 in the screen caps) quite well, but we can't take the differential measurement like this, because it is impossible to align and lock the arm but then misalign the ITM. Moving the respective other ITM (to check for potential secondary beams) did not seem to have an obvious effect, although I do suspect a ghost/secondary beam to be the culprit. I moved the PDA520 on the optical table but didn't see a change in the ringing amplitude. I do need to check the PD reflection though.

Obviously it will be hard to determine the arm loss this way, but for now I used the averaging function of the scope to get rid of the ringing. What this gave me was:
(16 +/- 9) ppm losses in the x-arm and (-18+/-8) ppm losses in the y-arm

The negative loss obviously makes little sense, and even the x-arm number seems a little too low to be true. I strongly suspect the ringing is responsible and wanted to investigate this further today, but a problem with c1psl came up that shut down all work on this until it is fixed:

I found the PMC unlocked this morning and c1psl (amongst other slow machines) was unresponsive, so I power-cycled them. All except c1psl came back to normal operation. The PMC transmission, as recorded by c1psl,  shows that it has been down for several days:

Repeated attempts to reset and/or power-cycle it by Gautam and myself could not bring it back. The fail indicator LED of a single daughter card (the DOUT XVME-212) turns off after reboot, all others stay lit. The sysfail LED on the crate is also on, but according to elog 10015 this is 'normal'. I'm following up that post's elog tree to monitor the startup of c1psl through its system console via a serial connection to find out what is wrong.

  12850   Thu Feb 23 18:52:53 2017   rana   Update   Computers   c1psl un-bootable

The fringes seen on the oscope are most likely due to the interference from multiple light beams. If there are laser beams hitting mirrors which are moving, the resultant interference signal could be modulated at several Hertz, if, for example, one of the mirrors had its local damping disabled.

  12851   Thu Feb 23 19:44:48 2017   johannes   Update   Computers   c1psl un-bootable

Yes, that was one of the things that I wanted to look into. One thing Gautam and I did that I didn't mention was to reconnect the SRM satellite box and move the optic around a bit, which didn't change anything. Once the c1psl problem is fixed we'll resume with that.

Quote:

The fringes seen on the oscope are mostly likely due to the interference from multiple light beams. If there are laser beams hitting mirrors which are moving, the resultant interference signal could be modulated at several Hertz, if, for example, one of the mirrors had its local damping disabled.

 

Speaking of which:

Using one of the grey RJ45 to D-Sub cables with an RS232 to USB adapter I was able to capture the startup log of c1psl (using the USB-camera Windows laptop). I also logged the startup of the "healthy" c1aux; both are attached. c1psl stalls at the point where c1aux starts testing for present VME modules and doesn't continue; however, it is not strictly hung up, as it still registers to the logger when external login attempts via telnet occur. The telnet client simply reports that the "shell is locked" and exits. It is possible that one of the daughter cards causes this. This seems to happen after iocInit is called by the startup script at /cvs/cds/caltech/target/c1psl/startup.cmd, as it never gets to the next item "coreRelease()". Gautam and I were trying to find out what happens inside iocInit, but it's not clear to us at this point from where it is even called. iocInit.c and compiled binaries exist in several places on the shared drive. However, all belong to R3.14.x EPICS releases, while the logfile states that the R3.12.2 EPICS core is used when iocInit is called.

Next we'll interrupt the autoboot procedure and try to work with the machine directly.
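For anyone repeating this without the Windows laptop: below is a minimal sketch of logging the same serial console from a Linux machine with pyserial. The device node, baud rate, and 8N1 framing are assumptions (guesses for a typical VME controller console), not verified settings.

# Minimal serial-console logger sketch (requires pyserial).
# Port name and 9600 8N1 settings are assumptions -- check the actual adapter/console.
import serial
import time

PORT = "/dev/ttyUSB0"   # hypothetical device node for the RS232-USB adapter
BAUD = 9600             # assumed console baud rate

with serial.Serial(PORT, BAUD, bytesize=8, parity="N", stopbits=1, timeout=1) as con, \
     open("c1psl_startup.log", "ab") as log:
    while True:
        data = con.read(1024)      # returns b"" if nothing arrived before the timeout
        if data:
            log.write(data)
            log.flush()
        else:
            time.sleep(0.1)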

  12852   Fri Feb 24 20:38:01 2017   johannes   Update   Computers   c1psl boot-stall culprit identified

[Gautam, Johannes]

c1psl finally booted up again, PMC and IMC are locked.

Trying to identify the hiccup from the source code was fruitless. However, since the PMCTRANSPD channel acquisition failure occurred long before the actual slow machine crashed, and since the hiccup in the boot seemed to indicate a problem with daughter module identification, we started removing the DIO and DAQ modules:

  1. Started with the ones whose fail LED stayed lit during the boot process: the DIN (XVME-212) and the three DACs (VMIVME4113). No change.
  2. Also removed the DOUT (XVME-220) and the two ADCs (VMIVME 3113A and VMIVME3123). It boots just fine and can be telnetted into!
  3. Pushed the DIN and the DACs back in. Still boots.
  4. Pushed only VMIVME3123 back in. Boot stalls again.
  5. Removed VMIVME3123, pushed VMIVME 3113A back in. Boots successfully.
  6. Left VMIVME3123 loose in the crate without electrical contact for now.
  7. Proceeded to lock PMC and IMC

The particle counter channel should be working again.

  • VMIVME3123 is a 16-Bit High-Throughput Analog Input Board, 16 Channels with Simultaneous Sample-and-Hold Inputs
  • VMIVME3113A is a Scanning 12-Bit Analog-to-Digital Converter Module with 64 channels

/cvs/cds/caltech/target/c1psl/psl.db lists the following channels for VMIVME3123:

Channels currently in use (and therefore not available in the medm screens):

  • C1:PSL-FSS_SLOW_MON
  • C1:PSL-PMC_PMCERR
  • C1:PSL-FSS_SLOWM
  • C1:PSL-FSS_MIXERM
  • C1:PSL-FSS_RMTEMP
  • C1:PSL-PMC_PMCTRANSPD

Channels not currently in use (?):

  • C1:PSL-FSS_MINCOMEAS
  • C1:PSL-FSS_RCTRANSPD
  • C1:PSL-126MOPA_126MON
  • C1:PSL-126MOPA_AMPMON
  • C1:PSL-FSS_TIDALINPUT
  • C1:PSL-FSS_TIDALSET
  • C1:PSL-FSS_RCTEMP
  • C1:PSL-PPKTP_TEMP

There are plenty of channels available on the asynchronous ADC, so we could wire the relevant ones there if we don't care about the 16 bit synchronous sampling (required for proper functionality?).

Alternatively, we could prioritize the Acromag upgrade on c1psl (DAQ would still be asynchronous, though). The PCBs are coming in next Monday and the front panels on Tuesday.

 

 

Some more info that might come in handy to someone someday:

The (nameless?) Windows 7 laptop that lives near MC2 and is used for the USB microscope was used for interfacing with c1psl. No special drivers were necessary to use the USB to RS232 adapter, and the RJ45 end of the grey homemade DB9 to RJ45 cable was plugged into the top port which is labeled "console 1". I downloaded the program "CoolTerm" from http://freeware.the-meiers.org/#CoolTerm, which is a serial terminal emulator, and it worked out of the box with the adapter. The standard settings worked fine for communicating with c1psl; only a small modification was necessary: in Options>Terminal make sure that "Enter Key Emulation" is changed from "CR+LF" to "CR", otherwise each time 'Enter' is pressed it is actually sent twice.

  12854   Tue Feb 28 01:28:52 2017   johannes   Update   Computers   c1psl un-bootable

It turned out the 'ringing' was caused by the respective other ETM still being aligned. For these reflection measurements both test masses of the other arm need to be misaligned. For the ETM it's sufficient to use the Misalign button in the medm screens, while the ITM has to be manually misaligned to move the reflected beam off the PD.

I did another round of armloss measurements today. I encountered some problems along the way:

  • Some time today (around 6pm) most of the front end models had crashed and needed to be restarted. [GV: actually it was only the models on c1lsc that had crashed. I noticed this on Friday too.]
  • ETMX keeps getting kicked up seemingly randomly. However, it settles fast into its original position.

General Stuff:

  • Oscilloscope should sample both MC power (from MC2 transmitted beam) and AS signal
  • Channel data can only be loaded from the scope one channel at a time, so 'stop' scope acquisition and then grab the relevant channels individually
  • Averaging needs to be restarted every time the mirrors are moved; triggering stop and run remotely via the http interface scripts does this.

Procedure:

  1.     Run LSC Offsets
  2.     With the PSL shutter closed measure scope channel dark offsets, then open shutter
  3.     Align all four test masses with dithering to make sure the IFO alignment is in a known state
  4.     Pick an arm to measure
  5.     Turn the other arm's dither alignment off
  6.     'Misalign' that arm's ETM using medm screen button
  7.     Misalign that arm's ITM manually after disabling its OpLev servos, watching the AS port camera to make sure its reflection no longer hits the PD.
  8.     Disable dithering for primary arm
  9.     Record MC and AS time series from (paused) scope
  10.     Misalign primary ETM
  11.     Repeat scope data recording

Each pair of readings gives the reflected power at the AS port normalized to the IMC stored power:

\widehat{P}=\frac{P_{AS}-\overline{P}_{AS}^\mathrm{dark}}{P_{MC}-\overline{P}_{MC}^\mathrm{dark}}

which is then averaged. The loss is calculated from the ratio of reflected power in the locked (L) vs misaligned (M) state from

\mathcal{L}=\frac{T_1}{4\gamma}\left[1-\frac{\overline{\widehat{P}_L}}{\overline{\widehat{P}_M}} +T_1\right ]-T_2

Acquiring data this way yielded P_L/P_M=1.00507 +/- 0.00087 for the X arm and P_L/P_M=1.00753 +/- 0.00095 for the Y arm. With \gamma_x=0.832 and \gamma_y=0.875 (from m1=0.179, m2=0.226 and 91.2% and 86.7% mode matching in X and Y arm, respectively) this yields round trip losses of:

\mathcal{L}_X=21\pm4\,\mathrm{ppm}  and  \mathcal{L}_Y=13\pm4\,\mathrm{ppm}, assuming a generalized 1% error in test mass transmissivities and modulation indices. As we discussed, this seems a little too good to be true, but at least the numbers are not negative.
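For bookkeeping, here is a short script that evaluates the loss formula above with the measured ratios and gammas quoted in this entry. The T1 and T2 values below are placeholders rather than the actual 40m test mass transmissivities, so the printed numbers only reproduce the quoted ~21/13 ppm if the real values are substituted; the error propagation is likewise only a rough sketch of the "generalized 1%" assumption.

# Round-trip loss from the locked/misaligned reflection ratio (sketch).
# T1, T2 are placeholder transmissivities -- substitute the real 40m values.
import numpy as np

def rt_loss(ratio, gamma, T1, T2):
    """L = T1/(4*gamma) * (1 - P_L/P_M + T1) - T2, as in the formula above."""
    return T1 / (4.0 * gamma) * (1.0 - ratio + T1) - T2

T1 = 1.4e-2   # assumed ITM power transmissivity (placeholder)
T2 = 15e-6    # assumed ETM power transmissivity (placeholder)

for arm, ratio, dratio, gamma in [("X", 1.00507, 0.00087, 0.832),
                                  ("Y", 1.00753, 0.00095, 0.875)]:
    L = rt_loss(ratio, gamma, T1, T2)
    # crude error estimate: ratio uncertainty plus ~1% on the T1/gamma prefactor
    dL = np.sqrt((T1 / (4.0 * gamma) * dratio) ** 2 + (0.01 * (L + T2)) ** 2)
    print("%s arm: %.0f +/- %.0f ppm" % (arm, L * 1e6, dL * 1e6))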

  12943   Thu Apr 13 21:01:20 2017   rana   Configuration   Computers   LG UltraWide on Rossa

we installed a new curved 34" doublewide monitor on Rossa, but it seems like it has a defective dead pixel region in it. Unless it heals itself by morning, we should return it to Amazon. Please don't throw out the packing materials.

Steve 8am next morning: it is still bad. The monitor is cracked. It got kicked while traveling. Its box is damaged in the same place.

Shipped back 4-17-2017

  12965   Wed May 3 16:12:36 2017   johannes   Configuration   Computers   catastrophic multiple monitor failures

It seems we lost three monitors basically overnight.

The main (landscape, left) displays of Pianosa, Rossa and Allegra are all broken with the same failure mode:

their backlights failed. Gautam and I confirmed that there is still an image displayed on all three, just incredibly faint. While Allegra hasn't been used much, we can narrow down that Pianosa's and Rossa's monitors must have failed within 5 or 6 hours of each other, last night.

One could say ... they turned to the dark side.

Quick edit: there was a functioning Dell 24" monitor next to the iMac that we used as a replacement for Pianosa's primary display. Once the new curved display is paired with Rossa we can use its old display for Donatella or Allegra.

  12966   Wed May 3 16:46:18 2017   Koji   Configuration   Computers   catastrophic multiple monitor failures

- Is there any machine that can handle 4K? I have one 4K LCD that is not in use.
- I also can donate one 24" Dell

  12971   Thu May 4 09:52:43 2017   rana   Configuration   Computers   catastrophic multiple monitor failures

That's a new failure mode. Probably we can't trust the power to be safe anymore.

Need Steve to order a couple of surge suppressing power strips for the monitors. The computers are already on the UPS, so they don't need it.

  12978   Tue May 9 15:23:12 2017   Steve   Configuration   Computers   catastrophic multiple monitor failures

Gautam and Steve,

A surge-protective power strip was installed on Friday, May 5, in the Control Room.

Computers not connected to the UPS are plugged into Isobar12ultra.

Quote:

That's a new failure mode. Probably we can't trust the power to be safe anymore.

Need Steve to order a couple of surge suppressing power strips for the monitors. The computers are already on the UPS, so they don't need it.

 

  12993   Mon May 15 20:43:25 2017   rana   Configuration   Computers   catastrophic multiple monitor failures

this is not the right one; this Ethernet-controlled strip is what we want in the racks for remote control.

Buy some of these for the MONITORS.

Quote:

Surge protective power strip was install on Friday, May 5 in the Control Room

Computers not connected to the UPS are plugged into Isobar12ultra.

Quote:

That's a new failure mode. Probably we can't trust the power to be safe anymore.

Need Steve to order a couple of surge suppressing power strips for the monitors. The computers are already on the UPS, so they don't need it.

 

  13037   Sun Jun 4 14:19:33 2017   rana   Frogs   Computers   Network slowdown: Martians are behind a waterwall

A few weeks ago we did some internet speed tests and found a dramatic difference between our general network and our internal Martian network in terms of access speed to the outside world.

As you can see, the speed from nodus is consistent with a Gigabit connection. But the speeds from any machine on the inside are ~100x slower. We need to take a look at our router / NAT setup to see if it's an old hardware problem or just something in the software firewall. By comparison, my home internet download speed test is ~48 Mbit/s; ~6x faster than our CDS computers.


controls@megatron|~> speedtest
/usr/local/bin/speedtest:5: UserWarning: Module dap was already imported from None, but /usr/lib/python2.7/dist-packages is being added to sys.path
  from pkg_resources import load_entry_point
Retrieving speedtest.net configuration...
Testing from Caltech (131.215.115.189)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Race Communications (Los Angeles, CA) [29.63 km]: 6.52 ms
Testing download speed................................................................................
Download: 6.35 Mbit/s
Testing upload speed................................................................................................
Upload: 5.10 Mbit/s
controls@megatron|~> exit
logout
Connection to megatron closed.
controls@nodus|~ > speedtest
Retrieving speedtest.net configuration...
Testing from Caltech (131.215.115.52)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Phyber Communications (Los Angeles, CA) [29.63 km]: 2.196 ms
Testing download speed................................................................................
Download: 721.92 Mbit/s
Testing upload speed................................................................................................
Upload: 251.38 Mbit/s

  13044   Mon Jun 5 21:53:55 2017   rana   Update   Computers   rossa: ubuntu 16.04

With the network config, mounting, and symlinks set up, rossa can now be used as a workstation for dataviewer and MEDM. For DTT, no luck, since there is so far no lscsoft support past the Ubuntu14 stage.

  13050   Wed Jun 7 15:41:51 2017   Steve   Update   Computers   window laptop scanned

Randy Trudeau scanned our Windows laptop (Dell 13" Vostro) and Steve's memory stick for viruses. Nothing was found. The search continues...

Rana thinks that I'm creating these virus beasts by taking pictures with Dino Capture and/or Data Ray on the Windows machine...

 

 

  13065   Thu Jun 15 14:24:48 2017   Kaustubh, Jigyasa   Update   Computers   Ottavia Switched On

Today, Jigyasa and I connected Ottavia to one of the unused monitor screens, Donatella. The Ottavia CPU had a label saying 'SMOKED'. One of the past elogs, 11091, dating back to March 2015, by Jenne, had an update regarding Ottavia smelling 'burny'. It seems to be working fine for about 2 hours now. Once it is connected to the Martian Network we can test it further. The Donatella screen we used seems to have a graphics problem, some damage to the display screen. It's a minor issue and does not affect the display that much, but perhaps it'll be better to use another screen if we plan to use Ottavia in the future. We will power it down if there is an issue with it.

  13067   Thu Jun 15 19:49:03 2017   Kaustubh, Jigyasa   Update   Computers   Ottavia Switched On

It has been working fine the whole day (we didn't do much testing on it though). We are leaving it on for the night.

Quote:

Today, I and Jigyasa connected the Ottavia to one of the unused monitor screens Donatella. The Ottavia CPU had a label saying 'SMOKED''. One of the past elogs, 11091, dated back in March 2015, by Jenne had an update regarding the Ottavia smelling 'burny'. It seems to be working fine for about 2 hours now. Once it is connected to the Martian Network we can test it further. The Donatella screen we used seems to have a graphic problem, a damage to the display screen. Its a minor issue and does not affect the display that much, but perhaps it'll be better to use another screen if we plan to use the Ottavia in the future. We will power it down if there is an issue with it.

 

  13068   Fri Jun 16 12:37:47 2017   Kaustubh, Jigyasa   Update   Computers   Ottavia Switched On

Ottavia had been left running overnight and it seems to work fine. There has been no smell or any other noticeable problems. This morning Gautam, Kaustubh and I connected Ottavia to the Martian Network through the Netgear switch in the 40m lab area. We were able to SSH into Ottavia through Pianosa and access directories. On Ottavia itself we were able to run ipython and access the internet. Since it seems to work out fine, Kaustubh and I are going to enable the ethernet connection to Ottavia and secure the wiring now.

Quote:

It has been working fine the whole day(we didn't do much testing on it though). We are leaving it on for the night.

Quote:

Today, I and Jigyasa connected the Ottavia to one of the unused monitor screens Donatella. The Ottavia CPU had a label saying 'SMOKED''. One of the past elogs, 11091, dated back in March 2015, by Jenne had an update regarding the Ottavia smelling 'burny'. It seems to be working fine for about 2 hours now. Once it is connected to the Martian Network we can test it further. The Donatella screen we used seems to have a graphic problem, a damage to the display screen. Its a minor issue and does not affect the display that much, but perhaps it'll be better to use another screen if we plan to use the Ottavia in the future. We will power it down if there is an issue with it.

 

 

  13071   Fri Jun 16 23:27:19 2017   Kaustubh, Jigyasa   Update   Computers   Ottavia Connected to the Netgear Box

I just connected Ottavia to the Netgear box and it's working just fine. It'll remain switched on over the weekend.

Quote:

Kaustubh and I are going to enable the ethernet connection to Ottavia and secure the wiring now.  

 

  13154   Mon Jul 31 20:35:42 2017   Koji   Summary   Computers   Chiara backup situation summary

Summary
- CDS Shared files system: backed up
- Chiara system itself: not backed up


controls@chiara|~> df -m
Filesystem     1M-blocks    Used Available Use% Mounted on
/dev/sda1         450420   11039    416501   3% /
udev               15543       1     15543   1% /dev
tmpfs               3111       1      3110   1% /run
none                   5       0         5   0% /run/lock
none               15554       1     15554   1% /run/shm
/dev/sdb1        2064245 1718929    240459  88% /home/cds
/dev/sdd1        1877792 1426378    356028  81% /media/fb9bba0d-7024-41a6-9d29-b14e631a2628
/dev/sdc1        1877764 1686420     95960  95% /media/40mBackup

/dev/sda1 : System boot disk
/dev/sdb1 : main cds disk file system 2TB partition of 3TB disk (1TB vacant)
/dev/sdc1 : Daily backup of /dev/sdb1 via a cron job (/opt/rtcds/caltech/c1/scripts/backup/localbackup)

/dev/sdd1 : 2014 snapshot of cds. Not actively used. USB-attached.

https://nodus.ligo.caltech.edu:8081/40m/11640
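(The local backup script itself lives at /opt/rtcds/caltech/c1/scripts/backup/localbackup and is not reproduced here. Purely to illustrate the kind of nightly mirror it performs, a rough sketch using rsync from Python follows; the paths, options, and log location are assumptions.)

# Illustrative nightly-mirror sketch -- NOT the actual localbackup script.
# Source/destination paths, rsync options, and log location are assumptions.
import datetime
import subprocess

SRC = "/home/cds/"           # main cds file system (/dev/sdb1)
DST = "/media/40mBackup/"    # daily backup disk (/dev/sdc1)

stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
logfile = "/tmp/localbackup-%s.log" % stamp   # placeholder log path

# -a preserves attributes; --delete keeps the destination an exact mirror
cmd = ["rsync", "-a", "--delete", "--stats", SRC, DST]
with open(logfile, "w") as log:
    subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)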

 

  13159   Wed Aug 2 14:47:20 2017   Koji   Summary   Computers   Chiara backup situation summary

I further compressed the burt snapshot directories, following ELOG 11640. This freed up an additional ~130GB. This will eventually help give more space to the local backup (/dev/sdc1).

controls@chiara|~> df -m
Filesystem     1M-blocks    Used Available Use% Mounted on
/dev/sda1         450420   11039    416501   3% /
udev               15543       1     15543   1% /dev
tmpfs               3111       1      3110   1% /run
none                   5       0         5   0% /run/lock
none               15554       1     15554   1% /run/shm
/dev/sdb1        2064245 1581871    377517  81% /home/cds
/dev/sdd1        1877792 1426378    356028  81% /media/fb9bba0d-7024-41a6-9d29-b14e631a2628
/dev/sdc1        1877764 1698489     83891  96% /media/40mBackup

 

 

  13160   Wed Aug 2 15:04:15 2017   gautam   Configuration   Computers   control room workstation power distribution

The 4 control room workstation CPUs (Rossa, Pianosa, Donatella and Allegra) are now connected to the UPS.

The 5 monitors are connected to the recently acquired surge-protecting power strips.

Rack-mountable power strip + spare APC Surge Arrest power strip have been stored in the electronics cabinet.

Quote:

this is not the right one; this Ethernet controlled strip we want in the racks for remote control.

Buy some of these for the MONITORS.

 

  13227   Thu Aug 17 22:54:49 2017   ericq   Update   Computers   Trying to access JetStor RAID files

The JetStor RAID unit that we had been using for frame writing before the fb meltdown has some archived frames from DRFPMI locks that I want to get at. I spent some time today trying to mount it on optimus with no success.

The unit was connected to fb via a SCSI cable to a SCSI-to-PCI card inside of fb. I moved the card to optimus, and attached the cable. However, no mountable device corresponding to the RAID seems to show up anywhere.

The RAID unit can tell that it's hooked up to a computer, because when optimus restarts, the RAID event log says "Host Channel 0 - SCSI Bus Reset."

The computer is able to get some sort of signals from the RAID unit, because when I change the SCSI ID, the syslog will say 'detected non-optimal RAID status'.

The PCI card is ID'd fine in lspci as "06:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev c1)"

'lsscsi' does not list anything related to the unit

Using 'mpt-status -p', which is somehow associated with this kind of thing, returns the disheartening output:

Checking for SCSI ID:0
Checking for SCSI ID:1
Checking for SCSI ID:2
Checking for SCSI ID:3
Checking for SCSI ID:4
Checking for SCSI ID:5
Checking for SCSI ID:6
Checking for SCSI ID:7
Checking for SCSI ID:8
Checking for SCSI ID:9
Checking for SCSI ID:10
Checking for SCSI ID:11
Checking for SCSI ID:12
Checking for SCSI ID:13
Checking for SCSI ID:14
Checking for SCSI ID:15
Nothing found, contact the author
 
I don't know what to try at this point.
  13239   Tue Aug 22 15:17:19 2017   ericq   Update   Computers   Old frames accessible again

It turns out the problem was just a bent pin on the SCSI cable, likely from having to stretch things a bit to reach optimus from the RAID unit.

I hooked it up to megatron, and it was automatically recognized and mounted.

I had to turn off the new FB machine and remove it from the rack to be able to access megatron though, since it was just sitting on top. FB needs a rail to sit on!

At a cursory glance, the filesystem appears intact. I have copied over the archived DRFPMI frame files to my user directory for now, and Gautam is going to look into getting those permanently stored on the LDAS copy of 40m frames, so that we can have some redundancy.

Also, during this time, one of the HDDs in the RAID unit failed its SMART tests, so the RAID unit wanted it replaced. There were some spare drives in a little box directly under the unit, so I've installed one and am currently incorporating it back into the RAID.

There are two more backup drives in the box. We're running a RAID 5 configuration, so we can only lose one drive at a time before data is lost.

  13240   Tue Aug 22 15:40:06 2017   gautam   Update   Computers   Old frames accessible again

[jamie, gautam]

I had some trouble getting the daqd processes up and running again using Jamie's instructions.

With Jamie's help however, they are back up and running now. The problem was that the mx infrastructure didn't come back up on its own. So prior to running sudo systemctl restart daqd_*, Jamie ran sudo systemctl start mx. This seems to have done the trick.

c1iscey was still showing red fields on the CDS overview screen so Jamie did a soft reboot. The machine came back up cleanly, so I restarted all the models. But the indicator lights were still red. Apparently the mx processes weren't running on c1iscey. The way to fix this is to run sudo systemctl start mx_stream. Now everything is green.

Now we are going to work on trying the fix Rolf suggested on c1iscex.
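Since the mx_stream processes not coming back on their own keeps recurring, here is a rough sketch of a helper that checks mx_stream on each front end and restarts it where needed. It assumes passwordless ssh (and sudo rights) for the controls user and that the unit is called mx_stream on every FE; the host list is just the machines mentioned in these entries.

# Sketch: restart mx_stream on any front end where it is not active.
# Assumes passwordless ssh and sudo rights for the controls user.
import subprocess

FRONT_ENDS = ["c1lsc", "c1sus", "c1ioo", "c1iscex", "c1iscey"]

def service_active(host, service):
    # `systemctl is-active --quiet` exits 0 only if the unit is active
    return subprocess.call(["ssh", host, "systemctl", "is-active", "--quiet", service]) == 0

for fe in FRONT_ENDS:
    if service_active(fe, "mx_stream"):
        print("%s: mx_stream already running" % fe)
    else:
        print("%s: restarting mx_stream" % fe)
        subprocess.call(["ssh", fe, "sudo", "systemctl", "restart", "mx_stream"])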

Quote:

It turns out the problem was just a bent pin on the SCSI cable, likely from having to stretch things a bit to reach optimus from the RAID unit.

I hooked it up to megatron, and it was automatically recognized and mounted.

I had to turn off the new FB machine and remove it from the rack to be able to access megatron though, since it was just sitting on top. FB needs a rail to sit on!

At a cursory glance, the filesystem appears intact. I have copied over the achived DRFPMI frame files to my user directory for now, and Gautam is going to look into getting those permanently stored on the LDAS copy of 40m frames, so that we can have some redundancy.

Also, during this time, one of the HDDs in the RAID unit failed its SMART tests, so the RAID unit wanted it replaced. There were some spare drives in a little box directly under the unit, so I've installed one and am currently incorporating it back into the RAID.

There are two more backup drives in the box. We're running a RAID 5 configuration, so we can only lose one drive at a time before data is lost.

 

  13242   Tue Aug 22 17:11:15 2017   gautam   Update   Computers   c1iscex model restarts

[jamie, gautam]

We tried to implement the fix that Rolf suggested in order to solve (perhaps among other things) the inability of some utilities like dataviewer to open testpoints. The problem isn't wholly solved yet - we can access actual testpoint data (not just zeros, as was the case) using DTT, and if DTT is used to open a testpoint first, then dataviewer, but DV itself can't seem to open testpoints.

Here is what was done (Jamie will correct me if I am mistaken).

  1. Jamie checked out branch 3.4 of the RCG from the SVN.
  2. Jamie recompiled all the models on c1iscex against this version of RCG.
  3. I shutdown ETMX watchdog, then ran rtcds stop all on c1iscex to stop all the models, and then restarted them using rtcds start <model> in the order c1x01, c1scx and c1asx. 
  4. Models came back up cleanly. I then restarted the daqd_dc process on FB1. At this point all indicators on the CDS overview screen were green.
  5. Tried getting testpoint data with DTT and DV for ETMX Oplev Pitch and Yaw IN1 testpoints. Conclusion as above.

So while we are in a better state now, the problem isn't fully solved. 

Comment: seems like there is an in-built timeout for testpoints opened with DTT - if the measurement is inactive for some time (unsure how much exactly but something like 5mins), the testpoint is automatically closed.

  13243   Tue Aug 22 18:36:46 2017   gautam   Update   Computers   All FE models compiled against RCG3.4

After getting the go ahead from Jamie, I recompiled all the FE models against the same version of RCG that we tested on the c1iscex models.

To do so:

  • I did rtcds make and rtcds install for all the models.
  • Then I ssh-ed into the FEs and did rtcds stop all, followed by rtcds start <model> in the order they are listed on the CDS overview MEDM screen (top to bottom).
  • During the compilation process (i.e. rtcds make), for some of the models, I got some compilation warnings. I believe these are related to models that have custom C code blocks in them. Jamie tells me that it is okay to ignore these warnings and that they will be fixed at some point.
  • c1lsc FE crashed when I ran rtcds stop all - had to go and do a manual reboot.
  • Doing so took down the models on c1sus and c1ioo that were running - but these FEs themselves did not have to be rebooted.
  • Once c1lsc came back up, I restarted all the models on the vertex FEs. They all came back online fine.
  • Then I ssh-ed into FB1, and restarted the daqd processes - but c1lsc and c1ioo CDS indicators were still red.
  • Looks like the mx_stream processes weren't started automatically on these two machines. Reasons unknown. Earlier today, the same was observed for c1iscey.
  • I manually restarted the mx_stream processes, at which point all CDS indicator lights became green (see Attachment #1).

IFO alignment needs to be redone, but at least we now have an (admittedly roundabout) way of getting testpoints. Did a quick check for "nan-s" on the ASC screen, saw none. So I am re-enabling watchdogs for all optics.

GV 23 August 9am: Last night, I re-aligned the TMs for single arm locks. Before the model restarts, I had saved the good alignment on the EPICS sliders, but the gain of x3 on the coil driver filter banks has to be manually turned on at the moment (i.e. the safe.snap file has them off). ALS noise looked good for both arms, so just for fun, I tried transitioning control of both arms to ALS (in the CARM/DARM basis as we do when we lock DRFPMI, using the Transition_IR_ALS.py script), and was successful.

Quote:

[jamie, gautam]

We tried to implement the fix that Rolf suggested in order to solve (perhaps among other things) the inability of some utilities like dataviewer to open testpoints. The problem isn't wholly solved yet - we can access actual testpoint data (not just zeros, as was the case) using DTT, and if DTT is used to open a testpoint first, then dataviewer, but DV itself can't seem to open testpoints.

Here is what was done (Jamie will correct me if I am mistaken).

  1. Jamie checked out branch 3.4 of the RCG from the SVN.
  2. Jamie recompiled all the models on c1iscex against this version of RCG.
  3. I shutdown ETMX watchdog, then ran rtcds stop all on c1iscex to stop all the models, and then restarted them using rtcds start <model> in the order c1x01, c1scx and c1asx. 
  4. Models came back up cleanly. I then restarted the daqd_dc process on FB1. At this point all indicators on the CDS overview screen were green.
  5. Tried getting testpoint data with DTT and DV for ETMX Oplev Pitch and Yaw IN1 testpoints. Conclusion as above.

So while we are in a better state now, the problem isn't fully solved. 

Comment: seems like there is an in-built timeout for testpoints opened with DTT - if the measurement is inactive for some time (unsure how much exactly but something like 5mins), the testpoint is automatically closed.

 

  13277   Wed Aug 30 22:15:47 2017   rana   Omnistructure   Computers   USB flash drives moved

I have moved the USB flash drives from the electronics bench back into the middle drawer of the cabinet next to the AC which is west of the fridge. Drawer re-labeled.

  13287   Fri Sep 1 16:55:27 2017   gautam   Update   Computers   Testpoints now accessible again

Thanks to Jonathan Hanks, it appears we can now access test-points again using dataviewer.

I haven't done an exhaustive check just yet, but I have loaded a few testpoints in dataviewer, and ran a script that uses testpoint channels (specifically the ALS phase tracker UGF setting script); all seems good.

So if I remember correctly, the major CDS fix now required is to solve the model unloading issue.

Thanks to Jamie/Jonathan Hanks/KT for getting us back to this point! Here are the details:

After reading logs and code, it was a simple daqdrc config change.

The daqdrc should read something like this:

...
set master_config=".../master";
configure channels begin end;
tpconfig ".../testpoint.par";
...


What had happened was tpconfig was put before the configure channels
begin end.  So when daqd_rcv went to configure its test points it did
not have the channel list configured and could not match test points to
the right model & machine.  Dave and I suspect that this is so that it
can do a request directly to the correct front end instead of a general
broadcast to all awgtpman instances.

Simply reordering the config fixes it.

I tested by opening a test point in dataviewer and verifying that
testpoints had opened/closed by using diag -l.  Xmgr/grace didn't seem
to be able to keep up with the test point data over a remote connection.

You can find this in the logs by looking for entries like the following
while the daqd is starting up.  When we looked we saw that there was an
entry for every model.

Unable to find GDS node 35 system c1daf in INI fiels
  13323   Wed Sep 20 15:49:26 2017   rana   Omnistructure   Computers   new internet

Larry Wallace hooked up a new switch (Brocade FWS 648G) today which is our 40m lab interface to the outside world internet. It's faster.

He then, just now, switched over the cables which were going to the old one into the new one, including NODUS and the NAT Router. CDS machines can still connect to the outside world.

In the next week or two, he'll install a new NAT for us so that we can have high speed comm from CDS to the world.

  13405   Sun Oct 29 16:40:17 2017   rana   Summary   Computers   disk cleanup

Backed up all the wikis. They're in wiki_backups/*.tar.xz (because xz -9e gives better compression than gzip or bzip2).

Moved old user directories into /users/OLD/.
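For the record, the same sort of archive can be made from Python instead of the xz command line; a small sketch using tarfile with the extreme preset (the directory names are placeholders):

# Create a .tar.xz with maximum xz compression (rough equivalent of
# `tar -cf - DIR | xz -9e`). Paths below are placeholders.
import lzma
import tarfile

SRC = "wiki"                        # directory to archive (placeholder)
OUT = "wiki_backups/wiki.tar.xz"    # destination archive (placeholder)

with tarfile.open(OUT, "w:xz", preset=9 | lzma.PRESET_EXTREME) as tar:
    tar.add(SRC)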

  13434   Fri Nov 17 16:31:11 2017   aaron   Omnistructure   Computers   Acromag wired up

Acromag Wireup Update

I finished wiring up the Acromags to replace the VME boxes on the x arm. I still need to cut down the bar and get them all tidy in the box, but I wanted to post the wiring maps I made.
I wanted to note specifically that a few of the connections were assigned to VME boxes but are no longer assigned in this Acromag setup. We should be sure that we actually do not need to use the following channels:

Channels no longer in use

  • From the VME analog output (VMIVME 4116) to the QPD Whitening board (no DCC number on the front), 3 channels are no longer in use
  • From the anti-image filter (D000186) to the ADC (VMIVME 3113A) 5 channels are no longer in use (these are the only channels from the anti-image filter, so this filter is no longer in use at all?)
  • From the universal dewhitening filter (D000183) to a binary I/O adapter (channels 1-16), 4 channels are no longer in use. These are the only channels from the dewhitening filter
  • From a second universal dewhitening filter (D000183) to another binary I/O adapter (channels 1-16), one channel is no longer in use (this was the only channel from this dewhitening filter).
  • From the opti-lever (D010033) to the VME ADC (VMIVME 3113A), 7 channels are no longer in use (this was all of the channels from the opti lever)
  • From the SUS PD Whitening/Interface board (D000210) to a binary I/O adapter (channels 1-16), 5 channels are no longer in use. 
  • Note that none of the binary I/O adapter channels are in use.

 

  13435   Fri Nov 17 17:10:53 2017   rana   Omnistructure   Computers   Acromag wired up

Exactly: you'll have to list explicitly what functions those channels had so that we know what we're losing before we make the switch.

  13440   Tue Nov 21 17:51:01 2017   Koji   Configuration   Computers   nodus post OS migration admin

The post OS migration admin for nodus about apache, elogd, svn, iptables, etc. can be found in https://wiki-40m.ligo.caltech.edu/NodusUpgradeNov2017

Update: The svn dump from the old svn was done, and it was imported to the new svn repository structure. Now the svn command line and (simple) web interface is running. And "websvn" was also implemented.

  13442   Tue Nov 21 23:47:51 2017   gautam   Configuration   Computers   nodus post OS migration admin

I restored the nodus crontab (copied over from the Nov 17 backup of the same at /opt/rtcds/caltech/c1/scripts/crontab/crontab_nodus.20171117080001). There wasn't a crontab, so I made one using sudo crontab -e.

This crontab is supposed to execute some backup scripts, send pizza emails, check chiara disk usage, and backup the crontab itself.

I've commented out the backup of nodus' /etc and /export for now, while we get back to fully operational nodus (though we also have a backup of /cvs/cds/caltech/nodus_backup on the external LaCie drive), they can be re-enabled by un-commenting the appropriate lines in the crontab.

Quote:

The post OS migration admin for nodusa bout apache, elogd, svn, iptables, etc can be found in https://wiki-40m.ligo.caltech.edu/NodusUpgradeNov2017

Update: The svn dump from the old svn was done, and it was imported to the new svn repository structure. Now the svn command line and (simple) web interface is running. "websvn" is not installed.

 

  13443   Wed Nov 22 00:54:18 2017   johannes   Omnistructure   Computers   Slow DAQ replacement computer progress

I got the SuperMicro 1U server box from Larry W on Monday and set it up in the CryoLab for initial testing.

The specs: https://www.supermicro.com/products/system/1U/5015/SYS-5015A-EHF-D525.cfm

The processor is an Intel D525 dual core atom processor with 1.8 GHz (i386 architecture, no 64-bit support). The unit has a 250GB SSD and 4GB RAM.

I installed Debian Jessie on it without any problems and compiled the most recent stable versions of EPICS base (3.15.5), asyn drivers (4-32), and modbus module (2-10-1). EPICS and asyn each took about 10 minutes, and modbus about 1 minute.

I copied the database files and port driver definitions for the cryolab from cryoaux, whose modbus services I suspended, and initialized the EPICS modbus IOC on the SuperMicro machine instead. It's working flawlessly so far, but admittedly the box is not under heavy load in the cryolab, as the framebuilder there is logging only the 16 analog channels.

I have recently worked out some kinks in the port driver and channel definitions, most importantly:

  • modbus IOC initialization is performed automatically by systemd on reboot
  • If the IOC crashes or a system reboot is required, the Acromag units freeze in their last state. When the IOC is started, a single read operation of all A/D registers is performed and the result taken as the initial value of the corresponding channel, causing no discontinuity in the generated voltage EVER (except of course for the rare case when the Acromags themselves have to be restarted); see the sketch at the end of this entry.

Aaron and I set 12/4 as a tentative date when we will be ready to attempt a swap. Until then the cabling needs to be finished and a channel database file needs to be prepared.
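To illustrate the initial-readback idea referenced in the list above, here is a sketch of reading back an Acromag unit's registers over Modbus/TCP before the IOC starts driving it. It assumes the pymodbus 2.x API, and the register offset, count, and unit id are made up; the real register map comes from the Acromag manual and the port driver configuration.

# Sketch of the "read current state before driving" idea (pymodbus 2.x API).
# Register offset, count, and unit id are placeholders, not the real XT1541 map.
from pymodbus.client.sync import ModbusTcpClient

client = ModbusTcpClient("192.168.113.133")   # e.g. one of the XT1541 DAC units
client.connect()

rr = client.read_holding_registers(address=0, count=8, unit=1)  # hypothetical block
if not rr.isError():
    initial_values = rr.registers   # would become the IOC's starting setpoints
    print(initial_values)

client.close()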

  13445   Wed Nov 22 11:51:38 2017   gautam   Configuration   Computers   nodus post OS migration admin

Confirmed that this crontab is running - the daily backup of the crontab seems to have successfully executed, and there is now a file crontab_nodus.ligo.caltech.edu.20171122080001 in the directory quoted below. The $HOSTNAME seems to be "nodus.ligo.caltech.edu" whereas it was just "nodus", so the file names are a bit longer now, but I guess that's fine...

Quote:

I restored the nodus crontab (copied over from the Nov 17 backup of the same at /opt/rtcds/caltech/c1/scripts/crontab/crontab_nodus.20171117080001. There wasn't a crontab, so I made one using sudo crontab -e.

This crontab is supposed to execute some backup scripts, send pizza emails, check chiara disk usage, and backup the crontab itself.

I've commented out the backup of nodus' /etc and /export for now, while we get back to fully operational nodus (though we also have a backup of /cvs/cds/caltech/nodus_backup on the external LaCie drive), they can be re-enabled by un-commenting the appropriate lines in the crontab.

 

 

  13458   Wed Nov 29 21:40:30 2017   johannes   Omnistructure   Computers   Slow DAQ replacement computer progress

[Aaron, Johannes]

We configured the AtomServer for the Martian network today. Hostname is c1auxex2, IP is 192.168.113.49. Remote access over SSH is enabled.

There will be 6 acromag units served by c1auxex2.

Hostname Type IP Address
c1auxex-xt1221a 1221 192.168.113.130
c1auxex-xt1221b 1221 192.168.113.131
c1auxex-xt1221c 1221 192.168.113.132
c1auxex-xt1541a 1541 192.168.113.133
c1auxex-xt1541b 1541 192.168.113.134
c1auxex-xt1111a 1111 192.168.113.135

Some hardware to assemble the Acromag box and adapter PCBs is still missing, and the wiring and channel definitions have to be finalized. The port driver initialization instructions and channel definitions are currently stored locally in /home/controls/modbusIOC/ but will eventually be migrated to a shared location; we need to decide how exactly we want to set up this infrastructure.

  • Should the new machines have the same hostnames as the ones they're replacing? For the transition we simply named it c1auxex2.
  • Because the communication of the server machine with the DAQ modules is happening over TCP/IP and not some VME backplane bus we could consolidate machines, particularly in the vertex area.
  • It would be good to use the fact that these SuperMicro servers have 2+ ethernet ports to separate CDS EPICS traffic from the modbus traffic. That would also keep the 30+ IPs for the Acromag thingies off the Martian host tables.
  13461   Sun Dec 3 05:25:59 2017   gautam   Configuration   Computers   sendmail installed on nodus

Pizza mail didn't go out last weekend - looking at the logfile, it seems like the "sendmail" service was missing. I installed sendmail following the instructions here: https://tecadmin.net/install-sendmail-server-on-centos-rhel-server/

Except that to start the sendmail service, I used systemctl and not init.d, i.e. I ran systemctl start sendmail.service (as root). Test email to myself works. Let's see if it works this weekend. Of course this isn't so critical; more important are the maintenance emails that may need to go out (e.g. disk usage alert on chiara / N2 pressure check, which look like nodus' responsibilities).

  13462   Sun Dec 3 17:01:08 2017   Koji   Configuration   Computers   sendmail installed on nodus

An email has come at 5PM on Dec 3rd.

 

  13463   Mon Dec 4 22:06:07 2017   johannes   Omnistructure   Computers   Acromag XEND progress

I wired up the power distribution and ethernet cables in the Acromag chassis today. For the time being it's all kind of loose in there, but tomorrow the last parts should arrive from McMaster to put everything in its place. I had to unplug some of the wiring that Aaron had already done, but labeled everything before I did so. I finalized the IP configuration via USB for all the units, which are now powered through the chassis and active on the network.

I started transcribing the database file ETMXaux.db that is loaded by c1auxex in the format required by the Acromags and made sure that the new c1auxex2 properly functions as a server, which it does.

ToDo-list:

  • Need to calibrate the +/- 10V swing of the analog channels via the USB utility, but that requires wiring the channels to the connectors and should probably be done once the unit sits in the rack
  • Need to wire power from the Sorensens into the chassis. There are +/- 5V, +/- 15V and +/- 20V present. The Acromags need only +12V-32V, for which I plan to use the +20V, and an excitation voltage for the binary channels, for which I'm going to wire the +5V. Should do this through the fuse rails on the side.
  • The current slow binary channels are sinking outputs, same as the XT1111 16-channel module we have. The additional 4 binary outputs of the XT1541 are sourcing, and I'm currently not sure if we can use them with the sos driver and whitening vme boards that get their binary control signals from the slow system.
  • Confirm switching of binary channels (haven't used model XT1111 before, but I assume the definitions are identical to XT1121)
  • Setup remaining essential EPICS channels and confirm that dimensions are the same (as in both give the same voltage for the same requested value)
  • Disconnect DIN cables, attach adapter boards + DSUB cables
  • Testing

 

Quote:

[Aaron, Johannes]

We configured the AtomServer for the Martian network today. Hostname is c1auxex2, IP is 192.168.113.49. Remote access over SSH is enabled.

There will be 6 acromag units served by c1auxex2.

Hostname Type IP Address
c1auxex-xt1221a 1221 192.168.113.130
c1auxex-xt1221b 1221 192.168.113.131
c1auxex-xt1221c 1221 192.168.113.132
c1auxex-xt1541a 1541 192.168.113.133
c1auxex-xt1541b 1541 192.168.113.134
c1auxex-xt1111a 1111 192.168.113.135

Some hardware to assemble the Acromag box and adapter PCBs are still missing, and the wiring and channel definitions have to be finalized. The port driver initialization instructions and channel definitions are currently locally stored in /home/controls/modbusIOC/ but will eventually be migrated to a shared location, but we need to decide how exactly we want to set up this infrastructure.

  • Should the new machines have the same hostnames as the ones they're replacing? For the transition we simply named it c1auxex2.
  • Because the communication of the server machine with the DAQ modules is happening over TCP/IP and not some VME backplane bus we could consolidate machines, particularly in the vertex area.
  • It would be good to use the fact that these SuperMicro servers have 2+ ethernet ports to separate CDS EPICS traffic from the modbus traffic. That would also keep the 30+ IPs for the Acromag thingies off the Martian host tables.
  13468   Thu Dec 7 22:24:04 2017   johannes   Omnistructure   Computers   Acromag XEND progress

 

Quote:
 
  • Need to calibrate the +/- 10V swing of the analog channels via the USB utility, but that requires wiring the channels to the connectors and should probably be done once the unit sits in the rack
  • Need to wire power from the Sorensens into the chassis. There are +/- 5V, +/- 15V and +/- 20V present. The Acromags need only +12V-32V, for which I plan to use the +20V, and an excitation voltage for the binary channels, for which I'm going to wire the +5V. Should do this through the fuse rails on the side.
  • The current slow binary channels are sinking outputs, same as the XT1111 16-channel module we have. The additional 4 binary outputs of the XT1541 are sourcing, and I'm currently not sure if we can use them with the sos driver and whitening vme boards that get their binary control signals from the slow system.
  • Confirm switching of binary channels (haven't used model XT1111 before, but I assume the definitions are identical to XT1121)
  • Setup remaining essential EPICS channels and confirm that dimensions are the same (as in both give the same voltage for the same requested value)
  • Disconnect DIN cables, attach adapter boards + DSUB cables
  • Testing

Getting the chassis ready took a little longer than anticipated, mostly because I had not looked into the channel list myself before and forgot about Lydia's post which mentions that some of the switching controls have to be moved from the fast to the slow DAQ. We would need a total of 5+5+4+8=22 binary outputs. With the existing Acromag units we have 16 sinking outputs and 8 sourcing outputs. I looked through all the Eurocrate modules and confirmed that they all use the same switch topology which has sourcing inputs.

While one can use a pull-down resistor to control a sourcing input with a sourcing output, pulling down the MAX333A input (the datasheet says logic low is <0.8V) requires something like 100 Ohms for the pull-down resistor, which would require ~150mA of current PER CHANNEL, which is unreasonable. Instead, I asked Steve to buy a second XT1111 and modified the chassis to accommodate more Acromag units.

I have now finished wiring the chassis (except for 8 remaining bypass controls to the whitening board which need the second XT1111), calibrated all channels in use, confirmed all pin locations via the existing breakout boards and DCC drawings for the eurocrate modules, and today Steve and I added more fuses to the DIN rail power distribution for +20V and +15V.

There was not enough contingent free space in the XEND rack to mount the chassis, so for now I placed it next to it.

c1auxex2 is currently hosting all original physical c1auxex channels (not yet calc records) under their original name with an _XT added at the end to avoid duplicate channel names. c1auxex is still in control of ETMX. All EPICS channels hosted by c1auxex2 are in dimensions of Volts. The plan for tomorrow is to take c1auxex off the grid, rename the c1auxex2 hosted channels and transfer ETMX controls to it, provided we can find enough 37pin DSub cables (8). I made 5 adapter boards for the 5 Eurocrate modules that need to talk to the slow DAQ through their backplane connector.
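Before the switchover it is worth spot-checking that the old and new servers report consistent values for the same record. A sketch using pyepics follows; the example channel name is made up (take the real list from ETMXaux.db), and note that the old c1auxex records may be in counts rather than Volts, so a scale factor may be needed before comparing.

# Spot-check old (c1auxex) vs new (c1auxex2, "_XT" suffix) EPICS channels.
# The channel name below is illustrative -- use the real list from ETMXaux.db.
from epics import caget

CHANNELS = [
    "C1:SUS-ETMX_PIT_COMM",   # hypothetical example channel
]

for ch in CHANNELS:
    old = caget(ch)           # served by c1auxex
    new = caget(ch + "_XT")   # same record served by c1auxex2
    diff = None if None in (old, new) else new - old
    print("%-30s old=%s new=%s diff=%s" % (ch, old, new, diff))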

  13469   Fri Dec 8 12:06:59 2017   johannes   Omnistructure   Computers   c1auxex2 ready - but need more cables

The new slow machine c1auxex2 is ready to deploy. Unfortunately we don't have enough 37pin DSub cables to connect all channels. In fact, we need a total of 8, and I found only three male-male cables and one gender changer. I asked Steve to buy more.

Over the past week I have transferred all EPICS records - soft channels and physical ones - from c1auxex to c1auxex2, making changes where needed. Today I started the in-situ testing:

  1. Unplugged ETMX's satellite box
  2. Unplugged the eurocrate backplane DIN cables from the SOS Driver and QPD Whitening filter modules (the ones that receive ao channels)
  3. Measured output voltages on the relevant pins for comparison after the swap
  4. Turned off c1auxex by key, removed ethernet cable
  5. Started the modbus ioc on c1auxex2
  6. Slow machine indicator channels came online, ETMX Watchdog was responsive (but didn't have anything to do due to missing inputs) and reporting. PIT/YAW sliders function as expected
  7. Restoring the previous settings gives output voltages close to the previous values, in fact the exact values requested (due to fresh calibration)
  8. Last step is to go live with c1auxex2 and confirm the remaining channels work as expected.

I copied the relevant files to start the modbus server to /cvs/cds/caltech/target/c1auxex2, although I kept local copies in /home/controls/modbusIOC/ from which they're still run.

I wonder what's the best practice for this. Probably to store the database files centrally and load them over the network on server start?

  13487   Mon Dec 18 17:48:09 2017   rana   Update   Computers   rossa: SL7.3 upgrade continues

Following instructions from LLO-CDS for the rossa upgrade. Last time there were some issues with not being able to access the LLO EPEL repos, but this time it seems to be working fine.

After adding font aliases, need to run 'sudo xset fp rehash' to get the new aliases to take hold. Afterwards, am able to use MEDM and sitemap just fine.

But diaggui won't run because of a lib-sasl error. Try 'sudo yum install gds-all'.

diaggui: error while loading shared libraries: libsasl2.so.2: cannot open shared object file: No such file or directory (have contacted LLO CDS admins)

X-windows keeps crashing with SL7 and this big monitor. Followed instructions on the internet to remove the generic 'Nouveau' driver and install the proprietary NVIDIA drivers by dropping to run level 3 and running some command line hoodoo to modify the X-files. Now I can even put the mouse on the left side of the screen and it doesn't crash.

  13504   Fri Jan 5 17:50:47 2018   rana   Configuration   Computers   motif on nodus

I had to do 'sudo yum install motif' on nodus so that we could get libXm.so.4 so that we could run MEDM. Works now.

  13539   Fri Jan 12 12:31:04 2018   gautam   Configuration   Computers   sendmail troubles on nodus

I'm having trouble getting the sendmail service going on nodus since the Christmas day power failure - for some reason, it seems like the mail server that sendmail uses to send out emails on nodus (mx1.caltech.iphmx.com, IP=68.232.148.132) is on a blacklist! Not sure how exactly to go about remedying this.

Running sudo systemctl status sendmail.service -l also shows a bunch of suspicious lines:

Jan 12 10:15:27 nodus.ligo.caltech.edu sendmail[6958]: STARTTLS=client, relay=cluster6a.us.messagelabs.com., version=TLSv1/SSLv3, verify=FAIL, cipher=DHE-RSA-AES256-GCM-SHA384, bits=256/256
Jan 12 10:15:45 nodus.ligo.caltech.edu sendmail[6958]: w0A7QThE032091: to=<umakant.rapol@iiserpune.ac.in>, ctladdr=<controls@nodus.ligo.caltech.edu> (1001/1001), delay=2+10:49:16, xdelay=00:00:39, mailer=esmtp, pri=5432408, relay=cluster6a.us.messagelabs.com. [216.82.251.230], dsn=4.0.0, stat=Deferred: 421 Service Temporarily Unavailable
Jan 12 11:15:23 nodus.ligo.caltech.edu sendmail[10334]: STARTTLS=client, relay=cluster6a.us.messagelabs.com., version=TLSv1/SSLv3, verify=FAIL, cipher=DHE-RSA-AES256-GCM-SHA384, bits=256/256
Jan 12 11:15:31 nodus.ligo.caltech.edu sendmail[10334]: w0A7QThE032091: to=<umakant.rapol@iiserpune.ac.in>, ctladdr=<controls@nodus.ligo.caltech.edu> (1001/1001), delay=2+11:49:02, xdelay=00:00:27, mailer=esmtp, pri=5522408, relay=cluster6a.us.messagelabs.com. [216.82.251.230], dsn=4.0.0, stat=Deferred: 421 Service Temporarily Unavailable
Jan 12 12:15:25 nodus.ligo.caltech.edu sendmail[13747]: STARTTLS=client, relay=cluster6a.us.messagelabs.com., version=TLSv1/SSLv3, verify=FAIL, cipher=DHE-RSA-AES256-GCM-SHA384, bits=256/256
Jan 12 12:15:42 nodus.ligo.caltech.edu sendmail[13747]: w0A7QThE032091: to=<umakant.rapol@iiserpune.ac.in>, ctladdr=<controls@nodus.ligo.caltech.edu> (1001/1001), delay=2+12:49:13, xdelay=00:00:33, mailer=esmtp, pri=5612408, relay=cluster6a.us.messagelabs.com. [216.82.251.230], dsn=4.0.0, stat=Deferred: 421 Service Temporarily Unavailable

 

Why is nodus attempting to email umakant.rapol@iiserpune.ac.in?

  13540   Fri Jan 12 16:01:27 2018   Koji   Configuration   Computers   sendmail troubles on nodus

I personally don't like the idea of having sendmail (or something similar like postfix) on a personal server, as it carries a lot of maintenance cost (security updates, configuration, etc). If we could use an external mail service (like gmail) via the gmail API in python, that would ease our worries, I thought.

  13542   Fri Jan 12 18:22:09 2018   gautam   Configuration   Computers   sendmail troubles on nodus

Okay I will port awade's python mailer stuff for this purpose.

gautam 14Jan2018 1730: Python mailer has been implemented: see here for the files. On shared drive, the files are at /opt/rtcds/caltech/c1/scripts/general/pizza/pythonMailer/

gautam 11Feb2018 1730: The python mailer had never once worked successfully in automatically sending the message. I realized this may be because I had put the script on the root user's crontab, but had set up the authentication keyring with the password for the mailer on the controls user. So I have now set up a controls user crontab, which for now just runs the pizza mailing. Let's see if this works next Sunday...
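This isn't the actual pythonMailer code (which lives at the path above), but a minimal sketch of the same idea: send the pizza reminder through Gmail with smtplib, pulling the password from the controls user's keyring as described above. The account name, keyring service name, and recipient are placeholders.

# Minimal Gmail-based mailer sketch; account, service name, and recipient are placeholders.
# Assumes the password was stored beforehand with keyring.set_password().
import smtplib
import keyring
from email.mime.text import MIMEText

ACCOUNT = "fortym.mailer@gmail.com"     # hypothetical sender address
RECIPIENT = "40m-list@example.edu"      # hypothetical mailing list
PASSWORD = keyring.get_password("40m_mailer", ACCOUNT)   # assumed service name

msg = MIMEText("Reminder: pizza meeting this week.")
msg["Subject"] = "40m pizza reminder"
msg["From"] = ACCOUNT
msg["To"] = RECIPIENT

server = smtplib.SMTP_SSL("smtp.gmail.com", 465)
server.login(ACCOUNT, PASSWORD)
server.sendmail(ACCOUNT, [RECIPIENT], msg.as_string())
server.quit()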

Quote:

I personally don't like the idea of having sendmail (or something similar like postfix) on a personal server as it requires a lot of maintenance cost (like security update, configuration, etc). If we can use external mail service (like gmail) via gmail API on python, that would easy our worry, I thought.

 

  13545   Sat Jan 13 02:36:51 2018   rana   Configuration   Computers   sendmail troubles on nodus

I think sendmail is required on nodus since that's how the dokuwiki works. That's why the dokuwiki was trying to send an email to Umakant.

  13546   Sat Jan 13 03:20:55 2018   Koji   Configuration   Computers   sendmail troubles on nodus

I know it, and I don't like it. DokuWiki seems to allow us to use an external server for notification emails. That would be the way to go.
