40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 86 of 348  Not logged in ELOG logo
ID Date Authorup Type Category Subject
  15446   Wed Jul 1 18:03:04 2020 JonConfigurationVACUPS replacements

​I looked into how the new UPS devices suggested by Chub would communicate with the vac interlocks. There are several possible ways, listed in order of preference:

  • Python interlock service directly queries the UPS via a USB link using the (unofficial) tripplite package. Direct communication would be ideal because it avoids introducing a dependency on third-party software outside the monitoring/control capability of the interlock manager. However the documentation warns this package does not work for all models...
  • Configure Tripp Lite's proprietary software (PowerAlert Local) to send SYSLOG event messages (UDP packets) to a socket monitored by the Python interlock manager.
  • Configure the proprietary software to execute a custom script upon an event occurring. The script would, e.g., set an EPICS flag channel which the interlock manager is continually monitoring.

I recommend we proceed with ordering the Tripp Lite 36HW20 for TP1 and Tripp Lite 1AYA6 for TP2 and TP3 (and other 120V electronics). As far as I can tell, the only difference between the two 120V options is that the 6FXN4 model is TAA-compliant.

  15456   Mon Jul 6 15:10:40 2020 JonSummaryBHD40m --> A+ BHD design analysis

As suggested last week, Hang and I have reviewed the A+ BHD status (DRD, CDD, and reviewers' comments) and compiled a list of key unanswered questions which could be addressed through Finesse analysis.

In anticipation of others helping with this modeling effort, we've tried to break questions into self-contained projects and estimated their level of difficulty. As you'll see, they range from beginner to Finesse guru.

  15462   Thu Jul 9 16:02:33 2020 JonHowToCDSProcedure for setting up BHD front-ends

Here is the procedure for setting up the three new BHD front-ends (c1bhd, c1sus2, c1ioo - replacement). This plan is based on technical advice from Rolf Bork and Keith Thorne.

The overall topology for each machine is shown here. As all our existing front-ends use (obsolete) Dolphin PCIe Gen1 cards for IPC, we have elected to re-use Dolphin Gen1 cards removed from the sites. Different PCIe generations of Dolphin cards cannot be mixed, so the only alternative would be to upgrade every 40m machine. However the drivers for these Gen1 Dolphin cards were last updated in 2016. Consequently, they do not support the latest Linux kernel (4.x) which forces us to install a near-obsolete OS for compatibility (Debian 8).

Hardware

Software

  • OS: Debian 8.11 (Linux kernel 3.16)
  • IPC card driver: Dolphin DX 4.4.5 [works only with Linux kernel 2.6 to 3.x]
  • I/O card driver: None required, per the manual

Install Procedure

  1. Follow Keith Thorne's procedure for setting up Debian 8 front-ends
  2. Apply the real-time kernel patches developed for Debian 9, but modified for kernel 3.16 [these are UNTESTED against Debian 8; Keith thinks they may work, but they weren't discovered until after the Debian 9 upgrade]
  3. Install the PCIe expansion cards and Dolphin DX driver (driver installation procedure)
  15465   Thu Jul 9 18:00:35 2020 JonConfigurationVACUPS replacements

Chub has placed the order for two new UPS units (115V for TP2/3 and a 220V version for TP1).

They will arrive within the next two weeks.

Quote:

​I looked into how the new UPS devices suggested by Chub would communicate with the vac interlocks. There are several possible ways, listed in order of preference:

  • Python interlock service directly queries the UPS via a USB link using the (unofficial) tripplite package. Direct communication would be ideal because it avoids introducing a dependency on third-party software outside the monitoring/control capability of the interlock manager. However the documentation warns this package does not work for all models...
  • Configure Tripp Lite's proprietary software (PowerAlert Local) to send SYSLOG event messages (UDP packets) to a socket monitored by the Python interlock manager.
  • Configure the proprietary software to execute a custom script upon an event occurring. The script would, e.g., set an EPICS flag channel which the interlock manager is continually monitoring.

I recommend we proceed with ordering the Tripp Lite 36HW20 for TP1 and Tripp Lite 1AYA6 for TP2 and TP3 (and other 120V electronics). As far as I can tell, the only difference between the two 120V options is that the 6FXN4 model is TAA-compliant.

  15499   Thu Jul 23 15:58:24 2020 JonSummaryVACVacuum controls refurbishment plan

This year we've struggled with vacuum controls unreliability (e.g., spurious interlock triggers) caused by decaying hardware. Here are details of the vacuum refurbishment plan I described on the 40m call this week.

 Refurbish TP2 and TP3 dry pumps. Completed [ELOG 15417].

 Automated notifications of interlock-trigger events. Email to 40m list and a new interlock flag channel. Completed [ELOG 15424].

Replace failing UPS.

  • Two new Tripp Lite units on order, 110V and 230V [ELOG 15465].
  • Jordan will install them in the vacuum rack once received.
  • Once installed, Jon will come test the new units, set up communications, and integrate them into the interlock system following this plan [ELOG 15446].
  • Jon will move the pumps and other equipment to the new UPS units only after completing the above step.

Remove interlock dependencies on TP2/TP3 serial readbacks. Due to persistent glitching [ELOG 15140, ELOG 15392].

Unlike TP2 and TP3, the TP1 readbacks are real analog signals routed to Acromags. As these have caused us no issues at all, the plan is to eliminate dependence on the TP2/3 digital readbacks in favor of the analog controller outputs. All the digital readback channels will continue to exist, but the interlock system will no longer depend on them. This will require adding 2 new sinking BI channels each for TP2 and TP3 (for a total of 4 new channels). We have 8 open Acromag XT1111 channels in the c1vac system [ELOG 14493], so the new channels can be accommodated. The below table summarizes the proposed changes.

Channel Type Status Description Interlock
C1:Vac-TP1_current AI exists Current draw (A) keep
C1:Vac-TP1_fail BI exists Critical fault has occurred keep
C1:Vac-TP1_norm BI exists Rotation speed is within +/-10% of set point new
C1:Vac-TP2_rot soft exists Rotation speed (krpm) remove
C1:Vac-TP2_temp soft exists Temperature (C) remove
C1:Vac-TP2_current soft exists Current draw (A) remove
C1:Vac-TP2_fail BI new Critical fault has occurred new
C1:Vac-TP2_norm BI new Rotation speed is >80% of set point new
C1:Vac-TP3_rot soft exists Rotation speed (krpm) remove
C1:Vac-TP3_temp soft exists Temperature (C) remove
C1:Vac-TP3_current soft exists Current draw (A) remove
C1:Vac-TP3_fail BI new Critical fault has occurred new
C1:Vac-TP3_norm BI new Rotation speed is >80% of set point new
  15501   Mon Jul 27 15:48:36 2020 JonSummaryVACVacuum parts ordered

To carry out the next steps of the vac refurbishment plan [ELOG 15499], I've ordered parts necessary for interfacing the UPS units and the analog TP2/3 controller outputs with c1vac. The purchase list is appended to the main BHD list and is located here. Some parts we already had in the boxes of Acromag materials. Jordan is gathering what we do already have and staging it on the vacuum controls console table - please don't move them or put them away.

Quote:

Replace failing UPS.

Remove interlock dependencies on TP2/TP3 serial readbacks. Due to persistent glitching [ELOG 15140, ELOG 15392].

  15502   Tue Jul 28 12:22:40 2020 JonUpdateVACVac interlock test today 1:30 pm

This afternoon Jordan is going to carry out a test of the V4 and V5 hardware interlocks. To inform the interlock improvement plan [15499], we need to characterize exactly how these work (they pre-date the 2018 upgrade). I have provided him a sequence of steps for each test and will also be backing him up on Zoom.

We will close V1 as a precaution but there should be no other impact to the IFO. The tests are expected to take <1 hour. We will advise when they are completed.

  15504   Tue Jul 28 14:11:14 2020 JonUpdateVACVac interlock test today 1:30 pm

This test has been completed. The IFO configuration has been reverted to nominal.

For future reference: yes, both the V4 and V5 hardware interlocks were found to still be connected and work. A TTL signal from the analog output port of each pump controller (TP2 and TP3) is connected to an auxiliary relay inside the main valve relay box. These serve the purpose of interupting the (Acromag) control signal to the primary V4/5 relay. This interrupt is triggered by each pump's R1 setpoint signal, which is programmed to go low when the rotation speed falls below 80% of the low-speed setting.

Quote:

This afternoon Jordan is going to carry out a test of the V4 and V5 hardware interlocks. To inform the interlock improvement plan [15499], we need to characterize exactly how these work (they pre-date the 2018 upgrade). I have provided him a sequence of steps for each test and will also be backing him up on Zoom.

We will close V1 as a precaution but there should be no other impact to the IFO. The tests are expected to take <1 hour. We will advise when they are completed.

  15525   Fri Aug 14 10:03:37 2020 JonUpdateCDSTiming distribution slot availability

That's great news we won't have to worry about a new timing fanout for the two new machines, c1bhd and c1sus2. And there's no plan to change Dolphin IPC drivers. The plan is only to install the same (older) version of the driver on the two new machines and plug into free slots in the existing switch.

Quote:

The new dolphin eventually helps us. But the installation is an invasive change to the existing system and should be done at the installation stage of the 40m BHD.

  15526   Fri Aug 14 10:10:56 2020 JonConfigurationVACVacuum repairs today

The vac system is going down now for planned repairs [ELOG 15499]. It will likely take most of the day. Will advise when it's back up.

  15527   Sat Aug 15 02:02:13 2020 JonConfigurationVACVacuum repairs today

Vacuum work is completed. The TP2 and TP3 interlocks have been overhauled as proposed in ELOG 15499 and seem to be performing reliably. We're now back in the nominal system state, with TP2 again backing for TP1 and TP3 pumping the annuli. I'll post the full implementation details in the morning.

I did not get to setting up the new UPS units. That will have to be scheduled for another day.

Quote:

The vac system is going down now for planned repairs [ELOG 15499]. It will likely take most of the day. Will advise when it's back up.

  15528   Sat Aug 15 15:12:22 2020 JonConfigurationVACOverhaul of small turbo pump interlocks

Summary

Yesterday I completed the switchover of small turbo pump interlocks as proposed in ELOG 15499. This overhaul altogether eliminates the dependency on RS232 readbacks, which had become unreliable (glitchy) in both controllers. In their place, the V4(5) valve-close interlocks are now predicated on an analog controller output whose voltage goes high when the rotation speed is >= 80% of the nominal setpoint. The critical speed is 52.8 krpm for TP2 and 40 krpm for TP3. There already exist hardware interlocks of V4(5) using the same signals, which I have also tested.

Interlock signal

Unlike the TP1 controller, which exposes simple relays whose open/closed states are sensed by Acromags, what the TP2(3) controllers output is an energized 24V signal for controlling such a relay (output circuit pictured below). I hadn't appreciated this difference and it cost me time yesterday. The ultimate solution was to route the signals through a set of new 24V Phoenix Contact relays installed inside the Acromag chassis. However, this required removing the chassis from the rack and bringing it to the electronics bench (rather than doing the work in situ, as I had planned). The relays are mounted to the second DIN rail opposite the Acromags. Each TP2(3) signal controls the state of a relay, which in turn is sensed using an Acromag XT1111.

Signal routing

The TP2(3) "normal-speed" signals are already in use by hardware interlocks of V4(5). Each signal is routed into the main AC relay box, where it controls an "interrupter" relay through which the Acromag control signal for the main V4(5) relay is passed. These signals are now shared with the digital controls system using a passive DB15 Y-splitter. The signal routing is shown below.

Interlock conditions

The new turbo-pump-related interlock conditions and their channel predicates are listed below. The full up-to-date channel list and wiring assignments for c1vac are maintained here.

Channel Type New? Interlock-triggering condition
C1:Vac-TP1_norm BI No Rotation speed < 90% nominal setpoint (29 krpm)
C1:Vac-TP1_fail BI No Critical fault occurrence
C1:Vac-TP1_current AI No Current draw > 4 A
C1:Vac-TP2_norm BI Yes Rotation speed < 80% nominal setpoint (52.8 krpm)
C1:Vac-TP3_norm BI Yes Rotation speed < 80% nominal setpoint (40 krpm)

There are two new channels, both of which provide a binary indication of whether the pump speed is outside its nominal range. I did not have enough 24V relays to also add the C1:Vac-TP2(3)_fail channels listed in ELOG 15499. However, these signals are redundant with the existing interlocks, and the existing serial "Status" readback will already print failure messages to the MEDM screens. All of the TP2(3) serial readback channels remain, which monitor voltage, current, operational status, and temperature. The pump on/off and low-speed mode on/off controls remain implemented with serial signals as well.

The new analog readbacks have been added to the MEDM controls screens, circled below:

Other incidental repairs

  • I replaced the (dead) LED monitor at the vac controls console. In the process of finding a replacement, I came across another dead spare monitor as well. Both have been labeled "DEAD" and moved to Jordan's desk for disposal.
  • I found the current TP3 Varian V70D controller to be just as glitchy in the analog outputs as well. That likely indicates there is a problem with the microprocessor itself, not just the serial communications card as I thought might be the case. I replaced the controller with the spare unit which was mounted right next to it in the rack [ELOG 13143]. The new unit has not glitched since the time I installed it around 10 pm last night.
Attachment 1: small_tp_signal_routing.png
small_tp_signal_routing.png
Attachment 3: small_tp_signal_routing.png
small_tp_signal_routing.png
Attachment 4: medm_screen.png
medm_screen.png
  15537   Mon Aug 24 08:13:56 2020 JonUpdateVACUPS installation

I'm in the lab this morning to interface the two new UPS units with the digital controls system. Will be out by lunchtime. The disruptions to the vac system should be very brief this time.

  15538   Mon Aug 24 11:25:07 2020 JonUpdateVACUPS installation

I'm leaving the lab shortly. We're not ready to switch over the vac equipment to the new UPS units yet.

The 120V UPS is now running and interfaced to c1vac via a USB cable. The unofficial tripplite python package is able to detect and connect to the unit, but then read queries fail with "OS Error: No data received." The firmware has a different version number from what the developers say is known to be supported.

The 230V UPS is actually not correctly installed. For input power, it has a general type C14 connector which is currently plugged into a 120V power strip. However this unit has to be powered from a 230V outlet. We'll have to identify and buy the correct adapter cable.

With the 120V unit now connected, I can continue to work on interfacing it with python remotely. The next implementation I'm going to try is item #2 of this plan [ELOG 15446].

Quote:

I'm in the lab this morning to interface the two new UPS units with the digital controls system. Will be out by lunchtime. The disruptions to the vac system should be very brief this time.

  15556   Fri Sep 4 15:26:55 2020 JonUpdateVACVac system UPS installation

The vac controls are going down now to pull and test software changes. Will advise when the work is completed.

  15557   Fri Sep 4 21:12:51 2020 JonUpdateVACVac system UPS installation

The vac work is completed. All of the vacuum equipment is now running on the new 120V UPS, except for TP1. The 230V TP1 is still running off wall power, as it always has. After talking with Tripp Lite support today, I believe there is a problem with the 230V UPS. I will post a more detailed note in the morning.

Quote:

The vac controls are going down now to pull and test software changes. Will advise when the work is completed.

  15558   Sat Sep 5 12:01:10 2020 JonUpdateVACVac system UPS installation

Summary

Yesterday's UPS switchover was mostly a success. The new Tripp Lite 120V UPS is fully installed and is communicating with the slow controls system. The interlocks are configured to trigger a controlled shutdown upon an extended power outage (> ~30 s), and they have been tested. All of the 120V pumpspool equipment (the full c1vac/LAN/Acromag system, pressure gauges, valves, and the two small turbo pumps) has been moved to the new UPS. The only piece of equipment which is not 120V is TP1, which is intended to be powered by a separate 230V UPS. However that unit is still not working, and after more investigation and a call to Tripp Lite, I suspect it may be defective. A detailed account of the changes to the system follow below.

Unfortunately, I think I damaged the Hornet (the only working cathode ionization gauge in the main volume) by inadvertently unplugging it while switching over equipment to the new UPS. The electronics are run from multiple daisy-chained power strips in the bottom of the rack and it is difficult to trace where everything goes. After the switchover, the Hornet repeatedly failed to activate (either remotely or manually) with the error "HV fail." Its compatriot, the Pirani SuperBee, also failed about a year ago under similar circumstances (or at least its remote interface did, making it useless for digital monitoring and control). I think we should replace them both, ideally with ones with some built-in protection against power failures.

New EPICS channels

Four new soft channels per UPS have been created, although the interlocks are currently predicated on only C1:Vac-UPS120V_status.

Channel Type Description Units
C1:Vac-UPS120V_status stringin Operational status -
C1:Vac-UPS120V_battery ai Battery remaining %
C1:Vac-UPS120V_line_volt ai Input line voltage V
C1:Vac-UPS120V_line_freq ai Input line frequency Hz
C1:Vac-UPS240V_status stringin Operational status -
C1:Vac-UPS240V_battery ai Battery remaining %
C1:Vac-UPS240V_line_volt ai Input line voltage V
C1:Vac-UPS240V_line_freq ai Input line frequency Hz

These new readbacks are visible in the MEDM vacuum control/monitor screens, as circled in Attachment 1:

Continuing issues with 230V UPS

Yesterday I brought with me a custom power cable for the 230V UPS. It adapts from a 208/120V three-phase outlet (L21-20R) to a standard outlet receptacle (5-15P) which can mate with the UPS's C14 power cable. I installed the cable and confirmed that, at the UPS end, 208V AC was present split-phase (i.e., two hot wires separated 120 deg in phase, each at 120V relative to ground). This failed to power on the unit. Then Jordan showed up and suggested to try powering it instead from a single-phase 240V outlet (L6-20R). However we found that the voltage present at this outlet was exactly the same as what the adapter cable provides: 208V split-phase.

This UPS nominally requires 230V single-phase. I don't understand well enough how the line-noise-isolation electronics work internally, so I can think of three possible explanations:

  1. 208V AC is insufficient to power the unit.
  2. The unit requires a true neutral wire (i.e., not a split-phase configuration), in which case it is not compatible with the U.S. power grid.
  3. The unit is defective.

I called Tripp Lite technical support. They thought the unit should work as powered in the configuration I described, so this leads me to suspect #3.

@Chub and Jordan: Can you please look into somehow replacing this unit, potentially with a U.S.-specific model? Let's stick with the Tripp Lite brand though, as I already have developed the code to interface those.

UPS-host computer communications

Unlike our older equipment, which communicates serially with the host via RS232/485, the new UPS units can be connected with a USB 3.0 cable. I found a great open-source package for communicating directly with the UPS from within Python, Network UPS Tools (NUT), which eliminates the dependency on Tripp Lite's proprietary GUI. The package is well documented, supports hundreds of power-management devices, and is available in the Debian package manager from Jessie (Debian 8) up. It consists of a large set of low-level, device-specific drivers which communicate with a "server" running as a systemd service. The NUT server can then be queried using a uniform set of programming commands across a huge number of devices.

I document the full set-up procedure below, as we may want to use this with more USB devices in the future.

How to set up

First, install the NUT package and its Python binding:

$ sudo apt install nut python-nut

This automatically creates (and starts) a set of systemd processes which expectedly fail, since we have not yet set up the config. files defining our USB devices. Stop these services, delete their default definitions, and replace them with the modified definitions from the vacuum git repo:

$ sudo systemctl stop nut-*.service
$ sudo rm /lib/systemd/system/nut-*.service
$ sudo cp /opt/target/services/nut-*.service /etc/systemd/system
$ sudo systemctl daemon-reload

Next copy the NUT config. files from the vacuum git repo to the appropriate system location (this will overwrite the existing default ones). Note that the file ups.conf defines the UPS device(s) connected to the system, so for setups other than c1vac it will need to be edited accordingly.

$ sudo cp /opt/target/services/nut/* /etc/nut

Now we are ready to start the NUT server, and then enable it to automatically start after reboots:

$ sudo systemctl start nut-server.service
$ sudo systemctl enable nut-server.service

If it succeeds, the start command will return without printing any output to the terminal. We can test the server by querying all the available UPS parameters with

$ upsc 120v

which will print to the terminal screen something like

battery.charge: 100
battery.runtime: 1215
battery.type: PbAC
battery.voltage: 13.5
battery.voltage.nominal: 12.0
device.mfr: Tripp Lite 
device.model: Tripp Lite UPS 
device.type: ups
driver.name: usbhid-ups
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: auto
driver.parameter.productid: 2010
driver.parameter.vendorid: 09ae
driver.version: 2.7.2
driver.version.data: TrippLite HID 0.81
driver.version.internal: 0.38
input.frequency: 60.1
input.voltage: 120.3
input.voltage.nominal: 120
output.frequency.nominal: 60
output.voltage.nominal: 120
ups.beeper.status: enabled
ups.delay.shutdown: 20
ups.mfr: Tripp Lite 
ups.model: Tripp Lite UPS 
ups.power.nominal: 1000
ups.productid: 2010
ups.status: OL
ups.timer.reboot: 65535
ups.timer.shutdown: 65535
ups.vendorid: 09ae
ups.watchdog.status: 0

Here 120v is the name assigned to the 120V UPS device in the ups.conf file, so it will vary for setups on other systems.

If all succeeds to this point, what we have set up so far is a set of command-line tools for querying (and possibly controlling) the UPS units. To access this functionality from within Python scripts, a set of official Python bindings are provided by the python-nut package. However, at the time of writing, these bindings only exist for Python 2.7. For Python 3 applications (like the vacuum system), I have created a Python 3 translation which is included in the vacuum git repo. Refer to the UPS readout script for an illustration of its usage.

Attachment 1: vac_medm.png
vac_medm.png
  15560   Sun Sep 6 13:15:44 2020 JonUpdateDAQUPS for framebuilder

Now that the old APC Smart-UPS 2200 is no longer in use by the vacuum system, I looked into whether it can be repurposed for the framebuilder machine. Yes, it can. The max power consumption of the framebuilder (a SunFire X4600) is 1.137kW. With fresh batteries, I estimate this UPS can power the framebuilder for >10 min. and possibly as long as 30 min., depending on the exact load.

@Chub/Jordan, this UPS is ready to be moved to rack 1X6/1X7. It just has to be disconnected from the wall outlet. All of the equipment it was previously powering has been moved to the new UPS. I have ordered a replacement battery (APC #RBC43) which is scheduled to arrive 9/09-11.

  15561   Sun Sep 6 14:17:18 2020 JonUpdateEquipment loanZurich Instruments analyzer

On Friday, I grabbed the Zurich Instruments HF2LI lock-in amplifier and brought it home. As time permits, I will work towards developing a similar readout script as we have for the SR785.

  15567   Thu Sep 10 15:43:22 2020 JonUpdateBHDInput noise spectra for A+ BHD modeling

As promised some time ago, I've obtained input noise spectra from the sites calibrated to physical units. They are located in a new subdirectory of the BHD repo: A+/input_noises. I've heavily annotated the notebook that generates them (input_noises.ipynb) with aLOG references, to make it transparent what filters, calibrations, etc. were applied and when the data were taken. Each noise term is stored as a separate HDF5 file, which are all tracked via git LFS.

So far there are measurements of the following sources:

  • L1 SRCL
  • H1 SRCL
  • L1 DHARD PIT
  • L1 DSOFT PIT
  • L1 CSOFT PIT
  • L1 CHARD PIT

These can be used, for example, to make more realistic Hang's bilinear noise modeling [ELOG 15503] and Yehonathan's Monte Carlo simulations [ELOG 15539]. Let me know if there are other specific noises of interest and I will try to acquire them. It's a bit time-consuming to search out individual channel calibrations, so I will have to add them on a case-by-case basis.

  15577   Wed Sep 16 12:03:07 2020 JonUpdateVACReplacing pressure gauges

Assembled is the list of dead pressure gauges. Their locations are also circled in Attachment 1.

Gauge Type Location
CC1 Cold cathode Main volume
CC3 Cold cathode Pumpspool 
CC4 Cold cathode RGA chamber
CCMC Cold cathode IMC beamline near MC2
P1b Pirani Main volume
PTP1 Pirani TP1 foreline

For replacements, I recommend we consider the Agilent FRG-700 Pirani Inverted Magnetron Gauge. It uses dual sensing techniques to cover a broad pressure range from 3e-9 torr to atmosphere in a single unit. Although these are more expensive, I think we would net save money by not having to purchase two separate gauges (Pirani + hot/cold cathode) for each location. It would also simplify the digital controls and interlocking to have a streamlined set of pressure readbacks.

For controllers, there are two options with either serial RS232/485 or Ethernet outputs. We probably want the Agilent XGS-600, as it can handle all the gauges in our system (up to 12) in a single controller and no new software development is needed to interface it with the slow controls.

Attachment 1: vac_gauges.png
vac_gauges.png
  15692   Wed Dec 2 12:27:49 2020 JonUpdateVACReplacing pressure gauges

Now that the new Agilent full-range gauges (FRGs) have been received, I'm putting together an installation plan. Since my last planning note in Sept. (ELOG 15577), two more gauges appear to be malfunctioning: CC2 and PAN. Those are taken into account, as well. Below are the proposed changes for all the sensors in the system.

In summary:

  • Four of the FRGs will replace CC1/2/3/4.
  • The fifth FRG will replace CCMC if the 15.6 m cable (the longest available) will reach that location.
  • P2 and P3 will be moved to replace PTP1 and PAN, as they will be redundant once the new FRGs are installed.

Required hardware:

  • 3x CF 2.75" blanks
  • 10x CF 2.75" gaskets
  • Bolts and nut plates
Volume Sensor Location Status Proposed Action
Main P1a functioning leave
Main P1b local readback only leave
Main CC1 dead replace with FRG
Main CCMC dead replace with FRG*
Pumpspool PTP1 dead replace with P2
Pumpspool P2 functioning replace with 2.75" CF blank
Pumpspool CC2 intermittent replace with FRG
Pumpspool PTP2 functioning leave
Pumpspool P3 functioning replace with 2.75" CF blank
Pumpspool CC3 dead replace with FRG
Pumpspool PTP3 functioning leave
Pumpspool PRP functioning leave
RGA P4 functioning leave
RGA CC4 dead replace with FRG
RGA IG1 dead replace with 2.75" CF blank
Annuli PAN intermittent replace with P3
Annuli PASE functioning leave
Annuli PASV functioning leave
Annuli PABS functioning leave
Annuli PAEV functioning leave
Annuli PAEE functioning leave

 

Quote:

For replacements, I recommend we consider the Agilent FRG-700 Pirani Inverted Magnetron Gauge. It uses dual sensing techniques to cover a broad pressure range from 3e-9 torr to atmosphere in a single unit. Although these are more expensive, I think we would net save money by not having to purchase two separate gauges (Pirani + hot/cold cathode) for each location. It would also simplify the digital controls and interlocking to have a streamlined set of pressure readbacks.

For controllers, there are two options with either serial RS232/485 or Ethernet outputs. We probably want the Agilent XGS-600, as it can handle all the gauges in our system (up to 12) in a single controller and no new software development is needed to interface it with the slow controls.

 

  15703   Thu Dec 3 14:53:58 2020 JonUpdateVACReplacing pressure gauges

Update to the gauge replacement plan (15692), based on Jordan's walk-through today. He confirmed:

  • All of the gauges being replaced are mounted via 2.75" ConFlat flange. The new FRGs have the same footprint, so no adapters are required.
  • The longest Agilent cable (50 ft) will NOT reach the CCMC location. The fifth FRG will have to be installed somewhere closer to the X-end.

Based on this info (and also info from Gautam that the PAN gauge is still working), I've updated the plan as follows. In summary, I now propose we install the fifth FRG in the TP1 foreline (PTP1 location) and leave P2 and P3 where they are, as they are no longer needed elsewhere. Any comments on this plan? I plan to order all the necessary gaskets, blanks, etc. tomorrow.

Volume Sensor Location Status Proposed Action
Main P1a functioning leave
Main P1b local readback only leave
Main CC1 dead replace with FRG
Main CCMC dead remove; cap with 2.75" CF blank
Pumpspool PTP1 dead replace with FRG
Pumpspool P2 functioning leave
Pumpspool CC2 dead replace with FRG
Pumpspool PTP2 functioning leave
Pumpspool P3 functioning leave
Pumpspool CC3 dead replace with FRG
Pumpspool PTP3 functioning leave
Pumpspool PRP functioning leave
RGA P4 functioning leave
RGA CC4 dead replace with FRG
RGA IG1 dead remove; cap with 2.75" CF blank
Annuli PAN functioning leave
Annuli PASE functioning leave
Annuli PASV functioning leave
Annuli PABS functioning leave
Annuli PAEV functioning leave
Annuli PAEE functioning leave
  15724   Thu Dec 10 13:05:52 2020 JonUpdateVACUPS failure

I've investigated the vacuum controls failure that occurred last night. Here's what I believe happened.

From looking at the system logs, it's clear that there was a sudden loss of power to the control computer (c1vac). Also, the system was actually down for several hours. The syslog shows normal EPICS channel writes (pressure readback updates, etc., and many of them per minute) which suddenly stop at 4:12 pm. There are no error or shutdown messages in the syslog or in the interlock log. The next activity is the normal start-up messaging at 7:39 pm. So this is all consistent with the UPS suddenly failing.

According to the Tripp Lite manual, the FAULT icon indicates "the battery-supported outlets are overloaded." The failure of the TP2 dry pump appears to have caused this. After the dry pump failure, the rising pressure in the TP2 foreline caused TP2's current draw to increase way above its normal operating range. Attachment 1 shows anomalously high TP2 current and foreline pressure in the minutes just before the failure. The critical system-wide failure is that this overloaded the UPS before overloading TP2's internal protection circuitry, which would have shut down the pump, triggering interlocks and auto-notifications.

Preventing this in the future:

First, there are too many electronics on the 1 kVA UPS. The reason I asked us to buy a dual 208/120V UPS (which we did buy) is to relieve the smaller 120V UPS. I envision moving the turbo pumps, gauge controllers, etc. all to the 5 kVA unit and reserving the smaller 1 kVA unit for the c1vac computer and its peripherals. We now have the dual 208/120V UPS in hand. We should make it a priority to get that installed.

Second, there are 1 Hz "blinker" channels exposed for c1vac and all the slow controls machines, each reporting the machine's alive status. I don't think they're being monitored by any auto-notification program (running on a central machine), but they could be. Maybe there already exists code that could be co-opted for this purpose? There is an MEDM screen displaying the slow machine statuses at Sitemap > CDS > SLOW CONTROLS STATUS, pictured in Attachment 2. This is the only way I know to catch sudden failures of the control computer itself.

Attachment 1: TP2_time_history.png
TP2_time_history.png
Attachment 2: slow_controls_monitors.png
slow_controls_monitors.png
  15729   Thu Dec 10 17:12:43 2020 JonUpdate New SMA cables on order

As requested, I placed an order for an assortment of new RF cables: SMA male-male, RG405.

  • x3 12"
  • x3 24"
  • x2 48"

They're expected to arrive mid next week.

  15738   Fri Dec 18 22:59:12 2020 JonConfigurationCDSUpdated CDS upgrade plan

Attached is the layout for the "intermediate" CDS upgrade option, as was discussed on Wednesday. Under this plan:

  • Existing FEs stay where they are (they are not moved to a single rack)

  • Dolphin IPC remains PCIe Gen 1

  • RFM network is entirely replaced with Dolphin IPC

Please send me any omissions or corrections to the layout.

Attachment 1: CDS_2020_Dec.pdf
CDS_2020_Dec.pdf
Attachment 2: CDS_2020_Dec.graffle
  15739   Sat Dec 19 00:25:20 2020 JonUpdate New SMA cables on order

I re-ordered the below cables, this time going with flexible, double-shielded RG316-DS. Jordan will pick up and return the RG-405 cables after the holidays.

Quote:

As requested, I placed an order for an assortment of new RF cables: SMA male-male, RG405.

  • x3 12"
  • x3 24"
  • x2 48"
  15764   Thu Jan 14 12:19:43 2021 JonUpdateCDSExpansion chassis from LHO

That's fine, we didn't actually request those. We bought and already have in hand new PCIe x4 cables for the chassis-host connection. They're 3 m copper cables, which was based on the assumption of the time that host and chassis would be installed in the same rack.

Quote:
  1. Regarding the fibers - one of the fibers is pre-2012. These are known to fail (according to Rolf). One of the two that LHO shipped is from 2012 (judging by S/N, I can't find an online lookup for the serial number), the other is 2011. IIRC, Rolf offered us some fibers so we may want to take him up on that. We may also be able to use copper cables if the distances b/w server and expansion chassis are short.
  15766   Fri Jan 15 15:06:42 2021 JonUpdateCDSExpansion chassis from LHO

Koji asked me assemble a detailed breakdown of the parts received from LHO, which I do based on the high-res photos that Gautam posted of the shipment.

Parts in hand:

Qty Part Note(s)
2 Chassis body  
2 Power board and cooling fans As noted in 15763, these have the standard LIGO +24V input connector which we may want to change
2 IO interface backplane  
2 PCIe backplane  
2 Chassis-side OSS PCIe x4 card  
2 CX4 fiber cables These were not requested and are not needed

Parts still needed:

Qty Part Note(s)
2 Host-side OSS PCIe x4 card These were requested but missing from the LHO shipment 
2 Timing slave These were not originally requested, but we have recently learned they will be replaced at LHO soon

Issue with PCIe slots in new FEs

Also, I looked into the mix-up regarding the number of PCIe slots in the new Supermicro servers. The motherboard actually has six PCIe slots and is on the CDS list of boards known to be compatible. The mistake (mine) was in selecting a low-profile (1U) chassis that only exposes one of these slots. But at least it's not a fundamental limitation.

One option is to install an external PCIe expansion chassis that would be rack-mounted right above the FE. It is automatically configured by the system BIOS, so doesn't require any special drivers. It also supports hot-swapping of PCIe cards. There are also cheap ribbon-cable riser cards that would allow more cards to be connected for testing, although this is not as great for permanent mounting.

It may still be better to use the machines offered by Keith Thorne from LLO, as they're more powerful anyway. But if there is going to be an extended delay before those can be received, we should be able to use the machines we already have in conjunction with one of these PCIe expansion options.

  15770   Tue Jan 19 13:19:24 2021 JonUpdateCDSExpansion chassis from LHO

Indeed T1800302 is the document I was alluding to, but I completely missed the statement about >3 GHz speed. There is an option for 3.4 GHz processors on the X10SRi-F board, but back in 2019 I chose against it because it would double the cost of the systems. At the time I thought I had saved us $5k. Hopefully we can get the LLO machines in the near term---but if not, I wonder if it's worth testing one of these to see whether the performance is tolerable.

Can you please provide a link to this "list of boards"? The only document I can find is T1800302....

I confirm that PCIe 2.0 motherboards are backwards compatible with PCIe 1.x cards, so there's no hardware issue. My main concern is whether the obsolete Dolphin drivers (requiring linux kernel <=3.x) will work on a new system, albeit one running Debian 8. The OSS PCIe card is automatically configured by the BIOS, so no external drivers are required for that one.

Please also confirm that there are no conflicts w.r.t. the generation of PCIe slots, and the interfaces (Dolphin, OSSI) we are planning to use - the new machines we have are "PCIe 2.0" (though i have no idea if this is the same as Gen 2). 

  15771   Tue Jan 19 14:05:25 2021 JonConfigurationCDSUpdated CDS upgrade plan

I've produced updated diagrams of the CDS layout, taking the comments in 15476 into account. I've also converted the 40m's diagrams from Omnigraffle ($150/license) to the free, cloud-based platform draw.io. I had never heard of draw.io, but I found that it has most all the same functionality. It also integrates nicely with Google Drive.

Attachment 1: The planned CDS upgrade (2 new FEs, fully replace RFM network with Gen 1 Dolphin IPC)
Attachment 2: The current 40m CDS topology

The most up-to-date diagrams are hosted at the following links:

Please send me any further corrections or omissions. Anyone logged in with LIGO.ORG credentials can also directly edit the diagrams.

Attachment 1: 40m_CDS_Network_-_Planned.pdf
40m_CDS_Network_-_Planned.pdf
Attachment 2: 40m_CDS_Network_-_Current.pdf
40m_CDS_Network_-_Current.pdf
  15842   Wed Feb 24 22:13:47 2021 JonUpdateCDSPlanning document for front-end testing

I've started writing up a rough testing sequence for getting the three new front-ends operational (c1bhd, c1sus2, c1ioo). Since I anticipate this plan undergoing many updates, I've set it up as a Google doc which everyone can edit (log in with LIGO.ORG credentials).

Link to planning document

Please have a look and add any more tests, details, or concerns. I will continue adding to it as I read up on CDS documentation.

  15872   Fri Mar 5 17:48:25 2021 JonUpdateCDSFront-end testing

Today I moved the c1bhd machine from the control room to a new test area set up behind (west of) the 1X6 rack. The test stand is pictured in Attachment 1. I assembled one of the new IO chassis and connected it to the host.

I/O Chassis Assembly

  • LIGO-style 24V feedthrough replaced with an ATX 650W switching power supply
  • Timing slave installed
  • Contec DO-1616L-PE card installed for timing control
  • One 16-bit ADC and one 32-channel DO module were installed for testing

The chassis was then powered on and LED lights illuminated indicating that all the components have power. The assembled chassis is pictured in Attachment 2.

Chassis-Host Communications Testing

Following the procedure outlined T1900700, the system failed the very first test of the communications link between chassis and host, which is to check that all PCIe cards installed in both the host and the expansion chassis are detected. The Dolpin host adapter card is detected:

07:06.0 PCI bridge: Stargen Inc. Device 0102 (rev 02) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=07, secondary=0e, subordinate=0e, sec-latency=0
    I/O behind bridge: 00002000-00002fff
    Prefetchable memory behind bridge: 00000000c0200000-00000000c03fffff
    Capabilities: [40] Power Management version 2
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [60] Express Downstream Port (Slot+), MSI 00
    Capabilities: [80] Subsystem: Device 0000:0000
    Kernel driver in use: pcieport

However the OSS PCIe adapter card linking the host to the IO chassis was not detected, nor were any of the cards in the expansion chassis. Gautam previously reported that the OSS card was not detected by the host (though it was not connected to the chassis then). Even now connected to the IO chassis, the card is still not detected. On the chassis-side OSS card, there is a red LED illuminated indicating "HOST CARD RESET" as pictured in Attachment 3. This may indicate a problem with the card on the host side. Still more debugging to be done.

Attachment 1: image_67203585.JPG
image_67203585.JPG
Attachment 2: image_67216641.JPG
image_67216641.JPG
Attachment 3: image_17185537.JPG
image_17185537.JPG
  15890   Tue Mar 9 16:52:47 2021 JonUpdateCDSFront-end testing

Today I continued with assembly and testing of the new front-ends. The main progress is that the IO chassis is now communicating with the host, resolving the previously reported issue.

Hardware Issues to be Resolved

Unfortunately, though, it turns out one of the two (host-side) One Stop Systems PCIe cards sent from Hanford is bad. After some investigation, I ultimately resolved the problem by swapping in the second card, with no other changes. I'll try to procure another from Keith Thorne, along with some spares.

Also, two of the three switching power supplies sent from Livingston (250W Channel Well PSG400P-89) appear to be incompatible with the Trenton BPX6806 PCIe backplanes in these chassis. The power supply cable has 20 conductors and the connector on the board has 24. The third supply, a 650W Antec EA-650, does have the correct cable and is currently powering one of the IO chassis. I'll confirm this situation with Keith and see whether they have any more Antecs. If not, I think these supplies can still be bought (not obsolete).

I've gone through all the hardware we've received, checked against the procurement spreadsheet. There are still some missing items:

  • 18-bit DACs (Qty 14; but 7 are spares)
  • ADC adapter boards (Qty 5)
  • DAC adapter boards (Qty 9)
  • 32-channel DO modules (Qty 2/10 in hand)

Testing Progress

Once the PCIe communications link between host and IO chassis was working, I carried out the testing procedure outlined in T1900700. This performs a series checks to confirm basic operation/compatibility of the hardware and PCIe drivers. All of the cards installed in both the host and the expansion chassis are detected and appear correctly configured, according to T1900700. In the below tree, there is one ADC, one 16-ch DIO, one 32-ch DO, and one DolphinDX card:

+-05.0-[05-20]----00.0-[06-20]--+-00.0-[07-08]----00.0-[08]----00.0  Contec Co., Ltd Device 86e2
|                               +-01.0-[09]--
|                               +-03.0-[0a]--
|                               +-08.0-[0b-15]----00.0-[0c-15]--+-02.0-[0d]--
|                               |                               +-03.0-[0e]--
|                               |                               +-04.0-[0f]--
|                               |                               +-06.0-[10-11]----00.0-[11]----04.0  PLX Technology, Inc. PCI9056 32-bit 66MHz PCI <-> IOBus Bridge
|                               |                               +-07.0-[12]--
|                               |                               +-08.0-[13]--
|                               |                               +-0a.0-[14]--
|                               |                               \-0b.0-[15]--
|                               \-09.0-[16-20]----00.0-[17-20]--+-02.0-[18]--
|                                                               +-03.0-[19]--
|                                                               +-04.0-[1a]--
|                                                               +-06.0-[1b]--
|                                                               +-07.0-[1c]--
|                                                               +-08.0-[1d]--
|                                                               +-0a.0-[1e-1f]----00.0-[1f]----00.0  Contec Co., Ltd Device 8632
|                                                               \-0b.0-[20]--
\-08.0-[21-2a]--+-00.0  Stargen Inc. Device 0101
                \-00.1-[22-2a]--+-00.0-[23]--
                                +-01.0-[24]--
                                +-02.0-[25]--
                                +-03.0-[26]--
                                +-04.0-[27]--
                                +-05.0-[28]--
                                +-06.0-[29]--
                                \-07.0-[2a]--

Standalone Subnet

Before I start building/testing RTCDS models, I'd like to move the new front ends to an isolated subnet. This is guaranteed to prevent any contention with the current system, or inadvertent changes to it.

Today I set up another of the Supermicro servers sent by Livingston in the 1X6 test stand area. The intention is for this machine to run a cloned, bootable image of the current fb1 system, allowing it to function as a bootserver and DAQ server for the FEs on the subnet.

However, this hard disk containing the fb1 image appears to be corrupted and will not boot. It seems to have been sitting disconnected in a box since ~2018, which is not a stable way to store data long term. I wasn't immediately able to recover the disk using fsck. I could spend some more time trying, but it might be most time-effective to just make a new clone of the fb1 system as it is now.

Attachment 1: image_72192707.JPG
image_72192707.JPG
  15924   Tue Mar 16 16:27:22 2021 JonUpdateCDSFront-end testing

Some progress today towards setting up an isolated subnet for testing the new front-ends. I was able to recover the fb1 backup disk using the Rescatux disk-rescue utility and successfully booted an fb1 clone on the subnet. This machine will function as the boot server and DAQ server for the front-ends under test. (None of these machines are connected to the Martian network or, currently, even the outside Internet.)

Despite the success with the framebuilder, front-ends cannot yet be booted locally because we are still missing the DHCP and FTP servers required for network booting. On the Martian net, these processes are running not on fb1 but on chiara. And to be able to compile and run models later in the testing, we will need the contents of the /opt/rtcds directory also hosted on chiara.

For these reasons, I think it will be easiest to create another clone of chiara to run on the subnet. There is a backup disk of chiara and I attempted to boot it on one of the LLO front-ends, but without success. The repair tool I used to recover the fb1 disk does not find a problem with the chiara disk. However, the chiara disk is an external USB drive, so I suspect there could be a compatibility problem with these old (~2010) machines. Some of them don't even recognize USB keyboards pre-boot-up. I may try booting the USB drive from a newer computer.

Edit: I removed one of the new, unused Supermicros from the 1Y2 rack and set it up in the test stand. This newer machine is able to boot the chiara USB disk without issue. Next time I'll continue with the networking setup.

  15947   Fri Mar 19 18:14:56 2021 JonUpdateCDSFront-end testing

Summary

Today I finished setting up the subnet for new FE testing. There are clones of both fb1 and chiara running on this subnet (pictured in Attachment 2), which are able to boot FEs completely independently of the Martian network. I then assembled a second FE system (Supermicro host and IO chassis) to serve as c1sus2, using a new OSS host adapter card received yesterday from LLO. I ran the same set of PCIe hardware/driver tests as was done on the c1bhd system in 15890. All the PCIe tests pass.

Subnet setup

For future reference, below is the procedure used to configure the bootserver subnet.

  • Select "Network" as highest boot priority in FE BIOS settings
  • Connect all machines to subnet switch. Verify fb1 and chiara eth0 interfaces are enabled and assigned correct IP address.
  • Add c1bhd and c1sus2 entries to chiara:/etc/dhcp/dhcpd.conf:
host c1bhd {
  hardware ethernet 00:25:90:05:AB:46;
  fixed-address 192.168.113.91;
}
host c1bhd {
  hardware ethernet 00:25:90:06:69:C2;
  fixed-address 192.168.113.92;
}
  • Restart DHCP server to pick up changes:
$ sudo service isc-dhcp-server restart
  • Add c1bhd and c1sus2 entries to fb1:/etc/hosts:
192.168.113.91    c1bhd
192.168.113.92    c1sus2
  • Power on the FEs. If all was configured correctly, the machines will boot.

C1SUS2 I/O chassis assembly

  • Installed in host:
    • DolphinDX host adapter
    • One Stop Systems PCIe x4 host adapter (new card sent from LLO)
  • Installed in chassis:
    • Channel Well 250 W power supply (replaces aLIGO-style 24 V feedthrough)
    • Timing slave
    • Contec DIO-1616L-PE module for timing control

Next time, on to RTCDS model compilation and testing. This will require first obtaining a clone of the /opt/rtcds disk hosted on chiara.

Attachment 1: image_72192707_(1).JPG
image_72192707_(1).JPG
Attachment 2: image_50412545.JPG
image_50412545.JPG
  15948   Fri Mar 19 19:15:13 2021 JonUpdateCDSc1auxey assembly

Today I helped Yehonathan get started with assembly of the c1auxey (slow controls) Acromag chassis. This will replace the final remaining VME crate. We cleared the far left end of the electronics bench in the office area, as discussed on Wed. The high-voltage supplies and test equipment was moved together to the desk across the aisle.

Yehonathan has begun assembling the chassis frame (it required some light machining to mount the DIN rails that hold the Acromag units). Next, he will wire up the switches, LED indicator lights, and Acromag power connectors following the the documented procedure.

  15959   Wed Mar 24 19:02:21 2021 JonUpdateCDSFront-end testing

This evening I prepared a new 2 TB 3.5" disk to hold a copy of /opt/rtcds and /opt/rtapps from chiara. This is the final piece of setup before model compilation can be tested on the new front-ends. However chiara does not appear to support hot-swapping of disks, as the disk is not recognized when connected to the live machine. I will await confirmation before rebooting it. The new disk is not currently connected.

  15976   Mon Mar 29 17:55:50 2021 JonUpdateCDSFront-end testing

Cloning of chiara:/home/cvs underway

I returned today with a beefier USB-SATA adapter, which has an integrated 12 V supply for powering 3.5" disks. I used this to interface a new 6 TB 3.5" disk found in the FE supplies cabinet.

I decided to go with a larger disk and copy the full contents of chiara:/home/cds. Strictly, the FEs only strictly need the RTS executables in /home/cvs/rtcds and /home/cvs/rtapps. However, to independently develop models, the shared matlab binaries in /home/cvs/caltech/... also need to be exposed. And there may be others I've missed.

I began the clone around 12:30 pm today. To preserve bandwidth to the main disk, I am copying not the /home/cds disk directly, but rather its backup image at /media/40mBackup.

Set up of dedicated SimPlant host

Although not directly related to the FE testing, today I added a new machine to the test stand which will be dedicated to running sim models. Chris has developed a virtual cymac which we plan to run on this machine. It will provide a dedicated testbed for SimPlant and other development, and can host up to 10 user models.

I used one of the spare 12-core Supermicro servers from LLO, which I have named c1sim. I assigned it the IP address 192.168.113.93 on the Martian network. This machine will run in a self-contained way that will not depend on any 40m CDS services and also should not interfere with them. However, if there are concerns about having it present on the network, it can be moved to the outside-facing switch in the office area. It is not currently running any RTCDS processes.

Set-up was carried out via the following procedure:

  • Installed Debian 10.9 on an internal 480 GB SSD.
  • Installed cdssoft repos following Jamie's instructions.
  • Installed RTS and Docker dependencies:
    $ sudo apt install cpuset advligorts-mbuf-dkms advligorts-gpstime-dkms docker.io docker-compose
  • Configured scheduler for real-time operation:
    $ sudo /sbin/sysctl kernel.sched_rt_runtime_us = -1
  • Reserved 10 cores for RTS user models (plus one for IOP model) by adding the following line to /etc/default/grub:
    GRUB_CMDLINE_LINUX_DEFAULT="isolcpus=nohz,domain,1-11 nohz_full=1-11 tsc=reliable mce=off"
    followed by the commands:
    $ sudo update-grub
    $ sudo reboot now
  • Downloaded virtual cymac repo to /home/controls/docker-cymac.

I need to talk to Chris before I can take the setup further.

  15979   Tue Mar 30 18:21:34 2021 JonUpdateCDSFront-end testing

Progress today:

Outside Internet access for FE test stand

This morning Jordan and I ran an 85-foot Cat 6 Ethernet cable from the campus network switch in the office area (on the ligo.caltech.edu domain) to the FE test stand near 1X6. This is to allow the test-stand subnet to be accessed for remote testing, while keeping it invisible to the parallel Martian subnet.

Successful RTCDS model compilation on new FEs

The clone of the chiara:/home/cds disk completed overnight. Today I installed the disk in the chiara clone. The NFS mounts (/opt/rtcds, /opt/rtapps) shared with the other test-stand machines mounted without issue.

Next, I attempted to open the shared Matlab executable (/cvs/cds/caltech/apps/linux64/matlab/bin/matlab) and launch Simulink. The existing Matlab license (/cvs/cds/caltech/apps/linux64/matlab/licenses/license_chiara_865865_R2015b.lic) did not work on this new machine, as they are machine-specific, so I updated the license file. I linked this license to my personal license, so that the machine license for the real chiara would not get replaced. The original license file is saved in the same directory with a *.bak postfix. If this disk is ever used in the real chiara machine, this file should be restored. After the machine license was updated, Matlab and Simulink loaded and allowed model editing.

Finally, I tested RTCDS model compilation on the new FEs using the c1lsc model as a trial case. It encountered one path issue due to the model being located at /opt/rtcds/userapps/release/isc/c1/models/isc/ instead of /opt/rtcds/userapps/release/isc/c1/models/. This seems to be a relic of the migration of the 40m models from the SVN to a standalone git repo. This was resolved by simply symlinking to the expected location:

$ sudo ln -s /opt/rtcds/userapps/release/isc/c1/models/isc/c1lsc.mdl /opt/rtcds/userapps/release/isc/c1/models/c1lsc.mdl

The model compilation then succeeded:

controls@c1bhd$ cd /opt/rtcds/caltech/c1/rtbuild/release

controls@c1bhd$ make clean-c1lsc
Cleaning c1lsc...
Done

controls@c1bhd$ make c1lsc
Cleaning c1lsc...
Done
Parsing the model c1lsc...
Done
Building EPICS sequencers...
Done
Building front-end Linux kernel module c1lsc...
make[1]: Warning: File 'GNUmakefile' has modification time 28830 s in the
future
make[1]: warning:  Clock skew detected.  Your build may be incomplete.
Done
RCG source code directory:
/opt/rtcds/rtscore/branches/branch-3.4
The following files were used for this build:
/opt/rtcds/caltech/c1/userapps/release/cds/common/src/cdsToggle.c
/opt/rtcds/userapps/release/cds/c1/src/inmtrxparse.c
/opt/rtcds/userapps/release/cds/common/models/FILTBANK_MASK.mdl
/opt/rtcds/userapps/release/cds/common/models/rtbitget.mdl
/opt/rtcds/userapps/release/cds/common/models/SCHMITTTRIGGER.mdl
/opt/rtcds/userapps/release/cds/common/models/SQRT_SWITCH.mdl
/opt/rtcds/userapps/release/cds/common/src/DB2MAG.c
/opt/rtcds/userapps/release/cds/common/src/OSC_WITH_CONTROL.c
/opt/rtcds/userapps/release/cds/common/src/wait.c
/opt/rtcds/userapps/release/isc/c1/models/c1lsc.mdl
/opt/rtcds/userapps/release/isc/c1/models/IQLOCK_WHITENING_TRIGGERING.mdl
/opt/rtcds/userapps/release/isc/c1/models/PHASEROT.mdl
/opt/rtcds/userapps/release/isc/c1/models/RF_PD_WITH_WHITENING_TRIGGERING.mdl
/opt/rtcds/userapps/release/isc/c1/models/UGF_SERVO_40m.mdl
/opt/rtcds/userapps/release/isc/common/models/FILTBANK_TRIGGER.mdl
/opt/rtcds/userapps/release/isc/common/models/LSC_TRIGGER.mdl

Successfully compiled c1lsc
***********************************************
Compile Warnings, found in c1lsc_warnings.log:
***********************************************
[warnings suppressed]

As did the installation:

controls@c1bhd$ make install-c1lsc
Installing system=c1lsc site=caltech ifo=C1,c1
Installing /opt/rtcds/caltech/c1/chans/C1LSC.txt
Installing /opt/rtcds/caltech/c1/target/c1lsc/c1lscepics
Installing /opt/rtcds/caltech/c1/target/c1lsc
Installing start and stop scripts
/opt/rtcds/caltech/c1/scripts/killc1lsc
/opt/rtcds/caltech/c1/scripts/startc1lsc
Performing install-daq
Updating testpoint.par config file
/opt/rtcds/caltech/c1/target/gds/param/testpoint.par
/opt/rtcds/rtscore/branches/branch-3.4/src/epics/util/updateTestpointPar.pl
-par_file=/opt/rtcds/caltech/c1/target/gds/param/archive/testpoint_210330_170634.par
-gds_node=42 -site_letter=C -system=c1lsc -host=c1lsc
Installing GDS node 42 configuration file
/opt/rtcds/caltech/c1/target/gds/param/tpchn_c1lsc.par
Installing auto-generated DAQ configuration file
/opt/rtcds/caltech/c1/chans/daq/C1LSC.ini
Installing Epics MEDM screens
Running post-build script

safe.snap exists

We are ready to start building and testing models.

  15997   Tue Apr 6 07:19:11 2021 JonUpdateCDSNew SimPlant cymac

Yesterday Chris and I completed setup of the Supermicro machine that will serve as a dedicated host for developing and testing RTCDS sim models. It is currently sitting in the stack of machines in the FE test stand, though it should eventually be moved into a permanent rack.

It turns out the machine cannot run 10 user models, only 4. Hyperthreading was enabled in the BIOS settings, which created the illusion of there being 12 rather than 6 physical cores. Between Chris and Ian's sim models, we already have a fully-loaded machine. There are several more of these spare 6-core machines that could be set up to run additional models. But in the long term, and especially in Ian's case where the IFO sim models will all need to communicate with one another (this is a self-contained cymac, not a distributed FE system), we may need to buy a larger machine with 16 or 32 cores.

IPMI was set up for the c1sim cymac. I assigned the IPMI interface a static IP address on the Martian network (192.168.113.45) and registered it in the usual way with the domain name server on chiara. After updating the BIOS settings and rebooting, I was able to remotely power off and back on the machine following these instructions.

Set up of dedicated SimPlant host

Although not directly related to the FE testing, today I added a new machine to the test stand which will be dedicated to running sim models. Chris has developed a virtual cymac which we plan to run on this machine. It will provide a dedicated testbed for SimPlant and other development, and can host up to 10 user models.

I used one of the spare 12-core Supermicro servers from LLO, which I have named c1sim. I assigned it the IP address 192.168.113.93 on the Martian network. This machine will run in a self-contained way that will not depend on any 40m CDS services and also should not interfere with them.

  15998   Tue Apr 6 11:13:01 2021 JonUpdateCDSFE testing

I/O chassis assembly

Yesterday I installed all the available ADC/DAC/BIO modules and adapter boards into the new I/O chassis (c1bhd, c1sus2). We are still missing three ADC adapter boards and six 18-bit DACs. A thorough search of the FE cabinet turned up several 16-bit DACs, but only one adapter board. Since one 16-bit DAC is required anyway for c1sus2, I installed the one complete set in that chassis.

Below is the current state of each chassis. Missing components are highlighted in yellow. We cannot proceed to loopback testing until at least some of the missing hardware is in hand.

C1BHD

Component Qty Required Qty Installed
16-bit ADC 1 1
16-bit ADC adapter 1 0
18-bit DAC 1 0
18-bit DAC adapter 1 1
16-ch DIO 1 1

C1SUS2

Component Qty required Qty Installed
16-bit ADC 2 2
16-bit ADC adapter 2 0
16-bit DAC 1 1
16-bit DAC adapter 1 1
18-bit DAC 5 0
18-bit DAC adapter 5 5
32-ch DO 6 6
16-ch DIO 1 1

Gateway for remote access

To enable remote access to the machines on the test stand subnet, one machine must function as a gateway server. Initially, I tried to set this up using the second network interface of the chiara clone. However, having two active interfaces caused problems for the DHCP and FTS servers and broke the diskless FE booting. Debugging this would have required making changes to the network configuration that would have to be remembered and reverted, were the chiara disk to ever to be used in the original machine.

So instead, I simply grabbed another of the (unused) 1U Supermicro servers from the 1Y1 rack and set it up on the subnet as a standalone gateway server. The machine is named c1teststand. Its first network interface is connected to the general computing network (ligo.caltech.edu) and the second to the test-stand subnet. It has no connection to the Martian subnet. I installed Debian 10.9 anticipating that, when the machine is no longer needed in the test stand, it can be converted into another docker-cymac for to run additional sim models.

Currently, the outside-facing IP address is assigned via DHCP and so periodically changes. I've asked Larry to assign it a static IP on the ligo.caltech.edu domain, so that it can be accessed analogously to nodus.

  16012   Sat Apr 10 08:51:32 2021 JonUpdateCDSI/O Chassis Assembly

I installed three of the 16-bit ADC adapter boards assembled by Koji. Now, the only missing hardware is the 18-bit DACs (quantities below). As I mentioned this week, there are 2-3 16-bit DACs available in the FE cabinet. They could be used if more 16-bit adapter boards could be procured.

C1BHD    
Component Qty Required Qty Installed
16-bit ADC 1 1
16-bit ADC adapter 1 1
18-bit DAC 1 0
18-bit DAC adapter 1 1
16-ch DIO 1 1
C1SUS2    
Component Qty required Qty Installed
16-bit ADC 2 2
16-bit ADC adapter 2 2
16-bit DAC 1 1
16-bit DAC adapter 1 1
18-bit DAC 5 0
18-bit DAC adapter 5 5
32-ch DO 6 6
16-ch DIO 1 1
  16015   Sat Apr 10 11:56:14 2021 JonUpdateCDS40m LSC simPlant model

Summary

Yesterday I resurrected the 40m's LSC simPlant model, c1lsp. It is running on c1sim, a virtual, self-contained cymac that Chris and I set up for developing sim models (see 15997). I think the next step towards an integrated IFO model is incorporating the suspension plants. I am going to hand development largely over to Ian at this point, with continued support from me and Chris.

x1lsp_main.png

 

LSC Plant

This model dates back to around 2012 and appears to have last been used in ~2015. According to the old CDS documentation:

Name Description Communicates directly with
LSP Simulated length sensing model of the physical plant, handles light propagation between mirrors, also handles alignment modeling and would have to communicate ground motion to all the suspensions for ASS to work LSC, XEP, YEP, VSP

Here XEP, YEP, and VSP are respectively the x-end, y-end, and vertex suspension plant models. I haven't found any evidence that these were ever fully implemented for the entire IFO. However, it looks like SUS plants were later implemented for a single arm cavity, at least, using two models named c1sup and c1spx (appear in more recent CDS documentation). These suspension plants could likely be updated and then copied for the other suspended optics.

To represent the optical transfer functions, the model loads a set of SOS filter coefficients generated by an Optickle model of the interferometer. The filter-generating code and instructions on how to use it are located here. In particular, it contains a Matlab script named opt40m.m which defines the interferferometer. It should be updated to match the parameters in the latest 40m Finesse model, C1_w_BHD.kat. The calibrations from Watts to sensor voltages will also need to be checked and likely updated.

Model-Porting Procedure

For future reference, below are the steps followed to port this model to the virtual cymac.

  1. Copy over model files.
     
    • The c1lsp model, chiara:/opt/rtcds/userapps/release/isc/c1/models/c1lsp.mdl, was copied to the userapps directory on the virtual cymac, c1sim:/home/controls/docker-cymac/userapps/x1lsp.mdl. In the filename, note the change in IFO prefix "c1" --> "x1," since this cymac is not part of the C1 CDS network.
       
    • This model also depends on a custom library file, chiara:/opt/rtcds/userapps/release/isc/c1/models/SIMPLANT.mdl, which was copied to c1sim:/home/controls/docker-cymac/userapps/lib/SIMPLANT.mdl.
       
  2. Update model parameters in Simulink. To edit models in Simulink, see the instructions here and also here.
     
    • The main changes are to the cdsParameters block, which was updated as shown below. Note the values of dcuid and specific_cpu are specifically assigned to x1lsp and will vary for other models. The other parameters will be the same.
       

    •  
    • I also had to change the name of one of the user-defined objects from "ADC0" --> "ADC" and then re-copy the cdsAdc object (shown above) from the CDS_PARTS.mdl library. At least in newer RCG code, the cdsAdc object must also be named "ADC0." This namespace collision was causing the compiler to fail.
       
    • Note: Since Matlab is not yet set up on c1sim, I actually made these edits on one of the 40m machines (chiara) prior to copying the model.
       
  3. Compile and launch the models. Execute the following commands on c1sim:
    • $ cd ~/docker-cymac
      $ ./kill_cymac
      $ ./start_cymac debug
    • The optional debug flag will print the full set of compilation messages to the terminal. If compilation fails, search the traceback for lines containing "ERROR" to determine what is causing the failure.
       

  4. Accessing MEDM screens. Once the model is running, a button should be added to the sitemap screen (located at c1sim:/home/controls/docker-cymac/userapps/medm/sitemap.adl) to access one or more screens specific to the newly added model.

    • Custom-made screens should be added to c1sim:/home/controls/docker-cymac/userapps/medm/x1lsp (where the final subdirectory is the name of the particular model).

    • The set of available auto-generated screens for the model can be viewed by entering the virtual environment:

    • $ cd ~/docker-cymac
      $ ./login_cymac                    #drops into virtual shell
      # cd /opt/rtcds/tst/x1/medm/x1lsp  #last subdirectory is model name
      # ls -l *.adl
      # exit                             #return to host shell
    • The sitemap screen and any subscreens can link to the auto-generated screens in the usual way (by pointing to their virtual /opt/rtcds path). Currently, for the virtual path resolution to work, an environment script has to be run prior to launching sitemap, which sets the location of a virtual MEDM server (this will be auto-scripted in the future):

    • $ cd ~/docker-cymac
      $ eval $(./env_cymac)
      $ sitemap
    • One important auto-generated screen that should be linked for every model is the CDS runtime diagnostics screen, which indicates the success/fail state of the model and all its dependencies. T1100625 details the meaning of all the various indicator lights.

 

 

 

 

  16037   Thu Apr 15 17:24:08 2021 JonUpdateCDSUpdated c1auxey wiring plan

I've updated the c1auxey wiring plan for compatibility with the new suspension electronics. Specifically it is based on wiring schematics for the new HAM-A coil driver (D1100117), satellite amplifier (D1002818), and HV bias driver (D1900163).

Changes:

  • The PDMon, VMon, CoilEnable, and BiasAdj channels all move from DB37 to various DB9 breakout boards.
  • The DB9 cables (x2) connecting the CoilEnable channels to the coil drivers must be spliced with the dewhitening switching signals from the RTS.
  • As suggested, I added five new BI channels to monitor the state of the CoilEnable switches. For lack of a better name, they follow the naming convention C1:SUS-ETMY_xx_ENABLEMon.

@Yehonathan can proceed with wiring the chassis.

Quote:

I finished prewiring the new c1auxey Acromag chassis (see attached pictures). I connected all grounds to the DIN rail to save some wiring. The power switches and LEDs work as expected.

I configured the DAQ modules using the old windows machine. I configured the gateway to be 192.168.114.1. The host machine still needs to be setup.

Next, the feedthroughs need to be wired and the channels need to be bench tested.

Attachment 1: C1AUXEY_Chassis_Feedthroughs_-_By_Connector.pdf
C1AUXEY_Chassis_Feedthroughs_-_By_Connector.pdf
  16090   Wed Apr 28 11:31:40 2021 JonUpdateVACEmpty N2 Tanks

I checked out what happened on c1vac. There are actually two independent monitoring codes running:

  1. The interlock service, which monitors the line directly connected to the valves.
  2. A seaparate convenience mailer, running as a cronjob, that monitors the tanks.

The interlocks did not trip because the low-pressure delivery line, downstream of the dual-tank regulator, never fell below the minimum pressure to operate the valves (65 PSI). This would have eventually occurred, had Jordan been slower to replace the tanks. So I see no problem with the interlocks.

On the other hand, the N2 mailer should have sent an email at 2021-04-18 15:00, which was the first time C1:Vac-N2T1_pressure dropped below the 600 PSI threshold. N2check.log shows these pressures were recorded at this time, but does not log that an email was sent. Why did this fail? Not sure, but I found two problems which I did fix:

  • One was that the code was checking the sensor on the low-pressure side (C1:Vac-N2_pressure; nominally 75 PSI) against the same 600 PSI threshold as the tanks. This channel should either be removed or a separate threshold (65 PSI) defined just for it. I just removed it from the list because monitoring of this channel is redundant with the interlock service. This does not explain the failure to send an email.
  • The second issue was that the pyN2check.sh script appeared to be calling Python 3 to run a Python 2 script. At least that was the situation when I tested it, and this was causing it to fail partway through. This might well explain the problem with no email. I explicitly set python --> python2 in the shell script.

The code then ran fine for me when I retested it. I don't see any further issues.

Quote:

Installed T2 today, and leaked checked the entire line. No issues found. It could have been a bad valve on the tank itself. Monitored T2 pressure for ~2 hours to see if there was any change. All seems ok.

Quote:

When I came into the lab this morning, I noticed that both N2 tanks were empty. I had swapped one on Friday (4-16-21) before I left the lab. Looking at the logs, the right tank (T2) sprung a leak shortly shortly after install. I leak checked the tank coupling after install but did not see a leak. There could a leak further down the line, possibly at the pressure transducer.

The left tank (T1) emptied normally over the weekend, and I quickly swapped the left tank for a full one, and is curently at ~2700 psi. It was my understanding that if both tanks emptied, V1 would close automatically and a mailer would be sent out to the 40m group. I did not receive an email over the weekend, and I checked the Vac status just now and V1 was still open.

I will keep an eye on the tank pressure throughout the day, and will try to leak check the T2 line this afternoon, but someone should check the vacuum interlocks and verify.

 

  16093   Thu Apr 29 10:51:35 2021 JonUpdateCDSI/O Chassis Assembly

Summary

Yesterday I unpacked and installed the three 18-bit DAC cards received from Hanford. I then repeated the low-level PCIe testing outlined in T1900700, which is expanded upon below. I did not make it to DAC-ADC loopback testing because these tests in fact revealed a problem with the new hardware. After a combinatorial investigation that involved swapping cards around between known-to-be-working PCIe slots, I determined that one of the three 18-bit DAC cards is bad. Although its "voltage present" LED illuminates, the card is not detected by the host in either I/O chassis.

I installed one of the two working DACs in the c1bhd chassis. This now 100% completes this system. I installed the other DAC in the c1sus2 chassis, which still requires four more 18-bit DACs. Lastly, I reran the PCIe tests for the final configurations of both chassis.

PCIe Card Detection Tests

For future reference, below is the set of command line tests to verify proper detection and initialization of ADC/DAC/BIO cards in I/O chassis. This summarizes the procedure described in T1900700 and also adds the tests for 18-bit DAC and 32-channel BO cards, which are not included in the original document.

Each command should be executed on the host machine with the I/O chassis powered on:

$ sudo lspci -v | grep -B 1 xxxx

where xxxx is a four-digit device code given in the following table.

Device Device Code
General Standards 16-bit ADC 3101
General Standards 16-bit DAC 3120
General Standards 18-bit DAC 3357
Contec 16-channel BIO 8632
Contec 32-channel BO 86e2
Dolphin IPC host adapter 0101

The command will return a two-line entry for each PCIe device of the specified type that is detected. For example, on a system with a single ADC this command should return:

10:04.0 Bridge: PLX Technology, Inc. PCI9056 32-bit 66MHz PCI IOBus Bridge (rev ac)
             Subsystem: PLX Technology, Inc. Device 3101
  16116   Tue May 4 07:38:36 2021 JonUpdateCDSI/O Chassis Assembly

IOP models created

With all the PCIe issues now resolved, yesterday I proceeded to build an IOP model for each of new FEs. I assigned them names and DCUIDs consist with the 40m convention, listed below. These models currently exist on only the cloned copy of /opt/rtcds running on the test stand. They will be copied to the main network disk later, once the new systems are fully tested.

Model Host CPU DCUID
c1x06 c1bhd 1 23
c1x07 c1sus2 1 24

The models compile and install successfully. The RCG runtime diagnostics indicate that all is working except for the timing synchronization and DAQD data transmission. This is as expected because neither of these have been set up yet.

Timing system set-up

The next step is to provide the 65 kHz clock signals from the timing fanout via LC optical fiber. I overlooked the fact that an SPX optical transceiver is required to interface the fiber to the timing slave board. These were not provided with the timing slaves we received. The timing slaves require a particular type of transceiver, 100base-FX/OC-3, which we did not have on hand. (For future reference, there is a handy list of compatible transceivers in E080541, p. 14.) I placed a Digikey order for two Finisar FTLF1217P2BTL, which should arrive within two days.

Attachment 1: Screen_Shot_2021-05-03_at_4.16.06_PM.png
Screen_Shot_2021-05-03_at_4.16.06_PM.png
  16130   Tue May 11 16:29:55 2021 JonUpdateCDSI/O Chassis Assembly
Quote:

Timing system set-up

The next step is to provide the 65 kHz clock signals from the timing fanout via LC optical fiber. I overlooked the fact that an SPX optical transceiver is required to interface the fiber to the timing slave board. These were not provided with the timing slaves we received. The timing slaves require a particular type of transceiver, 100base-FX/OC-3, which we did not have on hand. (For future reference, there is a handy list of compatible transceivers in E080541, p. 14.) I placed a Digikey order for two Finisar FTLF1217P2BTL, which should arrive within two days.

Today I brought and installed the new optical transceivers (Finisar FTLF1217P2BTL) for the two timing slaves. The timing slaves appear to phase-lock to the clocking signal from the master fanout. A few seconds after each timing slave is powered on, its status LED begins steadily blinking at 1 Hz, just as in the existing 40m systems.

However, some other timing issue remains unresolved. When the IOP model is started (on either FE), the DACKILL watchdog appears to start in a tripped state. Then after a few minutes of running, the TIM and ADC indicators go down as well. This makes me suspect the sample clocks are not really phase-locked. However, the models do start up with no error messages. Will continue to debug...

Attachment 1: Screen_Shot_2021-05-11_at_3.03.42_PM.png
Screen_Shot_2021-05-11_at_3.03.42_PM.png
  16154   Sun May 23 18:28:54 2021 JonUpdateCDSOpto-isolator for c1auxey

The new HAM-A coil drivers have a single DB9 connector for all the binary inputs. This requires that the dewhitening switching signals from the fast system be spliced with the coil enable signals from c1auxey. There is a common return for all the binary inputs. To avoid directly connecting the grounds of the two systems, I have looked for a suitable opto-isolator for the c1auxey signals.

I best option I found is the Ocean Controls KTD-258, a 4-channel, DIN-rail-mounted opto-isolator supporting input/output voltages of up to 30 V DC. It is an active device and can be powered using the same 15 V supply as is currently powering both the Acromags and excitation. I ordered one unit to be trialed in c1auxey. If this is found to be good solution, we will order more for the upgrades of c1auxex and c1susaux, as required for compatibility with the new suspension electronics.

ELOG V3.1.3-