ID Date Author Type Category Subject
  16756   Mon Apr 4 17:03:47 2022 AnchalSummaryCDSc1susaux2 slow controls acromag chassis fixed and installed

[Anchal, JC, Ian, Paco]

We have now fixed all issues with the PD mons of the c1susaux2 chassis. The slow channels are now reading the same values as the fast channels and there is no arbitrary offset. The binary channels are all working now except for LO2 UL, which keeps showing ENABLE OFF. This was an issue earlier on LO1 UR, where it magically disappeared, and now it is on LO2 UL. I think the optical isolators aren't very robust. But anyway, our watchdog system is now fully functional for all BHD suspended optics.

Attachment 1: Screenshot_2022-04-04_17-03-26.png
Screenshot_2022-04-04_17-03-26.png
  14495   Mon Mar 25 10:21:05 2019 JonUpdateUpgradec1susaux upgrade plan

Now that the Acromag upgrade of c1vac is complete, the next system to be upgraded will be c1susaux. We chose c1susaux because it is one of the highest-priority systems awaiting upgrade, and because Johannes has already partially assembled its Acromag replacement (see photos below). I've assessed the partially-assembled Acromag chassis and the mostly-set-up host computer and propose we do the following to complete the system.

Documentation

As I go, I'm writing step-by-step documentation here so that others can follow this procedure for future systems. The goal is to create a standard procedure that can be followed for all the remaining upgrades.

Acromag Chassis Status

The bulk of the remaining work is the wiring and testing of the rackmount chassis housing the Acromag units. This system consists of 17 units: 10 ADCs, 4 DACs, and 3 digital I/O modules. Johannes has already created a full list of channel wiring assignments. He has installed DB37-to-breakout board feedthroughs for all the signal cable connections. It looks like about 40% of the wiring from the breakout boards to the Acromag terminals is already done.

The Acromag units have to be initially configured using the Windows laptop connected by USB. Last week I wasn't immediately able to check their configuration because I couldn't power on the units. Although the DC power wiring is complete, when I connected a 24V power supply to the chassis connector and flipped on the switch, the voltage dropped to ~10V irrespective of adjusting the current limit. The 24V indicator lights on the chassis front and back illuminated dimly, but the Acromag lights did not turn on. I suspect there is a short to ground somewhere, but I didn't have time to investigate further. I'll check again this week unless someone else looks at it first.

Host Computer Status

The host computer has already been mostly configured by Johannes. So far I've only set up IP forwarding rules between the martian-facing and Acromag-facing ethernet interfaces (the Acromags are on a subnet inaccessible from the outside). This is documented in the link above. I also plan to set up local installations of modbus and EPICS, as explained below. The new EPICS command file (launches the IOC) and database files (define the channels) have already been created by Johannes. I think all that remains is to set up the IOC as a persistent system service.
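For reference, a minimal sketch of the kind of forwarding setup involved, assuming eth0 faces the martian network and eth1 faces the private Acromag subnet (the interface names and exact rules here are illustrative, not necessarily what is already on the host):

# enable kernel IP forwarding (persist via /etc/sysctl.conf if desired)
sudo sysctl -w net.ipv4.ip_forward=1
# masquerade traffic from the Acromag subnet out of the martian-facing interface
sudo iptables -t nat -A POSTROUTING -s 192.168.115.0/24 -o eth0 -j MASQUERADE
sudo iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
sudo iptables -A FORWARD -i eth0 -o eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT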

Host computer OS

Recommendation from Keith Thorne:

For CDS lab-wide, Jamie Rollins and Ryan Blair have been maintaining Debian 8 and 9 repos with some of these.  
They have somewhat older EPICS versions and may not include all the modules we have for SL7.
One worry is whether they will keep Debian 9 maintained, as Debian 10 is already out.

I would likely choose Debian 9 instead of Ubuntu 18.04.02, as not sure of Ubuntu repos for EPICS libraries.

Based on this, I propose we use Debian 9 for our Acromag systems. I don't see a strong reason to switch to SL7, especially since c1vac and c1susaux are already set up using Debian 8. Although Debian 8 is one version out of date, I think it's better to get a well-documented and tested procedure in place before we upgrade the working c1vac and c1susaux computers. When we start building the next system, let's install Debian 9 (or 10, if it's available), get it working with EPICS/modbus, then loop back to c1vac and c1susaux for the OS upgrade.

Local vs. central modbus/EPICS installation

The current convention is for all machines to share a common installation which is hosted on the /cvs/cds network drive. This seems appealing because only a single central EPICS distribution needs to be maintained. However, from experience attempting this on c1vac, I'm convinced this is a bad design for the new Acromag systems.

The problem is that any network outage, even routine maintenance or brief glitches, wreaks havoc on Acromags set up this way. When the network is interrupted, the modbus executable disappears mid-execution, crashing the process and hanging the OS (I think related to the deadlocked NFS mount), so that the only way to recover is to manually power-cycle. Still worse, this can happen silently (channel values freeze), meaning that, e.g., watchdog protections might fail.

To avoid this, I'm planning to install a local EPICS distribution from source on c1susaux, just as I did for c1vac. This only takes a few minutes to do, and I will include the steps in the documented procedure. Building from source also better protects against OS-dependent bugginess.
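For orientation, a from-source build boils down to something like the following (the version number and paths are placeholders; the documented procedure linked above is the authoritative recipe):

# unpack an EPICS base tarball fetched from the EPICS website (version illustrative)
cd /opt/epics && tar xzf base-3.15.5.tar.gz && cd base-3.15.5
make                    # builds EPICS base for the host architecture
# then build asyn and the modbus support module against this base, and point
# the IOC startup file (C1SUSAUX.cmd) and database files at the local install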

Main TODO items

  • Debug issue with Acromag DC power wiring
  • Complete wiring from chassis feedthroughs to Acromag terminals, following this wiring diagram
  • Check/set the configuration of each Acromag unit using the software on the Windows laptop
  • Set the analog channel calibrations in the EPICS database file
  • Test each channel ex situ. Chub and I discussed an idea to use two DB-37F breakout boards, with the wiring between the board terminals manually set. One DAC channel would be calibrated and driven to test other ADC channels. A similar approach could be used for the digital input/output channels.
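A minimal sketch of what one loopback check would look like with the EPICS command-line tools (channel names are placeholders following the conventions used elsewhere in this log; the expected readback depends on the calibration and the loopback wiring):

caput C1:SUS-XXX_ULBiasAdj 2.0     # drive the calibrated DAC channel
sleep 2                            # allow the slow EPICS channels to update
caget C1:SUS-XXX_ULPDMon           # ADC channel wired back to it; compare against expectation
caput C1:SUS-XXX_ULBiasAdj 0.0     # restore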
Attachment 1: IMG_3136.jpg
IMG_3136.jpg
Attachment 2: IMG_3138.jpg
IMG_3138.jpg
Attachment 3: IMG_3137.jpg
IMG_3137.jpg
  14496   Tue Mar 26 04:25:13 2019 JohannesUpdateUpgradec1susaux upgrade plan
Quote:

Main TODO items

  • Debug issue with Acromag DC power wiring
  • Complete wiring from chassis feedthroughs to Acromag terminals, following this wiring diagram
  • Check/set the configuration of each Acromag unit using the software on the Windows laptop
  • Set the analog channel calibrations in the EPICS database file
  • Test each channel ex situ. Chub and I discussed an idea to use two DB-37F breakout boards, with the wiring between the board terminals manually set. One DAC channel would be calibrated and driven to test other ADC channels. A similar approach could be used for the digital input/output channels.

Just a few remarks, since I heard from Gautam that c1susaux is next in line for upgrade.

All units have already been configured with IP addresses and settings following the scheme explained on the slow controls wiki page. I did this while powering the units in the chassis, so I'm not sure where the short is coming from. Is the power supply maybe not sourcing enough current? Powering all units at the same time takes significant current, something like >1.5 Amps if I remember correctly. These are the IPs I assigned before I left:

Acromag Unit IP Address
C1SUSAUX_ADC00 192.168.115.20
C1SUSAUX_ADC01 192.168.115.21
C1SUSAUX_ADC02 192.168.115.22
C1SUSAUX_ADC03 192.168.115.23
C1SUSAUX_ADC04 192.168.115.24
C1SUSAUX_ADC05 192.168.115.25
C1SUSAUX_ADC06 192.168.115.26
C1SUSAUX_ADC07 192.168.115.27
C1SUSAUX_ADC08 192.168.115.28
C1SUSAUX_ADC09 192.168.115.29
C1SUSAUX_DAC00 192.168.115.40
C1SUSAUX_DAC01 192.168.115.41
C1SUSAUX_DAC02 192.168.115.42
C1SUSAUX_DAC03 192.168.115.43
C1SUSAUX_BIO00 192.168.115.60
C1SUSAUX_BIO01 192.168.115.61
C1SUSAUX_BIO02 192.168.115.62
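A quick sanity check from the host computer is to ping each of the addresses in the table above once the units are powered (a sketch):

for ip in 192.168.115.{20..29} 192.168.115.{40..43} 192.168.115.{60..62}; do
    ping -c 1 -W 1 $ip > /dev/null && echo "$ip up" || echo "$ip DOWN"
done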

I used black/white twisted-pair wires for A/D, red/white for D/A, and green/white for BIO channels. I found it easiest to remove the blue terminal blocks from the Acromag units for doing the majority of the wiring, but I wasn't able to finish it. I had also done the analog channel calibrations using the Windows utility, with multimeters and one of the precision voltage sources I had brought over from the Bridge labs, but it's probably a good idea to check them and correct if necessary. I also recommend checking that the existing wiring, particularly for MC1 and MC2, is correct, as I had swapped their order in the channel assignment in the past.

While looking through the database files I noticed two glaring mistakes which I fixed:

  1. The definition of C1SUSAUX_BIO2 was missing in /cvs/cds/caltech/target/c1susaux2/C1SUSAUX.cmd. I added it after the assignments for C1SUSAUX_BIO1
  2. Due to copy/paste, the database files /cvs/cds/caltech/target/c1susaux2/C1_SUS-AUX_<OPTIC>.db were still pointing to C1AUXEX. I overwrote all instances of this in all database files with C1SUSAUX.
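A bulk substitution like that can be done in one pass with sed, keeping backups (a sketch; check the result before removing the .bak files):

cd /cvs/cds/caltech/target/c1susaux2
sed -i.bak 's/C1AUXEX/C1SUSAUX/g' C1_SUS-AUX_*.db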

 

  14590   Thu May 2 15:35:54 2019 JonOmnistructureUpgradec1susaux upgrade documentation

For future reference:

  • The updated list of c1susaux channel wiring (includes the "coil enable" --> "coil test" digital outputs change)
  • Step-by-step instructions on how to set up an Acromag system from scratch
  12336   Tue Jul 26 09:56:34 2016 ericqUpdateCDSc1susaux restarted

c1susaux (which controls watchdogs and alignments for all non-ETM optics) was down, the last BURT was done yesterday around 2PM. 

I restarted via keying the crate. I restored the BURT snapshot from yesterday.

  14567   Wed Apr 24 17:07:39 2019 gautamUpdateSUSc1susaux in-situ testing [and future of IFOtest]

[jon, gautam]

For the in-situ test, I decided that we will use the physical SRM to test the c1susaux Acromag replacement crate functionality for all 8 optics (PRM, BS, ITMX, ITMY, SRM, MC1, MC2, MC3). To facilitate this, I moved the backplane connector of the SRM SUS PD whitening board from the P1 connector to P2, per Koji's mods, at ~5:10PM local time. The watchdog was shut down, and the backplane connector for the SRM coil driver board was also disconnected (this is now interfaced to the Acromag chassis).

I had to remove the backplane connector for the BS coil driver board in order to have access to the SRM backplane connector. Room in the back of these eurocrate boxes is tight in the existing config...

At ~6pm, I manually powered down c1susaux (as I did not know of any way to turn off the EPICS server run by the old VME crate in a software way). The point was to be able to easily interface with the MEDM screens. So the slow channels prefixed C1:SUS-* are now being served by the Supermicro called c1susaux2.

A critical wiring error was found. The channel mapping prepared by Johannes lists the watchdog enable BIO channels as "C1:SUS-<OPTIC>_<COIL>_ENABLE", which go to pins 23A-27A on the P1 connector, with returns on the corresponding C pins. However, we use the "TEST" inputs of the coil driver boards for sending in the FAST actuation signals. The correct BIO channels for switching this input are actually "C1:SUS-<OPTIC>_<COIL>_TEST", which go to pins 28A-32A on the P1 connector. For today's tests, I voted to fix this inside the Acromag crate for the SRM channels and do our tests. Chub will unfortunately have to fix the remaining 7 optics; see Attachment #1 for the corrections required. I apportion 70% of the blame to Johannes for the wrong channel assignment, and accept 30% for not checking it myself.

The good news: the tests for the SRM channels all passed!

  • Attachment #2: Output of Jon's testing code. My contribution is the colored logs courtesy of python's coloredlogs package, but this needs a bit more work - mainly the PASS message needs to be green. This test applies bias voltages to PIT/YAW, and looks for the response in the PDmon channels. It backs out the correct signs for the four PDs based on the PIT/YAW actuation matrix, and checks that the optic has moved "sufficiently" for the applied bias. You can also see that the PD signals move with consistent signs when PIT/YAW misalignment is applied. Additionally, the DC values of the PDMon channels reported by the Acromag system are close to what they were using the VME system. I propose calling the next iteration of IFOtest "Sherlock".
  • Attachment #3: Confirmation (via spectra) that the SRM OSEM PD whitening can still be switched even after my move of the signals from the P1 connector to the P2 connector. I don't have an explanation right now for the shape of the SIDE coil spectrum.
  • Attachment #4: Applied 100 cts (~ 100*10/2**15/2 ~ 15mV at the monitor point) offset at the bias input of the coil output filters on SRM (this is a fast channel). Looked for the response in the Coil Vmon channels (these are SLOW channels). The correct coil showed consistent response across all 5 channels.
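The flavor of these checks can be reproduced by hand with the EPICS command-line tools (a sketch; the channel names are placeholders built from the naming conventions above, and the real pass/fail logic lives in the test code):

caget C1:SUS-SRM_ULPDMon           # record the unperturbed slow PD reading
caput C1:SUS-SRM_ULBiasAdj 1.0     # apply a small DC bias on one coil
sleep 3                            # wait for the slow channels to update
caget C1:SUS-SRM_ULPDMon           # should move with the sign predicted by the actuation matrix
caput C1:SUS-SRM_ULBiasAdj 0.0     # restore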

Additionally, I confirmed that the watchdog tripped when the RMS OSEM PD voltage exceeded 200 counts. Ideally we'd have liked to test the stability of the EPICS server, but we have shut it down and brought the crate back out to the electronics bench for Chub to work on tomorrow.

I restarted the old VME c1susaux at 9:15pm local time as I didn't want to leave the watchdogs in an undefined state. Unsurprisingly, ITMY is stuck. Also, the BS (cable #22) and SRM (cable #40) coil drivers are physically disconnected at the front DB15 output because of the undefined backplane inputs. I also re-opened the PSL shutter.

Attachment 1: 2019-04-24_20-29.pdf
2019-04-24_20-29.pdf
Attachment 2: Screenshot_from_2019-04-24_20-05-54.png
Screenshot_from_2019-04-24_20-05-54.png
Attachment 3: SRM_OSEMPD_WHT_ACROMAG.pdf
SRM_OSEMPD_WHT_ACROMAG.pdf
Attachment 4: DCVmon.png
DCVmon.png
  16210   Thu Jun 17 16:37:23 2021 Anchal, PacoUpdateSUSc1susaux computer rebooted

Jon suggested rebooting the acromag chassis, then the computer, and we did this without success. Then Koji suggested we try running ifup eth0, so we ran `sudo /sbin/ifup eth0`, which put c1susaux back on the martian network, but the modbus service was still down. We switched off the chassis and rebooted the computer, and we had to run `sudo /sbin/ifup eth0` again (why do we need to do this manually every time?). We switched the chassis back on but still saw no channels. `sudo systemctl status modbusIOC.service` gave us an inactive (dead) status, so we ran `sudo systemctl restart modbusIOC.service`.

The status became:


● modbusIOC.service - ModbusIOC Service via procServ
   Loaded: loaded (/etc/systemd/system/modbusIOC.service; enabled)
   Active: inactive (dead)
           start condition failed at Thu 2021-06-17 16:10:42 PDT; 12min ago
           ConditionPathExists=/opt/rtcds/caltech/c1/burt/autoburt/latest/c1susaux.snap was not met

After another iteration we finally got a modbusIOC.service OK status, and we then repeated Jon's reboot procedure. This time, the acromags were on but reading 0.0, so we just needed to run `sudo /sbin/ifup eth1`, and finally some sweet slow channels were read. As a final step we burt-restored to today's 05:19 AM c1susaux.snap file and managed to relock the IMC; we will keep an eye on it. Finally, in the process of damping all the suspended optics, we noticed some OSEM channels on BS and PRM are reading 0.0 (they are red as we browse them). We succeeded in locking both arms, but this remains an unknown for us.
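For future reference, the recovery boils down to checking the snapshot file that the service's start condition depends on, restarting the IOC, and bringing up the Acromag-facing interface (a sketch using the paths and commands quoted above):

ls -l /opt/rtcds/caltech/c1/burt/autoburt/latest/c1susaux.snap   # the ConditionPathExists target
sudo systemctl restart modbusIOC.service
sudo systemctl status modbusIOC.service
sudo /sbin/ifup eth1     # only needed if the Acromag subnet interface did not come up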

  14563   Tue Apr 23 18:48:25 2019 JonUpdateSUSc1susaux bench testing completed

Today I tested the remaining Acromag channels and retested the non-functioning channels found yesterday, which Chub repaired this morning. We're still not quite ready for an in situ test. Here are the issues that remain.

Analog Input Channels

Channel Issue
C1:SUS-MC2_URPDMon No response
C1:SUS-MC2_LRPDMon No response

I further diagnosed these channels by connecting a calibrated DC voltage source directly to the ADC terminals. The EPICS channels do sense this voltage, so the problem is isolated to the wiring between the ADC and DB37 feedthrough.
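One way to do that readback check is with caget while the known voltage is applied at the ADC terminals:

caget C1:SUS-MC2_URPDMon C1:SUS-MC2_LRPDMon    # both should track the applied voltage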

Analog Output Channels

Channel Issue
C1:SUS-ITMX_ULBiasAdj No output signal
C1:SUS-ITMX_LLBiasAdj No output signal
C1:SUS-ITMX_URBiasAdj No output signal
C1:SUS-ITMX_LRBiasAdj No output signal
C1:SUS-ITMY_ULBiasAdj No output signal
C1:SUS-ITMY_LLBiasAdj No output signal
C1:SUS-ITMY_URBiasAdj No output signal
C1:SUS-ITMY_LRBiasAdj No output signal
C1:SUS-MC1_ULBiasAdj No output signal
C1:SUS-MC1_LLBiasAdj No output signal
C1:SUS-MC1_URBiasAdj No output signal
C1:SUS-MC1_LRBiasAdj No output signal

To further diagnose these channels, I connected a voltmeter directly to the DAC terminals and toggled each channel output. The DACs are outputting the correct voltage, so these problems are also isolated to the wiring between DAC and feedthrough.

In testing the DC bias channels, I did not check the sign of the output signal, but only that the output had the correct magnitude. As a result my bench test is insensitive to situations where either two degrees of freedom are crossed or there is a polarity reversal. However, my susPython scripting tests for exactly this, fetching and applying all the relevant signal gains between pitch/yaw input and coil bias output. It would be very time consuming to propagate all these gains by hand, so I've elected to wait for the automated in situ test.

Digital Output Channels

Everything works.

  17028   Fri Jul 22 17:46:10 2022 yutaConfigurationBHDc1sus2 watchdog update and DCPD ERR channels

[Tega, Yuta]

We have added C1:HPC-DCPD_A_ERR and C1:HPC-DCPD_B_ERR testpoints, which can be used as A+B, A-B etc.
Restarting c1hpc crashed c1sus2, and also made c1lsc/ioo/sus models red.
We ran /opt/rtcds/caltech/c1/Git/40m/scripts/cds/restartAllModels.sh to restart all the machines. It worked perfectly without manually pressing power buttons! Wow!

We have also edited /opt/rtcds/caltech/c1/medm/c1su2/C1SU2_WATCHDOGS.adl so that it will use new /opt/rtcds/caltech/c1/Git/40m/scripts/SUS/medm/resetFromWatchdogTrip.sh instead of old /opt/rtcds/caltech/c1/scripts/SUS/damprestore.py.

Attachment 1: Screenshot_2022-07-22_17-48-25.png
Screenshot_2022-07-22_17-48-25.png
  17026   Fri Jul 22 15:05:26 2022 TegaConfigurationBHDc1sus2 shared memory and ADC fix

[Tega, Yuta]

We were able to fix the shared memory issue by updating the receiver model name from 'SUS' to 'SU2', and the ADC zero issue by including both ADC0 and ADC1 in the c1hpc and c1bac models as well as removing the grounding of the unused ADC channels (including chn#16 and chn#17, which are actually used in c1hpc) in c1su2. We also used shared memory to move the DCPD_A/B error signals (after signal conditioning and mixing A/B; now named A_ERR and B_ERR) from c1hpc to c1bac.
C1:HPC-DCPD_A_IN1 and C1:HPC-DCPD_B_IN1 are now available (they are essentially the same as C1:LSC-DCPD_A_IN1 and C1:LSC-DCPD_B_IN1, except that they are digitized with a different ADC; see elog 40m/16954 and Attachment #1).
The Dolphin IPC error in sending signals from c1hpc to c1lsc still remains.

Attachment 1: Screenshot_2022-07-22_15-04-33_DCPD.png
Screenshot_2022-07-22_15-04-33_DCPD.png
Attachment 2: Screenshot_2022-07-22_15-12-19_models.png
Screenshot_2022-07-22_15-12-19_models.png
Attachment 3: Screenshot_2022-07-22_15-15-11_ERR.png
Screenshot_2022-07-22_15-15-11_ERR.png
Attachment 4: Screenshot_2022-07-22_15-32-19_GDS.png
Screenshot_2022-07-22_15-32-19_GDS.png
  17054   Tue Aug 2 17:25:18 2022 TegaConfigurationBHDc1sus2 dolphin IPC issue solved

[Yuta, Tega, Chris]

We did it!

Following Chris's suggestion, we added "pciRfm=1" to the CDS parameter block in c1x07.mdl, the IOP model for c1sus2. Then we restarted the FE machines, and this solved the dolphin IPC problem on c1sus2. We no longer see the RT Netstat error for 'C1:HPC-LSC_DCPD_A' and 'C1:HPC-LSC_DCPD_B' on the LSC IPC status page; see Attachment 1.

Attachment 2 shows the module dependencies before and after the change was made, which confirms that the IOP model was not using the dolphin driver before the change.
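A quick way to see the same thing from the command line on c1sus2 is to list the loaded kernel modules and check which dolphin (dis_*) drivers are present and in use, e.g.:

lsmod | grep -E 'dis_|dolphin'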


We encountered a burt restore problem with missing snapfiles from yesterday when we tried restoring the EPICS values after restarting the FE machines. Koji helped us debug the problem, but the summary is that restarting the FE models somehow fixed the issue.

Log files:
/opt/rtcds/caltech/c1/burt/burtcron.log
/opt/rtcds/caltech/c1/burt/autoburt/autoburtlog.log
 
Request File list:
/opt/rtcds/caltech/c1/burt/autoburt/requestfilelist
 
Snap files location:
/opt/rtcds/caltech/c1/burt/autoburt/today
/opt/rtcds/caltech/c1/burt/autoburt/snapshots
 
Autoburt crontab on megatron:
19 * * * * /opt/rtcds/caltech/c1/scripts/autoburt/autoburt.cron > /opt/rtcds/caltech/c1/burt/burtcron.log 2>&1
Attachment 1: c1lsc_IPC_status.png
c1lsc_IPC_status.png
Attachment 2: FE_lsmod_dependencies_c1sus2_b4_after_iop_unpdate.png
FE_lsmod_dependencies_c1sus2_b4_after_iop_unpdate.png
  17626   Thu Jun 8 10:31:46 2023 yutaUpdateCDSc1sus2 all FE models crashed spontaneously again and IMC/vertex optics damping stopped

[Mayank, Yuta]

We noticed that c1sus2 crashed again on June 8, 2023 17:25 UTC (Yarm has been locked since 1:49:24 UTC, with a short interruption 10:31:51-10:32:29 UTC due to IMC unlock; Attachments #1 and #2).
Following 40m/17335, we fixed it by running

controls@c1sus2:~$ rtcds restart --all

"global diag reset" made all FE STATUS green.
Burt restored at 2023/Jun/8/09:19 for c1sus2 models, and BHD optics are now damping OK.
MC suspensions, BS, PRM, SRM, ITMX, ITMY were also affected (BS, PRM, SRM watchdogs even tripped), and c1x02 showed DAC error (Attachment #3).
Following 40m/17606, we also had to run

./opt/rtcds/caltech/c1/Git/40m/scripts/cds/restartAllModels.sh

Now all look fine.

 

Attachment 1: Screenshot_2023-06-08_10-36-08_Yarm.png
Screenshot_2023-06-08_10-36-08_Yarm.png
Attachment 2: Screenshot_2023-06-08_10-36-51_c1sus2.png
Screenshot_2023-06-08_10-36-51_c1sus2.png
Attachment 3: Screenshot_2023-06-08_10-41-16_c1x02.png
Screenshot_2023-06-08_10-41-16_c1x02.png
  17457   Thu Feb 9 10:05:37 2023 yutaUpdateCDSc1sus2 all FE models crashed spontaneously again

I just noticed that c1sus2 crashed again. Following 40m/17335, I fixed it by running

controls@c1sus2:~$ rtcds restart --all

"global diag reset" made all FE STATUS green.
Burt restored at 2023/Feb/8/19:19 for c1sus2 models.
Watchdogs reset for BHD optics and now all look fine.

  17606   Mon May 29 11:04:13 2023 PacoUpdateCDSc1sus2 all FE models crashed spontaneously again

c1sus2 crashed again. Following 40m/17335, I fixed it by running

controls@c1sus2:~$ rtcds restart --all

"global diag reset" made all FE STATUS green burt restored at 2023/May/28/00:19 for c1sus2 models, and watchdogs reset for BHD optics and now all look fine.


Other optics, including MC1, MC2, and MC3, were not damped, so maybe c1sus crashed too. I also ran

controls@c1sus:~$ rtcds restart --all

"global diag reset" made all FE STATUS green, burt restored at 2023/May/28/00:19 for c1sus models and watchdogs reset for all other suspensions.


TT1 and TT2 were not responding, and the DAC monitors were frozen... so I ran

./opt/rtcds/caltech/c1/Git/40m/scripts/cds/restartAllModels.sh


Wed May 31 14:58:06 2023 UPDATE: I forgot to log that ITMX, ITMY and BS oplevs were centered after some nominal alignment was recovered.

  17335   Mon Dec 5 12:05:29 2022 AnchalUpdateCDSc1sus2 all FE models crashed spontaneously

Just a few minutes ago, all models on FE c1sus2 crashed. I'm attaching some important files that can be helpful in investigating this. CDS upgrade team, please take a look.

I fixed this by running the following on c1sus2:

controls@c1sus2:~$ rtcds restart --all
Attachment 1: c1sus2Failed.txt
controls@c1sus2:~$ rtcds status
found host specific environment file '/etc/advligorts/systemd_env_c1sus2'
Kernel Module Status
mbuf                   24576  40 c1bac,c1x07,c1su2,c1hpc,c1su3
gpstime                40960  5 c1bac,c1x07,c1su2,c1hpc,c1su3
dis_kosif             131072  5 dis_ix_ntb,dis_intel_dma,dis_ix_dma,dis_sisci,dis_irm
dis_ix_ntb            307200  1 dis_irm
dis_ix_dma             36864  1 dis_ix_ntb
dis_irm               462848  4 dolphin_proxy_km,dis_sisci
dis_sisci            1093632  0
... 28 more lines ...
Attachment 2: c1sus2Failed.png
c1sus2Failed.png
  17025   Thu Jul 21 21:50:47 2022 TegaConfigurationBHDc1sus2 IPC update

IPC issue still unresolved.

Updated the shared memory tags so that 'SUS' -> 'SU2' in c1hpc, c1bac and c1su2. Removed obsolete 'HPC/BAC-SUS' references from the IPC file, C1.ipc. Restarted the FE models, but the c1sus2 machine froze, so I did a manual reboot. This brought down the vertex machines, which I restarted using /opt/rtcds/caltech/c1/scripts/cds/rebootC1LSC.sh, and the end machines, which I restarted manually. Everything but the BHD optics now has its previous values, so we need to burt-restore those.
 

# IPC file:
/opt/rtcds/caltech/c1/chans/ipc/C1.ipc

# Model file locations:
/opt/rtcds/userapps/release/isc/c1/models/isc/c1hpc.mdl
/opt/rtcds/userapps/release/sus/c1/models/c1su2.mdl
/opt/rtcds/userapps/release/isc/c1/models/isc/c1bac.mdl

# Log files:
/cvs/cds/rtcds/caltech/c1/rtbuild/3.4/c1hpc.log
/cvs/cds/rtcds/caltech/c1/rtbuild/3.4/c1su2.log
/cvs/cds/rtcds/caltech/c1/rtbuild/3.4/c1bac.log


SUS overview medm screen:

  • Reduced the entire screen width
  • Revert to old screen style watchdog layout
  17033   Mon Jul 25 17:58:10 2022 TegaConfigurationBHDc1sus2 IPC dolphin issue update

From the 40m wiki, I was able to use the instructions here to map out what to do to get the IPC issue resolved. Here is a summary of my findings.

I updated the /etc/dis/dishost.conf file on the frame builder machine to include the c1sus2 machine, which runs the sender model, c1hpc; see below. After this, the file becomes available on the c1sus2 machine (see Attachment 1), and the c1sus2 node shows up in the dxadmin GUI (see Attachment 2). However, the c1sus2 machine was not active. The log file for the dis_nodemgr service (which is responsible for setting things up; see Attachment 3) indicated that the dis_irm service may not be up, so I checked and confirmed that this was indeed the case (see Attachment 4). I tried restarting this service but was unsuccessful. I restarted the machine but this did not help either. I have reached out to Jonathan Hanks for assistance.

Attachment 1: Screen_Shot_2022-07-25_at_5.43.28_PM.png
Screen_Shot_2022-07-25_at_5.43.28_PM.png
Attachment 2: Screen_Shot_2022-07-25_at_5.21.10_PM.png
Screen_Shot_2022-07-25_at_5.21.10_PM.png
Attachment 3: Screen_Shot_2022-07-25_at_5.30.58_PM.png
Screen_Shot_2022-07-25_at_5.30.58_PM.png
Attachment 4: Screen_Shot_2022-07-25_at_5.35.19_PM.png
Screen_Shot_2022-07-25_at_5.35.19_PM.png
  17052   Mon Aug 1 18:42:39 2022 TegaConfigurationBHDc1sus2 IPC dolphin issue update

[Yuta, Tega]

We decided to give the dolphin debugging another go. Firstly, we noticed that c1sus2 was no longer recognising the dolphin card, which can be checked using

lspci | grep Stargen

or looking at the status light on the dolphin card of c1sus2, which was orange for both ports A and B.

We decided to do a hard reboot of c1sus2 and turned off the DAQ chassis for a few minutes, then restarted c1sus2. This solved the card recognition problem as well as the 'dis_irm' driver loading issue (I think the driver does not get loaded if the system does not recognise a valid card, as I also saw the missing dis_irm driver module on c1testand).

Next, we confirmed the status of all dolphin cards on fb1, using

controls@fb1$ /opt/DIS/sbin/dxadmin

It looks like the dolphin card on c1sus2 has now been configured and is available to all other nodes. We then restarted all the FE machines and models to see if we were in the clear. Unfortunately, we were not so lucky, since the problem persisted.

Looking at the output of 'dmesg', we could only identify two notable differences between the operational dolphin cards on c1sus/c1ioo/c1lsc and the one on c1sus2, namely the card number being zero and the memory addresses also being zero; see the image below.

Anyway, at least we can now eliminate driver issues, and we will move on to debugging the models next.

Attachment 1: c1sus2_dolphin.png
c1sus2_dolphin.png
Attachment 2: fb1_dxamin_status.png
fb1_dxamin_status.png
Attachment 3: dolphin_num_mem_init2.png
dolphin_num_mem_init2.png
  16414   Tue Oct 19 18:20:33 2021 Ian MacMillanSummaryCDSc1sus2 DAC to ADC test

I ran a DAC to ADC test on c1sus2 channels where I hooked up the outputs on the DAC to the input channels on the ADC. We used different combinations of ADCs and DACs to make sure that there were no errors that cancel each other out in the end. I took a transfer function across these channel combinations to reproduce figure 1 in T2000188.

As seen in the two attached PDFs, the channels seem to be working properly: they have a flat response with a gain of 0.5 (-6 dB). This is the expected response, and it is the result of the DAC signal being sent single-ended while the ADC receives it as a differential input, which should result in a recorded signal at 0.5 times the amplitude of the actual output signal.

The drop-off at the high-frequency end is the result of the anti-aliasing filter and the anti-imaging filter. Both of these are 8-pole elliptical filters (each contributing roughly 8 x 20 dB = 160 dB per decade near the cutoff), so combined we should get a drop-off of 320 dB per decade. I measured the slope on the last few points of each transfer function and the averaged value was around 347 dB per decade. This is slightly steeper than expected, but since the purpose of the filters is to cut off higher frequencies it shouldn't have an effect on the operation of the system, and it is close to the expected value.

The ripples seen before the drop off are also an effect of the elliptical filters and are seen in T2000188.

Note: the transfer function that doesn't seem to match the others is the heartbeat timing signal.

Attachment 1: data3_Plots.pdf
data3_Plots.pdf
Attachment 2: data2_Plots.pdf
data2_Plots.pdf
  16415   Tue Oct 19 23:43:09 2021 KojiSummaryCDSc1sus2 DAC to ADC test

(For a totally unrelated reason) I was checking the electronics units for the upgrade, and I realized that the electronics units at the test stand have not been properly powered.

I found that the AA/AI stack at the test stand (Attachment 1) has an unusual powering configuration (Attachment 2).
- Only the positive power supply was used.
- The supply voltage is only +15V.
- The GND reference is not connected anywhere.

For confirmation, I checked the voltage across the DC power strip (Attachments 3/4). The positive was +5.3V and the negative was -9.4V. This is subject to change depending on the earth potential.

This is not a good condition at all. The asymmetric powering of the circuit may cause damage to the opamps, so I turned off the switches of the units.

The power configuration should be immediately corrected.

  1. Use both positive and negative supply (2 power supply channels) to produce the positive and the negative voltage potentials. Connect the reference potential to the earth post of the power supply.
    https://www.youtube.com/watch?v=9_6ecyf6K40   [Dual Power Supply Connection / Serial plus minus electronics laboratory PS with center tap]
  2. These units have a DC power regulator which produces +/-15V out of +/-18V, so the DC power supplies are supposed to be set at +18V.

 

Attachment 1: P_20211019_224433.jpg
P_20211019_224433.jpg
Attachment 2: P_20211019_224122.jpg
P_20211019_224122.jpg
Attachment 3: P_20211019_224400.jpg
P_20211019_224400.jpg
Attachment 4: P_20211019_224411.jpg
P_20211019_224411.jpg
  16430   Tue Oct 26 18:24:00 2021 Ian MacMillanSummaryCDSc1sus2 DAC to ADC test

[Ian, Anchal, Paco]

After Koji found that there was a problem with the power source, Anchal and I fixed the power and then reran the measurement. The only change this time around is that I increased the excitation amplitude to 100. In the first run the excitation amplitude was 1, which seemed to come out noise-free but is too low to give a reliable value.

link to previous results

The new plots are attached.

Attachment 1: data2_Plots.pdf
data2_Plots.pdf
Attachment 2: data3_Plots.pdf
data3_Plots.pdf
  3638   Fri Oct 1 18:19:24 2010 josephb, kiwamuUpdateCDSc1sus work

The c1sus model was split into 2, so that c1sus controls BS, PRM, SRM, ITMX, ITMY, while c1mcs controls MC1, MC2, MC3.  The c1mcs model uses shared memory to tell c1sus what signals to send to the binary outputs (which control the analog whitening/dewhitening filters), since two models can't control the same binary output.

This split was done because the CPU time was running above 60 microseconds (the limit allowable since we're trying to run at 16kHz). Apparently the work Alex had done getting testpoints working had put a greater load on the cpu and pushed it over an acceptable maximum.    After removing the MC optics controls, the CPU time dropped to about 47 microseconds from about 67 microseconds.  The c1mcs is taking about 20 microseconds per cycle.

The new model is using the top_names functionality to still call the channels C1SUS-XXX_YYY.  However, the directory to find the actual medm filter modules is /opt/rtcds/caltech/c1/medm/c1mcs, and the gds testpoint screen for that model is called C1MCS-GDS_TP.adl.  I'm currently in the process of updating the medm screens to point to the correct location.

Also, while plugging in the cables from the coil dewhitening boards, we realized I (Joe) had made a mistake in the assignment of channels to the binary output boards.  I need to re-examine Jay's old drawings and fix the simulink model binary outputs.

  3665   Thu Oct 7 10:37:42 2010 josephbUpdateCDSc1sus with flaky ssh

Currently trying to understand why the ssh connections to c1sus  are flaky.  This morning, every time I tried to make the c1sus model on the c1sus machine, the ssh session would be terminated at a random spot midway through the build process.  Eventually restarting c1sus fixed the problem for the moment.

However, previously in the last 48 hours, the c1sus machine had stopped responding to ssh logins while still appearing to be running the front end code.  The next time this occurs, we should attach a monitor and keyboard and see what kind of state the computer is in.  It's interesting to note we didn't have these problems before we switched over to the Gentoo kernel from the real-time linux CentOS 5.5 kernel.

  3160   Tue Jul 6 17:07:56 2010 josephbUpdateCDSc1sus status

I talked to Alex, and he explained the steps necessary to get the real time linux kernel installed.  It basically goes like this: copy the files from c1iscex (the one he installed last month) in the directory /opt/rtlidk-2.2 over to c1sus locally.  Then go into rtlinux_kernel_2_6 and run make and make install (or something like that; need to look at the makefile).  Then edit the grub loader file to look like the one on c1iscex (located at /boot/grub/menu.lst).

This will then hopefully let us try out the RCG code on c1sus and see if it works.

  3662   Wed Oct 6 16:16:48 2010 josephb, yutaUpdateCDSc1sus status

At the moment, c1sus and c1mcs on the c1sus machine seem to be dead in the water.  At this point, it is unclear to me why.

Apparently during the 40m meeting, Alex was able to get test points working for the c1mcs model.  He said he "had to slow down mx_stream startup on c1sus".   When we returned at 2pm, things were running fine. 

We began updating all the matrix values on the medm screens.  Somewhere towards the end of this the c1sus model seemed to have crashed, leaving only c1x02 and c1mcs running.  There were no obvious error messages I saw in dmesg and the target/c1sus/logs/log.txt file (although that seems to be empty these days).  We quickly saved two burt snapshots, one of c1sus and one of c1mcs, to the /opt/rtcds/caltech/c1/target/snapshots directory temporarily.  We then ran the killc1sus script on c1sus, and then after confirming the code was removed, ran the startup script, startc1sus.  The code seemed to come back partly.  It was syncing up and finding the ADC/DAC boards, but not doing any real computations.  The cycle time was reporting reasonably, but the usr time (representing computation done for the model) was 0.  There were no updating monitor channels on the medm screens and filters would not turn on.

At this point I tried bringing down all 3 models, and restarting c1x02, then c1sus and c1mcs.  At this point, both c1sus and c1mcs came back partly, doing no real calculations.  c1x02 appears to be working normally (or at least the two filter banks in that model are showing changing channels from ADCs properly).  I then tried rebooting the c1sus machine.  It came back in the same state, working c1x02, non-calculating c1sus and c1mcs.

  3666   Thu Oct 7 10:48:41 2010 josephb, yutaUpdateCDSc1sus status

This problem has been resolved.

Apparently during one of Alex's debugging sessions, he had commented out the feCode function call on line 1532 of the controller.c file (located in /opt/rtcds/caltech/c1/core/advLigoRTS/src/fe/ directory).

This function is the one that actually calls all the front end specific code and without it, the code just doesn't do any computations.  We had to then rebuild the front end codes with this corrected file.

Quote:

At the moment, c1sus and c1mcs on the c1sus machine seem to be dead in the water.  At this point, it is unclear to me why.

Apparently during the 40m meeting, Alex was able to get test points working for the c1mcs model.  He said he "had to slow down mx_stream startup on c1sus".   When we returned at 2pm, things were running fine. 

We began updating all the matrix values on the medm screens.  Somewhere towards the end of this the c1sus model seemed to have crashed, leaving only c1x02 and c1mcs running.  There were no obvious error messages I saw in dmesg and the target/c1sus/logs/log.txt file (although that seems to be empty these days).  We quickly saved two burt snapshots, one of c1sus and one of c1mcs, to the /opt/rtcds/caltech/c1/target/snapshots directory temporarily.  We then ran the killc1sus script on c1sus, and then after confirming the code was removed, ran the startup script, startc1sus.  The code seemed to come back partly.  It was syncing up and finding the ADC/DAC boards, but not doing any real computations.  The cycle time was reporting reasonably, but the usr time (representing computation done for the model) was 0.  There were no updating monitor channels on the medm screens and filters would not turn on.

At this point I tried bringing down all 3 models, and restarting c1x02, then c1sus and c1mcs.  At this point, both c1sus and c1mcs came back partly, doing no real calculations.  c1x02 appears to be working normally (or at least the two filter banks in that model are showing changing channels from ADCs properly).  I then tried rebooting the c1sus machine.  It came back in the same state, working c1x02, non-calculating c1sus and c1mcs.

 

  3668   Thu Oct 7 14:57:52 2010 josephb, yutaUpdateCDSc1sus status

Around noon, Yuta and I were trying to figure out why we were getting no signal out to the mode cleaner coils.  It turns out the mode cleaner optic control model was not talking to the IOP model. 

Alex and I were working under the incorrect assumption that you could use the same DAC piece in multiple models, and simply use a subset of the channels.  He finally went and asked Rolf, who said that the same DAC simulink piece in different models doesn't work.  You need to use shared memory locations to move the data to the model with the DAC card.  Rolf says there was a discussion (probably a long while back) where it was asked if we needed to support DAC cards in multiple models and the decision was that it was not needed.

Rolf and Alex have said they'd come over and discuss the issue.

In the meantime, I'm moving forward by adding shared memory locations for all the mode cleaner optics to talk to the DAC in the c1sus model.

 

Note by KA: Important fact that is worth remembering

  3673   Thu Oct 7 17:19:55 2010 josephb, alex, rolfUpdateCDSc1sus status

As noted by Koji, Alex and Rolf stopped by.

We discussed the feasibility of getting multiple models using the same DAC.  We decided that we in fact did need it (i.e. 8 optics through 3 DACs does not divide nicely), and went about changing the controller.c file so as to gracefully handle that case.  Basically, it now writes a 0 to the channel, rather than repeating the last output, if a particular model sharing a DAC goes down.

In a separate issue, we found that when skipping DACs in a model (say using DACs 1 and 2 only) there was a miscommunication to the IOP, resulting in the wrong DACs getting the data.  The temporary solution is to have all DACs in each model, even if they are not used.  This will eventually be fixed in code.

At this point, we *seem* to be able to control and damp optics.  Look for a elog from Yuta confirming or denying this later tonight (or maybe tomorrow).

 

  3687   Mon Oct 11 10:49:03 2010 josephbUpdateCDSc1sus stability

Taking a look at the c1sus machine, it looks as if all of the front end codes it's running (c1sus - running BS, ITMX, ITMY; c1mcs - running MC1, MC2, MC3; and c1rms - running PRM and SRM) worked over the weekend.  As I see no

Running dmesg on c1sus reports a single long cycle on c1x02, where it took 17 microseconds (~15 microseconds is the maximum because the c1x02 IOP process is running at 64kHz).

Both the c1sus and c1mcs models are running at around 39-42 microseconds USR time and 44-50 microseconds CPU time.  It would run into problems at 60-62 microseconds.

Looking at the filters that are turned on, it looks as if these models were running with only a single optic's worth of filters turned on via the medm screens, i.e. the MC2 and ITMY filters were properly set, but not the others.

The c1rms model is running at around 10 microseconds USR time and 14-18 microseconds CPU time.  However it apparently had no filters on.

It looks as if no test points were used this weekend.  We'll turn on the rest of the filters and see if we start seeing crashes of the front end again.

Edit:

The filters for all the suspensions have been turned on, and all matrix elements entered.  The USR and CPU times have not appreciably changed.  No long cycles have been reported through dmesg on c1sus at this time.  I'm going to let it run and see if it runs into problems.

  6020   Mon Nov 28 06:53:30 2011 kiwamuUpdateCDSc1sus shutdown

I have restarted the c1sus machine around 9:00 PM yesterday and then shut it down around 4:00 AM this morning after a little bit of taking care of the interferometer.

Quote from #6016

c1sus has been shut down so that the optics don't bang around.  This is because the watchdogs are not working.

  6033   Tue Nov 29 04:47:49 2011 kiwamuUpdateCDSc1sus shut down again

I have shut down the c1sus machine at 3:30 AM.

  3636   Fri Oct 1 16:34:06 2010 josephbUpdateCDSc1sus not booting due to fb dhcp server not running

For some reason, the dhcp server running on the fb machine, which assigns the IP address to c1sus (since it's running a diskless boot), was down.  This was preventing c1sus from coming up properly.  The symptom was an error indicating that no DHCP offers were made (seen when I plugged in a keyboard and monitor).

To check if the dhcp server is running, run ps -ef | grep dhcpd.  If it's not, it can be started with "sudo /etc/init.d/dhcpd start".

  3157   Fri Jul 2 11:33:15 2010 josephbUpdateCDSc1sus needs real time linux to be setup on it

I connected a monitor and keyboard to the new c1sus machine and discovered it's not running RTL linux.  I changed the root password to the usual; however, without help from Alex I don't know where to get the right version or how to install it, since it doesn't seem to have an obvious CD-ROM drive or the like.  Hopefully Tuesday I can get Alex to come over and help with the setup of it, and the other 1-2 IO chassis.

  6042   Tue Nov 29 18:54:29 2011 kiwamuUpdateCDSc1sus machine up

[Zach / Kiwamu]

 Woke up the c1sus machine in order to lock PSL to MC so that we can observe the effect of not having the EOM heater.

  7182   Tue Aug 14 17:47:44 2012 JamieUpdateCDSc1sus machine replaced

Rolf and Alex came back over with a replacement machine for c1sus.  We removed the old machine, removed its timing, dolphin, and PCIe extension cards and put them in the new machine.  We then installed the new machine and booted it and it came up fine.  The BIOS in this machine is slightly different, and it wasn't having the same failure-to-boot-with-no-COM issue that the previous one was.  The COM ports are turned off on this machine (as is the USB interface).

Unfortunately the problem we were experiencing with the old machine, that unloading certain models was causing others to twitch and that dolphin IPC writes were being dropped, is still there.  So the problem doesn't seem to have anything to do with hardware settings...

After some playing, Rolf and Alex determined that for some reason the c1rfm model is coming up in a strange state when started during boot.  It runs faster, but the IPC errors are there.  If instead all models are stopped, the c1rfm model is started first, and then the rest of the models are started, the c1rfm model runs ok.  They don't have an explanation for this, and I'm not sure how we can work around it other than knowing the problem is there and doing manual restarts after boot.  I'll try to think of something more robust.

A better "fix" to the problems is to clean up all of our IPC routing, a bunch of which we're currently doing very inefficient right now.  We're routing things through c1rfm that don't need to be, which is introducing delays.  It particular, things that can communicate directly over RFM or dolphin should just do so.  We should also figure out if we can put the c1oaf and c1pem models on the same machine, so that they can communicate directly over shared memory (SHMEM).  That should cut down on overhead quite a bit.  I'll start to look at a plan to do that.

 

  6026   Mon Nov 28 16:46:55 2011 kiwamuUpdateCDSc1sus is now up

I have restarted the c1sus machine and burt-restored c1sus and c1mcs to the day before Thanksgiving, namely the 23rd of November.

Quote from #6020

I have restarted the c1sus machine around 9:00 PM yesterday and then shut it down around 4:00 AM this morning after a little bit of taking care of the interferometer.

  6923   Thu Jul 5 16:49:35 2012 JenneUpdateComputersc1sus is funny

I was trying to use a new BLRMs c-code block that the seismic people developed, instead of Mirko's more clunky version, but putting this in crashed c1sus.

I reverted to a known good c1pem.mdl, and Jamie and I did a reboot, but c1sus is still funny - none of the models are actually running. 

rtcds restart all - all the models are happy again, c1sus is fine.

But, we still need to figure out what was wrong with the c-code block.

Also, the BLRMS channels are listed in a Daq Channels block inside of the (new) library part, so they're all saved with the new CDS system which became effective as of the upgrade.  (I made Mirko's copy-paste BLRMS into a library part, including a DAQ Channels block, before trying the c-code.  This is the known-working version to which I reverted, and which we are currently running.)

  14719   Tue Jul 2 16:57:09 2019 gautamUpdateCDSc1sus is flaky

Since the work earlier this morning, the fast c1sus model has crashed ~5 times. Tried rebooting vertex FEs using the reboot script a few times, but the problem is persisting. I'm opting to do the full hard reboot of the 3 vertex FEs to resolve this problem.

Judging by Attachment #1, the processes have been stable overnight.

Attachment 1: c1sus_timing.png
c1sus_timing.png
  6924   Fri Jul 6 01:12:02 2012 JenneUpdateComputersc1sus is fine

Quote:

I was trying to use a new BLRMs c-code block that the seismic people developed, instead of Mirko's more clunky version, but putting this in crashed c1sus.

I reverted to a known good c1pem.mdl, and Jamie and I did a reboot, but c1sus is still funny - none of the models are actually running. 

rtcds restart all - all the models are happy again, c1sus is fine.

But, we still need to figure out what was wrong with the c-code block.

Also, the BLRMS channels are listed in a Daq Channels block inside of the (new) library part, so they're all saved with the new CDS system which became effective as of the upgrade.  (I made Mirko's copy-paste BLRMS into a library part, including a DAQ Channels block, before trying the c-code.  This is the known-working version to which I reverted, and which we are currently running.)

 The reason I started looking at BLRMS and c1sus today was that the BLRMS striptool was totally wacky.  I finally figured out that the pemepics hadn't been burt restored, so none of the channels were being filtered.  It's all better now, and will be even better soon when Masha finishes updating the filters (she'll make her own elog later)

  3946   Thu Nov 18 14:05:06 2010 josephb, yutaUpdateCDSc1sus is alive!

Problem:

We broke c1sus by moving ADC cards around.

Solution:

We pulled all the cards out, examined all contacts (which looked fine), found 1 poorly connected cable internally, going between an ADC and ADC timing interface card  (that probably happened last night), and one of the two RFM fiber cables pulled out of its RFM card.

We then placed all of the cards back in with a new ordering, tightened down everything, and triple checked all connections were on and well fit.

 

Gotcha!

Joe forgot that slot 1 and slot 2 of the timing interface boards have their last channels reserved for duotone signals.  Thus, they shouldn't be used for any ADCs or DACs that need their last channel (such as MC3_LR sensor input).  We saw a perfect timing signal come in through the MC3_LR sensor input, which prevented damping. 

We moved the ADC timing interface card out of the 1st slot  of the timing interface board and into slot 6 of the timing interface board, which resolved the problem.

Final Configuration:

 

 Timing Interface Board

Timing Interface Slot    Card
1 (Duotone)              None
2 (Duotone)              DAC interface (can't use last channel)
3                        ADC interface
4                        ADC interface
5                        ADC interface
6                        ADC interface
7                        None
8                        None
9                        None
10                       DAC interface
11                       DAC interface
12                       None
13                       None

 PCIe Chassis

Slot    PCIe Number    Card
0       Do Not Use     None
1       1              ADC
2       6              DAC
3       5              ADC
4       4              ADC
5       9              ADC
6       8              BO
7       7              BO
8       3              BO
9       2              BO
10      14             DAC
11      13             DAC
12      12             BIO
13      17             RFM
14      16             None
15      15             None
16      11             None
17      10             None

Still having Issues with:

ITM West damps.  ITM South damps, but the coil gains are opposite to the other optics in order to damp properly.

We also need to look into switching the channel names for the watchdogs on ITMX/Y in addition to the front end code changes.

  6787   Thu Jun 7 17:49:09 2012 JamieUpdateCDSc1sus in weird state, running models but unresponsive otherwise

Somehow c1sus was in a very strange state.  It was running models, but EPICS was slow to respond.  We could not log into it via ssh, and we could not bring up test points.  Since we didn't know what else to do we just gave it a hard reset.

Once it came back up, none of the models were running.  I think this is a separate problem with the model startup scripts that I need to debug.  I logged on to c1sus and ran:

rtcds restart all

(which handles proper order of restarts) and everything came up fine.

Have no idea what happened there to make c1sus freeze like that.  Will keep an eye out.

  3653   Tue Oct 5 16:58:41 2010 josephb, yutaUpdateCDSc1sus front end status

We moved the filters for the mode cleaner optics over from the C1SUS.txt file in /opt/rtcds/caltech/c1/chans/ to the C1MCS.txt file, and placed SUS_ on the front of all the filter names.  This has let us load the filters for the mode cleaner optics.

At the moment, we cannot seem to get testpoints for the optics (i.e. dtt is not working, even the specially installed ones on rosalba). I've asked Yuta to enter in the correct matrix elements and turn the correct filters on, then save with a burt backup.

  4183   Fri Jan 21 15:26:15 2011 josephbUpdateCDSc1sus broken yesterday and now fixed

[Joe, Koji]
Yesterday's CDS swap of c1sus and c1iscex left the interferometer in a bad state due to several issues.

The first was a need to actually power down the IO chassis completely (I eventually waited for a green LED to stop glowing and then plugged the power back in) when switching computers.  I also unplugged and replugged the interface cable between the IO chassis and the computer while powered down.  This let the computer actually see the IO chassis (previously the host interface card was glowing just red, with no green lights).

Second, the former c1iscex computer, now the new c1sus computer, only has 6 CPUs, not 8 like most of the other front ends.  Because it was running 6 models (c1sus, c1mcs, c1rms, c1rfm, c1pem, c1x02) and 1 CPU needed to be reserved for the operating system, 2 models were not actually running (recycling mirrors and PEM).  This meant the recycling mirrors were left swinging uncontrolled.

To fix this I merged the c1rms model with the c1sus model.  The c1sus model now controls BS, ITMX, ITMY, PRM, SRM.  I merged the filter files in the /chans/ directory, and reactivated all the DAQ channels.  The master file for the fb in the /target/fb directory had all references to c1rms removed, and then the fb was restarted via "telnet fb 8088" and then "shutdown".

My final mistake was starting the work late in the day.

So the lesson for Joe is, don't start changes in the afternoon.

Koji has been helping me test the damping and confirm things are really running.  We were having some issues with some of the matrix values.  Unfortunately I had to add them by hand since the previous snapshots no longer work with the models.

  6738   Fri Jun 1 08:01:46 2012 steveUpdateComputersc1sus and c1iscex are down

Quote:

Something bad happened to c1sus and c1iscex ~20 min ago.  They both have "0x2bad" 's.  I restarted the daqd on the framebuilder, and then rebooted c1sus, and nothing changed.  The SUS screens are all zeros (the gains seem to be set correctly, but all of the signals are 0's).

If it's not fixed when I get in tomorrow, I'll keep poking at it to make it better.

 

 

Attachment 1: compdown.png
compdown.png
  6737   Fri Jun 1 02:33:40 2012 JenneUpdateComputersc1sus and c1iscex - bad fb connections

Something bad happened to c1sus and c1iscex ~20 min ago.  They both have "0x2bad" 's.  I restarted the daqd on the framebuilder, and then rebooted c1sus, and nothing changed.  The SUS screens are all zeros (the gains seem to be set correctly, but all of the signals are 0's).

If it's not fixed when I get in tomorrow, I'll keep poking at it to make it better.

  6740   Fri Jun 1 09:50:50 2012 JamieUpdateComputersc1sus and c1iscex - bad fb connections

Quote:

Something bad happened to c1sus and c1iscex ~20 min ago.  They both have "0x2bad" 's.  I restarted the daqd on the framebuilder, and then rebooted c1sus, and nothing changed.  The SUS screens are all zeros (the gains seem to be set correctly, but all of the signals are 0's).

If it's not fixed when I get in tomorrow, I'll keep poking at it to make it better.

 This is at least partially related to the mx_stream issue I reported previously.  I restarted mx_stream on c1iscex and that cleared up the models on that machine.

Something else is happening with c1sus.  Restarting mx_stream on c1sus didn't help.  I'll try to fix it when I get over there later.

  6742   Fri Jun 1 14:40:24 2012 JamieUpdateComputersc1sus and c1iscex - bad fb connections

Quote:

This is at least partially related to the mx_stream issue I reported previously.  I restarted mx_stream on c1iscex and that cleared up the models on that machine.

Something else is happening with c1sus.  Restarting mx_stream on c1sus didn't help.  I'll try to fix it when I get over there later.

I managed to recover c1sus.  It required stopping all the models, and then restarting them one-by-one:

$ rtcds stop all     # <-- this does the right thing: it stops all the models with the IOP stopped last, so they all unload properly.

$ rtcds start iop

$ rtcds start c1sus c1mcs c1rfm

I have no idea why the c1sus models got wedged, or why restarting them in this way fixed the issue.

  4733   Tue May 17 18:09:13 2011 Jamie, KiwamuConfigurationCDSc1sus and c1auxey crashed, rebooted

c1sus and c1auxey crashed, required hard reboot

For some reason, we found that c1sus and c1auxey were completely unresponsive.  We went out and gave them a hard reset, which brought them back up with no problems.

This appears to be related to a very similar problem report by Kiwamu just a couple of days ago, where c1lsc crashed after editing the C1LSC.ini and restarting the daqd process, which is exactly what I just did (see my previous log).  What could be causing this?

  3945   Thu Nov 18 11:06:20 2010 josephbUpdateCDSc1sus and ADCs

Problem:

ADCs are timing out on c1sus when we have more than 3.

Talked with Rolf:

Alex will be back tomorrow (he took yesterday and today off), so I talked with Rolf.

He said ordering shouldn't make a difference and he's not sure why we would be having a problem. However, when he loads a chassis, he tends to put all the ADCs on the same PCI bus (the backplane apparently contains multiple buses).  Slot 1 is its own bus, slots 2-9 should be the same bus, and slots 10-17 should be the same bus.

He also mentioned that when you use dmesg and see a line like "ADC TIMEOUT # ##### ######", the first number should be the ADC number, which is useful for determining which one is reporting back slowly.
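So a quick first check on c1sus is to grep for those reports and note the first number on each line:

dmesg | grep "ADC TIMEOUT"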

Plan:

Disconnect the c1sus IO chassis completely, pull it out, pull out all cards, check connectors, and repopulate following Rolf's suggestions, keeping this elog in mind.

In regards to the RFM, it looks like one of the fibers had been disconnected from the c1sus chassis RFM card (it's plugged in in the middle of the chassis so it's hard to see) during all the plugging in and out of the cables and cards last night.

  17756   Sat Aug 5 01:42:19 2023 KojiUpdatePEMc1sus DAC1/DAC2 negative side DAC outputs were isolated from the downstream chain

As reported already, we resolved the c1sus DAC0 issue by introducing the differential receivers for the DAC output CHs.

The issue remained in DAC1 (SRM/MC2/MC1/MC3) and DAC2 (All 8 vertex side OSEMs). Their outputs were still received by the single-ended AI boards.
This meant that DAC1 and DAC2 negative side outputs were shorted to ground. This was not great.
(See Attachment 1 for the configuration)

Mitigation

The DAC SCSI-IDC40 adapter units were modified to resolve the issue. This modification was done for two of the three adapter units.
The modified units were labeled "DAC Neg out isolated".

These adapter boxes will eventually be removed when we make the transition to the aLIGO electronics, so it's fine for us to modify them.
In the box, the + and - outputs are lined up with a narrow gap, so the modification was a bit cumbersome. Below is the procedure:

  • First, the solder resist was removed at the corner of the wiring, and copper lines were cut one by one.
  • The isolation between the DAC + and - lines, and the isolation of the output sides from each other, were checked for all the channels.
  • Note 1: I used a spare PCB found on the workbench for the first modification. I found that one of the left channels (CH9-16) had its output + and - pins connected internally. This would cause the + output to be shorted to ground when a RevB AI board is connected. This problem was resolved by cutting one of the IDC40 pins, so don't be surprised if you find one pin missing on the left IDC40 connector for DAC1.
  • Note 2: One of the + signal lines was damaged while the PCB for DAC2 was being prepared.

Result

The replacement was done with the watchdogs shut down. After the replacement, the watchdogs were reloaded and MC locking was resumed.
The apparent behavior of the suspensions looks OK, but we should keep checking that everything is absolutely fine.

It was surprising that the BS/PRM/SRM OPLEVs lost their spots, but it turned out that it is not related to the electronics work. (See next elog)

 

Attachment 1: Vertex_SUS_DAC_config.pdf
Vertex_SUS_DAC_config.pdf
Attachment 2: DAC1_BOTTOM_WIDE.jpg
DAC1_BOTTOM_WIDE.jpg
Attachment 3: DAC1_BOTTOM_ZOOM.jpg
DAC1_BOTTOM_ZOOM.jpg
Attachment 4: DAC1_TOP_WIDE.jpg
DAC1_TOP_WIDE.jpg
Attachment 5: DAC1_TOP_ZOOM.jpg
DAC1_TOP_ZOOM.jpg
Attachment 6: DAC2_BOTTOM_WIDE.jpg
DAC2_BOTTOM_WIDE.jpg
Attachment 7: DAC2_BOTTOM_ZOOM.jpg
DAC2_BOTTOM_ZOOM.jpg
Attachment 8: DAC2_TOP_WIDE.jpg
DAC2_TOP_WIDE.jpg
Attachment 9: DAC2_TOP_ZOOM.jpg
DAC2_TOP_ZOOM.jpg