ID   Date   Author   Type   Category   Subject
  15231   Thu Feb 27 17:50:36 2020   gautam   Update   PSL   c1psl setup

[many people]

In prep for the install tomorrow, we did the following:

  • Install the c1psl Supermicro in the 1X2 rack (Attachment 1). To make room we removed the anti-image filter and mounted it on the OMC rack.
  • Set up a local workstation (monitor+mouse+keyboard) for the Supermicro so we can do some local testing (Attachment 2).
  • Clear up the immediate area around the 1X1/1X2 rack, set up a cart for the Acromag.
  • Make sure there are sufficient adaptor boards and cables (DB37, DB15, DB9, DB25, Ethernet), etc., available at the cart.
  • Label cables, connect on Acromag chassis end (Attachment 3).
  • Keep some large (A3) printouts of the channel mapping handy by the cart.
  • Made sure we have open fusible DIN rail connectors for +/-15 V DC and +/-24 V DC for the Acromag box (we are waiting on some thinner-gauge cabling for the 24 V supply; once that arrives, we will power the box from the Sorensens. For now, it is powered by bench supplies on the cart).
  • Made sure c1psl1 (the Supermicro still has this name) is ssh-able.

Barring objections, tomorrow (Friday 28 Feb 2020) morning I will commence the switch (I still want to work on the IFO tonight).

Attachment 1: 20200227_173535.jpg
Attachment 2: 20200227_173454_HDR.jpg
Attachment 3: 20200227_172659.jpg
  15234   Fri Feb 28 08:05:22 2020   gautam   Update   PSL   c1psl setup

And so it begins.

Quote:

Barring objections, tomorrow (Friday 28 Feb 2020) morning I will commence the switch

  15235   Fri Feb 28 10:04:41 2020   gautam   Update   PSL   c1psl setup

Summary:

There are several problems evident already.

  1. Several EPICS database entries were missing. WTF.
  2. After fixing the missing entries, the PMC could be locked. However, the IMC could not be locked.
  3. I think the FSS Interface card is not configured correctly.

For now, I've restored the old c1psl connections, and the PMC and IMC are both locked. Need to do some debugging on the bench.

  15239   Mon Mar 2 16:35:12 2020   gautam   Update   CDS   c1psl test status

Channel list with test status
== Test Status ==

[done] Lock PMC and IMC
[done] IMC Servo board test
[done] IMC LO Det Mon channel check
[0th order] WFS quadrant DC mon
[none] WFS I/F monitors
[0th order] WFS attenuators
[none] IOO QPD channels
[done] FSS readbacks 
[done] PMC readbacks


Some more detailed elogs about the individual tests will follow.

Basically, I have characterized the IMC Servo board in detail. The summary finding is that the IN2 (=AO gain) slider needs to be investigated. 

All other channels need to be verified in a more thorough fashion than my basic checks, which were only meant to guarantee the core interferometer functionality that is important to me.

  12849   Thu Feb 23 15:48:43 2017   johannes   Update   Computers   c1psl un-bootable

Using the PDA520 detector on the AS port I tried to get some better estimates for the round-trip loss in both arms. While setting up the measurement I noticed some strange output on the scope I'm using to measure the amount of reflected light.

The interferometer was aligned using the dither scripts for both arms. Then, ITMY was majorly misaligned in pitch AND yaw such that the PD reading did not change anymore. Thus, only light reflected from the XARM was incident on the AS PD. The scope was showing strange oscillations (Channel 2 is the AS PD signal):

For the measurement we compare the DC level of the reflection with the ETM aligned (and the arm locked) vs a misaligned ETM (only ITM reflection). This ringing could be observed in both states, and was qualitatively reproducible with the other arm. It did not show up in the MC or ARM transmission. I found that changing the pitch of the 'active' ITM (=of the arm under investigation) either way by just a couple of ticks made it go away and settle roughly at the lower bound of the oscillation:

In this configuration the PD output follows the mode cleaner transmission (Channel 3 in the screen caps) quite well, but we can't take the differential measurement like this, because it is impossible to align and lock the arm but then misalign the ITM. Moving the respective other ITM for potential secondary beams did not seem to have an obvious effect, although I do suspect a ghost/secondary beam to be the culprit for this. I moved the PDA520 on the optical table but didn't see a change in the ringing amplitude. I do need to check the PD reflection though.

Obviously it will be hard to determine the arm loss this way, but for now I used the averaging function of the scope to get rid of the ringing. What this gave me was:
(16 +/- 9) ppm losses in the x-arm and (-18 +/- 8) ppm losses in the y-arm

The negative loss obviously makes little sense, and even the x-arm number seems a little too low to be true. I strongly suspect the ringing is responsible and wanted to investigate this further today, but a problem with c1psl came up that shut down all work on this until it is fixed:

I found the PMC unlocked this morning and c1psl (amongst other slow machines) was unresponsive, so I power-cycled them. All except c1psl came back to normal operation. The PMC transmission, as recorded by c1psl,  shows that it has been down for several days:

Repeated attempts to reset and/or power-cycle it by Gautam and myself could not bring it back. The fail indicator LED of a single daughter card (the DOUT XVME-212) turns off after reboot, all others stay lit. The sysfail LED on the crate is also on, but according to elog 10015 this is 'normal'. I'm following up that post's elog tree to monitor the startup of c1psl through its system console via a serial connection to find out what is wrong.

  12850   Thu Feb 23 18:52:53 2017   rana   Update   Computers   c1psl un-bootable

The fringes seen on the oscope are most likely due to interference from multiple light beams. If there are laser beams hitting mirrors which are moving, the resultant interference signal could be modulated at several Hertz if, for example, one of the mirrors had its local damping disabled.
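As a rough sanity check of this picture: a mirror displacement of half a wavelength produces one full fringe, so a slowly swinging mirror modulates the interference at 2v/lambda. A minimal numerical sketch (both numbers below are assumed, illustrative values, not measurements):

# Back-of-the-envelope fringe rate for a slowly moving mirror.
# Both numbers are illustrative assumptions.
wavelength = 1.064e-6       # m, Nd:YAG wavelength
mirror_velocity = 1e-6      # m/s, assumed swing velocity of an undamped mirror

# A displacement of lambda/2 gives one full fringe, so the modulation
# frequency of the interference signal is 2*v/lambda.
fringe_rate_hz = 2 * mirror_velocity / wavelength
print("fringe rate ~ %.1f Hz" % fringe_rate_hz)   # ~1.9 Hz, i.e. a few Hz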

  12851   Thu Feb 23 19:44:48 2017   johannes   Update   Computers   c1psl un-bootable

Yes, that was one of the things that I wanted to look into. One thing Gautam and I did that I didn't mention was to reconnect the SRM satellite box and move the optic around a bit, which didn't change anything. Once the c1psl problem is fixed we'll resume with that.

Quote:

The fringes seen on the oscope are most likely due to interference from multiple light beams. If there are laser beams hitting mirrors which are moving, the resultant interference signal could be modulated at several Hertz if, for example, one of the mirrors had its local damping disabled.

 

Speaking of which:

Using one of the grey RJ45-to-D-Sub cables with an RS232-to-USB adapter, I was able to capture the startup log of c1psl (using the USB camera Windows laptop). I also logged the startup of the "healthy" c1aux; both are attached. c1psl stalls at the point where c1aux starts testing for present VME modules and doesn't continue. However, it is not strictly hung up, as it still registers to the logger when external login attempts via telnet occur. The telnet client simply reports that the "shell is locked" and exits. It is possible that one of the daughter cards causes this. This seems to happen after iocInit is called by the startup script at /cvs/cds/caltech/target/c1psl/startup.cmd, as it never gets to the next item, "coreRelease()". Gautam and I were trying to find out what happens inside iocInit, but it's not clear to us at this point from where it is even called. iocInit.c and compiled binaries exist in several places on the shared drive. However, all belong to R3.14.x EPICS releases, while the logfile states that the R3.12.2 EPICS core is used when iocInit is called.
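For reference, a capture like this can also be scripted with pyserial instead of using the Windows laptop. The following is only a sketch; the device path, baud rate, and capture length are assumptions to be checked against the actual adapter and VME console settings:

# Minimal serial-console capture sketch (pyserial). The device path,
# baud rate, and capture duration are assumptions -- check the actual
# USB-serial adapter and VME console settings before using this.
import time
import serial

with serial.Serial("/dev/ttyUSB0", baudrate=9600, timeout=1) as port, \
        open("c1psl_startup.log", "wb") as log:
    t_stop = time.time() + 300          # capture ~5 minutes of boot output
    while time.time() < t_stop:
        chunk = port.read(4096)         # returns b"" after the 1 s timeout
        if chunk:
            log.write(chunk)
            log.flush()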

Next we'll interrupt the autoboot procedure and try to work with the machine directly.

Attachment 1: slow_startup_logs.tar.gz
  12854   Tue Feb 28 01:28:52 2017   johannes   Update   Computers   c1psl un-bootable

It turned out the 'ringing' was caused by the respective other ETM still being aligned. For these reflection measurements both test masses of the other arm need to be misaligned. For the ETM it's sufficient to use the Misalign button in the medm screens, while the ITM has to be manually misaligned to move the reflected beam off the PD.

I did another round of armloss measurements today. I encountered some problems along the way

  • Some time today (around 6 pm) most of the front end models had crashed and needed to be restarted. [GV: actually it was only the models on c1lsc that had crashed. I noticed this on Friday too.]
  • ETMX keeps getting kicked up seemingly randomly. However, it settles quickly back into its original position.

General Stuff:

  • Oscilloscope should sample both MC power (from MC2 transmitted beam) and AS signal
  • Channel data can only be loaded from the scope one channel at a time, so 'stop' scope acquisition and then grab the relevant channels individually
  • Averaging needs to be restarted every time the mirrors are moved; triggering stop and run remotely via the HTTP interface scripts does this (see the sketch after this list).
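For the record, the stop / grab / restart-averaging sequence can be wrapped in a small helper along the lines below. This is only a sketch: the scope address, the /cmd endpoint, and the command strings are hypothetical placeholders, not the actual API of the scope's HTTP interface.

# Sketch of a stop -> grab -> restart-averaging helper for the scope's
# HTTP interface. SCOPE_URL, the "/cmd" endpoint, and the command strings
# are hypothetical placeholders; substitute the scope's real remote commands.
import urllib.parse
import urllib.request

SCOPE_URL = "http://scope.example.martian"   # placeholder address

def scope_cmd(command):
    url = SCOPE_URL + "/cmd?" + urllib.parse.quote(command)
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read()

def grab_channel(ch):
    scope_cmd("DATA:SOURCE " + ch)         # select the channel ...
    return scope_cmd("CURVE?")             # ... then read its trace

def grab_channels(channels=("CH2", "CH3")):
    scope_cmd("ACQUIRE:STATE STOP")        # freeze the averaged traces
    traces = {ch: grab_channel(ch) for ch in channels}   # one channel at a time
    scope_cmd("ACQUIRE:STATE RUN")         # restart acquisition / averaging
    return traces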

Procedure:

  1.     Run LSC Offsets
  2.     With the PSL shutter closed, measure the scope channel dark offsets, then open the shutter
  3.     Align all four test masses with dithering to make sure the IFO alignment is in a known state
  4.     Pick an arm to measure
  5.     Turn the other arm's dither alignment off
  6.     'Misalign' that arm's ETM using medm screen button
  7.     Misalign that arm's ITM manually (after disabling its OpLev servos), watching the AS port camera, and make sure the beam doesn't hit the PD anymore.
  8.     Disable dithering for primary arm
  9.     Record MC and AS time series from (paused) scope
  10.     Misalign primary ETM
  11.     Repeat scope data recording

Each pair of readings gives the reflected power at the AS port normalized to the IMC stored power:

\widehat{P}=\frac{P_{AS}-\overline{P}_{AS}^\mathrm{dark}}{P_{MC}-\overline{P}_{MC}^\mathrm{dark}}

which is then averaged. The loss is calculated from the ratio of reflected power in the locked (L) vs misaligned (M) state from

\mathcal{L}=\frac{T_1}{4\gamma}\left[1-\frac{\overline{\widehat{P}_L}}{\overline{\widehat{P}_M}} +T_1\right ]-T_2

Acquiring data this way yielded P_L/P_M=1.00507 +/- 0.00087 for the X arm and P_L/P_M=1.00753 +/- 0.00095 for the Y arm. With \gamma_x=0.832 and \gamma_y=0.875 (from m1=0.179, m2=0.226, and 91.2% and 86.7% mode matching in the X and Y arm, respectively) this yields round trip losses of:

\mathcal{L}_X=21\pm4\,\mathrm{ppm}  and  \mathcal{L}_Y=13\pm4\,\mathrm{ppm}, which assumes a blanket 1% error in the test mass transmissivities and modulation indices. As we discussed, this seems a little too good to be true, but at least the numbers are not negative.
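For concreteness, the arithmetic in the loss expression above can be reproduced in a few lines. This is only a sketch: the T1 (ITM) and T2 (ETM) values below are rough placeholder transmissivities chosen to illustrate the calculation, not the calibrated 40m numbers.

# Evaluate L = (T1 / (4*gamma)) * (1 - P_L/P_M + T1) - T2 numerically.
# T1 (ITM) and T2 (ETM) are rough placeholder transmissivities used only
# to illustrate the arithmetic; use the calibrated values for real results.
def round_trip_loss(ratio_LM, gamma, T1=1.4e-2, T2=15e-6):
    return (T1 / (4.0 * gamma)) * (1.0 - ratio_LM + T1) - T2

# Ratios and gamma factors quoted above:
loss_x = round_trip_loss(1.00507, gamma=0.832)
loss_y = round_trip_loss(1.00753, gamma=0.875)
print("X arm: %.0f ppm, Y arm: %.0f ppm" % (loss_x * 1e6, loss_y * 1e6))
# -> of order 20 ppm and 10 ppm, consistent with the numbers quoted above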

  14455   Thu Feb 14 23:14:12 2019   gautam   Update   CDS   c1rfm errors

The pressure is still 2e-4 torr according to CC1 so I thought I'd give ASS debugging a go tonight. But the arm transmission signal isn't coming through to the LSC model from the end PDs - so a resurfacing of this problem. Rebooting the sender model, c1scy, did not fix the problem. Moreover, c1susaux is dead. The last time I rebooted it, ITMY got stuck so I'm not going to attempt a revival tonight.

  15240   Mon Mar 2 19:32:41 2020   gautam   Update   CDS   c1rfm errors

Had to reboot both end machines and the c1rfm model to get the TRX and TRY signals to the LSC models. Now both arms can be locked using POX/POY respectively.

Attachment 1: RFMerrors.png
  14457   Fri Feb 15 15:22:08 2019   gautam   Update   CDS   c1rfm errors persist

I restarted c1scy and c1rfm (so both sender and receiver models were cycled) and power-cycled the c1iscey and c1sus machines. The TRY PD is certainly seeing light - it is just not getting piped over to c1rfm. dmesg doesn't give any clues. I'm out of ideas.

P.S. The new reality seems to be that getting ITMY stuck in the event of a c1susaux reboot is inevitable. As is the practice for ITMX, I tried slowly ramping the PIT and YAW biases to 0 - but in the process of ramping YAW to 0, the optic got stuck. I am ramping in steps of 0.1 (in units of the PIT/YAW sliders), waiting ~3 seconds between steps; I guess I can try ramping even more slowly.
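For reference, the slow ramp described above could be scripted roughly as follows (a sketch using pyepics; the channel name is a hypothetical example, and the step/dwell values simply mirror the description above):

# Sketch of a slow bias ramp using pyepics. The channel name is an example
# only; verify it against the actual ITMY slow bias channel, and pick the
# step size / dwell time to taste (0.1 and ~3 s are the values quoted above).
import time
import numpy as np
from epics import caget, caput

def ramp_bias(channel, target, step=0.1, dwell=3.0):
    current = caget(channel)
    if current is None or current == target:
        return
    for value in np.arange(current, target, np.sign(target - current) * step):
        caput(channel, value)
        time.sleep(dwell)
    caput(channel, target)                 # land exactly on the target

# Example (hypothetical channel name):
# ramp_bias("C1:SUS-ITMY_YAW_OFFSET", 0.0)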

Update: I power cycled the physical RFM switch. This necessitated reboot of all vertex FEs. But seems like things are back to normal now...

Note: to unstick ITMY, seems like the best approach is:

  1. Jiggle the bias until the SIDE shadow sensor is on average above its half-light level. This is the critical step. A bias of +20000 cts on the fast SIDE output seems to help.
  2. Set the YAW bias to -10, ramp down the BIAS in steps of 0.1, watching the shadow sensor levels to ensure the optic doesn't get stuck again.
  3. Hope for the best. Iterate if necessary.
Quote:

The pressure is still 2e-4 torr according to CC1 so I thought I'd give ASS debugging a go tonight. But the arm transmission signal isn't coming through to the LSC model from the end PDs - so a resurfacing of this problem. Rebooting the sender model, c1scy, did not fix the problem. Moreover, c1susaux is dead. The last time I rebooted it, ITMY got stuck so I'm not going to attempt a revival tonight.

Attachment 1: Screenshot_from_2019-02-15_15-21-47.png
  15920   Mon Mar 15 20:22:01 2021   gautam   Update   ASC   c1rfm model restarted

On Friday, I felt that the ASC performance when the PRFPMI was locked was not as good as it used to be, so I looked into the situation a bit more. As part of my ASC model revamp in December, I made a bunch of changes to the signal routing, and my suspicion was that the control signals weren't even reaching the ETMs. My log says that I recompiled and reinstalled the c1rfm model (used to pipe the ASC control signals to the ETMs), and indeed, the file was modified on Dec 21. But for whatever reason, the C1RFM.ini (=Dolphin receiver since the ASC control signals are sent to this model over the Dolphin network from the c1ioo machine which hosts the C1:ASC- namespace, and RFM sender to the ETMs, but this path already existed) file never picked up the new channels. Today, I recompiled, re-installed, and restarted the models, and confirmed that the control signals actually make it to the ETMs. So now we can have the QPD-based ASC loops engaged once again for the PRFPMI lock. The CDS system did not crash 🎉 . See Attachments #1-3.

I checked the loop performance in the POX/POY locked config by first deliberately misaligning the ETMs, and then engaging the loops - seems to work (Attachment #4). The loop shapes have to be tweaked a bit and I didn't engage the integrators, hence the DC pointing wasn't recovered. Also, added a line to the script that turns the ASC loops on to set limits for all the loops - in the testing process, one of the loops ran away and I tripped the ETMY watchdog. It has since been recovered. I SDFed a limit of 100 cts just to be on the conservative side for model reboot situations - the value in the script can be raised/lowered as deemed necessary (sorry, I don't know the cts-->urad number off the top of my head).

But the hope is this improves the power buildup, and provides stability so that I can begin to commission the AS WFS system a bit.

Attachment 1: RFM.png
Attachment 2: CDSoverview.png
Attachment 3: RFMchans.png
Attachment 4: ASCloops.png
  11883   Tue Dec 15 11:22:53 2015   gautam   Update   CDS   c1scx and c1asx crashed

I noticed what I thought was excessive movement of the beam spot on ITMX and ETMX on the control room monitors, and when I checked the CDS FE status overview MEDM screen, I saw that c1scx and c1asx had crashed. I ssh-ed into c1iscex and restarted both models, and then restarted fb as well. However, the DAQ-DCO_C1SCX_STATUS indicator remains red even after restarting fb (see attached screenshot). I am not sure how to fix this so I am leaving it as is for now, and the X arm looks to have settled down.

Attachment 1: CDS_FE_STATUS_OVERVIEW_15DEC2015.png
  7008   Mon Jul 23 18:57:52 2012   Jamie   Update   CDS   c1scx and c1scy models recompiled and restarted

After the changes listed in 7005 and 7007, I have rebuilt, installed, and restarted the c1scx and c1scy models.  Everything seems to have come back up ok.

Running into some daqd troubles because of a change to c1ioo, but will report on the new ALS channels when I can.

  6436   Thu Mar 22 16:45:06 2012   kiwamu   Update   CDS   c1scx and c1scy not properly running

It seems that neither c1scx nor c1scy is working properly as their ADC counts are showing digital-zeros.

However, the IOPs c1gcx and c1gcy look to be running fine, and the IOPs also seem to successfully recognize the ADCs according to dmesg.

Also, there is one more confusing fact: c1scx and c1scy are synchronizing to the timing signal somehow.

I restarted the c1scx front end model to see if this helps, but unfortunately it didn't work.

As this is not the top priority concern for now, I am leaving them as they are with the watchdogs off.

(I may try hardware rebooting them in this evening)

Quote from #6434

The power was turned back on at 4 pm. It took some time for Suresh to restart the computers. We have damping but things are not perfect yet. Auto BURT restore did not work well.

 

  6438   Thu Mar 22 17:41:15 2012   suresh   Update   CDS   c1scx and c1scy not properly running

Quote:

It seems that neither c1scx nor c1scy is working properly as their ADC counts are showing digital-zeros.

Quote from #6434

The power was turned back on at 4 pm. It took some time for Suresh to restart the computers. We have damping but things are not perfect yet. Auto BURT restore did not work well.

When Steve and I restarted the c1iscex and c1iscey computers after the power shutdown, the models within them did not start up automatically.  I had to start them manually from a terminal in the control room.

I also tried rebooting the FB a couple of times.  Did not make any difference.

Manually starting the c1x05, c1scy and c1x01, c1scx models (with the Burt Restore button ON) did not resolve the issue of zeros in the EPICS screens, though it did re-establish timing.

  6439   Thu Mar 22 23:43:56 2012   Koji   Update   CDS   c1scx and c1scy not properly running

Did you guys check whether the simplant switch is set to "REAL WORLD" mode?

Edit by KI:

Bingo! The input signals were being bypassed to the simplant. I switched the simplant settings to REAL WORLD and now both end suspensions are working fine.

  5535   Sat Sep 24 01:38:14 2011   kiwamu   Update   CDS   c1scx and c1x01 restarted

[Koji / Kiwamu]

 The c1scx and c1x01 realtime processes became frozen. We restarted them around 1:30 by sshing and running the kill/start scripts.

  6175   Fri Jan 6 01:00:56 2012   kiwamu   Update   CDS   c1scx out of sync

Both the c1scx and its IOP realtime processes became out of sync.

Initially I found that the c1scx didn't show any ADC signals, though the sync sign was green.

Then I software-rebooted the c1iscex machine, after which it went out of sync.

For tonight this is fine because I am concentrating on the central part anyway.

  4173   Thu Jan 20 04:03:02 2011   kiwamu   Update   CDS   c1scy error

 I found that c1scy was not running due to a daq initialization error.

 I couldn't figure out how to fix it, so I am leaving it to Joe.


 Here is the error messages in the dmesg on c1iscey
[   39.429002] c1scy: Invalid num daq chans = 0
[   39.429002] c1scy: DAQ init failed -- exiting
 
 
Before I found this fact, I rebooted c1iscey in order to recover the synchronization with fb.
The synchronization had been lost probably because I had shut down the daqd on fb.
  4175   Thu Jan 20 10:15:50 2011   josephb   Update   CDS   c1scy error

This is caused by an insufficient number of active DAQ channels in the C1SCY.ini file located in /opt/rtcds/caltech/c1/chans/daq/.  A quick look (grep -v "#" C1SCY.ini) indicates there are no active channels.  Experience tells me you need at least 2 active channels.

Taking a look at the activateDAQ.py script in the daq directory, it looks like the C1SCY.ini file is included, but the loop over optics is missing ETMY.  This caused the file to be improperly updated when the activateDAQ.py script was run.  I have fixed the C1SCY.ini file (ran a modified version of the activate script on just C1SCY.ini).

I have restarted the c1scy front end using the startc1scy script and it is currently working.
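For reference, the "at least 2 active channels" check can be automated with a few lines like the following. This is a sketch with deliberately naive parsing (it just counts uncommented section headers, the same idea as the grep check above):

# Quick check of how many channel stanzas in a daqd .ini file are active
# (i.e. not commented out). Naive parsing; it only looks at section headers.
def active_channels(ini_path):
    names = []
    with open(ini_path) as f:
        for raw in f:
            line = raw.strip()
            if line.startswith("[") and line.lower() != "[default]":
                names.append(line.strip("[]"))
    return names

chans = active_channels("/opt/rtcds/caltech/c1/chans/daq/C1SCY.ini")
print("%d active channels" % len(chans))   # want at least 2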

Quote:
 Here is the error messages in the dmesg on c1iscey
[   39.429002] c1scy: Invalid num daq chans = 0
[   39.429002] c1scy: DAQ init failed -- exiting
 

 

  8626   Thu May 23 10:24:23 2013   Jamie   Summary   CDS   c1scy model continues to run at the hairy edge

c1scy, the controller model at the Y END, is still running very long, typically at 55/60 microseconds, or ~92% of its cycle.  It's currently showing a recorded max cycle time (since last restart or reset) of 60, which means that it has actually hit its limit sometime in the very recent past.  This is obviously not good, since it's going to inject big glitches into ETMY.

c1scy is actually running a lot less code than c1scx, but c1scx caps out its load at about 46 us.  This indicates to me that it must be some hardware configuration setting in the c1iscey computer.

I'll try to look into this more as soon as I can.

  9441   Wed Dec 4 21:33:24 2013   Koji   Update   CDS   c1scy time-over issue mitigated

c1scy had frequent time-overs. This caused glitches in the OSEM damping servos.

Today Eric Q was annoyed by the glitches while he worked on the green PDH inspection at the Y-end.

In order to mitigate this issue, low priority RFM channels are moved from c1scy to c1tst.
The moved channels (see Attachment 1) are supposed to be less susceptible to the additional delay.

This modification required the following models to be modified, recompiled, reinstalled, and restarted
in the listed order:
c1als, c1sus, c1rfm, c1tst, c1scy

Now the models are running. CDS status is all green.
The time consumption of c1scy is now ~30 us (previously ~60 us)
(see Attachment 2)

I am looking at the cavity lock of the TEM00 mode and have witnessed no glitches any more.
In fact, the OSEM signals have no glitch. (see Attachment 3)

We still have c1mcs regularly running over time. Can I remove the WFS->OAF connections temporarily?

Attachment 1: TST.png
Attachment 2: CDS.png
Attachment 3: no_glitch.png
  5786   Wed Nov 2 17:29:10 2011   Katrin   Update   CDS   c1scy.mdl compiled

Slight modification on that model:

  • terminated Q_out of Lockins to be able to compile the old model
  • assigned other ADC channels to GCY (green YARM)
  16728   Tue Mar 15 14:10:41 2022   Anchal   Summary   CDS   c1su2 model remade, reinstalled, restarted after the update

I have restarted the c1su2 model with the Run/Acquire switch connected to the analog filters on the coil drivers. The following steps were taken:

First ssh to c1sus2 and then:

controls@c1sus2:~ 0$ rtcds make c1su2
buildd: /opt/rtcds/caltech/c1/rtbuild/release
### building c1su2...
Cleaning c1su2...
Done
Parsing the model c1su2...
Done
Building EPICS sequencers...
Done
Building front-end Linux kernel module c1su2...
Done
RCG source code directory:
/opt/rtcds/rtscore/branches/branch-3.4
The following files were used for this build:
/opt/rtcds/userapps/release/cds/common/models/lockin.mdl
/opt/rtcds/userapps/release/cds/common/models/rtbitget.mdl
/opt/rtcds/userapps/release/cds/common/models/rtdemod.mdl
/opt/rtcds/userapps/release/isc/common/models/QPD.mdl
/opt/rtcds/userapps/release/sus/c1/models/c1su2.mdl
/opt/rtcds/userapps/release/sus/c1/models/lib/sus_single_control.mdl

Successfully compiled c1su2
***********************************************
Compile Warnings, found in c1su2_warnings.log:
***********************************************
WARNING  *********** No connection to subsystem output named  SUS_DAC1_12  
WARNING  *********** No connection to subsystem output named  SUS_DAC1_13  
WARNING  *********** No connection to subsystem output named  SUS_DAC1_14  
WARNING  *********** No connection to subsystem output named  SUS_DAC1_15  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_7  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_8  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_9  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_10  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_11  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_12  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_13  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_14  
WARNING  *********** No connection to subsystem output named  SUS_DAC2_15  
***********************************************
controls@c1sus2:~ 0$ rtcds install c1su2
buildd: /opt/rtcds/caltech/c1/rtbuild/release
### installing c1su2...
Installing system=c1su2 site=caltech ifo=C1,c1
Installing /opt/rtcds/caltech/c1/chans/C1SU2.txt
Installing /opt/rtcds/caltech/c1/target/c1su2/c1su2epics
Installing /opt/rtcds/caltech/c1/target/c1su2
Installing start and stop scripts
/opt/rtcds/caltech/c1/scripts/killc1su2
/opt/rtcds/caltech/c1/scripts/startc1su2
Performing install-daq
Updating testpoint.par config file
/opt/rtcds/caltech/c1/target/gds/param/testpoint.par
/opt/rtcds/rtscore/branches/branch-3.4/src/epics/util/updateTestpointPar.pl -par_file=/opt/rtcds/caltech/c1/target/gds/param/archive/testpoint_220315_135808.par -gds_node=26 -site_letter=C -system=c1su2 -host=c1sus2
Installing GDS node 26 configuration file
/opt/rtcds/caltech/c1/target/gds/param/tpchn_c1su2.par
Installing auto-generated DAQ configuration file
/opt/rtcds/caltech/c1/chans/daq/C1SU2.ini
Installing Epics MEDM screens
Running post-build script

/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-AS1_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS1_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-AS1_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS1_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-AS1_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS1_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-AS4_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS4_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-AS4_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS4_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-AS4_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS4_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-LO1_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO1_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-LO1_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO1_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-LO1_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO1_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-LO2_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO2_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-LO2_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO2_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-LO2_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO2_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-PR2_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR2_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-PR2_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR2_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-PR2_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR2_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-PR3_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR3_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-PR3_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR3_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-PR3_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR3_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-SR2_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_SR2_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-SR2_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_SR2_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-SR2_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_SR2_TO_COIL_KB.adl
safe.snap exists
controls@c1sus2:~ 0$

Then on rossa, run activateSUS2DQ.py, which creates a file C1SU2.ini.NEW. Remove the old backup file C1SU2.ini.bak, rename C1SU2.ini to C1SU2.ini.bak, and rename C1SU2.ini.NEW to C1SU2.ini:

~> cd /opt/rtcds/caltech/c1/chans/daq/
daq>python2 activateSUS2DQ.py 
/opt/rtcds/caltech/c1/chans/daq/C1SU2.ini
daq>rm C1SU2.ini.bak
daq>mv C1SU2.ini C1SU2.ini.bak
daq>mv C1SU2.ini.NEW C1SU2.ini

Then ssh back to c1sus2 and restart the rtcds model:

controls@c1sus2:~ 0$ rtcds restart c1su2
### stopping c1su2...
### starting c1su2...
c1su2epics: no process found
Number of ADC cards on bus = 2
Number of DAC16 cards on bus = 3
Number of DAC18 cards on bus = 0
Number of DAC20 cards on bus = 0
Specified filename iocC1.log does not exist.
c1su2epics C1 IOC Server started
c1su2 RT ready in 4
awg_server Version $Id$
channel_client Version $Id$
testpoint_server Version $Id$
/opt/rtcds/caltech/c1/target/gds/bin/awgtpman -s c1su2 -l /opt/rtcds/caltech/c1/target/gds/awgtpman_logs/c1su2.log started on host c1sus2 hostid ffffffffa8c05771 
awgtpman Version $Id$
controls@c1sus2:~ 0$

Then restart daqd services from rossa and burtrestore to latest snap of c1su2epics.snap:

daq>telnet fb 8083
Trying 192.168.113.201...
Connected to fb.martian.
Escape character is '^]'.
daqd> shutdown
OK
Connection closed by foreign host.
daq>burtgooey
>burtwb -f /opt/rtcds/caltech/c1/burt/autoburt/latest/c1su2epics.snap -l /tmp/controls_1220315_140755_0.write.log -o /tmp/controls_1220315_140755_0.nowrite.snap -v <
daq>

All suspensions are back online and everything is the same as before. I will test the Run/Acquire switch functionality later.

  16726   Tue Mar 15 11:52:34 2022   Anchal   Summary   CDS   c1su2 model updated for sending Run/Acquire Binary Output to Binary Interface card

I routed the XXX_COIL_DW signals from the 7 SOS blocks in c1su2.mdl (located at /cvs/cds/rtcds/userapps/trunk/sus/c1/models/c1su2.mdl) to the binary outputs from the FE model. The routing is done such that when these binary outputs are routed through the binary interface card mounted on 1Y0, they go to the acromag chassis just installed and from there they go to the binary inputs of the coil drivers together with the acromag controlled coil outputs.

I have not restarted the rtcds models yet. This needs more care and requires following the instructions from 40m/16533. I will do that sometime later, or Koji can follow up on this work.

Attachment 1: c1su2.pdf
  16533   Wed Dec 22 17:40:22 2021   Anchal   Summary   CDS   c1su2 model updated with SUS damping blocks for 7 SOSs

[Anchal, Koji]

I've updated the c1su2 model today with model suspension blocks for the 7 new SOSs (LO1, LO2, AS1, AS4, SR2, PR2 and PR3). The model is running properly now but we had some difficulty in getting it to run.

Initially, we were getting a 0x2000 error on the c1su2 model CDS screen. The issue was probably the high data transmission rate required for all 7 SOSs in this model. Koji dug up a script, /opt/rtcds/caltech/c1/userapps/trunk/cds/c1/scripts/activateDQ.py, that has historically been used for updating the data rate on some of the DQ channels in the suspension block. However, this script was not working properly for Koji, so he created a new script at /opt/rtcds/caltech/c1/chans/daq/activateSUS2DQ.py.

[Ed by KA: I could not make this modified script run such that it replaces the input file (i.e. C1SU2.ini). So the output file is named C1SU2.ini.NEW and one needs to manually replace the original file.]

With this, Koji was able to reduce the acquisition rate of SUSPOS_IN1_DQ, SUSPIT_IN1_DQ, SUSYAW_IN1_DQ, SUSSIDE_IN1_DQ, SENSOR_UL, SENSOR_UR, SENSOR_LL, SENSOR_LR, SENSOR_SIDE, OPLEV_PERROR, OPLEV_YERROR, and OPLEV_SUM to 2048 Sa/s. The script modifies the /opt/rtcds/caltech/c1/chans/daq/C1SU2.ini file, which would get re-written if the c1su2 model is remade and reinstalled. After this modification, the 0x2000 error stopped appearing and the model is running fine.
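For the record, the gist of what such a script does can be sketched as below. This is a simplified illustration, not the actual activateSUS2DQ.py; the channel-name matching and the assumption that each stanza carries a datarate= key should be checked against the real .ini file.

# Simplified sketch: write a .NEW copy of the ini with the data rate of the
# listed slow channels set to 2048 Sa/s. Not the real activateSUS2DQ.py.
SLOW_CHANS = ("SUSPOS_IN1_DQ", "SUSPIT_IN1_DQ", "SUSYAW_IN1_DQ",
              "SUSSIDE_IN1_DQ", "SENSOR_UL", "SENSOR_UR", "SENSOR_LL",
              "SENSOR_LR", "SENSOR_SIDE", "OPLEV_PERROR", "OPLEV_YERROR",
              "OPLEV_SUM")

def rewrite_rates(src="C1SU2.ini", dst="C1SU2.ini.NEW", rate=2048):
    in_slow_stanza = False
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            stripped = line.strip()
            if stripped.startswith("["):                   # new channel stanza
                in_slow_stanza = any(name in stripped for name in SLOW_CHANS)
            elif in_slow_stanza and stripped.startswith("datarate="):
                line = "datarate=%d\n" % rate
            fout.write(line)

rewrite_rates()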


Should we change the library model part for sus_single_control.mdl

We notice that all our suspension models need to go through this weird python script modifying auto-generated .ini files to reduce the data rate. Ideally, there is a simpler solution: simply add the data rate 2048 in the '#DAQ Channels' block in the model library part /cvs/cds/rtcds/userapps/trunk/sus/c1/models/lib/sus_single_control.mdl, which is the root model in all the suspensions. With this change, the .ini files would automatically be written with the correct data rate and there would be no need for the activateDQ script. But we couldn't find out why this simple solution was not implemented in the past, so we want to know if there is more going on here than we know. Changing the library model would obviously change every suspension model, and we don't want a broken CDS system on our heads at the beginning of the holidays, so we'll leave this delicate task for the near future.

  16537   Wed Dec 29 20:09:40 2021   rana   Summary   CDS   c1su2 model updated with SUS damping blocks for 7 SOSs

We want to maintain the 16 kHz sample rate for the COIL DAQ channels, but nothing wrong with reducing the others.

I would suggest setting the DQ sample rates to 256 Hz for the SUS DAMP channels and 1024 Hz for the OPLEV channels (for noise diagnostics).

Maybe you can put these numbers into a new library part and we can have the best of all worlds?

Quote:
 

Should we change the library model part for sus_single_control.mdl

We notice that all our suspension models need to go through this weird python script modifying auto-generated .ini files to reduce the data rate. Ideally, there is a simpler solution: simply add the data rate 2048 in the '#DAQ Channels' block in the model library part /cvs/cds/rtcds/userapps/trunk/sus/c1/models/lib/sus_single_control.mdl, which is the root model in all the suspensions. With this change, the .ini files would automatically be written with the correct data rate and there would be no need for the activateDQ script. But we couldn't find out why this simple solution was not implemented in the past, so we want to know if there is more going on here than we know. Changing the library model would obviously change every suspension model, and we don't want a broken CDS system on our heads at the beginning of the holidays, so we'll leave this delicate task for the near future.

 

  7165   Mon Aug 13 20:12:29 2012   jamie   Update   CDS   c1sup model moved to c1lsc machine

I moved the c1sup simplant model to the c1lsc machine, where there was one remaining available processor.  This requires changing a bunch of IPC routing in the c1sus and c1lsp models.  I have rebuilt and installed the models, and have restarted c1sup, but have not restarted c1sus and c1lsp since they're currently in use.  I'll restart them first thing tomorrow.

  6619   Mon May 7 22:39:37 2012   Den   Update   CDS   c1sus

[Jenne, Den]

We decided to reboot the c1sus machine in the hope that this would fix the problem with the seismic channels. After the reboot, the machine could not connect to the framebuilder. We restarted mx_stream but this did not help. Then we manually executed

/opt/rtcds/caltech/c1/target/fb/mx_stream -s c1x02 c1sus c1mcs c1rfm c1pem -d fb:0 -l /opt/rtcds/caltech/c1/target/fb/mx_stream_logs/c1sus.log

but c1sus still could not connect to fb. This script returned the following error:

controls@c1sus ~ 128$ cat /opt/rtcds/caltech/c1/target/fb/mx_stream_logs/c1sus.log


c1x02
c1sus
c1mcs
c1rfm
c1pem
mmapped address is 0x7fb5ef8cc000
mapped at 0x7fb5ef8cc000
mmapped address is 0x7fb5eb8cc000
mapped at 0x7fb5eb8cc000
mmapped address is 0x7fb5e78cc000
mapped at 0x7fb5e78cc000
mmapped address is 0x7fb5e38cc000
mapped at 0x7fb5e38cc000
mmapped address is 0x7fb5df8cc000
mapped at 0x7fb5df8cc000
send len = 263596
OMX: Failed to find peer index of board 00:00:00:00:00:00 (Peer Not Found in the Table)
mx_connect failed

Looks like CDS error. We are leaving the WATCHDOGS OFF for the night.

  10135   Mon Jul 7 13:44:21 2014   Jenne   Update   CDS   c1sus - bad fb connection

Quote:

 

I managed to recover c1sus.  It required stopping all the models, and the restarting them one-by-one:

$ rtcds stop all     # <-- this does the right thing: it stops all the models with the IOP stopped last, so they will all unload properly.

$ rtcds start iop

$ rtcds start c1sus c1mcs c1rfm

I have no idea why the c1sus models got wedged, or why restarting them in this way fixed the issue.

In addition to needing obnoxiously regular mxstream restarts, this afternoon the sus machine was doing something slightly different.  Only 1 fb block per core was red (the mxstream symptom is 3 fb-related blocks red per core), and restarting the mxstream didn't help.  Anyhow, I was searching through the elog, and this entry to which I'm replying had similar symptoms.  However, by the time I went back to the CDS FE screen, c1sus had the regular mxstream symptoms, and an mxstream restart fixed things right up.

So, I don't know what the issue is or was, nor do I know why it is fixed; it's fine for now, but I wanted to make a note for the future.

  3945   Thu Nov 18 11:06:20 2010   josephb   Update   CDS   c1sus and ADCs

Problem:

ADCs are timing out on c1sus when we have more than 3.

Talked with Rolf:

Alex will be back tomorrow (he took yesterday and today off), so I talked with Rolf.

He said ordering shouldn't make a difference and he's not sure why we would be having a problem. However, when he loads the chassis, he tends to put all the ADCs on the same PCI bus (the back plane apparently contains multiples).  Slot 1 is its own bus, slots 2-9 should be the same bus, and 10-17 should be the same bus.

He also mentioned that when you use dmesg and see a line like "ADC TIMEOUT # ##### ######", the first number should be the ADC number, which is useful for determining which one is reporting back slow.
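A few lines are enough to pull those lines out of dmesg and read off the ADC number; a minimal sketch (the regex simply follows the line format described above):

# Pull "ADC TIMEOUT" lines out of dmesg and report the flagged ADC number
# (the first number after "ADC TIMEOUT", per the description above).
import re
import subprocess

dmesg = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
for m in re.finditer(r"ADC TIMEOUT (\d+)\s+(\d+)\s+(\d+)", dmesg):
    print("ADC %s timed out (other fields: %s %s)"
          % (m.group(1), m.group(2), m.group(3)))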

Plan:

Disconnect the c1sus IO chassis completely, pull it out, pull out all cards, check connectors, and repopulate following Rolf's suggestions, keeping this elog in mind.

Regarding the RFM, it looks like one of the fibers had been disconnected from the c1sus chassis RFM card (it's plugged in in the middle of the chassis so it's hard to see) during all the plugging in and out of the cables and cards last night.

  4733   Tue May 17 18:09:13 2011   Jamie, Kiwamu   Configuration   CDS   c1sus and c1auxey crashed, rebooted

c1sus and c1auxey crashed, required hard reboot

For some reason, we found that c1sus and c1auxey were completely unresponsive.  We went out and gave them a hard reset, which brought them back up with no problems.

This appears to be related to a very similar problem reported by Kiwamu just a couple of days ago, where c1lsc crashed after editing the C1LSC.ini and restarting the daqd process, which is exactly what I just did (see my previous log).  What could be causing this?

  6737   Fri Jun 1 02:33:40 2012   Jenne   Update   Computers   c1sus and c1iscex - bad fb connections

Something bad happened to c1sus and c1iscex ~20 min ago.  They both have "0x2bad" 's.  I restarted the daqd on the framebuilder, and then rebooted c1sus, and nothing changed.  The SUS screens are all zeros (the gains seem to be set correctly, but all of the signals are 0's).

If it's not fixed when I get in tomorrow, I'll keep poking at it to make it better.

  6740   Fri Jun 1 09:50:50 2012   Jamie   Update   Computers   c1sus and c1iscex - bad fb connections

Quote:

Something bad happened to c1sus and c1iscex ~20 min ago.  They both have "0x2bad" 's.  I restarted the daqd on the framebuilder, and then rebooted c1sus, and nothing changed.  The SUS screens are all zeros (the gains seem to be set correctly, but all of the signals are 0's).

If it's not fixed when I get in tomorrow, I'll keep poking at it to make it better.

 This is at least partially related to the mx_stream issue I reported previously.  I restarted mx_stream on c1iscex and that cleared up the models on that machine.

Something else is happening with c1sus.  Restarting mx_stream on c1sus didn't help.  I'll try to fix it when I get over there later.

  6742   Fri Jun 1 14:40:24 2012   Jamie   Update   Computers   c1sus and c1iscex - bad fb connections

Quote:

This is at least partially related to the mx_stream issue I reported previously.  I restarted mx_stream on c1iscex and that cleared up the models on that machine.

Something else is happening with c1sus.  Restarting mx_stream on c1sus didn't help.  I'll try to fix it when I get over there later.

I managed to recover c1sus.  It required stopping all the models, and the restarting them one-by-one:

$ rtcds stop all     # <-- this does the right thing: it stops all the models with the IOP stopped last, so they will all unload properly.

$ rtcds start iop

$ rtcds start c1sus c1mcs c1rfm

I have no idea why the c1sus models got wedged, or why restarting them in this way fixed the issue.

  6738   Fri Jun 1 08:01:46 2012   steve   Update   Computers   c1sus and c1iscex are down

Quote:

Something bad happened to c1sus and c1iscex ~20 min ago.  They both have "0x2bad" 's.  I restarted the daqd on the framebuilder, and then rebooted c1sus, and nothing changed.  The SUS screens are all zeros (the gains seem to be set correctly, but all of the signals are 0's).

If it's not fixed when I get in tomorrow, I'll keep poking at it to make it better.

 

 

Attachment 1: compdown.png
  4183   Fri Jan 21 15:26:15 2011   josephb   Update   CDS   c1sus broken yesterday and now fixed

[Joe, Koji]
Yesterday's CDS swap of c1sus and c1iscex left the interferometer in a bad state due to several issues.

The first was the need to actually power down the IO chassis completely (I eventually waited for a green LED to stop glowing and then plugged the power back in) when switching computers.  I also unplugged and re-plugged the interface cable between the IO chassis and the computer while powered down.  This let the computer actually see the IO chassis (previously the host interface card was glowing just red, no green lights).

Second, the former c1iscex computer and now new c1sus computer only has 6 CPUs, not 8 like most of the other front ends.  Because it was running 6 models (c1sus, c1mcs, c1rms, c1rfm, c1pem, c1x02) and 1 CPU needed to be reserved for the operating system, 2 models were not actually running (recycling mirrors and PEM).  This meant the recycling mirrors were left swinging uncontrolled.

To fix this I merged the c1rms model with the c1sus model.  The c1sus model now controls BS, ITMX, ITMY, PRM, SRM.  I merged the filter files in the /chans/ directory, and reactivated all the DAQ channels.  The master file for the fb in the /target/fb directory had all references to c1rms removed, and then the fb was restarted via "telnet fb 8088" and then "shutdown".

My final mistake was starting the work late in the day.

So the lesson for Joe is, don't start changes in the afternoon.

Koji has been helping me test the damping and confirm things are really running.  We were having some issues with some of the matrix values.  Unfortunately I had to add them by hand since the previous snapshots no longer work with the models.

  3653   Tue Oct 5 16:58:41 2010   josephb, yuta   Update   CDS   c1sus front end status

We moved the filters for the mode cleaner optics over from the C1SUS.txt file in /opt/rtcds/caltech/c1/chans/ to the C1MCS.txt file, and placed SUS_ on the front of all the filter names.  This has let us load the filters for the mode cleaner optics.

At the moment, we cannot seem to get testpoints for the optics (i.e. dtt is not working, not even the specially installed versions on rosalba). I've asked Yuta to enter the correct matrix elements and turn the correct filters on, then save with a burt backup.

  6787   Thu Jun 7 17:49:09 2012   Jamie   Update   CDS   c1sus in weird state, running models but unresponsive otherwise

Somehow c1sus was in a very strange state.  It was running models, but EPICS was slow to respond.  We could not log into it via ssh, and we could not bring up test points.  Since we didn't know what else to do we just gave it a hard reset.

Once it came up, none of the models were running.  I think this is a separate problem with the model startup scripts that I need to debug.  I logged on to c1sus and ran:

rtcds restart all

(which handles proper order of restarts) and everything came up fine.

Have no idea what happened there to make c1sus freeze like that.  Will keep an eye out.

  3946   Thu Nov 18 14:05:06 2010   josephb, yuta   Update   CDS   c1sus is alive!

Problem:

We broke c1sus by moving ADC cards around.

Solution:

We pulled all the cards out, examined all contacts (which looked fine), found 1 poorly connected cable internally, going between an ADC and ADC timing interface card  (that probably happened last night), and one of the two RFM fiber cables pulled out of its RFM card.

We then placed all of the cards back in with a new ordering, tightened down everything, and triple checked all connections were on and well fit.

 

Gotcha!

Joe forgot that slot 1 and slot 2 of the timing interface boards have their last channels reserved for duotone signals.  Thus, they shouldn't be used for any ADCs or DACs that need their last channel (such as MC3_LR sensor input).  We saw a perfect timing signal come in through the MC3_LR sensor input, which prevented damping. 

We moved the ADC timing interface card out of the 1st slot  of the timing interface board and into slot 6 of the timing interface board, which resolved the problem.

Final Configuration:

 

 Timing Interface Board

Timing interface slot 1 (duotone): no card
Timing interface slot 2 (duotone): DAC interface (can't use last channel)
Timing interface slot 3: ADC interface
Timing interface slot 4: ADC interface
Timing interface slot 5: ADC interface
Timing interface slot 6: ADC interface
Timing interface slots 7-9: none
Timing interface slot 10: DAC interface
Timing interface slot 11: DAC interface
Timing interface slots 12-13: none

 PCIe Chassis

Chassis slot 0: Do Not Use (no card)
Chassis slot 1 (PCIe number 1): ADC
Chassis slot 2 (PCIe number 6): DAC
Chassis slot 3 (PCIe number 5): ADC
Chassis slot 4 (PCIe number 4): ADC
Chassis slot 5 (PCIe number 9): ADC
Chassis slot 6 (PCIe number 8): BO
Chassis slot 7 (PCIe number 7): BO
Chassis slot 8 (PCIe number 3): BO
Chassis slot 9 (PCIe number 2): BO
Chassis slot 10 (PCIe number 14): DAC
Chassis slot 11 (PCIe number 13): DAC
Chassis slot 12 (PCIe number 12): BIO
Chassis slot 13 (PCIe number 17): RFM
Chassis slot 14 (PCIe number 16): none
Chassis slot 15 (PCIe number 15): none
Chassis slot 16 (PCIe number 11): none
Chassis slot 17 (PCIe number 10): none

Still having Issues with:

ITM West damps.  ITM South damps, but the coil gains are opposite to the other optics in order to damp properly.

We also need to look into switching the channel names for the watchdogs on ITMX/Y in addition to the front end code changes.

  6924   Fri Jul 6 01:12:02 2012   Jenne   Update   Computers   c1sus is fine

Quote:

I was trying to use a new BLRMs c-code block that the seismic people developed, instead of Mirko's more clunky version, but putting this in crashed c1sus.

I reverted to a known good c1pem.mdl, and Jamie and I did a reboot, but c1sus is still funny - none of the models are actually running. 

rtcds restart all - all the models are happy again, c1sus is fine.

But, we still need to figure out what was wrong with the c-code block.

Also, the BLRMS channels are listed in a Daq Channels block inside of the (new) library part, so they're all saved with the new CDS system which became effective as of the upgrade.  (I made the Mirko copy-paste BLRMS into a library part, including a DAQ channels block before trying the c-code.  This is the known-working version to which I reverted, and we are currently running.)

 The reason I started looking at BLRMS and c1sus today was that the BLRMS striptool was totally wacky.  I finally figured out that the pemepics hadn't been burt restored, so none of the channels were being filtered.  It's all better now, and will be even better soon when Masha finishes updating the filters (she'll make her own elog later)

  14719   Tue Jul 2 16:57:09 2019   gautam   Update   CDS   c1sus is flaky

Since the work earlier this morning, the fast c1sus model has crashed ~5 times. Tried rebooting vertex FEs using the reboot script a few times, but the problem is persisting. I'm opting to do the full hard reboot of the 3 vertex FEs to resolve this problem.

Judging by Attachment #1, the processes have been stable overnight.

Attachment 1: c1sus_timing.png
  6923   Thu Jul 5 16:49:35 2012   Jenne   Update   Computers   c1sus is funny

I was trying to use a new BLRMs c-code block that the seismic people developed, instead of Mirko's more clunky version, but putting this in crashed c1sus.

I reverted to a known good c1pem.mdl, and Jamie and I did a reboot, but c1sus is still funny - none of the models are actually running. 

rtcds restart all - all the models are happy again, c1sus is fine.

But, we still need to figure out what was wrong with the c-code block.

Also, the BLRMS channels are listed in a Daq Channels block inside of the (new) library part, so they're all saved with the new CDS system which became effective as of the upgrade.  (I made the Mirko copy-paste BLRMS into a library part, including a DAQ channels block before trying the c-code.  This is the known-working version to which I reverted, and we are currently running.)

  6026   Mon Nov 28 16:46:55 2011   kiwamu   Update   CDS   c1sus is now up

I have restarted the c1sus machine and burt-restored c1sus and c1mcs to the day before Thanksgiving, namely the 23rd of November.

Quote from #6020

I have restarted the c1sus machine around 9:00 PM yesterday and then shut it down around 4:00 AM this morning after a little bit of taking care of the interferometer.

  7182   Tue Aug 14 17:47:44 2012   Jamie   Update   CDS   c1sus machine replaced

Rolf and Alex came back over with a replacement machine for c1sus.   We removed the old machine, removed its timing, Dolphin, and PCIe extension cards and put them in the new machine.  We then installed the new machine and booted it and it came up fine.  The BIOS in this machine is slightly different, and it wasn't having the same failure-to-boot-with-no-COM issue that the previous one was.  The COM ports are turned off on this machine (as is the USB interface).

Unfortunately the problem we were experiencing with the old machine, that unloading certain models was causing others to twitch and that dolphin IPC writes were being dropped, is still there.  So the problem doesn't seem to have anything to do with hardware settings...

After some playing, Rolf and Alex determined that for some reason the c1rfm model is coming up in a strange state when started during boot.  It runs faster, but the IPC errors are there.  If instead all models are stopped, the c1rfm model is started first, and then the rest of the models are started, the c1rfm model runs ok.  They don't have an explanation for this, and I'm not sure how we can work around it other than knowing the problem is there and do manual restarts after boot.  I'll try to think of something more robust.

A better "fix" to the problems is to clean up all of our IPC routing, a bunch of which we're currently doing very inefficient right now.  We're routing things through c1rfm that don't need to be, which is introducing delays.  It particular, things that can communicate directly over RFM or dolphin should just do so.  We should also figure out if we can put the c1oaf and c1pem models on the same machine, so that they can communicate directly over shared memory (SHMEM).  That should cut down on overhead quite a bit.  I'll start to look at a plan to do that.

 

  6042   Tue Nov 29 18:54:29 2011   kiwamu   Update   CDS   c1sus machine up

[Zach / Kiwamu]

 Woke up the c1sus machine in order to lock PSL to MC so that we can observe the effect of not having the EOM heater.

  3157   Fri Jul 2 11:33:15 2010   josephb   Update   CDS   c1sus needs real time linux to be setup on it

I connected a monitor and keyboard to the new c1sus machine and discovered it's not running RTL Linux.  I changed the root password to the usual; however, without help from Alex I don't know where to get the right version or how to install it, since the machine doesn't seem to have an obvious CD-ROM drive or the like.  Hopefully on Tuesday I can get Alex to come over and help with the setup of it and the other 1-2 IO chassis.

  3636   Fri Oct 1 16:34:06 2010   josephb   Update   CDS   c1sus not booting due to fb dhcp server not running

For some reason, the DHCP server running on the fb machine, which assigns the IP address to c1sus (since it's running a diskless boot), was down.  This was preventing c1sus from coming up properly.  The symptom was an error indicating no DHCP offers were made (seen when I plugged a keyboard and monitor in).

To check if the dhcp server is running, run "ps -ef | grep dhcpd".  If it's not, it can be started with "sudo /etc/init.d/dhcpd start".

  6033   Tue Nov 29 04:47:49 2011   kiwamu   Update   CDS   c1sus shut down again

I have shut down the c1sus machine at 3:30 AM.
