  40m Log
ID   Date   Author   Type   Category   Subject
  14956   Tue Oct 8 20:23:03 2019   gautam   Update   CDS   c1iscaux testing

Looking at the old latch.st code, it looks like this is just a heartbeat signal to indicate that the code is alive. I'll implement this. Aesthetically, it would also be nice to have the hex representation of the "*_SET" channels visible on the MEDM screen.

 

Quote:

Latch logic works. But latch alive signal is missing.

  14912   Mon Sep 30 11:20:43 2019   gautam   Update   CDS   c1iscaux testing - CM board code updated

DATED, SEE ELOG14941 for the most up-to-date info on latch.py.

I modified /cvs/cds/caltech/target/c1iscaux/latch.py and /cvs/cds/caltech/target/c1iscaux/C1_ISC-AUX_CM.db to set up the mbbo logic for the other three channels on the CM board, namely REFL2 Gain, AO Gain, and the Super boosts. The systemctl processes were restarted on c1iscaux. We are now ready to perform systematic checks on the CM board functionality.

Remarks:

The Acromag BIO registers are addressed in a way that is inconvenient to use with the EPICS mbboDirect protocol:

  • The control word going to the Acromag is 16 bits in length
  • However, only the 4 least significant bits actually correspond to physical channels - the remaining 12 bits are "unused".
  • Because each Acromag BIO unit has 16 BIO channels, this means that they are grouped into four "banks" of 4 bits each.
  • The mbboDirect EPICS/modbus protocol is used to control multiple physical BIO channels using a single input, which is exactly what we want for the gain sliders on the CM board. However, one caveat is that the bits need to be consecutive.
  • This means that we have to break up the 6 bits used for the gain sliders (and in fact also the 2 bits used for the super boosts) into a least-significant-bits (LSB) group and a most-significant-bits (MSB) group.
  • What's more annoying is that our physical wiring scheme means that we can't uniformly decide how this division into LSBs and MSBs works for all the channels - e.g. for REFL1 Gain, the LSB group is the 4 least significant bits and the MSB group is the 2 most significant ones, whereas for REFL2 Gain, the roles are reversed.
  • In hindsight, the "clever" way to do the wiring assignment would have been to factor this in - but the problem is (sort of) easily fixed in software, and so I recommend we stick with the existing wiring scheme.
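The LSB/MSB split described in the remarks above can be sketched in Python. This is only an illustration and not the contents of latch.py: the function names and the `lsb_width` parameter are hypothetical, and the 2-bit LSB group used for REFL2 is an assumption based on "the roles are reversed".

```python
def split_gain_word(value, lsb_width, total_width=6):
    """Split a gain-slider value into (lsb_group, msb_group).

    lsb_width is the number of bits in the least-significant group;
    the remaining (total_width - lsb_width) bits form the MSB group.
    """
    if not 0 <= value < (1 << total_width):
        raise ValueError(f"value {value} does not fit in {total_width} bits")
    lsb_group = value & ((1 << lsb_width) - 1)
    msb_group = value >> lsb_width
    return lsb_group, msb_group


def join_gain_word(lsb_group, msb_group, lsb_width):
    """Inverse of split_gain_word: rebuild the original value."""
    return (msb_group << lsb_width) | lsb_group


# REFL1-style split: 4 LSBs + 2 MSBs (as stated in the remarks).
refl1 = split_gain_word(0b101101, lsb_width=4)
# REFL2-style split: assumed here to be 2 LSBs + 4 MSBs.
refl2 = split_gain_word(0b101101, lsb_width=2)
```

Each group would then be written to its own mbboDirect record, since the bits within a single record have to be consecutive.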

I tested the new latch.py script by toggling the various sliders (one at a time) between two values and monitoring the states of the various soft and "*_BITS" channels, see Attachment #1. The behavior seems consistent to me, but to be sure, we have to use Koji's LED tester board and confirm that the physical bits are being toggled correctly. The StripTool templates live in /cvs/cds/caltech/target/c1iscaux/CMdiag.

Quote:

I have not yet implemented the fix for the MBBO gain channels for all the gains - only REFL1_GAIN is set up correctly now. Need to look at the hardware for the correct addressing of bits

Attachment 1: CMsoftTest.png
  15423   Mon Jun 22 17:51:50 2020   gautam   Update   CDS   c1iscaux was down

The machine needed a hard reboot as it was un-ssh-able. 

The exact time that the machine went down is unknown because the blinkys were not DQ-ed. I've now added these to the EDCU to make these channels actually useful, and we may look back on the reliability (or otherwise) of the Acromag system. To my memory, this is the ~5th time one of the new Acromag servers has needed a hard reboot. While this may be less frequent (?) than the VME machines, perhaps there is some other reason for these dropouts. Maybe something to do with the martian network?
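Once the blinky channels are recorded, the time a machine went down could be estimated from the last moment the heartbeat value changed. The sketch below assumes the trend has already been fetched as (timestamp, value) pairs; the function name and the sample data are hypothetical and do not correspond to any existing 40m script.

```python
def last_change_time(samples):
    """Return the timestamp of the last sample whose value differs from
    the one before it, or None if the value never changes.

    samples: iterable of (timestamp, value) pairs in time order.
    """
    last_change = None
    previous = None
    first = True
    for timestamp, value in samples:
        if not first and value != previous:
            last_change = timestamp
        previous = value
        first = False
    return last_change


# Made-up trend: the heartbeat toggles until t=3 and then stays frozen.
trend = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 1), (5, 1)]
stalled_at = last_change_time(trend)
```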

Anyway the machine is back up and running now.

  4570   Tue Apr 26 22:56:01 2011   kiwamu   Update   LSC   c1iscaux2 and c1iscaux restarted

While checking the whitening filters on the LSC rack, I found that some EPICS controls for the whitening did not appear to be working.

So I powered off two crates: the top one and the bottom one in the 1Y3 rack.

These crates contain c1iscaux and c1iscaux2. I then powered them back on, but this didn't solve the issue.

  5992   Wed Nov 23 22:58:33 2011   Koji   Update   General   c1iscaux2 is back (re: recovery from the power shutdown)

Keying again did not help to solve the issue. I turned off the power at the back of the crate and turned it on again.
Then the key worked again.

c1iscaux2 is burtrestored and running fine now.

Quote:

 - One of the VME racks on 1X3 is not showing the +/-15V green LED lights.

   This is the one on very upper side of the rack, which contains the old c1lsc machine and c1iscaux2. If we are still using c1iscaux2, it needs to be fixed.

 

  6598   Thu May 3 17:15:38 2012   Koji   Update   LSC   c1iscaux2 rebooted/burtrestored

[Jenne/Den/Koji]

We saw some white boxes on the LSC screens.
We found that c1iscaux2 was not running.

Once the target was power-cycled, these EPICS channels came back.
Then c1iscaux2 was burtrestored using the snapshot at 5:07 on 4/16, a day before the power glitch.

  5979   Tue Nov 22 18:15:39 2011   jamie   Update   CDS   c1iscex ADC found dead. Replaced, c1iscex working again

c1iscex has not been running for a couple of days (since the power shutdown at least). I was assuming that the problem was a recurrence of the c1iscex IO chassis issue from a couple of weeks ago (5854).  However, upon investigation I found that the timing signals were all fine.  Instead, the IOP was reporting that it was finding no ADC, even though there is one in the chassis.

Since I had a spare ADC that was going to be used for the CyMAC, I decided to try swapping it out to see if that helped.  Sure enough, the system came up fine with the new ADC.  The IOP (c1x01) and c1scx are now both running fine.

I assume the issue before might have been caused by a failing and flaky ADC, which has now failed.  We'll need to get a new ADC for me to give back to the CyMAC.

  8088   Fri Feb 15 15:21:07 2013   Jamie   Update   Computers   c1iscex IO-chassis dead

It appears that the c1iscex IO-chassis is either dead or in a very bad state.  The PCIe interface card in the IO-chassis is showing four red lights, where it's supposed to be showing a dozen or so green lights.  Obviously this is going to prevent anything from running.

We've had power issues with this chassis before, so possibly that's what we're running into now.  I'll pull the chassis and diagnose asap.

 

  8109   Tue Feb 19 15:10:02 2013   Jamie   Update   CDS   c1iscex alive again

c1iscex is back up.  It is communicating with its IO chassis, and all of its models (c1x01, c1scx, c1spx) are running again.

The problem was that the IO chassis had no connection to the computer.  The One Stop card in the IO chassis, which is the PCIe bridge between the front-end machine and the IO chassis, was showing four red lights instead of the dozen or so green lights that it usually shows.  Upon closer inspection, the card appeared to be complaining that it had no connection to the host card in the front-end machine.  Un-illuminated lights on the host card seemed to be pointing to the same thing.

There are two connector slots on the expansion card, presumably for a daisy chain situation.  Looking at other IO chassis in the lab I determined that the cable from the front-end machine was plugged into the wrong slot in the One Stop card.  wtf.

Did someone unplug the cable connecting c1iscex to its IO chassis, and then plug it back into the wrong slot?  A human must have done this.

  4179   Thu Jan 20 18:20:55 2011   josephb   Update   CDS   c1iscex computer and c1sus computer swapped

Since the 1U sized computers don't have enough slots to hold the host interface board, RFM card, and a dolphin card, we had to move the 2U computer from the end to the middle to replace c1sus.

We're hoping this will reduce the time associated with reads off the RFM card compared to when it's in the IO chassis.  Previous experience on c1ioo shows this change provides about a factor of 2 improvement, with 8 microseconds per read dropping to 4 microseconds per read, per this elog.

So the dolphin card was moved into the 2U chassis, as well as the RFM card.  I had to swap the PMC to PCI adapter on the RFM card since the one originally on it required an external power connection, which the computer doesn't provide.  So I swapped with one of the DAC cards in the c1sus IO chassis.

But then I forgot to hit submit on this elog entry..............

  4478   Thu Mar 31 19:58:11 2011   kiwamu   Update   CDS   c1iscex crashed

After I did several things to add new DAQ channels on c1iscex, it suddenly dropped off the network. It may have crashed.

c1iscex then stopped responding to ping, and all the EPICS values associated with c1iscex became inaccessible.

I physically reset it by pushing the reset button. It then came back and is now running fine.

 


(how I broke it)

Since activateDAQ.py has screwed up the 'ini' files, including C1SCX.ini, I was not able to add a channel to C1SCX.ini with the usual daqconfig GUI.

So I started editing it manually with an editor and changed some entries to those shown below

  [C1:GCX-ERR_MON_IN1_DAQ]
  acquire=1
  chnnum=10004
  datarate=2048
  datatype=4
  [C1:GCX-GRN_REFL_DC_IN1_DAQ]
  acquire=1
  chnnum=10007
  datarate=2048
  datatype=4
  [C1:GCX-SLOW_SERVO1_IN1_DAQ]
  acquire=1
  chnnum=10010
  datarate=2048
  datatype=4
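A hand-edited stanza like the one above could be sanity-checked before reloading the DAQ configuration. The sketch below uses Python's standard configparser and only handles stanzas of the form shown; the function name is hypothetical, and nothing here establishes that a malformed file caused the crash.

```python
import configparser
from collections import Counter

# Entries copied from the log entry above.
INI_TEXT = """\
[C1:GCX-ERR_MON_IN1_DAQ]
acquire=1
chnnum=10004
datarate=2048
datatype=4
[C1:GCX-GRN_REFL_DC_IN1_DAQ]
acquire=1
chnnum=10007
datarate=2048
datatype=4
[C1:GCX-SLOW_SERVO1_IN1_DAQ]
acquire=1
chnnum=10010
datarate=2048
datatype=4
"""


def check_daq_ini(text):
    """Return (acquired_channels, duplicate_chnnums) for the given ini text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    sections = parser.sections()
    counts = Counter(parser[name]["chnnum"] for name in sections)
    duplicates = sorted(num for num, count in counts.items() if count > 1)
    acquired = [name for name in sections if parser[name].get("acquire") == "1"]
    return acquired, duplicates


acquired, duplicates = check_daq_ini(INI_TEXT)
```

An empty `duplicates` list means that no channel number is reused among the parsed stanzas.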

Then I rebooted fb to reflect the new DAQ channels.

After that I looked at the C1_FE_STATUS.adl screen and found some indicator lights were red.

So I pushed "Diag reset" button and "DAQ Reload" button on the C1SCX_GDS_TP.adl screen and then c1iscex died.

After the reboot, the new DAQ channels appeared to be acquired without problems.

This is the second time I have crashed a front end machine (see this entry)

  8126   Thu Feb 21 12:56:38 2013   Jenne   Update   CDS   c1iscex dead again

c1iscex is dead again.  Red lights, no "breathing" on the FE status screen.

  6194   Thu Jan 12 23:19:56 2012   Koji   Update   SUS   c1iscex is fine now

c1iscex is working as before and the optic is damped.


What I checked

1. I went to the X-end rack. I found the io-chassis was turned off.

2. I shut down c1iscex, then turned everything off and on again. Again, we did not have any signal from the ADC into the c1scx model.

However, I found that c1x01 indicates healthy ADC signals.
This means that the connection between the IOP and the c1scx model was wrong ==> Simulated Plant

3. Burtrestored X'mas eve snapshot. This restored the gains and matrices as well as C1:SCX-SIM_SWITCH
which switches the input between the real ADCs and simulated plant.

4. The signals came back to c1scx.

  3965   Mon Nov 22 17:48:11 2010   josephb   Update   CDS   c1iscex is not seeing its Binary Output card

Problem:

c1iscex does not even see its 32 channel Binary output card.  This means we have no control over the state of the analog whitening and dewhitening filters.  The ADC, DAC, and the 1616 Binary Input/Output cards are recognized and working.

Things tried:

Tried recreating the IOP code from the known working c1x02 (from the c1sus front end), but that didn't help.

Checked seating of the card, but it seems correctly socketed and tightened down nicely with a screw.

Tomorrow will try moving cards around and see if there's an issue with the first slot, which the Binary Output card is in.

Current Status:

The ETMX is currently damping, including POS, PIT, YAW and SIDE degrees of freedom.  However, the gds screen is showing a 0x2bad status for the c1scx front end (the IOP seems fine with a 0x0 status).  So for the moment, I can't seem to bring up c1scx testpoints.  I was able to do so earlier when I was testing the status of the binary outputs, so during one of the rebuilds, something broke. I may have to undo the SVN update and/or a change made by Alex today to allow for longer filter bank names beyond 19 characters.

  3926   Mon Nov 15 16:26:46 2010   josephb   Update   CDS   c1iscex is now running and the network hasn't died

Problem:

c1iscex was spamming the network with error messages.

Solution:

Updated the front end codes to current standards (they were on the order of months out of date).  After fixing them up and rebuilding the codes on c1iscex, it no longer had problems connecting to the frame builder.

Status:

I can look at test points for ETMX.  It is not currently damping however.

To Do:

Move filters for ETMX into the correct files. 

Need to add a Binary output blue and gold box to the end rack, and plug it into the binary output card.  Confirm the binary output logic is correct for the OSEM whitening, coil dewhitening, and QPD whitening boards. 

Get ETMX damped.

Figure out what we're going to do with the aux crate which is currently running y-end code at the new x-end.  Koji suggested simply swapping auxiliary crates - this may be the easiest.  The other option would be to change the IP address, so that when it PXE boots it grabs the x-end code instead of the y-end code.

Current CDS status:

MC damp dataviewer diaggui AWG c1ioo c1sus c1iscex RFM Sim.Plant Frame builder TDS
                     
  13242   Tue Aug 22 17:11:15 2017   gautam   Update   Computers   c1iscex model restarts

[jamie, gautam]

We tried to implement the fix that Rolf suggested in order to solve (perhaps among other things) the inability of some utilities like dataviewer to open testpoints. The problem isn't wholly solved yet - we can access actual testpoint data (not just zeros, as was the case) using DTT, and if DTT is used to open a testpoint first, then dataviewer, but DV itself can't seem to open testpoints.

Here is what was done (Jamie will correct me if I am mistaken).

  1. Jamie checked out branch 3.4 of the RCG from the SVN.
  2. Jamie recompiled all the models on c1iscex against this version of RCG.
  3. I shut down the ETMX watchdog, then ran rtcds stop all on c1iscex to stop all the models, and then restarted them using rtcds start <model> in the order c1x01, c1scx and c1asx. 
  4. Models came back up cleanly. I then restarted the daqd_dc process on FB1. At this point all indicators on the CDS overview screen were green.
  5. Tried getting testpoint data with DTT and DV for ETMX Oplev Pitch and Yaw IN1 testpoints. Conclusion as above.
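The restart order in step 3 can be written down as a small Python sketch. It only builds the command lines quoted in the entry (rtcds stop all, rtcds start <model>) and does not execute them; the helper name `restart_plan` is hypothetical.

```python
def restart_plan(models, iop):
    """Build the command sequence: stop everything, then start the IOP
    first, followed by the remaining models in the order given."""
    ordered = [iop] + [model for model in models if model != iop]
    commands = [["rtcds", "stop", "all"]]
    commands += [["rtcds", "start", model] for model in ordered]
    return commands


# Order used in the entry: IOP (c1x01) first, then c1scx and c1asx.
plan = restart_plan(["c1x01", "c1scx", "c1asx"], iop="c1x01")
for command in plan:
    print(" ".join(command))
```

The watchdog shutdown and the daqd_dc restart on FB1 are separate manual steps and are not covered by this sketch.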

So while we are in a better state now, the problem isn't fully solved. 

Comment: seems like there is an in-built timeout for testpoints opened with DTT - if the measurement is inactive for some time (unsure how much exactly but something like 5mins), the testpoint is automatically closed.

  13135   Mon Jul 24 10:45:23 2017   gautam   Update   CDS   c1iscex models died

This morning, all the c1iscex models were dead. Attachment #1 shows the state of the cds overview screen when I came in. The machine itself was ssh-able, so I just restarted all the models and they came back online without fuss.

Quote:

All front ends and model are (mostly) running now

Attachment 1: c1iscexFailure.png
  13136   Mon Jul 24 10:59:08 2017   Jamie   Update   CDS   c1iscex models died
Quote:

This morning, all the c1iscex models were dead. Attachment #1 shows the state of the cds overview screen when I came in. The machine itself was ssh-able, so I just restarted all the models and they came back online without fuss.

This was me.  I had rebooted that machine and hadn't restarted the models.  Sorry for the confusion.

  8128   Thu Feb 21 14:32:02 2013   Jamie   Update   CDS   c1iscex models restarted

Quote:

c1iscex is dead again.  Red lights, no "breathing" on the FE status screen.

The c1iscex machine itself wasn't dead, the models were just not running.  Here are the last messages in dmesg:

[130432.926002] c1spx: ADC TIMEOUT 0 7060 20 7124
[130432.926002] c1scx: ADC TIMEOUT 0 7060 20 7124
[130433.941008] c1x01: timeout 0 1000000 
[130433.941008] c1x01: exiting from fe_code()

I'm guessing maybe the timing signal was lost, so the ADC stopped clocking.   Since the ADC clock is the everything clock, all the "fe" code (ie. models) aborted. Not sure what would have caused it.
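A short Python sketch for pulling the relevant information out of dmesg lines like the ones above: which models reported an ADC timeout and which exited fe_code(). The regular expression and the function name are assumptions based only on the four lines quoted here, so other message formats may not match.

```python
import re

# Matches lines of the form "[130432.926002] c1spx: ADC TIMEOUT 0 7060 20 7124".
DMESG_LINE = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\]\s+(?P<model>\w+):\s+(?P<message>.*)$"
)


def summarize(lines):
    """Return (models_with_adc_timeout, models_that_exited)."""
    adc_timeouts, exited = [], []
    for line in lines:
        match = DMESG_LINE.match(line.strip())
        if match is None:
            continue
        model, message = match["model"], match["message"]
        if message.startswith("ADC TIMEOUT"):
            adc_timeouts.append(model)
        elif "exiting from fe_code()" in message:
            exited.append(model)
    return adc_timeouts, exited


log = [
    "[130432.926002] c1spx: ADC TIMEOUT 0 7060 20 7124",
    "[130432.926002] c1scx: ADC TIMEOUT 0 7060 20 7124",
    "[130433.941008] c1x01: timeout 0 1000000 ",
    "[130433.941008] c1x01: exiting from fe_code()",
]
adc_timeouts, exited = summarize(log)
```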

I restarted all the models ("rtcds restart all") and everything came up fine. Obviously we should keep our eyes on things, and note if anything strange was happening if this happens again.

  5060   Fri Jul 29 12:39:26 2011   jamie   Update   CDS   c1iscex mysteriously crashed

c1iscex was behaving very strangely this morning.  Steve earlier reported that he was having trouble pulling up some channels from the c1scx model.  I went to investigate and noticed that indeed some channels were not responding.

While I was in the middle of poking around, c1iscex stopped responding altogether, and became completely unresponsive.  I walked down there and did a hard reset.  Once it rebooted, and I did a burt restore from early this morning, everything appeared to be working again.

The fact that problems were showing up before the machine crashed worries me.  I'll try to investigate more this afternoon.

  9000   Mon Aug 12 21:27:03 2013   manasa   Update   CDS   c1iscex needs help

I started to modify the c1asx model to keep the RFM model from hitting its max time.
Instead of bringing in ASS, I have modified ASX to do everything and only the clock signals to ITMX pitch and yaw are now going through RFM. RFM is still hitting 62usec and I suppose that is because of the problems with c1iscex.

c1iscex not happy

Cause and symptoms

While restarting the models, c1iscex crashed a couple of times because of some errors and had to be powercycled. The models were modified and they seem to start ok.
But it looks like there is something wrong with c1iscex since the models were started. The GPS time is off and C1:DAQ-DC0_C1X01_CRC_SUM keeps building up even for c1x01 which was left untouched.

Trial treatments

1. Since c1x01 and c1spx were not touched, c1scx and c1asx were killed and we tried to start the other models. This did not help.
2. Koji did a manual daqd restart which did not help either.

We are leaving c1iscex as is for the time being and calling Jamie for help.

P.S. While making the models, I had created IPCx_PCIE blocks in c1iscex which do not exist. I changed them to RFM and SHMEM blocks. This did not allow me to compile the model and it only spat out IPCx mismatch errors. After some struggle and an elog search, I figured out from an old elog that even though the IPCx blocks are changed in the model, the old junk still exists in the ipc file in the chans directory. I deleted all junk channels related to the ASX model. The model compiled right away.

  9002   Tue Aug 13 07:40:53 2013   Steve   Update   CDS   c1iscex needs help

 

 The Sorensen power supply output of +15V at rack 1X9 was current-limited to 10.3V @ 2A.

I increased the threshold to 2.1A and the voltage is up to 14.7V.

Attachment 1: c1iscexSick.png
  1126   Mon Nov 10 11:32:49 2008   rob   Update   Computers   c1iscex rebooted

it was running a few cycles late
  10642   Mon Oct 27 22:19:17 2014   Jenne   Update   CDS   c1iscex restarted

I'm not sure why, but c1iscex did not want to do an mxstream restart.  It would complain at me that  "* ERROR:  mx_stream is already stopping."
Koji suggested that I reboot the machine, so I did.  I turned off the ETMX watchdog, and then did a remote reboot.  Everything came back nicely, and the mx_stream process seems to be running.

  4020   Tue Dec 7 16:09:53 2010   josephb   Update   CDS   c1iscex status

I swapped out the IO chassis which could only handle 3 PCIe cards with another chassis which has space for 17, but which previously had timing issues.  A new cable going between the timing slave and the rear board seems to have fixed the timing issues. 

I'm hoping to get a replacement PCI extension board which can handle more than 3 cards this week from Rolf and then eventually put it in the Y-end rack.  I'm also still waiting for a repaired Host interface board to come in for that as well.

At this point, RFM is working to c1iscex, but I'm still debugging the binary outputs to the analog filters.  As of this time they are not working properly: turning the digital filters on and off seems to have no effect on the transfer function measured from an excitation in SUSPOS, all the way around to IN1 of the sensor inputs (but before measuring the digital filters).  Ideally I should see a difference when I switch the digital filters on and off (since the analog ones should also switch on and off), but I do not.

  7963   Wed Jan 30 13:50:27 2013   Jenne   Update   Computers   c1iscex still down

[Koji, Jenne]

We noticed that the iscex computer is still down, but the IOP is (was) running.  When we sat down to look at it, c1x01 was 'breathing', had a non-zero CPU_METER time, and the error was 0x4000, which I've never seen before.  The fb connection was still red though.  Also, it is claiming that its sync source is 1pps, not TDS like it usually is. 

Since things were different, Koji restarted the 2 other models running on iscex, with no resulting change.  We then did a 'rtcds restart all', and the IOP is no longer breathing, and the error message has changed to 0xbad.  The sync source is still 1pps.

Moral of the story:  c1iscex is still down, but temporarily showed signs of life that we wanted to record.

  7970   Thu Jan 31 10:23:39 2013   Jamie   Update   Computers   c1iscex still down

Quote:

[Koji, Jenne]

We noticed that the iscex computer is still down, but the IOP is (was) running.  When we sat down to look at it, c1x01 was 'breathing', had a non-zero CPU_METER time, and the error was 0x4000, which I've never seen before.  The fb connection was still red though.  Also, it is claiming that its sync source is 1pps, not TDS like it usually is. 

Since things were different, Koji restarted the 2 other models running on iscex, with no resulting change.  We then did a 'rtcds restart all', and the IOP is no longer breathing, and the error message has changed to 0xbad.  The sync source is still 1pps.

Moral of the story:  c1iscex is still down, but temporarily showed signs of life that we wanted to record.

There's definitely a timing issue with this machine.  I looked at it a bit yesterday.  I'll try to get to it by the end of the week.

  9433   Mon Dec 2 16:04:47 2013   Jamie   Update   CDS   c1iscex timing problem mysteriously disappears??? (thanksgiving miracle???)

Quote:

There is definitely a timing distribution malfunction at the c1iscex IO chassis.  There is no timing link between the "Master Timer Sequencer D050239" at the 1X6 and the c1iscex IO chassis.  Link lights at both ends are dead.  No timing, no running models.

It does not appear to be a problem with the Master Timer Sequencer.  I moved the c1iscey link to the J15 port on the sequencer and it worked fine.  This means it's either a problem with the fiber or the timing card in the IO chassis.  The IO timing card is powered and does have what appear to be normal status lights on (except for the fiber link lights).  It's getting what I think is the nominal 4V power.  The connection to the IO chassis backplane board looks ok.  So maybe it's just a dead fiber issue?

I do not know what could have been the problem with c1auxex, or if it's related to the fast timing issue.

I just got over here from Downs, where I managed to convince Todd to let me borrow one of their three remaining timing slave boards for c1iscex.  I walked down to the X end to replace the board only to discover that the link light on the existing timing board was back!  c1iscex was not responding, so I hard rebooted the machine, and everything came up rosy (all green!):

festatus.png

To repeat, I DID NOTHING.  The thing was working when I got here.  I have no idea when it came back, or how, but it's at least working for the moment.  I re-enabled the watchdog for ETMX SUS and it's now damped normally.

I'm going to hold on to the timing card for a couple of days, in case the failure comes back, but we'll need to return it to Downs soon, and probably think about getting some spare backups from Columbia.

  9434   Mon Dec 2 17:05:13 2013   Jenne   Update   CDS   c1iscex timing problem mysteriously disappears??? (thanksgiving miracle???)

Steve was trying to do something to it this morning, but I'm not exactly clear on what it was.  Maybe that helped?  Steve, can you tell us what you were trying to do this morning?

  9435   Tue Dec 3 07:42:23 2013   Steve   Update   CDS   c1iscex timing problem mysteriously disappears??? (thanksgiving miracle???)

Quote:

Steve was trying to do something to it this morning, but I'm not exactly clear on what it was.  Maybe that helped?  Steve, can you tell us what you were trying to do this morning?

 I was trying to repeat elog 9007. I only got to line 2 of Koji's Solution when Ottavia, where I was working, shut down. That was all I did.

  13079   Sun Jun 25 22:30:57 2017   gautam   Update   General   c1iscex timing troubles

I saw that the CDS overview screen indicated problems with c1iscex (also ETMX was erratic). I took a closer look and thought it might be a timing issue - a walk to the X-end confirmed this, the 1pps status light on the timing slave card was no longer blinking. 

I tried all versions of power cycling and debugging this problem known to me, including those suggested in this thread and from a more recent time. I am leaving things as is for the night, and will look into this more tomorrow. I've also shut down the ETMX watchdog for the time being. Looks like this has been down since 24 Jun 8am UTC.

Attachment 1: c1iscex_status.png
  13081   Mon Jun 26 22:01:08 2017   Koji   Update   General   c1iscex timing troubles

I tried a couple of things, but there was no fundamental improvement: the LED light on the timing board is still missing.

- The power supply cable to the timing board at c1iscex indicated +12.3V

- I swapped the timing fiber to the new one (orange) in the digital cabinet. It didn't help.

- I swapped the opto-electronic I/F for the timing fiber with the Y-end one. The X-end one worked at Y-end, and Y-end one didn't work at X-end.

- I suspected the timing board itself -> I brought a "spare" timing board from the digital cabinet and tried to swap the board. This didn't help.

 

Some ideas:

- Bring the X-end fiber to C1SUS or C1IOO to see if the fiber is OK or not.

- We checked the opto-electronic I/F is OK

- Try to swap the IO chassis with the Y-end one.

- If this helps, swap the timing board only to see this is the problem or not.

  13085   Wed Jun 28 20:15:46 2017   gautam   Update   General   c1iscex timing troubles

[Koji, gautam]

Here is a summary of what we did today to fix the timing issue on c1iscex. The power supply to the timing card in the X end expansion chassis was to blame.

  1. We prepared the Y-end expansion chassis for transport to the X end. To do so, we disconnected the following from the expansion chassis
    • Cables going to the ADC/DAC adaptor boards
    • Dolphin connector
    • BIO connector
    • RFM fiber
    • Timing fiber
  2. We then carried the expansion chassis to the X end electronics rack. There we repeated the above steps for the X-end expansion chassis
  3. We swapped the X and Y end expansion chassis in the X end electronics rack. Powering the unit, we immediately saw the green lights on the front of the timing card turn on, suggesting that the Y-end expansion chassis works fine at the X end as well (as it should). To further confirm that all was well, we were able to successfully start all the RT models on c1iscex without running into any timing issues.
  4. Next, we decided to verify if the spare timing card is functional. So we swapped out the timing card in the expansion chassis brought over to the X end from the Y end with the spare. In this test too, all worked as expected. So at this stage, we concluded that
    • There was nothing wrong with the fiber bringing the timing signal to the X end
    • The Y-end expansion chassis works fine
    • The spare timing card works fine.
  5. Then we decided to try the original X-end expansion chassis timing card in the Y-end expansion chassis. This test too was successful - so there was nothing wrong with any of the timing cards!
  6. Next, we decided to power the X-end timing chassis with its original timing card, which was just verified to work fine. Surprisingly, the indicator lights on the timing card did not turn on.
  7. The timing card has 3 external connections
    • A 40 pin IDE connector
    • Power
    • Fiber carrying the timing signal
  8. We went back to the Y-end expansion chassis, and checked that the indicator lights on the timing card turned on even when the 40 pin IDE connector was left unconnected (so the timing card just gets power and the timing signal).
  9. We concluded that the power supply in the X end expansion chassis was to blame. Indeed, when Koji jiggled the connector around a little, the indicator lights came on!
  10. The connection was diagnosed to be somewhat flaky - it employs the screw-in variety of terminal blocks, and one of the connections was quite loose - Koji was able to pull the cable out of the slot applying a little pressure.
  11. I replaced the cabling (swapped the wires for thicker gauge, more flexible variety), and re-tightened the terminal block screws. The connection was reasonably secure even when I applied some force. A quick test verified that the timing card was functional when the unit was powered.
  12. We then replaced the X and Y-end expansion chassis (complete with their original timing cards, so the spare is back in the CDS cabinet), in the racks. The models started up again without complaint, and the CDS overview screen is now in a good state [Attachment #1]. The arms are locked and aligned for maximum transmission now.
  13. There was some additional difficulty in getting the 40-pin IDE connector in on the Y-end expansion chassis. Looked like we had bent some of the pins on the timing board while pulling this cable out. But Koji was able to fix this with a screw driver. Care should be taken when disconnecting this cable in the future!

There were a few more flaky things in the Expansion chassis - the IDE connectors don't have "keys" that fix the orientation they should go in, and the whole timing card assembly is kind of difficult and not exactly secure. But for now, things are back to normal it seems.

Wouldn't it be nice if this fix also eliminates the mystery ETMX glitching problem? After all, seems like this flaky power supply has been a problem for a number of years. Let's keep an eye out.

Attachment 1: CDS_status_28Jun2017.png
  11778   Wed Nov 18 10:10:53 2015   ericq   Update   CDS   c1iscey IO chassis missing brackets

Steve and I inadvertently discovered that the c1iscey IO chassis doesn't have brackets to secure the cards where the ADC/DAC cables are connected, making them very easy to knock loose. All other IO chassis have these brackets. Pictures of c1iscey and c1lsc IO chassis to compare:

  11789   Thu Nov 19 15:16:24 2015   ericq   Update   CDS   c1iscey IO chassis now has brackets

[Steve, ericq]

Brackets for the c1iscey IO chassis cards have been installed. Now, I can't unseat the cards by wiggling the ADC or DAC cable.

  4773   Tue May 31 15:45:37 2011   Jamie   Update   CDS   c1iscey IOchassis powered off for some reason. repowered.

We found that both of the c1iscey models (c1x05 and c1scy) were unresponsive, and weren't coming back up even after reboot.  We then found that the c1iscey IOchassis was actually powered off.  Steve accepts some sort of responsibility, since he was monkeying around down there for some reason.  After powerup and reboot, everything is running again.

  5791   Wed Nov 2 21:49:59 2011   Katrin   Update   CDS   c1iscey computer died again

while I was not doing anything on the machine.

  6002   Thu Nov 24 15:27:15 2011   kiwamu   Update   CDS   c1iscey hardware rebooted
The c1iscey machine crashed around 1:00 AM last night and I did a hardware reboot by pressing a button on the front panel of the machine.
After the reboot it's been running okay so far.
The crash happened after I pressed the "Diag Reset" button on the CDS status screen.
  2551   Thu Jan 28 09:14:51 2010   Alberto   Configuration   Computers   c1iscey, c1iscex, c1lsc, c1asc rebooted

This morning the LSC scripts weren't running properly. I had to reboot c1iscey, c1iscex, c1lsc, and c1asc.

I burtrestored to Monday January 25 at 12:00. 

  7550   Mon Oct 15 20:45:58 2012 jamieUpdateIOOc1lsc DAC0 now connected to tip-tilt SOS DW boards

The tip-tilt SOS dewhite/AI boards are now connected to the digital system.

20121015_190340.png

I put together a chassis for one of our spare DAC -> IDC interface boards (maybe our last?).  A new SCSI cable now runs from DAC0 in the c1lsc IO chassis in 1Y3, to the DAC interface chassis in 1Y2.

Two homemade ribbon cables go directly from the IDC outputs of the interface chassis to the 66 pin connectors on the backplane of the Eurocrate.  They do not go through the cross-connects, cause cross-connects are stupid.  They go directly to the lower connectors for slots 1 and 3, which are the slots for the SOS DW/AI boards.  I had to custom make these cables, of course, and it was only slightly tricky to get the correct pins to line up.  I should probably document the cable pin outs.

  • cable 0:  IDC0 on interface chassis (DAC channels 0-7) ---> Eurocrate slot 0 (TT1/TT2)
  • cable 1:  IDC1 on interface chassis (DAC channels 8-15)---> Eurocrate slot 2 (TT3/TT4)

As reported in a previous log in this thread, I added control logic to the c1ass front-end model for the tip-tilts.  I extended it to include TT_CONTROL (model part) for TT3 and TT4 as well, so we're now using all channels of DAC0 in c1lsc for TT control.

I tested all channels by stepping through values in EPICS and reading the monitor and SMA outputs of the DW/AI boards.  The channels all line up correctly.  A full 32k count output of a DAC channel results in 10V output of the DW/AI boards.  All channels checked out, with a full +-10V swing on their output with a full +-32k count swing of the DAC outputs.
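For reference, the counts-to-volts scaling measured above can be written down as a quick sketch. This assumes the DAC -> DW/AI chain is linear and that "32k" means 32768 (2^15) counts; the function names are made up for illustration and are not part of any front-end code.

```python
# Sketch of the DAC-count to DW/AI-output-voltage scaling described above.
# Assumptions (not from the measurement itself): the chain is linear, and
# "32k counts" means 32768 = 2**15 counts.
FULL_SCALE_COUNTS = 32768   # assumed meaning of "32k"
FULL_SCALE_VOLTS = 10.0     # measured DW/AI output at full-scale counts


def counts_to_volts(counts):
    """Expected DW/AI output voltage for a given DAC count value."""
    return FULL_SCALE_VOLTS * counts / FULL_SCALE_COUNTS


def volts_to_counts(volts):
    """DAC count value (rounded) expected to give the requested output voltage."""
    return round(volts * FULL_SCALE_COUNTS / FULL_SCALE_VOLTS)


if __name__ == "__main__":
    for counts in (-32768, -16384, 0, 16384, 32768):
        print(counts, "counts ->", counts_to_volts(counts), "V")
```

Under these assumptions one count corresponds to roughly 0.3 mV at the DW/AI output, which is a handy number when stepping EPICS values by hand.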

   We're using SN 1 and 2 of the SOS DW/AI boards (seriously!)

The output channels look ok, and not too noisy.

Tomorrow I'll get new SMA cables to connect the DW/AI outputs to the coil driver boards, and I'll start testing the coil driver outputs.

As a reminder:  https://wiki-40m.ligo.caltech.edu/Suspensions/Tip_Tilts_IO

 

  7553   Tue Oct 16 00:08:26 2012 DenUpdateIOOc1lsc DAC0 now connected to tip-tilt SOS DW boards

Quote:

Tomorrow I'll get new SMA cables to connect the DW/AI outputs to the coil driver boards, and I'll start testing the coil driver outputs. 

 I found a nice 16-twisted-pair cable, ~25 m long, and decided to use it as the cable from 1Y3 to the clean room instead of buying a new long one. I added a breakout board at the coil driver end to monitor the outputs.

DSC_4748.JPG

  7561   Tue Oct 16 20:40:06 2012 DenUpdateIOOc1lsc DAC0 now connected to tip-tilt SOS DW boards

  The full cable path from the coil driver to the OSEM input is now ready. I've tested Ch1-4 of the left AI and left coil driver. The 15 pin outputs and monitors show the voltages that we expect. I've checked the voltage on the other side of the cable in the clean room, and it is correct. We are ready to test the coils. We need to bake the OSEM cables asap. Hopefully, Bob will start this job tomorrow.

DSC_4749.JPG DSC_4755.JPG

  12309   Mon Jul 18 18:44:52 2016 varunUpdateCDSc1lsc FE recovered

c1lsc FE is up and running.

Details:

2) The machine was manually rebooted.

3) c1daf was recompiled and installed, with the problematic piece of code removed.

4) NTP timing was adjusted.

5) Frame Builder was restarted.

6) All models on c1lsc machine were restarted.

Attachment 1 shows the CDS status after the recovery. I won't be trying to run frequency warping immediately; I will first finish implementing the other harmless modules.

Attachment 1: CDS_status160718.png
  12303   Thu Jul 14 23:38:59 2016 varunUpdateCDSc1lsc FE unresponsive

Today, at around 10:30, c1lsc machine froze and stopped responding to ping and ssh after I compiled and restarted c1daf. I think it is due to a large array in one of my codes. The daqd.log file shows the following:


..................................................................
CA.Client.Exception...............................................
    Warning: "Virtual circuit unresponsive"
    Context: "c1lsc.martian.113.168.192.in-addr.arpa:5064"
    Source File: ../tcpiiu.cpp line 945
    Current Time: Thu Jul 14 2016 22:27:42.102649102
..................................................................

I think the c1lsc FE may need a hard reboot.

  12304   Fri Jul 15 12:21:28 2016 varunUpdateCDSc1lsc FE unresponsive

c1lsc is up and running, Eric did a manual reboot today.

Quote:

Today, at around 10:30, c1lsc machine froze and stopped responding to ping and ssh after I compiled and restarted c1daf. I think it is due to a large array in one of my codes. The daqd.log file shows the following:


..................................................................
CA.Client.Exception...............................................
    Warning: "Virtual circuit unresponsive"
    Context: "c1lsc.martian.113.168.192.in-addr.arpa:5064"
    Source File: ../tcpiiu.cpp line 945
    Current Time: Thu Jul 14 2016 22:27:42.102649102
..................................................................

I think the c1lsc FE may need a hard reboot.

 

  5610   Tue Oct 4 07:51:36 2011 steveUpdateCDSc1lsc and c1sus are still down

 

 c1lsc and c1sus are still down. Only ETMX and ETMY are damped

  5607   Mon Oct 3 20:47:51 2011 kiwamuUpdateCDSc1lsc and c1sus didn't run

[Mirko / Jenne / Kiwamu]

Just a quick update. All the realtime processes on the c1lsc and c1sus machine didn't run at all.

Somehow the c1xxxfe.ko kernel modules (where xxx is x04, x02, lsc, ass, sus, mcs, pem, and rfm) failed to load with insmod.

The timing indicators on the c1lsc and c1sus machine are saying NO SYNC.

 

- According to log files (target/c1lsc/logs/log.txt)

insmod: error inserting '/opt/rtcds/caltech/c1/target/c1lsc/bin/c1lscfe.ko': -1 Unknown symbol in module

- dmesg on c1lsc (c1sus also dumps the same error message):

[   45.831507] DXH Adapter 0 : sci_map_segment - Failed to map segment - error=0x40000d01
[   45.833006] c1x04: DIS segment mapping status 1073745153

The DXH adapter is part of the Dolphin connections.

When a realtime code starts up, it checks the Dolphin connections.

The checking procedure is defined by dolphin.c (/src/fe/dolphin.c).

According to a printk statement in dolphin.c, the second message listed above reports status "0" if everything is fine.

The first error above is an error vector from a special dolphin's function called sci_map_segment, which is called in dolphin.c.

So something failed in this sci_map_segment function and is preventing the realtime code from waking up.

Note that sci_map_segment is defined in genif.h and genif.c which reside in /opt/srcdis/src/IRM_DSX/drv/src.
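
As a side check, the decimal status in the second dmesg line appears to be the same number as the hex error code in the first line, so the two messages are likely reporting one and the same failure. A quick way to verify the arithmetic (plain Python, nothing Dolphin-specific):

```python
# Compare the two numbers printed in the dmesg lines above.
status_decimal = 1073745153   # from "c1x04: DIS segment mapping status 1073745153"
error_hex = 0x40000d01        # from "sci_map_segment - Failed to map segment - error=0x40000d01"

print(hex(status_decimal))          # the decimal status written in hex
print(status_decimal == error_hex)  # whether both lines carry the same code
```

What the code 0x40000d01 actually means would have to be looked up in the Dolphin driver sources; this only shows that the two numbers match.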

  5608   Mon Oct 3 21:20:30 2011 JenneUpdateCDSc1lsc and c1sus didn't run

[Jenne, Mirko, Kiwamu, Koji, and Jamie by phone]

We just got off the phone with Jamie.  In addition to all the stuff that Kiwamu mentioned, Mirko reverted the c1oaf model and C-code to stuff that was working successfully on Friday (using "svn export", rev # 1134), since that's what we were working on when all hell broke loose.

We did a few rounds of "sudo shutdown -h now" on c1lsc and c1sus machines, and pulled the power cords out.  

We also switched the c1ioo and c1lsc 1PPS fibers in the fanout chassis, to see if that would fix the problem.  Nope.  c1ioo is still fine, and c1lsc is still not fine.

Still getting "No Sync".

We're going to call in Alex in the morning, if we can't figure it out soon.

  5611   Tue Oct 4 10:35:08 2011 MirkoUpdateCDSc1lsc and c1sus running again

[Alex, Mirko]

Alex fixed the computers this morning. It was in fact a dolphin problem:

Hi Jenne,  figured it out. Even though dxadmin said the Dolphin net was fine, it wasn't. Something happened to the DIS networkmanager and I had to restart it. It is running on fb:
controls@fb ~ $ ps -e | grep dis
12280 ?        00:00:00 dis_networkmgr
controls@fb ~ $ sudo /etc/init.d/dis_networkmgr restart
Once the restart was done both c1lsc and c1sus nodes were configured correctly, they printed the usual "node 12 is OK" "node 8 is OK" messages into the dmesg, and I was able to run /etc/start_fes.sh on lsc and sus to load all the FEs.  Alex

Some lights on c1lsc were still red: C1:DAQ-FBO_C1SYS_SYS and the smaller red light left of it. Restarted the fb. Didn't help. Restarted c1lsc, all green now.
Restored autoburt from Oct 3, 19:07 on c1lsc and c1sus.
  4706   Thu May 12 23:12:40 2011 kiwamuUpdateCDSc1lsc crashed

This is the third time I have crashed a real-time machine. This time I crashed c1lsc.

I physically rebooted the c1lsc machine by pushing the power button; it came back and is now running fine.

 

(what I did)

The story is almost the same as the last two times (1st time, 2nd time).

I edited the c1lsc.ini file using daqconfig and then shut down the daqd running on fb.

Some indicators for c1lsc on the C1_FE_STATUS screen became red. So I hit the 'DAQ reload' button on the C1LSC_GDS_TP screen.

Then c1lsc died and didn't respond to ping.
