40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 273 of 337  Not logged in ELOG logo
ID Date Author Type Category Subjectup
  11778   Wed Nov 18 10:10:53 2015 ericqUpdateCDSc1iscey IO chassis missing brackets

Steve and I inadvertently discovered that the c1iscey IO chassis doesn't have brackets to secure the cards where the ADC/DAC cables are connected, making them very easy to knock loose. All other IO chassis have these brackets. Pictures of c1iscey and c1lsc IO chassis to compare:

  11789   Thu Nov 19 15:16:24 2015 ericqUpdateCDSc1iscey IO chassis now has brackets

[Steve, ericq]

Brackets for the c1iscey IO chassis cards have been installed. Now, I can't unseat the cards by wiggling the ADC or DAC cable. yes

  4773   Tue May 31 15:45:37 2011 JamieUpdateCDSc1iscey IOchassis powered off for some reason. repowered.

We found that both of the c1iscey models (c1x05 and c1scy) were unresponsive, and weren't coming back up even after reboot.  We then found that the c1iscey IOchassis was actually powered off.  Steve's accepts some sort of responsibility, since he was monkeying around down there for some reason.  After powerup and reboot, everything is running again.

  5791   Wed Nov 2 21:49:59 2011 KatrinUpdateCDSc1iscey computer died again

while I was not doing anything on the machine.

  6002   Thu Nov 24 15:27:15 2011 kiwamuUpdateCDSc1iscey hardware rebooted
The c1iscey machine crashed around 1:00 AM last night and I did a hard-ware reboot by pressing a button on the front panel of the machine.
After the reboot its been running okay so far.
The crash happened after I pressed the "Diag Reset" button on the CDS status screen.
  2551   Thu Jan 28 09:14:51 2010 AlbertoConfigurationComputersc1iscey, c1iscex, c1lsc, c1asc rebooted

This morning the LSC scripts wheren't running properly. I had to reboot c1iscey, c1iscex, c1lsc, c1asc .

I burtrestored to Monday January 25 at 12:00. 

  7550   Mon Oct 15 20:45:58 2012 jamieUpdateIOOc1lsc DAC0 now connected to tip-tilt SOS DW boards

The tip-tile SOS dewhite/AI boards are now connected to the digital system.

20121015_190340.png

I put together a chassis for one of our space DAC -> IDC interface boards (maybe our last?).  A new SCSI cable now runs from DAC0 in the c1lsc IO chassis in 1Y3, to the DAC interface chassis in 1Y2.

Two homemade ribbon cables go directly from the IDC outputs of the interface chassis to the 66 pin connectors on the backplane of the Eurocrate.  They do not go through the cross-connects, cause cross-connects are stupid.  They go to directly to the lower connectors for slots 1 and 3, which are the slots for the SOS DW/AI boards.  I had to custom make these cables, or course, and it was only slightly tricky to get the correct pins to line up.  I should probably document the cable pin outs.

  • cable 0:  IDC0 on interface chassis (DAC channels 0-7) ---> Eurocrate slot 0 (TT1/TT2)
  • cable 1:  IDC1 on interface chassis (DAC channels 8-15)---> Eurocrate slot 2 (TT3/TT4)

As reported in a previous log in this thread, I added control logic to the c1ass front-end model for the tip-tilts.  I extended it to include TT_CONTROL (model part) for TT3 and TT4 as well, so we're now using all channels of DAC0 in c1lsc for TT control.

I tested all channels by stepping through values in EPICS and reading the monitor and SMA outputs of the DW/AI boards.  The channels all line up correctly.  A full 32k count output of a DAC channel results in 10V output of the DW/AI boards.  All channels checked out, with a full +-10V swing on their output with a full +-32k count swing of the DAC outputs.

   We're using SN 1 and 2 of the SOS DW/AI boards (seriously!)

The output channels look ok, and not too noisy.

Tomorrow I'll get new SMA cables to connect the DW/AI outputs to the coil driver boards, and I'll start testing the coil driver outputs.

As a reminder:  https://wiki-40m.ligo.caltech.edu/Suspensions/Tip_Tilts_IO

 

  7553   Tue Oct 16 00:08:26 2012 DenUpdateIOOc1lsc DAC0 now connected to tip-tilt SOS DW boards

Quote:

Tomorrow I'll get new SMA cables to connect the DW/AI outputs to the coil driver boards, and I'll start testing the coil driver outputs. 

 I've found a nice 16 twisted pair cable ~25m long and decided to use it as a port from 1Y3 to clean room cable instead of buying a new long one. I've added a break out board to the coil driver end to monitor outputs.

DSC_4748.JPG

  7561   Tue Oct 16 20:40:06 2012 DenUpdateIOOc1lsc DAC0 now connected to tip-tilt SOS DW boards

  Full cable path from coil driver to osem input is now ready. I've tested Ch1-4 of the left AI and left coil driver. 15 pin outputs and monitors show voltage that we expect. I've checked voltage on the other side of the cable in the clean room, it is correct. We are ready to test the coils. We need to bake osem cables asap. Hopefully, Bob will start this job tomorrow.

DSC_4749.JPG DSC_4755.JPG

  12309   Mon Jul 18 18:44:52 2016 varunUpdateCDSc1lsc FE recovered

c1lsc FE is up and running.

Details:

2) The machine was manually rebooted.

3) c1daf was recompiled and installed, with the problematic piece of code removed.

4) NTP timing was adjusted.

5) Frame Builder was restarted.

6) All models on c1lsc machine were restarted.

Attachment 1 shows the CDS status after the recovery. I wont be trying to run frequency warping immediately, I will first finish implementing the other harmless modules first.

  12303   Thu Jul 14 23:38:59 2016 varunUpdateCDSc1lsc FE unresponsive

Today, at around 10:30, c1lsc machine froze and stopped responding to ping and ssh after I compiled and restarted c1daf. I think it is due to a large array in one of my codes. The daqd.log file shows the following:


..................................................................
CA.Client.Exception...............................................
    Warning: "Virtual circuit unresponsive"
    Context: "c1lsc.martian.113.168.192.in-addr.arpa:5064"
    Source File: ../tcpiiu.cpp line 945
    Current Time: Thu Jul 14 2016 22:27:42.102649102
..................................................................

I think the c1lsc FE may need a hard reboot.

  12304   Fri Jul 15 12:21:28 2016 varunUpdateCDSc1lsc FE unresponsive

c1lsc is up and running, Eric did a manual reboot today.

Quote:

Today, at around 10:30, c1lsc machine froze and stopped responding to ping and ssh after I compiled and restarted c1daf. I think it is due to a large array in one of my codes. The daqd.log file shows the following:


..................................................................
CA.Client.Exception...............................................
    Warning: "Virtual circuit unresponsive"
    Context: "c1lsc.martian.113.168.192.in-addr.arpa:5064"
    Source File: ../tcpiiu.cpp line 945
    Current Time: Thu Jul 14 2016 22:27:42.102649102
..................................................................

I think the c1lsc FE may need a hard reboot.

 

  5610   Tue Oct 4 07:51:36 2011 steveUpdateCDSc1lsc and c1sus are still down

 

 c1lsc and c1sus are still down. Only ETMX and ETMY are damped

  5607   Mon Oct 3 20:47:51 2011 kiwamuUpdateCDSc1lsc and c1sus didn't run

[Mirko / Jenne / Kiwamu]

Just a quick update. All the realtime processes on the c1lsc and c1sus machine didn't run at all.

Somehow the c1xxxfe.ko kernel module, where xxx is x04, x02, lsc, ass, sus, mcs, pem and rfm failed to be insmod.

The timing indicators on the c1lsc and c1sus machine are saying NO SYNC.

 

- According to log files (target/c1lsc/logs/log.txt)

insmod: error inserting '/opt/rtcds/caltech/c1/target/c1lsc/bin/c1lscfe.ko': -1 Unknown symbol in module

- dmesg on c1lsc (c1sus also dumps the same error message):

[   45.831507] DXH Adapter 0 : sci_map_segment - Failed to map segment - error=0x40000d01
[   45.833006] c1x04: DIS segment mapping status 1073745153

DXH dapter is a part of the Dolphine connections.

When a realtime codes is waking up, the code checks the Dolphin connections.

The checking procedure is defined by dolphin.c (/src/fe/doplhin.c).

According to a printk sentence in dolphin.c the second error message listed above will return status "0" if everything is fine.

The first error above is an error vector from a special dolphin's function called sci_map_segment, which is called in dolphin.c.

So something failed in this sci_map_segment function and is preventing the realtime code from waking up.

Note that sci_map_segment is defined in genif.h and genif.c which reside in /opt/srcdis/src/IRM_DSX/drv/src.

  5608   Mon Oct 3 21:20:30 2011 JenneUpdateCDSc1lsc and c1sus didn't run

[Jenne, Mirko, Kiwamu, Koji, and Jamie by phone]

We just got off the phone with Jamie.  In addition to all the stuff that Kiwamu mentioned, Mirko reverted the c1oaf model and C-code to stuff that was working successfully on Friday (using "svn export, rev # 1134) since that's what we were working on when all hell broke loose.

We did a few rounds of "sudo shutdown -h now" on c1lsc and c1sus machines, and pulled the power cords out.  

We also switched the c1ioo and c1lsc 1PPS fibers in the fanout chassis, to see if that would fix the problem.  Nope.  c1ioo is still fine, and c1lsc is still not fine.

Still getting "No Sync".

We're going to call in Alex in the morning, if we can't figure it out soon.

  5611   Tue Oct 4 10:35:08 2011 MirkoUpdateCDSc1lsc and c1sus running again

[Alex, Mirko]

Alex fixed the computers this morning. It was in fact a dolphin problem:

Hi Jenne,  figured it out. Even though dxadmin said the Dolphin net was fine, it wasn't. Something happeneed to DIS networkmanager and I had to restart it. It is running on fb: 
controls@fb ~ $ ps -e | grep dis 12280 ?        00:00:00 dis_networkmgr
controls@fb ~ $ sudo /etc/init.d/dis_networkmgr restart
Once the restart was done both c1lsc and c1sus nodes were configured correctly, they printed the usual "node 12 is OK" "node 8 is OK" messages into the dmesg and was able to run /etc/start_fes.sh on lsc and sus to load all the FEs.  Alex

Some lights on c1lsc were still red: C1:DAQ-FBO_C1SYS_SYS and the smaller red light left of it. Restarted the fb. Didn't help. Restarted c1lsc, all green now.
Restored autoburt from Oct 3. 19:07 on c1lsc and c1sus.
  4706   Thu May 12 23:12:40 2011 kiwamuUpdateCDSc1lsc crashed

This is my third time to crash a real-time machine. This time I crashed c1lsc.

I physically rebooted c1lsc machine by pushing the power button and it came back and now running fine.

 

(what I did)

The story is almost the same as the last two times (1st time, 2nd time).

I edited c1lsc.ini file using daqconfig and then shutdown daqd running fb.

Some indicators for c1lsc on the C1_FE_STATUS screen became red. So I hit the 'DAQ reload' button on the C1LSC_GDS_TP screen.

Then c1lsc died and didn't respond to ping.

  13942   Mon Jun 11 18:49:06 2018 gautamUpdateCDSc1lsc dead again

Why is this happening so frequently now? Last few lines of error log:

[  575.099793] c1oaf: DAQ EPICS: Int = 199  Flt = 706 Filters = 9878 Total = 10783 Fast = 113
[  575.099793] c1oaf: DAQ EPICS: Number of Filter Module Xfers = 11 last = 98
[  575.099793] c1oaf: crc length epics = 43132
[  575.099793] c1oaf:  xfer sizes = 128 788 100988 100988 
[240629.686307] c1daf: ADC TIMEOUT 0 43039 31 43103
[240629.686307] c1cal: ADC TIMEOUT 0 43039 31 43103
[240629.686307] c1ass: ADC TIMEOUT 0 43039 31 43103
[240629.686307] c1oaf: ADC TIMEOUT 0 43039 31 43103
[240629.686307] c1lsc: ADC TIMEOUT 0 43039 31 43103
[240630.684493] c1x04: timeout 0 1000000 
[240631.684938] c1x04: timeout 1 1000000 
[240631.684938] c1x04: exiting from fe_code()

I fixed it by running the reboot script.

  1348   Tue Mar 3 10:39:07 2009 AlbertoUpdateLSCc1lsc discontinued functioning

The c1lsc has been unstable since last night. Its status on the DAQ screen was oscillating from green to red every minute.

Yesterday, I power recycled it. That brought it back but the MC got unclocked and the autolocker could not get engaged. I think it's because the power recycling also turned c1iscaux2 off which occupies the same rack crate.

Killing the autolocker on op340 e restarting didn't work. So I rebooted also c1dcuepis and burt-restored almost all snapshot files. To do that, as usual, I had to edit the snapshot files of c1dcuepics to move the quotes from the last line.

After that I restarted the autolocker that time it worked.

This morning c1lsc was again in the same unstable status as yesterday. This time I just reset it (no power recycling) and then I restarted it. It worked and now everything seems to be fine.

  8366   Thu Mar 28 10:44:30 2013 ManasaUpdateComputersc1lsc down

c1lsc was down this morning.

I restarted fb and c1lsc based on elog

Everything but c1oaf came back. I tried to restart c1oaf individually; but it didn't work.

Before:

cds_FE.png

After:

cds_fe1.png

  14133   Sun Aug 5 13:28:43 2018 gautamUpdateCDSc1lsc flaky

Since the lab-wide computer shutdown last Wednesday, all the realtime models running on c1lsc have been flaky. The error is always the same:

[58477.149254] c1cal: ADC TIMEOUT 0 10963 19 11027
[58477.149254] c1daf: ADC TIMEOUT 0 10963 19 11027
[58477.149254] c1ass: ADC TIMEOUT 0 10963 19 11027
[58477.149254] c1oaf: ADC TIMEOUT 0 10963 19 11027
[58477.149254] c1lsc: ADC TIMEOUT 0 10963 19 11027
[58478.148001] c1x04: timeout 0 1000000 
[58479.148017] c1x04: timeout 1 1000000 
[58479.148017] c1x04: exiting from fe_code()

This has happened at least 4 times since Wednesday. The reboot script makes recovery easier, but doing it once in 2 days is getting annoying, especially since we are running many things (e.g. ASS) in custom configurations which have to be reloaded each time. I wonder why the problem persists even though I've power-cycled the expansion chassis? I want to try and do some IFO characterization today so I'm going to run the reboot script again but I'll get in touch with J Hanks to see if he has any insight (I don't think there are any logfiles on the FEs anyways that I'll wipe out by doing a reboot). I wonder if this problem is connected to DuoTone? But if so, why is c1lsc the only FE with this problem? c1sus also does not have the DuoTone system set up correctly...

The last time this happened, the problem apparently fixed itself so I still don't have any insight as to what is causing the problem in the first place frown. Maybe I'll try disabling c1oaf since that's the configuration we've been running in for a few weeks.

  4015   Mon Dec 6 16:49:43 2010 josephbUpdateCDSc1lsc halfway to working

C1LSC Status:

The c1lsc computer is running Gentoo off of the fb server. It has been connected to the DAQ network and is handling mx_streams properly (so we're not flooding the network error messages like we used to with c1iscex).  It is using the old c1lsc ip address (192.168.113.62). It can ssh'd into.

However, it is not talking properly to the IO chassis.  The IO chassis turns on when the computer turns on, but the host interface board in the IO chassis only has 2 red lights on (as opposed to many green lights on the host interface boards in the c1sus, c1ioo, and c1iscex IO chassis).  The c1lsc IO processor (called c1x04) doesn't see any ADCs, DACs, or Binary cards.  The timing slave is receiving 1PPS and is locked to it, but because the chassis isn't communicating, c1x04 is running off the computer's internal clock, causing it to be several seconds off. 

Need to investigate why the computer and chassis are not talking to each other.

General Status:

The c1sus and c1ioo computers are not talking properly to the frame builder.  A reboot of c1iscex fixed the same problem earlier, however, as Kiwamu and Suresh are working in the vacuum, I'm leaving those computers alone for the moment, but a reboot and burt restore probably should be done later today for c1sus and c1ioo

 

Current CDS status:

MC damp dataviewer diaggui AWG c1ioo c1sus c1iscex RFM Dolphin RFM Sim.Plant Frame builder TDS
                       
  7577   Fri Oct 19 00:55:35 2012 JenneUpdateComputersc1lsc is down (at least all of the models)

When Evan and I were dithering the BS and ITMY (see his elog), I noticed that c1lsc was acting weird.  the IOP was the only one with the blinky heartbeat.  The IOP was all green lights, but all the other models had red for the fb connection, as well as the rightmost indicator (I don't know what that one is for).  I logged on to c1lsc and ran 'rtcds restart all'.  The script didn't get anywhere beyond saying it was beginning to stop the 1st model (sup, the bottom one on the lsc list).  Then all of the cpus went white.  I can still ping c1lsc, but I can't ssh to it.

I'm not sure what to do here Jamie.  Heelp. 

  8367   Thu Mar 28 12:50:52 2013 JenneUpdateComputersc1lsc is fine

 Manasa told me that she did things in a different order than her old elog. 

She had

(1) ssh'ed to c1lsc and did a remote shutdown / restart,

(2) restarted fb,

(3) restarted the mxstream on c1lsc,

(4) restarted each model individually in some order that I forgot to ask.

However, with the situation as in her "before" screenshot, all that needed to be done was restart the mxstream process on c1lsc. 

Anyhow, when I looked at the OAF model, it was complaining of "no sync", so I restarted the model, and it came back up fine.  All is well again.

  7580   Fri Oct 19 12:45:12 2012 DenUpdateCDSc1lsc is up after reboot
  6243   Fri Feb 3 10:48:24 2012 DenUpdateComputersc1lsc kernel

This morning I killed again c1lsc kernel with the new realization of fxlms algorithm. It works fine with gcc compiler during the tests. However, smth forbidden for the kernel is going on. I'll spend some more time on investigatin it. Interesting thing is that I did not even pressed "On" at the OAF MEDM screen to make the code running. c1lsc suspended even before. May be there is some function-name mismatch.

After c1lsc suspention I recomiled back non-working code and rebooted c1lsc. c1sus is also bad after c1lsc reboot as they communicate. I killed x04, lsc, ass, oaf models on the c1lsc computer and sus, mcs, rfm, pem on the c1sus computer. Then I restarted x02 model and restored its burt snapshot from 08:07. After I started all models back and restored their burt snapshots from 08:07. Then I diag reset all started models.

Before starting new fxlms code I've shutted down all the optics so that possible c1lsc suspention would not make them crazy. After reboot I turned the coils back. Everything seems to work fine.

  6249   Fri Feb 3 17:29:28 2012 DenUpdateComputersc1lsc kernel

The reason I've killed the c1lsc kernel was the following - when the code starts to run, it initializes some parameters and this takes ~0.2 msec per dof. Now, the old code did nothing with a DOF if C1:OAF-ADAPT_???_ONOFF == OFF. My code still initialized the parameters but then does nothing because no witness channels are given. But it spends 8*0.2 = 1.6 msec for initializing all 8 dof. As the code is called with frequency 2k, this was the reason for crashing. Now I've corrected my code, it compiles, runs and does not kill c1lsc. However, the old code would also kill the kernel if all DOF are filtered. So, when we'll use all 8 DOF, we'll have to split variable initialization.

But this is not the biggest problem. C1OAF model must be corrected, because, as for now, all 8 DOF call the same ADAPT_XFCODE function. As this function uses static variables, they will be all messed up by different DOF signals.

  4749   Thu May 19 16:46:20 2011 kiwamuUpdateLSCc1lsc model : input channels rearanged

According to Suresh's LSC rack design I rearranged the input channels of the c1lsc model such that the analog signals and the ADC channels are nicely matched.

Also I updated the c1lsc model in the svn with a help from Joe. The picture below is a screen shot of the input channels in the model file after I edited it.

c1lsc.png

  14146   Wed Aug 8 23:03:42 2018 gautamUpdateCDSc1lsc model started

As part of this slow but systematic debugging, I am turning on the c1lsc model overnight to see if the model crashes return.

  11809   Wed Nov 25 14:46:53 2015 gautamUpdateCDSc1lsc models restarted

I noticed that all the models running on C1LSC had crashed when I came in earlier today. I restarted all of them by ssh-ing into C1LSC and running rtcds restart all. The models seem to be running fine now.

  8335   Mon Mar 25 11:42:45 2013 JamieUpdateComputersc1lsc mx_stream ok

I'm not exactly sure what the problem was here, but I think it had to do with a stuck mx_stream process that wasn't being killed properly.  I manually killed the process and it seemed to come up fine after that.  The regular restart mechanisms should work now.

No idea what caused the process to hang in the first place, although I know the newer RCG (2.6) is supposed to address some of these mx_stream issues.

  8334   Mon Mar 25 09:52:22 2013 JenneUpdateComputersc1lsc mxstream won't restart

Most of the front ends' mx streams weren't running, so I did the old mxstreamrestart on all machines (see elog 6574....the dmesg on c1lsc right now, at the top, has similar messages).  Usually this mxstream restart works flawlessly, but today c1lsc isn't working.  Usually to the right side of the terminal window I get an [ok] when things work.  For the lsc machine today, I get [!!] instead. 

After having learned from recent lessons, I'm waiting to hear from Jamie.

  1235   Fri Jan 16 18:33:54 2009 YoichiSummaryComputersc1lsc rebooted to fix 16Hz glitches
Kakeru, Yoichi

There were 16Hz harmonics in the PD3 and PD4 channels even when there is no light falling on it.
Actually, even when the connection to the ADC was removed, the 16Hz noise was still there.

Rob suggested that this might be digital problem, because data is sent to the daq computer very 1/16 of a second.

We restarted c1lsc and the problem went away.
  4124   Fri Jan 7 12:01:39 2011 OsamuUpdateComputersc1lsc running

I got a new adapter board for expansion chassis from CDS and exchanged the existing adapter board which was laid on the floor around ETMX to new one.

Then I connected the chassis to c1lcs,  c1lsc seems to be running now. I will return the old board to CDS since Rolf says he wants to return it to manufacture.

 I found an interface box from ADC to D-SUB37pin and a cable to connect them.

I needed to make cables to connect the interface box to existing LSC whitening filters that has a 37 pin female D-SUB connector on one end and a 40pin female flat connector on the other end. We should use shielded cables for them, but unfortunately CDS did not have right one. Temporarily I made one cable for 1-8ch using a ribbon twist cable like Joe did.

I found a saturation at ch5 of ADC0 on c1lsc. I did not check carefully but it seemed to come from the LSC whitening board. Input of ch5 of the whitening board was not terminated and had a huge output voltage, but also ch6 was not terminated and had no big output. I guess something wrong on the LSC whitening board. Needs to be checked. Anyway I unplugged the small ribbon cable between the whitening board and the next LSC AA filter board.

Finally I realized that fiber connection of RFM did not exist. What I saw was the fiber cable of Dolphin. We need a RFM PCIe interface board, and a long fiber cable between c1lsc and RFM hub.

  6716   Wed May 30 18:08:40 2012 JamieUpdateLSCc1lsc: add error point pick-offs, moved ctrl pick-offs after feedforward

I made some modifications to the c1lsc model in order to extract both the error and control signals.

I added pick-offs for the error signals right before IFO DOF filter modules.  These are then sent with GOTOs to outputs.

I also modified things on the control side.  The OAF stuff was picking off control signals before feedforward in/outs.  After discussing with Jenne we decided that it would make sense for the OAF to be looking at the control signals after feedforward.  It also makes sense to define the control signal after the feedforward.  These control signals are then sent with GOTOs to another set of outputs.

Finally, I moved the triggers to after the control signal pickoffs, and right before the output matrix.  The final chain looks like (see attachment):

input matrix --> power norm --> ERR pickoff --> DOF filters --> FF out --> FF in --> CTRL pickoff --> trigger --> output matrix

The error pickoff outputs in the top level of the model are left terminated for the moment.  Eventually I will be hooking these into the new c1cal calibration model.

The model was recompiled, installed, and restarted.  Everything came up fine.

  6734   Thu May 31 22:13:08 2012 JamieUpdateCDSc1lsc: added remaining SHMEM senders for ERR and CTRL, c1oaf model updated appropriately

All the ERR and CTRL outputs in c1lsc now go to SHMEM senders.  I renamed the the CTRL output SHMEM senders to be more generic, since they aren't specifically for OAF anymore.  See attached image from c1lsc.

c1oaf was updated so that SHMEM receivers pointed to the newly renamed senders.

c1lsc and c1oaf were rebuilt, installed, and restarted and are now running.

  11280   Mon May 11 13:21:25 2015 manasaUpdateCDSc1lsp and c1sup not running

I found the c1lsp and c1sup models not running anymore on c1lsc (white blocks for status lights on medm).

To fix this, I ssh'd into c1lsc. c1lsc status did not show c1lsp and c1sup models running on it.

I tried the usual rtcds restart <model name> for both and that returned error "Cannot start/stop model 'c1XXX' on host c1lsc".

I also tried rtcds restart all on c1lsc, but that has NOT brought back the models alive.

Does anyone know how I can fix this??

c1sup runs some the suspension controls. So I am afraid that the drift and frequent unlocking of the arms we see might be related to this.

 

P.S. We might also want to add the FE status channels to the summary pages.

  11282   Mon May 11 14:08:19 2015 manasaUpdateCDSc1lsp and c1sup removed?

I just found out that c1lsp and c1sup models no more exist on the FE status medm screens. I am assuming some changes were done to the models as well.

Earlier today, I was looking at some of the old medm screens running on Donatella that did not reflect this modification. 

Did I miss any elogs about this or was this change not elogged??

Quote:

I found the c1lsp and c1sup models not running anymore on c1lsc (white blocks for status lights on medm).

To fix this, I ssh'd into c1lsc. c1lsc status did not show c1lsp and c1sup models running on it.

I tried the usual rtcds restart <model name> for both and that returned error "Cannot start/stop model 'c1XXX' on host c1lsc".

I also tried rtcds restart all on c1lsc, but that has NOT brought back the models alive.

Does anyone know how I can fix this??

c1sup runs some the suspension controls. So I am afraid that the drift and frequent unlocking of the arms we see might be related to this.

 

P.S. We might also want to add the FE status channels to the summary pages.

 

  11285   Tue May 12 08:51:08 2015 ericqUpdateCDSc1lsp and c1sup removed?
Quote:

was this change not elogged??

This is my sin.

Back in Febuary (around the 25th) I modified c1sus.mdl, removing the simulated plant connections we weren't using from c1lsp and c1sup. This was included in the model's svn log, but not elogged. blush

The models don't start with the rtcds restart shortcut, because I removed them from the c1lsc line in FB:/diskless/root/etc/rtsystab (or c1lsc:/etc/rtsystab). There is a commented out line in there that can be uncommented to restore them to the list of models c1lsc is allowed to run. 

However, I wouldn't suspect that the models not running should affect the suspension drift, since the connections from them to c1sus have been removed. If we still have trends from early February, we could look and see if the drift was happening before I made this change. 

  13639   Fri Feb 16 22:15:30 2018 gautamUpdateGeneralc1mcs model restarted

c1mcs had died for some reason. Looking at dmesg, I see:

[769312.996875] c1mcsepics[1140]: segfault at 7f5000000012 ip 00007f50ea8ded8f sp 00007f50e9f53a10 error 4 in libc-2.19.so[7f50ea865000+1a1000]

None of the other EPICS processes died. Not sure what to make of this. I was at the PSL table working, and had closed the PSL shutter to avoid MC autolocker trying to keep the MC locked while I was mucking about, but this shouldn't have had any effect on an EPICS process?

Anyway, I just logged into c1sus, stopped and restarted the model. IMC locks fine now.

  6570   Wed Apr 25 21:24:10 2012 DenUpdateComputer Scripts / Programsc1oaf

C1OAF model, codes and medm screens are updated. All proper files are commited to svn and updated at the new model path.

  14927   Wed Oct 2 23:23:02 2019 gautamUpdateCDSc1oaf DC indicator needs to be green

Today, I found out that this type of "0x2bad" DC error is connected to the 1e+20 cts output. The solution was to bite the bullet and stop/start the c1oaf model (at the risk of crashing the vertex FEs). Today, I was lucky and the model came back online with all CDS indicators green. At which point I was able to engage length feedforward to MC2 (with some admittedly old filter). Some subtraction is happening, see Attachment #1. This was just meant to test whether the signal routing is happening - the feedforward signal goes to the "ALTPOS" input of the suspension CDS block, which AFAIK does not have a corresponding MEDM EPICS indicator. So I couldn't figure out whether the feedforward control signal was in fact making it to the suspension. On the evidence of the suppression of MCL in the 1-3 Hz band, I would conclude that it is. Useful to be able to engage these FF filters for better lockability.

Quote:

Attachment #1 - the vertex seismometer input produces 1e+20 cts at the output of the feedforward filter. Attachment #2 shows the shape of the feedforward filters - doesn't explain the saturation. Since this is a feedforward loop, a runaway loop can't be the explanation either.

  5426   Thu Sep 15 21:56:01 2011 MirkoUpdateCDSc1oaf check, possible shmem problem

After Jamie installed the c1oaf model ( entry 5424 ) I went and checked the intermodel communication.

Remember the config is:

c1lsc ->SHMEM-> c1oaf
c1oaf ->SHMEM-> c1lsc
c1pem ->SHMEM-> c1rfm ->PCIE-> c1oaf

I checked at least one of every communications type.

-All signals reach their destinations.
-c1lsc_to_c1oaf_via_shmem is more noisy adding noise to the signal. lsc runs at 16kHz and oaf at 2kHz but that should actually smooth things out.

c1lsc_to_c1oaf_via_shmem.png

 

  15078   Thu Dec 5 15:09:50 2019 gautamUpdateCDSc1oaf crashed c1lsc

I tried starting the c1oaf model, but got a DQ error (I want the option of running feedforward during locking even if the filters aren't particularly well tuned yet). Note that this isn't "just a warning light" - some channels are initialized to +/- 1e20, so if you try turning some filters on, you will deliver a massive kick to the optics. Restarting it crashed c1lsc (this is not unexpected behavior - the only way to clear the DQ error is to restart the model, and empirically, the success rate is ~50%). The reboot script brought everything back online smoothly, and the second, time, c1oaf started without any issues.

While looking at the CDS overview screen, I noticed that the c1scy model was reporting frequent RFM errors for the C1:SCY-RFM_ETMY_LSC channel (but none of the others). On the sender model (c1rfm), no errors were being reported. The diag reset button / mxstream restart didn't really work either. See Attachment #1. Just restarting the c1scy model didn't fix the error - I had to reboot the machine and restart the models, and now no errors are being reported.

Attachment #2 shows the current nominal CDS status - the red light on c1lsc is due to some missing c1dnn channels (I'll remove these at the next c1lsc model change because I don't want to un-necessarily reboot the vertex FEs), and the c1omc model is obsolete I guess. c1daf isn't running right now but once I get the new fiber (ordered), I'm gonna restart this model as well.

P.S. The ALS temperature sliders are not SDF-ed. So when the model was restarted, I had to change the sliders back to their old values to get the beat back in the usable range.

  6944   Mon Jul 9 11:27:27 2012 JenneUpdateComputersc1oaf has been down for several days - BURT restore wasn't done correctly on startup

The c1oaf model hasn't been running for a few days (since the leap second problems we were having last week).  I had looked into it, but finally figured it out (with Jamie's help) today. 

The BURT restore has to be given to the model during startup, but for whatever reason it wasn't BURT restoring until *after* the model had already failed to start.  The symptoms were:  no 'heartbeat' for the oaf model, no connection to the fb, NO SYNC on the GDS screen, 0x4000.  the BURT restore button was green, which threw me off the scent, but that's just because it did, in fact, get set, just way too late.

I ended up looking in the dmesg of the lsc computer, and the last set of stuff was several lines of "[3354303.626446] c1oaf: Epics burt restore is 0".  Nothing else was written after that.  Jamie pointed out that this meant the BURT restore wasn't getting sent before the model unloaded itself and decided not to run.

The solution:  restart the model, and manually click the BURT restore button as soon as you're able (after everything comes back from being white).  We used to have to do this, but then there was a "fix", which apparently isn't super robust and failed for the oaf (even though it used to work just fine).  Bugzilla report submitted.

  9911   Mon May 5 19:51:56 2014 jamieUpdateCDSc1oaf model broken because of broken BLRMS block

I finally tracked down the problem with the c1oaf model to the BLRMS part:

/opt/rtcds/userapps/release/cds/common/models/BLRMS.mdl

blrms-hot-mess.pngsddefault.jpg

Note that this is pulling from a cds/common location, so presumably this is a part that's also being used at the sites.

Either there was an svn up that pulled in something new and broken, or the local version is broken, or who knows what.

We'll have to figure how what's going on here, but in the mean time, as I already mentioned, I'm leaving the c1oaf model off for now.

 RXA: also...we updated Ottavia to Ubuntu 12 LTS...but now it has no working network connection. Needs help.  (which of course has nothing whatsoever to do with this point )

  14922   Wed Oct 2 10:40:07 2019 gautamUpdateCDSc1oaf model restarted

This morning, I restarted the c1oaf model on the c1lsc machine, so as to have the option of enabling some feedforward action. Unsurprisingly, the "DC" indicator is red, citing a "0x2bad". In the past, I've been able to correct this by simply restarting the model. But given the fragility of the c1lsc machine, I think I'll live with not having the OAF model signals in frames. Medium-term, I'd like to pare down the c1oaf model a bit - I think it has way too many options/matrices right now, and is an un-necessarily bloated and heavy model. Unless there are serious objections, I will do this work when I next feel like it.

  14522   Mon Apr 8 11:53:17 2019 gautamUpdateCDSc1oaf needs debugging

I tried restarting c1oaf this weekend to see if turning on the MC length FF would affect the ALS noise performance. I burtrestored the filter settings from March 2016. However, I noticed several possible anomalies, which need debugging. I am not turning the model off because of the possibility of having to reboot all the vertex FEs, but this model is totally unusable right now.

  1. Attachment #1 - the vertex seismometer input produces 1e+20 cts at the output of the feedforward filter. Attachment #2 shows the shape of the feedforward filters - doesn't explain the saturation. Since this is a feedforward loop, a runaway loop can't be the explanation either.
  2. The MC length feedforward control signal is supposed to only go to MC2 - but MC1 and MC3 coil outputs were saturated when I enabled the feedforward.
  7287   Mon Aug 27 17:14:00 2012 jamieUpdateCDSc1oaf problem

Quote:

I came in to the lab in the evening and found c1lsc had "red" for FB connection.
I restarted c1lsc models and it kept hung the machine everytime.

I decided to kill all of the model during the startup sequence right after the reboot.
Then run only c1x04 and c1lsc. It seems that c1oaf was the cause, but it wasn't clear.

The "red for FB connection" issue was probably a dead mx_stream on c1lsc.  That can usually be fixed by just restarting mx_stream.

There is definitely a problem with c1oaf, though.  It crashes immediately after attempting to start.  kernel log for a crash included below.

We will leave c1oaf off until we have time to debug.

[83752.505720] c1oaf: Send Computer Number  = 0
[83752.505720] c1oaf: entering the loop
[83752.505720] c1oaf: waiting to sync 19520
[83753.207372] c1oaf: Synched 701492
[83753.207372] general protection fault: 0000 [#2] SMP 
[83753.207372] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:2e:01.0/class
[83753.207372] CPU 4 
[83753.207372] Modules linked in: c1oaf c1ass c1sup c1lsp c1cal c1lsc c1x04 open_mx dis_irm dis_dx dis_kosif mbuf [last unloaded: c1oaf]
[83753.207372] 
[83753.207372] Pid: 0, comm: swapper Tainted: G      D    2.6.34.1 #5 X7DWU/X7DWU
[83753.207372] RIP: 0010:[<ffffffffa1bf7567>]  [<ffffffffa1bf7567>] T.2870+0x27/0xbf0 [c1oaf]
[83753.207372] RSP: 0000:ffff88023ecc1aa8  EFLAGS: 00010092
[83753.207372] RAX: ffff88023ecc1af8 RBX: ffff88023ecc1ae8 RCX: ffffffffa1c35e48
[83753.207372] RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffffffffa1c21360
[83753.207372] RBP: ffff88023ecc1bb8 R08: 0000000000000000 R09: 0000000000175f60
[83753.207372] R10: 0000000000000000 R11: ffffffffa1c2a640 R12: ffff88023ecc1b38
[83753.207372] R13: ffffffffa1c2a640 R14: 0000000000007fff R15: 0000000000000000
[83753.207372] FS:  0000000000000000(0000) GS:ffff880001f00000(0000) knlGS:0000000000000000
[83753.207372] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[83753.207372] CR2: 000000000378a040 CR3: 0000000001a09000 CR4: 00000000000406e0
[83753.207372] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[83753.207372] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[83753.207372] Process swapper (pid: 0, threadinfo ffff88023ecc0000, task ffff88023ec7eae0)
[83753.207372] Stack:
[83753.207372]  ffff88023ecc1ab8 0000000000000096 0000000000000019 ffff88023ecc1b18
[83753.207372] <0> 0000000000014729 0000000000032a0c ffff880001e12d90 000000000000000a
[83753.207372] <0> ffff88023ecc1bb8 ffffffffa1c06cad ffff88023ecc1be8 000000000000000f
[83753.207372] Call Trace:
[83753.207372]  [<ffffffffa1c06cad>] ? filterModuleD+0xd6d/0xe40 [c1oaf]
[83753.207372]  [<ffffffffa1c07ae3>] feCode+0xd63/0x129b0 [c1oaf]
[83753.207372]  [<ffffffffa1c00dc6>] ? T.2888+0x1966/0x1f10 [c1oaf]
[83753.207372]  [<ffffffffa1c1b3bf>] fe_start+0x1c8f/0x3060 [c1oaf]
[83753.207372]  [<ffffffff8102ce57>] ? select_task_rq_fair+0x2c8/0x821
[83753.207372]  [<ffffffff8104cd8b>] ? enqueue_hrtimer+0x65/0x72
[83753.207372]  [<ffffffff8104d8f6>] ? __hrtimer_start_range_ns+0x2d6/0x2e8
[83753.207372]  [<ffffffff8104d91b>] ? hrtimer_start+0x13/0x15
[83753.207372]  [<ffffffff810173df>] play_dead_common+0x6e/0x70
[83753.207372]  [<ffffffff810173ea>] native_play_dead+0x9/0x20
[83753.207372]  [<ffffffff81001c38>] cpu_idle+0x46/0x8d
[83753.207372]  [<ffffffff814ec523>] start_secondary+0x192/0x196
[83753.207372] Code: 1f 44 00 00 55 66 0f 57 c0 48 89 e5 41 57 41 56 41 55 41 54 53 48 8d 9d 30 ff ff ff 48 8d 43 10 4c 8d 63 50 48 81 ec e8 00 00 00 <66> 0f 29 85 30 ff ff ff 48 89 85 18 ff ff ff 31 c0 48 8d 53 78 
[83753.207372] RIP  [<ffffffffa1bf7567>] T.2870+0x27/0xbf0 [c1oaf]
[83753.207372]  RSP <ffff88023ecc1aa8>
[83753.207372] ---[ end trace df3ef089d7e64971 ]---
[83753.207372] Kernel panic - not syncing: Attempted to kill the idle task!
[83753.207372] Pid: 0, comm: swapper Tainted: G      D    2.6.34.1 #5
[83753.207372] Call Trace:
[83753.207372]  [<ffffffff814ef6f4>] panic+0x73/0xe8
[83753.207372]  [<ffffffff81063c19>] ? crash_kexec+0xef/0xf9
[83753.207372]  [<ffffffff8103a386>] do_exit+0x6d/0x712
[83753.207372]  [<ffffffff81037311>] ? spin_unlock_irqrestore+0x9/0xb
[83753.207372]  [<ffffffff81037f1b>] ? kmsg_dump+0x115/0x12f
[83753.207372]  [<ffffffff81006583>] oops_end+0xb1/0xb9
[83753.207372]  [<ffffffff8100674e>] die+0x55/0x5e
[83753.207372]  [<ffffffff81004496>] do_general_protection+0x12a/0x132
[83753.207372]  [<ffffffff814f17af>] general_protection+0x1f/0x30
[83753.207372]  [<ffffffffa1bf7567>] ? T.2870+0x27/0xbf0 [c1oaf]
[83753.207372]  [<ffffffffa1c06cad>] ? filterModuleD+0xd6d/0xe40 [c1oaf]
[83753.207372]  [<ffffffffa1c07ae3>] feCode+0xd63/0x129b0 [c1oaf]
[83753.207372]  [<ffffffffa1c00dc6>] ? T.2888+0x1966/0x1f10 [c1oaf]
[83753.207372]  [<ffffffffa1c1b3bf>] fe_start+0x1c8f/0x3060 [c1oaf]
[83753.207372]  [<ffffffff8102ce57>] ? select_task_rq_fair+0x2c8/0x821
[83753.207372]  [<ffffffff8104cd8b>] ? enqueue_hrtimer+0x65/0x72
[83753.207372]  [<ffffffff8104d8f6>] ? __hrtimer_start_range_ns+0x2d6/0x2e8
[83753.207372]  [<ffffffff8104d91b>] ? hrtimer_start+0x13/0x15
[83753.207372]  [<ffffffff810173df>] play_dead_common+0x6e/0x70
[83753.207372]  [<ffffffff810173ea>] native_play_dead+0x9/0x20
[83753.207372]  [<ffffffff81001c38>] cpu_idle+0x46/0x8d
[83753.207372]  [<ffffffff814ec523>] start_secondary+0x192/0x196

  11551   Tue Sep 1 02:44:44 2015 KojiSummaryCDSc1oaf, c1mcs modified for the IMC angular FF

[Koji, Ignacio]

In order to allow us to work on the IMC angular FF, we made the signal paths from PEM to MC SUSs.
In fact, there already were the paths from c1pem to c1oaf. So, the new paths were made from c1oaf to c1mcs. (Attachment 1~3)

After some debugging those two models started running. The additional cost of the processing time is insignificant.
FB was restarted to accomodate the change.

Once the modification of the models was completed, the OAF screens were modified. It seemed that the Kissel button
for the output matrix haven't been updated for the PRM ASC implementation. This was fixed as the button was updated this time.
In addition, the button for the FM matrix was also made and pasted.

 

ELOG V3.1.3-