  40m Log, Page 277 of 341
ID   Date   Author   Type   Category   Subject
  4015   Mon Dec 6 16:49:43 2010   josephb   Update   CDS   c1lsc halfway to working

C1LSC Status:

The c1lsc computer is running Gentoo off of the fb server. It has been connected to the DAQ network and is handling mx_streams properly (so we're not flooding the network with error messages like we used to with c1iscex).  It is using the old c1lsc IP address (192.168.113.62). It can be ssh'd into.

However, it is not talking properly to the IO chassis.  The IO chassis turns on when the computer turns on, but the host interface board in the IO chassis only has 2 red lights on (as opposed to many green lights on the host interface boards in the c1sus, c1ioo, and c1iscex IO chassis).  The c1lsc IO processor (called c1x04) doesn't see any ADCs, DACs, or Binary cards.  The timing slave is receiving 1PPS and is locked to it, but because the chassis isn't communicating, c1x04 is running off the computer's internal clock, causing it to be several seconds off. 

Need to investigate why the computer and chassis are not talking to each other.

General Status:

The c1sus and c1ioo computers are not talking properly to the frame builder.  A reboot of c1iscex fixed the same problem earlier. However, as Kiwamu and Suresh are working in the vacuum, I'm leaving those computers alone for the moment. A reboot and burt restore should probably be done later today for c1sus and c1ioo.

 

Current CDS status:

MC damp dataviewer diaggui AWG c1ioo c1sus c1iscex RFM Dolphin RFM Sim.Plant Frame builder TDS
                       
  7577   Fri Oct 19 00:55:35 2012   Jenne   Update   Computers   c1lsc is down (at least all of the models)

When Evan and I were dithering the BS and ITMY (see his elog), I noticed that c1lsc was acting weird.  The IOP was the only one with the blinky heartbeat.  The IOP was all green lights, but all the other models had red for the fb connection, as well as the rightmost indicator (I don't know what that one is for).  I logged on to c1lsc and ran 'rtcds restart all'.  The script didn't get anywhere beyond saying it was beginning to stop the 1st model (sup, the bottom one on the lsc list).  Then all of the cpus went white.  I can still ping c1lsc, but I can't ssh to it.

I'm not sure what to do here Jamie.  Heelp. 

  8367   Thu Mar 28 12:50:52 2013   Jenne   Update   Computers   c1lsc is fine

 Manasa told me that she did things in a different order than her old elog. 

She had

(1) ssh'ed to c1lsc and did a remote shutdown / restart,

(2) restarted fb,

(3) restarted the mxstream on c1lsc,

(4) restarted each model individually in some order that I forgot to ask.

However, with the situation as in her "before" screenshot, all that needed to be done was restart the mxstream process on c1lsc. 

Anyhow, when I looked at the OAF model, it was complaining of "no sync", so I restarted the model, and it came back up fine.  All is well again.

  7580   Fri Oct 19 12:45:12 2012   Den   Update   CDS   c1lsc is up after reboot
  6243   Fri Feb 3 10:48:24 2012   Den   Update   Computers   c1lsc kernel

This morning I again killed the c1lsc kernel with the new implementation of the fxlms algorithm. It works fine when compiled with gcc during the tests. However, something forbidden for the kernel is going on. I'll spend some more time investigating it. The interesting thing is that I did not even press "On" on the OAF MEDM screen to make the code run; c1lsc was suspended even before that. Maybe there is some function-name mismatch.

After the c1lsc suspension I recompiled back to the non-working code and rebooted c1lsc. c1sus was also bad after the c1lsc reboot, as they communicate. I killed the x04, lsc, ass, and oaf models on the c1lsc computer and the sus, mcs, rfm, and pem models on the c1sus computer. Then I restarted the x02 model and restored its burt snapshot from 08:07. After that I started all the models again and restored their burt snapshots from 08:07. Then I did a diag reset on all started models.

Before starting the new fxlms code I shut down all the optics so that a possible c1lsc suspension would not make them go crazy. After the reboot I turned the coils back on. Everything seems to work fine.

  6249   Fri Feb 3 17:29:28 2012   Den   Update   Computers   c1lsc kernel

The reason I killed the c1lsc kernel was the following: when the code starts to run, it initializes some parameters, and this takes ~0.2 msec per DOF. The old code did nothing with a DOF if C1:OAF-ADAPT_???_ONOFF == OFF. My code still initialized the parameters but then did nothing, because no witness channels were given. So it spent 8*0.2 = 1.6 msec initializing all 8 DOF. As the code is called at 2 kHz (one cycle every 0.5 msec), this was the reason for the crash. I have now corrected my code; it compiles, runs and does not kill c1lsc. However, the old code would also kill the kernel if all DOF were filtered. So when we use all 8 DOF, we'll have to split up the variable initialization.
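
The timing argument in this entry can be checked with a little arithmetic. The following is a minimal sketch, assuming the 2 kHz call rate gives a 0.5 msec budget per cycle and that initialization really costs the quoted ~0.2 msec per DOF. The function names are illustrative and are not part of the OAF code.

```python
# Rough cycle-budget check for the OAF parameter initialization.
# Numbers come from the entry; function names are hypothetical.
MODEL_RATE_HZ = 2000      # code is called at 2 kHz
INIT_MS_PER_DOF = 0.2     # quoted ~0.2 msec per DOF


def cycle_budget_ms(rate_hz):
    """Time available in one model cycle, in milliseconds."""
    return 1000.0 / rate_hz


def init_time_ms(n_dof, ms_per_dof=INIT_MS_PER_DOF):
    """Total initialization time if n_dof DOFs are initialized in one cycle."""
    return n_dof * ms_per_dof


def max_dof_per_cycle(rate_hz, ms_per_dof=INIT_MS_PER_DOF):
    """Largest number of DOFs whose initialization fits in one cycle.

    This ignores everything else the model computes in that cycle,
    so the real limit would be lower.
    """
    # Small epsilon guards against floating-point rounding at exact multiples.
    return int(cycle_budget_ms(rate_hz) / ms_per_dof + 1e-9)


if __name__ == "__main__":
    print("budget per cycle [ms]:", cycle_budget_ms(MODEL_RATE_HZ))
    print("init time for 8 DOF [ms]:", init_time_ms(8))
    print("DOFs that fit in one cycle:", max_dof_per_cycle(MODEL_RATE_HZ))
```

Under these assumptions at most two DOFs can be initialized in a single cycle, which is why initializing all 8 at once overruns, and why spreading the initialization over several cycles would be needed.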

But this is not the biggest problem. The C1OAF model must be corrected because, as of now, all 8 DOF call the same ADAPT_XFCODE function. As this function uses static variables, they will all be messed up by the different DOF signals.
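
To illustrate why shared static variables are a problem when several DOFs call one function, here is a minimal Python analogue. It is not the real ADAPT_XFCODE: the "filter" is just a sample-to-sample difference, and all names are made up for the illustration. A module-level dictionary plays the role of a C `static` variable.

```python
# Python analogue of the shared-static-variable problem (not the real ADAPT_XFCODE).

_shared_state = {"prev": 0.0}   # stands in for a C `static` variable


def adapt_shared(sample):
    """One function, one hidden state: every DOF overwrites the same memory."""
    out = sample - _shared_state["prev"]
    _shared_state["prev"] = sample
    return out


class AdaptPerDof:
    """Each DOF owns its own state, so signals from different DOFs cannot mix."""

    def __init__(self):
        self.prev = 0.0

    def step(self, sample):
        out = sample - self.prev
        self.prev = sample
        return out


# Two DOFs with constant inputs (1.0 and 100.0). After the first call the
# true sample-to-sample difference of each signal is 0.
dof_a, dof_b = AdaptPerDof(), AdaptPerDof()
per_dof = [(dof_a.step(1.0), dof_b.step(100.0)) for _ in range(2)]
shared = [(adapt_shared(1.0), adapt_shared(100.0)) for _ in range(2)]

if __name__ == "__main__":
    print("per-DOF state:", per_dof)   # second pair is (0.0, 0.0)
    print("shared state: ", shared)    # second pair is (-99.0, 99.0)
```

With per-DOF state the second outputs settle to zero as expected. With shared state each DOF sees the other's last sample, so the outputs never settle.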

  4749   Thu May 19 16:46:20 2011   kiwamu   Update   LSC   c1lsc model : input channels rearranged

According to Suresh's LSC rack design, I rearranged the input channels of the c1lsc model such that the analog signals and the ADC channels are nicely matched.

I also updated the c1lsc model in the svn with help from Joe. The picture below is a screenshot of the input channels in the model file after I edited it.

c1lsc.png

  14146   Wed Aug 8 23:03:42 2018   gautam   Update   CDS   c1lsc model started

As part of this slow but systematic debugging, I am turning on the c1lsc model overnight to see if the model crashes return.

  11809   Wed Nov 25 14:46:53 2015   gautam   Update   CDS   c1lsc models restarted

I noticed that all the models running on C1LSC had crashed when I came in earlier today. I restarted all of them by ssh-ing into C1LSC and running rtcds restart all. The models seem to be running fine now.

  8335   Mon Mar 25 11:42:45 2013   Jamie   Update   Computers   c1lsc mx_stream ok

I'm not exactly sure what the problem was here, but I think it had to do with a stuck mx_stream process that wasn't being killed properly.  I manually killed the process and it seemed to come up fine after that.  The regular restart mechanisms should work now.

No idea what caused the process to hang in the first place, although I know the newer RCG (2.6) is supposed to address some of these mx_stream issues.

  8334   Mon Mar 25 09:52:22 2013   Jenne   Update   Computers   c1lsc mxstream won't restart

Most of the front ends' mx streams weren't running, so I did the old mxstreamrestart on all machines (see elog 6574; the dmesg on c1lsc right now, at the top, has similar messages).  Usually this mxstream restart works flawlessly, but today c1lsc isn't working.  Usually I get an [ok] on the right side of the terminal window when things work.  For the lsc machine today, I get [!!] instead.

After having learned from recent lessons, I'm waiting to hear from Jamie.

  1235   Fri Jan 16 18:33:54 2009   Yoichi   Summary   Computers   c1lsc rebooted to fix 16Hz glitches
Kakeru, Yoichi

There were 16Hz harmonics in the PD3 and PD4 channels even when there was no light falling on them.
In fact, even when the connection to the ADC was removed, the 16Hz noise was still there.

Rob suggested that this might be a digital problem, because data is sent to the daq computer every 1/16 of a second.

We restarted c1lsc and the problem went away.
  4124   Fri Jan 7 12:01:39 2011   Osamu   Update   Computers   c1lsc running

I got a new adapter board for the expansion chassis from CDS and replaced the existing adapter board, which had been lying on the floor around ETMX, with the new one.

Then I connected the chassis to c1lsc, and c1lsc seems to be running now. I will return the old board to CDS since Rolf says he wants to return it to the manufacturer.

 I found an interface box from ADC to D-SUB37pin and a cable to connect them.

I needed to make cables to connect the interface box to the existing LSC whitening filters; each cable has a 37 pin female D-SUB connector on one end and a 40pin female flat connector on the other end. We should use shielded cables for them, but unfortunately CDS did not have the right one. As a temporary measure I made one cable for ch 1-8 using a twisted ribbon cable like Joe did.

I found a saturation at ch5 of ADC0 on c1lsc. I did not check carefully but it seemed to come from the LSC whitening board. The input of ch5 of the whitening board was not terminated and had a huge output voltage, but ch6 was also not terminated and had no big output. I guess something is wrong on the LSC whitening board. It needs to be checked. Anyway, I unplugged the small ribbon cable between the whitening board and the next LSC AA filter board.

Finally I realized that the RFM fiber connection did not exist. What I saw was the Dolphin fiber cable. We need an RFM PCIe interface board and a long fiber cable between c1lsc and the RFM hub.

  6716   Wed May 30 18:08:40 2012   Jamie   Update   LSC   c1lsc: add error point pick-offs, moved ctrl pick-offs after feedforward

I made some modifications to the c1lsc model in order to extract both the error and control signals.

I added pick-offs for the error signals right before IFO DOF filter modules.  These are then sent with GOTOs to outputs.

I also modified things on the control side.  The OAF stuff was picking off control signals before feedforward in/outs.  After discussing with Jenne we decided that it would make sense for the OAF to be looking at the control signals after feedforward.  It also makes sense to define the control signal after the feedforward.  These control signals are then sent with GOTOs to another set of outputs.

Finally, I moved the triggers to after the control signal pickoffs, and right before the output matrix.  The final chain looks like (see attachment):

input matrix --> power norm --> ERR pickoff --> DOF filters --> FF out --> FF in --> CTRL pickoff --> trigger --> output matrix
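
The ordering constraints described in this entry can be written down and checked mechanically. This is only a sketch: the stage names are copied from the chain above, while the list and helper functions are illustrative and do not correspond to anything in the Simulink model.

```python
# Stage order copied from the chain above; helper names are hypothetical.
CHAIN = [
    "input matrix",
    "power norm",
    "ERR pickoff",
    "DOF filters",
    "FF out",
    "FF in",
    "CTRL pickoff",
    "trigger",
    "output matrix",
]


def position(stage):
    """Index of a stage in the signal chain."""
    return CHAIN.index(stage)


def is_before(a, b):
    """True if stage a comes anywhere before stage b."""
    return position(a) < position(b)


def is_right_before(a, b):
    """True if stage a is immediately followed by stage b."""
    return position(a) + 1 == position(b)


if __name__ == "__main__":
    # Error pick-offs sit right before the DOF filter modules.
    print(is_right_before("ERR pickoff", "DOF filters"))
    # Control pick-offs come after the feedforward in/outs.
    print(is_before("FF out", "CTRL pickoff") and is_before("FF in", "CTRL pickoff"))
    # Triggers come after the control pick-offs and right before the output matrix.
    print(is_before("CTRL pickoff", "trigger") and is_right_before("trigger", "output matrix"))
```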

The error pickoff outputs in the top level of the model are left terminated for the moment.  Eventually I will be hooking these into the new c1cal calibration model.

The model was recompiled, installed, and restarted.  Everything came up fine.

  6734   Thu May 31 22:13:08 2012   Jamie   Update   CDS   c1lsc: added remaining SHMEM senders for ERR and CTRL, c1oaf model updated appropriately

All the ERR and CTRL outputs in c1lsc now go to SHMEM senders.  I renamed the CTRL output SHMEM senders to be more generic, since they aren't specifically for OAF anymore.  See attached image from c1lsc.

c1oaf was updated so that SHMEM receivers pointed to the newly renamed senders.

c1lsc and c1oaf were rebuilt, installed, and restarted and are now running.

  11280   Mon May 11 13:21:25 2015   manasa   Update   CDS   c1lsp and c1sup not running

I found the c1lsp and c1sup models not running anymore on c1lsc (white blocks for status lights on medm).

To fix this, I ssh'd into c1lsc. c1lsc status did not show c1lsp and c1sup models running on it.

I tried the usual rtcds restart <model name> for both and that returned error "Cannot start/stop model 'c1XXX' on host c1lsc".

I also tried rtcds restart all on c1lsc, but that has NOT brought the models back to life.

Does anyone know how I can fix this??

c1sup runs some of the suspension controls. So I am afraid that the drift and frequent unlocking of the arms we see might be related to this.

 

P.S. We might also want to add the FE status channels to the summary pages.

  11282   Mon May 11 14:08:19 2015   manasa   Update   CDS   c1lsp and c1sup removed?

I just found out that the c1lsp and c1sup models no longer exist on the FE status medm screens. I am assuming some changes were made to the models as well.

Earlier today, I was looking at some of the old medm screens running on Donatella that did not reflect this modification. 

Did I miss any elogs about this or was this change not elogged??

Quote:

I found the c1lsp and c1sup models not running anymore on c1lsc (white blocks for status lights on medm).

To fix this, I ssh'd into c1lsc. c1lsc status did not show c1lsp and c1sup models running on it.

I tried the usual rtcds restart <model name> for both and that returned error "Cannot start/stop model 'c1XXX' on host c1lsc".

I also tried rtcds restart all on c1lsc, but that has NOT brought the models back to life.

Does anyone know how I can fix this??

c1sup runs some of the suspension controls. So I am afraid that the drift and frequent unlocking of the arms we see might be related to this.

 

P.S. We might also want to add the FE status channels to the summary pages.

 

  11285   Tue May 12 08:51:08 2015   ericq   Update   CDS   c1lsp and c1sup removed?
Quote:

was this change not elogged??

This is my sin.

Back in February (around the 25th) I modified c1sus.mdl, removing the simulated plant connections we weren't using from c1lsp and c1sup. This was included in the model's svn log, but not elogged.

The models don't start with the rtcds restart shortcut, because I removed them from the c1lsc line in FB:/diskless/root/etc/rtsystab (or c1lsc:/etc/rtsystab). There is a commented out line in there that can be uncommented to restore them to the list of models c1lsc is allowed to run. 
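
As an illustration of why the rtcds shortcut refuses a model that is missing from rtsystab, here is a small sketch that looks up the models allowed on a host. It assumes each non-comment line of the file is a hostname followed by whitespace-separated model names, with `#` starting a comment. That format, the example file contents, and the function name are assumptions for the illustration, not taken from the actual file.

```python
# Hypothetical rtsystab lookup. The file format (host followed by model names,
# '#' for comments) is an assumption; the example contents are invented.
EXAMPLE_RTSYSTAB = """\
# c1lsc c1x04 c1lsc c1ass c1oaf c1cal c1lsp c1sup
c1lsc c1x04 c1lsc c1ass c1oaf c1cal
c1sus c1x02 c1sus c1mcs c1rfm c1pem
"""


def allowed_models(rtsystab_text, host):
    """Return the list of models the given host is allowed to run."""
    for raw in rtsystab_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if fields[0] == host:
            return fields[1:]
    return []


if __name__ == "__main__":
    models = allowed_models(EXAMPLE_RTSYSTAB, "c1lsc")
    for name in ("c1oaf", "c1sup"):
        status = "allowed" if name in models else "not listed"
        print(name, "on c1lsc:", status)
```

In this sketch, uncommenting the first line (and removing the second) would put c1lsp and c1sup back on the c1lsc list.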

However, I wouldn't suspect that the models not running should affect the suspension drift, since the connections from them to c1sus have been removed. If we still have trends from early February, we could look and see if the drift was happening before I made this change. 

  13639   Fri Feb 16 22:15:30 2018   gautam   Update   General   c1mcs model restarted

c1mcs had died for some reason. Looking at dmesg, I see:

[769312.996875] c1mcsepics[1140]: segfault at 7f5000000012 ip 00007f50ea8ded8f sp 00007f50e9f53a10 error 4 in libc-2.19.so[7f50ea865000+1a1000]
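
For digging into a line like this, a short sketch that splits the dmesg segfault message into its fields and computes the offset of the faulting instruction inside the mapped library. The regular expression is written against the single line quoted above and may need adjusting for other kernels; the function name is illustrative.

```python
import re

# Pattern written against the one dmesg line quoted in this entry.
SEGFAULT_RE = re.compile(
    r"\[\s*(?P<time>[\d.]+)\]\s+(?P<proc>\S+)\[(?P<pid>\d+)\]: "
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

LINE = (
    "[769312.996875] c1mcsepics[1140]: segfault at 7f5000000012 "
    "ip 00007f50ea8ded8f sp 00007f50e9f53a10 error 4 "
    "in libc-2.19.so[7f50ea865000+1a1000]"
)


def parse_segfault(line):
    """Return the fields of a dmesg segfault line as a dict, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    # Offset of the faulting instruction from the start of the library mapping.
    info["offset"] = int(info["ip"], 16) - int(info["base"], 16)
    return info


if __name__ == "__main__":
    info = parse_segfault(LINE)
    print(info["proc"], "pid", info["pid"], "faulted in", info["lib"])
    print("offset into library:", hex(info["offset"]))
```

The offset (0x79d8f for this line) is the kind of number one would look up against the symbols of that libc build to see which function was executing. On x86, error code 4 usually indicates a user-mode read of an unmapped address, which is consistent with a bad pointer being dereferenced inside a libc call.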

None of the other EPICS processes died. Not sure what to make of this. I was at the PSL table working, and had closed the PSL shutter to avoid MC autolocker trying to keep the MC locked while I was mucking about, but this shouldn't have had any effect on an EPICS process?

Anyway, I just logged into c1sus, stopped and restarted the model. IMC locks fine now.

  6570   Wed Apr 25 21:24:10 2012   Den   Update   Computer Scripts / Programs   c1oaf

The C1OAF model, code and medm screens are updated. All relevant files are committed to svn and updated at the new model path.

  14927   Wed Oct 2 23:23:02 2019   gautam   Update   CDS   c1oaf DC indicator needs to be green

Today, I found out that this type of "0x2bad" DC error is connected to the 1e+20 cts output. The solution was to bite the bullet and stop/start the c1oaf model (at the risk of crashing the vertex FEs). Today, I was lucky and the model came back online with all CDS indicators green. At which point I was able to engage length feedforward to MC2 (with some admittedly old filter). Some subtraction is happening, see Attachment #1. This was just meant to test whether the signal routing is happening - the feedforward signal goes to the "ALTPOS" input of the suspension CDS block, which AFAIK does not have a corresponding MEDM EPICS indicator. So I couldn't figure out whether the feedforward control signal was in fact making it to the suspension. On the evidence of the suppression of MCL in the 1-3 Hz band, I would conclude that it is. Useful to be able to engage these FF filters for better lockability.

Quote:

Attachment #1 - the vertex seismometer input produces 1e+20 cts at the output of the feedforward filter. Attachment #2 shows the shape of the feedforward filters - doesn't explain the saturation. Since this is a feedforward loop, a runaway loop can't be the explanation either.

  5426   Thu Sep 15 21:56:01 2011   Mirko   Update   CDS   c1oaf check, possible shmem problem

After Jamie installed the c1oaf model ( entry 5424 ) I went and checked the intermodel communication.

Remember the config is:

c1lsc ->SHMEM-> c1oaf
c1oaf ->SHMEM-> c1lsc
c1pem ->SHMEM-> c1rfm ->PCIE-> c1oaf

I checked at least one of every communications type.

-All signals reach their destinations.
-c1lsc_to_c1oaf_via_shmem is noisier; it adds noise to the signal. lsc runs at 16kHz and oaf at 2kHz, but that should actually smooth things out.

c1lsc_to_c1oaf_via_shmem.png

 

  15078   Thu Dec 5 15:09:50 2019   gautam   Update   CDS   c1oaf crashed c1lsc

I tried starting the c1oaf model, but got a DQ error (I want the option of running feedforward during locking even if the filters aren't particularly well tuned yet). Note that this isn't "just a warning light" - some channels are initialized to +/- 1e20, so if you try turning some filters on, you will deliver a massive kick to the optics. Restarting it crashed c1lsc (this is not unexpected behavior - the only way to clear the DQ error is to restart the model, and empirically, the success rate is ~50%). The reboot script brought everything back online smoothly, and the second time, c1oaf started without any issues.
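
Given the risk of kicking the optics, one could imagine checking a snapshot of channel values for runaway numbers before engaging any filters. The sketch below operates on a plain dictionary; the channel names, the threshold, and the function name are all hypothetical, and how the values are read from EPICS is left out.

```python
import math


def runaway_channels(values, limit=1e10):
    """Names of channels whose value is non-finite or larger in magnitude than limit.

    The default limit is an arbitrary illustrative threshold far above any
    sensible count value and far below the 1e20 initialization value.
    """
    return sorted(
        name
        for name, value in values.items()
        if not math.isfinite(value) or abs(value) > limit
    )


# Hypothetical snapshot; these are not real channel names.
snapshot = {
    "C1:OAF-EXAMPLE_A_OUT": 1e20,
    "C1:OAF-EXAMPLE_B_OUT": -1e20,
    "C1:OAF-EXAMPLE_C_OUT": 12.5,
}
suspect = runaway_channels(snapshot)

if __name__ == "__main__":
    if suspect:
        print("Do not enable filters; suspicious channels:", suspect)
    else:
        print("No runaway values found.")
```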

While looking at the CDS overview screen, I noticed that the c1scy model was reporting frequent RFM errors for the C1:SCY-RFM_ETMY_LSC channel (but none of the others). On the sender model (c1rfm), no errors were being reported. The diag reset button / mxstream restart didn't really work either. See Attachment #1. Just restarting the c1scy model didn't fix the error - I had to reboot the machine and restart the models, and now no errors are being reported.

Attachment #2 shows the current nominal CDS status - the red light on c1lsc is due to some missing c1dnn channels (I'll remove these at the next c1lsc model change because I don't want to un-necessarily reboot the vertex FEs), and the c1omc model is obsolete I guess. c1daf isn't running right now but once I get the new fiber (ordered), I'm gonna restart this model as well.

P.S. The ALS temperature sliders are not SDF-ed. So when the model was restarted, I had to change the sliders back to their old values to get the beat back in the usable range.

  6944   Mon Jul 9 11:27:27 2012   Jenne   Update   Computers   c1oaf has been down for several days - BURT restore wasn't done correctly on startup

The c1oaf model hasn't been running for a few days (since the leap second problems we were having last week).  I had looked into it, but finally figured it out (with Jamie's help) today. 

The BURT restore has to be given to the model during startup, but for whatever reason it wasn't BURT restoring until *after* the model had already failed to start.  The symptoms were:  no 'heartbeat' for the oaf model, no connection to the fb, NO SYNC on the GDS screen, 0x4000.  The BURT restore button was green, which threw me off the scent, but that's just because it did, in fact, get set, just way too late.

I ended up looking in the dmesg of the lsc computer, and the last set of stuff was several lines of "[3354303.626446] c1oaf: Epics burt restore is 0".  Nothing else was written after that.  Jamie pointed out that this meant the BURT restore wasn't getting sent before the model unloaded itself and decided not to run.

The solution:  restart the model, and manually click the BURT restore button as soon as you're able (after everything comes back from being white).  We used to have to do this, but then there was a "fix", which apparently isn't super robust and failed for the oaf (even though it used to work just fine).  Bugzilla report submitted.
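
The dmesg symptom described above can be spotted with a few lines of code. This sketch checks whether the last thing a model logged is still the "Epics burt restore is 0" message. The first and last example lines use the message text quoted in these logs, but their timestamps are invented, and the function name is hypothetical.

```python
def stuck_waiting_for_burt(dmesg_lines, model):
    """True if the model's most recent dmesg line is still 'Epics burt restore is 0'."""
    tag = f"{model}: "
    own = [line.rstrip() for line in dmesg_lines if tag in line]
    return bool(own) and own[-1].endswith("Epics burt restore is 0")


# Example input: message text as quoted in the elog, timestamps partly invented.
STUCK = [
    "[3354301.626446] c1oaf: Epics burt restore is 0",
    "[3354302.626446] c1oaf: Epics burt restore is 0",
    "[3354303.626446] c1oaf: Epics burt restore is 0",
]
RUNNING = [
    "[83750.505720] c1oaf: Epics burt restore is 0",
    "[83752.505720] c1oaf: entering the loop",
    "[83752.505720] c1oaf: waiting to sync 19520",
]

if __name__ == "__main__":
    print("stuck example:  ", stuck_waiting_for_burt(STUCK, "c1oaf"))
    print("running example:", stuck_waiting_for_burt(RUNNING, "c1oaf"))
```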

  9911   Mon May 5 19:51:56 2014   jamie   Update   CDS   c1oaf model broken because of broken BLRMS block

I finally tracked down the problem with the c1oaf model to the BLRMS part:

/opt/rtcds/userapps/release/cds/common/models/BLRMS.mdl

blrms-hot-mess.png   sddefault.jpg

Note that this is pulling from a cds/common location, so presumably this is a part that's also being used at the sites.

Either there was an svn up that pulled in something new and broken, or the local version is broken, or who knows what.

We'll have to figure out what's going on here, but in the meantime, as I already mentioned, I'm leaving the c1oaf model off for now.

 RXA: also...we updated Ottavia to Ubuntu 12 LTS...but now it has no working network connection. Needs help.  (which of course has nothing whatsoever to do with this point )

  14922   Wed Oct 2 10:40:07 2019   gautam   Update   CDS   c1oaf model restarted

This morning, I restarted the c1oaf model on the c1lsc machine, so as to have the option of enabling some feedforward action. Unsurprisingly, the "DC" indicator is red, citing a "0x2bad". In the past, I've been able to correct this by simply restarting the model. But given the fragility of the c1lsc machine, I think I'll live with not having the OAF model signals in frames. Medium-term, I'd like to pare down the c1oaf model a bit - I think it has way too many options/matrices right now, and is an un-necessarily bloated and heavy model. Unless there are serious objections, I will do this work when I next feel like it.

  14522   Mon Apr 8 11:53:17 2019   gautam   Update   CDS   c1oaf needs debugging

I tried restarting c1oaf this weekend to see if turning on the MC length FF would affect the ALS noise performance. I burtrestored the filter settings from March 2016. However, I noticed several possible anomalies, which need debugging. I am not turning the model off because of the possibility of having to reboot all the vertex FEs, but this model is totally unusable right now.

  1. Attachment #1 - the vertex seismometer input produces 1e+20 cts at the output of the feedforward filter. Attachment #2 shows the shape of the feedforward filters - doesn't explain the saturation. Since this is a feedforward loop, a runaway loop can't be the explanation either.
  2. The MC length feedforward control signal is supposed to only go to MC2 - but MC1 and MC3 coil outputs were saturated when I enabled the feedforward.
  7287   Mon Aug 27 17:14:00 2012   jamie   Update   CDS   c1oaf problem

Quote:

I came in to the lab in the evening and found c1lsc had "red" for FB connection.
I restarted the c1lsc models and it hung the machine every time.

I decided to kill all of the models during the startup sequence right after the reboot.
Then I ran only c1x04 and c1lsc. It seems that c1oaf was the cause, but it wasn't clear.

The "red for FB connection" issue was probably a dead mx_stream on c1lsc.  That can usually be fixed by just restarting mx_stream.

There is definitely a problem with c1oaf, though.  It crashes immediately after attempting to start.  The kernel log for a crash is included below.

We will leave c1oaf off until we have time to debug.

[83752.505720] c1oaf: Send Computer Number  = 0
[83752.505720] c1oaf: entering the loop
[83752.505720] c1oaf: waiting to sync 19520
[83753.207372] c1oaf: Synched 701492
[83753.207372] general protection fault: 0000 [#2] SMP 
[83753.207372] last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:2e:01.0/class
[83753.207372] CPU 4 
[83753.207372] Modules linked in: c1oaf c1ass c1sup c1lsp c1cal c1lsc c1x04 open_mx dis_irm dis_dx dis_kosif mbuf [last unloaded: c1oaf]
[83753.207372] 
[83753.207372] Pid: 0, comm: swapper Tainted: G      D    2.6.34.1 #5 X7DWU/X7DWU
[83753.207372] RIP: 0010:[<ffffffffa1bf7567>]  [<ffffffffa1bf7567>] T.2870+0x27/0xbf0 [c1oaf]
[83753.207372] RSP: 0000:ffff88023ecc1aa8  EFLAGS: 00010092
[83753.207372] RAX: ffff88023ecc1af8 RBX: ffff88023ecc1ae8 RCX: ffffffffa1c35e48
[83753.207372] RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffffffffa1c21360
[83753.207372] RBP: ffff88023ecc1bb8 R08: 0000000000000000 R09: 0000000000175f60
[83753.207372] R10: 0000000000000000 R11: ffffffffa1c2a640 R12: ffff88023ecc1b38
[83753.207372] R13: ffffffffa1c2a640 R14: 0000000000007fff R15: 0000000000000000
[83753.207372] FS:  0000000000000000(0000) GS:ffff880001f00000(0000) knlGS:0000000000000000
[83753.207372] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[83753.207372] CR2: 000000000378a040 CR3: 0000000001a09000 CR4: 00000000000406e0
[83753.207372] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[83753.207372] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[83753.207372] Process swapper (pid: 0, threadinfo ffff88023ecc0000, task ffff88023ec7eae0)
[83753.207372] Stack:
[83753.207372]  ffff88023ecc1ab8 0000000000000096 0000000000000019 ffff88023ecc1b18
[83753.207372] <0> 0000000000014729 0000000000032a0c ffff880001e12d90 000000000000000a
[83753.207372] <0> ffff88023ecc1bb8 ffffffffa1c06cad ffff88023ecc1be8 000000000000000f
[83753.207372] Call Trace:
[83753.207372]  [<ffffffffa1c06cad>] ? filterModuleD+0xd6d/0xe40 [c1oaf]
[83753.207372]  [<ffffffffa1c07ae3>] feCode+0xd63/0x129b0 [c1oaf]
[83753.207372]  [<ffffffffa1c00dc6>] ? T.2888+0x1966/0x1f10 [c1oaf]
[83753.207372]  [<ffffffffa1c1b3bf>] fe_start+0x1c8f/0x3060 [c1oaf]
[83753.207372]  [<ffffffff8102ce57>] ? select_task_rq_fair+0x2c8/0x821
[83753.207372]  [<ffffffff8104cd8b>] ? enqueue_hrtimer+0x65/0x72
[83753.207372]  [<ffffffff8104d8f6>] ? __hrtimer_start_range_ns+0x2d6/0x2e8
[83753.207372]  [<ffffffff8104d91b>] ? hrtimer_start+0x13/0x15
[83753.207372]  [<ffffffff810173df>] play_dead_common+0x6e/0x70
[83753.207372]  [<ffffffff810173ea>] native_play_dead+0x9/0x20
[83753.207372]  [<ffffffff81001c38>] cpu_idle+0x46/0x8d
[83753.207372]  [<ffffffff814ec523>] start_secondary+0x192/0x196
[83753.207372] Code: 1f 44 00 00 55 66 0f 57 c0 48 89 e5 41 57 41 56 41 55 41 54 53 48 8d 9d 30 ff ff ff 48 8d 43 10 4c 8d 63 50 48 81 ec e8 00 00 00 <66> 0f 29 85 30 ff ff ff 48 89 85 18 ff ff ff 31 c0 48 8d 53 78 
[83753.207372] RIP  [<ffffffffa1bf7567>] T.2870+0x27/0xbf0 [c1oaf]
[83753.207372]  RSP <ffff88023ecc1aa8>
[83753.207372] ---[ end trace df3ef089d7e64971 ]---
[83753.207372] Kernel panic - not syncing: Attempted to kill the idle task!
[83753.207372] Pid: 0, comm: swapper Tainted: G      D    2.6.34.1 #5
[83753.207372] Call Trace:
[83753.207372]  [<ffffffff814ef6f4>] panic+0x73/0xe8
[83753.207372]  [<ffffffff81063c19>] ? crash_kexec+0xef/0xf9
[83753.207372]  [<ffffffff8103a386>] do_exit+0x6d/0x712
[83753.207372]  [<ffffffff81037311>] ? spin_unlock_irqrestore+0x9/0xb
[83753.207372]  [<ffffffff81037f1b>] ? kmsg_dump+0x115/0x12f
[83753.207372]  [<ffffffff81006583>] oops_end+0xb1/0xb9
[83753.207372]  [<ffffffff8100674e>] die+0x55/0x5e
[83753.207372]  [<ffffffff81004496>] do_general_protection+0x12a/0x132
[83753.207372]  [<ffffffff814f17af>] general_protection+0x1f/0x30
[83753.207372]  [<ffffffffa1bf7567>] ? T.2870+0x27/0xbf0 [c1oaf]
[83753.207372]  [<ffffffffa1c06cad>] ? filterModuleD+0xd6d/0xe40 [c1oaf]
[83753.207372]  [<ffffffffa1c07ae3>] feCode+0xd63/0x129b0 [c1oaf]
[83753.207372]  [<ffffffffa1c00dc6>] ? T.2888+0x1966/0x1f10 [c1oaf]
[83753.207372]  [<ffffffffa1c1b3bf>] fe_start+0x1c8f/0x3060 [c1oaf]
[83753.207372]  [<ffffffff8102ce57>] ? select_task_rq_fair+0x2c8/0x821
[83753.207372]  [<ffffffff8104cd8b>] ? enqueue_hrtimer+0x65/0x72
[83753.207372]  [<ffffffff8104d8f6>] ? __hrtimer_start_range_ns+0x2d6/0x2e8
[83753.207372]  [<ffffffff8104d91b>] ? hrtimer_start+0x13/0x15
[83753.207372]  [<ffffffff810173df>] play_dead_common+0x6e/0x70
[83753.207372]  [<ffffffff810173ea>] native_play_dead+0x9/0x20
[83753.207372]  [<ffffffff81001c38>] cpu_idle+0x46/0x8d
[83753.207372]  [<ffffffff814ec523>] start_secondary+0x192/0x196

  11551   Tue Sep 1 02:44:44 2015   Koji   Summary   CDS   c1oaf, c1mcs modified for the IMC angular FF

[Koji, Ignacio]

In order to allow us to work on the IMC angular FF, we made the signal paths from PEM to MC SUSs.
In fact, there already were the paths from c1pem to c1oaf. So, the new paths were made from c1oaf to c1mcs. (Attachment 1~3)

After some debugging those two models started running. The additional cost of the processing time is insignificant.
FB was restarted to accommodate the change.

Once the modification of the models was completed, the OAF screens were modified. It seemed that the Kissel button
for the output matrix hadn't been updated for the PRM ASC implementation. This was fixed as the button was updated this time.
In addition, the button for the FM matrix was also made and pasted.

 

  14123   Wed Aug 1 20:44:57 2018   gautam   Summary   Computers   c1omc model (re?)created

The main motivation behind adding a DAC card in c1ioo was to setup an RTCDS model for the OMC. Attachment #1 shows the new look CDS overview screen. Here is what I did.

Mostly, I followed instructions from when I setup the model for the EX green PZTs.


Simulink model:

The model is just a toy for now (CDS parameters, ADC block and 2 CDS filter modules). I leave it to Aaron to actually populate it, check functionality etc. The path to the model is /opt/rtcds/caltech/c1/userapps/release/isc/c1/models/c1omc.mdl. I am listing the parameters set on the CDS_PARAMETERS block:

  • host = c1ioo
  • site = c1
  • rate = 16k
  • dcuid = 27 (which I chose after making sure that this dcuid was not used on this list which I also updated by adding c1omc and moving c1imc to "old")
  • specific_cpu = 6 (again chosen after checking the available CPUs in the above list and confirming using the cset utility).
  • adc_Slave = 1
  • shmem_daq = 1
  • no_rfm_dma = 1
  • biquad = 1
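
The uniqueness checks mentioned for dcuid and specific_cpu can be sketched as a simple lookup. The parameter values below are the ones listed above; the sets of already-used dcuids and CPUs are invented placeholders (the real ones live in the list referred to in this entry), and the function name is hypothetical.

```python
# Parameters as listed for the CDS_PARAMETERS block of c1omc.
C1OMC_PARAMS = {
    "host": "c1ioo",
    "site": "c1",
    "rate": "16k",
    "dcuid": 27,
    "specific_cpu": 6,
    "adc_Slave": 1,
    "shmem_daq": 1,
    "no_rfm_dma": 1,
    "biquad": 1,
}

# Placeholder values only; the real lists are maintained elsewhere.
HYPOTHETICAL_USED_DCUIDS = {20, 21, 28}
HYPOTHETICAL_USED_CPUS_ON_HOST = {1, 2, 3}


def find_conflicts(params, used_dcuids, used_cpus):
    """Return a list of human-readable conflicts for a new model's parameters."""
    problems = []
    if params["dcuid"] in used_dcuids:
        problems.append(f"dcuid {params['dcuid']} already in use")
    if params["specific_cpu"] in used_cpus:
        problems.append(
            f"CPU {params['specific_cpu']} already taken on {params['host']}"
        )
    return problems


if __name__ == "__main__":
    issues = find_conflicts(
        C1OMC_PARAMS, HYPOTHETICAL_USED_DCUIDS, HYPOTHETICAL_USED_CPUS_ON_HOST
    )
    print(issues if issues else "no conflicts found")
```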

Building and installing model:

Once the model was installed, I logged into c1ioo, and built and installed the models using the usual rtcds make and rtcds install instructions. Before starting the model, I edited /diskless/root.jessie/etc/rtsystab to allow c1omc to be run on c1ioo. Using sudo cset set, I verified that CPU #6 is no longer listed (if I understand correctly, the RTCDS system takes over the core).


MEDM:

To reflect all this on the MEDM CDS OVERVIEW screen, I just edited the screen.

  • Moved the orange explanation of bits over to the c1iscey panel to make space in the c1ioo panel.
  • Edited the macros to reflect the c1omc parameters.

DAQD:

Finally, I followed the instructions here to get the channels into frames and make all the indicators green. Went into fb and restarted the daqd processes. All looks good. I'm going to leave the model running overnight to investigate stability. I forgot to svn commit the model tonight, will do it tomorrow.


The testing plan (at least initially) is to install the AA and AI boards from the OMC rack in 1X1/1X2. Then we will have short SCSI cables running from the ADC/DAC to these. The actual HV driving stages will remain in the OMC rack (NE corner of AS table).

@Steve, can we get 10 Male-Female D9 cables so that we can run them from 1X1/1X2 to the OMC rack?


Unrelated to this work: There were 2 crashes of the models on c1lsc, one ~6pm and one right now ~1030pm. The restart script brought everything back gracefully...

  14126   Thu Aug 2 20:54:18 2018   gautam   Summary   Computers   c1omc model looks stable

Actually, c1lsc had crashed again sometime last night so I had to reboot everything this morning. I used the reboot script again, but I increased the sleep time between trying to start up the models again so that I could walk into the VEA and power cycle the c1lsc expansion chassis, as this kind of frequent model crash has been fixed by doing so in the past. Sure enough, there have been no issues since I rebooted everything at ~1030 in the morning. 

The c1omc model itself has been stable as well, though of course, there is nothing in there at the moment. I may do a check of the newly installed DAC tomorrow just to see that we can put out a sine wave.

Steve has ordered the D-sub cabling that will allow us to route signals between AA/AI boards in 1X1/1X2 to the HV PZT electronics in the OMC rack. Things look set up for a measurement next week. Aaron will post a block diagram + photoz of what box goes where in the electronics racks.

  2348   Mon Nov 30 16:23:51 2009   Jenne   Update   Computers   c1omc restarted

I found the FEsync light on the OMC GDS screen red.  I power cycled C1OMC, and restarted the front end code and the tpman.  I assume this is a remnant of the bootfest of the morning/weekend, and the omc just got forgotten earlier today.

  5971   Mon Nov 21 17:07:34 2011   Mirko   Update   CDS   c1pem model dead

For some reason C1PEM doesn't seem to work anymore after a recompilation. It did recompile fine. We just changed some channel / subsystem names.

Tried reverting to the svn version. Doesn't work. Reboot C1SUS also no good.

  5973   Mon Nov 21 22:51:55 2011 MirkoUpdateCDSc1pem model dead

Quote:

For some reason C1PEM doesn't seem to work anymore after a recompilation. It did recompile fine. We just changed some channel / subsystem names.

Tried reverting to the svn version. Doesn't work. Reboot C1SUS also no good.

 It is fine again. Thanks Jamie.

  4028   Wed Dec 8 14:51:09 2010 josephbUpdateCDSc1pem now recording data

Problem:

c1pem model was reporting all zeros for all the PEM channels.

Solution:

Two fold.  On the software end, I added ADCs 0, 1, and 2 to the model.  ADC 3 was already present and is the actual ADC taking in PEM information.

Alex and Rolf noted a while back that there's a problem with the way the DACs and ADCs are numbered internally in the code.  Missing ADCs or DACs prior to the one you're actually using can cause problems.

At some point that problem should be fixed by the CDS crew, but for now, always include all ADCs and DACs up to and including the highest number ADC/DAC you need to use for that model.
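A minimal sketch of the rule above, for checking a model before compiling. The function name is made up for illustration and is not part of the RCG tools; it only encodes "include every ADC (or DAC) from 0 up to the highest one you use".

```python
def missing_card_indices(used):
    """Return the card indices that must be added so that the model
    contains every ADC (or DAC) from 0 up to the highest one in use."""
    present = set(used)
    if not present:
        return []
    return [i for i in range(max(present) + 1) if i not in present]


# c1pem originally contained only ADC 3, the one carrying the PEM signals
print(missing_card_indices([3]))           # [0, 1, 2]
# after adding ADCs 0, 1 and 2 nothing is missing
print(missing_card_indices([0, 1, 2, 3]))  # []
```

The same check applies to DACs, since the numbering problem was reported for both.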

On the physical end, I checked the AA filter chassis and found the power was not plugged in.  I plugged it in.

Status:

We now have PEM channels being recorded by the FB, which should make Jenne happier.

  4101   Sat Jan 1 19:13:40 2011 ranaUpdateCDSc1pem now recording data

 I found that there was no PEM data nor any other data (no SUS or otherwise. No testpoints, no DAQ).

I went through the procedure that Jenne has detailed in the Wiki but it didn't work.

1) Firstly, the 'telnet fb 8088' step doesn't work. It says "Connected to fb.martian" but then just hangs. To replicate the effect of this step I tried ssh'ing to fb and doing a 'pkill daqd'. That works to restart the daqd process.

2) The wiki instructions had a problem. In the GUI step, it should say 'Save' after the Acquire bit has been set to 1. Even so, this works to get the .ini file right and DTT can see the correct channel list, but none of the channels are available. There are just 'Unable to obtain measurement data' errors.

3) I tried running 'startc1pem', but no luck. I also tried rebooting c1sus from the command line. That worked so far as to come back up with all the right processes running, but still no data. The actual /frames directory shows that there are frames, but we just can't see the data. I also tried to get data using the DTT-NDS2 method, but still no luck. (*** ITMX and ITMY both came back with all their filters off; worth checking if their BURTs are working correctly.)

Using DataViewer, however, I AM able to see the data (although the channel name is RED). In fact, I am able to see the trend data ever since I changed the Acquire bit to 1. Plot attached as evidence. Why does DTT not work anymore???

  12592   Wed Nov 2 22:56:45 2016 gautamUpdateCDSc1pem revamped

installing the BLRMS 2k blocks turned out to be quite non-trivial due to a whole host of CDS issues that had to be debugged, but i've restored everything to a good state now, and the channels are being logged. detailed entry with all the changes to follow.

  12595   Thu Nov 3 12:38:42 2016 gautamUpdateCDSc1pem revamped

A number of changes were made to C1PEM and some library parts. Recall that the motivation was to add BLRMS channels for all our suspension coils and shadow sensor PDs, which we are first testing out on the IMC mirrors.

Here is the summary:

BLRMS_2k library block

  • The custom C code block in this library part was named 'BLRMSFILTER', which conflicted with the name of the function call in the C code it is linked to, which led to compilation errors
  • Even though the part was found in /opt/rtcds/userapps/release/cds/c1/models and not in the common repository, just to be safe, I made a copy of the part called BLRMS_2k_40m which lives in the above directory. I also made a copy of the code it calls in /opt/rtcds/userapps/release/cds/c1/src

C1PEM model + filter channels

  • Adding the updated BLRMS_2k_40m library part still resulted in some compilation errors - specifically, it was telling me to check for missing links around the ADC parts
  • Eric suggested that the error messages might not be faithfully reporting what the problem is - true enough, the problem lay in the fact that c1pem wasn't updated to follow the namespace convention that we now use in all the RT models - the compiler was getting confused by the fact that the BLRMS stuff was in a namespace block called 'SUS', but the rest of the PEM stuff wasn't in such a block
  • I revamped c1pem to add namespace blocks called PEM and DAF, and put the appropriate stuff in the blocks, after which there were no more compilation errors
  • However, this namespace convention messed up the names of the filter modules and associated channels - this was resolved with Eric's help (find and replace did the job, this is a familiar problem that we had encountered not too long ago when C1IOO was similarly revamped...)
  • There was one last twist in that the model would compile and install, but just would not start. I tried the usual voodoo of restarting all the models, and even did a soft reboot of c1sus, to no avail. Looking at dmesg, I tracked the problem down to a burt restore issue - the solution was to press the little 'BURT' button next to c1pem on the CDS overview MEDM screen as soon as it appeared while restarting the model

All the channels seem to exist, and FB seems to not be overloaded judging by the performance overnight up till the power outage. I will continue to monitor this...

GV Edit 3 Nov 2016 7pm:

I had meant to check the suitability of the filters used - there is a detailed account of the filters implemented in BLRMSFILTER.c here, and I quickly looked at the file on hand to make sure the BP filters made sense (see Attachment #1). The BP filters are 8th order elliptical filters and the lowpass filters are 16th order elliptical filters scaled for the appropriate frequency band. These are somewhat different from what we use on the seismometer BLRMS channels, where the filters are order 4, but I don't think we are significantly overloaded on the computational aspect, and the lowpass filters have sufficiently steep roll-off, so these should be okay...

  12597   Thu Nov 3 13:36:16 2016 ericqUpdateCDSc1pem revamped

It seems that the EX and EY BLRMS banks were missing the BP and LP filters for the 0.03-0.1 and 0.1-0.3 bands. I've copied over the filters from the BS seismometer.

However, if it looks like the integrated C code BLRMS block works out well, we could replace the seismometers' filter module heavy BLRMS blocks and cut down on the PEM model bloat.

  12766   Fri Jan 27 21:21:35 2017 gautamUpdateCDSc1pem revamped

The coil and PD BLRMS are useful tools in identifying when glitches occur in the PD readout, so I thought it would be good to install them for ITMY, ETMX and SRM (since I plan to switch the MC3 satellite box, which we suspect to be problematic, with the SRM one). For this purpose, I had to install some IPC SHMEM blocks in C1SUS and recompile. 24 IPC channels were added to pipe the coil, PD and Oplev signals from C1SUS to C1PEM - the recompilation went smoothly, and it doesn't look like the model computation time has increased significantly or that the model is any closer to timing out.

However, I was unable to install the BLRMS blocks in C1PEM, as when I tried to compile the model with BLRMS for these extra 24 channels, I got a compilation error saying that I have exceeded the maximum allowed 499 testpoints per channel. Is there any workaround to this? It would be possible to create a custom BLRMS block that doesn't have all those testpoints, maybe this is the way to go? Especially if we want to install these channels for all our SOS optics, and also replace the current Seismic BLRMS with this scheme for consistency?

GV edit: I have implemented this scheme - after backing up the original BLRMS_2k part, I made a new one with no testpoints and only EPICS readouts. Doing so allowed me to recompile c1pem without any issues, and the CPU time seems to have gone up by 3us, from ~55us to 58us. So the BLRMS data record is only available at 16Hz, since there are no DQ channels in the BLRMS block - do we want these in any case? Let's see how this does over the weekend...

  11882   Mon Dec 14 23:56:29 2015 ericqUpdateCDSc1pem reverted

To get C1PEM data back into the frames, I removed the new BLRMS blocks, recompiled, reinstalled, re-enabled it in daqd, restarted.

We still really want more headroom in our framebuilder situation. 

  5549   Mon Sep 26 17:49:51 2011 KojiUpdatePSLc1psl

[Koji Suresh]

c1psl has got frozen during our ezcaread/write business.
After the target was rebooted, we lost the previous settings, as there had been no burt snapshot for the slow targets since Dec 13, 2010.

It seems that burtrestore is essential for the bootstrapping of the MC servo, as the auto locker script refers to the locking parameters
from the PSL setting values (C1PSL_SETTINGS_SET.adl).

Jenne is working on the recovery of the snap-shotting for the slow targets.

  15238   Mon Mar 2 16:29:40 2020 gautamUpdateElectronicsc1psl VME crate removed, Acro-crate installed

[JV, JWR, YD, GV]

  • The old c1psl VME crate, and all the ribbon cables connected to it were removed from 1X1. They are presently dumped in the office area - we will clear these in the next few days, once the c1iool0 crate also gets removed from the rack.
  • The Acromag crate was capped on the top and bottom, had ears bolted on, and was installed on support rails in the newly cleared up space.
  • The strange orientation of the crate (with the intended backside facing the front of the rack) is to facilitate easy access to the "spare" channels we have in this box, e.g. for a future ISS or laser amplifier.
  • Remaining connections to make are (these will be done tomorrow along with the extrication of the c1iool0 VME crate):
    • PMC trans PD
    • FSS RMTEMP 
    • PSL shutter
    • 2W Mephisto diagnostic connector
    • 24 V DC from Sorensens via DIN connector (we are waiting on a new power cable to arrive).
  15194   Thu Feb 6 21:54:13 2020 JonUpdatePSLc1psl bench testing complete

Today I engineered the last piece of the new c1psl system: the multi-bit binary output (mbbo) channels that control the MC servo board gains. These 6-bit channels have to be split across two 4-bit Acromag registers. To enforce synchronous switching, I adapted the latch.py script developed by Gautam to address this problem in c1iscaux. Analogously to the c1iscaux implementation, I scripted the code to automatically run as a systemd service which is launched by the main modbusIOC service. I tested this all using the DB37 LED test board and confirmed it to work.
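For illustration only, here is a sketch of how a 6-bit gain code could be divided between two 4-bit registers. The bit allocation (lower four bits in one register, upper two bits in the other) and the function names are assumptions; the entry does not say how latch.py assigns the bits, and the latching/timing logic is not modelled here.

```python
def split_6bit(value):
    """Split a 6-bit code into (low, high) parts for two 4-bit registers.

    Assumed layout: bits 0-3 go to the first register and bits 4-5
    occupy the two lowest bits of the second register.
    """
    if not 0 <= value <= 0x3F:
        raise ValueError("value must fit in 6 bits (0..63)")
    low = value & 0x0F           # bits 0-3
    high = (value >> 4) & 0x03   # bits 4-5
    return low, high


def join_6bit(low, high):
    """Inverse of split_6bit: rebuild the 6-bit code from the two parts."""
    return ((high & 0x03) << 4) | (low & 0x0F)


low, high = split_6bit(0b101101)
print(low, high)              # 13 2
print(join_6bit(low, high))   # 45
```

Because the two halves live in separate registers, a naive write updates them at slightly different times, which is why the entry above enforces synchronous switching with a latch script.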

This now completes the electronics bench testing.

There are still several DB37 connectors to be wired, which carry only spare channels for future use and are not interfaced with the EPICS IOC. Jordan and I discussed this today and he or Chub will complete it shortly. To allow time for the spare channel wiring to be completed (as well as for more locking progress before interruption), Gautam and I think Monday/Tuesday next week would be the earliest possible window to install the new system.

  12852   Fri Feb 24 20:38:01 2017 johannesUpdateComputersc1psl boot-stall culprit identified

[Gautam, Johannes]

c1psl finally booted up again, PMC and IMC are locked.

Trying to identify the hiccup from the source code was fruitless. However, since the PMCTRANSPD channel acquisition failure occurred long before the actual slow machine crashed, and since the hiccup in the boot seemed to indicate a problem with daughter module identification, we started removing the DIO and DAQ modules:

  1. Started with the ones whose fail LED stayed lit during the boot process: the DIN (XVME-212) and the three DACs (VMIVME4113). No change.
  2. Also removed the DOUT (XVME-220) and the two ADCs (VMIVME 3113A and VMIVME3123). It boots just fine and can be telnetted into!
  3. Pushed the DIN and the DACs back in. Still boots.
  4. Pushed only VMIVME3123 back in. Boot stalls again.
  5. Removed VMIVME3123, pushed VMIVME 3113A back in. Boots successfully.
  6. Left VMIVME3123 loose in the crate without electrical contact for now.
  7. Proceeded to lock PMC and IMC

The particle counter channel should be working again.

  • VMIVME3123 is a 16-Bit High-Throughput Analog Input Board, 16 Channels with Simultaneous Sample-and-Hold Inputs
  • VMIVME3113A is a Scanning 12-Bit Analog-to-Digital Converter Module with 64 channels

/cvs/cds/caltech/target/c1psl/psl.db lists the following channels for VMIVME3123:

Channels currently in use (and therefore not available in the medm screens):

  • C1:PSL-FSS_SLOW_MON
  • C1:PSL-PMC_PMCERR
  • C1:PSL-FSS_SLOWM
  • C1:PSL-FSS_MIXERM
  • C1:PSL-FSS_RMTEMP
  • C1:PSL-PMC_PMCTRANSPD

Channels not currently in use (?):

  • C1:PSL-FSS_MINCOMEAS
  • C1:PSL-FSS_RCTRANSPD
  • C1:PSL-126MOPA_126MON
  • C1:PSL-126MOPA_AMPMON
  • C1:PSL-FSS_TIDALINPUT
  • C1:PSL-FSS_TIDALSET
  • C1:PSL-FSS_RCTEMP
  • C1:PSL-PPKTP_TEMP

There are plenty of channels available on the asynchronous ADC, so we could wire the relevant ones there if we don't care about the 16 bit synchronous sampling (required for proper functionality?)

Alternatively, we could prioritize the Acromag upgrade on c1psl (DAQ would still be asynchronous, though). The PCBs are coming in next Monday and the front panels on Tuesday.

 

 

Some more info that might come in handy to someone someday:

The (nameless?) Windows 7 laptop that lives near MC2 and is used for the USB microscope was used for interfacing with c1psl. No special drivers were necessary to use the USB to RS232 adapter, and the RJ45 end of the grey homemade DB9 to RJ45 cable was plugged into the top port, which is labeled "console 1". I downloaded the program "CoolTerm" from http://freeware.the-meiers.org/#CoolTerm, which is a serial protocol emulator, and it worked out of the box with the adapter. The standard settings worked fine for communicating with c1psl; only a small modification was necessary: in Options>Terminal make sure that "Enter Key Emulation" is changed from "CR+LF" to "CR", otherwise each time 'Enter' is pressed it is actually sent twice.

  15150   Thu Jan 23 23:07:04 2020 JonConfigurationPSLc1psl breakout board wiring

To facilitate wiring the c1psl chassis and scripting loopback tests, I've compiled a distilled spreadsheet with the Acromag-to-breakout board wiring, broken down by connector. This information is extractable from the master spreadsheet, but not easily. There were also a few apparent typos which are fixed here.

The wiring assignments at the time of writing are attached below. Here is the link to the latest spreadsheet.

  15117   Mon Jan 13 15:47:37 2020 shrutiConfigurationComputer Scripts / Programsc1psl burt restore

[Yehonathan, Jon, Shruti]

Since the PMC would not lock, we initially burt-restored the c1psl machine to the last available snapshot (Dec 10th 2019), but it still would not lock.

Then, it was burt-restored to midnight of Dec 1st, 2019, after which it could be locked.

  13742   Mon Apr 9 23:28:49 2018 johannesConfigurationDAQc1psl channel list

I made a list of all the physical c1psl channels to get a better idea of how many Acromags we need to replace it eventually. The 3123 unit is the one whose failure had prevented c1psl from booting, which is why it was unplugged (elog post 12852), and its channels have been inactive since. Are the 126MOPA channels used for the current Mephisto? 126 tells me it's for an old Lightwave laser, but I was checking a few and found that they have non-zero, changing values, so they may have been rewired.

It also hosts some virtual channels for the ISS with root C1:PSL-ISS_ defined in iss.db and dc.db, the PSL particle counter with root C1:PEM- defined in PCount.db, and a whole lot of PSL status channels defined in pslstatus.db. Transferring these virtual channels to a different machine is almost trivial, but the serial readout of the particle counter would have to find a new home.

Long story short - we need:

Function  Type    # Channels  # Channels (no MOPA)  # Units  # Units (no MOPA)
ADC       XT1221  34          21                    5        3
DAC       XT1541  17          14                    3        2
BIO       XT1111  19          10                    2        1
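The unit counts in the table follow from dividing the channel counts by the channels per unit and rounding up. A quick sketch of that arithmetic is below; the channels-per-unit figures (8 analog inputs for the XT1221, 8 analog outputs for the XT1541, 16 digital I/O for the XT1111) are my assumption from the vendor model descriptions and are not stated in this entry.

```python
import math

# Assumed channels per Acromag unit (not given in the elog entry itself)
CHANNELS_PER_UNIT = {"XT1221": 8, "XT1541": 8, "XT1111": 16}


def units_needed(model, n_channels):
    """Number of units of a given model needed to host n_channels."""
    return math.ceil(n_channels / CHANNELS_PER_UNIT[model])


# (all channels, channels without the 126MOPA ones), taken from the table
channel_counts = {"XT1221": (34, 21), "XT1541": (17, 14), "XT1111": (19, 10)}

for model, (n_all, n_no_mopa) in channel_counts.items():
    print(model, units_needed(model, n_all), units_needed(model, n_no_mopa))
# XT1221 5 3
# XT1541 3 2
# XT1111 2 1
```

Under these assumptions the result matches the "# Units" columns, and the channel totals agree with the per-board lists that follow (22 + 12 ADC, 17 DAC, 5 + 14 binary).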

 



3113 - ADC

C1:PSL-126MOPA_126PWR
C1:PSL-126MOPA_DTMP
C1:PSL-126MOPA_LTMP
C1:PSL-126MOPA_DMON
C1:PSL-126MOPA_LMON
C1:PSL-126MOPA_CURMON
C1:PSL-126MOPA_DTEC
C1:PSL-126MOPA_LTEC
C1:PSL-126MOPA_CURMON2
C1:PSL-126MOPA_HTEMP
C1:PSL-126MOPA_HTEMPSET
C1:PSL-FSS_RFPDDC
C1:PSL-FSS_LODET
C1:PSL-FSS_FAST
C1:PSL-FSS_PCDRIVE
C1:PSL-FSS_MODET
C1:PSL-FSS_VCODETPWR
C1:PSL-FSS_TIDALOUT
C1:PSL-PMC_RFPDDC
C1:PSL-PMC_LODET
C1:PSL-PMC_PZT
C1:PSL-PMC_MODET


3123 - ADC (failed)

C1:PSL-126MOPA_AMPMON
C1:PSL-126MOPA_126MON
C1:PSL-FSS_RCTRANSPD
C1:PSL-FSS_MINCOMEAS
C1:PSL-FSS_RMTEMP
C1:PSL-FSS_RCTEMP
C1:PSL-FSS_MIXERM
C1:PSL-FSS_SLOWM
C1:PSL-FSS_TIDALINPUT
C1:PSL-PMC_PMCTRANSPD
C1:PSL-PMC_PMCERR
C1:PSL-PPKTP_TEMP


4116 - DAC

C1:PSL-126MOPA_126CURADJ
C1:PSL-126MOPA_DCAMP
C1:PSL-126MOPA_DCAMP-
C1:PSL-FSS_INOFFSET
C1:PSL-FSS_MGAIN
C1:PSL-FSS_FASTGAIN
C1:PSL-FSS_PHCON
C1:PSL-FSS_RFADJ
C1:PSL-FSS_SLOWDC
C1:PSL-FSS_VCOMODLEVEL
C1:PSL-FSS_TIDAL
C1:PSL-FSS_TIDALSET
C1:PSL-PMC_GAIN
C1:PSL-PMC_INOFFSET
C1:PSL-PMC_PHCON
C1:PSL-PMC_RFADJ
C1:PSL-PMC_RAMP


XVME-210 - Binary Input

C1:PSL-126MOPA_FAULT
C1:PSL-126MOPA_INTERLOCK
C1:PSL-126MOPA_SHUTTER
C1:PSL-126MOPA_126LASE
C1:PSL-126MOPA_AMPON


XVME-220 - Binary Output

C1:PSL-126MOPA_126NE
C1:PSL-126MOPA_126STANDBY
C1:PSL-126MOPA_SHUTOPENEX
C1:PSL-126MOPA_STANDBY
C1:PSL-FSS_SW1
C1:PSL-FSS_SW2
C1:PSL-FSS_FASTSWEEP
C1:PSL-FSS_PHFLIP
C1:PSL-FSS_VCOTESTSW
C1:PSL-FSS_VCOWIDESW
C1:PSL-PMC_SW1
C1:PSL-PMC_SW2
C1:PSL-PMC_PHFLIP
C1:PSL-PMC_BLANK

  15253   Wed Mar 4 22:38:31 2020 JonUpdatePSLc1psl communications problem resolved

I investigated the problem reported earlier today with the BIO1 channels. By logging the systemd messages generated when the IOC starts, I was immediately able to determine that the problem was not limited to BIO1. The modbus communications were failing for several other units as well.

Because some in-situ rewiring of a handful of channels had recently been done (more on this soon), I initially suspected that one of the Acromags had been damaged in the process. However, removing BIO1 (or other non-communicating modules) did not restore communications with the rest of the modules. To test whether the chassis was the source of the problem at all, we set up a fresh ADC (new out of the package) and directly connected it to the secondary Ethernet interface of c1psl. With only the one new ADC connected, the modbus IOC failed in exactly the same way.

To confirm that the new ADC did in fact work, we connected it to c1auxex in the same configuration. The unit worked fine connected to c1auxex. This established that the source of the problem was the c1psl host. After some extensive debugging, I traced the problem to a pre-execution script (part of the modbus IOC systemd service) which resets the secondary network interface (the one connected to the Acromag chassis) prior to launching the IOC. This was to ensure the secondary interface always had the correct IP address. It appears this reset was somehow creating a race condition that allowed the modbus initializations (first communications with the Acromags) to sometimes start before the network interface had actually come back up.

I still don't understand how this was happening, or why the pre script worked just fine up until yesterday, but eliminating the network interface reset fixes the problem in 100% of the trials we ran. Unfortunately we lost the entire day to debugging this problem, so the final round of testing is still to be completed. We plan to pick it back up tomorrow afternoon.
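The fix adopted above was simply to drop the interface reset. If a reset were ever needed again, one way to avoid this kind of race would be to block until the interface reports that it is up before launching the IOC. The sketch below is hypothetical and is not what runs on c1psl: the helper names are invented, and the sysfs check is Linux-specific.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.2):
    """Poll predicate() until it returns True or timeout seconds elapse.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def interface_is_up(name):
    """Linux only: report whether sysfs lists the interface as 'up'."""
    try:
        with open(f"/sys/class/net/{name}/operstate") as f:
            return f.read().strip() == "up"
    except OSError:
        # Interface does not exist (or sysfs is unavailable)
        return False
```

A pre-execution step could then call something like `wait_until(lambda: interface_is_up("eth1"))`, where "eth1" is a placeholder for the secondary interface name, and only start the modbus IOC once it returns True.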

  14817   Tue Jul 30 09:13:31 2019 gautamUpdatePSLc1psl keyed, Agilent setup cleared
  1. IMC would not lock. c1psl EPICS channels were unresponsive. I keyed the crate and went through the usual burtrestore/PMC-relocking dance.
  2. While at 1X2, I decided to take this opportunity to clean up the AG4395 setup that has been sitting there unused for several weeks now.
    • Unplugged the active probe connected via BNC-T connector to the mixer IF output.
    • Noticed that the active probe (S/N 2850J01450) did not have its power connection connected. According to the manual, this is bad. I don't know if the probe is damaged or not.
    • Moved the AG4395 cart out of the way so that there is a little more room around 1X1/1X2.