The output stage (and AO GAIN stage) of the MC board was modelled. The transfer function was measured with the injection from EXC B. The denominator was TESTB2, and the numerator was SERVO OUT.
This stage is AC coupled by 2x 1st order HPFs. Firstly, this transfer function was measured with AO GAIN set to be 0dB. (Attachment 1)
This TF was used to characterize the cutoffs of the HPF stages, represented as the following ZPK:
Then the AO GAIN was already measured as seen in [ELOG 14948]. The AO gain TF was then modeled by LISO with the above HPF as the preset. This allows us to characterize the time delay of the AO GAIN part.
Input referred offsets on the IN1/IN2 were tested with different gain settings. The two inputs were plugged by the 50 ohm terminators. The output was monitored at OUT1 (SLOW Length Output). The fast path is AC coupled and has no sensitivity to the offset.
There is the EPICS monitor point for OUT1. With the multimeter it was confirmed that the EPICS monitor (C1:LSC-CM_REFL1_GAIN) has the right value except for the opposite sign because the output stage of OUT1 is inverting. The previous stages have no sign inversion. Therefore, the numbers below does not compensate the sign inversion.
Attachment 1 shows the output offset observed at C1:LSC-CM_REFL1_GAIN. There is some gain variation, but it is around the constant offset of ~26mV. This suggested that the most of the offset is not from the gain stages but from the later stages (like the boost stages). Note that the boost stages were turned off during the measurements.
Attachment 2 shows the input refered offset naively calculated from the above output offset. In dependent from which path was used, the offset with low gain was hugely enhanced.
Since the input referred offset without subtracting the static offset seemed useless, a constant offset of -26mV was subtracted from the calculation (Attachment 2). This shows that the input refered offset can go up to ~+/-20mV when the gain is up to -16dB. Above that, the offset is mV level.
I don't think this level of offset by whichever OP27 or AD829 becomes an issue when the input error signal is the order of a volt.
This suggests that it is more important to properly set the internal offset cancellation as well as to keep the gain setting to be high.
Updated Circuit Diagram and photos: https://dcc.ligo.org/D1500308-v2
- (1) and (6) of the diagram: TFs with various gain slider values for REFL1/REFL2/AO GAIN [ELOG 14948] (gain values and time delay modeling)
- Switching checks, latest photo of the board, Limiter check [ELOG 14953]
- (2): Boost transfer functions [ELOG 14955]
- (3): Slow (aka Length) CM output path [ELOG 14965]
- (4): Pole-Zero filter TF [ELOG 14965]
- (5): TF from TESTA2 to TESTB2 [ELOG 14966]
- (6): AC coupling TF of the AO GAIN stage [ELOG 14967]
- (7): AC coupling TF of the IN2 stage on IMC servo board [ELOG 15044]
Slow path = (1)*(2 if necessary)*(3)*(4 if necessary)
Fast path = (1)*(2 if necessary)*(4 if necessary)*(5)*(6)
gautam 20191122: Adding the measured AC coupling of the IN2 input of the IMC servo board for completeness.
I wanted to restart the c1oaf model. As usual, the first time the model was restarted, it came back online with a 0x2bad error. This isn't even listed in the diagnostics manual as one of the recognized error states (unless there is a typo and they mean 0x2bad when they say 0xbad). The fix that has worked for me is to stop and start the model again, but of course, there is some chance of taking all the vertex FEs down in the process. No permutation of mxstream and daqd process restarts have cleared this error. We need some CDS/RCG support to look into this issue and fix it, it is not reasonable to go through reboots of all the vertex FEs every time we want to make a model change.
Jon and I were surveying the CDS situation so that he can prepare a report for discussion with Rolf/Rich about our upcoming BHD upgrade. In our poking around, we must have bumped something somewhere because the c1ioo machine went offline, and consequently, took all the vertex models out. I rebooted everything with the reboot script, everything seems to have come back smoothly. I took this opportunity to install some saturation counters for the arm servos, as we have for the CARM/DARM loops, because I want to use these for a watch script that catches when the ALS loses lock and shuts stuff off before kicking optics around needlessly. See Attachment #1 for my changes.
I have been experiencing frequent crashes of DTT on pianosa in the past few weeks. This is pretty annoying to deal with when trying to characterize the interferometer loops. I attach the error log dumped to console. The error has to do with some kind of memory corruption. Recall that we aren't using a GDS version that is packaged with the SL7 lscsoft packages, we are using a pretty ancient (2.15) version that is built from source. I have been unable to build a newer version from source (though I didn't spend much time trying). pianosa is the only usable workstation at the moment, but perhaps someone can make this work on donatella / rossa for general improvement in quality of life.
After the CDSs crashed we run the rebootC1LSC.sh script.
The script is a bit annoying in that it requires entering the CDSs' passwords multiple times over the time it runs which is long.
The resulting CDS screen is a bit different than what was reported before (attached). Also, not all watchdogs were restored.
We restore the remaining watchdogs and do XARM locking. Everything seems to be fine.
It was way more annoying without a script and took longer than the 4 minutes it does now.
You can fix the requirement to enter password by changing the sshd settings on the FEs like I did for pianosa.
After running the script, you should verify that there are no red flags in the output to console. Yesterday, some of the settings the script was supposed to reset weren't correctly reset, possibly due to python/EPICS problems on donatella, and this cost me an hour of searching last night because the locking wasn't working. Anyway, best practise is to not crash the FEs.
I tried starting the c1oaf model, but got a DQ error (I want the option of running feedforward during locking even if the filters aren't particularly well tuned yet). Note that this isn't "just a warning light" - some channels are initialized to +/- 1e20, so if you try turning some filters on, you will deliver a massive kick to the optics. Restarting it crashed c1lsc (this is not unexpected behavior - the only way to clear the DQ error is to restart the model, and empirically, the success rate is ~50%). The reboot script brought everything back online smoothly, and the second, time, c1oaf started without any issues.
While looking at the CDS overview screen, I noticed that the c1scy model was reporting frequent RFM errors for the C1:SCY-RFM_ETMY_LSC channel (but none of the others). On the sender model (c1rfm), no errors were being reported. The diag reset button / mxstream restart didn't really work either. See Attachment #1. Just restarting the c1scy model didn't fix the error - I had to reboot the machine and restart the models, and now no errors are being reported.
Attachment #2 shows the current nominal CDS status - the red light on c1lsc is due to some missing c1dnn channels (I'll remove these at the next c1lsc model change because I don't want to un-necessarily reboot the vertex FEs), and the c1omc model is obsolete I guess. c1daf isn't running right now but once I get the new fiber (ordered), I'm gonna restart this model as well.
P.S. The ALS temperature sliders are not SDF-ed. So when the model was restarted, I had to change the sliders back to their old values to get the beat back in the usable range.
Every new year (on Dec 31 or Jan 1), all of the realtime models will report a "0x4000" error. This happens due to an offset to the GPStime driver not being updated. Here is how this can be fixed (slightly modified version of what was done at LASTI).
Steps to fix the DC errors:
/* 2019 had 365 days and no leap seconds */
pHardware->gpsOffset += 31536000;
/* 2019 had 365 days and no leap seconds */
pHardware->gpsOffset += 31536000;
sudo make install
sudo systemctl daqd_* stop
sudo modprobe -r symmetricom
sudo modprobe symmetricom
sudo service daqd_* start
Independent of this, there is a 1 second offset between the gpstimes reported by /proc/gps and gpstime. However, this doesn't seem to drift. We had effected a static offset to correct for this in the daqd config files, and it looks like these do not need to be updated on a yearly basis. All the daqd indicators are now green, see Attachment #1.
Seems that the GPS is out of sync on donatella. We could not get any data from diaggui...
$TARGET_DIR = /cvs/cds/caltech/target
It remains to (Jon is taking care of these)
Channel list with test status
== Test Status ==
[done] Lock PMC and IMC
[done] IMC Servo board test
[done] IMC LO Det Mon channel check
[0th order] WFS quadrant DC mon
[none] WFS I/F monitors
[0th order] WFS attenuators
[none] IOO QPD channels
[done] FSS readbacks
[done] PMC readbacks
Some more detailed elogs about the individual tests will follow.
Basically, I have characterized the IMC Servo board in detail. The summary finding is that the IN2 (=AO gain) slider needs to be investigated.
All other channels need to be verified in a more thorough fashion than my basic checks which were just to guarantee the core interferometer functionality, which is important to me.
Had to reboot both end machines and the c1rfm model to get the TRX and TRY signals to the LSC models. Now both arms can be locked using POX/POY respectively.
There was some work done on the Acro crate this morning. Unclear if this is independent, but I found that the IMC servo board IN1 slider doesn't respond anymore, even though I had tested it and verified it to be working. Patient debugging showed that BIO1 (and only that acromag unit with the static IP 192.168.114.61) doesn't show up on the subnet in c1psl. Hopefully it's just a loose network cable, if not we will switch out the unit in the afternoon.
Jon is going to make a python script which iteratively pings all devices on the subnet and we will put this info on an MEDM screen to catch this kind of silent failure.
To debug a problem with the new c1psl (later elog), we needed a Supermicro EPICS server that was using the shared EPICS/modbus/asyn binaries rather than a local install. Of those available in the lab (c1iscaux, c1vac, c1susaux being the others), this was the only one which uses the shared install. So I
At which point Jon reset the software end, I restored the slow bias voltage and re-enabled the local damping. The optic seems to have damped okay. The Oplev spot is back in ~center of the QPD and the green beam can be locked to a TEM00 mode (so the alignment is okay - the IR beam is unavailable while c1psl issues are being sorted but I judge that things are back to the nominal state now).
I have made a wiring + channel list that need to be included in the new C!AUXEY Acromag.
It was mostly copied from C1AUXEX.
I ignored the IPANG channels since it is going to be removed from the table.
I'd like to re-measure the transfer function from driving MC2 position to the MC_L_DQ channel (for feedforward purposes). Swept sine would be one option, but I can't get the "Envelope" feature of DTT to work, the excitation amplitude isn't getting scaled as specified in the envelope, and so I'm unable to make the measurement near 1 Hz (which is where the FF is effective). I see some scattered mentions of such an issue in past elogs but no mention of a fix (I also feel like I have gotten the envelope function to work for some other loop measurement templates). So then I thought I'd try broadband noise injection, since that seems to have been the approach followed in the past. Again, the noise injection needs to be shaped around ~1 Hz to avoid knocking the IMC out of lock, but I can't get Foton to do shaped noise injections because it doesn't inherit the sample rate when launched from inside DTT/awggui - this is not a new issue, does anyone know the fix?
Note that we are using the gds2.15 install of foton, but the pre-packaged foton that comes with the SL7 installation doesn't work either.
The envelope feature for swept-sine wasn't working because i specified the frequency grid in the wrong order apparently. Eric von Reis has been notified to include a sorting algorithm in future DTT so that this can be in arbitrary order. fixing that allows me to run a swept sine with enveloped excitation amplitude and hence get the TF I want, but still no shaped noise injections via foton 😢
do you really mean awggui cannot make shaped noise injections via its foton text box ? That has always worked for me in the past.
If this is broken I'm suspicious there's been some package installs to the shared dirs by someone.
The problem is that foton does not inherit the model sample rate when launched from DTT/awggui. This is likely some shared/linked/dynamic library issue, the binaries we are running are precompiled presumably for some other OS. I've never gotten this to work since we changed to SL7 (but I did use it successfully in 2017 with the Ubuntu12 install).
I used Yehonathan's wiring assignments to lay the rest of groundwork for the final slow controls machine upgrade, c1auxey. Actions completed:
The "1" will be dropped after the new system is permanently installed.
Hardware-wise, this system will require:
I know that we do have these quantities left on hand. The next steps are to set up the Supermicro host and begin assembling the Acromag chassis. Both of these activities require an in-person presence, so I think this is as far as we can advance this project for now.
We want to migrate the end shutter controls from c1aux to the end acromags. Could you include them to the list if not yet?
This will let us remove c1aux from the rack, I believe.
Yehonathan's list does include C1:AUX-GREEN_Y_Shutter and I copied its definition from /cvs/cds/caltech/target/c1aux/ShutterInterlock.db into the new ETMYaux.db file.
I noticed ShutterInterlock.db still contains about a dozen channels. Some of them appear to be ghosts (like the C1:AUX-PSL_Shutter[...] set, which has since become C1:PSL-PSL_Shutter[...] hosted on c1psl) but others like C1:AUX-GREEN_X_Shutter appear to still be in active use.
Around 5pm local time, the three vertex FEs crashed. AFAIK, no one was in the lab or working on anything CDS related, so this is worrying.
The machine needed a hard reboot as it was un-ssh-able.
The exact time that the machine went down is unknown because the blinkys were not DQ-ed. I've now added these to the EDCU to make these channels actually useful, and we may look back on the reliability (or otherwise) of the Acromag system. To my memory, this is the ~5th time one of the new Acromag servers has needed a hard reboot. While this may be less frequent (?) than the VME machines, perhaps there is some other reason for these dropouts. Maybe something to do with the martian network?
Anyway the machine is back up and running now.
Here is the procedure for setting up the three new BHD front-ends (c1bhd, c1sus2, c1ioo - replacement). This plan is based on technical advice from Rolf Bork and Keith Thorne.
The overall topology for each machine is shown here. As all our existing front-ends use (obsolete) Dolphin PCIe Gen1 cards for IPC, we have elected to re-use Dolphin Gen1 cards removed from the sites. Different PCIe generations of Dolphin cards cannot be mixed, so the only alternative would be to upgrade every 40m machine. However the drivers for these Gen1 Dolphin cards were last updated in 2016. Consequently, they do not support the latest Linux kernel (4.x) which forces us to install a near-obsolete OS for compatibility (Debian 8).
See Attachment #1. J8 was connected to a "LASTI timing slave" sitting in the rack that Chiara lives in - we don't use this for anything and I confirmed that there was no effect on the RTCDS when I pulled that fiber out. The LASTI timing slave also had a blinky that was blinking when the fiber was plugged in - which I take to believe that the slot works.
Can we get away with just using these two available slots, J8 and J13? Do we really need three new expansion chassis?
I believe we will use two new chassis at most. We'll replace c1ioo from Sun to Supermicro, but we recycle the existing timing system.
That's great. I wonder if we can also get away with not adding new Dolphin infrastructure. I'd really like to avoid changing any IPC drivers.
The new dolphin eventually helps us. But the installation is an invasive change to the existing system and should be done at the installation stage of the 40m BHD.
I added the EPCIS channels for the c1omc model (gains, matrix elements etc) to the autoburt such that we have a record of these, since we expect these models to be running somewhat regularly now, and I also expect many CDS crashes.
That's great news we won't have to worry about a new timing fanout for the two new machines, c1bhd and c1sus2. And there's no plan to change Dolphin IPC drivers. The plan is only to install the same (older) version of the driver on the two new machines and plug into free slots in the existing switch.
I edited /diskless/root.jessie/home/controls/.bashrc so that I don't have to keep doing this every time I do a model recompile.
Where is this variable set and how can I add the new paths to it?
I had to make a CDS change to the c1lsc model in an effort to get a few more signals into the models. Rather than risk requiring hard reboots (typcially my experience if I try to restart a model), I opted for the more deterministic scripted reboot, at the expense of spending ~20mins to get everything back up and running.
Update 2230: this was more complicated than expected - a nuclear reboot was necessary but now everything is back online and functioning as expected. While all the CDS indicators were green when I wrote this up at ~1800, the c1sus model was having frequent CPU overflows (execution time > 60 us). Not sure why this happened, or why a hard power reboot of everything fixed it, but I'm not delving into this.
The point of all this was that I can now simultaneously digitize 4 channels - 2 DCPDs, and 2 demodulated quadratures of an RF signal.
Attachment #1 shows that the c1rfm model isn't able to receive any signals from the front end machines at EX and EY. Attachment #2 shows that the problem appears to have started at ~430am today morning - I certainly wasn't doing anything with the IFO at that time.
I don't know what kind of error this is - what does it mean that the receiving model shows errors but the sender shows no errors? It is not a new kind of error, and the solution in the past has been a series of model reboots, but it'd be nice if we could fix such issues because it eats up a lot of time to reboot all the vertex machines. There is no diagnostic information available in all the places I looked. I'll ask the CDS group for help, but I'm not sure if they'll have anything useful since this RFM technology has been retired at the sites (?).
In the meantime, arm cavity locking in the usual way isn't possible since we don't have the trigger signals from the arm cavity transmission.
Update 1500 4 Oct: soft reboots of models didn't do the trick so I had to resort to hard reboots of all FEs/expansion chassis. Now the signals seem to be okay.
I'm starting the model restarts from remote. Then later I'll show up in the lab to do more hard resets.
==> It seems that the RFM errors are gone. Here are the steps.
I am working on the setup of a CDS FE, so please do not attempt any remote login to the IPMI interface of c1bhd until I'm done.
I was able to boot one of the 3 new Supermicro machines, which I christened c1bhd, in a diskless way (with the boot image hosted on fb, as is the case for all the other realtime FEs in the lab). This is just a first test, but it is reassuring that we can get this custom linux kernel to boot on the new hardware. Some errors about dolphin drivers are thrown at startup but this is to be expected since the server isn't connected to the dolphin network yet. We have the Dolphin adaptor card in hand, but since we have to get another PCIe card (supposedly from LLO according to the BHD spreadsheet), I defer installing this in the server chassis until we have all the necessary hardware on hand.
I also have to figure out the correct BIOS settings for this to really run effectively as a FE (we have to disable all the "un-necessary" system level services) - these machines have BIOS v3.2 as opposed to the older vintages for which there are instructions from K.T. et al.
There may yet be issues with drivers, but this is all the testing that can be done without getting an expansion chassis. After the vent and recovering the IFO, I may try experimenting with the c1ioo chassis, but I'd much prefer if we can do the testing offline on a subnet that doesn't mess with the regular IFO operation (until we need to test the IPC).
As discussed at the meeting, I commenced the recovery of the CDS status at 1750 local time.
Single arm POX/POY locking was checked, but not much more. Our IMC WFS are still out of service so I hand aligned the IMC a bit, IMC REFL DC went from ~0.3 to ~0.12, which is the usual nominal level.
I suspect what happened here is that the IP didn't get updated when we went from the 131.215.113.xxx system to 192.168.113.xxx system. I fixed it now and can access the web interface. This system is now ready for remote debugging (from inside the martian network obviously). The IP is 192.168.113.90.
Managed to pull this operation off without crashing the RFM network, phew.
BTW, a windows laptop that used to be in the VEA (I last remember it being on the table near MC2 which was cleared sometime to hold the spare suspensions) is missing. Anyone know where this is ?
As I was working on the IFO re-alignment just now, the rfm errors popped up again. I don't see any useful diagnostics on the web interface.
Do we want to take this opportunity to configure jumpers and set up the rogue master as Rolf suggested? Of course there's no guarantee that will fix anything, and may possibly make it impossible to recover the current state...
Attached is the layout for the "intermediate" CDS upgrade option, as was discussed on Wednesday. Under this plan:
Existing FEs stay where they are (they are not moved to a single rack)
Dolphin IPC remains PCIe Gen 1
RFM network is entirely replaced with Dolphin IPC
Please send me any omissions or corrections to the layout.
I just want to point out that if you move all the FEs to the same rack they can all be connected to the Dolphin switch via copper, and you would only have to string a single fiber to every IO rack, rather than the multiple now (for network, dolphin, timing, etc.).
The CDS model change required to get the AS WFS signals into the RTCDS system are rather invasive.
In terms of computational load, the c1ioo model seems to be able to handle the extra load no issues - ~35us/60us per cycle. The RFM model shows no extra computational time.
After this work, the IMC locking and POX/POY locking, and dither alignment servos are working okay. So I have some confidence that my invasive work hasn't completely destroyed everything. There is some hardware around the rear of 1X2 that I will clear tomorrow.
Koji fixed the problematic channel - the issue was a bad solder joint on the input resistors to the THS4131. The board was re-installed. I also made a custom 2x4-pin LEMO-->DB9 cable, so we are now recording the PMC and FSS ERR/CTRL channel diagnostics again (spectra tomorrow). Note that Ch32 is recording some sort of DuoTone signal and so is not usable. This is due to a misconfiguration - ADC0 CH31 is the one which is supposed to be reserved for this timing signal, and not ADC1 as we currently have. When we swap the c1ioo hosts, we should fix this issue.
I also did most of the work to make the MEDM screens for the revised ASC topology, tried to mirror the site screens where possible. The overview screen remains to be done. I also loaded the anti-whitening filters (z:p 150:15) at the demodulated WFS input signal entry points. We don't have remote whitening switching capability at this time, so I'll test the switching manually at some point.
The main issue is that in the AA chassis I built, Ch14 (with the first channel as Ch1) has the output saturated to 28V (differential). I'm not sure what kind of overvoltage protection the ADC has - we frequently have the inputs exceed the spec'd +/-20 V (e.g. when the whitening filters are engaged and the cavity is fringing), but pending further investigation, I am removing the SCSI connection from the rear of the AA chassis.
Last night, I briefly spoke with Koji about the CDS upgrade plan. This is a follow up.
Each server needs a minimum of two peripheral devices added to the PCIe bus:
As for the second issue, the main question is if we had an open PCIe slot on the c1iscex machine to install a Dolphin card. Looks like the 2 standard slots are taken (see Attachment #1), but a "low profile" slot is available. I can't find what the exact models of the Supermicro servers installed back in 2010 are, but maybe it's this one? It's a good match visually anyways. The manual says a "riser card" is required. I don't know if such a riser is already installed.
Questions I have, Rolf is probably the best person to answer:
I don't have omnigraffle - what about uploading the source doc in a format that the excellent (and free) draw.io can handle? I think we can do a much better job of making this diagram reflect reality. There should also be a corresponding diagram for the Acromag system (but that doesn't have to be tied to this task). Megatron (scripts machine) and nodus should be added to that diagram as well.
I used an Acromag XT1221 in CTN to play around with different wiring and see what works. Following are my findings:
Floating Single Ended Source (Attachment 2):
Differential Source (Attachment 3):
Comments and suggestions are welcome.
Related elog posts:
Edit Tue Jan 26 12:44:19 2021 :
Note that the third wiring diagram mentioned actually does not work. It is an error in judgement. See 40m/15762 for seeing what happens during this.
Thanks for the systematic effort.
I'm working on a better wiring diagram that takes into account multiple power supplies, how their GND is passed forward to the circuits or sensors using those power supplies and what possible wiring configurations on Acromag would give low noise. I think I have two configurations in mind which I will test and update here with data and better diagrams.
I took some striptool images earlier yesterday. So I'm dumping them here for further comments or inferences.