Currently the c1scy, c1mcs, and c1rfm models are reporting an error with receiving some data sent over the GE Fanuc Reflected memory cards.
To be more exact, the C1:SUS-ETMY_ALS signal from the c1gcv FE code on the c1ioo computer going to the Y end is not being received. However, the C1:SUS-ETMY_LSC signal is. So the physical RFM card seems to be working.
Similarly, the TRY signal is being sent correctly from the Y end computer. The X end is working fine and receiving both LSC and ALS signals.
The c1mcs and c1rfm models also receive data from the c1ioo computer and are reporting receive errors.
Because the RFM cards are transmitting and receiving at least some channels, I'm guessing there were changes made to the C1.ipc file, which defines the memory locations of these various channels on the RFM network, and that when one model was rebuilt, another model using the previous IPC file was not, and thus one of the computers is going to the wrong place to either read or write data.
Tomorrow, I'm planning on the following:
1) Clean out the C1.ipc file (/opt/rtcds/caltech/c1/chans/ipc/)
2) Rebuild all models
3) Run activate_daq.py script
4) Restart models via script
If this doesn't clear up the problem, I'll continue to bug hunt.
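Before cleaning out the file, it may help to check which channels actually moved between two versions of C1.ipc. The following is a minimal sketch, assuming the file is INI-like with one section per channel. The key names ipcType and ipcNum are the ones used elsewhere in this log; the two snapshots, the channel slot numbers, and the helper names are made up for illustration.

```python
import configparser

# Hypothetical "before" and "after" snapshots of C1.ipc (illustrative values only).
OLD_IPC = """
[C1:SUS-ETMY_LSC]
ipcType=RFM
ipcNum=0

[C1:SUS-ETMY_ALS]
ipcType=RFM
ipcNum=1
"""

NEW_IPC = """
[C1:SUS-ETMY_LSC]
ipcType=RFM
ipcNum=0

[C1:SUS-ETMY_ALS]
ipcType=RFM
ipcNum=2
"""


def load_slots(text):
    """Map each channel name to its (ipcType, ipcNum) slot."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the key case as written in the file
    parser.read_string(text)
    return {
        channel: (parser[channel]["ipcType"], parser[channel].getint("ipcNum"))
        for channel in parser.sections()
    }


def moved_channels(old_text, new_text):
    """Return channels present in both snapshots whose slot changed."""
    old, new = load_slots(old_text), load_slots(new_text)
    return {
        channel: (old[channel], new[channel])
        for channel in old
        if channel in new and old[channel] != new[channel]
    }


moved = moved_channels(OLD_IPC, NEW_IPC)
for channel, (before, after) in moved.items():
    print(f"{channel}: {before} -> {after}")
```

Any channel listed here would be read from or written to the wrong location by a model that was built against the older file and not rebuilt.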
This problem resurfaced, which I noticed when I couldn't get the single arm locks going.
The fix was NOT restarting the c1rfm model, which just brought the misery of all vertex FEs crashing and the usual dance to get everything back.
Restarting the sender models (i.e. c1scx and c1scy) seems to have done the trick though.
Attachment #1 shows that the c1rfm model isn't able to receive any signals from the front end machines at EX and EY. Attachment #2 shows that the problem appears to have started at ~4:30 am this morning - I certainly wasn't doing anything with the IFO at that time.
I don't know what kind of error this is - what does it mean that the receiving model shows errors but the sender shows no errors? It is not a new kind of error, and the solution in the past has been a series of model reboots, but it'd be nice if we could fix such issues because it eats up a lot of time to reboot all the vertex machines. There is no diagnostic information available in all the places I looked. I'll ask the CDS group for help, but I'm not sure if they'll have anything useful since this RFM technology has been retired at the sites (?).
In the meantime, arm cavity locking in the usual way isn't possible since we don't have the trigger signals from the arm cavity transmission.
Update 1500 4 Oct: soft reboots of models didn't do the trick so I had to resort to hard reboots of all FEs/expansion chassis. Now the signals seem to be okay.
I'm starting the model restarts from remote. Then later I'll show up in the lab to do more hard resets.
==> It seems that the RFM errors are gone. Here are the steps.
As I was working on the IFO re-alignment just now, the rfm errors popped up again. I don't see any useful diagnostics on the web interface.
Do we want to take this opportunity to configure jumpers and set up the rogue master as Rolf suggested? Of course there's no guarantee that will fix anything, and may possibly make it impossible to recover the current state...
Most of the RFM went red this morning. I took the nuclear option and it seemed to be recovered.
We have too much crap in the rfm model. CPU time for the rfm model is regularly above 60us, and sometimes in the mid-70's (but sometimes jumps down briefly to ~47us, which is where I think it "used" to sit, but I don't remember when I last thought about that number)
This is potentially causing lots of asynchronous grief.
I noticed yesterday evening that I wasn't able to engage the single arm locking servos - it turned out that they weren't getting triggered, which in turn pointed me to the fact that the arm transmission channels seemed dead. Poking around a little, I found that there was a red light on the CDS overview screen for c1rfm.
Not sure how to debug further...
* Fix seems to be to restart the sender RFM models (c1scx, c1scy, c1asx, c1asy).
I wanted to lock the single arm POX/POY config to do some tests on the BeatMouth. But I was unable to.
Not sure what to make of all this, but I can lock the arms now.
There were red lights on the status screen indicating RFM errors for the c1scy, c1mcs and c1rfm processes.
The c1iscey, c1sus machines were receiving data sent over the RFM network from the c1ioo computer with a bad time stamp, a few cycles too late. The c1iscex computer was receiving data from c1ioo fine.
The c1iscex RFM card had gotten into a bad state and was somehow slowing things down/corrupting data. It didn't affect itself, but due to the loop topology was messing everyone else up. Basically the only one who wasn't throwing an error was the culprit.
Hard power cycling the c1iscex computer reset the RFM card and fixed the problem.
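The reasoning above (the one node not reporting errors is the suspect) can be sketched as a walk around the loop. This is only an illustration: the ring order below is an assumption, not the actual fiber routing, and the function name is hypothetical.

```python
def suspect_upstream_card(ring_order, sender, bad_receivers):
    """Walk the ring downstream from the sender and return the last node
    that still received good data before the first node reporting errors."""
    start = ring_order.index(sender)
    count = len(ring_order)
    previous = sender
    for step in range(1, count):
        node = ring_order[(start + step) % count]
        if node in bad_receivers:
            return previous
        previous = node
    return None  # nobody downstream reported errors


# Assumed ring order, for illustration only.
ring = ["c1ioo", "c1iscex", "c1iscey", "c1sus"]
suspect = suspect_upstream_card(ring, "c1ioo", {"c1iscey", "c1sus"})
print(suspect)
```

With this assumed order, data from c1ioo reaches c1iscex intact and is bad everywhere after it, which points at the c1iscex card.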
As noted by Steve, the RFM network was down this morning. I noticed that the c1susvme1 sync counter was pegged at 16384, so I decided to start with reboots in that vicinity.
After power cycling crates containing c1sosvme, c1susvme1, and c1susvme2 (since the reset buttons didn't work) only c1sosvme and c1susvme2 came back normally. I hooked up a monitor and keyboard to c1susvme1, but saw nothing. I power cycled the c1susvme crate again, and this time I watched it boot properly. I'm not sure why it failed the first time.
The RFM network is now operating normally. I have re-enabled the watchdogs after having turned them off for the reboots. Steve and I also re-enabled the ITMY coil drivers when I noticed them not damping once the watchdogs were re-enabled. The manual switches had been set to disabled, so we re-enabled them.
I am not so happy with the control signals that are coming into the OAF via the RFM/Dolphin/shmem.
The MCL/MCF signal travels via RFM from the IOO computer to the RFM model on the SUS computer, and then via dolphin to the OAF model on the LSC computer.
The MICH and PRCL signals travel via shmem from the LSC model to the OAF model, all on the LSC computer. They don't go through the RFM model.
The seismometer channels travel via shmem between the PEM model on the SUS computer and the RFM model on the SUS computer, and then via dolphin between the SUS computer and the OAF model on the LSC computer.
Each pdf shows the power spectrum and a time series of the signals in their "original" model, and in the OAF model. The seismometer is the only one that seems fine. The time series match, except for a delay which is not surprising, since the signals have to travel. The other signals seem pretty distorted. What is going on??? Why can we trust some, but not all, of the signals that move between models and between computers???
(This data was all taken while the MC was locked, but MICH and PRCL were not. I don't think this should have any effect on the signal transfer though).
The MCL isn't soooo bad, so maybe we can keep moving forward with it, but I'm concerned that we're not really going to be successful OAF-ing the other degrees of freedom if the signals are so distorted.
Each RFM memory location which needs to be read by a front end model slows the model significantly.
With no RFM memory locations to be read (replaced with grounds), the c1mcs model runs around 25 microseconds per cycle.
With 1 RFM memory location (MC_L), it runs around 29-33 microseconds.
With 3 RFM memory locations (MC_L, MC1_PIT, MC1_YAW), it runs around 45 microseconds.
With 7 RFM memory locations, the code generally doesn't run at all, going past the 62 microsecond maximum required to be able to keep up with the 16 kHz sample rate.
Last night Yuta somehow got it running with 7 RFM memory locations, but in that case, all the odd numbered RFM channels (1,3,5 as counted by the ipc file) did not work. It was running at around 55 microseconds in that case.
The c1ioo code which is writing the data to the RFM card is experiencing no such slow down.
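As a rough check on these numbers, here is a small sketch that fits a straight line to the measured cycle times and extrapolates to 7 reads. It assumes the cost per RFM read is roughly constant, which is not established, and it uses 31 us as the midpoint of the 29-33 us range quoted for one read.

```python
SAMPLE_RATE_HZ = 16384
budget_us = 1e6 / SAMPLE_RATE_HZ  # time available per cycle, about 61 us

# (number of RFM reads, measured cycle time in us) from the numbers above.
measurements = [(0, 25.0), (1, 31.0), (3, 45.0)]


def fit_line(points):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope_us, base_us = fit_line(measurements)
predicted_7_us = base_us + 7 * slope_us
max_reads = int((budget_us - base_us) // slope_us)

print(f"cost per read ~{slope_us:.1f} us, base ~{base_us:.1f} us")
print(f"predicted with 7 reads ~{predicted_7_us:.0f} us, budget {budget_us:.1f} us")
print(f"reads that fit in the budget: {max_reads}")
```

Under that linear assumption each read costs several microseconds, and 7 reads land well past the per-cycle budget, consistent with the model failing to run.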
Current CDS status:
I suspect what happened here is that the IP didn't get updated when we went from the 131.215.113.xxx system to 192.168.113.xxx system. I fixed it now and can access the web interface. This system is now ready for remote debugging (from inside the martian network obviously). The IP is 192.168.113.90.
Managed to pull this operation off without crashing the RFM network, phew.
BTW, a windows laptop that used to be in the VEA (I last remember it being on the table near MC2 which was cleared sometime to hold the spare suspensions) is missing. Anyone know where this is ?
Kiwamu and I strung a temporary RFM fiber from the c1iscex machine (in the new 1X9 rack) to the c1sus machine (in the new 1X4 rack). This was connected into the respective RFM cards. Once we put the fiber in correctly, the status lights came on the RFM card, which is a good sign. This did not go through the RFM bypass, and did not interfere with any other RFM connections.
We created a simple model to test the RFM card, which basically was 4 RFM memory locations passing back and forth between 2 filters on each machine. These models were called c1rf0 (on c1sus) and c1rf1 (on c1iscex). We added 4 entries to the /cvs/cds/caltech/chans/ipc/C1.ipc file corresponding to the 4 RFM memory locations, set their ipcType=RFM and set the ipcRate to 65536. The ipcNum were set from 0 to 3. The models ran, however, the data we were trying to pass over the RFM card did not seem to be being passed. Currently trying to contact Alex via e-mail to get debugging advice, and confirm the ipc file is setup correctly.
Since the RFM-Dolphin bridges for the ASX model were added to the c1rfm model, c1rfm has kept timing out, exceeding the single-sample time of 60us.
The model had 19 dolphin accesses, 21 RFM accesses, and 9 shared memory (SHM) accesses.
At the beginning 2 RFM and 2 SHM accesses were moved to c1sus (i.e. they were mistakenly placed on c1rfm).
But this actually made the c1sus model time out. So the model was reverted.
The current configuration is that the WFS related bridges are accommodated in the c1mcs model.
This made the timing of c1rfm ~40us. So it is safe now.
On the other hand, the c1mcs model has the time consumption of ~59us. This is marginal now.
We need to understand why any RFM access takes such huge delay.
The RFM network is down. MC2 sus damping restored.
For the RF PD Frequency Response Measurement project, we get each PD signal from the "PD RF Mon" output of each demodulator board corresponding to our PD under test. Therefore we can't neglect the frequency response of various filters inside the demodulator board. I used our Agilent 4395 Network Analyzer to gather frequency response data for each demodulator board being considered for the RFPD frequency response project (AS55, REFL11, REFL33, REFL55, REFL165, POX11, POP22, POP110).
The NA swept over a frequency range of 1-500 MHz. Data was collected using NWAG4395A (from the netgpibdata directory). It should be noted that the command line options -a 16 -x 15 (averaging=16 and excitation amplitude=15 dBm[the max]), in addition to the usual command line options described in the help file, were used to minimize noise.
The data is located in /users/alex.cole. The file names are in the format [PDNAME]DemodFilt_1000000.dat (e.g. REFL11DemodFilt_1000000.dat). Results for POP110 are shown below.
While the vacuum system was knocked out, I measured the RF transimpedance (using the AM laser setup, didn't do the shot noise intercept current measurement for now) of all the RFPDs (except PMC REFL). At the very least, the following photodiodes are suspect:
For the remaining photodiodes, I measure a transimpedance that is within ~20% of what is on the wiki page. The notches may benefit from some retuning. While I have the data, I will fit this and post a more complete report on the wiki.
Update July 6 1145am: WFS response plots now have legends mapping quadrants, and I've also added the response of a spare PDA10CF (which is now the new POP22/POP110 photodiode).
A more comprehensive report has been uploaded here. I'll zip the data files and add them there too. In summary:
I'll upload the data and analysis notebook + liso fit files to the wiki as well shortly. The data, a Jupyter notebook making the plots, and the LISO fit files have been uploaded here.
I didn't do it this time but it'd be nice to also do the noise measurement and get an estimate for the shot-noise intercept current.
For future reference, I've taken spectra of our various RFPDs while the PRMI was sideband locked on REFL33, using a 20dB RF coupler at the RF input of the demodulator boards. The 20dB coupling loss has been added back in on the plots. Data files are attached in a zip.
I also completely removed the cabling for REFLDC -> CM board, since it doesn't look like we plan on using it anytime in the immediate future.
After some discussion with Koji, I've asked Steve to order some SBP-30+ bandpass filters as a quick and cheap way to help out REFL33. (Also some SBP-60+ for 55MHz, since we only have 1*fmod and 2*fmod bandpasses here in the lab).
The nonlinearity in the LSC detection chain (cf T050268) comes from the photodetector and not the demod board. The demod board has low pass or band pass filters which Suresh installed a long time ago (we should check out what's in REFL33 demod board).
Inside the photodetector the nonlinearity comes about because of photodiode bias modulation (aka the Grote effect) and slew rate limited distortion in the MAX4107 preamp.
In this file (under Tommy), we have a notebook which runs through a spectrum of frequencies and determines the gain response of the attached filter. Below we have the output of a high pass filter. We use IQ demodulation to bring the I and Q components down to DC. Then, using a Butterworth filter, we read out the DC components and determine the gain's magnitude and phase. However, the phase seems very noisy. This is because the oscillators in the different tiles are independent and a random phase is introduced by changing the mixer frequency in individual tiles. To resolve this we need Multi Tile Synchronization or "MTS".
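For reference, the demodulation step can be sketched in a few lines. This is not the notebook's code: the low pass here is a plain average over a whole number of cycles rather than a Butterworth filter, the signals are synthetic, and both records share one time base, which is exactly what the board does not provide without MTS. The sample rate, tone frequency, and test gain are illustrative values.

```python
import cmath
import math


def iq_demodulate(samples, freq_hz, sample_rate_hz):
    """Mix the samples with exp(-j*2*pi*f*t) and average.

    Over a whole number of cycles the average removes the 2f term and
    leaves the complex amplitude A*exp(j*phi) of the tone at freq_hz.
    """
    acc = 0j
    for k, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
    return 2 * acc / len(samples)


# Illustrative numbers: 1 MHz tone sampled at 64 MS/s, 100 full cycles.
fs = 64e6
f = 1e6
n = 6400

# Drive signal, and the same tone after a pretend filter with
# gain 0.5 and phase shift 0.3 rad.
drive = [math.cos(2 * math.pi * f * k / fs) for k in range(n)]
filtered = [0.5 * math.cos(2 * math.pi * f * k / fs + 0.3) for k in range(n)]

gain = iq_demodulate(filtered, f, fs) / iq_demodulate(drive, f, fs)
magnitude = abs(gain)
phase = cmath.phase(gain)
print(magnitude, phase)
```

If the two records had independent, unsynchronized oscillators, the recovered phase would pick up a random offset on every retune, while the magnitude would be unaffected.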
Original Pynq Support Forum Query: https://discuss.pynq.io/t/rfsoc-2x2-phase-measurement/3892
We also have the code to fit a response function using IIRregular, but this is not as useful without proper phase data.
In the "Tommy" sub folder, I created a new notebook called "SimpleToneGenerator". This tunes the DAC and ADC mixers to a single frequency and reads off the Time Series and Fourier components. We can also easily check the demodulation scheme and implement Butterworth filters to check their function.
[Paco, Chris Stoughton, Leo -- remote]
This morning Chris came over to the 40m lab to help us get the RFSoC board going. After checking out our setup, we decided to do a very basic series of checks to see if we can at least get the ADCs to run coherently (independent of the DACs). For this I borrowed the Marconi 2023B from inside the lab and set its output to 1.137 GHz, 0 dBm. Then, I plugged it into the ADC1 and just ran the usual spectrum analyzer notebook on the rfsoc jupyter lab server. Attachment #1 - 2 shows the screen captured PSDs for ADCs 0 and 1 respectively with the 1137 MHz peaks alright.
The fast ADCs are indeed reading our input signals.
Before this simple test, we actually reached out to Leo over at Fermilab for some remote assistance on building up our minimally working firmware. For this, Chris started a new Vivado project on his laptop, and realized the rfsoc 2x2 board files are not included in it by default. In order to add them, we had to go into Tools, Settings and add the 2020.1 Vivado Xilinx shop board repository path to the rfsoc2x2 v1.1 files. After a little bit of struggling, uninstalling, reinstalling them, and restarting Vivado, we managed to get into the actual overlay design. In there, with Leo's assistance, we dropped the Zynq MPSoC core (this includes the main interface drivers for the rfsoc 2x2 board). We then dropped an rf converter IP block, which we customized to use the right PLL settings. The settings, from the System Clocking tab, were changed to have a 409.6 MHz Reference Clock (default was 122.88 MHz). This was not straightforward, as the default sampling rate of 2.00 GSPS was not integer-related so we had to also update that to 4.096 GSPS. Then, we saw that the max available Clock Out option was 256 MHz (we need to be >= 409.6 MHz), so Leo suggested we drop a Clocking Wizard block to provide a 512 MHz clock input for the rfdc. The final settings are captured in Attachment #3. The Clocking Wizard was added, and configured on its Output Clocks tab to provide a Requested Output Freq of 512 MHz. The final settings of the Clocking Wizard are captured in Attachment #4. Finally, we connected the blocks as shown in Attachment #5.
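The "integer-related" condition on the sampling rate can be checked quickly. A small sketch, taking "integer-related" to mean that the sampling rate divided by the reference clock is a whole number; that reading is an assumption, and the function name is made up here.

```python
from fractions import Fraction


def is_integer_ratio(sample_rate_msps, ref_clock_mhz):
    """Return (True/False, ratio) for sample rate / reference clock.

    Values are passed as decimal strings so the ratio is exact.
    """
    ratio = Fraction(sample_rate_msps) / Fraction(ref_clock_mhz)
    return ratio.denominator == 1, ratio


default_ok, default_ratio = is_integer_ratio("2000", "409.6")
updated_ok, updated_ratio = is_integer_ratio("4096", "409.6")
print(default_ok, default_ratio)
print(updated_ok, updated_ratio)
```

With a 409.6 MHz reference, 2.00 GSPS gives a fractional ratio while 4.096 GSPS gives exactly 10.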
We will continue with this design tomorrow.
To access the board remotely through the 40m lab ethernet port, use
ssh -N -L localhost:1137:localhost:9090 xilinx@<ip_address>
Then in the browser go to
Other SSH commands using different ports or without the -N -L seemed to fail to open Jupyter. This way has been successful thereafter.
Since last week I've worked with Tommy on getting the RFSoC 2x2 board to take some TFs of simple Mini-Circuits type filters. The first thing I did was set up the board (which is in the office area) for remote access. I hooked up the TCP/IP port to a wall ethernet socket (LIGO-04) and the Caltech network assigned some IP address to our box. I guess eventually we can put this behind the lab network for internal use only.
After fiddling around with the tone-generators and spectrum analyzer tools in loopback configuration (DAC --> ADC direct connection), we noticed that lower frequency (~ 1 MHz) signals were hardly making it out/back into the board... so we looked at some of the schematics found here and saw that both RF data converters (ADC & DAC) interfaces are AC coupled through a BALUN network in the 10 - 8000 MHz band (see Attachment #1). This is in principle not great news if we want to get this board ready for audio-band DSP.
We decided that while Tommy works on measuring TFs for SHP-200 all the way up to ~ 2 GHz (which is possible with the board as is), I will design and put together an analog modulation/demodulation frontend so we can upconvert all our "slow" signals < 1 MHz for fast, wideband DSP, and demodulate them back into the audio band. The BALUN network is pictured in Attachment #2 on the board. I'm afraid it's not very simple to bypass without damaging the PCB or causing some other unwanted effect on the high-speed DSP.
Seems like it should be possible to just remove the transformer (aka a BALUN ... BALanced, UNbalanced), or replace it with a lower frequency part. It's just a usual Mini-Circuits part. Maybe you can ask Chris Stoughton about this and ask Tommy to check out some of the RFSoC user forums for how to go to DC.
Here are a few options for replacement BALUNs from Mini Circuits and specs:
Current. TCM1-83X+, 10-8000 MHz, 50 Ohms, Impedance Ratio 1, Configuration K
1. Z7550-..., DC-2500 MHz (some DC-2300), 50/75 Ohms, Impedance Ratio 1.5, Configuration Q. There are various types of the Z7550 which have different connectors (SMA and BNCs). These have much larger dimensions than the TCM1-83X. Can handle up to 5A DC current with matching loss 0.6 dB.
2. SFMP-5075+, DC-2500 MHz, 50/75 Ohms, Impedance Ratio 1.5, Configuration D. This is an SMA connected BALUN. It can handle 350mA, has a matching loss 0.4 dB, and has 1W power handling.
The Xilinx RFSoC 2x2 board arrived right before the winter break, so this is kind of an overdue elog. I unboxed it, it came with two ~15 cm SMA M-M cables, an SD card preloaded with the ARM processor and a few overlay jupyter notebooks, a two-piece AC/DC adapter (kind of like a laptop charger), and a USB 3.0 cable. I got a 1U box, lid, and assembled a prototype box to hold this board, but this need not be a permanent solution (see Attachment #1). I drilled 4 thru holes on the bottom of the box to hold the board in place. A large component exceeds the 1U height, but is thin enough to clear one of the thin slits at the top (I believe this is a fuse of some sort). Then, I found a brand new front panel, and drilled 4x 13/32 thru holes in the front for SMA F-F connectors.
I powered the board, and quickly accessed its tutorial notebooks, including a spectrum analyzer and signal generators just to quickly check it works normally. The board has 2 fast RFADCs and 2 RFDACs exposed, 12 and 14 bit respectively, running at up to 4 GSps.
We followed the manual's guide for setting up MTS to sync on external signal. In the xrfdc package, we update the RFdc class to have RunMTS, SysRefEnable, and SysRefDisable functions as prescribed on page 180 of the manual. Then, we attempted to run the new functions in the notebook and read the DAC signal outputs on an oscilloscope. The DACs were not synced. We were also unable to get FIFOlatency readings.
With some help from the forums, we printed the status of the DAC MTS sync and were able to determine that our board's Vivado design does not have MTS enabled on each tile. To fix this, we will need to construct a new Vivado design for the board. We were also warned to "make sure to generate correctly a PL_clock and a PL_sysref with your on board clock synthesizers and to capture them in the logic according to the requirements in PG269" of the RF Manual. From this we should be able to sync the DAC and ADC tiles as desired.
Finished building power spectrum analyzer for the RFSoC. There are two things that I would like to address down the road. First is that there is an oscillation between positive and negative voltages at the ADC sampling frequency. This creates an undesirable frequency component at the sampling rate. I have not yet figured out the cause of this positive to negative oscillation and have simply removed half of the samples in order to recover the frequency. Therefore, I would like to figure out the root of this oscillation and remove it. Also, we have a decimation factor of 2 as default by the board which we would like to remove but have been unable to do so.
Example: 8 MHz Square Wave from SRL signal generator.
We connected an 8 MHz signal generator to the device in order to sync up the ADCs and DACs and hopefully get phase data.
Some things to note:
Xilinx RF Manual: https://docs.xilinx.com/v/u/2.4-English/pg269-rf-data-converter
Steve pointed out that in the aftermath of the Nitrogen running out a couple of times last week, the RGA had shut itself off thinking that there was a leak and so it was not performing the scheduled scans once a day. So the data files from the scheduled scans were empty in the /opt/rtcds/caltech/c1/scripts/RGA/logs directory. The wiki page for getting it up and running again is up-to-date, but the script RGAset.py did not exist on the c0rga machine, which the RGA is communicating with via serial port. I copied over the script RGAset.py from rossa to c0rga and ran the script on that machine - but the error flags it returned were not all 0 (indicating some error according to the manual) - so I edited the script to send just the initialize command ('IN0') and commented out the other commands, after which I got error flags which were all 0. After this, I ran a manual scan using 'RGAlogger.py', and it appears that the RGA is now able to take scans again - I'm attaching a plot of the scan results. We've saved this scan as a reference to compare against after a few days.
Our last RGA scan is from February 14, 2016. We had a power outage on the 15th.
Gautom has not succeeded in resetting it. The old c0rga computer looks dead. Q may resurrect it, if he can?
Jordan and I began pumping down the RGA volume by opening V7 and VM. Afterwards, we started RP1 and RP3. After this, the pressure in the line between RP1, RP3, and V6 dropped to 3.4 mTorr. Next, we tried to open V6, but an error message popped up. We haven't been able to clear it since. But we were able to turn on TP2 with V4 closed. The pressure in that line is reporting 1.4 mTorr.
PRP on the sitemap is reporting an incorrect pressure for the line between RP1, RP3, and V6. This was verified against the pressure on the control screen and on the physical controller.
Jordan, Tega, JC
Issue has been resolved. The breaker on RP1 was tripped, so the RP1 button was reporting ON but the pump was not actually on, which continuously tripped the V6 interlock. The breaker was reset, and RP1 and RP3 were turned on. V6 was opened to rough out the RGA volume. Once the pressure was at ~100 mTorr, V4 was opened to pump the RGA with TP2. V6 was closed and RP1/3 were turned off.
RGA is pumping down and we will take scans next week to determine if a bakeout is needed.
Prior to venting the RGA volume on Tuesday (4/12/2022) I took an RGA scan of the volume to be vented (RGA+TP1 volume+Manual Gate Valve) to see if there was a difference after replacing the manual gate valve. Attached is the plot from 4/12/22, and an overlay plot to compare 4/12/22 to 12/10/2021, when the same volume was scanned with the old (defective) manual gate valve.
There is a significant drop in the ratio of O2 compared to the nitrogen peak, and reduced Argon (AMU 40), which indicates there is no longer a large air leak.
12/10/21 N2/O2 ratio ~ 4 (Air 78%N2 / 21%O2)
4/12/22 N2/O2 ratio ~ 10
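The comparison against air can be written out explicitly. A minimal sketch using the 78% / 21% air composition quoted above; the 25% tolerance and the function name are arbitrary choices for illustration, not a lab criterion.

```python
AIR_N2_FRACTION = 0.78
AIR_O2_FRACTION = 0.21
air_ratio = AIR_N2_FRACTION / AIR_O2_FRACTION  # about 3.7


def looks_like_air(n2_o2_ratio, tolerance=0.25):
    """True if the measured N2/O2 ratio is within `tolerance`
    (fractional) of the ratio expected for air."""
    return abs(n2_o2_ratio - air_ratio) / air_ratio <= tolerance


print(f"air N2/O2 ratio: {air_ratio:.2f}")
print("12/10/21 scan (ratio ~4):", looks_like_air(4))
print("4/12/22 scan (ratio ~10):", looks_like_air(10))
```

A ratio near 3.7 is what an air leak would give, so the 12/10/21 value of ~4 matches air while the 4/12/22 value of ~10 does not.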
There is one significant (above noise level) peak above AMU 46, which is at AMU 58. This could possibly be acetone (AMU 43 and 58) but overall the new RGA Volume scans look significantly better after the manual gate valve replacement. Well done!
It looks like the hardware reset did the trick. Previously, I had just tried ssh-ing into c0rga and rebooting it. At the time, however, Steve and I noticed that the various LEDs on the RGA unit weren't on, as they are supposed to be in the nominal operating state. Today, Steve reported that all LEDs except the RS232 one were on today, so I just tried following the steps in this elog again, looks like things are back up and running. I'm attaching a plot of the scan generated using plotrgascan MATLAB script, it looks comparable to the plot in elog 11697, which if I remember right, was acceptable.
Unless there is some reason we want to keep this c0rga machine, I will recommission one of the spare Raspberry Pis lying around to interface with the RGA scanner when I get the time...
The c0rga computer was off, so I turned it on via the front panel button. After running RGAset.py, RGAlogger.py seems to run. However, there are error messages in the output of the plotrgascan MATLAB script; evidently there are some negative/bogus values in the output.
I'll look into it more tomorrow.
This is a cold scan.
RGA background at day 12 of this vent. The maglev is pumping on the RGA through VM2.
RGA background with VM2 open to Maglev at day 37
Note: The PAN gauge of the annulus is at atm. Please do not vent this 200 ft long annulus line when you are venting the annulus of a chamber. The chamber annulus should be closed off from this long 2" OD pipe before you vent the annulus of a chamber.
The RGA time stamp was last correct at 20140527
RGA stopped scanning at 20140530