Message ID: 2577
Entry time: Thu May 28 14:13:53 2020
In reply to: 2576
Reply to this: 2578

Author:

anchal

Type:

DailyProgress

Category:

NoiseBudget

Subject:

Bayesian Analysis

I'm listing the first few comments from Jon that I have implemented:

Data cleaning cannot be done by looking at the data itself; only outside knowledge can justify removing points. So I removed all the empirical cleaning procedures and instead just removed the frequency bins at 60 Hz harmonics along with their neighboring bins. With the HEPA filters off, the latest data is much cleaner, and the peaks are mostly confined to these harmonics.

I also removed the neighboring bins of the 60 Hz harmonics because, as Jon pointed out, PSD data points are not independent variables and their correlation depends on the windowing used. For a Hann window, immediate neighbors are 50% correlated and next-nearest neighbors about 5%.
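The bin removal described above can be sketched as follows. This is a minimal illustration, not the actual analysis code; the function name and arguments are hypothetical, and it assumes a uniformly spaced frequency axis (e.g., from a Welch PSD estimate).

```python
import numpy as np

def notch_line_harmonics(freqs, psd, line_freq=60.0, n_neighbors=1):
    """Drop the frequency bins at harmonics of line_freq, plus n_neighbors
    bins on each side (neighboring bins are correlated for a Hann window).

    freqs, psd : 1D arrays with uniform bin spacing.
    """
    df = freqs[1] - freqs[0]                         # bin width
    harmonics = np.arange(line_freq, freqs[-1] + df, line_freq)
    bad = set()
    for h in harmonics:
        i = int(round((h - freqs[0]) / df))          # bin nearest the harmonic
        for k in range(-n_neighbors, n_neighbors + 1):
            if 0 <= i + k < len(freqs):
                bad.add(i + k)
    keep = np.array([i for i in range(len(freqs)) if i not in bad])
    return freqs[keep], psd[keep]
```

With 1 Hz bins, this removes e.g. the 59, 60, and 61 Hz bins around the first harmonic, and similarly around 120 Hz, 180 Hz, and so on.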

The hard-ceiling approach is not correct because the likelihood of a data point in one frequency bin gets changed by the data in some other, far-away frequency bin. Here I've plotted the probability distributions with and without the hard ceiling to see how it affects our results.

Bayesian Analysis (Normal):

with the shear loss angle taken from Penn et al., which is 5.2 x 10^{-7}. The limits are the 90% confidence interval.

Note that this allows the estimated noise to exceed the measured noise in some frequency bins.

Bayesian Analysis (If Hard Ceiling is used):

with the shear loss angle taken from Penn et al., which is 5.2 x 10^{-7}. The limits are the 90% confidence interval.
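To make the hard-ceiling objection concrete, here is a hypothetical sketch of the two likelihoods being compared (the function names and the Gaussian per-bin form are my illustrative assumptions, not the actual analysis code). The ceiling version assigns zero probability whenever the modeled noise exceeds the measurement in any bin, which is exactly how one far-away bin can change the likelihood of an entire parameter point.

```python
import numpy as np

def log_like_normal(model, data, sigma):
    # Plain per-bin Gaussian log-likelihood: the model is allowed to
    # exceed the data in individual bins.
    return -0.5 * np.sum(((data - model) / sigma) ** 2)

def log_like_hard_ceiling(model, data, sigma):
    # "Hard ceiling": zero probability (log-likelihood -inf) if the
    # modeled noise exceeds the measurement in ANY bin. A single
    # far-away bin can therefore veto the whole parameter point.
    if np.any(model > data):
        return -np.inf
    return log_like_normal(model, data, sigma)
```

Under the plain Gaussian likelihood, a model slightly above the data in one bin is only mildly penalized; under the hard ceiling it is excluded outright.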

Remaining steps to be implemented:

There are more things that Jon suggested, which I'm listing here:

I'm trying to catch the next stable measurement while also saving the time-series data.

The PSD data points are not normally distributed, since "PSD = ASD^2 = y1^2 + y2^2. So the PSD is the sum of squared Gaussian variables, which is also not Gaussian (i.e., if a random variable can only assume positive values, it's not Gaussian-distributed)."
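This is easy to see in simulation. For Gaussian noise, a single-periodogram PSD bin is the sum of two squared Gaussians (real and imaginary FFT parts), i.e., a scaled chi-squared variable with 2 degrees of freedom, which is exponential: its standard deviation equals its mean and its skewness is ~2, whereas a Gaussian would have skewness 0. A quick check (illustrative only, not the actual analysis code):

```python
import numpy as np

rng = np.random.default_rng(0)

# White Gaussian noise; one periodogram per trial, inspect one interior bin.
n_trials, n_samp = 20000, 256
x = rng.standard_normal((n_trials, n_samp))
X = np.fft.rfft(x, axis=1)
bin_psd = np.abs(X[:, 10]) ** 2 / n_samp     # un-normalized PSD at one bin

mean, std = bin_psd.mean(), bin_psd.std()
skew = np.mean(((bin_psd - mean) / std) ** 3)
print(mean / std, skew)   # ~1 and ~2: exponential, clearly not Gaussian
```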

So I'm going to take the PSD of 1 s segments of data from the measurement and build a distribution for the PSD at each frequency bin of interest (50 Hz to 600 Hz) at a resolution of 1 Hz.

This distribution would give a better likelihood function than assuming the points to be normally distributed.
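The per-bin sampling step above could look something like this (a sketch under my own assumptions: the function name is hypothetical, segments are non-overlapping, and a Hann-windowed one-sided periodogram is computed per segment):

```python
import numpy as np

def per_bin_psd_samples(x, fs, fmin=50.0, fmax=600.0):
    """Split x into non-overlapping 1 s segments, compute a Hann-windowed
    one-sided PSD for each, and stack them: one empirical PSD sample per
    segment at every 1 Hz bin in [fmin, fmax].

    Returns freqs (n_bins,) and samples (n_segments, n_bins).
    """
    nper = int(fs)                          # 1 s segments -> 1 Hz resolution
    n_seg = len(x) // nper
    w = np.hanning(nper)
    norm = fs * np.sum(w ** 2)              # one-sided PSD normalization
    freqs = np.fft.rfftfreq(nper, d=1.0 / fs)
    psds = np.empty((n_seg, len(freqs)))
    for k in range(n_seg):
        seg = x[k * nper:(k + 1) * nper] * w
        psds[k] = 2.0 * np.abs(np.fft.rfft(seg)) ** 2 / norm
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band], psds[:, band]
```

The histogram of `samples[:, i]` then serves as the empirical likelihood for bin i, instead of a Gaussian assumption.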

As mentioned above, neighboring frequency bins are always correlated in PSD data. To get rid of this, Jon suggested the following:

"the easiest way to handle this is to average every 5 consecutive frequency bins.

This "rebins" the PSD to a slightly lower frequency resolution at which every data point is now independent. You can do this bin-averaging inside the Welch routine that is generating the sample distributions: For each individual PSD, take the average of every 5 bins across the band of interest, then save those bin-averages (instead of the full-resolution values) into the persistent array of PSD values. Doing this will allow the likelihoods to decouple as before, and will also reduce the computational burden of computing the sample distributions by a factor of 5."
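The bin-averaging Jon describes can be sketched as below (illustrative only; the function name is my own, and trailing bins that don't fill a complete group of 5 are simply dropped):

```python
import numpy as np

def rebin_average(freqs, psd, n=5):
    """Average every n consecutive frequency bins so that adjacent
    rebinned points are effectively independent. Trailing bins that
    don't fill a full group of n are dropped."""
    m = (len(freqs) // n) * n
    f = freqs[:m].reshape(-1, n).mean(axis=1)
    p = psd[:m].reshape(-1, n).mean(axis=1)
    return f, p
```

Applied inside the per-segment PSD loop, this shrinks each stored PSD by a factor of 5 before it is appended to the persistent array, which is where the factor-of-5 reduction in computational burden comes from.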

I'll update the results once I've done this analysis with new measurements that include the time-series data.