40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log  Not logged in ELOG logo
Entry  Sat Jul 16 00:30:42 2016, varun, Update, CDS, DAFI update: Frequency warping | c1lsc unresponsive 
    Reply  Thu Jul 21 22:02:35 2016, varun, Update, CDS, DAFI update: Frequency warping 07212016.png
Message ID: 12307     Entry time: Sat Jul 16 00:30:42 2016     Reply to this: 12324
Author: varun 
Type: Update 
Category: CDS 
Subject: DAFI update: Frequency warping | c1lsc unresponsive 

Summary: I am trying to implement frequency warping/pitch shifting on the real time FE. Here is a description long overdue:

Description: The overall idea is as follows:

The DFT of a frame x_i[n]  is given by X_i[k] = \sum_{n=0}^{m-1}x_i[n]e^{-j2\pi \frac{kn}{m}}. A matrix W containing all W_{kn} = e^{-j2\pi \frac{kn}{m}} for k, n = 1, 2, ..., m can be calculated and predefined in the code. The input arrival rate is 16384 Hz, i.e. once in every 60 $\mu$s time window. Hence, the fourier coefficients can be updated cumulatively in each cycle using the current value of the input, previous value of the fourier coefficient and the components of the W matrix. This will distribute the computational load of the FFT into all the time windows. Similar operations can be carried out for the inverse STFT.

I have written and run a pseudo-real time code on my CPU. The following is the essence:
 Let the frame-length be M, and the intended scale factor of the frequency warping be 'r'. The frame overlap is 50%. At each clock cycle, the following tasks are performed:  (1 to 3 are routine tasks performed at every clock cycle, 4 is a special task performed only when a frame is filled.)
 1) Take input and apply hanning window to it.
 2) Cumulate X_i[k] for every k using the value of x_i[n] (the input) at that particular instant. Also start to cumulate X_{i+1}[k], which will be later transfered to X_i[k].
 3) Because of 4), we now have 'r+1' filled frames corresponding to output fft. Now take the ifft using two consecutive frames corresponding to only two time series points. The computations required for this task are the same as the computations required for calculation of the fourier coefficients iteratively, since the entire time series ifft is not computed.
 4) Do these special tasks after each frame gets filled:
      At this point, the ffts of the current frame and a previous frame is ready. Let us call them X1 and X2.
      Calculate phase difference between the two.
      Calculate all the interpolated |Y_i| in between these two frames depending upon the scale factor.
      Assign phase of X1 to first Y frame and assign increasing phase to all the other Y frames.
      and also do all the usual non-special tasks.

This code takes about 9-10 microseconds for a cycle with special tasks, and 5-6 microseconds for a cycle with routine tasks on my laptop (brought down from 100 microseconds peak time in the earlier offline implementation due to elemination of explicit dft and reduction in fft size), for a frame size of 32 samples. However, when fed into the c1lsc FE, it crashes, as it has done once again today evening, in the same fashon as yesterday. There could be 2 possible reasons:

1) Size of the array containing the W_{kn} matrix elements is too large for the FE memory,

2) the computations are taking up more than 60 microseconds.

Since there are already a few codes with similar array sizes, I am more inclined to think that 2) is more likely.

Another problem that I am anticipating is that for a 32 point dft and a sampling rate of 16kHz, the frequency resolution achieved is about 500 Hz, which is not sufficient if we need to represent seismic signals. The only way I can think of, for representing such signals with a small number of fft points, is to reduce the effective sampling rate, i.e. do DSP on inputs at a much lower rate than 16kHz (say 1kHz, which will give a resolution of ~30 Hz, or 2kHz giving a resolution of ~60Hz). Another advantage of this method is that it frees up more clock cycles for computation, thus the computational load can be further distributed.  The problem in this implementation is that it will increase the delays.

ELOG V3.1.3-