Notes on Power Spectral Density (PSD) Estimation Using Matlab


  • 8/19/2019 Notes on Power Spectral Density (PSD) Estimation Using Matlab



    I applied three different methods to analyze the power spectral density of the acquired hot-wire signal:

    1. Periodogram estimate
    2. Welch’s power spectral density estimate
    3. Yule-Walker method (autoregressive power spectral density estimate)

    These methods can be further classified into two groups: nonparametric methods and parametric methods.

    Nonparametric methods

    Periodogram

    The periodogram is the most basic and complete nonparametric method of transforming the signal from time space to frequency space: it is the direct conversion from one to the other. Although the periodogram is considered an estimation method, it performs no averaging or smoothing, so its output loses no information about the power content of the original signal (only the phase is discarded when the magnitude is squared).

    Basically, one takes the discrete Fourier transform of the time-domain signal, then takes its squared magnitude (or multiplies the transform by its complex conjugate), scales the power properly (the one-sided spectrum up to the Nyquist frequency is doubled so that energy is conserved) and normalizes by the length of the signal N multiplied by the sampling frequency fs, i.e. divides by N·fs. This gives units of (m²/s²)/Hz if the raw data is velocity (m/s).
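The recipe above can be sketched in Python with NumPy (used here instead of Matlab so the example runs without the Signal Processing Toolbox). This is a minimal sketch that assumes a rectangular window and a one-sided spectrum; `periodogram_psd` is a hypothetical helper name, and the tone-plus-noise signal is synthetic, not the hot-wire data. The last lines check energy conservation: integrating the PSD over frequency should return the mean square of the signal.

```python
import numpy as np

def periodogram_psd(x, fs):
    """One-sided periodogram PSD in (units^2)/Hz with a rectangular window."""
    x = np.asarray(x, dtype=float)
    n = x.size
    X = np.fft.rfft(x)                   # DFT bins from DC up to Nyquist
    pxx = np.abs(X) ** 2 / (n * fs)      # |X|^2 = X * conj(X), normalized by N * fs
    # Fold the negative frequencies onto the positive ones (energy conservation).
    # DC, and the Nyquist bin when n is even, have no mirror image.
    if n % 2 == 0:
        pxx[1:-1] *= 2
    else:
        pxx[1:] *= 2
    f = np.fft.rfftfreq(n, d=1.0 / fs)   # frequencies in Hz
    return f, pxx

# Synthetic "velocity" signal in m/s: a 500 Hz tone plus noise, 1 s at 25 kHz
rng = np.random.default_rng(0)
fs = 25000.0
t = np.arange(25000) / fs
u = 2.0 * np.sin(2 * np.pi * 500 * t) + 0.1 * rng.standard_normal(t.size)

f, pxx = periodogram_psd(u, fs)           # pxx in (m^2/s^2)/Hz
df = f[1] - f[0]                          # 1 Hz bin spacing here
print(np.sum(pxx) * df, np.mean(u ** 2))  # equal up to rounding (Parseval)
```

The doubling step is what makes the sum over the positive-frequency bins match the mean square; without it the integrated PSD would come out about half as large.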

    Welch’s method 

    Welch’s method applies a series of preprocessing techniques: segmentation, windowing (weighting), and averaging.

    The method does the following:

    a. Split the acquired signal of length N into K segments, each of length L (adjacent segments may overlap).
    b. Multiply each segment by a window function (Hamming, for instance).
    c. Perform a Fourier transform on each segment and take its squared magnitude (a modified periodogram).
    d. Take the arithmetic mean of these K segment periodograms.
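Steps a to d can be sketched in NumPy as follows. This is a minimal illustration, not Matlab's pwelch source: it assumes a Hamming window, a one-sided output in units²/Hz and normalization by the window power, and the name `welch_psd` and its parameters are hypothetical. The squared magnitude of each segment's FFT is what gets averaged, since averaging the raw complex transforms would not give a power estimate. SciPy's `scipy.signal.welch` implements the same idea for practical use, with different defaults (a Hann window, for example).

```python
import numpy as np

def welch_psd(x, fs, seg_len, overlap, nfft=None):
    """Sketch of Welch's method following steps a-d (hypothetical helper).

    Hamming-windowed segments of seg_len samples, `overlap` samples shared
    between neighbours, periodograms averaged, one-sided PSD in units^2/Hz.
    """
    x = np.asarray(x, dtype=float)
    nfft = seg_len if nfft is None else nfft
    step = seg_len - overlap
    n_seg = (x.size - overlap) // step           # a. number of full segments K
    w = np.hamming(seg_len)                       # b. Hamming window
    scale = fs * np.sum(w ** 2)                   # compensates the window's power loss
    acc = np.zeros(nfft // 2 + 1)
    for k in range(n_seg):
        seg = x[k * step : k * step + seg_len]
        X = np.fft.rfft(seg * w, n=nfft)          # c. FFT of each windowed segment
        acc += np.abs(X) ** 2 / scale
    pxx = acc / n_seg                             # d. mean of the K periodograms
    if nfft % 2 == 0:                             # one-sided: fold negative frequencies
        pxx[1:-1] *= 2
    else:
        pxx[1:] *= 2
    f = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return f, pxx

# Unit-variance white noise: the one-sided PSD should be flat at about 2 / fs
rng = np.random.default_rng(1)
fs = 25000.0
x = rng.standard_normal(25000)
f, pxx = welch_psd(x, fs, seg_len=5555, overlap=2777)
```

Because the window power is divided out, integrating `pxx` over frequency gives back roughly the variance of the input (here about 1), whatever window is used.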

    Something noteworthy is that these steps operate on the data points alone, with no information about “time”. In other words, they assume that the sampling frequency is 1 Hz. Therefore, it’s necessary to scale the processed data with respect to the actual sampling rate.

    Hamming window algorithm:


    $$ w(n) = 0.54 - 0.46\cos\left(2\pi\frac{n}{N}\right), \quad 0 \le n \le N $$

    where the window length is N + 1.

    To illustrate the need for scaling, I tested data acquired with the hot-wire film:

    Sampling rate: 25000 samples/s
    Sampling time: 1 s
    Nyquist frequency: 12500 Hz
    Total number of data points: 25000

    Matlab function:

    [pxx,f] = pwelch(x,window,noverlap,f,fs)

    Output: pxx is the power spectral density estimate of x, and f is the vector of frequencies at which pwelch evaluates it.

    Input: x is the raw signal; window is the number of samples each segment contains (pwelch then applies a Hamming window of that length); noverlap is the number of samples shared by two adjacent segments; f is either the number of points of the fast Fourier transform performed inside pwelch (this variable is commonly named nfft) or a vector of cyclical frequencies at which to evaluate the estimate; and fs is the sampling rate in Hz.

    The variable f is the tricky part, because in the documentation both forms appear to be valid:

    [pxx,f] = pwelch(x,window,noverlap,f,fs)

    [Pxx,f] = pwelch(x,window,noverlap,nfft,fs)

    This is indeed irritating and has to be investigated with a few tests.

    By default, pwelch divides the data into at most eight segments with 50% overlap between adjacent segments. This is very important information, and it took me a very long time to find it on the Matlab website.

    To see how this black box, pwelch, works in practice, let’s start from the simplest form:

    [pxx,f] = pwelch(x)


    The highest frequency in this Welch estimate is 3.1415…, i.e. π! Based on our previous understanding, if no sampling rate is provided, the algorithm assumes it is 1 Hz, so the frequency axis should end at the Nyquist frequency of 0.5 Hz. Instead, the full frequency range recovered from the Nyquist criterion is 2π. So why is that? The output is actually in radians per sample: the algorithm treats the sampling rate as 2π rad/sample and proceeds with this information. Therefore, it’s inappropriate to plot this result and a result computed with the true sampling rate on the same figure, because their units are essentially different.
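Converting pwelch’s default normalized output to physical units is a linear rescaling. The sketch below assumes that the grid returned without fs is ω_k = 2πk/NFFT for k = 0 … NFFT/2 (0 to π rad/sample), and uses the relations f = ω·fs/(2π) and P_Hz = P_rad·2π/fs, which keep the integrated power unchanged. The constant `pxx_rad` array is a hypothetical placeholder, not real output.

```python
import numpy as np

fs = 25000.0                                       # actual sampling rate in Hz
nfft = 8192
w = 2 * np.pi * np.arange(nfft // 2 + 1) / nfft    # rad/sample, 0 .. pi
pxx_rad = np.full(w.size, 1e-3)                    # placeholder PSD per (rad/sample)

f_hz = w * fs / (2 * np.pi)                        # rad/sample -> Hz
pxx_hz = pxx_rad * 2 * np.pi / fs                  # per (rad/sample) -> per Hz

# The integrated power (rectangle rule) is the same on both axes.
p_rad = np.sum(pxx_rad) * (w[1] - w[0])
p_hz = np.sum(pxx_hz) * (f_hz[1] - f_hz[0])
```

Once both the frequency axis and the density are rescaled, a normalized-frequency result and a result computed with fs can be compared on one figure.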

    The frequency output has 4097 elements; excluding the first point at zero frequency (DC), we have 4096. Recall that if no segmentation information is provided, the algorithm separates the signal into 8 segments with 50% overlap. Therefore, 25000 samples give 25000/4.5 = 5555.555… samples per segment. For the fast Fourier transform to operate with optimal performance, the number of points it processes, NFFT, should be a power of two, NFFT = 2^P with P an integer, and 2^P has to be at least the segment length. It’s quite evident that 8192 is the number we are looking for in this case. Because the spectrum of a real signal is symmetric about the Nyquist frequency, only 8192/2 = 4096 bins above DC are unique. This explains the mystery of where the 4097 elements came from.

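    The formula relating the number of segments to the length of each segment (for 50% overlap) is as follows:

    $$ L = l + (N-1)\frac{l}{2} = \frac{(N+1)\,l}{2} $$

    where N is the number of segments, l is the length of each segment and L is the length of the raw signal. With N = 8 this gives l = L/4.5.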
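The segmentation formula can be turned into a few lines of integer arithmetic that reproduce the numbers above. This is a sketch under two assumptions: the segment length is truncated to an integer and the overlap is half of it rounded down, and the FFT length is max(256, next power of two ≥ l), which is how I read Matlab’s documented default. `default_segmentation` is a hypothetical helper. It also counts how many full segments fit: with the truncated values, 8 segments would need 7·2778 + 5555 = 25001 samples, one more than available, so a strict count gives 7 (how pwelch treats this edge case is not verified here).

```python
import math

def default_segmentation(n_samples, n_segments=8):
    """Segment length, overlap, FFT length, one-sided bin count and number of
    full segments for n_segments segments with 50 % overlap (hypothetical)."""
    # L = (N + 1) * l / 2  ->  l = 2 L / (N + 1), truncated to an integer
    seg_len = (2 * n_samples) // (n_segments + 1)
    overlap = seg_len // 2                               # 50 % overlap, rounded down
    nfft = max(256, 2 ** math.ceil(math.log2(seg_len)))  # next power of two
    n_bins = nfft // 2 + 1                               # DC .. Nyquist
    n_full = (n_samples - overlap) // (seg_len - overlap)
    return seg_len, overlap, nfft, n_bins, n_full

print(default_segmentation(25000))   # (5555, 2777, 8192, 4097, 7)
```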

    As you probably noticed, this algorithm separates the entire data into 8 segments of 5555 elements each in time space and then performs an 8192-point discrete Fourier transform (DFT), computed with the fast Fourier transform, on each of them. The DFT yields 8192 frequency bins spanning the full sampling-frequency range, of which the 4097 from DC up to the Nyquist frequency are returned. If this statement confuses you, please recall the definition of the discrete Fourier transform of a segment zero-padded to N_FFT points:

    $$ X(k) = \sum_{n=0}^{l-1} x(n)\, e^{-j 2\pi k n / N_{\mathrm{FFT}}}, \quad k = 0, 1, \dots, N_{\mathrm{FFT}}-1 $$

    where the segment length l is 5555 and N_FFT is 8192 in this case.
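The zero-padded DFT definition can be checked numerically: evaluating the sum directly gives the same numbers as an FFT of length N_FFT, which pads the l samples with zeros. This small sketch uses made-up sizes (l = 55, N_FFT = 128) to keep the direct sum cheap; `dft_zero_padded` is a hypothetical helper.

```python
import numpy as np

def dft_zero_padded(x, nfft):
    """Direct evaluation of X(k) = sum_{n=0}^{l-1} x(n) exp(-j 2 pi k n / nfft)."""
    x = np.asarray(x, dtype=float)
    n = np.arange(x.size)                  # time index 0 .. l-1
    k = np.arange(nfft)[:, None]           # frequency index 0 .. nfft-1 (column)
    return np.exp(-2j * np.pi * k * n / nfft) @ x

rng = np.random.default_rng(2)
seg = rng.standard_normal(55)              # stand-in for one 5555-sample segment
X_direct = dft_zero_padded(seg, 128)
X_fft = np.fft.fft(seg, n=128)             # NumPy zero-pads seg to 128 points
X_half = np.fft.rfft(seg, n=128)           # only DC .. Nyquist for real input
```

`X_half` has 128/2 + 1 = 65 entries, the same bookkeeping that turns an 8192-point DFT into 4097 output values.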

    Here is the irritating fact about the pwelch method: pwelch spreads the frequency range of 2π over 8192 points. This implies that these windowed, averaged 5555-point segments in time space are used to represent an 8192-point power spectral density (4097 points from DC to Nyquist in the one-sided output) over the entire sampling-frequency range. This is a problem because we are interested in the whole frequency range. The fewer elements each segment has, the more information is lost in the low-frequency region (although the variance is reduced). Frequency resolution is another price we pay: it is set by the segment length, roughly fs/l ≈ 4.5 Hz here, not by the number of FFT points.

    Now let’s introduce more input variables:

    [Pxx,f] = pwelch(x,[],0,[],fs)

    where fs (sampling rate) in our case is 25000/s.

    This performs Welch’s method with 8 segments (the default), no overlap, and the sampling rate provided.


    This example has 2049 elements with a bin spacing of 25000/4096 = 6.1035 Hz, which is exactly the value of the second element in the frequency output (the first is zero). Each segment has 25000/8 = 3125 samples, and the next power of two is 4096, so 4096 is the right number of points for the fast Fourier transform. The last element of the frequency output is also correct: 12500 Hz. Averaging without overlap in this way is called Bartlett’s method (strictly, Bartlett’s method uses a rectangular window, whereas pwelch still applies its Hamming window here).
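The numbers in the no-overlap example follow from the FFT frequency grid f_k = k·fs/NFFT. A quick check with NumPy’s `rfftfreq`, assuming a segment length of 25000/8 samples and the next power of two as NFFT:

```python
import math
import numpy as np

fs = 25000.0
seg_len = 25000 // 8                             # 3125 samples, no overlap
nfft = 2 ** math.ceil(math.log2(seg_len))        # next power of two: 4096
f = np.fft.rfftfreq(nfft, d=1.0 / fs)            # 0, fs/nfft, ..., fs/2

print(seg_len, nfft, len(f), f[1], f[-1])
```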

    Next is an example with the default overlap setting:

    [Pxx,f] = pwelch(x,[],[],[],fs)

    This is the first example “stretched” (scaled) with respect to the sampling rate of 25000/s. It has exactly the same shape as the first example, and now it is physically meaningful, because information about the sampling rate has been introduced! This is the simplest form of the Welch method and the easiest to understand. It has 4097 elements and a bin spacing of 25000/8192 = 3.0518 Hz, which again is correct.

    To prove this is how pwelch works, I tested the following two calls:

    [Pxx,f] = pwelch(x,[],[],[],fs) - default

    [Pxx,f] = pwelch(x,5555,2777,[],fs) – manual

    where 5555 is the number of elements per segment (8 segments in total) and 2777 is half of 5555, rounded down.


    They match each other perfectly (output file checked!).

    We can further compare how the number of segments affects the data:

    As can be seen from the plot, the PSD estimate is significantly smoothed as the number of segments increases, at the cost of frequency resolution, of course. Please note that it’s not uncommon to have N greater than 100.
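The smoothing-versus-resolution trade-off can be reproduced on synthetic white noise, whose true PSD is flat, so any scatter across frequency is estimation noise. This sketch uses plain Bartlett averaging (rectangular window, no overlap) to keep it short, and `bartlett_psd` is a hypothetical helper. For averaged periodograms the relative scatter is expected to fall roughly as 1/√K, while the bin spacing fs/l grows with the number of segments K.

```python
import numpy as np

def bartlett_psd(x, fs, n_segments):
    """Average periodogram of non-overlapping, rectangular-windowed segments.
    Two-sided scaling (no factor of 2), which is enough to compare scatter."""
    seg_len = x.size // n_segments
    segs = x[: n_segments * seg_len].reshape(n_segments, seg_len)
    spectra = np.abs(np.fft.rfft(segs, axis=1)) ** 2 / (seg_len * fs)
    f = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    return f, spectra.mean(axis=0)

rng = np.random.default_rng(3)
fs = 25000.0
x = rng.standard_normal(25000)             # white noise: flat true PSD

results = {}
for k in (1, 8, 100):
    f, pxx = bartlett_psd(x, fs, k)
    inner = pxx[1:-1]                      # drop DC and the last bin
    results[k] = (f[1], inner.std() / inner.mean())
    print(f"K={k:3d}  bin spacing={f[1]:7.2f} Hz  relative scatter={results[k][1]:.3f}")
```

With 100 segments of 250 samples the scatter drops to roughly a tenth of the single-periodogram value, while the bin spacing grows from 1 Hz to 100 Hz.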


    This leaves us the last variable, f. Based on my understanding of fft in Matlab, f acts as the NFFT, which determines the frequency resolution “mathematically”. This is the variable I feel uncomfortable using. One can certainly use a very large value of f to extend the low-frequency section, but without the source code it’s hard to justify the validity of the result.

    The extended curve at low frequency is the main difference between these two curves.

    The best way to remedy this and improve the quality of the Welch estimate is to acquire a much longer record of data, to retain the low-frequency information.
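On the variable f (nfft): as I read the Matlab documentation, a value larger than the segment length zero-pads each segment before the FFT. The NumPy sketch below stands in for pwelch and is not its source code. It shows what zero padding does to one segment: every fourth bin of a 4×-padded spectrum is exactly a bin of the unpadded one. The extra points therefore interpolate the same spectrum on a finer grid rather than adding independent low-frequency information, and the true resolution remains tied to the segment length.

```python
import numpy as np

rng = np.random.default_rng(4)
seg = rng.standard_normal(512)                  # one segment, l = 512 samples

X_plain = np.fft.rfft(seg)                      # nfft = l: 257 bins
X_padded = np.fft.rfft(seg, n=4 * seg.size)     # nfft = 4 l: 1025 bins (zero padding)

# Every 4th padded bin sits on the original grid and matches it exactly;
# the bins in between are interpolated values of the same spectrum.
print(np.allclose(X_padded[::4], X_plain))      # True
```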

    Parametric method

    Yule-Walker method

    Yule-Walker is a totally different approach to estimating the power spectral density. Unlike the previous two methods, which directly convert the acquired signal, Yule-Walker is an autoregressive method, sometimes called the autocorrelation method. The idea of an autoregressive model is to predict the evolution of a function based on the time history of the function itself.

    The autoregressive model is written as follows:

    $$ x(n) + \sum_{k=1}^{p} a_k\, x(n-k) = \varepsilon(n) $$

    where x is the function or variable of interest, a_k are the AR coefficients, p is the order of the model and ε is the error, with a mean of zero.
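A minimal sketch of generating data from the AR model, assuming Gaussian white noise for ε and the sign convention in which x(n) plus the weighted past equals the error. The helper `simulate_ar` and the AR(2) coefficients are illustrative choices, not values from the hot-wire data.

```python
import numpy as np

def simulate_ar(a, sigma, n, seed=0):
    """Generate x(0..n-1) from x(n) + sum_k a_k x(n-k) = eps(n), eps ~ N(0, sigma^2).
    Samples before n = 0 are taken as zero, so the first few values are a transient."""
    a = np.asarray(a, dtype=float)
    p = a.size
    eps = sigma * np.random.default_rng(seed).standard_normal(n)
    x = np.zeros(n)
    for i in range(n):
        past = x[max(0, i - p):i][::-1]         # x(i-1), x(i-2), ... (at most p values)
        x[i] = eps[i] - np.dot(a[: past.size], past)
    return x, eps

# Example AR(2): x(n) = 0.75 x(n-1) - 0.5 x(n-2) + eps(n), a stable choice
a = [-0.75, 0.5]
x, eps = simulate_ar(a, sigma=1.0, n=5000)
```

Rearranging the model as x(n) = ε(n) − Σ a_k x(n−k) is what makes it a predictor: the past values determine everything except the new random error.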


    The algorithm for the Yule-Walker model is as follows:

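    $$
    \begin{bmatrix}
    R(0) & R(1) & \cdots & R(p-1) \\
    R(1) & R(0) & \cdots & R(p-2) \\
    \vdots & \vdots & \ddots & \vdots \\
    R(p-1) & R(p-2) & \cdots & R(0)
    \end{bmatrix}
    \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix}
    = -\begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(p) \end{bmatrix}
    $$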

    where R is the autocovariance function:

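    $$ R(k) \equiv \frac{1}{N} \sum_{n=1}^{N-k} x(n)\, x(n+k) $$

    with N the number of samples and the mean of x removed. This series of linear equations is solved for a_1 to a_p, and σ then follows from

    $$ \sigma^2 = R(0) + \sum_{k=1}^{p} a_k R(k) $$

    knowing that ε has zero mean and variance σ².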
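The autocovariance and the Yule-Walker system can be sketched with NumPy as below. `autocov` and `yule_walker` are hypothetical helpers; the biased 1/N normalization follows the definition of R, and the system is solved with a general linear solver rather than the Levinson recursion that dedicated implementations typically use. As a sanity check, the exact autocovariance of the AR(1) process x(n) − 0.8 x(n−1) = ε(n) with σ² = 1, R(k) = 0.8^k/(1 − 0.64), should give back a_1 = −0.8, a_2 = 0 and σ² = 1.

```python
import numpy as np

def autocov(x, p):
    """Biased autocovariance R(0..p): R(k) = (1/N) sum_n x(n) x(n+k), mean removed."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    return np.array([np.dot(x[: n - k], x[k:]) / n for k in range(p + 1)])

def yule_walker(r):
    """Solve the Yule-Walker equations for a_1..a_p and sigma^2, given R(0..p)."""
    r = np.asarray(r, dtype=float)
    p = r.size - 1
    # Symmetric Toeplitz matrix with entries R(|i - j|)
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(R, -r[1:])
    sigma2 = r[0] + np.dot(a, r[1:])
    return a, sigma2

# Exact R(k) of the AR(1) process x(n) - 0.8 x(n-1) = eps(n), sigma^2 = 1
r_exact = np.array([0.8 ** k for k in range(3)]) / (1 - 0.64)
a_hat, s2_hat = yule_walker(r_exact)        # order p = 2
```

On measured data one would call `yule_walker(autocov(x, p))`; the estimates then carry sampling error from R.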

    The goal of this algorithm is to minimize the variance of the prediction error ε by fitting the coefficients a. This is achieved by solving the system of linear equations, in which R is known beforehand from the data. Once a and σ² are established, the power spectral density can be calculated directly:

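    $$ P(f) = \frac{\sigma^2}{\left| 1 + \sum_{k=1}^{p} a_k \exp(-j 2\pi f k) \right|^2} $$

    where f is the normalized frequency in cycles per sample.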

    The derivation of this equation is very tedious, so please refer to the textbook. Please note that this equation applies to any autoregressive model, regardless of how a and σ² are estimated.
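Given a and σ², the AR spectrum can be evaluated on an FFT grid, because its denominator is the DFT of the coefficient sequence [1, a_1, …, a_p]. This sketch assumes normalized frequency in cycles/sample (for a sampling rate fs one would multiply f by fs and divide P by fs); `ar_psd` is a hypothetical helper, and nfft = 256 is an arbitrary choice.

```python
import numpy as np

def ar_psd(a, sigma2, nfft=256):
    """P(f) = sigma^2 / |1 + sum_k a_k exp(-j 2 pi f k)|^2 on f = 0 .. 0.5 cycles/sample."""
    coeffs = np.concatenate(([1.0], np.asarray(a, dtype=float)))
    A = np.fft.rfft(coeffs, n=nfft)          # 1 + sum_k a_k exp(-j 2 pi k m / nfft)
    f = np.arange(A.size) / nfft             # normalized frequency, cycles/sample
    return f, sigma2 / np.abs(A) ** 2

# AR(1) example: x(n) - 0.8 x(n-1) = eps(n), sigma^2 = 1 (a low-pass spectrum)
f, P = ar_psd([-0.8], 1.0)
```

For this example P(0) = 1/0.2² = 25 and P(0.5) = 1/1.8², and the model needs only two numbers (a_1 and σ²) to describe the whole curve, which is the appeal of the parametric approach.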

    The following is the syntax used in Matlab:

    [Pxx,f] = pyulear(x,p,nfft,fs)

    As with pwelch, nfft and fs together determine the frequency range.


    This figure is for the case of a fixed NFFT = 256 (default) with varying model order. Much of the literature I have read argues that the Yule-Walker method requires a higher order p to give a better result (including one of the papers cited by Bruno). I cannot distinguish any significant difference in the current data set; perhaps the number of samples is insufficient to reveal it.

    Summary

    Based on my experience with data acquisition and signal processing so far, I would stick to nonparametric methods rather than enter the world of autoregressive models, since I lack training in statistics.


    It’s very hard to say which of the methods is superior. Nevertheless, the Welch method is widely accepted and understood in the fluid mechanics community, while autoregressive modelling has been adopted by the Poinsot/Veynante group and others.