Adaptive Noise Cancellation using LMS and optimal filtering

 

Bryan Davis

 

University of Florida

 


Abstract

 

This project compares the performance of optimal filtering, LMS, and batch LMS on the adaptive noise cancellation problem, where the electro-acoustic transfer functions are unknown and changing. The optimal filter performed best, given that the signal is piecewise stationary and the stationary discontinuities can be found manually. The LMS algorithm also performed very well; it does not require the signal to be piecewise stationary, and it requires no manual operation other than selection of the step-size and the filter order. The batch LMS algorithm performed poorly. For both optimal filtering and LMS, the original speech signal was easily recognized.

 

1. Introduction

 

In this project, we explore the differences between using LMS and optimal filtering to solve the noise cancellation problem when the channel transfer functions are unknown. A speech signal recorded by a microphone is corrupted by noise from an appliance (the primary signal). However, there is a second recording of just the noise from the appliance (the reference signal). Both signals were sampled and synchronized in time. Figure 1 shows the recording setup.

 

Figure 1

We would like to be able to remove the noise component in the primary signal to recover the speech. We can do this by finding the transfer function between the reference signal and the part of the primary signal due to the noise, which is H3^-1(z)H2(z) in the figure.

The transfer functions Hi(z) are unknown, so we need to determine them in some way. For this project, we will use an all-zero filter in order to approximate H3^-1(z)H2(z). The setup for this method is shown in figure 2. We will investigate two methods for determining the coefficients of the transfer function: the Wiener-Hopf solution, and the Least-Mean-Squares algorithm. To complicate the problem, the transfer functions change during the recording.

Figure 2

 

2. All-zero filter using MSE

 

In order to find the “best” coefficients for our filter, we need to define what “best” means. Since both the original speech (which we are trying to find) and the transfer functions are unknown, we need a way of judging whether an estimated filter is good. Because we are trying to find the filter that maps the reference signal onto the part of the primary signal contributed by the noise source, it is natural to look at the resulting difference (primary minus filtered reference) and try to minimize its energy.

Mathematically, minimizing this result gives the optimal filter if the speech and the reference are uncorrelated. If, however, some of the speech signal leaks into the reference signal, then this method will try to minimize the speech component as well, resulting in poorer noise cancellation and unwanted speech attenuation.
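The criterion above can be sketched in a few lines of Python. This is only an illustration of the error-energy definition; the signals and weights below are made-up toy values, not the project recordings.

```python
def error_signal(primary, reference, w):
    """Error left after subtracting the FIR-filtered reference from the primary:
    e[n] = primary[n] - sum_k w[k] * reference[n-k]."""
    M = len(w)
    e = []
    for n in range(len(primary)):
        # FIR filter output: convolution of the weights with the reference
        y = sum(w[k] * reference[n - k] for k in range(M) if n - k >= 0)
        e.append(primary[n] - y)
    return e

def mean_energy(e):
    """Average energy of the error; the 'best' weights minimize this."""
    return sum(v * v for v in e) / len(e)

# Toy check: if the primary is exactly the reference delayed by one sample,
# the weights w = [0, 1] cancel it perfectly.
ref = [1.0, -2.0, 3.0, 0.5, -1.0]
pri = [0.0] + ref[:-1]          # reference delayed by one sample
e = error_signal(pri, ref, [0.0, 1.0])
print(mean_energy(e))           # 0.0: perfect cancellation
```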

 

3. Optimal filter solution

 

In order to implement the optimal solution, a filter size must be chosen. A filter order of 60 seems to be about the maximum needed, based on trial and error. Because the Wiener-Hopf solution is deterministic, choosing a larger filter size will not decrease performance (as it does for LMS).

A single optimal filter computed over the entire recording does a poor job of removing the noise. This makes sense, because the optimal filter assumes that the signal is stationary, and we know from the problem that this is not the case. We do know, however, that the signal is piecewise stationary. Thus, if we can determine the time at which the statistics change, we can apply the optimal filtering method to each section separately to obtain independent weight values.

To find the time when the filter coefficients change, a spectrogram was used to look for visible drastic changes in the signal. It turns out that the change occurs precisely halfway through the signal, at sample 23,000 (out of 46,000 total samples). When we apply the Wiener-Hopf solution to each of these sections separately, we are able to hear the original speech, “logic clearly dictates that the needs of the many outweigh the needs of the few,” spoken by Leonard Nimoy, without any interference. Even the moment when the transfer function changes is hardly detectable in the speech.

The minimum of the cost function, Jmin, is simply the expected energy of the error (which in this case is the expected energy of the speech). Jmin was found to be 0.00018 for the first half and 0.00013 for the second half. If we do not split the signal, we get a much larger Jmin of 0.0022, approximately 12 times larger than for the split solution.
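The per-segment Wiener-Hopf computation described above can be sketched as follows. This is a minimal pure-Python illustration, assuming sample time-averages for the correlations, a Toeplitz autocorrelation matrix, and a plain linear solve; the toy order-2 check at the bottom is not the project data.

```python
def wiener_hopf(primary, reference, M):
    """Solve R w = p for the optimal order-M FIR weights, where R is the
    reference autocorrelation matrix and p the primary/reference
    cross-correlation vector (time averages stand in for expectations)."""
    N = len(primary)
    def r_xx(k):
        return sum(reference[n] * reference[n - k] for n in range(k, N)) / N
    def r_dx(k):
        return sum(primary[n] * reference[n - k] for n in range(k, N)) / N
    R = [[r_xx(abs(i - j)) for j in range(M)] for i in range(M)]
    p = [r_dx(k) for k in range(M)]
    return solve(R, p)

def solve(A, b):
    """Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    A = [row[:] for row in A]; b = b[:]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]; b[i], b[piv] = b[piv], b[i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][c] * w[c] for c in range(i + 1, n))) / A[i][i]
    return w

# Toy check: if the primary is exactly the reference scaled by 2, the
# order-2 Wiener-Hopf weights recover the scaling: w ≈ [2, 0].
ref = [1.0, 2.0, -1.0, 3.0, 0.5, -2.0, 1.5, 0.3]
w = wiener_hopf([2.0 * x for x in ref], ref, 2)
```

For the piecewise-stationary signal in this project, the same routine would simply be run once on each half (samples before and after 23,000) to obtain the two independent weight sets.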

In examining the coefficients of the transfer function, it becomes apparent that they are sums of decaying complex exponentials. This suggests that our all-zero filter is better modeled as an all-pole filter. Finding the all-pole equivalent of the filter, we get the following two transfer functions (one for each half of the signal, respectively):

 

G1(z) = 1 / (1 + 0.5z^-1 - 0.2z^-2 + 0.2z^-6)

and

G2(z) = 1 / (1 + 0.5z^-2)

 

The first half of the signal requires a much larger filter order (~60) to accurately model the transfer function, since the coefficients do not die out rapidly. The second half of the signal requires a much smaller order (~10) to model the changed transfer function.
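As an illustration of how such an all-pole model is applied, the sketch below implements G1(z) directly as its recursive difference equation, y[n] = x[n] - 0.5y[n-1] + 0.2y[n-2] - 0.2y[n-6]. The impulse input is a toy example, not the project's reference signal.

```python
def all_pole_filter(x, a):
    """Apply 1 / (a[0] + a[1] z^-1 + ... + a[K] z^-K) with a[0] = 1,
    i.e. y[n] = x[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k, ak in enumerate(a[1:], start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]   # recursive (feedback) terms
        y.append(acc)
    return y

# Denominator of G1(z): 1 + 0.5 z^-1 - 0.2 z^-2 + 0.2 z^-6
a1 = [1.0, 0.5, -0.2, 0.0, 0.0, 0.0, 0.2]
impulse = [1.0] + [0.0] * 9
h = all_pole_filter(impulse, a1)   # impulse response of G1(z)
print(h[:3])                       # [1.0, -0.5, 0.45]
```

The slowly decaying impulse response produced this way is exactly why the all-zero fit to the first half needed an order near 60.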

Note that if the transfer function between the reference and the noise component of the primary is an all-pole filter, then the transfer function between the noise component of the primary and the reference is an all-zero filter, which fits our all-zero model better. If we do this, however, the error of the filter will no longer be the desired speech signal; instead it will be the speech filtered through our all-zero model. So if we reverse the roles of the primary and reference signals, we must additionally apply the inverse of the all-zero filter to the error signal to recover the original speech, as shown in figure 3. By doing this, we gain by requiring a much smaller filter order (seven for the first half of the signal and three for the second half).

Without the post-filtering, the error signal has no noise, but the speech is slightly muddled. When the inverse filter is applied, the results are as good as before, but with a drastically reduced filter order.

 

4. Least-Mean-Squares (LMS)

 

While the Wiener-Hopf solution above solves the problem almost perfectly, it required an intelligent system modification (breaking the problem into two parts) that will generally not be available in an adaptive noise cancellation problem. Also, the optimal method cannot be performed in real time, because the entire signal must be available before the solution can be computed. The LMS solution avoids both of these problems by adapting the filter weights as the speech is being played. The result, however, does not approach the quality of the Wiener-Hopf solution.

In order to use the LMS algorithm, we must first determine a filter order and then a good step-size. This can be done by trial and error, or by examining the autocorrelation of the reference signal and the cross-correlation between the reference and primary signals. Since we already did this for the optimal filter, we know the filter order need not be larger than 60. But whereas increasing the filter order did not degrade the performance of the optimal filter, it will degrade the performance of LMS. This is because the larger the filter order, the larger the eigenvalue spread of the autocorrelation matrix of the reference signal (we do not change the autocorrelation function, but we include more terms from that function in the autocorrelation matrix). Increasing the eigenvalue spread, in turn, decreases the rate of convergence of LMS to the optimal solution. This occurs because the maximum step-size for convergence is limited by the maximum eigenvalue, while the rate of convergence is proportional to the step-size and the minimum eigenvalue, as shown in the equation:

1/τ ≈ 2µλmin

where µ is the step-size, λmin is the minimum eigenvalue, and τ is the overall time-constant (which is inversely related to the convergence rate).

As a compromise, we chose M = 20 for the order of the filter. If we use the rule of thumb µ = 0.2/λmax, then the theoretical convergence time constant for the LMS rule is given by

τ ≈ 1/(2µλmin) = 2.5 λmax / λmin ≈ 4000 samples

 

This gives the time constant for the convergence of the slowest-converging weight. Figure 6 shows the weight tracks of different weights for a single step-size. Figure 7 shows the weight track for w2 for different values of µ. From these weight tracks you can clearly see the transition at sample 23,000. By trial and error, the maximum stable step-size was found to be 0.3 (for M = 20). Using this value resulted in very rough-sounding speech, however, because the weights were changing so rapidly. The best value seemed to be about 0.04. Figure 8 shows the learning curve (MSE of the output).
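The sample-by-sample LMS update discussed above can be sketched as follows. The signals here are toy stand-ins with no speech component (the project itself used M = 20 and µ ≈ 0.04 on the real recordings), so the weights should converge to the exact scaling.

```python
import random

def lms(primary, reference, M, mu):
    """Sample-by-sample LMS: at each n, filter the reference, compare with
    the primary, and step the weights along the negative instantaneous
    gradient of e^2 (the factor of 2 comes from that gradient)."""
    w = [0.0] * M
    errors = []
    for n in range(len(primary)):
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(M)]
        y = sum(wk * xk for wk, xk in zip(w, x))      # filtered reference
        e = primary[n] - y                            # error = speech estimate
        w = [wk + 2 * mu * e * xk for wk, xk in zip(w, x)]
        errors.append(e)
    return w, errors

# Toy check: the primary is the reference scaled by 2 (no speech), so the
# weights should approach [2, 0, 0, 0] and the error should die out.
random.seed(0)
ref = [random.uniform(-1, 1) for _ in range(2000)]
pri = [2 * x for x in ref]
w, e = lms(pri, ref, M=4, mu=0.05)
print(round(w[0], 3))   # ≈ 2.0: the scaling is recovered
```

In the real problem the error does not die out: once the weights converge, the residual error is precisely the speech estimate.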

Figure 6

 

Figure 7

Figure 8

 

To improve the performance of LMS, we can swap the primary and reference signals as we did in the optimal filter case. This filter should be easier to train using LMS for two reasons. First, the filter order now only needs to be seven, for the reasons previously mentioned. Second, most of the optimal weight values are zero, so the filter is initialized at the optimal value for those weights. Of course, the speech will be slightly muddled, as previously mentioned. Figure 9 shows the weight tracks for this setup.

 

 

Figure 9

 

5. LMS using batch processing

 

Instead of using straight LMS, where we modify the weight values for every new data point, we can accumulate the modification for each data point and apply all of them at once after a fixed interval. For this project, we use a batch size of 2000. The weight tracks for this case are shown in figure 10.

Since we are averaging the weight update over 2000 samples, we need a much larger step-size. By trial and error we found that a value of µ = 2 gives the fastest convergence. This method performs poorly compared to sample-by-sample LMS. The reason is that there is not enough time for the algorithm to adapt properly: it has only 11 iterations to converge before the transfer function changes. Even when the desired and reference signals are swapped, the speech is hardly recognizable. As the batch size is decreased, the algorithm performs better.
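The batch variant can be sketched as below. The toy example uses a smaller batch (50) and step-size (µ = 1) than the project's 2000 and µ = 2, scaled down for the shorter toy signals; as in the previous sketch, the signals are stand-ins with no speech component.

```python
import random

def batch_lms(primary, reference, M, mu, batch):
    """LMS where per-sample updates are accumulated and the averaged
    update is applied only once per block of `batch` samples."""
    w = [0.0] * M
    acc = [0.0] * M                              # accumulated update for this block
    for n in range(len(primary)):
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(M)]
        e = primary[n] - sum(wk * xk for wk, xk in zip(w, x))
        acc = [a + e * xk for a, xk in zip(acc, x)]
        if (n + 1) % batch == 0:                 # block boundary: apply average
            w = [wk + mu * a / batch for wk, a in zip(w, acc)]
            acc = [0.0] * M
    return w

# Toy check: primary is the reference scaled by 2; the weights should
# approach [2, 0, 0, 0] as the batch updates accumulate.
random.seed(1)
ref = [random.uniform(-1, 1) for _ in range(2000)]
pri = [2 * x for x in ref]
w = batch_lms(pri, ref, M=4, mu=1.0, batch=50)
print(round(w[0], 2))   # ≈ 2.0
```

The slow adaptation seen in the project follows directly from this structure: with a batch of 2000 the weights move only once per block, so only about 11 updates occur before the transfer function changes at sample 23,000.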

 

Figure 10

 

 

6. Conclusions

 

The Wiener-Hopf solution produced the clearest result when the signal was split in half. It also produced the poorest result when this splitting was not performed. The LMS algorithm with a batch size of one was much clearer than with a batch size of 2000, and LMS using the reversed input/desired-response signals performed best in terms of noise removal, though the speech was slightly muddled. In the LMS output, the signal is noisy for a split second at the start and at the transition, but otherwise it is very clear.

 

7. References

 

[1] A.V. Oppenheim, E. Weinstein, K. Zangi, M. Feder and D. Gauger, "Single-Sensor Active Noise Cancellation", IEEE Transactions on Speech and Audio Processing, vol. 2, no. 4, pp. 285-290, April 1994.

 

[2] B. Widrow et al., "Adaptive noise cancelling: principles and applications", Proceedings of the IEEE, vol. 63, no. 12, pp. 1692-1716, Dec. 1975.

 

[3] Simon Haykin, Adaptive Filter Theory, 4th edition, Prentice Hall, New Jersey, 2002.