INTRODUCTION

HARSHA SATHYENDRA

ISMAIL UYSAL

Present-day analog and cellular telephone infrastructure limits the bandwidth of transmitted speech to 300-3400 Hz.  However, certain consonant sounds and clusters of phonemes contain energy over a far wider frequency range, roughly 20-8000 Hz.  The latter can be considered wideband speech (20-8000 Hz) and is what one hears in everyday in-person conversation; the former, narrowband speech (300-3400 Hz), is what one hears on a telephone or cell phone.  The problem arises when phonemes that rely on high-frequency content to be told apart (e.g. the consonants n or m) become indistinguishable because only narrowband speech is available.  This leads to a dilemma: either replace the entire telecommunications infrastructure, which may cost billions of dollars, or devise a more inventive solution that recreates wideband speech at the receiver from the narrowband input signal.  The latter solution, though requiring more thought, is several orders of magnitude cheaper than the former, and is therefore the one addressed in this project.

With the benefits of narrowband-to-wideband speech conversion established, the problem itself can now be introduced.  As with any engineering problem, several assumptions must be made: (1) there is a correlation between the given narrowband speech signal and its wideband counterpart; (2) even without exact reconstruction of the wideband speech signal, the perceived auditory quality far exceeds that of the narrowband speech heard on present-day telephones and cell phones; and (3) the Linear Source Filter Model serves as an adequate representation of speech for coding and recognition.

The Linear Source Filter Model is often used in speech coding and recognition.  It is based upon an autoregressive (AR) filter, which represents the vocal tract (spectral envelope), driven by a source signal (excitation).  The AR filter is an all-pole filter and is driven by I.I.D. white noise.  Thus, to perform narrowband (NB) to wideband (WB) speech conversion, it is necessary to extend both the spectral envelope and the excitation signal from the narrowband frequency range to the wideband frequency range.
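
The source-filter decomposition described above can be illustrated with a minimal Python sketch. It is not the method developed in this project. It only assumes the common convention s[n] + a_1 s[n-1] + ... + a_p s[n-p] = e[n], where e[n] is the excitation and the a_k define the all-pole filter. The function names (ar_synthesize, lpc_autocorrelation, inverse_filter), the filter order, and the coefficient values are hypothetical and chosen purely for illustration.

```python
import numpy as np

def ar_synthesize(a, excitation):
    """All-pole (AR) synthesis: s[n] = e[n] - sum_{k=1..p} a[k-1] * s[n-k]."""
    p = len(a)
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, p + 1):
            if n - k >= 0:
                acc -= a[k - 1] * s[n - k]
        s[n] = acc
    return s

def lpc_autocorrelation(s, order):
    """Estimate the AR coefficients by solving the Yule-Walker equations."""
    # Autocorrelation at lags 0..order (unnormalized).
    r = np.array([np.dot(s[: len(s) - k], s[k:]) for k in range(order + 1)])
    # Toeplitz autocorrelation matrix R[i, j] = r[|i - j|].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, -r[1:])

def inverse_filter(a, s):
    """Recover the excitation: e[n] = s[n] + sum_{k=1..p} a[k-1] * s[n-k]."""
    A = np.concatenate(([1.0], a))
    return np.convolve(s, A)[: len(s)]

# Illustrative (hypothetical) 2nd-order envelope with stable poles.
a_true = np.array([-1.5, 0.7])
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)        # I.I.D. white-noise excitation
s = ar_synthesize(a_true, e)          # "speech-like" signal from the model
a_hat = lpc_autocorrelation(s, 2)     # estimated spectral-envelope parameters
e_hat = inverse_filter(a_true, s)     # excitation recovered by inverse filtering
```

Here a_hat plays the role of the spectral envelope and e_hat the role of the excitation. These are the two components that must each be extended from the narrowband to the wideband range.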