Features Based on Fourier-Bessel Expansion for a Speaker Identification System


The aim of the project was to develop features based on Fourier-Bessel (FB) expansion for a speaker identification system.

We developed four feature extraction techniques for speaker identification that use zero-order FB functions as basis functions. We implemented these techniques in MATLAB and tested them on the TIMIT database and the CHAINS corpus, both commonly used for speaker identification. To check the robustness and reliability of the system, we also tested the techniques on a database of 24 speakers that we prepared at IIIT-Hyderabad. The training and testing data for this database were recorded in a normal office environment.


  • Introduction
Human speech conveys different types of information. The goal of automatic speaker identification is to extract, characterize, and recognize the information about speaker identity. The algorithm compares an unknown speaker's speech against a database of N known speakers and returns the best match as the identified speaker.

Fourier-Bessel (FB) expansion has been shown to give a better spectral representation of speech signals than the Fourier expansion, because each Bessel basis function supports a finite bandwidth around a centre frequency, unlike the sinusoidal basis functions, which provide only spectral lines. Thus, a band-pass speech signal is represented compactly by the FB expansion, with fewer non-zero FB coefficients.
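
For reference, the zero-order FB series of a signal x(t) on the interval (0, a) is usually written as shown here. The symbols a, M, C_m and \lambda_m follow the common textbook convention and are an assumption, not the project's own notation.

```latex
x(t) = \sum_{m=1}^{M} C_m \, J_0\!\left(\frac{\lambda_m t}{a}\right), \qquad 0 < t < a,
\qquad
C_m = \frac{2 \int_0^a t \, x(t) \, J_0\!\left(\frac{\lambda_m t}{a}\right) dt}
           {a^2 \left[ J_1(\lambda_m) \right]^2}
```

Here \lambda_m is the m-th positive root of J_0(\lambda) = 0. Because J_0(\lambda_m t / a) oscillates at roughly \lambda_m / a radians per second, the order m corresponds approximately to the frequency f_m = \lambda_m / (2 \pi a) in Hz, which is why each coefficient can be tied to a band of frequencies.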

  • Speaker Identification System


Both the training and the identification phases include feature extraction, sometimes called the front-end of the system. The feature extractor converts the digital speech signal into a sequence of numerical descriptors, called feature vectors. Feature extraction can be considered as a data reduction process that attempts to capture the essential characteristics of the speaker.

In the training phase, a speaker model is created from the feature vectors. The aim is to model the speaker's voice so that it generalizes beyond the training material. Gaussian mixture model (GMM) and vector quantization (VQ) are two of the most widely used techniques for modeling.

In the identification phase, features are first extracted from the unknown speaker's voice sample. The next step is pattern matching: an algorithm computes a match score between the unknown speaker's feature vectors and each model stored in the database. The last phase in the identification chain is decision making. The decision module takes the match scores as its input and makes the final decision on the speaker identity.
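
The pattern matching and decision steps can be sketched as follows. This is a minimal illustration, not the project's MATLAB code: it assumes each speaker model is a diagonal-covariance GMM stored as a hypothetical (weights, means, variances) tuple, and that the match score is the average log-likelihood of the feature vectors. The function names are invented for this example.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of one feature vector under a diagonal-covariance GMM."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    peak = max(log_terms)  # log-sum-exp keeps the sum numerically stable
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

def match_score(features, model):
    """Pattern matching: average log-likelihood of all feature vectors (assumed score)."""
    return sum(gmm_log_likelihood(x, *model) for x in features) / len(features)

def identify(features, speaker_models):
    """Decision making: return the speaker whose model gives the highest score."""
    scores = {name: match_score(features, model)
              for name, model in speaker_models.items()}
    return max(scores, key=scores.get), scores
```

Calling identify(features, models) with a dictionary of speaker models returns the best-matching speaker name together with all match scores, so the decision rule can be inspected or replaced.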

  • Feature extraction techniques
Feature extraction is the first component in an automatic speaker recognition system. Feature extraction transforms the raw speech signal into a compact but effective representation that is more stable and discriminative than the original signal. Since the front-end is the first component in the chain, the quality of the later components (speaker modeling and pattern matching) is strongly determined by the quality of the front-end. In other words, classification can be at most as accurate as the features.
The feature extraction techniques developed are:
  • Features based on Fourier-Bessel (FB) coefficients
  • Features based on FB coefficients energy
  • Features based on FB and discrete energy separation algorithm (DESA)
  • Features based on mean frequency
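
As a rough illustration of the first, second and fourth items above, the sketch below computes zero-order FB coefficients for one frame, the energy carried by each coefficient, and an energy-weighted mean frequency. It is written in Python for illustration and is not the project's MATLAB implementation. The discrete coefficient formula, the energy definition (taken from the orthogonality of the Bessel basis) and the mean-frequency formula are the commonly used ones and are assumptions here. All function names are hypothetical.

```python
import math

def bessel_j(order, x):
    """Integer-order Bessel function from Bessel's integral,
    (1/pi) * integral_0^pi cos(order*t - x*sin(t)) dt, by the trapezoid rule."""
    steps = int(abs(x)) + 32                 # more nodes for faster oscillation
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(order * math.pi))   # endpoints t = 0 and t = pi
    for k in range(1, steps):
        t = k * h
        total += math.cos(order * t - x * math.sin(t))
    return total * h / math.pi

def j0_roots(count):
    """First `count` positive roots of J0, refined by Newton's method."""
    roots = []
    for m in range(1, count + 1):
        beta = (m - 0.25) * math.pi
        lam = beta + 1.0 / (8.0 * beta)      # McMahon's asymptotic starting guess
        for _ in range(5):                   # Newton step, using J0'(x) = -J1(x)
            lam += bessel_j(0, lam) / bessel_j(1, lam)
        roots.append(lam)
    return roots

def fb_features(frame, sample_rate, num_coeffs):
    """Return (coefficients, energies, mean frequency in Hz) for one frame."""
    n_samples = len(frame)
    coeffs, energies, freqs = [], [], []
    for lam in j0_roots(num_coeffs):
        j1_sq = bessel_j(1, lam) ** 2
        acc = sum(n * x * bessel_j(0, lam * n / n_samples)
                  for n, x in enumerate(frame))
        c = 2.0 * acc / (n_samples ** 2 * j1_sq)
        coeffs.append(c)
        # Energy of the m-th term, from orthogonality of the Bessel basis.
        energies.append(0.5 * (c * n_samples) ** 2 * j1_sq)
        # Approximate frequency (Hz) associated with this root.
        freqs.append(lam * sample_rate / (2.0 * math.pi * n_samples))
    total = sum(energies)
    mean_freq = (sum(f * e for f, e in zip(freqs, energies)) / total
                 if total > 0 else 0.0)
    return coeffs, energies, mean_freq
```

The Bessel functions are evaluated from scratch only to keep the sketch dependency-free. In practice a library routine such as MATLAB's besselj would be the natural choice.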

The above feature vectors are evaluated using a Gaussian mixture model (GMM) classifier consisting of 32 mixtures. The GMM, like a parametric model, has structure and parameters that control the behavior of the density in known ways, but without the constraint that the data must follow a specific distribution. We used random mean selection followed by a single iteration of k-means clustering for initialization. Because the same classifier is used for all the features and all the databases, the effectiveness of each feature extraction technique can be compared.
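
The initialization described above might look like the following sketch. It is an assumption about the details rather than the project's code: the means are drawn as distinct training vectors, distances are squared Euclidean, and the mixture weights are set from the cluster sizes. The subsequent GMM training step is not shown, and the function name and seed parameter are hypothetical.

```python
import random

def init_gmm_means(vectors, num_mixtures, seed=0):
    """Random mean selection followed by one k-means iteration."""
    rng = random.Random(seed)
    # Random mean selection: pick distinct training vectors as initial centres.
    means = [list(v) for v in rng.sample(vectors, num_mixtures)]
    # Single k-means iteration, step 1: assign each vector to its nearest centre.
    clusters = [[] for _ in means]
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, m)) for m in means]
        clusters[dists.index(min(dists))].append(v)
    # Step 2: move each centre to the mean of its assigned vectors.
    for i, members in enumerate(clusters):
        if members:  # an empty cluster keeps its randomly chosen centre
            means[i] = [sum(col) / len(members) for col in zip(*members)]
    # Initial mixture weights proportional to cluster size (an assumption).
    weights = [len(members) / len(vectors) for members in clusters]
    return means, weights
```

For the classifier described here, num_mixtures would be 32 and vectors would be the training feature vectors of one speaker.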
To evaluate the performance, we implemented the speaker identification system in MATLAB using the developed feature extraction techniques. We compared the results against the widely used MFCC feature extraction technique.
We used the following speaker databases: TIMIT, CHAINS, and the IIIT-Hyderabad speaker database, which we prepared ourselves.

  • Results
S. No.  Feature extraction technique             TIMIT          CHAINS         IIIT-Hyderabad
                                                 (22 speakers)  (16 speakers)  (24 speakers)
1       Features based on MFCC                   22             14             22
2       Features based on FB coefficients        22             13             22
3       Features based on FB coefficient energy  22             12             22
4       Features based on FB-DESA                22             12             22
5       Features based on mean frequency         22             14             24

Copyright © 2014 Ronak Bajaj