Multilingual Speech Recognition

In this project we used the same custom database of Hindi digit utterances as in the speaker recognition project: 50 subjects, each speaking every digit 10 times, recorded under various noise levels including ideal 0 dB lab conditions. Here, however, instead of using the data to train and analyse neural networks for speaker recognition/verification, we trained a speech (digit) recognition model. Unlike the speaker-based task, where we had 100 samples per class (50 speakers), here we have 500 samples per class (10 digits), so both intuitively and in practice all the models performed better than in the speaker recognition case.
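To make the relabelling concrete, here is a minimal sketch of how the same 5,000 utterances yield the two class structures; the `metadata` list and its layout are hypothetical, only the counts (50 speakers × 10 digits × 10 repetitions) come from the project description:

```python
from collections import Counter

# Hypothetical metadata: one (speaker_id, digit) pair per utterance.
# 50 speakers x 10 digits x 10 repetitions = 5000 utterances.
metadata = [(spk, dig)
            for spk in range(50)
            for dig in range(10)
            for _ in range(10)]

speaker_labels = [spk for spk, _ in metadata]   # speaker-recognition labelling
digit_labels   = [dig for _, dig in metadata]   # digit-recognition labelling

print(Counter(speaker_labels)[0])  # 100 samples per speaker class
print(Counter(digit_labels)[0])    # 500 samples per digit class
```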

We trained five different models, viz. a Single Hidden Layer Neural Network, a Deep Neural Network, a Radial Basis Function Neural Network (RBFNN), a Probabilistic Neural Network (PNN), and a Self-Organizing Map (SOM, unsupervised), and compared their performances. All models were trained on the same MFCC features extracted from each utterance. The SOM also brings an unsupervised paradigm into the comparison: it is an unsupervised clustering algorithm applied here to the classification task.
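As a rough illustration of the supervised setup, the following Python sketch compares a single-hidden-layer network with a deeper one on per-utterance MFCC vectors. It is not the project's actual implementation: the use of librosa and scikit-learn, the 13 coefficients, the mean-pooling over frames, and the layer sizes are all assumptions, and random vectors stand in for the real corpus so the snippet runs as-is.

```python
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

def utterance_features(path, n_mfcc=13):
    """Load one utterance and return a fixed-length vector:
    MFCCs averaged over frames (the pooling step is an assumption)."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, n_frames)
    return mfcc.mean(axis=1)

# With the real corpus, X would be built by calling utterance_features on
# every wav file; random vectors stand in here so the sketch is runnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 13))       # 5000 utterances, 13-dim MFCC vectors
y = np.repeat(np.arange(10), 500)     # 10 digit classes, 500 samples each

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

models = {
    "single_hidden_layer": MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000),
    "deep_nn": MLPClassifier(hidden_layer_sizes=(128, 64, 32), max_iter=1000),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```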

As an extension, we also implemented a Python version of the same model, along with the data cleaning and feature extraction pipeline, and trained it on an open-source noisy English digit dataset to test the generalization ability of our approach.
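A rough sketch of what such a cleaning and feature extraction pipeline can look like in Python is shown below; the directory layout, the file naming convention, the silence-trimming threshold, and the normalisation step are assumptions for illustration, not the exact pipeline used.

```python
import glob
import os
import numpy as np
import librosa

def clean_and_extract(path, sr=8000, n_mfcc=13, top_db=20):
    """Cleaning + feature extraction for one recording: resample,
    peak-normalise, trim leading/trailing silence, then average
    MFCCs over frames to get a fixed-length vector."""
    y, _ = librosa.load(path, sr=sr)
    y = librosa.util.normalize(y)                   # peak normalisation
    y, _ = librosa.effects.trim(y, top_db=top_db)   # drop silent edges
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

feature_vectors, labels = [], []
for path in sorted(glob.glob("english_digits/*.wav")):
    feature_vectors.append(clean_and_extract(path))
    # Hypothetical naming convention: digit label is the first token,
    # e.g. "7_speakerA_3.wav" -> label 7.
    labels.append(int(os.path.basename(path).split("_")[0]))

X = np.array(feature_vectors)
y = np.array(labels)
```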
