Universiti Teknologi Malaysia Institutional Repository

Automatic spoken language identification using MFCC based time series features

Biswas, Mainak and Rahaman, Saif and Ahmadian, Ali and Subari, Kamalularifin and Singh, Pawan Kumar (2023) Automatic spoken language identification using MFCC based time series features. Multimedia Tools and Applications, 82 (7). pp. 9565-9595. ISSN 1380-7501

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1007/s11042-021-11439-1

Abstract

Spoken Language Identification (SLID) is a fairly well-researched field. It has already been established as a significant first step in all multilingual speech recognition systems. With the rise in ASR technologies in recent years, the importance of SLID has become undeniable. In this work, we propose a model for the recognition of Indian and foreign languages. With the goal of making our model robust to noise from everyday life, we augment our data with noise of varying loudness taken from diverse environments. From the MFCC time series of this augmented data, we extract aggregated macro-level features, and perform feature selection using the FRESH (FeatuRe Extraction based on Scalable Hypothesis tests) algorithm. This helps us obtain a set of features that are relevant to this problem. This filtered set is used to train an Artificial Neural Network. The model is then tested on three standard datasets. Firstly, from the IIT-M IndicTTS speech database, six languages are selected, and an accuracy of 99.93% is obtained. Secondly, the IIIT-H Indic speech database consisting of seven languages is used, and an accuracy of 99.94% is recorded. Lastly, eight languages from the VoxForge dataset are also used, and we achieve an accuracy of 98.43%. The promising results obtained lead us to believe that these features are suitable for capturing language-specific characteristics of speech. Hence, we propose that they can be used as standard features for the task of SLID. The source code of our present work can be found by accessing the link: https://github.com/rahamansaif/LID-using-time-series-MFCC.
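The abstract's first two processing steps, noise augmentation at varying loudness and aggregation of each MFCC time series into macro-level features, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mix_at_snr` and `aggregate_features`, the use of a signal-to-noise ratio in decibels to control loudness, and the five statistics chosen are all assumptions made for illustration. MFCC extraction itself and the FRESH selection step are not shown.

```python
import math
from statistics import mean, pstdev


def mix_at_snr(signal, noise, snr_db):
    """Add noise to a signal, scaled to a target signal-to-noise ratio in dB.

    Hypothetical helper: assumes both inputs are equal-length lists of
    samples and that the noise is not all zeros.
    """
    p_signal = mean(s * s for s in signal)
    p_noise = mean(n * n for n in noise)
    # Choose gain so that p_signal / (gain**2 * p_noise) == 10**(snr_db / 10).
    gain = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def aggregate_features(mfcc):
    """Collapse each MFCC coefficient track into a few summary statistics.

    mfcc is a list of tracks, one per coefficient, each a list of per-frame
    values. The statistics here are an assumed, illustrative subset.
    """
    feats = {}
    for i, track in enumerate(mfcc):
        feats[f"mfcc{i}__mean"] = mean(track)
        feats[f"mfcc{i}__std"] = pstdev(track)
        feats[f"mfcc{i}__min"] = min(track)
        feats[f"mfcc{i}__max"] = max(track)
        feats[f"mfcc{i}__abs_energy"] = sum(v * v for v in track)
    return feats
```

In a pipeline of this shape, every utterance yields one fixed-length feature vector regardless of its duration, which is what allows a relevance filter such as FRESH and a feed-forward network to operate on it.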

Item Type: Article
Uncontrolled Keywords: artificial neural network, feature selection, FRESH algorithm, Indian languages, mel frequency cepstral coefficients, spoken language identification, time series features
Subjects: Q Science > Q Science (General); T Technology > T Technology (General)
Divisions: Science
ID Code: 105914
Deposited By: Yanti Mohd Shah
Deposited On: 26 May 2024 09:06
Last Modified: 26 May 2024 09:06
