Universiti Teknologi Malaysia Institutional Repository

Momentous fragmented mel frequency cepstral coefficient and distance-based for human guided computerized Al-fatihah recitation assessment

Shafie, Noraimi (2021) Momentous fragmented mel frequency cepstral coefficient and distance-based for human guided computerized Al-fatihah recitation assessment. PhD thesis, Universiti Teknologi Malaysia.


Official URL: http://dms.library.utm.my:8080/vital/access/manage...

Abstract

Speech recognition systems, built on a variety of approaches and techniques, are now used in a growing range of human-machine interaction applications. An advanced methodology for evaluating Al-Quran recitation is needed to assess how consistently a reader adheres to the rules of Tajweed, which generally comprise the laws of Makhraj (articulation process), Sifaat (letter features or pronunciation), and Harakaat (length of pronunciation). This research focuses on digitally transforming the voice signals of Al-Quran recitation, identifying recitation errors according to the Tajweed laws, and evaluating the recitation based on syllable pronunciation. In other words, the research adopts a complete methodical approach, functioning as a computational machine, that offers scientific solutions for pre-processing, feature extractor design, and threshold matching in the assessment of Al-Quran recitation. The methodology first addresses the general problem of unwanted signal noise and other variations in speech signals. The filtered signal then raises the challenge of investigating and selecting an appropriate digital Al-Quran Recitation Speech Signal (QRSS) syllable representation, feature extractor, Quranic Recitation Acoustic Model (QRAM), threshold matching, and classification process. Technically, the Formant Frequency and Mel Frequency Cepstral Coefficient (MFCC) properties used in this thesis are those involving energy distribution in the time-frequency domain, with MFCC and Dynamic Time Warping (DTW) initially shown to be promising miniature features.
As a result, a structured experimental procedure was designed for an enhanced momentous fragmentary MFCC-formant frequency feature, fragmented into three bands that served as miniature features of the QRSS for representing combined vowels and consonants. Each QRSS pronunciation is evaluated using two estimated parameters: Maximum Likelihood Estimation (MLE) and Minimum Path Cost (MPC). The parameters are estimated from the Gaussian Mixture Model (GMM) and the DTW model, respectively, and used as a non-phonetic transcription to obtain and test the genuineness of the designed salient features. Each syllable was also classified to confirm the accuracy of the MLE and MPC in representing the syllable signature, using three classifiers: Linear Discriminant (LD), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN). KNN showed higher classification performance than LD and SVM. In the final stage, the threshold classification-based approach, which is technically considered a traditional Talaqi-like Al-Quran recitation approach, was performed on 78 syllables of Al-Fatihah (Chapter 1) recited by 80 learners, through training and testing analysis using human-guided computerized assessment. The performance of the Intelligent Quranic Recitation Assistance (IQRA) system computing engine model for the MLEs of the low, middle, and high bands was 87.27%, 86.86%, and 86.33% respectively, with an MPC performance of 90.34%. The overall results indicate that, in future work, a hyperparametric model could be used to estimate the threshold automatically based on expert assessment. Hyperparameter selection is also usable for predicting model performance.
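
To make the Minimum Path Cost idea concrete, the following is a minimal sketch of DTW-based threshold matching between a reference syllable and a learner's syllable, each represented as a sequence of feature vectors. It is an illustration only, not the thesis implementation: the Euclidean local distance, the symmetric step pattern, the function names `dtw_min_path_cost` and `accept_syllable`, and the threshold value are all assumptions, since the abstract does not specify them.

```python
import math


def dtw_min_path_cost(ref, test):
    """Minimum cumulative DTW alignment cost between two feature sequences.

    ref, test: lists of equal-dimension feature vectors (e.g. MFCC frames).
    Local distance (Euclidean) and step pattern are assumptions.
    """
    n, m = len(ref), len(test)
    # cost[i][j] = cheapest alignment of ref[:i] with test[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(ref[i - 1], test[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # ref frame advances alone
                cost[i][j - 1],      # test frame advances alone
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def accept_syllable(ref, test, threshold):
    """Hypothetical decision rule: accept when the path cost is within threshold."""
    return dtw_min_path_cost(ref, test) <= threshold


if __name__ == "__main__":
    reference = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    # Same contour, but the first frame is held longer (slower recitation).
    slower = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    print(dtw_min_path_cost(reference, slower))
    print(accept_syllable(reference, slower, threshold=0.5))  # threshold is made up
```

Because DTW lets one sequence stretch against the other, a syllable recited more slowly but with the same contour yields a low path cost, whereas a consistently different contour accumulates cost frame by frame. In practice the threshold would be calibrated against expert assessments rather than chosen by hand.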

Item Type: Thesis (PhD)
Uncontrolled Keywords: Al-Quran Recitation Speech Signal (QRSS), Mel Frequency Cepstral Coefficients (MFCC), Gaussian Mixture Model (GMM)
Subjects: T Technology > T Technology (General)
Divisions: Razak School of Engineering and Advanced Technology
ID Code: 108142
Deposited By: Widya Wahid
Deposited On: 22 Oct 2024 06:56
Last Modified: 22 Oct 2024 06:56
