Universiti Teknologi Malaysia Institutional Repository

Bird species identification using spectrograms and convolutional neural networks

Saad, Aymen (2020) Bird species identification using spectrograms and convolutional neural networks. Masters thesis, Universiti Teknologi Malaysia, Faculty of Engineering - School of Electrical Engineering.

Official URL: http://dms.library.utm.my:8080/vital/access/manage...

Abstract

Birds are particularly useful ecological indicators because they respond quickly to changes in their environment, so studies of bird diversity are indispensable. Domain experts classify birds manually to achieve accurate results, but the process becomes tedious as the amount of data grows. Meanwhile, bioacoustic monitoring employs automated recorders to collect large-scale audio data of fauna vocalizations, and analysing audio at that scale manually is impractical. Hence, machine learning is a more practical approach. Previously, Convolutional Neural Network (CNN) approaches have achieved excellent results using augmented spectrogram images of the audio. CNN architectures such as Cube, Inception-v3, DenseNet, ResNet, and ConvNets offer high accuracy but at a high computational cost. These architectures are suitable for large-scale classification of up to 1000 species because their deep layers extract high-level features from the spectrogram image. However, many devices intended for these models have limited computational resources and strict power consumption constraints. Therefore, this study aims to optimize a CNN-based birdcall classifier model targeting embedded platforms. A low-complexity CNN model, MobileNet-v2, was employed, which is sufficient for small-scale classification such as identifying ten bird species. The dataset used to train the model is from the Xeno-canto repository. Each audio recording is resampled to 16 kHz and segmented into 1-second samples. An algorithm to splice the audio according to the label is proposed. Then, each sample that contains a birdcall signal is augmented into three samples, and the noise-only samples are removed from the dataset. The spectrogram images of the samples are obtained using STFT and MFCC conversion, and all images are then resized to 224×224×1 using Matlab 2019b.
To verify the model, we compare it with a high-complexity CNN model, ResNet-50. MobileNet-v2 reduced the computational cost relative to ResNet-50 by 86% with a slight trade-off in accuracy. Compared with ResNet-50, the accuracy of MobileNet-v2 dropped by 12% with STFT but by only 2% with MFCC, which makes MobileNet-v2 with MFCC conversion the best CNN model for device applications with a small number of classes.
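
The preprocessing chain described in the abstract (1-second segmentation at 16 kHz, STFT spectrogram, resize to 224×224×1) can be illustrated with a minimal Python sketch. This is not the thesis code, which was written in Matlab 2019b. The function names, the FFT size of 512, the hop of 128 samples, the Hann window, and the nearest-neighbour resize are all assumptions made for illustration. Label-based splicing, augmentation, noise removal, and MFCC conversion are not shown, and the input is assumed to be already sampled at 16 kHz.

```python
import numpy as np

SAMPLE_RATE = 16_000   # Hz, as stated in the abstract
IMAGE_SIZE = 224       # target image side, as stated in the abstract


def segment_audio(signal, sample_rate=SAMPLE_RATE, seconds=1.0):
    """Split a 1-D signal into non-overlapping segments, dropping any remainder."""
    seg_len = int(sample_rate * seconds)
    n_segments = len(signal) // seg_len
    return signal[: n_segments * seg_len].reshape(n_segments, seg_len)


def stft_magnitude_db(segment, n_fft=512, hop=128):
    """Log-magnitude STFT. n_fft, hop, and the Hann window are assumed values."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(segment) - n_fft) // hop
    frames = np.stack(
        [segment[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Rows are frequency bins, columns are time frames.
    spectrum = np.abs(np.fft.rfft(frames, axis=1)).T
    return 20.0 * np.log10(spectrum + 1e-10)


def resize_nearest(image, size=IMAGE_SIZE):
    """Nearest-neighbour resize to size x size (a stand-in for a real image resize)."""
    rows = np.arange(size) * image.shape[0] // size
    cols = np.arange(size) * image.shape[1] // size
    return image[np.ix_(rows, cols)]


def to_model_input(segment):
    """Turn one audio segment into a 224 x 224 x 1 array scaled to [0, 1]."""
    spec = stft_magnitude_db(segment)
    spec = (spec - spec.min()) / (spec.max() - spec.min() + 1e-10)
    return resize_nearest(spec)[..., np.newaxis]


# Usage with a synthetic 3.5-second, 2 kHz tone standing in for a recording.
t = np.arange(int(3.5 * SAMPLE_RATE)) / SAMPLE_RATE
recording = np.sin(2 * np.pi * 2000.0 * t)
segments = segment_audio(recording)
image = to_model_input(segments[0])
print(segments.shape, image.shape)
```

With these assumed parameters each 1-second segment yields a 257-bin by 122-frame spectrogram before resizing. A single-channel image keeps the input small, which suits the embedded targets the abstract describes.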

Item Type: Thesis (Masters)
Additional Information: Thesis (Sarjana Kejuruteraan (Komputer dan Sistem Mikroelektronik)) - Universiti Teknologi Malaysia, 2020; Supervisor: Dr. Shahidatul Sadiah Abdul Manan
Subjects: T Technology > TK Electrical engineering. Electronics Nuclear engineering
Divisions: Electrical Engineering
ID Code: 93040
Deposited By: Yanti Mohd Shah
Deposited On: 07 Nov 2021 06:00
Last Modified: 07 Nov 2021 06:00
