Universiti Teknologi Malaysia Institutional Repository

Comparison of 1D VS 2D convolutional neural networks for bird sound detection

Tan, Pei Hong (2022) Comparison of 1D VS 2D convolutional neural networks for bird sound detection. Masters thesis, Universiti Teknologi Malaysia, Faculty of Engineering - School of Electrical Engineering.

Official URL: http://dms.library.utm.my:8080/vital/access/manage...

Abstract

Automatic acoustic detection systems help bird naturalists monitor bird species and overall ecosystem health. Many birds are most easily detected by their sounds, so passive acoustic monitoring is the most appropriate approach. However, acoustic monitoring faces practical limitations: it requires manual configuration, depends heavily on sound libraries, and is limited in accuracy and robustness. In recent years, various machine learning techniques have been proposed, and detailed performance evaluations have been conducted to determine how feasible an automatic acoustic detection system is. In this paper, we propose a 1D convolutional neural network (CNN) architecture for bird sound detection and compare it with the Bulbul 2D CNN architecture, the winner of the Bird Audio Detection (BAD) challenge. The proposed 1D CNN learns a representation directly from the raw audio recordings. Its preprocessing phase divides the audio signal into overlapping frames using a sliding window, so it can handle audio streams of any duration. The size of each frame matches the input layer of the 1D CNN. In contrast, the preprocessing phase of the Bulbul 2D CNN architecture uses two feature extraction methods, the STFT spectrogram and the Mel-scaled spectrogram, to capture how the amplitude of a signal changes over time and across frequencies. The performance of the proposed 1D CNN model in detecting bird sounds was assessed on the warblrb10k dataset, and the experimental results show that it achieves a lower accuracy than the Bulbul 2D CNN model. Previous work has shown that a few state-of-the-art 1D CNN approaches outperform most other approaches that use handcrafted features or 2D representations as input.
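The sliding-window framing described above can be illustrated with a minimal sketch. This is not the thesis code: the function name `frame_signal`, the parameters `frame_len` and `hop_len`, and the choice to zero-pad the final frame are all assumptions made for illustration.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], frame_len: int, hop_len: int) -> List[List[float]]:
    """Split a 1-D signal into overlapping, fixed-length frames.

    Consecutive frames overlap whenever hop_len < frame_len. The last frame
    is zero-padded so that every frame has exactly frame_len samples
    (the padding strategy is an assumption, not taken from the thesis).
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")

    frames: List[List[float]] = []
    start = 0
    while start < len(samples):
        frame = list(samples[start:start + frame_len])
        # Pad a short trailing frame up to the fixed input size.
        frame.extend([0.0] * (frame_len - len(frame)))
        frames.append(frame)
        # Stop once this frame has reached the end of the signal.
        if start + frame_len >= len(samples):
            break
        start += hop_len
    return frames


if __name__ == "__main__":
    signal = [float(i) for i in range(1, 10)]  # 9 samples of dummy audio
    for frame in frame_signal(signal, frame_len=4, hop_len=2):
        print(frame)
```

Because the number of frames grows with the length of the signal while each frame keeps a fixed size, a recording of any duration can be fed to a network with a fixed-size input layer.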
Due to time constraints, several significant steps that promise higher accuracy for the 1D CNN model could not be carried out. One such step is aggregating the predictions for all audio frames belonging to the same recording, using a majority rule or a sum rule, to determine the final prediction of bird presence for the whole recording. Omitting these steps led to the low accuracy of the 1D CNN model in this paper.
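As a rough sketch of the aggregation step mentioned above, the two rules could look as follows for the binary bird / no-bird case. The function names, the 0.5 vote threshold, and the assumption that each frame yields a single bird-presence probability are illustrative and not taken from the thesis.

```python
from typing import Sequence


def majority_rule(frame_probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Return True if more than half of the frames vote 'bird present'.

    Each frame casts one vote: 'bird' when its probability reaches threshold.
    """
    if not frame_probs:
        raise ValueError("need at least one frame prediction")
    votes = sum(1 for p in frame_probs if p >= threshold)
    return votes * 2 > len(frame_probs)


def sum_rule(frame_probs: Sequence[float]) -> bool:
    """Return True if the summed 'bird' score exceeds the summed 'no bird' score.

    With two classes the 'no bird' score of a frame is taken as 1 - p.
    """
    if not frame_probs:
        raise ValueError("need at least one frame prediction")
    bird_score = sum(frame_probs)
    no_bird_score = sum(1.0 - p for p in frame_probs)
    return bird_score > no_bird_score


if __name__ == "__main__":
    # Hypothetical per-frame probabilities for a single recording.
    probs = [0.95, 0.45, 0.40]
    print("majority rule:", majority_rule(probs))
    print("sum rule:", sum_rule(probs))
```

The two rules can disagree: the majority rule counts only hard votes, while the sum rule lets one very confident frame outweigh several mildly negative ones.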

Item Type: Thesis (Masters)
Uncontrolled Keywords: Bird Audio Detection (BAD), convolutional neural network (CNN), automatic acoustic detection system
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Divisions: Faculty of Engineering - School of Electrical
ID Code: 99590
Deposited By: Yanti Mohd Shah
Deposited On: 08 Mar 2023 03:36
Last Modified: 08 Mar 2023 03:36
