Universiti Teknologi Malaysia Institutional Repository

Multi level refinement enriched feature pyramid network for scale and class imbalance in object detection

Aziz, Lubna (2022) Multi level refinement enriched feature pyramid network for scale and class imbalance in object detection. PhD thesis, Universiti Teknologi Malaysia.

Official URL: http://dms.library.utm.my:8080/vital/access/manage...

Abstract

Object detection is challenging because of feature imbalance, limited contextual information, and class imbalance. Feature pyramids are used in modern detectors to learn multiscale representations. However, current feature pyramids fail to integrate useful semantic information across different scales. In addition, many negative anchors are generated during training, resulting in extreme class imbalance. This study proposed a Multi-Level Refinement Enriched Feature Pyramid Network (MREFP-Net) to jointly handle feature-level scale imbalance and class imbalance in object detection. Instead of a complex design, a simple and effective multi-layered feature enrichment scheme was proposed that combines deep, intermediate, and shallow features to obtain important semantic and spatial information for small object detection. In addition, chained parallel pooling was proposed to capture rich background contextual information. A cascaded anchor refinement scheme was introduced to integrate useful multiscale contextual information into the prediction layers of the Single Shot MultiBox Detector (SSD) to improve the distinctiveness of multiscale detection. The ultimate goal of the cascaded anchor refinement scheme was to counteract class imbalance by refining anchors and enriching contextual features to improve regression and classification. The performance of MREFP-Net was evaluated on two benchmark datasets, MS COCO and PASCAL VOC 07/12. For a 300 × 300 input on MS COCO test-dev, MREFP-Net-ResNet101 achieved a state-of-the-art detection accuracy (AP) of 36.6 with a single-scale inference strategy and 39.2 ms on an RTX 2060 GPU. For a 512 × 512 input on MS COCO test-dev, MREFP-Net obtained an absolute gain of 2.5%. In particular, MREFP-Net-VGG with an 800 × 800 input reached 49.2 AP on MS COCO test-dev with a multiscale inference strategy. For a 300 × 300 input, MREFP-Net achieved 82.5% mAP on VOC07+12+COCO, and for a 512 × 512 input, MREFP-Net obtained 84.6% mAP. Finally, feature visualization, object characteristic analysis, and false-positive error analysis were performed to highlight the effectiveness of enriched features for small object detection. This study showed that the proposed MREFP-Net is capable of detecting small objects and learning sensitive features to deal with scale imbalance, class imbalance, and appearance complexity across object instances.
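
The class imbalance described in the abstract can be made concrete with a little arithmetic. The sketch below is not taken from the thesis: it assumes the standard SSD300 default-box layout from the original Single Shot MultiBox Detector design (six prediction layers), and the number of positive anchors is a purely hypothetical value chosen for illustration. The function names are likewise illustrative.

```python
# Assumption: standard SSD300 layout from the original SSD design, not the
# thesis's own configuration. Each entry is
# (feature-map side length, default boxes per cell).
SSD300_LAYERS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def count_anchors(layers):
    """Total number of default boxes generated over all prediction layers."""
    return sum(side * side * boxes for side, boxes in layers)


def negative_to_positive_ratio(total_anchors, num_positive):
    """Ratio of unmatched (negative) anchors to matched (positive) anchors."""
    if num_positive <= 0:
        raise ValueError("num_positive must be greater than zero")
    return (total_anchors - num_positive) / num_positive


total = count_anchors(SSD300_LAYERS)

# Hypothetical: suppose 20 anchors are matched to ground-truth objects
# in one image; every other anchor counts as background.
ratio = negative_to_positive_ratio(total, num_positive=20)

print(f"anchors per image: {total}")
print(f"negatives per positive: {ratio:.1f}")
```

Under these assumptions almost every anchor in an image is a background example, which is the kind of imbalance that an anchor refinement stage aims to reduce before final classification and regression.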

Item Type: Thesis (PhD)
Uncontrolled Keywords: Multi-level refinement enriched feature pyramid network (MREFP-Net), Single Shot MultiBox Detector (SSD)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 101479
Deposited By: Narimah Nawil
Deposited On: 21 Jun 2023 10:10
Last Modified: 21 Jun 2023 10:10
