Universiti Teknologi Malaysia Institutional Repository

Multi-level refinement enriched feature pyramid network for object detection

Aziz, Lubna and Salam, Md. Sah and Ayub, Sara (2021) Multi-level refinement enriched feature pyramid network for object detection. Image and Vision Computing, 115. pp. 1-11. ISSN 0262-8856

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1016/j.imavis.2021.104287

Abstract

Class imbalance and scale imbalance are common in object detection. Class imbalance arises from an unequal number of instances across different classes, while scale imbalance arises when objects appear at different scales and the number of examples differs from one scale to another. To address scale variance (scale imbalance) and class imbalance together, we propose a simple and effective feature enhancement scheme that explicitly uses all the information of a multi-level structure to generate a multi-level contextual feature pyramid with multiple scales. We also introduce a cascaded refinement scheme that incorporates multi-scale contextual features into the predictive layers of the Single Shot Detector (SSD) to make them more discriminative for multi-scale detection. The feature enhancement scheme uses a stack of multi-scale contextual feature modules to merge multi-level and multi-scale features. We then collect features of equivalent scale through the Multi-layer Feature Fusion (MLFF) unit to construct a feature pyramid in which each feature map is made up of layers from multiple levels. A chained parallel pooling operation integrates further robustness and contextual information into the pyramid. To improve classification and regression, the cascaded refinement scheme captures a large amount of contextual information and refines the anchors to address the class imbalance problem. Experiments are carried out on two benchmark datasets: MS COCO and PASCAL VOC 07/12. The proposed approach achieves state-of-the-art accuracy, with an AP of 40.6 for multi-scale inference on MS COCO Test-dev (input size 320 × 320). For 512 × 512 input on MS COCO Test-dev, our approach yields an absolute gain in precision of 1.8% over the best reported results of single-stage detectors (AP: 45.7).
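The abstract describes the cascaded anchor refinement only in words. The Python below is a minimal sketch of one common way such a cascade can work, assuming a RefineDet-style two-step design (an assumption; the paper's exact formulation is not given in this record). A first stage discards anchors it is almost certain are background, which thins out the flood of easy negatives behind the class imbalance, and shifts the surviving anchors. A second stage then regresses relative to those refined anchors instead of the original ones. The center-size box encoding, the variances (0.1, 0.2) commonly used in SSD implementations, the `neg_thresh` value and every function name here are illustrative assumptions, not the authors' code.

```python
import math
from dataclasses import dataclass

# Assumed SSD-style variances (center, size); not taken from the paper.
VARIANCES = (0.1, 0.2)


@dataclass
class Box:
    """Axis-aligned box in center-size form (cx, cy, w, h)."""
    cx: float
    cy: float
    w: float
    h: float


def encode(anchor: Box, target: Box, var=VARIANCES) -> tuple:
    """Return the regression offsets that map `anchor` onto `target`."""
    return (
        (target.cx - anchor.cx) / (var[0] * anchor.w),
        (target.cy - anchor.cy) / (var[0] * anchor.h),
        math.log(target.w / anchor.w) / var[1],
        math.log(target.h / anchor.h) / var[1],
    )


def decode(anchor: Box, deltas: tuple, var=VARIANCES) -> Box:
    """Apply regression offsets to `anchor` (the inverse of `encode`)."""
    dx, dy, dw, dh = deltas
    return Box(
        anchor.cx + dx * var[0] * anchor.w,
        anchor.cy + dy * var[0] * anchor.h,
        anchor.w * math.exp(dw * var[1]),
        anchor.h * math.exp(dh * var[1]),
    )


def cascade(anchors, stage1_deltas, stage1_bg_prob, stage2_deltas, neg_thresh=0.99):
    """Hypothetical two-stage refinement.

    Anchors whose stage-1 background probability exceeds `neg_thresh`
    (a hypothetical value) are dropped as easy negatives. Survivors are
    shifted by the stage-1 offsets, and the stage-2 offsets are applied
    relative to these refined anchors.
    """
    boxes = []
    for anchor, d1, bg, d2 in zip(anchors, stage1_deltas, stage1_bg_prob, stage2_deltas):
        if bg > neg_thresh:
            continue  # easy negative: never reaches the second stage
        refined = decode(anchor, d1)       # stage 1: coarse anchor refinement
        boxes.append(decode(refined, d2))  # stage 2: regress from the refined anchor
    return boxes
```

For example, with one anchor `Box(0.5, 0.5, 0.2, 0.2)`, stage-1 offsets that move it to an intermediate box and stage-2 offsets that move that intermediate box onto the target, `cascade` returns the target box, while an identical anchor whose stage-1 background probability is 0.999 is filtered out. Splitting the regression in two means the second stage starts from a box that is already close, which is the intuition behind refining anchors before the final prediction.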

Item Type: Article
Uncontrolled Keywords: Chained parallel pooling, CNN, Computer vision, Feature pyramid, Object detection
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 97380
Deposited By: Widya Wahid
Deposited On: 10 Oct 2022 04:20
Last Modified: 10 Oct 2022 04:20
