Developing Mask R-CNN Framework for Real-Time Object Detection
Authors: Hmidani Oussama, Ismaili Alaoui El Mehdi
Published in: 2023 6th International Conference on Advanced Communication Technologies and Networking (CommNet), pp. 1-8
Publication date: 2023-12-11
DOI: https://doi.org/10.1109/CommNet60167.2023.10365298
Citation count: 0
Abstract
Real-time object detection with deep learning is an important problem in computer vision. Real-time detection methods have advanced considerably, driven largely by the rapid progress of deep convolutional neural networks (CNNs) over traditional approaches. However, existing real-time deep CNN-based object detectors face performance limitations that stem primarily from the architecture of the underlying base network. This study introduces an improved framework for real-time object detection based on the Mask R-CNN model. To improve performance under stricter localization criteria, we replace the Region of Interest Align (RoIAlign) operation of the original Mask R-CNN with spatial interpolation. In addition, in the final stage of the Mask R-CNN framework, we use the depthwise separable convolution architecture of EfficientNet-B7 to build a classifier that categorizes proposals and refines the bounding boxes of detected objects. Experiments on the COCO and ImageNet datasets show that the proposed approach surpasses the original Mask R-CNN in both detection accuracy and inference speed. Specifically, our method outperforms the original Mask R-CNN framework by 51.5% on the COCO test set and 46.2% on the ImageNet test set.
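The abstract credits part of the speed gain to depthwise separable convolution. As a rough illustration of why that design is cheaper, the sketch below compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1 x 1 pointwise convolution. The function names and the layer sizes (256 input channels, 256 output channels, 3 x 3 kernel) are assumptions chosen for illustration; they are not taken from the paper, and the authors' actual head configuration is not specified in the abstract.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a depthwise k x k convolution followed by a
    1 x 1 pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only (not from the paper).
c_in, c_out, k = 256, 256, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.2f}")
```

For these assumed sizes the separable layer needs roughly one ninth of the weights. This shows only the parameter saving of the building block, not the accuracy or latency figures reported in the abstract.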