Comparison of CSPDarkNet53, CSPResNeXt-50, and EfficientNet-B0 Backbones on YOLO V4 as Object Detector

International Journal of Engineering, Science and Information Technology. Pub Date: 2022-09-14. DOI: 10.52088/ijesty.v2i3.291

Marsa Mahasin, Irma Amelia Dewi


Citations: 6

Abstract

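YOLO v4 has a structure consisting of three parts: backbone, neck, and head. The backbone is the part of the YOLO v4 structure that serves as a feature extractor for the image. It is also a convolutional neural network that can be replaced with another convolutional neural network. Previous research has recommended many backbones, such as CSPDarkNet53, CSPResNeXt-50, and EfficientNet-B0, so research is needed to determine the effect of different backbones on the YOLO v4 model. One suitable research object is the microfossil. Research on microfossil detection is fundamental to helping paleontologists identify microfossil species, which are used to determine rock age, and distinguish between similar microfossils. In this research, three backbones (CSPDarkNet53, CSPResNeXt-50, and EfficientNet-B0) were used to train on and detect image sets of five species of foraminiferal microfossils. The results were evaluated to determine the advantages of each backbone. The evaluation metrics are precision, recall, F1-score, average precision (AP), mean average precision (mAP), frames per second (FPS), and model size. The mAP of the CSPDarkNet53 model reached 83.41%, the highest of the three, compared with 81.00% for CSPResNeXt-50 and 81.76% for EfficientNet-B0. The CSPResNeXt-50 model has a precision of 75.60%, a recall of 81.10%, and an F1-score of 78%. The CSPDarkNet53 model also achieved the highest speed, at 33.4 FPS. However, the YOLO v4 model with the EfficientNet-B0 backbone is the lightest model, at only 156.8 MB.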
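The abstract reports precision, recall, F1-score, AP, mAP, and FPS but does not define them. The following is a minimal Python sketch of the standard definitions. It also checks that the reported CSPResNeXt-50 F1-score is consistent with its precision and recall. It assumes all-point interpolated AP and detections already matched to ground truth at some fixed IoU threshold (for example 0.5). The abstract states neither the AP convention nor the IoU threshold the authors used. All function names are hypothetical and are not taken from the paper or from darknet.

```python
from typing import List, Sequence, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (returns 0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pr_curve(is_tp: Sequence[bool], num_gt: int) -> Tuple[List[float], List[float]]:
    """Cumulative precision and recall for the detections of ONE class.

    Assumption: `is_tp` is already sorted by descending confidence, and each
    entry says whether that detection matched an unclaimed ground-truth box.
    """
    precisions, recalls = [], []
    tp = fp = 0
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    return precisions, recalls


def average_precision(precisions: Sequence[float], recalls: Sequence[float]) -> float:
    """All-point interpolated AP: area under the monotone precision envelope."""
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Sweep right to left so each precision becomes the max precision at any higher recall.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas; steps where recall does not change have zero width.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_average_precision(ap_per_class: Sequence[float]) -> float:
    """mAP: unweighted mean of per-class AP (assumed: one AP per microfossil species)."""
    return sum(ap_per_class) / len(ap_per_class)


if __name__ == "__main__":
    # Consistency check with the abstract's CSPResNeXt-50 figures: P = 75.60 %, R = 81.10 %.
    print(f"F1 = {f1_score(0.7560, 0.8110):.2%}")
    # Toy class with hypothetical data: 5 detections, 4 ground-truth objects.
    p, r = pr_curve([True, True, False, True, False], num_gt=4)
    print(f"AP = {average_precision(p, r):.4f}")
    # Throughput: convert the reported 33.4 FPS into milliseconds per frame.
    print(f"{1000 / 33.4:.1f} ms/frame")
```

The F1 check gives about 78.25%, which matches the reported 78%. The toy AP example evaluates to 0.6875. At 33.4 FPS the CSPDarkNet53 model spends roughly 30 ms per frame. The paper's mAP would be the mean of such per-class APs over the five species, under whatever AP convention the authors actually used.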




Source journal

International Journal of Engineering, Science and Information Technology

Self-citation rate

0.00%

Articles published