BiFormer Attention-Guided Multiscale Fusion Mask2former Networks for Fish Abnormal Behavior Recognition and Segmentation
Authors: Jihang Liu, Zeyuan Hu, Yixi Zhang, Yinjia Li, Jinsong Yang, Hong Yu
Journal: Aquaculture Research, 2024(1), published 2024-11-26
DOI: 10.1155/are/8892810
URL: https://onlinelibrary.wiley.com/doi/10.1155/are/8892810
Citation count: 0
Abstract
To address the difficulty of accurately identifying and tracking abnormal behaviors of individual fish, and the poor adaptability of existing methods in aquaculture, this paper proposes a Mask2former model combined with a bidirectional routing attention mechanism (BiFormer) and a multiscale dilated attention (MSDA) module for fish abnormal behavior recognition and segmentation. To compensate for the lack of publicly available datasets on fish abnormal behavior, we created the “FISH_segmentation_2023” abnormal behavior dataset, which covers four types of fish behavior. First, the BiFormer attention mechanism lets the model better capture critical temporal and spatial information in image sequences, strengthening its feature representation. Second, after the pixel decoder has processed the feature maps, the MSDA module performs multiscale fusion on them. The fused features are then passed to the transformer decoder, further improving the model’s ability to recognize abnormal fish behaviors. Finally, to improve performance further and to address class imbalance in the dataset, we designed a composite loss function that combines focal loss and dice loss (FD loss). This loss balances the influence of easy and hard-to-classify samples while optimizing segmentation quality, thereby improving the model’s recognition accuracy and mean intersection over union (mIoU). Experimental results show that the BiFormer multiscale dilated attention FD loss (BMF)-Mask2former model achieves mIoU, accuracy, and recall values of 92.33%, 95.63%, and 94.82%, respectively, on the self-built FISH_segmentation_2023 dataset, improvements of 6.10%, 4.50%, and 5.09%, respectively, over the baseline Mask2former model. The study demonstrates that, through multiscale fusion, the proposed model can accurately capture both local and contextual features of abnormal fish behaviors, resulting in high-quality segmentation.
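The composite focal-plus-dice loss described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the binary (single-mask) setting, the function names `focal_loss`, `dice_loss`, and `fd_loss`, the focal parameters `gamma` and `alpha`, the dice smoothing term `smooth`, and the equal weights `w_focal` and `w_dice`. It uses plain Python lists of per-pixel foreground probabilities rather than the tensors a real training pipeline would use.

```python
import math
from typing import Sequence


def focal_loss(probs: Sequence[float], targets: Sequence[int],
               gamma: float = 2.0, alpha: float = 0.25,
               eps: float = 1e-7) -> float:
    """Mean binary focal loss over per-pixel foreground probabilities.

    gamma and alpha are assumed defaults, not values from the paper.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
        p_t = p if t == 1 else 1.0 - p           # probability of the true class
        a_t = alpha if t == 1 else 1.0 - alpha   # class-balancing weight
        # (1 - p_t) ** gamma shrinks the contribution of easy pixels
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def dice_loss(probs: Sequence[float], targets: Sequence[int],
              smooth: float = 1.0) -> float:
    """Soft dice loss: 1 minus the smoothed overlap ratio of the two masks."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def fd_loss(probs: Sequence[float], targets: Sequence[int],
            w_focal: float = 1.0, w_dice: float = 1.0) -> float:
    """Weighted sum of focal and dice loss (weights are hypothetical)."""
    return (w_focal * focal_loss(probs, targets)
            + w_dice * dice_loss(probs, targets))


if __name__ == "__main__":
    target_mask = [1, 1, 0, 0]
    confident = [0.9, 0.8, 0.1, 0.2]   # mostly correct prediction
    uncertain = [0.5, 0.4, 0.6, 0.5]   # poor prediction
    print("confident:", fd_loss(confident, target_mask))
    print("uncertain:", fd_loss(uncertain, target_mask))
```

In this sketch the focal term addresses the per-pixel balance between easy and hard samples, while the dice term scores region overlap, which is closer to what IoU-style metrics reward. How the two terms are actually weighted in the paper is not stated in the abstract.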
About the journal:
International in perspective, Aquaculture Research is published 12 times a year and specifically addresses research and reference needs of all working and studying within the many varied areas of aquaculture. The Journal regularly publishes papers on applied or scientific research relevant to freshwater, brackish, and marine aquaculture. It covers all aquatic organisms, floristic and faunistic, related directly or indirectly to human consumption. The journal also includes review articles, short communications and technical papers. Young scientists are particularly encouraged to submit short communications based on their own research.