Pub Date: 2024-04-23 DOI: 10.1007/s10044-024-01267-y
Sangwon Kim, In-su Jang, ByoungChul Ko
{"title":"Domain-free fire detection using the spatial–temporal attention transform of the YOLO backbone","authors":"Sangwon Kim, In-su Jang, ByoungChul Ko","doi":"10.1007/s10044-024-01267-y","DOIUrl":"https://doi.org/10.1007/s10044-024-01267-y","url":null,"abstract":"","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140667598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-22 DOI: 10.1007/s10044-024-01268-x
Sergio Campos, Juan Zamora, Héctor Allende
{"title":"Block-wise imputation EM algorithm in multi-source scenario: ADNI case","authors":"Sergio Campos, Juan Zamora, Héctor Allende","doi":"10.1007/s10044-024-01268-x","DOIUrl":"https://doi.org/10.1007/s10044-024-01268-x","url":null,"abstract":"","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140675279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-16 DOI: 10.1007/s10044-024-01257-0
Amandeep Kaur, Geetanjali Singla, Manjinder Singh, Amit Mittal, Ruchi Mittal, Varun Malik
Accurate cotton imagery is an important component of monitoring cotton development and controlling it precisely. Researchers and production managers need a suitable technique for mapping the distribution of cotton at the county or field level. County-level classification of cotton from remote sensing data has significant implications for precision farming, land management, and government decision-making. This work aims to develop a novel cotton crop classification model using satellite images based on soil behaviour. It includes phases such as preprocessing, segmentation, feature extraction, and classification. Preprocessing applies Gaussian filtering to improve the quality of the input image. A Modified Deep Joint Segmentation method is then employed for segmentation. Features such as the wide dynamic range vegetation index, simple ratio, green chlorophyll index, transformed vegetation index, and green leaf area index are extracted for classifying the input. A hybrid of an Improved CNN (ICNN) and a Bidirectional Gated Recurrent Unit (Bi-GRU) is used for classification, with the final result computed by improved score-level fusion. A new hybrid optimization model, the Battle Royale assisted Butterfly optimization algorithm (BRABOA), adjusts the hidden neuron count of both the ICNN and Bi-GRU classifiers to improve accuracy. Finally, the suggested model is compared with other schemes using a variety of metrics. The suggested HC + BRABOA method obtains a maximum accuracy of 0.95, exceeding conventional methods, at a learning percentage of 90% for classifying cotton crops using satellite images.
{"title":"Cotton crop classification using satellite images with score level fusion based hybrid model","authors":"Amandeep Kaur, Geetanjali Singla, Manjinder Singh, Amit Mittal, Ruchi Mittal, Varun Malik","doi":"10.1007/s10044-024-01257-0","DOIUrl":"https://doi.org/10.1007/s10044-024-01257-0","url":null,"abstract":"<p>Accurate cotton images are significant component for surveiling cotton development and its precise control. A suitable technique for charting the distribution of cotton at the county or field level must be available to researchers and production managers. The classification of cotton remote sensing models at the county level has significant implications for precision farming, land management, and government decision-making. This work aims to develop a novel cotton crop classification model using satellite images based on soil behaviour. It includes phases like preprocessing, segmentation, feature extraction, and classification. Here, preprocessing is carried out by Gaussian filtering to improve the quality of the input image. Then Modified Deep Joint Segmentation method is employed for the segmentation process. The features such as wide dynamic range vegetation index, simple ratio, Green Chlorophyll index, Transformed vegetation index, and Green leaf area index are extracted for classifying the input. The hybrid Improved CNN (ICNN) and Bidirectional Gated recurrent Unit (Bi-GRU) have used for classification purposes, which is computed by the improved score level fusion. The suggested new hybrid optimization model known as the Battle Royale assisted Butterfly optimization algorithm (BRABOA) is used for adjusting the hidden neuron count of both the ICNN and Bi-GRU classifiers for improving the accuracy. At last, the efficiency of the suggested model is then evaluated to other schemes using a variety of metrics. 
The suggested HC + BRABOA method obtains a maximum accuracy of (0.95) over conventional methods at a learning percentage of 90% for classifying cotton crops using satellite images.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140582101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
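The spectral indices named in the cotton classification abstract above can be illustrated with a short sketch. The abstract does not give formulas, so the definitions below are the commonly cited ones and should be read as assumptions: simple ratio as NIR/Red, green chlorophyll index as NIR/Green - 1, wide dynamic range vegetation index with a weighting coefficient `alpha` (0.2 is a frequently used value, not taken from the paper), and transformed vegetation index as sqrt(NDVI + 0.5). The green leaf area index is omitted because its definition cannot be recovered from the abstract. Function names are hypothetical, and inputs are per-pixel band reflectances.

```python
import math


def simple_ratio(nir, red):
    # Simple ratio (SR): near-infrared over red reflectance.
    return nir / red


def green_chlorophyll_index(nir, green):
    # Green chlorophyll index (GCI), assumed here as NIR / Green - 1.
    return nir / green - 1.0


def wdrvi(nir, red, alpha=0.2):
    # Wide dynamic range vegetation index; alpha is an assumed weighting
    # coefficient, not a value reported in the abstract.
    return (alpha * nir - red) / (alpha * nir + red)


def transformed_vegetation_index(nir, red):
    # Transformed vegetation index (TVI), assumed as sqrt(NDVI + 0.5).
    ndvi = (nir - red) / (nir + red)
    return math.sqrt(ndvi + 0.5)


if __name__ == "__main__":
    # Example reflectances for a single vegetated pixel (made-up values).
    nir, red, green = 0.5, 0.1, 0.25
    print(simple_ratio(nir, red))
    print(green_chlorophyll_index(nir, green))
    print(wdrvi(nir, red))
    print(transformed_vegetation_index(nir, red))
```

In a pipeline like the one described, such per-pixel values would be stacked into a feature vector before classification; how the authors actually assemble features is not stated in the abstract.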
Pub Date: 2024-04-15 DOI: 10.1007/s10044-024-01263-2
Juan Manuel Rodriguez-Albala, Alejandro Peña, Pietro Melzi, Aythami Morales, Ruben Tolosana, Julian Fierrez, Ruben Vera-Rodriguez, Javier Ortega-Garcia
International organizations urge the protection of our oceans and their ecosystems due to their immeasurable importance to humankind. Since illegal fishing activities, commonly known as IUU fishing, cause irreparable damage to these ecosystems, the organizations concerned are pushing to detect and combat IUU fishing practices. The automatic identification system makes it possible to locate the position and trajectory of fishing vessels. In this study we address the task of detecting vessels’ fishing gears based on the trajectory behavior defined by GPS position data, a useful task to prevent the proliferation of IUU fishing practices. We present a new database including trajectories that span 7 different fishing gears and analyze them as a time-sequence analysis problem. We leverage feature extraction techniques from the online signature verification domain to model vessel trajectories, and extract relevant information in the form of both local and global feature sets. We show how, based on these sets of features, the kinematics of vessels according to different fishing gears can be effectively classified using common supervised learning algorithms with accuracies up to 90%. Furthermore, motivated by the concerns raised by several organizations on the adverse impact of bottom trawling on marine biodiversity, we present a binary classification experiment in which we were able to distinguish this kind of fishing gear with an accuracy of 99%. We also illustrate in an ablation study the relevance of factors such as data availability and the sampling period for fishing gear classification. Compared to existing works, we highlight these factors, especially the importance of using sampling periods on the order of minutes instead of hours.
{"title":"Spatio-temporal trajectory data modeling for fishing gear classification","authors":"Juan Manuel Rodriguez-Albala, Alejandro Peña, Pietro Melzi, Aythami Morales, Ruben Tolosana, Julian Fierrez, Ruben Vera-Rodriguez, Javier Ortega-Garcia","doi":"10.1007/s10044-024-01263-2","DOIUrl":"https://doi.org/10.1007/s10044-024-01263-2","url":null,"abstract":"<p>International Organizations urge the protection of our oceans and their ecosystems due to their immeasurable importance to humankind. Since illegal fishing activities, commonly known as IUU fishing, cause irreparable damage to these ecosystems, concerned organisms are pushing to detect and combat IUU fishing practices. The automatic identification system allows to locate the position and trajectory of fishing vessels. In this study we address the task of detecting vessels’ fishing gears based on the trajectory behavior defined by GPS position data, a useful task to prevent the proliferation of IUU fishing practices. We present a new database including trajectories that span 7 different fishing gears and analyze these as in a time sequence analysis problem. We leverage from feature extraction techniques from the online signature verification domain to model vessel trajectories, and extract relevant information in the form of both local and global feature sets. We show how, based on these sets of features, the kinematics of vessels according to different fishing gears can be effectively classified using common supervised learning algorithms with accuracies up to <span>(90%)</span>. Furthermore, motivated by the concerns raised by several organizations on the adverse impact of bottom trawling on marine biodiversity, we present a binary classification experiment in which we were able to distinguish this kind of fishing gear with an accuracy of <span>(99%)</span>. 
We also illustrate in an ablation study the relevance of factors such as data availability and the sampling period to perform fishing gear classification. Compared to existing works, we highlight these factors, especially the importance of using sampling periods in the order of minutes instead of hours.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140582066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-13 DOI: 10.1007/s10044-024-01264-1
Jaroslav Moravec, Radim Šára
Cameras are the prevalent sensors used for perception in autonomous robotic systems, but their initial calibration may degrade over time due to dynamic factors. This may lead to a failure of downstream tasks, such as simultaneous localization and mapping (SLAM) or object recognition. Hence, a computationally lightweight process that detects the decalibration is of interest. We describe a modification of StOCaMo, an online calibration monitoring procedure for a stereoscopic system. The method uses robust kernel correlation based on epipolar constraints; it validates extrinsic calibration parameters on a single frame with no temporal tracking. In this paper, we present a modified StOCaMo with an improved recall rate on small decalibrations through a confirmation technique based on resampled variance. With fixed parameters learned on a realistic synthetic dataset from CARLA, StOCaMo and its proposed modification were tested on multiple sequences from two real-world datasets: KITTI and EuRoC MAV. The modification improved the recall of StOCaMo by 25 % (to 91 % and 82 %, respectively), and the accuracy by 12 % (to 94.7 % and 87.5 %, respectively), while labeling at most one-third of the input data as uninformative. The upgraded method achieved a Spearman rank correlation of 0.78 between the StOCaMo V-index and the downstream SLAM error.
{"title":"High-recall calibration monitoring for stereo cameras","authors":"Jaroslav Moravec, Radim Šára","doi":"10.1007/s10044-024-01264-1","DOIUrl":"https://doi.org/10.1007/s10044-024-01264-1","url":null,"abstract":"<p>Cameras are the prevalent sensors used for perception in autonomous robotic systems, but their initial calibration may degrade over time due to dynamic factors. This may lead to a failure of downstream tasks, such as simultaneous localization and mapping (SLAM) or object recognition. Hence, a computationally lightweight process that detects the decalibration is of interest. We describe a modification of StOCaMo, an online calibration monitoring procedure for a stereoscopic system. The method uses robust kernel correlation based on epipolar constraints; it validates extrinsic calibration parameters on a single frame with no temporal tracking. In this paper, we present a modified StOCaMo with an improved recall rate on small decalibrations through a confirmation technique based on resampled variance. With fixed parameters learned on a realistic synthetic dataset from CARLA, StOCaMo and its proposed modification were tested on multiple sequences from two real-world datasets: KITTI and EuRoC MAV. The modification improved the recall of StOCaMo by 25 % (to 91 % and 82 %, respectively), and the accuracy by 12 % (to 94.7 % and 87.5 %, respectively), while labeling at most one-third of the input data as uninformative. 
The upgraded method achieved the rank correlation between StOCaMo V-index and downstream SLAM error of 0.78 (Spearman).</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140582087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-12 DOI: 10.1007/s10044-024-01260-5
Lacharme Guillaume, Cardot Hubert, Lente Christophe, Monmarche Nicolas
In this paper, we will provide a detailed explanation of the limitations behind differentiable architecture search (DARTS). Algorithms based on the DARTS paradigm tend to converge towards degenerate solutions. A degenerate solution corresponds to an architecture with a shallow graph containing mainly skip connections. We have identified 6 sources of errors that could explain this phenomenon. Some of these errors can only be partially eliminated. Therefore, we will propose an innovative solution to remove degenerate solutions from the search space. We will demonstrate the validity of our approach through experiments conducted on the CIFAR10 and CIFAR100 databases. Our code is available at the following link: https://scm.univ-tours.fr/projetspublics/lifat/darts_ibpria_sparcity
{"title":"The limitations of differentiable architecture search","authors":"Lacharme Guillaume, Cardot Hubert, Lente Christophe, Monmarche Nicolas","doi":"10.1007/s10044-024-01260-5","DOIUrl":"https://doi.org/10.1007/s10044-024-01260-5","url":null,"abstract":"<p>In this paper, we will provide a detailed explanation of the limitations behind differentiable architecture search (DARTS). Algorithms based on the DARTS paradigm tend to converge towards degenerate solutions. A degenerate solution corresponds to an architecture with a shallow graph containing mainly skip connections. We have identified 6 sources of errors that could explain this phenomenon. Some of these errors can only be partially eliminated. Therefore, we will propose an innovative solution to remove degenerate solutions from the search space. We will demonstrate the validity of our approach through experiments conducted on the CIFAR10 and CIFAR100 databases. Our code is available at the following link: https://scm.univ-tours.fr/projetspublics/lifat/darts_ibpria_sparcity</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140582251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-07 DOI: 10.1007/s10044-024-01262-3
Ana Almeida, Susana Brás, Susana Sargento, Filipe Cabral Pinto
The effective use of time series data is crucial in business decision-making. Temporal data reveals temporal trends and patterns, enabling decision-makers to make informed decisions and prevent potential problems. However, missing values in time series data can interfere with the analysis and lead to inaccurate conclusions. Thus, our work proposes a Focalize K-NN method that leverages time series properties to perform missing data imputation. This approach shows the benefits of taking advantage of correlated features and temporal lags to improve the performance of the traditional K-NN imputer. A similar approach could be employed in other methods. We tested this approach with two datasets, various parameter and feature combinations, and observed that it is beneficial in scenarios with disjoint missing patterns. Our findings demonstrate the effectiveness of Focalize K-NN for imputing missing values in time series data. The more noticeable benefits of our methods occur when there is a high percentage of missing data. However, as the amount of missing data increases, so does the error.
{"title":"Focalize K-NN: an imputation algorithm for time series datasets","authors":"Ana Almeida, Susana Brás, Susana Sargento, Filipe Cabral Pinto","doi":"10.1007/s10044-024-01262-3","DOIUrl":"https://doi.org/10.1007/s10044-024-01262-3","url":null,"abstract":"<p>The effective use of time series data is crucial in business decision-making. Temporal data reveals temporal trends and patterns, enabling decision-makers to make informed decisions and prevent potential problems. However, missing values in time series data can interfere with the analysis and lead to inaccurate conclusions. Thus, our work proposes a Focalize K-NN method that leverages time series properties to perform missing data imputation. This approach shows the benefits of taking advantage of correlated features and temporal lags to improve the performance of the traditional K-NN imputer. A similar approach could be employed in other methods. We tested this approach with two datasets, various parameter and feature combinations, and observed that it is beneficial in scenarios with disjoint missing patterns. Our findings demonstrate the effectiveness of Focalize K-NN for imputing missing values in time series data. The more noticeable benefits of our methods occur when there is a high percentage of missing data. However, as the amount of missing data increases, so does the error.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140582063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-04 DOI: 10.1007/s10044-024-01229-4
Abu Saleh Musa Miah, Md. Al Mehedi Hasan, Yuichi Okuyama, Yoichi Tomioka, Jungpil Shin
Automatic sign language recognition (SLR) is a vital part of human–computer interaction and computer vision, converting the hand signs used by individuals with significant hearing and speech impairments into equivalent text or voice. Researchers have recently used hand skeleton joint information instead of image pixels because of problems with illumination and complex backgrounds. However, besides the hand information, body motion and facial gestures play an essential role in expressing sign language emotion. A few researchers have also been working to develop SLR systems using multi-gesture datasets, but their accuracy and time complexity are not sufficient. In light of these limitations, we introduce a spatial and temporal attention model combined with a general neural network designed for the SLR system. The main idea of our architecture is first to construct a fully connected graph to project the skeleton information. We employ self-attention mechanisms to extract information from node and edge features across spatial and temporal domains. Our architecture splits into three branches: a graph-based spatial branch, a graph-based temporal branch, and a general neural network branch, which together contribute to the final feature integration. Specifically, the spatial branch captures spatial dependencies, while the temporal branch amplifies temporal dependencies embedded within the sequential hand skeleton data. Further, the general neural network branch enhances the architecture’s generalization capabilities, bolstering its robustness. In our evaluation, we used the Mexican Sign Language (MSL) and Pakistani Sign Language (PSL) datasets and the American Sign Language Large Video dataset (ASLLVD), which comprises 3D joint coordinates for face, body, and hands, and conducted experiments on individual gestures and their combinations. Our model achieved an accuracy of 99.96% on the MSL dataset, 92.00% on PSL, and 26.00% on the ASLLVD dataset, which includes more than 2700 classes. These results, coupled with the model’s computational efficiency, indicate an advantage over contemporary methods in the field.
{"title":"Spatial–temporal attention with graph and general neural network-based sign language recognition","authors":"Abu Saleh Musa Miah, Md. Al Mehedi Hasan, Yuichi Okuyama, Yoichi Tomioka, Jungpil Shin","doi":"10.1007/s10044-024-01229-4","DOIUrl":"https://doi.org/10.1007/s10044-024-01229-4","url":null,"abstract":"<p>Automatic sign language recognition (SLR) stands as a vital aspect within the realms of human–computer interaction and computer vision, facilitating the conversion of hand signs utilized by individuals with significant hearing and speech impairments into equivalent text or voice. Researchers have recently used hand skeleton joint information instead of the image pixel due to light illumination and complex background-bound problems. However, besides the hand information, body motion and facial gestures play an essential role in expressing sign language emotion. Also, a few researchers have been working to develop an SLR system by taking a multi-gesture dataset, but their performance accuracy and time complexity are not sufficient. In light of these limitations, we introduce a spatial and temporal attention model amalgamated with a general neural network designed for the SLR system. The main idea of our architecture is first to construct a fully connected graph to project the skeleton information. We employ self-attention mechanisms to extract insights from node and edge features across spatial and temporal domains. Our architecture bifurcates into three branches: a graph-based spatial branch, a graph-based temporal branch, and a general neural network branch, which collectively synergize to contribute to the final feature integration. Specifically, the spatial branch discerns spatial dependencies, while the temporal branch amplifies temporal dependencies embedded within the sequential hand skeleton data. Further, the general neural network branch enhances the architecture’s generalization capabilities, bolstering its robustness. 
In our evaluation, utilizing the Mexican Sign Language (MSL), Pakistani Sign Language (PSL) datasets, and American Sign Language Large Video dataset which comprises 3D joint coordinates for face, body, and hands that conducted experiments on individual gestures and their combinations. Impressively, our model demonstrated notable efficacy, achieving an accuracy rate of 99.96% for the MSL dataset, 92.00% for PSL, and 26.00% for the ASLLVD dataset, which includes more than 2700 classes. These exemplary performance metrics, coupled with the model’s computationally efficient profile, underscore its preeminence compared to contemporaneous methodologies in the field.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140585406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning techniques can be effective in helping doctors diagnose gastrointestinal polyps. Currently, polyp detection on video frame sequences containing a large amount of spurious noise suffers from low recall and mean average precision. Moreover, the mean average precision is also low when the polyp target in the video frame has large-scale variability. Therefore, we propose a tiny polyp detection method for endoscopic video frames based on Vision Transformers, named TPolyp. The proposed method uses a cross-stage Swin Transformer as a multi-scale feature extractor to extract deep feature representations of data samples, improves the bidirectional sampling feature pyramid, and integrates the prediction heads of multiple channel self-attention mechanisms. This approach focuses more on the feature information of the tiny object detection task than convolutional neural networks and retains relatively deeper semantic information. It additionally improves feature expression and discriminability without increasing the computational complexity. Experimental results show that TPolyp improves detection accuracy by 7%, recall by 7.3%, and average accuracy by 7.5% compared to the YOLOv5 model, and detects tiny objects better in scenarios with blurry artifacts.
{"title":"Tiny polyp detection from endoscopic video frames using vision transformers","authors":"Entong Liu, Bishi He, Darong Zhu, Yuanjiao Chen, Zhe Xu","doi":"10.1007/s10044-024-01254-3","DOIUrl":"https://doi.org/10.1007/s10044-024-01254-3","url":null,"abstract":"<p>Deep learning techniques can be effective in helping doctors diagnose gastrointestinal polyps. Currently, processing video frame sequences containing a large amount of spurious noise in polyp detection suffers from elevated recall and mean average precision. Moreover, the mean average precision is also low when the polyp target in the video frame has large-scale variability. Therefore, we propose a tiny polyp detection from endoscopic video frames using Vision Transformers, named TPolyp. The proposed method uses a cross-stage Swin Transformer as a multi-scale feature extractor to extract deep feature representations of data samples, improves the bidirectional sampling feature pyramid, and integrates the prediction heads of multiple channel self-attention mechanisms. This approach focuses more on the feature information of the tiny object detection task than convolutional neural networks and retains relatively deeper semantic information. It additionally improves feature expression and discriminability without increasing the computational complexity. 
Experimental results show that TPolyp improves detection accuracy by 7%, recall by 7.3%, and average accuracy by 7.5% compared to the YOLOv5 model, and has better tiny object detection in scenarios with blurry artifacts.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140582268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-03 DOI: 10.1007/s10044-024-01252-5
Abstract
Deep learning algorithms have gained widespread usage in defect detection systems. However, existing methods are not satisfactory for large-scale applications in surface defect detection of strip steel. In this paper, we propose a precise and efficient detection model, named CABF-YOLO, based on YOLOX for strip steel surface defects. Firstly, we introduce the Triplet Convolutional Coordinate Attention (TCCA) module in the backbone of YOLOX. By factorizing the pooling operation, the TCCA module can accurately capture cross-channel features to identify the location information of defects. Secondly, we design a novel Bidirectional Fusion (BF) strategy in the neck of YOLOX. The BF strategy enhances the fusion of low-level and high-level semantic information to obtain fine-grained information. Lastly, the original bounding box loss function is replaced by the EIoU loss function. In the EIoU loss function, the penalty term is redefined to consider the overlap area, central point, and side length of the required regressions, which accelerates convergence and improves localization accuracy. On the benchmark NEU-DET and GC10-DET datasets, the experimental results show that CABF-YOLO achieves superior performance compared with the other models considered and satisfies the real-time detection requirement of industrial production.
{"title":"CABF-YOLO: a precise and efficient deep learning method for defect detection on strip steel surface","authors":"","doi":"10.1007/s10044-024-01252-5","DOIUrl":"https://doi.org/10.1007/s10044-024-01252-5","url":null,"abstract":"<h3>Abstract</h3> <p>Deep learning algorithms have gained widespread usage in defect detection systems. However, existing methods are not satisfied for large-scale applications on surface defect detection of strip steel. In this paper, we propose a precise and efficient detection model, named CABF-YOLO, based on the YOLOX for strip steel surface defects. Firstly, we introduce the Triplet Convolutional Coordinate Attention (TCCA) module in the backbone of the YOLOX. By factorizing the pooling operation, the TCCA module can accurately capture cross-channel features to identify the location information of defects. Secondly, we design a novel Bidirectional Fusion (BF) strategy in the neck of the YOLOX. The BF strategy enhances the fusion of low-level and high-level semantic information to obtain fine-grained information. Lastly, the original bounding box loss function is replaced by the EIoU loss function. In the EIoU loss function, the penalty term is redefined to consider the overlap area, central point, and side length of the required regressions to accelerate the convergence rate and localization accuracy. 
On the benchmark NEU-DET dataset and GC10-DET dataset, the experimental results show that the CABF-YOLO achieves superior performance compared with other comparison models and satisfies the real-time detection requirement of industrial production.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140585401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
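The three penalty components that the CABF-YOLO abstract above attributes to the EIoU loss (overlap area, central point, and side length) can be made concrete with a small sketch. It follows the commonly published form of EIoU: one minus IoU, plus the squared centre distance normalised by the squared diagonal of the smallest enclosing box, plus the squared width and height differences normalised by the enclosing box's squared width and height. The abstract does not state the exact formulation used in CABF-YOLO, so treat this as an assumption. The function name, the `(x1, y1, x2, y2)` box layout, and the `eps` stabiliser are illustrative choices.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Overlap term: intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Central-point term: squared centre distance over squared diagonal.
    dx = (px1 + px2) / 2.0 - (tx1 + tx2) / 2.0
    dy = (py1 + py2) / 2.0 - (ty1 + ty2) / 2.0
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Side-length terms: width and height differences, each normalised
    # by the corresponding enclosing-box dimension.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term


if __name__ == "__main__":
    print(eiou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(eiou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes
```

Unlike a plain IoU loss, the centre and side-length terms remain informative when the boxes do not overlap, which is the usual argument for faster convergence.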