Pattern Analysis and Applications: Latest Articles

Domain-free fire detection using the spatial–temporal attention transform of the YOLO backbone
IF 3.9, Zone 4 (Computer Science), Q2 Computer Science, Pub Date: 2024-04-23, DOI: 10.1007/s10044-024-01267-y
Sangwon Kim, In-su Jang, ByoungChul Ko
Citations: 0
Block-wise imputation EM algorithm in multi-source scenario: ADNI case
IF 3.9, Zone 4 (Computer Science), Q2 Computer Science, Pub Date: 2024-04-22, DOI: 10.1007/s10044-024-01268-x
Sergio Campos, Juan Zamora, Héctor Allende
Citations: 0
Cotton crop classification using satellite images with score level fusion based hybrid model
IF 3.9, Zone 4 (Computer Science), Q2 Computer Science, Pub Date: 2024-04-16, DOI: 10.1007/s10044-024-01257-0
Amandeep Kaur, Geetanjali Singla, Manjinder Singh, Amit Mittal, Ruchi Mittal, Varun Malik

Accurate cotton images are a significant component of monitoring cotton development and controlling it precisely. Researchers and production managers need a suitable technique for charting the distribution of cotton at the county or field level. Classifying cotton from remote sensing data at the county level has significant implications for precision farming, land management, and government decision-making. This work aims to develop a novel cotton crop classification model using satellite images based on soil behaviour. It comprises four phases: preprocessing, segmentation, feature extraction, and classification. Preprocessing applies Gaussian filtering to improve the quality of the input image. A Modified Deep Joint Segmentation method then performs the segmentation. Features such as the wide dynamic range vegetation index, simple ratio, green chlorophyll index, transformed vegetation index, and green leaf area index are extracted for classifying the input. A hybrid of an Improved CNN (ICNN) and a Bidirectional Gated Recurrent Unit (Bi-GRU) is used for classification, with the final output computed by improved score-level fusion. A new hybrid optimization model, the Battle Royale assisted Butterfly optimization algorithm (BRABOA), adjusts the hidden neuron count of both the ICNN and Bi-GRU classifiers to improve accuracy. Finally, the suggested model is compared with other schemes using a variety of metrics. The suggested HC + BRABOA method obtains a maximum accuracy of 0.95, exceeding conventional methods, at a learning percentage of 90% for classifying cotton crops using satellite images.
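
The abstract names several spectral indices but does not give their formulas. As a minimal sketch, assuming the commonly used textbook definitions (which may differ from the ones the authors used), four of them can be computed per pixel from near-infrared, red, and green reflectances. The function name `vegetation_indices` and the default weight `alpha=0.2` are illustrative assumptions.

```python
import math


def vegetation_indices(nir, red, green, alpha=0.2):
    """Compute four common vegetation indices for one pixel.

    nir, red, green: surface reflectances, assumed to be strictly positive.
    alpha: WDRVI weighting coefficient (an assumed, illustrative default).
    """
    ndvi = (nir - red) / (nir + red)
    return {
        # Simple ratio: NIR / Red
        "simple_ratio": nir / red,
        # Wide dynamic range vegetation index: NDVI with a down-weighted NIR term
        "wdrvi": (alpha * nir - red) / (alpha * nir + red),
        # Green chlorophyll index: NIR / Green - 1
        "green_chlorophyll_index": nir / green - 1.0,
        # Transformed vegetation index: sqrt(NDVI + 0.5)
        "transformed_vi": math.sqrt(ndvi + 0.5),
    }


if __name__ == "__main__":
    print(vegetation_indices(nir=0.5, red=0.1, green=0.2))
```

The green leaf area index is left out because its definition depends on an empirical calibration that the abstract does not specify.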

Citations: 0
Spatio-temporal trajectory data modeling for fishing gear classification
IF 3.9, Zone 4 (Computer Science), Q2 Computer Science, Pub Date: 2024-04-15, DOI: 10.1007/s10044-024-01263-2
Juan Manuel Rodriguez-Albala, Alejandro Peña, Pietro Melzi, Aythami Morales, Ruben Tolosana, Julian Fierrez, Ruben Vera-Rodriguez, Javier Ortega-Garcia

International organizations urge the protection of our oceans and their ecosystems due to their immeasurable importance to humankind. Since illegal fishing activities, commonly known as IUU fishing, cause irreparable damage to these ecosystems, concerned organizations are pushing to detect and combat IUU fishing practices. The automatic identification system makes it possible to locate the position and trajectory of fishing vessels. In this study we address the task of detecting vessels' fishing gear based on the trajectory behavior defined by GPS position data, a useful task to prevent the proliferation of IUU fishing practices. We present a new database including trajectories that span 7 different fishing gears and analyze these as a time sequence analysis problem. We leverage feature extraction techniques from the online signature verification domain to model vessel trajectories, and extract relevant information in the form of both local and global feature sets. We show how, based on these sets of features, the kinematics of vessels using different fishing gears can be effectively classified with common supervised learning algorithms, reaching accuracies up to 90%. Furthermore, motivated by the concerns raised by several organizations about the adverse impact of bottom trawling on marine biodiversity, we present a binary classification experiment in which we were able to distinguish this kind of fishing gear with an accuracy of 99%. We also illustrate in an ablation study how factors such as data availability and the sampling period affect fishing gear classification. Compared to existing works, we highlight these factors, especially the importance of using sampling periods on the order of minutes instead of hours.
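
The abstract does not list the local and global features it extracts. A minimal sketch of the general idea, assuming a track is a time-sorted list of `(t_seconds, lat, lon)` GPS fixes, is to derive a per-segment speed (a local feature) and summary statistics over the whole track (global features). The helper names and the choice of features here are illustrative assumptions, not the authors' feature set.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def local_speeds(track):
    """Speed in m/s for each pair of consecutive fixes (local feature sequence).

    track: list of (t_seconds, lat, lon), sorted by strictly increasing time.
    """
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(track, track[1:]):
        speeds.append(haversine_m(la0, lo0, la1, lo1) / (t1 - t0))
    return speeds


def global_features(speeds):
    """Whole-track summary statistics (global feature set)."""
    return {"mean_speed": sum(speeds) / len(speeds), "max_speed": max(speeds)}
```

A sampling period on the order of minutes, as the abstract recommends, corresponds here to time gaps `t1 - t0` of a few hundred seconds at most.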

Citations: 0
High-recall calibration monitoring for stereo cameras
IF 3.9, Zone 4 (Computer Science), Q2 Computer Science, Pub Date: 2024-04-13, DOI: 10.1007/s10044-024-01264-1
Jaroslav Moravec, Radim Šára

Cameras are the prevalent sensors used for perception in autonomous robotic systems, but their initial calibration may degrade over time due to dynamic factors. This may lead to a failure of downstream tasks, such as simultaneous localization and mapping (SLAM) or object recognition. Hence, a computationally lightweight process that detects the decalibration is of interest. We describe a modification of StOCaMo, an online calibration monitoring procedure for a stereoscopic system. The method uses robust kernel correlation based on epipolar constraints; it validates extrinsic calibration parameters on a single frame with no temporal tracking. In this paper, we present a modified StOCaMo with an improved recall rate on small decalibrations through a confirmation technique based on resampled variance. With fixed parameters learned on a realistic synthetic dataset from CARLA, StOCaMo and its proposed modification were tested on multiple sequences from two real-world datasets: KITTI and EuRoC MAV. The modification improved the recall of StOCaMo by 25% (to 91% and 82%, respectively), and the accuracy by 12% (to 94.7% and 87.5%, respectively), while labeling at most one-third of the input data as uninformative. The upgraded method achieved a Spearman rank correlation of 0.78 between the StOCaMo V-index and the downstream SLAM error.
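
The final result above is reported as a Spearman rank correlation. As a minimal sketch of that statistic (not of StOCaMo itself), it is the Pearson correlation of the ranks of the two variables, with tied values sharing their average rank. The function names below are illustrative.

```python
def average_ranks(values):
    """1-based ranks of the values; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation of two equal-length sequences."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks matter, any strictly increasing relationship between a monitoring index and the downstream error gives a value of 1, whether or not the relationship is linear.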

Citations: 0
The limitations of differentiable architecture search
IF 3.9, Zone 4 (Computer Science), Q2 Computer Science, Pub Date: 2024-04-12, DOI: 10.1007/s10044-024-01260-5
Lacharme Guillaume, Cardot Hubert, Lente Christophe, Monmarche Nicolas

In this paper, we will provide a detailed explanation of the limitations behind differentiable architecture search (DARTS). Algorithms based on the DARTS paradigm tend to converge towards degenerate solutions. A degenerate solution corresponds to an architecture with a shallow graph containing mainly skip connections. We have identified 6 sources of errors that could explain this phenomenon. Some of these errors can only be partially eliminated. Therefore, we will propose an innovative solution to remove degenerate solutions from the search space. We will demonstrate the validity of our approach through experiments conducted on the CIFAR10 and CIFAR100 databases. Our code is available at the following link: https://scm.univ-tours.fr/projetspublics/lifat/darts_ibpria_sparcity
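
To make the degenerate-solution problem concrete, here is a toy sketch of the continuous relaxation that DARTS-style methods use: each edge outputs a softmax-weighted mixture of candidate operations, and the final architecture keeps the operation with the largest architecture weight. The three scalar operations and the `alphas` values are illustrative assumptions; this is not the authors' code, which is at the link above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy candidate operations on a scalar input (hypothetical stand-ins).
CANDIDATE_OPS = {
    "skip_connect": lambda x: x,
    "zero": lambda x: 0.0,
    "scale_2x": lambda x: 2.0 * x,  # stands in for a parametric op such as a convolution
}


def mixed_op(x, alphas):
    """Edge output during search: softmax(alpha)-weighted sum of all candidate ops."""
    weights = softmax([alphas[name] for name in CANDIDATE_OPS])
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def discretize(alphas):
    """After search, keep only the operation with the largest architecture weight."""
    return max(alphas, key=alphas.get)


if __name__ == "__main__":
    alphas = {"skip_connect": 2.0, "zero": 0.0, "scale_2x": 1.0}
    print(mixed_op(1.0, alphas), discretize(alphas))
```

If the weight of `skip_connect` grows fastest on most edges during search, discretization yields the shallow, skip-dominated graph that the abstract calls a degenerate solution.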

Citations: 0
Focalize K-NN: an imputation algorithm for time series datasets
IF 3.9, Zone 4 (Computer Science), Q2 Computer Science, Pub Date: 2024-04-07, DOI: 10.1007/s10044-024-01262-3
Ana Almeida, Susana Brás, Susana Sargento, Filipe Cabral Pinto

The effective use of time series data is crucial in business decision-making. Temporal data reveals temporal trends and patterns, enabling decision-makers to make informed decisions and prevent potential problems. However, missing values in time series data can interfere with the analysis and lead to inaccurate conclusions. Thus, our work proposes a Focalize K-NN method that leverages time series properties to perform missing data imputation. This approach shows the benefits of taking advantage of correlated features and temporal lags to improve the performance of the traditional K-NN imputer. A similar approach could be employed in other methods. We tested this approach with two datasets, various parameter and feature combinations, and observed that it is beneficial in scenarios with disjoint missing patterns. Our findings demonstrate the effectiveness of Focalize K-NN for imputing missing values in time series data. The more noticeable benefits of our methods occur when there is a high percentage of missing data. However, as the amount of missing data increases, so does the error.
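
The abstract does not spell out the Focalize K-NN procedure, so the following is only a generic sketch of the underlying idea: impute a missing value from the k time steps whose correlated-feature values, taken at a few temporal lags, are most similar. The function names, the single fully observed `helper` series, and the default `lags` and `k` are all illustrative assumptions.

```python
def lag_features(series, t, lags):
    """Values of `series` at time t and at the given lags before t."""
    return [series[t - lag] for lag in lags]


def knn_impute(target, helper, lags=(0, 1), k=2):
    """Fill None entries of `target` using a correlated, fully observed `helper` series.

    For each missing time step, find the k observed time steps whose lagged
    `helper` values are closest (squared Euclidean distance) and average
    their `target` values. Entries before max(lags) are left untouched.
    Assumes at least one observed target value at or after max(lags).
    """
    start = max(lags)
    filled = list(target)
    donors = [t for t in range(start, len(target)) if target[t] is not None]
    for t in range(start, len(target)):
        if target[t] is not None:
            continue
        query = lag_features(helper, t, lags)

        def dist(d):
            return sum((a - b) ** 2 for a, b in zip(query, lag_features(helper, d, lags)))

        nearest = sorted(donors, key=dist)[:k]
        filled[t] = sum(target[d] for d in nearest) / len(nearest)
    return filled


if __name__ == "__main__":
    helper = [1, 2, 3, 4, 5, 6]
    target = [10, 20, 30, None, 50, 60]
    print(knn_impute(target, helper))
```

In this toy example the two nearest donors to the gap are its immediate neighbours in time, so the imputed value is their mean.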

Citations: 0
Spatial–temporal attention with graph and general neural network-based sign language recognition
IF 3.9, Zone 4 (Computer Science), Q2 Computer Science, Pub Date: 2024-04-04, DOI: 10.1007/s10044-024-01229-4
Abu Saleh Musa Miah, Md. Al Mehedi Hasan, Yuichi Okuyama, Yoichi Tomioka, Jungpil Shin

Automatic sign language recognition (SLR) is a vital part of human–computer interaction and computer vision: it converts the hand signs used by individuals with significant hearing and speech impairments into equivalent text or voice. Researchers have recently used hand skeleton joint information instead of image pixels because of problems with illumination and complex backgrounds. However, besides the hand information, body motion and facial gestures play an essential role in expressing sign language emotion. A few researchers have worked on SLR systems built on multi-gesture datasets, but their accuracy and time complexity are not sufficient. In light of these limitations, we introduce a spatial and temporal attention model combined with a general neural network designed for the SLR system. The main idea of our architecture is first to construct a fully connected graph to project the skeleton information. We employ self-attention mechanisms to extract insights from node and edge features across spatial and temporal domains. Our architecture splits into three branches: a graph-based spatial branch, a graph-based temporal branch, and a general neural network branch, which together contribute to the final feature integration. Specifically, the spatial branch discerns spatial dependencies, while the temporal branch amplifies temporal dependencies embedded within the sequential hand skeleton data. Further, the general neural network branch enhances the architecture's generalization capabilities, bolstering its robustness. We evaluated the model on the Mexican Sign Language (MSL) and Pakistani Sign Language (PSL) datasets and on the American Sign Language Large Video Dataset (ASLLVD), which comprises 3D joint coordinates for the face, body, and hands, running experiments on individual gestures and their combinations. Our model achieved an accuracy of 99.96% on the MSL dataset, 92.00% on PSL, and 26.00% on ASLLVD, which includes more than 2700 classes. These results, together with the model's computational efficiency, underscore its advantage over contemporary methods in the field.
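
As a minimal sketch of the self-attention step mentioned in the abstract, the function below applies scaled dot-product attention across skeleton joints within one frame. For brevity it assumes queries, keys, and values are all equal to the raw joint features, with no learned projections; the paper's actual layers are not described at this level of detail, so this is an illustration only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    x: list of n feature vectors (one per skeleton joint), each of length d.
    Returns n vectors; each is a convex combination of the input vectors,
    weighted by how similar the joints' features are.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
    return out


if __name__ == "__main__":
    joints = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
    for row in self_attention(joints):
        print(row)
```

Applying the same operation along the time axis, with one vector per frame for a given joint, gives the temporal counterpart.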

Citations: 0
Tiny polyp detection from endoscopic video frames using vision transformers
IF 3.9, Zone 4 (Computer Science), Q2 Computer Science, Pub Date: 2024-04-04, DOI: 10.1007/s10044-024-01254-3
Entong Liu, Bishi He, Darong Zhu, Yuanjiao Chen, Zhe Xu

Deep learning techniques can be effective in helping doctors diagnose gastrointestinal polyps. Currently, polyp detection on video frame sequences that contain a large amount of spurious noise suffers from low recall and mean average precision. Moreover, the mean average precision is also low when the polyp target in the video frame has large-scale variability. Therefore, we propose a method for tiny polyp detection from endoscopic video frames using Vision Transformers, named TPolyp. The proposed method uses a cross-stage Swin Transformer as a multi-scale feature extractor to extract deep feature representations of data samples, improves the bidirectional sampling feature pyramid, and integrates the prediction heads of multiple channel self-attention mechanisms. Compared with convolutional neural networks, this approach focuses more on the feature information of the tiny object detection task and retains relatively deeper semantic information. It additionally improves feature expression and discriminability without increasing the computational complexity. Experimental results show that TPolyp improves detection accuracy by 7%, recall by 7.3%, and average accuracy by 7.5% compared to the YOLOv5 model, and detects tiny objects better in scenarios with blurry artifacts.

Citations: 0
CABF-YOLO: a precise and efficient deep learning method for defect detection on strip steel surface
IF 3.9 Zone 4 Computer Science Q2 Computer Science Pub Date: 2024-04-03 DOI: 10.1007/s10044-024-01252-5

Abstract

Deep learning algorithms are widely used in defect detection systems. However, existing methods are not satisfactory for large-scale surface defect detection on strip steel. In this paper, we propose CABF-YOLO, a precise and efficient detection model for strip steel surface defects based on YOLOX. First, we introduce the Triplet Convolutional Coordinate Attention (TCCA) module into the YOLOX backbone. By factorizing the pooling operation, the TCCA module can accurately capture cross-channel features and identify the location of defects. Second, we design a novel Bidirectional Fusion (BF) strategy in the YOLOX neck. The BF strategy enhances the fusion of low-level and high-level semantic information to obtain fine-grained information. Lastly, the original bounding box loss function is replaced by the EIoU loss function, whose penalty term is redefined to consider the overlap area, central point, and side lengths of the regressed boxes, which accelerates convergence and improves localization accuracy. Experimental results on the benchmark NEU-DET and GC10-DET datasets show that CABF-YOLO outperforms the comparison models and satisfies the real-time detection requirement of industrial production.
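The three penalty components mentioned in the abstract (overlap area, central point, side length) can be illustrated with a minimal sketch of the commonly cited EIoU formulation for one pair of axis-aligned boxes. The abstract does not give the exact variant used in CABF-YOLO, so the function name `eiou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are assumptions for illustration rather than the authors' implementation.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss for two boxes given as (x1, y1, x2, y2); a sketch, not the paper's code."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Side lengths of the predicted and target boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Overlap-area term: intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Central-point term: squared center distance over squared enclosing diagonal.
    dx = (px1 + px2) / 2.0 - (tx1 + tx2) / 2.0
    dy = (py1 + py2) / 2.0 - (ty1 + ty2) / 2.0
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Side-length terms: width and height differences, each normalized separately.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term
```

Because width and height are penalized directly instead of through an aspect-ratio term, a box that has the right center but the wrong size still receives a non-zero penalty, which is the usual argument for faster convergence with this loss.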

Citations: 0
Journal
Pattern Analysis and Applications