
Latest articles in Pattern Recognition

Associative graph convolution network for point cloud analysis
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-11-06 · DOI: 10.1016/j.patcog.2024.111152 · Pattern Recognition 159, Article 111152
Xi Yang , Xingyilang Yin , Nannan Wang , Xinbo Gao
Since point clouds are the raw output of most 3D sensors, their effective analysis is in high demand in autonomous driving and robotic manipulation. However, directly processing point clouds is challenging because they are disordered and unstructured geometric data. Recently, numerous graph convolutional neural networks have been proposed to introduce graph structure to point clouds, yet they remain far from perfect. In particular, DGCNN tries to learn the local geometry of points in semantic space and recomputes the graph using nearest neighbors in the feature space at each layer. However, it discards all information about the previous graph after each graph update, neglecting the relations between successive dynamic updates. To this end, we propose an associative graph convolution neural network (AGCN), which mainly consists of associative graph convolution (AGConv) and two kinds of residual connections. AGConv additionally considers information from the previous graph when computing the edge function on the current local neighborhoods in each layer, so it can precisely and continuously capture local geometric features of point clouds. Residual connections further explore the semantic relations between layers for effective learning on point clouds. Extensive experiments on several benchmark datasets show that our network achieves competitive classification and segmentation results.
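The associative idea can be sketched minimally: like DGCNN, each layer rebuilds a k-NN graph in the current feature space, but edge aggregation also reuses the previous layer's graph instead of discarding it. The NumPy sketch below is illustrative only; the function names, the max aggregation, and the concatenation fusion are assumptions, not the paper's exact AGConv definition.

```python
import numpy as np

def knn_graph(feats, k):
    """Indices of the k nearest neighbors of each point in feature space."""
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self-loops
    return np.argsort(d, axis=1)[:, :k]    # (N, k)

def edge_features(feats, idx):
    """DGCNN-style edge feature: [x_i, x_j - x_i] for each neighbor j."""
    center = np.repeat(feats[:, None, :], idx.shape[1], axis=1)  # (N, k, C)
    neighbor = feats[idx]                                        # (N, k, C)
    return np.concatenate([center, neighbor - center], axis=-1)  # (N, k, 2C)

def associative_layer(feats, prev_idx, k):
    """One hypothetical AGConv-like step: recompute the graph on current
    features, but also aggregate over the previous layer's graph so the
    relation between successive dynamic graphs is not discarded."""
    idx = knn_graph(feats, k)
    cur = edge_features(feats, idx).max(axis=1)        # aggregate over current graph
    prev = edge_features(feats, prev_idx).max(axis=1)  # aggregate over previous graph
    return np.concatenate([cur, prev], axis=-1), idx   # fused features + graph for next layer
```

A layer stack would thread `idx` from one call into the next as `prev_idx`, which is exactly the association the abstract describes.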
Citations: 0
Riding feeling recognition based on multi-head self-attention LSTM for driverless automobile
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-11-05 · DOI: 10.1016/j.patcog.2024.111135 · Pattern Recognition 159, Article 111135
Xianzhi Tang, Yongjia Xie, Xinlong Li, Bo Wang
With the emergence of driverless technology, passenger ride comfort has become an issue of concern. In recent years, driving-fatigue detection and braking-sensation evaluation based on EEG signals have received growing attention, and analyzing ride comfort from EEG signals is likewise an intuitive approach. However, finding an effective method or model to evaluate passenger comfort remains a challenge. In this paper, we propose a long short-term memory (LSTM) network model based on a multi-head self-attention mechanism for passenger comfort detection. Applying the multi-head attention mechanism in the feature-extraction process yields more effective classification results. The results show that the LSTM network with the multi-head self-attention mechanism is efficient in decision making and achieves higher classification accuracy. In conclusion, the classifier based on the multi-head attention mechanism proposed in this paper performs excellently in EEG classification of different emotional states and has broad prospects in brain-computer interaction.
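As a rough illustration of the attention component, the following NumPy sketch implements standard multi-head scaled dot-product self-attention over a sequence of per-window EEG feature vectors. The projection matrices and shapes are hypothetical, and the paper's full model couples this with an LSTM, which is omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product self-attention with n_heads heads.
    x: (T, D) sequence, e.g. one row per EEG time window."""
    T, D = x.shape
    dh = D // n_heads
    q, k, v = x @ Wq, x @ Wk, x @ Wv                      # project to queries/keys/values
    split = lambda m: m.reshape(T, n_heads, dh).transpose(1, 0, 2)  # (n_heads, T, dh)
    q, k, v = split(q), split(k), split(v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)       # (n_heads, T, T) attention logits
    out = softmax(scores) @ v                             # weighted sum of values
    out = out.transpose(1, 0, 2).reshape(T, D)            # concatenate heads
    return out @ Wo                                       # final output projection
```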
Citations: 0
Joint utilization of positive and negative pseudo-labels in semi-supervised facial expression recognition
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-11-05 · DOI: 10.1016/j.patcog.2024.111147 · Pattern Recognition 159, Article 111147
Jinwei Lv, Yanli Ren, Guorui Feng
Facial expression recognition has attracted significant attention due to the abundance of unlabeled expressions, and semi-supervised learning aims to leverage unlabeled samples sufficiently. Recent approaches primarily focus on combining an adaptive margin with pseudo-labels to extract hard samples and boost performance. However, the instability of pseudo-labels and the utilization of the remaining unlabeled samples remain critical challenges. We introduce a stable-positive-single and negative-multiple pseudo-label (SPS-NM) method to solve these two challenges. All unlabeled samples are properly categorized into three groups by adaptive confidence margins. When the maximum confidence score is sufficiently high and stable, the unlabeled sample is assigned a positive pseudo-label. Conversely, when the confidence scores of an unlabeled sample are low enough, the corresponding negative-multiple pseudo-labels are attached to it. The quality and quantity of classes in negative pseudo-labels are balanced by top-k selection. The remaining unlabeled samples are ambiguous and fail to match their pseudo-labels, but they can still be used to extract valuable features through contrastive learning. We conduct comparative experiments and an ablation study on the RAF-DB, AffectNet and SFEW datasets to demonstrate that SPS-NM achieves improvement and becomes the state-of-the-art method in facial expression recognition.
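The three-way split described above can be illustrated with a small sketch. The fixed thresholds, the stability test, and the low-confidence criterion below are hypothetical placeholders for the paper's adaptive confidence margins:

```python
import numpy as np

def assign_pseudo_labels(probs, prev_probs, pos_thr=0.95, stab_thr=0.05,
                         neg_thr=0.05, k=3):
    """Illustrative SPS-NM-style split of unlabeled samples.
    probs, prev_probs: (N, C) softmax outputs at the current / previous step."""
    positives, negatives, ambiguous = {}, {}, []
    for i, (p, q) in enumerate(zip(probs, prev_probs)):
        c = int(p.argmax())
        stable = abs(p[c] - q[c]) < stab_thr          # prediction unchanged over time
        if p[c] >= pos_thr and stable:
            positives[i] = c                          # single positive pseudo-label
        elif p.max() < 1.0 / len(p) + neg_thr:
            # near-uniform confidence: the k least likely classes become
            # negative pseudo-labels ("this sample is NOT class c")
            negatives[i] = list(np.argsort(p)[:k])
        else:
            ambiguous.append(i)                       # left for contrastive learning
    return positives, negatives, ambiguous
```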
Citations: 0
Self-supervised multimodal change detection based on difference contrast learning for remote sensing imagery
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-11-05 · DOI: 10.1016/j.patcog.2024.111148 · Pattern Recognition 159, Article 111148
Xuan Hou , Yunpeng Bai , Yefan Xie , Yunfeng Zhang , Lei Fu , Ying Li , Changjing Shang , Qiang Shen
Most existing change detection (CD) methods target homogeneous images. However, in real-world scenarios such as disaster management, where CD is urgent and pre-change and post-change images are typically of different modalities, significant challenges arise for multimodal change detection (MCD). One challenge is that bi-temporal image pairs sourced from distinct sensors may exhibit an image-domain gap. Another arises when multimodal bi-temporal image pairs require collaborative pixel-level annotation from domain experts specialized in different image fields, resulting in scarce annotated samples. To address these challenges, this paper proposes a novel self-supervised difference contrast learning framework (Self-DCF). The framework enables network training without labeled samples by automatically exploiting the feature information inherent in bi-temporal imagery so that the two modalities mutually supervise each other. Additionally, a Unified Mapping Unit reduces the domain gap between images of different modalities. The efficiency and robustness of Self-DCF are validated on five popular datasets, on which it outperforms state-of-the-art algorithms.
Citations: 0
Incremental feature selection: Parallel approach with local neighborhood rough sets and composite entropy
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-11-05 · DOI: 10.1016/j.patcog.2024.111141 · Pattern Recognition 159, Article 111141
Weihua Xu, Weirui Ye
Rough set theory is a powerful mathematical framework for managing uncertainty and is widely utilized in feature selection. However, traditional rough set-based feature selection algorithms encounter significant challenges, especially when processing large-scale incremental data and adapting to the dynamic nature of real-world scenarios, where both data volume and feature sets are continuously changing. To overcome these limitations, this study proposes an innovative algorithm that integrates local neighborhood rough sets with composite entropy to measure uncertainty in information systems more accurately. By incorporating decision distribution, composite entropy enhances the precision of uncertainty quantification, thereby improving the effectiveness of the algorithm in feature selection. To further improve performance in handling large-scale incremental data, matrix operations are employed in place of traditional set-based methods, allowing the algorithm to fully utilize modern hardware capabilities for accelerated processing. Additionally, parallel computing technology is integrated to further enhance computational speed. An incremental version of the algorithm is also introduced to better adapt to dynamic data environments, increasing its flexibility and practicality. Comprehensive experimental evaluations demonstrate that the proposed algorithm significantly surpasses existing methods in both effectiveness and efficiency.
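As a minimal sketch of the matrix-form neighborhood rough set machinery the abstract refers to (composite entropy and the incremental update are omitted), the following computes the dependency degree, i.e. the fraction of samples whose delta-neighborhood is contained in their own decision class, using only matrix operations. The function names and the Euclidean neighborhood are assumptions:

```python
import numpy as np

def neighborhood_matrix(X, delta):
    """Boolean matrix N with N[i, j] = True iff sample j lies in the
    delta-neighborhood of sample i (Euclidean distance, matrix form)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return d <= delta

def dependency(X, y, delta):
    """Fraction of samples in the positive region: those whose whole
    neighborhood carries their own decision label."""
    N = neighborhood_matrix(X, delta)
    same = y[:, None] == y[None, :]          # label-agreement matrix
    consistent = np.all(~N | same, axis=1)   # neighborhood is a subset of own class
    return consistent.mean()
```

A feature-selection loop would evaluate `dependency` (or an entropy-based variant) on candidate feature subsets, which is where the matrix formulation pays off on large data.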
Citations: 0
HTCSigNet: A Hybrid Transformer and Convolution Signature Network for offline signature verification
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-11-05 · DOI: 10.1016/j.patcog.2024.111146 · Pattern Recognition 159, Article 111146
Lidong Zheng , Da Wu , Shengjie Xu, Yuchen Zheng
For offline handwritten signature verification (OHSV) tasks, traditional convolutional neural networks (CNNs) and transformers struggle to individually capture both global and local features of signatures, and single-depth models often suffer from overfitting and poor generalization. To overcome these difficulties, this paper proposes a novel Hybrid Transformer and Convolution Signature Network (HTCSigNet) that captures multi-scale features from signatures. Specifically, HTCSigNet is an innovative framework consisting of two parts: a transformer-based block and a CNN-based block, used to extract global and local features from signatures, respectively. The CNN-based block comprises a Space-to-Depth Convolution (SPD-Conv) module, which improves feature learning by focusing precisely on signature strokes; a Spatial and Channel Reconstruction Convolution (SCConv) module, which enhances generalization by focusing on distinctive micro-deformation features while reducing attention to common features; and a convolution module that extracts the shape and morphology of specific strokes and other local features. The transformer-based block contains a Vision Transformer (ViT), used to extract the overall shape, layout, general direction, and other global features of signatures. After the feature-learning stage, writer-dependent (WD) and writer-independent (WI) verification systems are constructed to evaluate the performance of the proposed HTCSigNet. Extensive experiments on four public signature datasets, GPDSsynthetic, CEDAR, UTSig, and BHSig260 (Bengali and Hindi), demonstrate that HTCSigNet learns discriminative representations between genuine and skilled forged signatures and achieves state-of-the-art or competitive performance compared with advanced verification systems. Furthermore, HTCSigNet transfers easily to datasets in different languages in OHSV tasks.
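The space-to-depth step at the heart of SPD-Conv can be sketched on its own: each s×s spatial block is folded into the channel axis, so resolution is reduced without discarding any pixels (unlike strided convolution or pooling), which is why fine stroke detail survives. A minimal NumPy version, with the channel ordering chosen purely for illustration:

```python
import numpy as np

def space_to_depth(x, s=2):
    """Rearrange (H, W, C) into (H//s, W//s, C*s*s): every s x s spatial
    block is folded into the channel axis, so no pixel is discarded."""
    H, W, C = x.shape
    assert H % s == 0 and W % s == 0
    x = x.reshape(H // s, s, W // s, s, C)                       # expose the s x s blocks
    return x.transpose(0, 2, 1, 3, 4).reshape(H // s, W // s, C * s * s)
```

In SPD-Conv this rearrangement is followed by a non-strided convolution over the enlarged channel axis.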
Citations: 0
EGO-LM: An efficient, generic, and out-of-the-box language model for handwritten text recognition
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-11-04 · DOI: 10.1016/j.patcog.2024.111130 · Pattern Recognition 159, Article 111130
Hongliang Li , Dezhi Peng , Lianwen Jin
The language model (LM) plays a crucial role in post-processing handwritten text recognition (HTR) by capturing linguistic patterns. However, traditional rule-based LMs are inefficient, and recent end-to-end LMs require customized training for each HTR model. To address these limitations, we propose an Efficient, Generic, and Out-of-the-box Language Model (EGO-LM) for HTR. To unlock the out-of-the-box capability of the end-to-end LM, we introduce a vision-limited proxy task that focuses on visual-pattern-agnostic linguistic dependencies during training, enhancing the robustness and generality of the LM. The enhanced capabilities also enable EGO-LM to iteratively refine its output for a further accuracy boost without additional tuning. Moreover, we introduce a Diverse-Corpus Online Handwriting dataset (DCOH-120K) with more diverse corpus types and more samples than existing datasets, including 83,142 Chinese and 39,398 English text lines. Extensive experiments demonstrate that EGO-LM can attain state-of-the-art performance while achieving up to 613× acceleration. The DCOH-120K dataset is available at .
Citations: 0
SAR target augmentation and recognition via cross-domain reconstruction
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-11-04 · DOI: 10.1016/j.patcog.2024.111117 · Pattern Recognition 159, Article 111117
Ganggang Dong, Yafei Song
Deep learning-based target recognition methods have achieved strong performance in prior work. Large amounts of labeled training data are collected to train a deep architecture from which inference can be made. For radar sensors, data are easy to collect, yet prior knowledge of labels is difficult to obtain. To solve this problem, a cross-domain re-imaging target augmentation method is proposed in this paper. The original image is first recast into the frequency domain. The frequency components are then filtered by a randomly generated mask whose size and shape are determined at random. The filtering result is finally used for re-imaging, and the original target is reconstructed accordingly. A series of new samples can thus be generated freely, improving both the size and the diversity of the dataset. The proposed augmentation method can be implemented online or offline, making it adaptable to various downstream tasks. Multiple comparative studies highlight the superiority of the proposed method over standard and recent techniques. It serves to generate images that aid downstream tasks.
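A minimal sketch of the re-imaging pipeline described above (FFT, random mask, inverse FFT) is given below. The rectangular zero-mask and its size bounds are assumptions made for illustration; the abstract only specifies that the mask's size and shape are random:

```python
import numpy as np

def reimage_augment(img, rng):
    """One hypothetical cross-domain re-imaging pass: transform the image
    to the frequency domain, suppress a randomly sized and placed
    rectangular region of the spectrum, and reconstruct a new sample."""
    H, W = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))                          # centered spectrum
    mask = np.ones((H, W), dtype=bool)
    mh, mw = rng.integers(1, H // 2), rng.integers(1, W // 2)      # random size
    top, left = rng.integers(0, H - mh), rng.integers(0, W - mw)   # random position
    mask[top:top + mh, left:left + mw] = False                     # zero out the region
    recon = np.fft.ifft2(np.fft.ifftshift(F * mask))               # re-imaging step
    return np.abs(recon)                                           # new (magnitude) sample
```

Calling the function repeatedly with fresh randomness yields the "series of new samples" the abstract mentions; each call keeps the target's dominant structure while perturbing its spectral content.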
基于深度学习的目标识别方法在前人的研究中取得了巨大的成就。收集大量带有标签的训练数据来训练深度架构,从而获得推理结果。对于雷达传感器来说,数据很容易收集,但关于标签的先验知识却很难获取。为了解决这个问题,本文提出了一种跨域再成像目标增强方法。首先将原始图像转换到频域。然后用随机生成的掩码对频率进行随机滤波。掩膜的大小和形状是随机确定的。滤波结果最终用于重新成像。然后就可以相应地重建原始目标。一系列新样本可以自由生成。因此,数据集的数量和多样性可以得到改善。所提出的增强方法可以在线或离线实施,因此可以适应各种下游任务。多项比较研究表明,拟议方法优于标准和最新技术。它可以生成有助于下游任务的图像。
{"title":"SAR target augmentation and recognition via cross-domain reconstruction","authors":"Ganggang Dong,&nbsp;Yafei Song","doi":"10.1016/j.patcog.2024.111117","DOIUrl":"10.1016/j.patcog.2024.111117","url":null,"abstract":"<div><div>Deep learning-based target recognition methods have achieved strong performance in prior work. Large amounts of labeled training data are collected to train a deep architecture, from which inferences can be drawn. For radar sensors, data can be collected easily, yet prior knowledge of the labels is difficult to obtain. To address this problem, this paper proposes a cross-domain re-imaging target augmentation method. The original image is first recast into the frequency domain. The frequencies are then filtered by a randomly generated mask, whose size and shape are determined at random. The filtering results are finally used for re-imaging, from which the original target can be reconstructed. A series of new samples can thus be generated freely, improving both the amount and the diversity of the dataset. The proposed augmentation method can be implemented online or offline, making it adaptable to various downstream tasks. Multiple comparative studies demonstrate the superiority of the proposed method over standard and recent techniques, generating images that aid the downstream tasks.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111117"},"PeriodicalIF":7.5,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
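The pipeline the abstract describes — recast to the frequency domain, filter with a randomly sized and shaped mask, then re-image — can be sketched as below. The rectangular low-pass mask and its size range are illustrative assumptions; the paper only states that the mask's size and shape are random.

```python
import numpy as np

def reimaging_augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Recast an image into the frequency domain, filter it with a
    randomly sized mask, and re-image via the inverse FFT."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = spectrum.shape
    # Randomly sized rectangular pass-band around the low frequencies
    # (an illustrative choice; the paper only says size/shape are random).
    mh = rng.integers(h // 8, h // 4 + 1)
    mw = rng.integers(w // 8, w // 4 + 1)
    mask = np.zeros((h, w))
    mask[h // 2 - mh : h // 2 + mh, w // 2 - mw : w // 2 + mw] = 1.0
    # Re-imaging: the inverse FFT of the filtered spectrum reconstructs
    # a perturbed version of the original target.
    return np.abs(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))

rng = np.random.default_rng(0)
target = rng.random((64, 64))
augmented = [reimaging_augment(target, rng) for _ in range(4)]  # four new samples
```

Each call draws a fresh mask, so repeated calls on one image yield a family of new samples, which is the augmentation effect the abstract claims.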
Joint Intra-view and Inter-view Enhanced Tensor Low-rank Induced Affinity Graph Learning 联合视图内和视图间增强型张量低秩诱导亲和图学习
IF 7.5 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-11-04 DOI: 10.1016/j.patcog.2024.111140
Weijun Sun, Chaoye Li, Qiaoyun Li, Xiaozhao Fang, Jiakai He, Lei Liu
Graph-based and tensor-based multi-view clustering have gained popularity in recent years due to their ability to explore the relationship between samples. However, current multi-view graph clustering algorithms still have several shortcomings. (1) Most previous methods focus only on the inter-view correlation while ignoring the intra-view correlation. (2) They usually use the Tensor Nuclear Norm (TNN) to approximate the rank of tensors; however, because TNN applies the same penalty to every singular value, it cannot approximate the true rank of tensors well. To solve these problems in a unified way, we propose a new tensor-based multi-view graph clustering method. Specifically, we introduce intra-view and inter-view Enhanced Tensor Rank (ETR) minimization in the process of learning the affinity graph of each view. Experimental results comparing our method with 10 state-of-the-art methods on 8 real datasets demonstrate its superiority.
近年来,基于图形和张量的多视图聚类因其能够探索样本之间的关系而广受欢迎。然而,目前的多视图聚类算法仍存在一些不足。(1) 以往的方法大多只关注视图间的相关性,而忽略了视图内的相关性。(2) 它们通常使用张量核规范(TNN)来逼近张量的秩。然而,虽然它对不同奇异值的惩罚相同,但该模型不能很好地逼近张量的真实秩。为了统一解决这些问题,我们提出了一种新的基于张量的多视图聚类方法。具体来说,我们在学习每个视图的亲和图的过程中引入了视图内和视图间的增强张量秩(ETR)最小化。在 8 个真实数据集上与 10 种最先进的方法相比,实验结果证明了我们的方法的优越性。
{"title":"Joint Intra-view and Inter-view Enhanced Tensor Low-rank Induced Affinity Graph Learning","authors":"Weijun Sun,&nbsp;Chaoye Li,&nbsp;Qiaoyun Li,&nbsp;Xiaozhao Fang,&nbsp;Jiakai He,&nbsp;Lei Liu","doi":"10.1016/j.patcog.2024.111140","DOIUrl":"10.1016/j.patcog.2024.111140","url":null,"abstract":"<div><div>Graph-based and tensor-based multi-view clustering have gained popularity in recent years due to their ability to explore the relationship between samples. However, current multi-view graph clustering algorithms still have several shortcomings. (1) Most previous methods focus only on the inter-view correlation while ignoring the intra-view correlation. (2) They usually use the Tensor Nuclear Norm (TNN) to approximate the rank of tensors; however, because TNN applies the same penalty to every singular value, it cannot approximate the true rank of tensors well. To solve these problems in a unified way, we propose a new tensor-based multi-view graph clustering method. Specifically, we introduce intra-view and inter-view Enhanced Tensor Rank (ETR) minimization in the process of learning the affinity graph of each view. Experimental results comparing our method with 10 state-of-the-art methods on 8 real datasets demonstrate its superiority.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111140"},"PeriodicalIF":7.5,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
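For context, the Tensor Nuclear Norm (TNN) criticized in point (2) is commonly computed via the t-SVD: take the FFT along the third mode and sum the matrix nuclear norms of the frontal slices. The sketch below shows this baseline and the equal-penalty property ETR is designed to relax; it does not reproduce the paper's ETR itself.

```python
import numpy as np

def tensor_nuclear_norm(T: np.ndarray) -> float:
    """t-SVD based Tensor Nuclear Norm: FFT along the 3rd mode, then the
    sum of singular values of every frontal slice in the Fourier domain.
    Every singular value receives the same (unit) penalty -- the property
    the Enhanced Tensor Rank in the paper is designed to relax."""
    Tf = np.fft.fft(T, axis=2)                       # mode-3 FFT
    total = 0.0
    for k in range(Tf.shape[2]):                     # each frontal slice
        total += np.linalg.svd(Tf[:, :, k], compute_uv=False).sum()
    return total / Tf.shape[2]                       # common 1/n3 normalization

rng = np.random.default_rng(0)
low_rank = np.einsum('ik,jk->ij', rng.random((20, 2)), rng.random((20, 2)))
T = np.stack([low_rank] * 5, axis=2)                 # tensor of identical slices
```

For a tensor whose frontal slices are all the same matrix A, the mode-3 FFT concentrates everything in the first Fourier slice, so the (normalized) TNN reduces to the matrix nuclear norm of A — a quick sanity check on the implementation.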
PIM-Net: Progressive Inconsistency Mining Network for image manipulation localization PIM-Net:用于图像处理定位的渐进式不一致挖掘网络
IF 7.5 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-11-03 DOI: 10.1016/j.patcog.2024.111136
Ningning Bai, Xiaofeng Wang, Ruidong Han, Jianpeng Hou, Yihang Wang, Shanmin Pang
Concerns over the content authenticity and reliability of digital images have spurred research on image manipulation localization (IML). Most current deep learning-based methods focus on extracting global or local tampering features to identify forged regions. These features usually contain semantic information, leading to inaccurate detection of tampered regions that are non-object or semantically incomplete. In this study, we propose a novel Progressive Inconsistency Mining Network (PIM-Net) for effective IML. Specifically, PIM-Net consists of two core modules, the Inconsistency Mining Module (ICMM) and the Progressive Fusion Refinement module (PFR). ICMM models the inconsistency between authentic and forged regions at two levels, i.e., pixel correlation inconsistency and region attribute incongruity, while avoiding the interference of semantic information. PFR then progressively aggregates and refines the extracted inconsistent features, which in turn yields finer and purer localization responses. Extensive qualitative and quantitative experiments on five benchmarks demonstrate PIM-Net’s superiority over current state-of-the-art IML methods. Code: https://github.com/ningnbai/PIM-Net.
数字图像内容的真实性和可靠性促进了图像处理定位(IML)的研究。目前大多数基于深度学习的方法都侧重于提取全局或局部篡改特征来识别伪造区域。这些特征通常包含语义信息,导致对非对象或语义不完整的篡改区域的检测结果不准确。在本研究中,我们提出了一种新颖的渐进式不一致性挖掘网络(PIM-Net),以实现有效的 IML。具体来说,PIM-Net 由两个核心模块组成,即不一致性挖掘模块(ICMM)和渐进式融合细化模块(PFR)。ICMM 从像素相关性不一致性和区域属性不一致性两个层面对真实区域和伪造区域之间的不一致性进行建模,同时避免语义信息的干扰。然后,PFR 对提取的不一致特征进行逐步聚合和细化,进而得到更精细、更纯粹的定位响应。在五个基准上进行的大量定性和定量实验证明,PIM-Net 优于目前最先进的 IML 方法。代码:https://github.com/ningnbai/PIM-Net。
{"title":"PIM-Net: Progressive Inconsistency Mining Network for image manipulation localization","authors":"Ningning Bai ,&nbsp;Xiaofeng Wang ,&nbsp;Ruidong Han ,&nbsp;Jianpeng Hou ,&nbsp;Yihang Wang ,&nbsp;Shanmin Pang","doi":"10.1016/j.patcog.2024.111136","DOIUrl":"10.1016/j.patcog.2024.111136","url":null,"abstract":"<div><div>Concerns over the content authenticity and reliability of digital images have spurred research on image manipulation localization (IML). Most current deep learning-based methods focus on extracting global or local tampering features to identify forged regions. These features usually contain semantic information, leading to inaccurate detection of tampered regions that are non-object or semantically incomplete. In this study, we propose a novel Progressive Inconsistency Mining Network (PIM-Net) for effective IML. Specifically, PIM-Net consists of two core modules, the Inconsistency Mining Module (ICMM) and the Progressive Fusion Refinement module (PFR). ICMM models the inconsistency between authentic and forged regions at two levels, i.e., pixel correlation inconsistency and region attribute incongruity, while avoiding the interference of semantic information. PFR then progressively aggregates and refines the extracted inconsistent features, which in turn yields finer and purer localization responses. Extensive qualitative and quantitative experiments on five benchmarks demonstrate PIM-Net’s superiority over current state-of-the-art IML methods. Code: <span><span>https://github.com/ningnbai/PIM-Net</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111136"},"PeriodicalIF":7.5,"publicationDate":"2024-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
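The pixel-correlation-inconsistency idea behind ICMM — tampered pixels correlate weakly with the authentic background in feature space — can be illustrated with a toy correlation map. This is an interpretation only: the function name and the synthetic features below are invented for illustration and are not the authors' module.

```python
import numpy as np

def correlation_inconsistency(feat: np.ndarray) -> np.ndarray:
    """feat: (H, W, C) feature map. For each pixel, score how weakly it
    correlates on average with all other pixels; pixels whose statistics
    differ from the dominant (authentic) background score high."""
    h, w, c = feat.shape
    x = feat.reshape(h * w, c)
    x = x - x.mean(axis=1, keepdims=True)                      # center each feature
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)  # unit-normalize
    sim = x @ x.T                                  # pairwise cosine correlation
    # Inconsistency = 1 - mean correlation with every pixel in the map.
    return (1.0 - sim.mean(axis=1)).reshape(h, w)

rng = np.random.default_rng(0)
feat = np.tile(rng.random(8), (16, 16, 1))             # uniform "authentic" texture
feat[4:8, 4:8] += rng.random((4, 4, 8)) * 3.0          # perturbed "tampered" patch
inconsistency = correlation_inconsistency(feat)
```

On this synthetic map, the perturbed patch receives clearly higher inconsistency scores than the uniform background, mirroring the cue PIM-Net mines before its progressive refinement stage.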