
Pattern Recognition: Latest Articles

Eye-movement-prompted large image captioning model
IF 7.5 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-01 | DOI: 10.1016/j.patcog.2024.111097
Zheng Yang, Bing Han, Xinbo Gao, Zhi-Hui Zhan
Pretrained large vision-language models have shown outstanding performance on the task of image captioning. However, owing to the insufficient decoding of image features, existing large models sometimes lose important information, such as objects, scenes, and their relationships. In addition, the complex "black-box" nature of these models makes their mechanisms difficult to explain. Research shows that humans learn richer representations than machines do, which inspires us to improve the accuracy and interpretability of large image captioning models by incorporating human observation patterns. We built a new dataset, called saliency in image captioning (SIC), to explore the relationships between human vision and language representation. One thousand images with rich context information were selected as the image data of SIC. Each image was annotated with five caption labels and five eye-movement labels. Through analysis of the eye-movement data, we found that humans efficiently captured comprehensive information for image captioning during their observations. Therefore, we propose an eye-movement-prompted large image captioning model, which embeds two carefully designed modules: the eye-movement simulation module (EMS) and the eye-movement analyzing module (EMA). EMS draws on human observation patterns to simulate eye-movement features, including the positions and scan paths of eye fixations. EMA is a graph neural network (GNN) based module, which decodes graphical eye-movement data and abstracts image features as a directed graph. More accurate descriptions can be predicted by decoding the generated graph. Extensive experiments were conducted on the MS-COCO and NoCaps datasets to validate our model. The experimental results showed that our network was interpretable and could achieve superior results compared with state-of-the-art methods, i.e., 84.2% BLEU-4 and 145.1% CIDEr-D on the MS-COCO Karpathy test split, indicating its strong potential for use in image captioning.
Pattern Recognition, Volume 159, Article 111097.
Citations: 0
GBMOD: A granular-ball mean-shift outlier detector
IF 7.5 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-01 | DOI: 10.1016/j.patcog.2024.111115
Shitong Cheng, Xinyu Su, Baiyang Chen, Hongmei Chen, Dezhong Peng, Zhong Yuan
Outlier detection is a crucial data mining task that involves identifying abnormal objects, errors, or emerging trends. Mean-shift-based outlier detection techniques evaluate the abnormality of an object by calculating the mean distance between the object and its k-nearest neighbors. However, in datasets with significant noise, noise points among the k-nearest neighbors of some objects make the model ineffective at detecting outliers. Additionally, the mean-shift outlier detection technique depends on finding the k-nearest neighbors of every object, which can be time-consuming. To address these issues, we propose a granular-ball computing-based mean-shift outlier detection method (GBMOD). Specifically, we first generate high-quality granular-balls to cover the data. By using the centers of the granular-balls as anchors, the subsequent mean-shift process can effectively avoid the influence of noise points in the neighborhood. Finally, the outlier score of each object is computed as the distance between the object and the shifted center of the granular-ball to which it belongs. Experiments demonstrate the effectiveness, efficiency, and robustness of the proposed method.
Pattern Recognition, Volume 159, Article 111115.
Citations: 0
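The GBMOD abstract takes as its starting point a score based on the mean distance between an object and its k-nearest neighbors. The following is a minimal sketch of that baseline score only, assuming Euclidean distance and a brute-force neighbor search; it is not the GBMOD method, and the function name and toy data are hypothetical. The full pairwise distance matrix also illustrates why the neighbor search becomes costly as the dataset grows.

```python
import numpy as np

def knn_mean_distance_scores(X, k):
    """Score each row of X by its mean Euclidean distance to its k nearest neighbours."""
    X = np.asarray(X, dtype=float)
    # Brute-force pairwise Euclidean distances, shape (n, n): quadratic in n.
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # A point must not count as its own neighbour.
    np.fill_diagonal(dists, np.inf)
    # Keep the k smallest distances in each row and average them.
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

# Toy data (hypothetical): one tight cluster plus a single far-away point.
rng = np.random.default_rng(0)
cluster = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
points = np.vstack([cluster, [[10.0, 10.0]]])

scores = knn_mean_distance_scores(points, k=5)
most_anomalous = int(np.argmax(scores))  # index of the highest-scoring object
```

A larger score marks a more isolated object. If a few noise points sit close to an inlier, they enter its neighbor list and distort the score, which is the weakness the abstract attributes to this kind of scoring.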
Improving the sparse coding model via hybrid Gaussian priors
IF 7.5 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-01 | DOI: 10.1016/j.patcog.2024.111102
Lijian Yang, Jianxun Mi, Weisheng Li, Guofen Wang, Bin Xiao
Sparse Coding (SC) imposes a sparse prior on the representation coefficients under a dictionary or a sensing matrix. However, the sparse regularization, approximately expressed as the L1-norm, is not strongly convex. The uniqueness of the optimal solution requires the dictionary to have low mutual coherence. As a specialized form of SC, Convolutional Sparse Coding (CSC) encounters the same issue. Inspired by the Elastic Net, this paper proposes to learn an additional anisotropic Gaussian prior for the sparse codes, thus improving the convexity of the SC problem and enabling the modeling of feature correlation. As a result, the SC problem is modified by the proposed elastic projection. We then analyze the effectiveness of the proposed method under the framework of LISTA and demonstrate that this simple technique has the potential to correct bad codes and reduce the error bound, especially in noisy scenarios. Furthermore, we extend this technique to the CSC model for the practical vision task of image denoising. Extensive experimental results show that the learned Gaussian prior significantly improves the performance of both the SC and CSC models. Source code is available at https://github.com/eeejyang/EPCSCNet.
Pattern Recognition, Volume 159, Article 111102.
Citations: 0
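The sparse coding abstract notes that the L1 penalty alone is not strongly convex and cites the Elastic Net as its inspiration. As a rough illustration of that idea, the sketch below runs proximal gradient descent (ISTA) on an elastic-net objective, where a quadratic term with a fixed isotropic weight lam2 is added to the L1 penalty. This is an assumption-laden stand-in: the paper learns an anisotropic Gaussian prior inside a LISTA framework, which is not reproduced here, and all names and toy values are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(D, y, x, lam1, lam2):
    """0.5*||y - D x||^2 + lam1*||x||_1 + 0.5*lam2*||x||^2."""
    residual = y - D @ x
    return 0.5 * residual @ residual + lam1 * np.abs(x).sum() + 0.5 * lam2 * x @ x

def elastic_net_ista(D, y, lam1, lam2, n_iter=500):
    """Minimise the elastic-net objective above by proximal gradient descent."""
    x = np.zeros(D.shape[1])
    # Step size 1/L, where L is the Lipschitz constant of the data-term gradient.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        v = x - step * grad
        # Prox of step*(lam1*||.||_1 + 0.5*lam2*||.||^2): shrink, then rescale.
        x = soft_threshold(v, step * lam1) / (1.0 + step * lam2)
    return x

# Toy problem (hypothetical): a 3-sparse code under a random unit-norm dictionary.
rng = np.random.default_rng(0)
D = rng.normal(size=(30, 60))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(60)
x_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
y = D @ x_true

x_hat = elastic_net_ista(D, y, lam1=0.05, lam2=0.1)
```

With lam2 > 0 the objective is strongly convex, so its minimiser is unique regardless of the coherence of D. Setting lam2 = 0 recovers plain ISTA for the L1 problem.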
Data augmentation strategies for semi-supervised medical image segmentation
IF 7.5 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-01 | DOI: 10.1016/j.patcog.2024.111116
Jiahui Wang, Dongsheng Ruan, Yang Li, Zefeng Wang, Yongquan Wu, Tao Tan, Guang Yang, Mingfeng Jiang
Exploiting unlabeled and labeled data augmentations has become considerably important for semi-supervised medical image segmentation tasks. However, existing data augmentation methods, such as Cut-mix and generative models, typically depend on consistency regularization or ignore the data correlation between slices. To address these cognitive bias problems, we propose two novel data augmentation strategies and a Dual Attention-guided Consistency network (DACNet) to significantly improve semi-supervised medical image segmentation performance. For labeled data augmentation, we randomly crop and stitch annotated data rather than unlabeled data to create mixed annotated data, which breaks the anatomical structures and introduces voxel-level uncertainty in limited annotated data. For unlabeled data augmentation, we combine the diffusion model with the Laplacian pyramid fusion strategy to generate unlabeled data with higher slice correlation. To help the decoders learn different semantic but discriminative features, we propose DACNet, which achieves structural differentiation by introducing spatial and channel attention into the decoders. Extensive experiments are conducted to show the effectiveness and generalization of our approach. Specifically, our proposed labeled and unlabeled data augmentation strategies improved accuracy by 0.3% to 16.49% and 0.22% to 1.72%, respectively, when compared with various state-of-the-art semi-supervised methods. Furthermore, our DACNet outperforms existing methods on three medical datasets (91.72% Dice score with 20% labeled data on the LA dataset). Source code will be publicly available at https://github.com/Oubit1/DACNet.
Pattern Recognition, Volume 159, Article 111116.
Citations: 0
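The labeled-data strategy in the abstract above randomly crops and stitches annotated data to create mixed annotated samples. A minimal sketch of that kind of operation on 3D volumes is given below, assuming a single axis-aligned box copied from one annotated volume into another at the same location. The box shape, the sampling scheme, and every name here are hypothetical; the paper's actual procedure may differ.

```python
import numpy as np

def random_box(shape, crop, rng):
    """Pick a random axis-aligned box of size `crop` that fits inside `shape`."""
    starts = [int(rng.integers(0, s - c + 1)) for s, c in zip(shape, crop)]
    return tuple((s, s + c) for s, c in zip(starts, crop))

def crop_and_stitch(img_a, lab_a, img_b, lab_b, box):
    """Paste region `box` of sample B (image and label) into copies of sample A."""
    (z0, z1), (y0, y1), (x0, x1) = box
    mixed_img, mixed_lab = img_a.copy(), lab_a.copy()
    # The image and its annotation are mixed with the same box, so they stay aligned.
    mixed_img[z0:z1, y0:y1, x0:x1] = img_b[z0:z1, y0:y1, x0:x1]
    mixed_lab[z0:z1, y0:y1, x0:x1] = lab_b[z0:z1, y0:y1, x0:x1]
    return mixed_img, mixed_lab

# Toy volumes (hypothetical): A is all zeros, B is all ones, so the pasted
# region is easy to see in the result.
rng = np.random.default_rng(0)
shape, crop = (8, 8, 8), (4, 4, 4)
img_a, lab_a = np.zeros(shape), np.zeros(shape, dtype=int)
img_b, lab_b = np.ones(shape), np.ones(shape, dtype=int)

box = random_box(shape, crop, rng)
mixed_img, mixed_lab = crop_and_stitch(img_a, lab_a, img_b, lab_b, box)
```

Because the pasted box ignores organ boundaries, the mixed sample breaks the anatomical structure of both sources, which is the effect the abstract describes.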
FedKT: Federated learning with knowledge transfer for non-IID data
IF 7.5 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-01 | DOI: 10.1016/j.patcog.2024.111143
Wenjie Mao, Bin Yu, Chen Zhang, A.K. Qin, Yu Xie
Federated Learning enables clients to train a joint model collaboratively without disclosing raw data. However, learning over non-IID data can degrade performance, which has become a fundamental bottleneck. Despite numerous efforts to address this issue, challenges such as excessive local computational burdens and reliance on shared data persist, rendering existing methods impractical in real-world scenarios. In this paper, we propose a novel federated knowledge transfer framework to overcome data heterogeneity issues. Specifically, a model segmentation distillation method and a learnable aggregation network are developed for server-side knowledge ensemble and transfer, while a client-side consistency-constrained loss is devised to rectify local updates, thereby enhancing both global and client models. The framework considers both diversity and consistency among clients and can serve as a general solution for extracting knowledge from distributed nodes. Extensive experiments on four datasets demonstrate our framework's effectiveness, achieving superior performance compared to advanced competitors in high-heterogeneity settings.
Pattern Recognition, Volume 159, Article 111143.
Citations: 0
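For context on the server-side aggregation step mentioned in the FedKT abstract, the sketch below shows the standard federated averaging (FedAvg) rule, in which the server combines client parameters weighted by local sample counts. This is the common baseline, not FedKT: the abstract describes a learnable aggregation network and a distillation method that are not reproduced here. The parameter-dictionary layout and all names are assumptions.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Average per-client parameter dicts, weighting each client by its sample count."""
    total = float(sum(client_sizes))
    keys = client_weights[0].keys()
    return {
        k: sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in keys
    }

# Two hypothetical clients sharing one parameter tensor "w"; client 2 holds
# three times as much data, so it contributes three quarters of the average.
clients = [{"w": np.array([0.0, 0.0])}, {"w": np.array([4.0, 4.0])}]
global_model = fedavg(clients, client_sizes=[1, 3])
```

Only parameters leave the clients in this scheme, never raw data. Under non-IID data the local models can drift apart, and a fixed weighted average of them is one place where the performance drop noted in the abstract can arise.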
Radar gait recognition using Dual-branch Swin Transformer with Asymmetric Attention Fusion
IF 7.5 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-31 | DOI: 10.1016/j.patcog.2024.111101
Wentao He, Jianfeng Ren, Ruibin Bai, Xudong Jiang
Video-based gait recognition suffers from potential privacy issues and from performance degradation due to dim environments, partial occlusions, or camera view changes. Radar has recently become increasingly popular and can overcome various challenges presented by vision sensors. To capture tiny differences in the radar gait signatures of different people, a dual-branch Swin Transformer is proposed, where one branch captures the time variations of the radar micro-Doppler signature and the other captures the repetitive frequency patterns in the spectrogram. Unlike natural images, where objects can be translated, rotated, or scaled, the spatial coordinates of spectrograms and cadence velocity diagrams (CVDs) have unique physical meanings, and there is no affine transformation for radar targets in these synthetic images. The patch splitting mechanism in Vision Transformer makes it ideal for extracting discriminant information from patches and learning the attentive information across patches, as each patch carries some unique physical properties of radar targets. Swin Transformer consists of a set of cascaded Swin blocks to extract semantic features from shallow to deep representations, further improving the classification performance. Lastly, to highlight the branch with larger discriminant power, an Asymmetric Attention Fusion is proposed to optimally fuse the discriminant features from the two branches. To enrich the research on radar gait recognition, a large-scale NTU-RGR dataset is constructed, containing 45,768 radar frames of 98 subjects. The proposed method is evaluated on the NTU-RGR dataset and the MMRGait-1.0 database. It consistently and significantly outperforms all the compared methods on both datasets. The code is available at: https://github.com/wentaoheunnc/NTU-RGR.
Pattern Recognition, Volume 159, Article 111101.
Citations: 0
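The radar gait abstract refers to two inputs, spectrograms and CVDs. A CVD is conventionally obtained by taking a Fourier transform of the spectrogram along its time axis, so that periodic limb motion shows up as peaks at the gait cadence. The sketch below illustrates that transform on a synthetic spectrogram; the array layout (Doppler bins by time frames), the mean removal, and the toy signal are assumptions, and the paper's actual preprocessing is not stated in the abstract.

```python
import numpy as np

def cadence_velocity_diagram(spectrogram):
    """Magnitude of the FFT of each Doppler bin, taken along the time axis.

    spectrogram: array of shape (n_doppler_bins, n_frames).
    Returns an array of shape (n_doppler_bins, n_frames // 2 + 1), indexed by
    Doppler bin and cadence-frequency bin.
    """
    s = np.asarray(spectrogram, dtype=float)
    # Remove each bin's time average so the zero-cadence term does not dominate.
    s = s - s.mean(axis=1, keepdims=True)
    return np.abs(np.fft.rfft(s, axis=1))

# Synthetic spectrogram (hypothetical): every Doppler bin pulses with the same
# period, completing exactly 8 cycles over 128 frames.
n_bins, n_frames, cycles = 16, 128, 8
t = np.arange(n_frames)
pulse = 1.0 + np.cos(2 * np.pi * cycles * t / n_frames)
toy_spectrogram = np.tile(pulse, (n_bins, 1))

cvd = cadence_velocity_diagram(toy_spectrogram)
peak_bin = int(np.argmax(cvd.sum(axis=0)))  # cadence bin with the most energy
```

Both axes of the result carry physical units (Doppler frequency and cadence frequency), which is why the abstract notes that translating or rotating such an image has no physical counterpart.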
MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model
IF 7.5 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-31 | DOI: 10.1016/j.patcog.2024.111100
Zhenghao Zhang, Shengfan Zhang, Zuozhuo Dai, Zilong Dong, Siyu Zhu
The current state-of-the-art techniques for video object segmentation require extensive training on video datasets with mask annotations, which constrains their ability to transfer zero-shot to new image distributions and tasks. However, recent advancements in foundation models, particularly in the domain of image segmentation, have shown robust generalization capabilities, introducing a novel prompt-driven paradigm for a variety of downstream segmentation challenges on new data distributions. This study explores the potential of vision foundation models using diverse prompt strategies and proposes a mask-free approach for unsupervised video object segmentation. To further improve the efficacy of prompt learning in diverse and complex video scenes, we introduce a spatial–temporal decoupled deformable attention mechanism to establish an effective correlation between intra- and inter-frame features. Extensive experiments conducted on the DAVIS2017-unsupervised, YoutubeVIS19&21, and OIVS datasets demonstrate the superior performance of the proposed approach without mask supervision when compared to existing mask-supervised methods, as well as its capacity to generalize to weakly-annotated video datasets.
{"title":"MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model","authors":"Zhenghao Zhang ,&nbsp;Shengfan Zhang ,&nbsp;Zuozhuo Dai ,&nbsp;Zilong Dong ,&nbsp;Siyu Zhu","doi":"10.1016/j.patcog.2024.111100","DOIUrl":"10.1016/j.patcog.2024.111100","url":null,"abstract":"<div><div>The current state-of-the-art techniques for video object segmentation necessitate extensive training on video datasets with mask annotations, thereby constraining their ability to transfer zero-shot learning to new image distributions and tasks. However, recent advancements in foundation models, particularly in the domain of image segmentation, have showcased robust generalization capabilities, introducing a novel prompt-driven paradigm for a variety of downstream segmentation challenges on new data distributions. This study delves into the potential of vision foundation models using diverse prompt strategies and proposes a mask-free approach for unsupervised video object segmentation. To further improve the efficacy of prompt learning in diverse and complex video scenes, we introduce a spatial–temporal decoupled deformable attention mechanism to establish an effective correlation between intra- and inter-frame features. 
Extensive experiments conducted on the DAVIS2017-unsupervised and YoutubeVIS19&amp;21 and OIVS datasets demonstrate the superior performance of the proposed approach without mask supervision when compared to existing mask-supervised methods, as well as its capacity to generalize to weakly-annotated video datasets.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111100"},"PeriodicalIF":7.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Jointly stochastic fully symmetric interpolatory rules and local approximation for scalable Gaussian process regression
IF 7.5 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-31 DOI: 10.1016/j.patcog.2024.111125
Hongli Zhang, Jinglei Liu
When exploring the broad application prospects of large-scale Gaussian process regression (GPR), three core challenges significantly constrain its effectiveness. First, the O(n^3) time complexity of computing the inverse covariance matrix of n training points becomes an insurmountable performance bottleneck when processing large-scale datasets. Second, although traditional local approximation methods are widely used, they are often limited by the inconsistency of their prediction results. Third, many aggregation strategies lack discrimination when evaluating the importance of experts (i.e., local models), resulting in a loss of overall prediction accuracy. In response to these challenges, this article proposes a comprehensive method, TDSFSIRLA, that integrates third-degree stochastic fully symmetric interpolatory rules (TDSFSI), local approximation, and Tsallis mutual information, aiming to break through existing limitations. Specifically, TDSFSIRLA first introduces an efficient third-degree stochastic fully symmetric interpolatory rule, which achieves accurate approximation of Gaussian kernel functions by generating adaptive dimensional feature maps. This innovation not only significantly reduces the number of required orthogonal nodes and lowers computational costs, but also maintains extremely high approximation accuracy, providing a solid theoretical foundation for processing large-scale datasets. Furthermore, to overcome the inconsistency of local approximation methods, this paper adopts the Generalized Robust Bayesian Committee Machine (GRBCM) as the aggregation framework for local experts. GRBCM keeps the prediction results of the local models consistent through its inherent consistency and robustness, significantly improving the stability and reliability of the overall prediction. More importantly, in response to the uneven distribution of expert weights, this article introduces Tsallis mutual information as a metric for weight allocation. Tsallis mutual information, with its sensitivity to information complexity, assigns each local expert a weight that matches its contribution, effectively addressing the prediction bias caused by uneven weight distribution and further improving prediction accuracy. In the experimental verification phase, comprehensive tests were conducted on multiple synthetic datasets and seven representative real datasets. The results show that the TDSFSIRLA method not only achieves a significant reduction in time complexity but also demonstrates excellent prediction accuracy, verifying its advantages and broad application prospects in the field of large-scale Gaussian process regression.
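To make the cubic bottleneck mentioned in the abstract concrete, here is a minimal sketch of exact (non-approximated) GPR with a Gaussian kernel. It is not the TDSFSIRLA method. It only illustrates the baseline that such methods try to avoid, and the function names, the unit length scale, and the noise level are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values produced by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)

def exact_gpr_predict(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and variance of exact GPR with a zero prior mean."""
    n = x_train.shape[0]
    k_nn = rbf_kernel(x_train, x_train) + noise * np.eye(n)
    # Factorizing the n x n covariance matrix is the O(n^3) step.
    chol = np.linalg.cholesky(k_nn)
    # Solve (K + noise * I) alpha = y using the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    k_star = rbf_kernel(x_test, x_train)
    mean = k_star @ alpha
    v = np.linalg.solve(chol, k_star.T)
    # The prior variance k(x, x) equals 1 for this kernel.
    var = 1.0 - np.sum(v**2, axis=0)
    return mean, var
```

Because the factorization scales cubically, doubling the number of training points multiplies its cost by roughly eight, which is why local experts and kernel feature-map approximations are attractive at scale.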
Citations: 0
Privacy-preserving speaker verification system using Ranking-of-Element hashing
IF 7.5 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-31 DOI: 10.1016/j.patcog.2024.111107
Hong-Hanh Nguyen-Le, Lam Tran, Dinh Song An Nguyen, Nhien-An Le-Khac, Thuc Nguyen
Advancements in automatic speaker recognition have led to the exploration of voice data for verification systems. This raises concerns about the security of storing voice templates in plaintext. In this paper, we propose a novel cancellable biometric scheme that does not require users to manage random matrices or tokens. First, we pre-process the raw voice data and feed it into a deep feature extraction module to obtain embeddings. Next, we propose a hashing scheme, Ranking-of-Elements, which generates compact hashed codes by recording the number of elements whose values are lower than that of a random element. This approach captures more information from smaller-valued elements and prevents the adversary from guessing the ranking value through Attacks via Record Multiplicity. Lastly, we introduce a fuzzy matching method to mitigate the variations in templates resulting from environmental noise. We evaluate the performance and security of our method on two datasets: TIMIT and VoxCeleb1.
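The ranking idea described in the abstract can be sketched roughly as follows. This is an illustrative guess at the general shape of a rank-based hash, not the authors' published algorithm: the random-subset construction, the `seed` parameter, the code length, and the tolerance-based `fuzzy_match_score` are all assumptions introduced here.

```python
import numpy as np

def roe_hash(embedding, code_length=64, subset_size=8, seed=0):
    """Hypothetical rank-based hash of a 1-D embedding vector."""
    x = np.asarray(embedding, dtype=float)
    # The seed is an illustrative stand-in for however randomness is fixed.
    rng = np.random.default_rng(seed)
    code = np.empty(code_length, dtype=np.int64)
    for i in range(code_length):
        # Draw a random subset; its first entry plays the "random element".
        idx = rng.choice(x.size, size=subset_size, replace=False)
        pivot = x[idx[0]]
        # Record how many other subset elements are strictly lower.
        code[i] = np.sum(x[idx[1:]] < pivot)
    return code

def fuzzy_match_score(code_a, code_b, tolerance=1):
    """Fraction of positions whose rank values differ by at most `tolerance`."""
    diff = np.abs(np.asarray(code_a) - np.asarray(code_b))
    return float(np.mean(diff <= tolerance))
```

Only the relative order of the values enters the code, so any transformation that preserves that order (for example, scaling the embedding by a positive constant) leaves the hash unchanged, and the raw embedding values are not stored.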
Citations: 0
HDR reconstruction from a single exposure LDR using texture and structure dual-stream generation
IF 7.5 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-31 DOI: 10.1016/j.patcog.2024.111127
Yu-Hsiang Chen, Shanq-Jang Ruan
Reconstructing high dynamic range (HDR) imagery from a single low dynamic range (LDR) photograph presents substantial challenges. These challenges stem primarily from the loss of detail and information in underexposed or overexposed regions, caused by the quantization and saturation inherent to camera sensors. Traditional learning-based approaches often struggle to distinguish overexposed regions within an object from the background, leading to compromised detail retention in these critical areas. Our methodology focuses on meticulously reconstructing structural and textural details to preserve the integrity of the structural information. We propose a new two-stage model architecture for HDR image reconstruction, including a dual-stream network and a feature fusion stage. The dual-stream network is designed to reconstruct structural and textural details, while the feature fusion stage aims to minimize artifacts by utilizing the reconstructed information. We demonstrate that our proposed method performs better than other state-of-the-art single-image HDR reconstruction algorithms in various quality metrics.
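The information loss from quantization and saturation that the abstract refers to can be illustrated with a toy forward model. This is a simplified sketch under stated assumptions (linear radiance, no camera response curve or noise, 8-bit output) and is not part of the proposed method. The function name `simulate_ldr` is hypothetical.

```python
import numpy as np

def simulate_ldr(hdr_radiance, exposure=1.0):
    """Map linear HDR radiance to an 8-bit LDR image (toy model)."""
    exposed = np.asarray(hdr_radiance, dtype=float) * exposure
    # Saturation: everything above the sensor's range collapses to 1.0.
    saturated = np.clip(exposed, 0.0, 1.0)
    # Quantization: nearby dark values collapse to the same 8-bit level.
    return np.round(saturated * 255).astype(np.uint8)

# Distinct radiance values that become indistinguishable after capture.
hdr = np.array([0.0005, 0.001, 0.5, 2.0, 10.0])
ldr = simulate_ldr(hdr)
```

In this example the two darkest values map to the same level and the two brightest both saturate, so the mapping cannot be inverted from the LDR values alone. This is why single-image HDR reconstruction has to infer the missing content.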
Citations: 0