
Latest articles in Pattern Recognition Letters

A novel HMM distance measure with state alignment
IF 3.9 | CAS Zone 3 (Computer Science) | Q2 (Computer Science, Artificial Intelligence) | Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.10.018
Nan Yang, Cheuk Hang Leung, Xing Yan
In this paper, we introduce a novel distance measure, conforming to the definition of a semi-distance, for quantifying the similarity between Hidden Markov Models (HMMs). This distance measure is not only easier to implement, but also accounts for state alignment before distance calculation, ensuring correctness and accuracy. Our proposed distance measure presents a significant advancement in HMM comparison, offering a more practical and accurate solution than existing measures. Numerical examples that demonstrate the utility of the proposed distance measure are given for HMMs with continuous state probability densities. In real-world data experiments, we employ HMMs to represent the evolution of financial time series or music. Subsequently, leveraging the proposed distance measure, we conduct HMM-based unsupervised clustering with promising results. Our approach proves effective in capturing the inherent differences in the dynamics of financial time series, showcasing the practicality and success of the proposed distance measure.
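The abstract does not give the exact form of the semi-distance, but the general recipe it describes — align the states of the two HMMs first, then compare emissions and transitions under that alignment — can be illustrated with a small sketch. Everything below (the symmetrised-KL state cost, the Hungarian alignment via `linear_sum_assignment`, and the way the two terms are combined) is an assumption for illustration, not the measure proposed in the paper.

```python
# Illustrative sketch only: the exact semi-distance from the paper is not reproduced.
# It shows the general recipe of aligning states before comparing two Gaussian HMMs:
# (1) build a state-to-state cost matrix, (2) align states with the Hungarian algorithm,
# (3) compare emissions and (permuted) transition matrices under that alignment.
import numpy as np
from scipy.optimize import linear_sum_assignment


def gaussian_kl(m0, S0, m1, S1):
    """KL divergence KL(N(m0, S0) || N(m1, S1)) between two multivariate Gaussians."""
    d = m0.shape[0]
    iS1 = np.linalg.inv(S1)
    diff = m1 - m0
    return 0.5 * (np.trace(iS1 @ S0) + diff @ iS1 @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))


def hmm_state_aligned_distance(means_a, covs_a, A_a, means_b, covs_b, A_b):
    """Toy distance between two Gaussian HMMs with the same number of states."""
    n = len(means_a)
    cost = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # symmetrised KL as the state-to-state cost (an assumption for illustration)
            cost[i, j] = 0.5 * (gaussian_kl(means_a[i], covs_a[i], means_b[j], covs_b[j])
                                + gaussian_kl(means_b[j], covs_b[j], means_a[i], covs_a[i]))
    row, col = linear_sum_assignment(cost)            # state alignment
    emission_term = cost[row, col].mean()
    A_b_aligned = A_b[np.ix_(col, col)]               # permute transitions to match alignment
    transition_term = np.abs(A_a - A_b_aligned).mean()
    return emission_term + transition_term
```

Because states are aligned before the comparison, a model whose states are merely listed in a different order incurs no spurious distance, which is the property the abstract emphasises.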
Citations: 0
LuminanceGAN: Controlling the brightness of generated images for various night conditions
IF 3.9 | CAS Zone 3 (Computer Science) | Q2 (Computer Science, Artificial Intelligence) | Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.10.014
Junghyun Seo, Sungjun Wang, Hyeonjae Jeon, Taesoo Kim, Yongsik Jin, Soon Kwon, Jeseok Kim, Yongseob Lim
There are diverse datasets available for training deep learning models used in autonomous driving. However, most of these datasets consist of images obtained in day conditions, leading to a data imbalance issue when dealing with night-condition images. Several day-to-night image translation models have been proposed to resolve the insufficiency of night-condition data, but these models often generate artifacts and cannot control the brightness of the generated image. In this study, we propose LuminanceGAN, which controls the degree of brightness in night conditions to generate realistic night images. The proposed Y-control loss drives the brightness of the output image toward a specified luminance value. Furthermore, a self-attention module effectively reduces artifacts in the generated images. Consequently, in qualitative comparisons, our model demonstrates superior performance in day-to-night image translation. Additionally, a quantitative evaluation using lane detection models shows that our proposed method improves performance in night lane detection tasks. Moreover, the quality of the generated indoor dark images was assessed using an evaluation metric, showing that our model generates images closer to real dark images than other image translation models.
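As a rough illustration of what a luminance-control term can look like, the sketch below (PyTorch) penalises the gap between the mean luminance of a generated image and a target value. The BT.601 luminance weights, the L1 penalty, and the `y_target` parameter are assumptions for illustration, not the actual Y-control loss of LuminanceGAN.

```python
# A minimal sketch of a luminance-control term: it pulls the mean luminance of the
# generated image toward a target value y_target. The BT.601 weights and the L1 penalty
# are assumptions, not LuminanceGAN's actual Y-control loss.
import torch


def y_control_loss(fake_rgb: torch.Tensor, y_target: float) -> torch.Tensor:
    """fake_rgb: (B, 3, H, W) in [0, 1]; y_target: desired mean luminance in [0, 1]."""
    r, g, b = fake_rgb[:, 0], fake_rgb[:, 1], fake_rgb[:, 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b            # per-pixel luminance (BT.601)
    return (y.mean(dim=(1, 2)) - y_target).abs().mean()


# Example: weight the term into the generator objective to target a dark, night-like level.
fake = torch.rand(4, 3, 256, 256)
loss = y_control_loss(fake, y_target=0.15)
```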
Citations: 0
Denoising diffusion model with adversarial learning for unsupervised anomaly detection on brain MRI images
IF 3.9 | CAS Zone 3 (Computer Science) | Q2 (Computer Science, Artificial Intelligence) | Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.10.007
Jongmin Yu, Hyeontaek Oh, Younkwan Lee, Jinhong Yang
This paper proposes the Adversarial Denoising Diffusion Model (ADDM). Diffusion models excel at generating high-quality samples, outperforming other generative models, and they also achieve outstanding medical image anomaly detection (AD) results due to their strong sampling ability. However, the performance of diffusion model-based methods varies considerably with the sampling frequency, and the time cost of generating good-quality samples is significantly higher than that of other generative models. We propose ADDM, a diffusion model-based AD method trained with adversarial learning that maintains high-quality sample generation while significantly reducing the number of sampling steps. The proposed adversarial learning is achieved by classifying model-based denoised samples against samples to which random Gaussian noise has been added at a specific sampling step. Unlike the loss function of diffusion models, which is defined in the noise space to minimise the gap between predicted and scheduled noise, the adversarial objective is defined in the sample space, so the diffusion model can explicitly learn semantic information about the samples. Our experiments demonstrate that adversarial learning achieves data sampling performance similar to the DDPM with far fewer sampling steps. Experimental results show that the proposed ADDM outperforms existing unsupervised AD methods on brain MRI images. In particular, in a comparison using 22 T1-weighted MRI scans provided by the Centre for Clinical Brain Sciences at the University of Edinburgh, ADDM achieves similar performance with 50% fewer sampling steps than other DDPM-based AD methods, and shows 6.2% better performance on the Dice metric with the same number of sampling steps.
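The adversarial idea described above can be sketched roughly as follows: at sampling step t, the model-denoised sample is shown to a discriminator, and the resulting sample-space adversarial term is added to the usual noise-space DDPM loss. The `unet` and `discriminator` callables, the noise schedule handling, and the weight `lambda_adv` are placeholders assumed for illustration; ADDM's actual architecture and weighting are not given in the abstract.

```python
# Rough sketch: combine the standard DDPM noise-prediction loss (noise space) with an
# adversarial term computed on the model-denoised sample (sample space).
import torch
import torch.nn.functional as F


def addm_training_loss(unet, discriminator, x0, t, alphas_cumprod, lambda_adv=0.1):
    noise = torch.randn_like(x0)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a_t.sqrt() * x0 + (1 - a_t).sqrt() * noise            # forward diffusion to step t
    eps_hat = unet(x_t, t)                                       # predicted noise
    denoised = (x_t - (1 - a_t).sqrt() * eps_hat) / a_t.sqrt()   # model-based denoised sample

    ddpm_loss = F.mse_loss(eps_hat, noise)                       # standard noise-space loss
    d_fake = discriminator(denoised, t)                          # sample-space adversarial term
    adv_loss = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    return ddpm_loss + lambda_adv * adv_loss
```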
Citations: 0
Table Transformers for imputing textual attributes
IF 3.9 | CAS Zone 3 (Computer Science) | Q2 (Computer Science, Artificial Intelligence) | Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.09.023
Ting-Ruen Wei, Yuan Wang, Yoshitaka Inoue, Hsin-Tai Wu, Yi Fang
Missing data in tabular datasets is a common issue, as the performance of downstream tasks usually depends on the completeness of the training data. Previous missing-data imputation methods focus on numeric and categorical columns; we instead propose a novel end-to-end approach called Table Transformers for Imputing Textual Attributes (TTITA), based on the transformer, to impute unstructured textual columns using the other columns in the table. We conduct extensive experiments on three datasets, and our approach shows competitive performance, outperforming baseline models such as recurrent neural networks and Llama2. The performance improvement is more significant when the target sequence is longer. Additionally, we incorporate multi-task learning to impute heterogeneous columns simultaneously, boosting the performance of text imputation. We also qualitatively compare with ChatGPT for realistic applications.
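The end-to-end idea — encode a row's observed columns and autoregressively decode the missing textual attribute — can be sketched with a toy transformer decoder. The column handling, dimensions, tokenisation, and vocabulary below are simplified assumptions and do not reflect TTITA's actual architecture.

```python
# Toy sketch: observed numeric and categorical columns become a short "memory" sequence,
# and a transformer decoder generates the missing textual column token by token.
import torch
import torch.nn as nn


class ToyTextImputer(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_numeric=4, n_categories=10):
        super().__init__()
        self.num_proj = nn.Linear(n_numeric, d_model)       # numeric columns -> one memory token
        self.cat_emb = nn.Embedding(n_categories, d_model)  # one categorical column -> one token
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, numeric, categorical, target_tokens):
        memory = torch.stack([self.num_proj(numeric), self.cat_emb(categorical)], dim=1)
        tgt = self.tok_emb(target_tokens)
        t = tgt.size(1)
        causal = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(out)                             # next-token logits over the vocabulary


model = ToyTextImputer()
logits = model(torch.randn(2, 4), torch.tensor([3, 7]), torch.randint(0, 1000, (2, 8)))
```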
Citations: 0
DBCvT: Double Branch Convolutional Transformer for Medical Image Classification
IF 3.9 | CAS Zone 3 (Computer Science) | Q2 (Computer Science, Artificial Intelligence) | Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.10.008
Jinfeng Li, Meiling Feng, Chengyi Xia
Convolutional Neural Networks (CNNs) are extensively utilized in medical disease diagnosis, demonstrating prominent performance in most cases. However, medical image processing based on deep learning faces several challenges. The limited availability and time-consuming annotation of medical image data restrict the scale and accuracy of model training, and data diversity and complexity further complicate these challenges. To address these issues, we introduce the Double Branch Convolutional Transformer (DBCvT), a hybrid CNN-Transformer feature extractor that better captures diverse fine-grained features while remaining suitable for small datasets. In this model, separable downsampling convolution (SDConv) is used to mitigate the excessive information loss incurred during downsampling with standard convolutions. Additionally, we propose the Dual-branch Channel Efficient multi-head Self-Attention (DCESA) mechanism to improve self-attention efficiency and thereby network performance. Moreover, we introduce a novel convolutional channel-enhanced attention mechanism to strengthen inter-channel relationships within feature maps after self-attention. Experiments with DBCvT on various medical image datasets demonstrate the outstanding classification performance and generalization capability of the proposed model.
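The abstract names separable downsampling convolution (SDConv) without specifying its structure; a common construction in that spirit is a strided depthwise convolution followed by a pointwise convolution, as sketched below. The kernel size, normalisation, and activation are assumptions for illustration, not DBCvT's exact module.

```python
# A generic "separable downsampling" block: a strided depthwise convolution reduces
# spatial resolution per channel, and a 1x1 pointwise convolution mixes channels.
import torch
import torch.nn as nn


class SDConvSketch(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 2):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.GELU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


x = torch.randn(1, 64, 56, 56)
print(SDConvSketch(64, 128)(x).shape)   # torch.Size([1, 128, 28, 28])
```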
Citations: 0
Regional dynamic point cloud completion network
IF 3.9 | CAS Zone 3 (Computer Science) | Q2 (Computer Science, Artificial Intelligence) | Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.10.017
Liping Zhu, Yixuan Yang, Kai Liu, Silin Wu, Bingyao Wang, Xianxiang Chang
Point cloud completion networks often encode points into a global feature vector and then predict the complete point cloud from that vector. However, this approach may not accurately capture complex shapes, as global feature vectors struggle to recover detailed structure. In this paper, we present a novel shape completion network, RD-Net, that focuses on the interaction of information between points to provide both local and global information for generating a fine-grained complete shape. Specifically, we propose a stored-iteration-based method for point cloud sampling that quickly captures representative points within the point cloud. Subsequently, to better predict the shape and structure of the missing part, we design an iterative edge-convolution module that uses a CNN-like hierarchy for feature extraction and for learning context information. Moreover, we design a two-stage reconstruction process for latent vector decoding: we first employ a feature-points-based multi-scale generating decoder to estimate the missing point cloud hierarchically, followed by a self-attention mechanism that refines the generated shape and effectively generates structural details. By combining these innovations, RD-Net achieves a 2% reduction in CD error compared to the state-of-the-art method on the ShapeNet-part dataset.
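The stored-iteration-based sampler is not detailed in the abstract. For orientation, the sketch below shows classical farthest point sampling, which likewise keeps a stored per-point distance that is updated once per iteration in order to pick representative points quickly; this is an assumed stand-in, not necessarily the sampler used by RD-Net.

```python
# Classical farthest point sampling with a stored (running) per-point distance.
import numpy as np


def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """points: (N, 3) array; returns indices of k representative points."""
    n = points.shape[0]
    selected = np.zeros(k, dtype=int)
    stored_dist = np.full(n, np.inf)       # stored distance from each point to the selected set
    selected[0] = np.random.randint(n)
    for i in range(1, k):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        stored_dist = np.minimum(stored_dist, d)   # one update per iteration
        selected[i] = int(stored_dist.argmax())    # farthest remaining point
    return selected


pts = np.random.rand(2048, 3)
idx = farthest_point_sampling(pts, 256)
```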
Citations: 0
Text-free diffusion inpainting using reference images for enhanced visual fidelity
IF 3.9 | CAS Zone 3 (Computer Science) | Q2 (Computer Science, Artificial Intelligence) | Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.10.009
Beomjo Kim, Kyung-Ah Sohn
This paper presents a novel approach to subject-driven image generation that addresses the limitations of traditional text-to-image diffusion models. Our method generates images using reference images without relying on language-based prompts. We introduce a visual detail preserving module that captures intricate details and textures, addressing overfitting issues associated with limited training samples. The model's performance is further enhanced through a modified classifier-free guidance technique and feature concatenation, enabling the natural positioning and harmonization of subjects within diverse scenes. Quantitative assessments using CLIP, DINO and Quality scores (QS), along with a user study, demonstrate the superior quality of our generated images. Our work highlights the potential of pre-trained models and visual patch embeddings in subject-driven editing, balancing diversity and fidelity in image generation tasks. Our implementation is available at https://github.com/8eomio/Subject-Inpainting.
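As a point of reference for the "modified classifier-free guidance" mentioned above, standard classifier-free guidance with an image condition (a reference-image embedding in place of a text prompt) looks like the sketch below. The `unet` signature and the use of a zero tensor as the null condition are assumptions; the paper's actual modification and feature-concatenation step are not reproduced here.

```python
# Standard classifier-free guidance, conditioned on a reference-image embedding.
import torch


def guided_noise(unet, x_t, t, ref_embedding, guidance_scale: float = 3.0):
    eps_cond = unet(x_t, t, cond=ref_embedding)                        # conditioned on reference
    eps_uncond = unet(x_t, t, cond=torch.zeros_like(ref_embedding))    # null condition
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```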
Citations: 0
MMIFR: Multi-modal industry focused data repository
IF 3.9 | CAS Zone 3 (Computer Science) | Q2 (Computer Science, Artificial Intelligence) | Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.11.001
Mingxuan Chen, Shiqi Li, Xujun Wei, Jiacheng Song
In the rapidly advancing field of industrial automation, the availability of robust and diverse datasets is crucial for the development and evaluation of machine learning models. The data repository consists of four distinct versions of the dataset: MMIFR-D, MMIFR-FS, MMIFR-OD and MMIFR-P. The MMIFR-D dataset comprises a comprehensive collection of 5907 images accompanied by corresponding textual descriptions, notably facilitating industrial equipment classification. In contrast, the MMIFR-FS dataset is an alternative variant containing 129 distinct classes and 5907 images, specifically catering to few-shot learning within the industrial domain. MMIFR-OD, another variant comprising 8,839 annotation instances across 128 distinct categories, is predominantly utilized for object detection tasks. Additionally, the MMIFR-P dataset consists of 142 textual-visual information pairs, making it suitable for detecting pairs of industrial equipment. Furthermore, we conduct a comprehensive comparative analysis of our dataset in relation to other datasets used in industrial settings, and provide benchmark performances for different industrial tasks on our data repository. The proposed multimodal dataset, MMIFR, can be utilized for research in industrial automation, quality control, safety monitoring, and other relevant domains.
Citations: 0
EDS: Exploring deeper into semantics for video captioning
IF 3.9 | CAS Zone 3 (Computer Science) | Q2 (Computer Science, Artificial Intelligence) | Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.09.017
Yibo Lou, Wenjie Zhang, Xiaoning Song, Yang Hua, Xiao-Jun Wu
Efficiently leveraging semantic information has become crucial for advancing video captioning in recent years. However, prevailing approaches that design various Part-of-Speech (POS) tags as prior information lack essential linguistic guidance throughout the training procedure, particularly for POS and initial description generation. Furthermore, restricting the model to a single source of semantic information ignores the varied interpretations inherent in each video. To solve these problems, we propose the Exploring Deeper into Semantics (EDS) method for video captioning. EDS comprises three modules that focus on semantic information. Specifically, we propose the Semantic Supervised Generation (SSG) module, which integrates semantic information as a prior and facilitates enriched interrelations among words for POS supervision. A novel Similarity Semantic Extension (SSE) module employs query-based semantic expansion to collaboratively generate fine-grained content. Additionally, the proposed Input Semantic Enhancement (ISE) module provides a strategy for mitigating the information constraints faced during the initial phase of word generation. The experiments show that, by exploiting semantic information through supervision, extension, and enhancement, EDS not only yields promising results but also underlines its effectiveness. Code will be available at https://github.com/BradenJoson/EDS.
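The query-based semantic expansion in SSE is described only at a high level; one plain way such an expansion can work is to retrieve the k most similar entries from a bank of semantic embeddings for a given query feature, as in the sketch below. All names and shapes here are assumptions, not the module's actual design.

```python
# Toy query-based expansion: return the k nearest semantic-bank vectors for a query feature.
import torch
import torch.nn.functional as F


def expand_semantics(query: torch.Tensor, semantic_bank: torch.Tensor, k: int = 5):
    """query: (D,); semantic_bank: (N, D); returns the k nearest bank vectors."""
    sims = F.cosine_similarity(query.unsqueeze(0), semantic_bank)   # (N,)
    topk = sims.topk(k).indices
    return semantic_bank[topk]
```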
Citations: 0
FAM: Adaptive federated meta-learning for MRI data
IF 3.9 | CAS Zone 3 (Computer Science) | Q2 (Computer Science, Artificial Intelligence) | Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.09.018
Indrajeet Kumar Sinha, Shekhar Verma, Krishna Pratap Singh
Federated learning enables multiple clients to collaboratively train a model without sharing data; clients with insufficient data or limited data diversity participate in order to learn a model with superior performance. MRI data suffer from limited quantity and heterogeneous distributions due to differences in MRI scanners and client characteristics, and privacy concerns preclude data sharing. In this work, we propose a novel adaptive federated meta-learning (FAM) mechanism for collaboratively learning a single global model, which is then personalized locally on individual clients. The learnt sparse global model captures the common features in the MRI data across clients. This model is grown on each client to learn a personalized model by capturing additional client-specific parameters from local data. Experimental results on multiple datasets show that the personalization process at each client converges quickly within a limited number of epochs. The personalized client models outperform the locally trained models, demonstrating the efficacy of the FAM mechanism. Additionally, the FAM-based sparse global model has fewer parameters and requires less communication overhead during federated learning, which makes the model viable for networks with limited resources.
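A compact, generic sketch of the two-stage idea in the abstract: clients first contribute to a shared global parameter vector (here, a plain FedAvg-style average), and each client then grows a personalized model by learning an additional client-specific component on its local data. FAM's sparsity mechanism and meta-learning updates are not reproduced; the linear model and gradient steps below are assumptions for illustration.

```python
# (1) aggregate client parameters into a global model; (2) personalize locally by learning
# a client-specific delta on top of the frozen shared parameters.
import numpy as np


def federated_round(client_params):
    """Average the clients' parameter vectors into a shared global model."""
    return np.mean(np.stack(client_params), axis=0)


def personalize(global_params, local_X, local_y, lr=0.01, epochs=20):
    """Learn a client-specific delta for a linear model on local data."""
    delta = np.zeros_like(global_params)
    for _ in range(epochs):
        pred = local_X @ (global_params + delta)
        grad = local_X.T @ (pred - local_y) / len(local_y)
        delta -= lr * grad                       # only the client-specific part is updated
    return global_params + delta
```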
Citations: 0