
Pattern Recognition Letters: Latest Articles

Deep NRSFM for multi-view multi-body pose estimation
IF 3.9 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-08-22 DOI: 10.1016/j.patrec.2024.08.015
Áron Fóthi, Joul Skaf, Fengjiao Lu, Kristian Fenech

This paper addresses the challenging task of unsupervised relative human pose estimation. Our solution exploits the potential offered by utilizing multiple uncalibrated cameras. It is assumed that spatial human pose and camera parameter estimation can be solved as a block sparse dictionary learning problem with zero supervision. The resulting structures and camera parameters can fit individual skeletons into a common space. To do so, we exploit the fact that all individuals in the image are viewed from the same camera viewpoint, thus exploiting the information provided by multiple camera views and overcoming the lack of information on camera parameters. To the best of our knowledge, this is the first solution that requires neither 3D ground truth nor knowledge of the intrinsic or extrinsic camera parameters. Our approach demonstrates the potential of using multiple viewpoints to solve challenging computer vision problems. Additionally, we provide access to the code, encouraging further development and experimentation. https://github.com/Jeryoss/MVMB-NRSFM.
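For orientation, the classical non-rigid structure-from-motion (NRSFM) model that sparse dictionary approaches build on can be written as below. This is the standard textbook formulation under an orthographic camera with centred 2D points, not necessarily the exact model of the paper. All symbols (person index $j$, camera index $v$, $P$ joints, $K$ basis shapes $B_k$, coefficients $c_{j,k}$) are assumptions introduced here for illustration.

```latex
% Assumed notation, not taken from the abstract.
\begin{aligned}
W_j^{(v)} &= R^{(v)} S_j,
  & W_j^{(v)} &\in \mathbb{R}^{2\times P},\quad
    R^{(v)} \in \mathbb{R}^{2\times 3},\quad
    R^{(v)} R^{(v)\top} = I_2, \\
S_j &= \sum_{k=1}^{K} c_{j,k}\, B_k,
  & B_k &\in \mathbb{R}^{3\times P},\quad
    \lVert c_j \rVert_0 \ll K .
\end{aligned}
```

In this notation, the observation that all individuals in an image share one camera viewpoint corresponds to $R^{(v)}$ carrying no person index $j$, which is what couples the per-person reconstructions.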

Pattern Recognition Letters, Volume 185, Pages 218-224. PDF: https://www.sciencedirect.com/science/article/pii/S0167865524002472/pdfft?md5=c7f415f86c9c99693c29d66ef080962f&pid=1-s2.0-S0167865524002472-main.pdf
Citations: 0
Special section: Best papers of the 15th Mexican conference on pattern recognition (MCPR) 2023
IF 5.1 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-08-22 DOI: 10.1016/j.patrec.2024.08.017
Ansel Y. Rodríguez González, Humberto Perez-Espinosa, Jesús Ariel Carrasco Ochoa, José Francisco Martínez Trinidad, José Arturo Olvera López
Citations: 0
Saliency-based video summarization for face anti-spoofing
IF 3.9 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-08-22 DOI: 10.1016/j.patrec.2024.08.008
Usman Muhammad, Mourad Oussalah, Jorma Laaksonen

With the growing availability of databases for face presentation attack detection, researchers are increasingly focusing on video-based face anti-spoofing methods that involve hundreds to thousands of images for training the models. However, there is currently no clear consensus on the optimal number of frames in a video to improve face spoofing detection. Inspired by the visual saliency theory, we present a video summarization method for face anti-spoofing detection that aims to enhance the performance and efficiency of deep learning models by leveraging visual saliency. In particular, saliency information is extracted from the differences between the Laplacian and Wiener filter outputs of the source images, enabling the identification of the most visually salient regions within each frame. Subsequently, the source images are decomposed into base and detail images, enhancing the representation of the most important information. Weighting maps are then computed based on the saliency information, indicating the importance of each pixel in the image. By linearly combining the base and detail images using the weighting maps, the method fuses the source images to create a single representative image that summarizes the entire video. The key contribution of the proposed method lies in demonstrating how visual saliency can be used as a data-centric approach to improve the performance and efficiency for face presentation attack detection. By focusing on the most salient images or regions within the images, a more representative and diverse training set can be created, potentially leading to more effective models. To validate the method’s effectiveness, a simple CNN–RNN deep learning architecture was used, and the experimental results showcased state-of-the-art performance on four challenging face anti-spoofing datasets.
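The pipeline described above (saliency from the Laplacian/Wiener difference, base and detail decomposition, per-pixel weighting maps, linear fusion) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 3x3 window, the box-filter base layer, the simplified adaptive Wiener filter, and the choice to average base layers while saliency-weighting only the detail layers are all assumptions made here. Frames are grayscale images given as lists of rows of floats.

```python
def _window(img, r, c):
    """Return the 3x3 neighbourhood of (r, c), clamping at the image border."""
    h, w = len(img), len(img[0])
    return [img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def _map(img, fn):
    """Apply fn(window, centre_value) to every pixel of img."""
    return [[fn(_window(img, r, c), img[r][c]) for c in range(len(img[0]))]
            for r in range(len(img))]


def box_mean(img):
    """3x3 mean filter, used here as the (assumed) base layer."""
    return _map(img, lambda win, _x: sum(win) / 9.0)


def laplacian(img):
    """4-neighbour Laplacian: up + left + right + down - 4 * centre."""
    return _map(img, lambda win, x: win[1] + win[3] + win[5] + win[7] - 4.0 * x)


def wiener(img):
    """Simplified adaptive Wiener filter with noise = mean local variance."""
    mean = box_mean(img)
    sq_mean = box_mean([[x * x for x in row] for row in img])
    var = [[max(s - m * m, 0.0) for s, m in zip(rs, rm)]
           for rs, rm in zip(sq_mean, mean)]
    noise = sum(map(sum, var)) / (len(img) * len(img[0]))
    out = []
    for row, rm, rv in zip(img, mean, var):
        new_row = []
        for x, m, v in zip(row, rm, rv):
            denom = max(v, noise)
            gain = max(v - noise, 0.0) / denom if denom > 0 else 0.0
            new_row.append(m + gain * (x - m))
        out.append(new_row)
    return out


def summarize(frames, eps=1e-6):
    """Fuse a list of frames into one representative image."""
    h, w = len(frames[0]), len(frames[0][0])
    bases = [box_mean(f) for f in frames]
    saliency = []
    for f in frames:
        lap, wie = laplacian(f), wiener(f)
        # eps keeps the weights defined where every frame has zero saliency.
        saliency.append([[abs(lap[r][c] - wie[r][c]) + eps for c in range(w)]
                         for r in range(h)])
    fused = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = sum(s[r][c] for s in saliency)
            base = sum(b[r][c] for b in bases) / len(frames)
            detail = sum(s[r][c] / total * (f[r][c] - b[r][c])
                         for s, f, b in zip(saliency, frames, bases))
            fused[r][c] = base + detail
    return fused
```

Because the per-pixel weights sum to one, fusing several copies of the same frame returns that frame, which is a convenient sanity check for the weighting logic.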

Pattern Recognition Letters, Volume 185, Pages 190-196.
Citations: 0
Graph neural collaborative filtering with medical content-aware pre-training for treatment pattern recommendation
IF 3.9 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-08-22 DOI: 10.1016/j.patrec.2024.08.014
Xin Min, Wei Li, Ruiqi Han, Tianlong Ji, Weidong Xie

Recently, with the advancement of information technology in healthcare, electronic medical records (EMRs) have become the repository of patients’ treatment processes in hospitals, including each patient’s treatment pattern (standard treatment process), medical history, admission diagnosis, etc. In particular, EMR-based treatment recommendation systems have become critical for optimizing clinical decision-making. EMRs contain complex relationships between patients and treatment patterns. Recent studies have shown that graph neural collaborative filtering can effectively capture these relationships. However, none of the existing methods takes into account the impact of medical content, such as the patient’s admission diagnosis and medical history, on treatment recommendations. In this work, we propose a graph neural collaborative filtering model with medical content-aware pre-training (CAPRec) that learns initial embeddings from medical content to improve recommendation performance. First, the model constructs a patient-treatment pattern interaction graph from EMR data. Then, the medical content is used for pre-training, and the learned embeddings are transferred to a graph neural collaborative filtering model. Finally, the learned initial embeddings support the downstream task of graph collaborative filtering. Extensive experiments on real-world datasets consistently demonstrate the effectiveness of the medical content-aware training framework in improving treatment recommendations.
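To make the graph step concrete, the sketch below builds a bipartite patient/treatment-pattern interaction graph from record pairs and runs one symmetric-normalised propagation step in the style of common graph collaborative filtering models. It is an illustration only: the function names, the record format, and the propagation rule are assumptions, and the fixed vectors stand in for the initial embeddings that the abstract says come from content-aware pre-training.

```python
import math
from collections import defaultdict


def build_graph(records):
    """Build neighbour sets from (patient_id, pattern_id) interaction pairs."""
    patient_nbrs, pattern_nbrs = defaultdict(set), defaultdict(set)
    for patient, pattern in records:
        patient_nbrs[patient].add(pattern)
        pattern_nbrs[pattern].add(patient)
    return patient_nbrs, pattern_nbrs


def propagate(nbrs, other_nbrs, other_emb):
    """One propagation step: e_u = sum_i e_i / sqrt(|N(u)| * |N(i)|)."""
    dim = len(next(iter(other_emb.values())))
    out = {}
    for node, neighbours in nbrs.items():
        vec = [0.0] * dim
        for n in neighbours:
            norm = 1.0 / math.sqrt(len(neighbours) * len(other_nbrs[n]))
            for d in range(dim):
                vec[d] += norm * other_emb[n][d]
        out[node] = vec
    return out


# Hypothetical toy data: two patients, two treatment patterns.
records = [("p1", "t1"), ("p1", "t2"), ("p2", "t1")]
patient_nbrs, pattern_nbrs = build_graph(records)

# Stand-in for pre-trained, content-aware initial embeddings.
pattern_emb = {"t1": [1.0, 0.0], "t2": [0.0, 1.0]}
patient_emb = propagate(patient_nbrs, pattern_nbrs, pattern_emb)
```

After propagation, a candidate pattern could be scored for a patient with a dot product between the two embeddings, which is the usual readout in collaborative filtering.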

Pattern Recognition Letters, Volume 185, Pages 210-217.
Citations: 0
Swin-chart: An efficient approach for chart classification
IF 3.9 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-08-22 DOI: 10.1016/j.patrec.2024.08.012
Anurag Dhote, Mohammed Javed, David S. Doermann

Charts are a visualization tool used in scientific documents to facilitate easy comprehension of complex relationships underlying data and experiments. Researchers use various chart types to convey scientific information, so the problem of data extraction and subsequent chart understanding becomes very challenging. Many studies in the literature address the problem of chart mining, whose motivation is to facilitate the editing of existing charts, carry out extrapolative studies, and provide a deeper understanding of the underlying data. The first step towards chart understanding is chart classification, for which traditional ML and CNN-based deep learning models have been used in the literature. In this paper, we propose Swin-Chart, a Swin transformer-based deep learning approach for chart classification, which generalizes well across multiple datasets with a wide range of chart categories. Swin-Chart comprises a pre-trained Swin Transformer, a finetuning component, and a weight averaging component. The proposed approach is tested on five chart image benchmark datasets. We observed that the Swin-Chart model outperforms existing state-of-the-art models on all the datasets. Furthermore, we also provide an ablation study of the Swin-Chart model with all five datasets to understand the importance of various sub-parts such as the backbone Swin transformer model, the number of best weights selected for the weight averaging component, and the presence of the weight averaging component itself.

The Swin-Chart model also received first position in the chart classification task on the latest dataset in the CHART Infographics competition at ICDAR 2023 - chartinfo.github.io.
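The weight averaging component mentioned in the abstract can be illustrated with a small sketch that averages the parameters of the k best checkpoints. The checkpoint format, the function name `average_best_weights`, and the use of a validation score to rank checkpoints are assumptions made for this example; the abstract does not state how the best weights are selected.

```python
def average_best_weights(checkpoints, k):
    """Average the parameters of the k highest-scoring checkpoints.

    checkpoints: list of (validation_score, {param_name: [floats]}) tuples,
    all sharing the same parameter names and shapes (assumed format).
    """
    if not 1 <= k <= len(checkpoints):
        raise ValueError("k must be between 1 and the number of checkpoints")
    best = sorted(checkpoints, key=lambda ckpt: ckpt[0], reverse=True)[:k]
    names = best[0][1].keys()
    return {
        name: [sum(values) / k
               for values in zip(*(weights[name] for _, weights in best))]
        for name in names
    }


# Hypothetical checkpoints: the lowest-scoring one is excluded when k = 2.
ckpts = [
    (0.90, {"w": [1.0, 2.0]}),
    (0.95, {"w": [3.0, 4.0]}),
    (0.80, {"w": [100.0, 100.0]}),
]
averaged = average_best_weights(ckpts, k=2)
```

In a real training loop the same idea applies tensor by tensor to the model's state; k is the quantity the ablation study in the abstract varies.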

Pattern Recognition Letters, Volume 185, Pages 203-209.
Citations: 0
Contrastive Learning for Lane Detection via cross-similarity
IF 3.9 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-08-20 DOI: 10.1016/j.patrec.2024.08.007
Ali Zoljodi, Sadegh Abadijou, Mina Alibeigi, Masoud Daneshtalab

Detecting lane markings in road scenes poses a significant challenge due to their intricate nature, which is susceptible to unfavorable conditions. While lane markings have strong shape priors, their visibility is easily compromised by varying lighting conditions, adverse weather, occlusions by other vehicles or pedestrians, road plane changes, and fading of colors over time. The detection process is further complicated by the presence of several lane shapes and natural variations, necessitating large amounts of high-quality and diverse data to train a robust lane detection model capable of handling various real-world scenarios.

In this paper, we present a novel self-supervised learning method termed Contrastive Learning for Lane Detection via Cross-Similarity (CLLD) to enhance the resilience and effectiveness of lane detection models in real-world scenarios, particularly when the visibility of lane markings is compromised. CLLD introduces a novel contrastive learning (CL) method that assesses the similarity of local features within the global context of the input image. It uses the surrounding information to predict lane markings. This is achieved by integrating local feature contrastive learning with our newly proposed operation, dubbed cross-similarity.

The local feature CL concentrates on extracting features from small patches, a necessity for accurately localizing lane segments. Meanwhile, cross-similarity captures global features, enabling the detection of obscured lane segments based on their surroundings. We enhance cross-similarity by randomly masking portions of input images in the process of augmentation. Extensive experiments on TuSimple and CuLane benchmark datasets demonstrate that CLLD consistently outperforms state-of-the-art contrastive learning methods, particularly in visibility-impairing conditions like shadows, while it also delivers comparable results under normal conditions. When compared to supervised learning, CLLD still excels in challenging scenarios such as shadows and crowded scenes, which are common in real-world driving.
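Two ingredients named above, random masking during augmentation and contrastive learning on local features, can be sketched generically as follows. The abstract does not define the cross-similarity operation, so it is not reproduced here; the loss shown is the widely used InfoNCE form with cosine similarity, and the names `mask_patches` and `info_nce`, the patch size, the mask ratio, and the temperature are all assumptions for illustration.

```python
import math
import random


def mask_patches(image, patch, ratio, rng):
    """Return a copy of image with a random fraction of patch x patch blocks zeroed."""
    h, w = len(image), len(image[0])
    blocks = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
    chosen = rng.sample(blocks, round(ratio * len(blocks)))
    out = [row[:] for row in image]
    for r0, c0 in chosen:
        for r in range(r0, min(r0 + patch, h)):
            for c in range(c0, min(c0 + patch, w)):
                out[r][c] = 0.0
    return out


def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: low when the positive is the closest feature."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy usage: mask half of the 2x2 blocks of a 4x4 image.
image = [[1.0] * 4 for _ in range(4)]
masked = mask_patches(image, patch=2, ratio=0.5, rng=random.Random(0))
```

In a full pipeline the anchor and positive would be patch features from two augmented views of the same image location, with features from other locations serving as negatives.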

Pattern Recognition Letters, Volume 185, Pages 175-183. PDF: https://www.sciencedirect.com/science/article/pii/S0167865524002393/pdfft?md5=216ead31bb4d56cfb720a21ce2d4db87&pid=1-s2.0-S0167865524002393-main.pdf
Citations: 0
Semantic-aware hyper-space deformable neural radiance fields for facial avatar reconstruction
IF 3.9 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-08-10 DOI: 10.1016/j.patrec.2024.08.004
Kaixin Jin, Xiaoling Gu, Zimeng Wang, Zhenzhong Kuang, Zizhao Wu, Min Tan, Jun Yu

High-fidelity facial avatar reconstruction from monocular videos is a prominent research problem in computer graphics and computer vision. Recent advancements in the Neural Radiance Field (NeRF) have demonstrated remarkable proficiency in rendering novel views and garnered attention for its potential in facial avatar reconstruction. However, previous methodologies have overlooked the complex motion dynamics present across the head, torso, and intricate facial features. Additionally, a deficiency exists in a generalized NeRF-based framework for facial avatar reconstruction adaptable to either 3DMM coefficients or audio input. To tackle these challenges, we propose an innovative framework that leverages semantic-aware hyper-space deformable NeRF, facilitating the reconstruction of high-fidelity facial avatars from either 3DMM coefficients or audio features. Our framework effectively addresses both localized facial movements and broader head and torso motions through semantic guidance and a unified hyper-space deformation module. Specifically, we adopt a dynamic weighted ray sampling strategy to allocate varying degrees of attention to distinct semantic regions, enhancing the deformable NeRF framework with semantic guidance to capture fine-grained details across diverse facial regions. Moreover, we introduce a hyper-space deformation module that enables the transformation of observation space coordinates into canonical hyper-space coordinates, allowing for the learning of natural facial deformation and head-torso movements. Extensive experiments validate the superiority of our framework over existing state-of-the-art methods, demonstrating its effectiveness in producing realistic and expressive facial avatars. Our code is available at https://github.com/jematy/SAHS-Deformable-Nerf.
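The weighted ray sampling idea can be illustrated by drawing pixel coordinates with probability proportional to a weight attached to each pixel's semantic region. This is a sketch under assumptions: the region labels, the weight values, and the function name `sample_rays` are hypothetical, and the "dynamic" aspect in the abstract (how weights change during training) is not specified there, so the weights are simply passed in.

```python
import random


def sample_rays(semantic_map, region_weights, n_rays, rng):
    """Sample pixel coordinates, favouring regions with larger weights.

    semantic_map: 2D list of region labels, one per pixel.
    region_weights: {label: non-negative weight}; unknown labels get 1.0.
    """
    pixels, weights = [], []
    for r, row in enumerate(semantic_map):
        for c, label in enumerate(row):
            pixels.append((r, c))
            weights.append(region_weights.get(label, 1.0))
    # random.Random.choices samples with replacement, proportional to weights.
    return rng.choices(pixels, weights=weights, k=n_rays)


# Hypothetical 2x2 parsing map and weights.
semantic_map = [["face", "torso"], ["background", "face"]]
region_weights = {"face": 3.0, "torso": 1.0, "background": 0.0}
rays = sample_rays(semantic_map, region_weights, n_rays=200, rng=random.Random(0))
```

A training loop could recompute `region_weights` between iterations, for example from per-region reconstruction error, to shift attention toward regions that are still poorly fitted; that schedule is an assumption, not something stated in the abstract.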

从单目视频中重建高保真面部头像是计算机图形学和计算机视觉领域的一个突出研究课题。神经辐射场(NeRF)的最新进展表明,该技术在渲染新颖视图方面具有出色的能力,并因其在面部头像重建方面的潜力而备受关注。然而,以前的方法忽略了头部、躯干和复杂面部特征的复杂运动动态。此外,基于 NeRF 的面部头像重建通用框架也存在不足,既不能适应 3DMM 系数,也不能适应音频输入。为了应对这些挑战,我们提出了一个创新框架,利用语义感知超空间可变形 NeRF,促进从 3DMM 系数或音频特征重建高保真面部头像。我们的框架通过语义引导和统一的超空间变形模块,有效地解决了局部面部运动和更广泛的头部和躯干运动问题。具体来说,我们采用动态加权射线采样策略,将不同程度的注意力分配给不同的语义区域,通过语义引导来增强可变形 NeRF 框架,从而捕捉不同面部区域的精细细节。此外,我们还引入了超空间变形模块,可将观察空间坐标转换为规范超空间坐标,从而学习自然的面部变形和头躯干运动。广泛的实验验证了我们的框架优于现有的最先进方法,证明了它在制作逼真且富有表现力的面部化身方面的有效性。我们的代码见 https://github.com/jematy/SAHS-Deformable-Nerf。
{"title":"Semantic-aware hyper-space deformable neural radiance fields for facial avatar reconstruction","authors":"Kaixin Jin,&nbsp;Xiaoling Gu,&nbsp;Zimeng Wang,&nbsp;Zhenzhong Kuang,&nbsp;Zizhao Wu,&nbsp;Min Tan,&nbsp;Jun Yu","doi":"10.1016/j.patrec.2024.08.004","DOIUrl":"10.1016/j.patrec.2024.08.004","url":null,"abstract":"<div><p>High-fidelity facial avatar reconstruction from monocular videos is a prominent research problem in computer graphics and computer vision. Recent advancements in the Neural Radiance Field (NeRF) have demonstrated remarkable proficiency in rendering novel views and garnered attention for its potential in facial avatar reconstruction. However, previous methodologies have overlooked the complex motion dynamics present across the head, torso, and intricate facial features. Additionally, a deficiency exists in a generalized NeRF-based framework for facial avatar reconstruction adaptable to either 3DMM coefficients or audio input. To tackle these challenges, we propose an innovative framework that leverages semantic-aware hyper-space deformable NeRF, facilitating the reconstruction of high-fidelity facial avatars from either 3DMM coefficients or audio features. Our framework effectively addresses both localized facial movements and broader head and torso motions through semantic guidance and a unified hyper-space deformation module. Specifically, we adopt a dynamic weighted ray sampling strategy to allocate varying degrees of attention to distinct semantic regions, enhancing the deformable NeRF framework with semantic guidance to capture fine-grained details across diverse facial regions. Moreover, we introduce a hyper-space deformation module that enables the transformation of observation space coordinates into canonical hyper-space coordinates, allowing for the learning of natural facial deformation and head-torso movements. 
Extensive experiments validate the superiority of our framework over existing state-of-the-art methods, demonstrating its effectiveness in producing realistic and expressive facial avatars. Our code is available at <span><span>https://github.com/jematy/SAHS-Deformable-Nerf</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 160-166"},"PeriodicalIF":3.9,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141990393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Convolutional Spiking Neural Networks targeting learning and inference in highly imbalanced datasets
IF 5.1 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-08-05 DOI: 10.1016/j.patrec.2024.08.002
Bernardete Ribeiro, Francisco Antunes, Dylan Perdigão, Catarina Silva
Spiking Neural Networks (SNNs) are regarded as the next frontier in AI, as they can be implemented on neuromorphic hardware, paving the way for advancements in real-world applications in the field. SNNs provide a biologically inspired solution that is event-driven, energy-efficient and sparse. Although they show promising results, challenges remain. For example, the design-build-evaluate process that integrates the architecture, learning, hyperparameter optimization and inference needs to be tailored to a specific problem. This is particularly important in critical high-stakes industries such as financial services. In this paper, we present SpikeConv, a novel deep Convolutional Spiking Neural Network (CSNN), and investigate this process in the context of a highly imbalanced online bank account opening fraud problem. Our approach is compared with Deep Spiking Neural Networks (DSNNs) and Gradient Boosting Decision Trees (GBDT), showing competitive results.
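The abstract describes SNNs as event-driven and sparse without giving details. As a rough illustration only, the sketch below simulates a single leaky integrate-and-fire (LIF) neuron, a common textbook spiking unit. It is not the SpikeConv architecture, and the function name, threshold, leak factor and input currents are all hypothetical values chosen for the example.

```python
def lif_spike_train(currents, threshold=1.0, leak=0.9, v_reset=0.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    v = v_reset          # membrane potential
    spikes = []
    for i in currents:
        v = leak * v + i             # leak the old potential, then add the input
        if v >= threshold:
            spikes.append(1)         # emit a spike (the "event")
            v = v_reset              # reset after firing
        else:
            spikes.append(0)         # stay silent: most steps produce no output
    return spikes


# Hypothetical input current sequence
spikes = lif_spike_train([0.6, 0.6, 0.0, 0.0, 0.6, 0.6])
print(spikes)
```

The output is a binary spike train, so downstream computation is only triggered on the steps where a spike occurs, which is the sense in which such models are called event-driven and sparse.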
Citations: 0
Introduction to the special issue on “Computer vision solutions for part-based image analysis and classification (CV_PARTIAL)”
IF 3.9 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-08-05 DOI: 10.1016/j.patrec.2024.07.023
Fabio Narducci, Piercalo Dondi, David Freire Obregón, Florin Pop
Citations: 0
Feature-consistent coplane-pair correspondence- and fusion-based point cloud registration
IF 3.9 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-08-05 DOI: 10.1016/j.patrec.2024.08.001
Kuo-Liang Chung, Chia-Chi Hsu, Pei-Hsuan Hsieh

Registering two point clouds is an important and challenging task, and the estimated registration solution can be applied in 3D vision. In this paper, an outlier removal method is first proposed to delete redundant coplane-pair correspondences so as to construct three feature-consistent coplane-pair correspondence subsets. Next, Rodrigues’ formula and a scoring-based method are adopted to solve for the representative registration solution of each correspondence subset. Then, a robust fusion method is proposed to fuse the three representative solutions into the final registration solution. Comprehensive experiments on typical testing datasets demonstrate that our registration algorithm achieves good registration accuracy while significantly reducing execution time compared with state-of-the-art methods.
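The abstract names Rodrigues’ formula but does not spell it out. As a hedged illustration only, the sketch below applies the standard Rodrigues rotation formula, v_rot = v cos θ + (k × v) sin θ + k (k · v)(1 − cos θ), to rotate a 3D point about a unit axis k. It is not the authors' registration pipeline; the function names and the sample axis, angle and point are assumptions made for the example.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def rodrigues_rotate(v, axis, theta):
    """Rotate vector v by angle theta (radians) about the given axis."""
    norm = math.sqrt(dot(axis, axis))
    if norm == 0.0:
        raise ValueError("rotation axis must be non-zero")
    k = tuple(x / norm for x in axis)        # normalise the axis
    c, s = math.cos(theta), math.sin(theta)
    kxv = cross(k, v)
    kdv = dot(k, v)
    # v*cos(theta) + (k x v)*sin(theta) + k*(k.v)*(1 - cos(theta))
    return tuple(v[i] * c + kxv[i] * s + k[i] * kdv * (1.0 - c)
                 for i in range(3))


# Hypothetical example: rotate the x unit vector by 90 degrees about the z axis
rotated = rodrigues_rotate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2)
print(rotated)
```

Up to floating-point rounding, the rotated point lies on the y axis. In a registration setting, the same formula converts an estimated axis and angle into the rotation part of a rigid transform.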

Citations: 0