Deep NRSFM for multi-view multi-body pose estimation
Pub Date: 2024-08-22 | DOI: 10.1016/j.patrec.2024.08.015
Áron Fóthi, Joul Skaf, Fengjiao Lu, Kristian Fenech
This paper addresses the challenging task of unsupervised relative human pose estimation. Our solution exploits the potential offered by multiple uncalibrated cameras. We assume that spatial human pose and camera parameter estimation can be solved jointly as a block sparse dictionary learning problem with zero supervision. The resulting structures and camera parameters can fit individual skeletons into a common space. To do so, we rely on the fact that all individuals in an image are viewed from the same camera viewpoint, which lets us combine the information provided by multiple camera views and overcome the lack of information on camera parameters. To the best of our knowledge, this is the first solution that requires neither 3D ground truth nor knowledge of the intrinsic or extrinsic camera parameters. Our approach demonstrates the potential of using multiple viewpoints to solve challenging computer vision problems. Additionally, we provide access to the code, encouraging further development and experimentation: https://github.com/Jeryoss/MVMB-NRSFM.
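The abstract frames pose and camera recovery as block sparse dictionary learning. As a rough illustration of that general idea (not the authors' implementation), the sketch below runs proximal-gradient iterations with block-wise soft thresholding (block ISTA): given a dictionary whose atoms are grouped into blocks, only a few blocks remain active in the code. All names, dimensions, and hyperparameters are hypothetical.

```python
import numpy as np

def block_soft_threshold(z, blocks, tau):
    """Shrink each block of coefficients toward zero (group-lasso proximal step)."""
    out = np.zeros_like(z)
    for blk in blocks:
        norm = np.linalg.norm(z[blk])
        if norm > tau:
            out[blk] = (1.0 - tau / norm) * z[blk]
    return out

def block_ista_step(W, D, z, blocks, tau, lr):
    """One proximal-gradient step for min ||W - D z||^2 + tau * sum_b ||z_b||."""
    grad = D.T @ (D @ z - W)          # gradient of the data-fit term
    return block_soft_threshold(z - lr * grad, blocks, lr * tau)

# Hypothetical sizes: 2D keypoints W (2J,), dictionary D (2J x K), K atoms in 4 blocks.
rng = np.random.default_rng(0)
J, K = 17, 32
D = rng.standard_normal((2 * J, K))
W = rng.standard_normal(2 * J)
blocks = [np.arange(i, i + 8) for i in range(0, K, 8)]
z = np.zeros(K)
for _ in range(100):
    z = block_ista_step(W, D, z, blocks, tau=0.1, lr=1e-3)
```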
{"title":"Deep NRSFM for multi-view multi-body pose estimation","authors":"Áron Fóthi, Joul Skaf, Fengjiao Lu, Kristian Fenech","doi":"10.1016/j.patrec.2024.08.015","DOIUrl":"10.1016/j.patrec.2024.08.015","url":null,"abstract":"<div><p>This paper addresses the challenging task of unsupervised relative human pose estimation. Our solution exploits the potential offered by utilizing multiple uncalibrated cameras. It is assumed that spatial human pose and camera parameter estimation can be solved as a block sparse dictionary learning problem with zero supervision. The resulting structures and camera parameters can fit individual skeletons into a common space. To do so, we exploit the fact that all individuals in the image are viewed from the same camera viewpoint, thus exploiting the information provided by multiple camera views and overcoming the lack of information on camera parameters. To the best of our knowledge, this is the first solution that requires neither 3D ground truth nor knowledge of the intrinsic or extrinsic camera parameters. Our approach demonstrates the potential of using multiple viewpoints to solve challenging computer vision problems. Additionally, we provide access to the code, encouraging further development and experimentation. <span><span>https://github.com/Jeryoss/MVMB-NRSFM</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 218-224"},"PeriodicalIF":3.9,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167865524002472/pdfft?md5=c7f415f86c9c99693c29d66ef080962f&pid=1-s2.0-S0167865524002472-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142087409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Special section: Best papers of the 15th Mexican conference on pattern recognition (MCPR) 2023
Pub Date: 2024-08-22 | DOI: 10.1016/j.patrec.2024.08.017
Ansel Y. Rodríguez González, Humberto Perez-Espinosa, Jesús Ariel Carrasco Ochoa, José Francisco Martínez Trinidad, José Arturo Olvera López
{"title":"Special section: Best papers of the 15th Mexican conference on pattern recognition (MCPR) 2023","authors":"Ansel Y. Rodríguez González, Humberto Perez-Espinosa, Jesús Ariel Carrasco Ochoa, José Francisco Martínez Trinidad, José Arturo Olvera López","doi":"10.1016/j.patrec.2024.08.017","DOIUrl":"https://doi.org/10.1016/j.patrec.2024.08.017","url":null,"abstract":"","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"5 1","pages":""},"PeriodicalIF":5.1,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Saliency-based video summarization for face anti-spoofing
Pub Date: 2024-08-22 | DOI: 10.1016/j.patrec.2024.08.008
Usman Muhammad, Mourad Oussalah, Jorma Laaksonen
With the growing availability of databases for face presentation attack detection, researchers are increasingly focusing on video-based face anti-spoofing methods that involve hundreds to thousands of images for training the models. However, there is currently no clear consensus on the optimal number of frames in a video for improving face spoofing detection. Inspired by visual saliency theory, we present a video summarization method for face anti-spoofing that aims to enhance the performance and efficiency of deep learning models by leveraging visual saliency. In particular, saliency information is extracted from the differences between the Laplacian and Wiener filter outputs of the source images, enabling the identification of the most visually salient regions within each frame. Subsequently, the source images are decomposed into base and detail images, enhancing the representation of the most important information. Weighting maps are then computed from the saliency information, indicating the importance of each pixel in the image. By linearly combining the base and detail images using the weighting maps, the method fuses the source images into a single representative image that summarizes the entire video. The key contribution of the proposed method lies in demonstrating how visual saliency can be used as a data-centric approach to improve the performance and efficiency of face presentation attack detection. By focusing on the most salient images or regions within the images, a more representative and diverse training set can be created, potentially leading to more effective models. To validate the method's effectiveness, a simple CNN–RNN deep learning architecture was used, and the experimental results showcased state-of-the-art performance on four challenging face anti-spoofing datasets.
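The pipeline described above (saliency from the difference of Laplacian and Wiener filter outputs, base/detail decomposition, then weighted fusion across frames) can be sketched roughly as follows. This is a minimal reconstruction from the abstract, not the authors' code; the filter sizes and the choice to fuse bases uniformly while weighting details by saliency are assumptions.

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter
from scipy.signal import wiener

def fuse_video(frames, base_size=31, wiener_size=5, eps=1e-8):
    """Fuse grayscale frames (list of 2D float arrays) into one summary image."""
    saliency, bases, details = [], [], []
    for f in frames:
        s = np.abs(laplace(f) - wiener(f, mysize=wiener_size))  # per-pixel saliency
        base = uniform_filter(f, size=base_size)                # low-frequency base layer
        saliency.append(s)
        bases.append(base)
        details.append(f - base)                                # high-frequency detail layer
    w = np.stack(saliency)
    w = w / (w.sum(axis=0) + eps)                               # weights sum to 1 over frames
    fused_base = np.mean(bases, axis=0)                         # bases fused uniformly
    fused_detail = sum(wi * d for wi, d in zip(w, details))     # details fused by saliency
    return fused_base + fused_detail

# Usage with random stand-in frames:
frames = [np.random.rand(128, 128) for _ in range(8)]
summary = fuse_video(frames)
```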
{"title":"Saliency-based video summarization for face anti-spoofing","authors":"Usman Muhammad , Mourad Oussalah , Jorma Laaksonen","doi":"10.1016/j.patrec.2024.08.008","DOIUrl":"10.1016/j.patrec.2024.08.008","url":null,"abstract":"<div><p>With the growing availability of databases for face presentation attack detection, researchers are increasingly focusing on video-based face anti-spoofing methods that involve hundreds to thousands of images for training the models. However, there is currently no clear consensus on the optimal number of frames in a video to improve face spoofing detection. Inspired by the visual saliency theory, we present a video summarization method for face anti-spoofing detection that aims to enhance the performance and efficiency of deep learning models by leveraging visual saliency. In particular, saliency information is extracted from the differences between the Laplacian and Wiener filter outputs of the source images, enabling the identification of the most visually salient regions within each frame. Subsequently, the source images are decomposed into base and detail images, enhancing the representation of the most important information. Weighting maps are then computed based on the saliency information, indicating the importance of each pixel in the image. By linearly combining the base and detail images using the weighting maps, the method fuses the source images to create a single representative image that summarizes the entire video. The key contribution of the proposed method lies in demonstrating how visual saliency can be used as a data-centric approach to improve the performance and efficiency for face presentation attack detection. By focusing on the most salient images or regions within the images, a more representative and diverse training set can be created, potentially leading to more effective models. To validate the method’s effectiveness, a simple CNN–RNN deep learning architecture was used, and the experimental results showcased state-of-the-art performance on four challenging face anti-spoofing datasets.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 190-196"},"PeriodicalIF":3.9,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142048955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph neural collaborative filtering with medical content-aware pre-training for treatment pattern recommendation
Pub Date: 2024-08-22 | DOI: 10.1016/j.patrec.2024.08.014
Xin Min, Wei Li, Ruiqi Han, Tianlong Ji, Weidong Xie
With the advancement of information technology in healthcare, electronic medical records (EMRs) have become the repository of patients' treatment processes in hospitals, including the patient's treatment pattern (standard treatment process), medical history, admission diagnosis, etc. In particular, EMRs-based treatment recommendation systems have become critical for optimizing clinical decision-making. EMRs contain complex relationships between patients and treatment patterns. Recent studies have shown that graph neural collaborative filtering can effectively capture these complex relationships. However, none of the existing methods take into account the impact of medical content, such as the patient's admission diagnosis and medical history, on treatment recommendations. In this work, we propose a graph neural collaborative filtering model with medical content-aware pre-training (CAPRec) that learns initial embeddings from medical content to improve recommendation performance. First, the model constructs a patient-treatment pattern interaction graph from EMR data. Then we use the medical content for pre-training and transfer the learned embeddings to a graph neural collaborative filtering model. Finally, the learned initial embeddings support the downstream graph collaborative filtering task. Extensive experiments on real-world datasets consistently demonstrate the effectiveness of the medical content-aware training framework in improving treatment recommendations.
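A rough sketch of the pre-train-then-transfer pattern the abstract describes (not the authors' CAPRec code): encode each patient's medical content, pre-train patient embeddings against observed interactions with a BPR-style objective, then use them to initialize the embedding table of a graph collaborative filtering model. Module names, the loss, and dimensions are all assumptions.

```python
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    """Maps a bag-of-codes medical-content vector to an embedding."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(vocab_size, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

def pretrain_step(encoder, content, pattern_emb, interactions, opt):
    """One BPR-style step: observed (patient, pattern) pairs score above random negatives."""
    u = encoder(content[interactions[:, 0]])                      # patient embeddings
    pos = pattern_emb(interactions[:, 1])                         # observed patterns
    neg = pattern_emb(torch.randint(0, pattern_emb.num_embeddings, (len(interactions),)))
    loss = -torch.log(torch.sigmoid((u * pos).sum(-1) - (u * neg).sum(-1))).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# After pre-training, a hypothetical transfer into the downstream GCF model:
# gcf.user_emb.weight.data.copy_(encoder(content).detach())
```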
{"title":"Graph neural collaborative filtering with medical content-aware pre-training for treatment pattern recommendation","authors":"Xin Min , Wei Li , Ruiqi Han , Tianlong Ji , Weidong Xie","doi":"10.1016/j.patrec.2024.08.014","DOIUrl":"10.1016/j.patrec.2024.08.014","url":null,"abstract":"<div><p>Recently, considering the advancement of information technology in healthcare, electronic medical records (EMRs) have become the repository of patients’ treatment processes in hospitals, including the patient’s treatment pattern (standard treatment process), the patient’s medical history, the patient’s admission diagnosis, etc. In particular, EMRs-based treatment recommendation systems have become critical for optimizing clinical decision-making. EMRs contain complex relationships between patients and treatment patterns. Recent studies have shown that graph neural collaborative filtering can effectively capture the complex relationships in EMRs. However, none of the existing methods take into account the impact of medical content such as the patient’s admission diagnosis, and medical history on treatment recommendations. In this work, we propose a graph neural collaborative filtering model with medical content-aware pre-training (CAPRec) for learning initial embeddings with medical content to improve recommendation performance. First the model constructs a patient-treatment pattern interaction graph from EMRs data. Then we attempt to use the medical content for pre-training learning and transfer the learned embeddings to a graph neural collaborative filtering model. Finally, the learned initial embedding can support the downstream task of graph collaborative filtering. Extensive experiments on real world datasets have consistently demonstrated the effectiveness of the medical content-aware training framework in improving treatment recommendations.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 210-217"},"PeriodicalIF":3.9,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142083773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Swin-chart: An efficient approach for chart classification
Pub Date: 2024-08-22 | DOI: 10.1016/j.patrec.2024.08.012
Anurag Dhote, Mohammed Javed, David S. Doermann
Charts are a visualization tool used in scientific documents to facilitate easy comprehension of complex relationships underlying data and experiments. Researchers use various chart types to convey scientific information, so the problem of data extraction and subsequent chart understanding becomes very challenging. Many studies in the literature have addressed the problem of chart mining, with the motivation of facilitating the editing of existing charts, carrying out extrapolative studies, and providing a deeper understanding of the underlying data. The first step towards chart understanding is chart classification, for which traditional ML and CNN-based deep learning models have been used in the literature. In this paper, we propose Swin-Chart, a Swin transformer-based deep learning approach for chart classification, which generalizes well across multiple datasets with a wide range of chart categories. Swin-Chart comprises a pre-trained Swin Transformer, a fine-tuning component, and a weight averaging component. The proposed approach is tested on five benchmark chart image datasets. We observed that the Swin-Chart model outperforms existing state-of-the-art models on all the datasets. Furthermore, we also provide an ablation study of the Swin-Chart model on all five datasets to understand the importance of its various sub-parts, such as the backbone Swin transformer model, the number of best weights selected for the weight averaging component, and the presence of the weight averaging component itself.
The Swin-Chart model also took first place in the chart classification task on the latest dataset of the CHART Infographics competition at ICDAR 2023 (chartinfo.github.io).
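Averaging the weights of the best fine-tuned checkpoints, as named in the abstract, is commonly implemented as a simple parameter-wise mean ("model soup" style). A minimal sketch under that assumption; the checkpoint selection criterion and file names are hypothetical:

```python
import torch

def average_checkpoints(paths):
    """Parameter-wise mean of several checkpoints with identical architectures."""
    avg = None
    for p in paths:
        state = torch.load(p, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    return {k: v / len(paths) for k, v in avg.items()}

# Hypothetical usage: average the k best checkpoints by validation accuracy,
# then load the result into the fine-tuned Swin backbone.
# model.load_state_dict(average_checkpoints(["best1.pt", "best2.pt", "best3.pt"]))
```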
{"title":"Swin-chart: An efficient approach for chart classification","authors":"Anurag Dhote , Mohammed Javed , David S. Doermann","doi":"10.1016/j.patrec.2024.08.012","DOIUrl":"10.1016/j.patrec.2024.08.012","url":null,"abstract":"<div><p>Charts are a visualization tool used in scientific documents to facilitate easy comprehension of complex relationships underlying data and experiments. Researchers use various chart types to convey scientific information, so the problem of data extraction and subsequent chart understanding becomes very challenging. Many studies have been taken up in the literature to address the problem of chart mining, whose motivation is to facilitate the editing of existing charts, carry out extrapolative studies, and provide a deeper understanding of the underlying data. The first step towards chart understanding is chart classification, for which traditional ML and CNN-based deep learning models have been used in the literature. In this paper, we propose Swin-Chart, a Swin transformer-based deep learning approach for chart classification, which generalizes well across multiple datasets with a wide range of chart categories. Swin-Chart comprises a pre-trained Swin Transformer, a finetuning component, and a weight averaging component. The proposed approach is tested on a five-chart image benchmark dataset. We observed that the Swin-Chart model outperformers existing state-of-the-art models on all the datasets. Furthermore, we also provide an ablation study of the Swin-Chart model with all five datasets to understand the importance of various sub-parts such as the back-bone Swin transformer model, the value of several best weights selected for the weight averaging component, and the presence of the weight averaging component itself.</p><p>The Swin-Chart model also received first position in the chart classification task on the latest dataset in the CHART Infographics competition at ICDAR 2023 - chartinfo.github.io.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 203-209"},"PeriodicalIF":3.9,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142083775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Contrastive Learning for Lane Detection via cross-similarity
Pub Date: 2024-08-20 | DOI: 10.1016/j.patrec.2024.08.007
Ali Zoljodi, Sadegh Abadijou, Mina Alibeigi, Masoud Daneshtalab
Detecting lane markings in road scenes poses a significant challenge due to their intricate nature, which is susceptible to unfavorable conditions. While lane markings have strong shape priors, their visibility is easily compromised by varying lighting conditions, adverse weather, occlusions by other vehicles or pedestrians, road plane changes, and fading of colors over time. The detection process is further complicated by the presence of several lane shapes and natural variations, necessitating large amounts of high-quality and diverse data to train a robust lane detection model capable of handling various real-world scenarios.
In this paper, we present a novel self-supervised learning method termed Contrastive Learning for Lane Detection via Cross-Similarity (CLLD) to enhance the resilience and effectiveness of lane detection models in real-world scenarios, particularly when the visibility of lane markings is compromised. CLLD introduces a novel contrastive learning (CL) method that assesses the similarity of local features within the global context of the input image and uses the surrounding information to predict lane markings. This is achieved by integrating local feature contrastive learning with our newly proposed operation, dubbed cross-similarity.
The local feature CL concentrates on extracting features from small patches, a necessity for accurately localizing lane segments. Meanwhile, cross-similarity captures global features, enabling the detection of obscured lane segments based on their surroundings. We enhance cross-similarity by randomly masking portions of input images during augmentation. Extensive experiments on the TuSimple and CULane benchmark datasets demonstrate that CLLD consistently outperforms state-of-the-art contrastive learning methods, particularly in visibility-impairing conditions such as shadows, while also delivering comparable results under normal conditions. Compared to supervised learning, CLLD still excels in challenging scenarios such as shadows and crowded scenes, which are common in real-world driving.
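One speculative reading of the abstract's core operation, contrasting features of an image against features of a randomly masked view of the same image, is sketched below with a standard InfoNCE loss. This is not the published CLLD code; the masking scheme, pooling, and loss details are assumptions.

```python
import torch
import torch.nn.functional as F

def random_mask(img, patch=32, p=0.3):
    """Zero out random patches of a batch of images (B, C, H, W)."""
    B, _, H, W = img.shape
    mask = (torch.rand(B, 1, H // patch, W // patch) > p).float()
    mask = F.interpolate(mask, size=(H, W), mode="nearest")
    return img * mask

def cross_similarity_loss(local_feats, global_feats, temp=0.1):
    """InfoNCE between features of one view and features of the masked view.

    local_feats, global_feats: (B, D), already pooled per image.
    Matching images in the batch are positives; the rest are negatives."""
    z1 = F.normalize(local_feats, dim=1)
    z2 = F.normalize(global_feats, dim=1)
    logits = z1 @ z2.t() / temp
    targets = torch.arange(len(z1))
    return F.cross_entropy(logits, targets)

# Usage sketch: feats_local = encoder(imgs); feats_global = encoder(random_mask(imgs))
# loss = cross_similarity_loss(feats_local, feats_global)
```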
{"title":"Contrastive Learning for Lane Detection via cross-similarity","authors":"Ali Zoljodi , Sadegh Abadijou , Mina Alibeigi , Masoud Daneshtalab","doi":"10.1016/j.patrec.2024.08.007","DOIUrl":"10.1016/j.patrec.2024.08.007","url":null,"abstract":"<div><p>Detecting lane markings in road scenes poses a significant challenge due to their intricate nature, which is susceptible to unfavorable conditions. While lane markings have strong shape priors, their visibility is easily compromised by varying lighting conditions, adverse weather, occlusions by other vehicles or pedestrians, road plane changes, and fading of colors over time. The detection process is further complicated by the presence of several lane shapes and natural variations, necessitating large amounts of high-quality and diverse data to train a robust lane detection model capable of handling various real-world scenarios.</p><p>In this paper, we present a novel self-supervised learning method termed Contrastive Learning for Lane Detection via Cross-Similarity (CLLD) to enhance the resilience and effectiveness of lane detection models in real-world scenarios, particularly when the visibility of lane markings are compromised. CLLD introduces a novel contrastive learning (CL) method that assesses the similarity of local features within the global context of the input image. It uses the surrounding information to predict lane markings. This is achieved by integrating local feature contrastive learning with our newly proposed operation, dubbed <em>cross-similarity</em>.</p><p>The local feature CL concentrates on extracting features from small patches, a necessity for accurately localizing lane segments. Meanwhile, cross-similarity captures global features, enabling the detection of obscured lane segments based on their surroundings. We enhance cross-similarity by randomly masking portions of input images in the process of augmentation. Extensive experiments on TuSimple and CuLane benchmark datasets demonstrate that CLLD consistently outperforms state-of-the-art contrastive learning methods, particularly in visibility-impairing conditions like shadows, while it also delivers comparable results under normal conditions. When compared to supervised learning, CLLD still excels in challenging scenarios such as shadows and crowded scenes, which are common in real-world driving.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 175-183"},"PeriodicalIF":3.9,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167865524002393/pdfft?md5=216ead31bb4d56cfb720a21ce2d4db87&pid=1-s2.0-S0167865524002393-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142021151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic-aware hyper-space deformable neural radiance fields for facial avatar reconstruction
Pub Date: 2024-08-10 | DOI: 10.1016/j.patrec.2024.08.004
Kaixin Jin, Xiaoling Gu, Zimeng Wang, Zhenzhong Kuang, Zizhao Wu, Min Tan, Jun Yu
High-fidelity facial avatar reconstruction from monocular videos is a prominent research problem in computer graphics and computer vision. Recent advancements in Neural Radiance Fields (NeRF) have demonstrated remarkable proficiency in rendering novel views and have garnered attention for their potential in facial avatar reconstruction. However, previous methodologies have overlooked the complex motion dynamics present across the head, torso, and intricate facial features. Additionally, there is a lack of a generalized NeRF-based framework for facial avatar reconstruction adaptable to either 3DMM coefficients or audio input. To tackle these challenges, we propose an innovative framework that leverages semantic-aware hyper-space deformable NeRF, facilitating the reconstruction of high-fidelity facial avatars from either 3DMM coefficients or audio features. Our framework effectively addresses both localized facial movements and broader head and torso motions through semantic guidance and a unified hyper-space deformation module. Specifically, we adopt a dynamic weighted ray sampling strategy to allocate varying degrees of attention to distinct semantic regions, enhancing the deformable NeRF framework with semantic guidance to capture fine-grained details across diverse facial regions. Moreover, we introduce a hyper-space deformation module that transforms observation-space coordinates into canonical hyper-space coordinates, allowing for the learning of natural facial deformation and head-torso movements. Extensive experiments validate the superiority of our framework over existing state-of-the-art methods, demonstrating its effectiveness in producing realistic and expressive facial avatars. Our code is available at https://github.com/jematy/SAHS-Deformable-Nerf.
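Weighted ray sampling, in which semantic regions receive more of the per-iteration ray budget, can be illustrated in a few lines of PyTorch. The region labels and weight values below are made up for illustration and are not taken from the paper.

```python
import torch

def sample_rays(semantic_map, region_weights, n_rays):
    """Pick pixel indices with probability proportional to their region's weight.

    semantic_map: (H, W) long tensor of region labels (e.g. 0=background, 1=face).
    region_weights: (num_regions,) tensor; higher values are sampled more often."""
    probs = region_weights[semantic_map.flatten()]
    probs = probs / probs.sum()
    idx = torch.multinomial(probs, n_rays, replacement=False)
    H, W = semantic_map.shape
    return torch.stack((idx // W, idx % W), dim=1)  # (n_rays, 2) pixel coordinates

# Hypothetical usage: oversample a "face" region during training.
sem = torch.zeros(128, 128, dtype=torch.long)
sem[40:80, 40:80] = 1                     # mark a face block
weights = torch.tensor([1.0, 4.0])        # face pixels 4x more likely to be sampled
rays = sample_rays(sem, weights, n_rays=1024)
```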
{"title":"Semantic-aware hyper-space deformable neural radiance fields for facial avatar reconstruction","authors":"Kaixin Jin, Xiaoling Gu, Zimeng Wang, Zhenzhong Kuang, Zizhao Wu, Min Tan, Jun Yu","doi":"10.1016/j.patrec.2024.08.004","DOIUrl":"10.1016/j.patrec.2024.08.004","url":null,"abstract":"<div><p>High-fidelity facial avatar reconstruction from monocular videos is a prominent research problem in computer graphics and computer vision. Recent advancements in the Neural Radiance Field (NeRF) have demonstrated remarkable proficiency in rendering novel views and garnered attention for its potential in facial avatar reconstruction. However, previous methodologies have overlooked the complex motion dynamics present across the head, torso, and intricate facial features. Additionally, a deficiency exists in a generalized NeRF-based framework for facial avatar reconstruction adaptable to either 3DMM coefficients or audio input. To tackle these challenges, we propose an innovative framework that leverages semantic-aware hyper-space deformable NeRF, facilitating the reconstruction of high-fidelity facial avatars from either 3DMM coefficients or audio features. Our framework effectively addresses both localized facial movements and broader head and torso motions through semantic guidance and a unified hyper-space deformation module. Specifically, we adopt a dynamic weighted ray sampling strategy to allocate varying degrees of attention to distinct semantic regions, enhancing the deformable NeRF framework with semantic guidance to capture fine-grained details across diverse facial regions. Moreover, we introduce a hyper-space deformation module that enables the transformation of observation space coordinates into canonical hyper-space coordinates, allowing for the learning of natural facial deformation and head-torso movements. Extensive experiments validate the superiority of our framework over existing state-of-the-art methods, demonstrating its effectiveness in producing realistic and expressive facial avatars. Our code is available at <span><span>https://github.com/jematy/SAHS-Deformable-Nerf</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 160-166"},"PeriodicalIF":3.9,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141990393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convolutional Spiking Neural Networks targeting learning and inference in highly imbalanced datasets
Pub Date: 2024-08-05 | DOI: 10.1016/j.patrec.2024.08.002
Bernardete Ribeiro, Francisco Antunes, Dylan Perdigão, Catarina Silva
Spiking Neural Networks (SNNs) are regarded as the next frontier in AI, as they can be implemented on neuromorphic hardware, paving the way for advancements in real-world applications in the field. SNNs provide a biologically inspired solution that is event-driven, energy-efficient, and sparse. While they show promising results, there are challenges that need to be addressed. For example, the design-build-evaluate process for integrating the architecture, learning, hyperparameter optimization, and inference needs to be tailored to a specific problem. This is particularly important in critical high-stakes industries such as financial services. In this paper, we present SpikeConv, a novel deep Convolutional Spiking Neural Network (CSNN), and investigate this process in the context of a highly imbalanced online bank account opening fraud problem. Our approach is compared with Deep Spiking Neural Networks (DSNNs) and Gradient Boosting Decision Trees (GBDT), showing competitive results.
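For readers unfamiliar with the spiking building block behind such models, here is a generic leaky integrate-and-fire (LIF) layer simulated over discrete time steps in PyTorch. This is textbook LIF dynamics, not the SpikeConv architecture; the decay and threshold values are arbitrary.

```python
import torch

def lif_forward(currents, beta=0.9, threshold=1.0):
    """Simulate a LIF layer over time.

    currents: (T, B, N) input currents per time step.
    Returns spikes (T, B, N) of 0/1 values."""
    T, B, N = currents.shape
    mem = torch.zeros(B, N)
    spikes = []
    for t in range(T):
        mem = beta * mem + currents[t]          # leaky membrane integration
        spk = (mem >= threshold).float()        # fire where the threshold is crossed
        mem = mem - spk * threshold             # soft reset after a spike
        spikes.append(spk)
    return torch.stack(spikes)

# Usage: per-frame convolutional features can serve as input currents,
# e.g. currents = torch.stack([conv(x_t) for x_t in video]).flatten(2)
out = lif_forward(torch.rand(10, 4, 32))
```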
{"title":"Convolutional Spiking Neural Networks targeting learning and inference in highly imbalanced datasets","authors":"Bernardete Ribeiro, Francisco Antunes, Dylan Perdigão, Catarina Silva","doi":"10.1016/j.patrec.2024.08.002","DOIUrl":"https://doi.org/10.1016/j.patrec.2024.08.002","url":null,"abstract":"Spiking Neural Networks (SNNs) are regarded as the next frontier in AI, as they can be implemented on neuromorphic hardware, paving the way for advancements in real-world applications in the field. SNNs provide a biologically inspired solution that is event-driven, energy-efficient and sparse. While showing promising results, there are challenges that need to be addressed. For example, the design-build-evaluate process for integrating the architecture, learning, hyperparameter optimization and inference need to be tailored to a specific problem. This is particularly important in critical high-stakes industries such as finance services. In this paper, we present SpikeConv, a novel deep Convolutional Spiking Neural Network (CSNN), and investigate this process in the context of a highly imbalanced online bank account opening fraud problem. Our approach is compared with Deep Spiking Neural Networks (DSNNs) and Gradient Boosting Decision Trees (GBDT) showing competitive results.","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"86 1","pages":""},"PeriodicalIF":5.1,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introduction to the special issue on “Computer vision solutions for part-based image analysis and classification (CV_PARTIAL)”
Pub Date: 2024-08-05 | DOI: 10.1016/j.patrec.2024.07.023
Fabio Narducci, Piercarlo Dondi, David Freire Obregón, Florin Pop
{"title":"Introduction to the special issue on “Computer vision solutions for part-based image analysis and classification (CV_PARTIAL)”","authors":"Fabio Narducci, Piercalo Dondi, David Freire Obregón, Florin Pop","doi":"10.1016/j.patrec.2024.07.023","DOIUrl":"10.1016/j.patrec.2024.07.023","url":null,"abstract":"","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 150-151"},"PeriodicalIF":3.9,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141963575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feature-consistent coplane-pair correspondence- and fusion-based point cloud registration
Pub Date: 2024-08-05 | DOI: 10.1016/j.patrec.2024.08.001
Kuo-Liang Chung, Chia-Chi Hsu, Pei-Hsuan Hsieh
Registering two point clouds is an important and challenging task, and the estimated registration solution can be applied in 3D vision. In this paper, an outlier removal method is first proposed to delete redundant coplane-pair correspondences and construct three feature-consistent coplane-pair correspondence subsets. Next, Rodrigues' formula and a scoring-based method are adopted to solve for the representative registration solution of each correspondence subset. Then, a robust fusion method is proposed to fuse the three representative solutions into the final registration solution. Comprehensive experimental results on typical testing datasets demonstrate that, while maintaining good registration accuracy, our registration algorithm achieves a significant reduction in execution time compared with state-of-the-art methods.
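Rodrigues' formula, which the abstract uses to recover a rotation from an axis-angle estimate, is standard and easy to state in code. The sketch below is a generic implementation, unrelated to the paper's correspondence machinery.

```python
import numpy as np

def rodrigues(axis, theta):
    """Rotation matrix for a rotation of theta radians about a unit axis.

    R = I + sin(theta) K + (1 - cos(theta)) K^2, with K the cross-product matrix of the axis."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

# Sanity check: rotating the x-axis by 90 degrees about z gives the y-axis.
R = rodrigues([0, 0, 1], np.pi / 2)
assert np.allclose(R @ np.array([1.0, 0.0, 0.0]), [0.0, 1.0, 0.0])
```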
{"title":"Feature-consistent coplane-pair correspondence- and fusion-based point cloud registration","authors":"Kuo-Liang Chung, Chia-Chi Hsu, Pei-Hsuan Hsieh","doi":"10.1016/j.patrec.2024.08.001","DOIUrl":"10.1016/j.patrec.2024.08.001","url":null,"abstract":"<div><p>It is an important and challenging task to register two point clouds, and the estimated registration solution can be applied in 3D vision. In this paper, an outlier removal method is first proposed to delete redundant coplane-pair correspondences for constructing three feature-consistent coplane-pair correspondence subsets. Next, Rodrigues’ formula and a scoring-based method are adopted to solve the representative registration solution of each correspondence subset. Then, a robust fusion method is proposed to fuse the three representative solutions as the final registration solution. Based on typical testing datasets, comprehensive experimental results demonstrated that with good registration accuracy, our registration algorithm achieves significant execution time reduction effect when compared with the state-of-the-art methods.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 143-149"},"PeriodicalIF":3.9,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141963574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}