Pattern Recognition Letters: Latest Articles

Single image dehazing based on multi-label graph cuts
IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-07-26; DOI: 10.1016/j.patrec.2024.07.015
Minshen Qin, Junzheng Jiang, Fang Zhou

Haze blurs image information and reduces the visibility of objects, which seriously degrades the performance of computer vision applications in hazy environments. We propose an improved dehazing model based on multi-label graph cuts. A hazy image is modeled as an undirected graph. The multi-label graph cuts algorithm divides the image into subregions according to functions of brightness and saturation. A subregion is selected, based on saturation, to estimate the atmospheric light. Assuming that transmission is similar within a subregion, transmission is estimated from the distance between each pixel and the atmospheric light in RGB space. Finally, the transmission map is regularized to recover a haze-free image. Experiments in different scenarios demonstrate that the proposed method is more effective than state-of-the-art methods.
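
The last step of the abstract, recovering a haze-free image from an estimated atmospheric light and transmission map, can be illustrated with the standard atmospheric scattering model I = J*t + A*(1 - t). The sketch below is not the authors' implementation: the function name `recover_scene`, the lower bound `t_min`, and the synthetic test data are assumptions made for illustration, and the graph-cut segmentation and regularization steps are not reproduced.

```python
import numpy as np

def recover_scene(hazy, atmospheric_light, transmission, t_min=0.1):
    """Invert I = J*t + A*(1 - t) per pixel to obtain the scene radiance J.

    hazy: (H, W, 3) image in [0, 1]; atmospheric_light: (3,) RGB vector;
    transmission: (H, W) map. t_min is an assumed lower bound that keeps the
    division stable where the estimated transmission is close to zero.
    """
    t = np.clip(transmission, t_min, 1.0)[..., np.newaxis]
    scene = (hazy - atmospheric_light) / t + atmospheric_light
    return np.clip(scene, 0.0, 1.0)

# Synthetic round trip: add haze to a random "clear" image, then remove it.
rng = np.random.default_rng(0)
clear = rng.uniform(0.0, 1.0, size=(4, 4, 3))
A = np.array([0.9, 0.9, 0.9])                 # assumed atmospheric light
t = rng.uniform(0.3, 1.0, size=(4, 4))        # assumed transmission map
hazy = clear * t[..., np.newaxis] + A * (1.0 - t[..., np.newaxis])
restored = recover_scene(hazy, A, t)
```

When the true atmospheric light and transmission are supplied, the inversion returns the clear image up to floating-point error; in practice both quantities are estimates, which is why the quality of the transmission map matters.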

Pattern Recognition Letters, Volume 185, Pages 110-116
Citations: 0
DualGroup for 3D instance and panoptic segmentation
IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-07-25; DOI: 10.1016/j.patrec.2024.07.014
Lin Zhao, Sijia Chen, Xu Tang, Wenbing Tao

Existing 3D instance segmentation methods usually learn the offsets (also known as center-shifted vectors) from points to their instance centers for clustering and generating segmentation results. However, because instances vary in scale, directly regressing the offsets makes the model pay more attention to larger instances and ignore smaller ones. In addition, clustering may fail because a single bandwidth for point grouping is insufficient for instances of different scales. To address these two problems, we propose a new framework (DualGroup) for 3D instance segmentation. For the first issue, instead of directly learning the offsets, we propose encoded center-shifted vector learning (ECSVL), which effectively compresses the range of the regressed center-shifted vectors and makes smaller instances easier to learn. Second, to handle instances of different scales in clustering, we propose dual hierarchical grouping (DHG) to better group all points into different instances. Together, these two components lead to successful indoor instance segmentation. Moreover, DualGroup is extended to 3D panoptic segmentation by fusing the semantic predictions and instance results. Experimental results on the ScanNet v2 and S3DIS datasets demonstrate the effectiveness and superiority of DualGroup.
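
To make the scale problem concrete, the sketch below computes point-to-center offsets for a small and a large instance and then applies a range-compressing encoding. The abstract does not specify how ECSVL encodes the vectors, so the sign-preserving logarithm used here is purely a hypothetical stand-in, and the names `center_offsets`, `encode`, and `decode` are illustrative.

```python
import numpy as np

def center_offsets(points, instance_ids):
    """Offset from every point to the centroid of its own instance."""
    offsets = np.zeros(points.shape, dtype=float)
    for inst in np.unique(instance_ids):
        members = instance_ids == inst
        offsets[members] = points[members].mean(axis=0) - points[members]
    return offsets

def encode(offsets):
    # Hypothetical encoding (sign-preserving log); NOT the paper's ECSVL.
    return np.sign(offsets) * np.log1p(np.abs(offsets))

def decode(encoded):
    # Exact inverse of encode().
    return np.sign(encoded) * np.expm1(np.abs(encoded))

# Instance 0 is 0.2 units wide, instance 1 is 10 units wide.
points = np.array([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0],
                   [5.0, 0.0, 0.0], [15.0, 0.0, 0.0]])
instance_ids = np.array([0, 0, 1, 1])

offsets = center_offsets(points, instance_ids)
encoded = encode(offsets)

# Ratio between the large-instance and small-instance regression targets.
raw_ratio = abs(offsets[2, 0]) / abs(offsets[0, 0])
enc_ratio = abs(encoded[2, 0]) / abs(encoded[0, 0])
```

With raw offsets the large instance's target is 50 times the small one's, so a plain regression loss is dominated by large instances; after the assumed encoding the gap shrinks while the mapping stays invertible.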

Pattern Recognition Letters, Volume 185, Pages 124-129
Citations: 0
Self-supervised Domain Adaptation with Significance-Oriented Masking for Pelvic Organ Prolapse detection
IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-07-18; DOI: 10.1016/j.patrec.2024.07.012
Shichang Li, Hongjie Wu, Chenwei Tang, Dongdong Chen, Yueyue Chen, Ling Mei, Fan Yang, Jiancheng Lv

Pelvic Organ Prolapse (POP) is a common disease in middle-aged and elderly women. Detecting POP is a challenging task, and using deep learning for detection has practical significance. However, medical image detection tasks face many problems, namely small sample size, data imbalance, and subtle pathological characteristics. In this paper, we propose a new training framework, called self-supervised Domain Adaptation with Significance-Oriented Masking (DASOM), to address these problems and improve the performance of intelligent POP detection. DASOM includes a new pre-training process based on the masked image modeling task and redesigns the masking strategy, giving the model the local induction capability required for detection. We also adopt a data processing method suited to the pelvic floor ultrasound dataset to effectively address data shortage and imbalance. Extensive experimental results and analysis confirm that the proposed method significantly improves the performance and reliability of POP detection.

Pattern Recognition Letters, Volume 185, Pages 94-100
Citations: 0
A micro Reinforcement Learning architecture for Intrusion Detection Systems
IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-07-15; DOI: 10.1016/j.patrec.2024.07.010
Boshra Darabi, Mozafar Bag-Mohammadi, Mojtaba Karami

This paper proposes an Intrusion Detection System (IDS) that utilizes Deep Reinforcement Learning (DRL) in a fine-grained manner to enhance the performance of binary and multiclass intrusion classification tasks. The proposed system, named Micro Reinforcement Learning Classifier (MRLC), is evaluated using three standard datasets. The MRLC architecture utilizes a fine-grained learning approach to enhance IDS accuracy. Simulation studies demonstrate that MRLC is highly efficient at discriminating between intrusion classes, outperforming state-of-the-art RL-based methods. The average accuracy of MRLC is 99.56%, 99.99%, and 99.01% on the NSL-KDD, CIC-IDS2018, and UNSW-NB15 datasets, respectively. The implementation code is available at https://github.com/boshradarabi/MICRO-RL-IDS.

Pattern Recognition Letters, Volume 185, Pages 81-86
Citations: 0
Weighted Intersection over Union (wIoU) for evaluating image segmentation
IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-07-15; DOI: 10.1016/j.patrec.2024.07.011
Yeong-Jun Cho

In recent years, many semantic segmentation methods have been proposed to predict the labels of pixels in a scene. In general, methods are compared by measuring either area prediction errors or boundary prediction errors. However, there is no intuitive evaluation metric that evaluates both aspects. In this work, we propose a new evaluation measure for semantic segmentation called weighted Intersection over Union (wIoU). It first builds a weight map from a boundary distance map, allowing each pixel to be weighted in the evaluation according to a boundary importance factor. By setting the boundary importance factor, the proposed wIoU can evaluate both contours and regions. We validated the effectiveness of wIoU on a dataset of 33 scenes and demonstrated its flexibility. We expect the proposed metric to enable more flexible and intuitive evaluation in the semantic segmentation field.
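
The idea of weighting each pixel by its distance to the object boundary can be sketched as follows. The abstract does not give the weight function, so the exponential decay `exp(-alpha * distance)` and the parameter name `alpha` (standing in for the boundary importance factor) are assumptions, as is the brute-force distance computation, which is only suitable for small masks.

```python
import numpy as np

def boundary_distance(mask):
    """Euclidean distance from each pixel to the nearest boundary pixel.

    A pixel counts as boundary if any 4-neighbour has a different label.
    Brute force over all boundary pixels, intended for small masks only.
    """
    m = mask.astype(bool)
    padded = np.pad(m, 1, mode="edge")
    center = padded[1:-1, 1:-1]
    differs = (
        (center != padded[:-2, 1:-1]) | (center != padded[2:, 1:-1])
        | (center != padded[1:-1, :-2]) | (center != padded[1:-1, 2:])
    )
    by, bx = np.nonzero(differs)
    if by.size == 0:
        return np.zeros(m.shape)  # no boundary: every pixel gets equal weight
    yy, xx = np.indices(m.shape)
    d2 = (yy[..., None] - by) ** 2 + (xx[..., None] - bx) ** 2
    return np.sqrt(d2.min(axis=-1))

def weighted_iou(pred, target, alpha=0.5):
    """IoU in which each pixel contributes exp(-alpha * boundary distance).

    The weight function is an assumed form, not the paper's definition.
    alpha = 0 gives every pixel weight 1, i.e. the ordinary IoU.
    """
    pred, target = pred.astype(bool), target.astype(bool)
    weights = np.exp(-alpha * boundary_distance(target))
    inter = weights[pred & target].sum()
    union = weights[pred | target].sum()
    return inter / union if union > 0 else 1.0

target = np.zeros((8, 8), dtype=bool)
target[2:6, 2:6] = True          # 4x4 ground-truth square
pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 3:7] = True            # same square shifted right by one pixel

plain = weighted_iou(pred, target, alpha=0.0)
boundary_focused = weighted_iou(pred, target, alpha=1.0)
```

Setting `alpha` to zero recovers the region-only IoU, while larger values concentrate the score on pixels near the contour, which mirrors the abstract's claim that one factor trades off contour and region evaluation.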

Pattern Recognition Letters, Volume 185, Pages 101-107
Citations: 0
Multi-scale occlusion suppression network for occluded person re-identification
IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-07-15; DOI: 10.1016/j.patrec.2024.07.009
Yunzuo Zhang, Yuehui Yang, Weili Kang, Jiawen Zhen

In practical application scenarios, occlusion caused by various obstacles greatly undermines the accuracy of person re-identification. Most existing methods for occluded person re-identification focus on inferring visible parts of the body through auxiliary models, which results in inaccurate matching of part features and ignores the shortage of occluded samples; both seriously affect the accuracy of occluded person re-identification. To address these issues, we propose a multi-scale occlusion suppression network (MSOSNet) for occluded person re-identification. Specifically, we first propose a dual occlusion augmentation module (DOAM), which combines random occlusion with our proposed novel cross occlusion to generate more diverse occlusion data. Meanwhile, we design a novel occluded-aware spatial attention module (OSAM) to enable the network to focus on non-occluded areas of pedestrian images and effectively extract discriminative features. Finally, we propose a part feature matching module (PFMM) that utilizes graph matching algorithms to match non-occluded body parts of pedestrians. Extensive experimental results on both occluded and holistic datasets validate the effectiveness of our method.
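
Of the two augmentations combined in DOAM, only random occlusion is a generic technique; the cross occlusion is specific to the paper and is not described in the abstract. The sketch below shows one common way to implement random occlusion (overwriting a random rectangle with noise). The function name, the area and aspect-ratio ranges, and the noise fill are all assumptions rather than the authors' settings.

```python
import numpy as np

def random_occlusion(image, rng, area_frac=(0.1, 0.3)):
    """Overwrite a random rectangle of `image` with uniform noise.

    image: (H, W, C) float array in [0, 1]. Returns the occluded copy and a
    boolean (H, W) mask marking the occluded pixels. The area fraction and
    aspect-ratio ranges are assumed values chosen for illustration.
    """
    h, w = image.shape[:2]
    frac = rng.uniform(*area_frac)
    aspect = rng.uniform(0.5, 2.0)
    occ_h = int(np.clip(int(round(float(np.sqrt(frac * h * w * aspect)))), 1, h))
    occ_w = int(np.clip(int(round(float(np.sqrt(frac * h * w / aspect)))), 1, w))
    top = int(rng.integers(0, h - occ_h + 1))
    left = int(rng.integers(0, w - occ_w + 1))

    occluded = image.copy()
    patch_shape = (occ_h, occ_w) + image.shape[2:]
    occluded[top:top + occ_h, left:left + occ_w] = rng.uniform(0.0, 1.0, size=patch_shape)

    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + occ_h, left:left + occ_w] = True
    return occluded, mask

rng = np.random.default_rng(0)
img = rng.uniform(0.0, 1.0, size=(64, 32, 3))   # stand-in pedestrian crop
occluded, mask = random_occlusion(img, rng)
```

Returning the mask alongside the image is a design choice made here so that a downstream attention module could, in principle, be supervised on which regions are occluded; the abstract does not say whether the authors do this.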

Pattern Recognition Letters, Volume 185, Pages 66-72
Citations: 0
HC-MVSNet: A probability sampling-based multi-view-stereo network with hybrid cascade structure for 3D reconstruction
IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-07-14; DOI: 10.1016/j.patrec.2024.07.008
Tianxiang Gao, Zijian Hong, Yixing Tan, Lizhuo Sun, Yichen Wei, Jianwei Ma

Multi-view stereo (MVS) is one way to obtain 3D structure from 2D images. Deep learning is an effective end-to-end method for MVS. In previous deep learning-based MVS methods, the depth interval is tightly coupled with the feature map resolution, so more accurate depth intervals come with higher computational cost. This paper proposes a new deep neural network, HC-MVSNet, which uses a hybrid cascade structure for MVS depth estimation. Unlike previous MVS methods, the new coarse-to-fine depth estimation method decouples the two processes of resolution increase and depth interval reduction through a simple operation, achieving higher reconstruction accuracy and completeness at minimal additional computational cost. In addition, an efficient depth sampling strategy based on probability distribution is introduced, which allocates a higher hypothesis density to regions with a high probability of containing the ground truth. This novel sampling method makes full use of redundant information that was previously neglected and significantly improves the textural detail of the results. Extensive experiments are conducted on the DTU dataset, the Tanks and Temples benchmark, and the BlendedMVS dataset. The results show that the proposed method exhibits superior performance and better generalization behavior than existing MVS methods.
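
One way to allocate more depth hypotheses where the ground truth is more likely is inverse-CDF sampling over the coarse stage's per-bin probabilities. The sketch below shows that generic technique for a single pixel; it is an assumption about how such a strategy could work, not the sampling rule used in HC-MVSNet, and the function name and example numbers are illustrative.

```python
import numpy as np

def resample_depths(depth_edges, probs, num_samples):
    """Place depth hypotheses with density proportional to bin probability.

    depth_edges: (K + 1,) increasing bin edges from the coarse stage.
    probs: (K,) strictly positive probabilities of the K bins.
    Treats the distribution as piecewise constant and inverts its CDF at
    evenly spaced quantiles, so likely bins receive more hypotheses.
    """
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()
    cdf = np.concatenate([[0.0], np.cumsum(probs)])
    cdf[-1] = 1.0  # guard against floating-point drift in the cumulative sum
    quantiles = (np.arange(num_samples) + 0.5) / num_samples
    return np.interp(quantiles, cdf, depth_edges)

depth_edges = np.linspace(1.0, 5.0, 5)       # four bins: [1,2], [2,3], [3,4], [4,5]
probs = np.array([0.1, 0.6, 0.2, 0.1])       # coarse stage favours the second bin
samples = resample_depths(depth_edges, probs, num_samples=8)
in_likely_bin = int(((samples >= 2.0) & (samples < 3.0)).sum())
```

In this example five of the eight hypotheses fall inside the bin that holds 60% of the probability mass, whereas uniform sampling would place only two there.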

Pattern Recognition Letters, Volume 185, Pages 59-65
Citations: 0
Self-supervised multi-echo point cloud denoising in snowfall
IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-07-10; DOI: 10.1016/j.patrec.2024.07.007
Alvari Seppänen, Risto Ojala, Kari Tammi

Snowfall can introduce noise into light detection and ranging (LiDAR) data. This is a problem because LiDAR is used in many outdoor applications, e.g., autonomous driving. We propose the task of multi-echo denoising, where the goal is to pick the echo that represents the objects of interest and discard the other echoes. The idea is thus to pick points from alternative echoes that are unavailable in standard strongest-echo point clouds. Intuitively, we are trying to see through the snowfall. To achieve this goal, we propose a novel self-supervised deep learning method and characteristics similarity regularization, which exploits noise characteristics to increase performance. Experiments with a real-world multi-echo snowfall dataset demonstrate the efficacy of multi-echo denoising and superior performance over the baseline. Moreover, in extensive experiments on a semi-synthetic dataset, our method outperforms the state of the art in self-supervised snowfall denoising. Our work enables more reliable point cloud acquisition in snowfall. The code is available at https://github.com/alvariseppanen/SMEDen.

Pattern Recognition Letters, Volume 185, Pages 52-58. Open access PDF: https://www.sciencedirect.com/science/article/pii/S0167865524002101/pdfft?md5=f6938a951e071593ec29e322120d28fd&pid=1-s2.0-S0167865524002101-main.pdf
Citations: 0
Gated Siamese Fusion Network based on multimodal deep and hand-crafted features for personality traits assessment
IF 3.9; Zone 3, Computer Science; Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-07-08; DOI: 10.1016/j.patrec.2024.07.004
Elena Ryumina, Maxim Markitantov, Dmitry Ryumin, Alexey Karpov

People tend to judge others by assessing their personality traits based on life experience. This is especially evident in informed hiring decisions, which should consider not only skills but also the match with a company’s values and culture. Based on this assumption, we use the Siamese Network (SN) to assess five personality traits by analyzing and comparing people in pairs. To this end, we propose the OCEAN-AI framework based on the Gated Siamese Fusion Network (GSFN), which comprises six modules and enables the fusion of hand-crafted and deep features across three modalities (video, audio, and text). We use the ChaLearn First Impressions v2 (FIv2) and Multimodal Personality Traits Assessment (MuPTA) corpora and find that all six feature sets and their combinations, owing to their different information content, allow the framework to adjust flexibly to heterogeneous input data. The experimental results show that pairwise comparison of people with the same or different Personality Traits (PT) during training enhances the performance of the proposed framework. The framework outperforms the State-of-the-Art (SOTA) systems based on three modalities (video-face, audio and text) by a relative 1.3% (0.928 vs. 0.916) in terms of mean accuracy (mACC) on the FIv2 corpus. We also outperform the SOTA system in terms of the Concordance Correlation Coefficient (CCC) by a relative 8.6% (0.667 vs. 0.614) using two modalities (video and audio) on the MuPTA corpus. We make our framework publicly available to integrate it into various applications such as recruitment, education, and healthcare.
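
The relative gains quoted in the abstract can be checked directly from the reported scores, and the Concordance Correlation Coefficient has a standard closed form (Lin's definition). The sketch below reproduces the arithmetic; the helper names are illustrative, and the `ccc` function is a generic implementation, not code from the OCEAN-AI framework.

```python
import numpy as np

def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, as a fraction."""
    return (ours - baseline) / baseline

def ccc(y_true, y_pred):
    """Lin's Concordance Correlation Coefficient.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (biased) variance and covariance.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2.0 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2)

macc_gain = 100.0 * relative_gain(0.928, 0.916)   # FIv2, mean accuracy
ccc_gain = 100.0 * relative_gain(0.667, 0.614)    # MuPTA, CCC

scores = np.array([0.2, 0.5, 0.7, 0.9])
perfect = ccc(scores, scores)          # identical predictions
shifted = ccc(scores, scores + 0.1)    # same shape, constant bias
```

The two gains round to the 1.3% and 8.6% stated in the abstract. Unlike Pearson correlation, CCC penalizes a constant bias, so `shifted` is below 1 even though the predictions are perfectly correlated with the targets.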

人们往往会根据生活经验来评估他人的个性特征。在做出明智的招聘决定时,这一事实尤为明显,因为招聘决定不仅要考虑技能,还要与公司的价值观和文化相匹配。基于这一假设,我们使用连体网络(Siamese Network,SN)来评估五种人格特质,同时对人们进行配对分析和比较。为此,我们提出了基于门控连体融合网络(GSFN)的 OCEAN-AI 框架,该框架由六个模块组成,能够融合三种模式(视频、音频和文本)的手工特征和深度特征。我们使用 ChaLearn First Impressions v2(FIv2)和多模态人格特质评估(MuPTA)语料库,发现所有六个特征集及其因信息内容不同而产生的组合使该框架能够灵活地适应异构输入数据。实验结果表明,在训练过程中对具有相同或不同人格特质(PT)的人进行配对比较可以提高所提出的框架的性能。在 FIv2 语料库中,该框架的平均准确率(mACC)比基于三种模式(视频-人脸、音频和文本)的先进系统(SOTA)高出 1.3%(0.928 vs. 0.916)。我们还在 MuPTA 语料库上使用两种模式(视频和音频),在一致性相关系数 (CCC) 方面比 SOTA 系统高出 8.6%(0.667 对 0.614)。我们公开了我们的框架,以便将其集成到招聘、教育和医疗保健等各种应用中。
Pattern Recognition Letters, vol. 185, pp. 45-51. Pub Date: 2024-07-08. DOI: 10.1016/j.patrec.2024.07.004
Citations: 0
Fall detection algorithm based on global and local feature extraction
IF 3.9; Zone 3, Computer Science; Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-07-08; DOI: 10.1016/j.patrec.2024.07.003
Bin Li, Jiangjiao Li, Peng Wang

Falls have become one of the main causes of injury and death among the elderly. A high-accuracy detection method can identify falls reliably, thereby reducing the risk of injury and death. This paper proposes a fall detection algorithm based on global and local feature extraction. Specifically, we design a dual-stream network. One branch combines a convolutional neural network with a regional attention module to extract local features from images; the other uses an improved Transformer to extract global features. A feature fusion module then fuses the local and global features for classification, enabling fall detection. Experimental results show that the proposed approach achieves accuracies of 99.55% and 99.75% on the UP-Fall Detection Dataset and the Le2i Fall Detection Dataset, respectively.
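The fuse-then-classify step of a dual-stream design can be illustrated with a toy sketch. The abstract does not specify the fusion module, so everything here is an assumption: `fuse_features` uses plain concatenation, `classify_fall` is a logistic classifier, and the feature values and weights are made-up stand-ins for the outputs of the CNN and Transformer branches.

```python
import math


def fuse_features(local_features, global_features):
    """Concatenation fusion: an assumed, simple stand-in for the fusion module."""
    return list(local_features) + list(global_features)


def classify_fall(fused_features, weights, bias):
    """Logistic classifier on the fused vector; returns an estimated P(fall)."""
    score = sum(w * x for w, x in zip(weights, fused_features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical branch outputs (not real network activations).
local_features = [0.2, 0.7]   # stand-in for the CNN + regional attention branch
global_features = [0.9]       # stand-in for the Transformer branch
weights = [1.0, -0.5, 2.0]    # hypothetical classifier weights

fused = fuse_features(local_features, global_features)
fall_probability = classify_fall(fused, weights, bias=-1.0)
print(fused, fall_probability > 0.5)
```

Concatenation keeps both feature sets intact and leaves the weighting to the classifier; a learned fusion module, as the paper describes, could instead reweight or mix the two streams before classification.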

Pattern Recognition Letters, vol. 185, pp. 31-37. Pub Date: 2024-07-08. DOI: 10.1016/j.patrec.2024.07.003
Citations: 0