Pub Date: 2024-07-26 | DOI: 10.1016/j.patrec.2024.07.015
Minshen Qin, Junzheng Jiang, Fang Zhou
Haze blurs image information and reduces the visibility of objects, which seriously degrades the performance of computer vision applications in hazy environments. We propose an improved dehazing model based on multi-label graph cuts. A hazy image is modeled as an undirected graph, and the multi-label graph cuts algorithm divides the image into subregions according to functions of brightness and saturation. A subregion is then selected, based on saturation, to estimate the atmospheric light. Assuming transmission is similar within a subregion, the transmission is estimated from the distance between each pixel and the atmospheric light in RGB space. Finally, the transmission map is regularized to recover a haze-free image. Experiments in different scenarios demonstrate that the proposed method outperforms state-of-the-art methods.
Title: Single image dehazing based on multi-label graph cuts. Pattern Recognition Letters, vol. 185, pp. 110–116.
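The recovery step in the abstract above follows the standard haze imaging model I = J·t + A·(1 − t); below is a minimal sketch of inverting it, assuming an already-estimated atmospheric light `A` and transmission map `t` (the paper's graph-cut estimation of these quantities is not reproduced here):

```python
import numpy as np

def recover_haze_free(I, A, t, t_min=0.1):
    """Invert the standard haze imaging model I = J*t + A*(1 - t).

    I : (H, W, 3) hazy image in [0, 1]
    A : (3,) estimated atmospheric light
    t : (H, W) estimated transmission map
    t_min clamps the transmission to avoid amplifying noise in dense haze.
    """
    t = np.clip(t, t_min, 1.0)[..., None]  # (H, W, 1) for broadcasting
    J = (I - A) / t + A                    # per-pixel inversion of the model
    return np.clip(J, 0.0, 1.0)
```

With `t = 1` (no haze) the input is returned unchanged; smaller transmission values stretch the pixel away from the atmospheric light.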
Pub Date: 2024-07-25 | DOI: 10.1016/j.patrec.2024.07.014
Lin Zhao, Sijia Chen, Xu Tang, Wenbing Tao
Existing 3D instance segmentation methods usually learn offsets (also known as center-shifted vectors) from points to their instance centers for clustering and generating segmentation results. However, because instances vary in scale, directly regressing the offsets makes the model pay more attention to larger instances and neglect smaller ones. Clustering may also fail because a single bandwidth for point grouping is insufficient for instances of different scales. To address these two problems, we propose a new framework (DualGroup) for 3D instance segmentation. For the first issue, instead of learning the offsets directly, we propose encoded center-shifted vector learning (ECSVL), which effectively compresses the range of the regressed center-shifted vectors and thereby eases the learning of smaller instances. Second, to handle instances of different scales during clustering, we propose dual hierarchical grouping (DHG) to better group all points into their instances. The cooperation of these two components leads to strong indoor instance segmentation. Moreover, DualGroup is extended to 3D panoptic segmentation by fusing the semantic predictions and instance results. Experimental results on the ScanNet v2 and S3DIS datasets demonstrate the effectiveness and superiority of DualGroup.
Title: DualGroup for 3D instance and panoptic segmentation. Pattern Recognition Letters, vol. 185, pp. 124–129.
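The abstract does not give the exact ECSVL encoding; one plausible way to "compress the range" of regression targets is a signed logarithmic transform, sketched here purely for illustration (the function names and formula are our assumptions, not the paper's):

```python
import numpy as np

def encode_offset(o):
    # Log-compress each offset component so that large-instance offsets do
    # not dominate the regression loss (hypothetical encoding).
    return np.sign(o) * np.log1p(np.abs(o))

def decode_offset(e):
    # Exact inverse of encode_offset, recovering the raw center-shift.
    return np.sign(e) * np.expm1(np.abs(e))
```

The transform is invertible and monotone, so clustering on decoded offsets is unaffected while the regression targets live in a much narrower range.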
Pub Date: 2024-07-18 | DOI: 10.1016/j.patrec.2024.07.012
Shichang Li, Hongjie Wu, Chenwei Tang, Dongdong Chen, Yueyue Chen, Ling Mei, Fan Yang, Jiancheng Lv
Pelvic Organ Prolapse (POP) is a common disease in middle-aged and elderly women. Detecting POP is challenging, and deep-learning-based detection has clear practical value. However, medical image detection tasks face many problems, e.g., small sample sizes, data imbalance, and subtle pathological characteristics. In this paper, we propose a new training framework, called self-supervised Domain Adaptation with Significance-Oriented Masking (DASOM), to address these problems and improve POP detection. DASOM includes a new pre-training process based on the masked image modeling task and redesigns the masking strategy, giving the model the local induction capability required for detection. We also adopt a data processing method tailored to the pelvic-floor ultrasound dataset to mitigate the data shortage and imbalance. Extensive experimental results and analysis confirm that the proposed method significantly improves the performance and reliability of POP detection.
Title: Self-supervised Domain Adaptation with Significance-Oriented Masking for Pelvic Organ Prolapse detection. Pattern Recognition Letters, vol. 185, pp. 94–100.
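As a reference point for the masking strategy that DASOM redesigns, here is the plain uniform random patch masking used in masked image modeling; a significance-oriented variant would bias this sampling toward diagnostically important patches (this is a generic sketch, not the paper's implementation):

```python
import numpy as np

def random_patch_mask(h_patches, w_patches, mask_ratio=0.6, rng=None):
    """Boolean mask over a patch grid: True = patch hidden from the encoder.

    Uniform sampling, as in standard masked image modeling. A
    significance-oriented strategy would weight the choice of patches
    instead of drawing them uniformly.
    """
    rng = np.random.default_rng(rng)
    n = h_patches * w_patches
    n_mask = int(round(n * mask_ratio))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=n_mask, replace=False)] = True
    return mask.reshape(h_patches, w_patches)
```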
This paper proposes an Intrusion Detection System (IDS) that applies Deep Reinforcement Learning (DRL) in a fine-grained manner to improve binary and multiclass intrusion classification. The proposed system, named Micro Reinforcement Learning Classifier (MRLC), is evaluated on three standard datasets. Simulation studies demonstrate that MRLC discriminates between different intrusion classes with high efficiency, outperforming state-of-the-art RL-based methods. The average accuracy of MRLC is 99.56%, 99.99%, and 99.01% on the NSL-KDD, CIC-IDS2018, and UNSW-NB15 datasets, respectively. The implementation code is available at https://github.com/boshradarabi/MICRO-RL-IDS.
Title: A micro Reinforcement Learning architecture for Intrusion Detection Systems. Boshra Darabi, Mozafar Bag-Mohammadi, Mojtaba Karami. Pattern Recognition Letters, vol. 185, pp. 81–86. Pub Date: 2024-07-15 | DOI: 10.1016/j.patrec.2024.07.010
Pub Date: 2024-07-15 | DOI: 10.1016/j.patrec.2024.07.011
Yeong-Jun Cho
In recent years, many semantic segmentation methods have been proposed to predict the label of each pixel in a scene. Methods are typically compared by measuring either area prediction errors or boundary prediction errors, but there is no intuitive evaluation metric that covers both aspects. In this work, we propose a new evaluation measure for semantic segmentation called weighted Intersection over Union (wIoU). It builds a weight map generated from a boundary distance map, allowing a weighted evaluation of each pixel based on a boundary importance factor. By setting this factor, wIoU can evaluate both contour and region quality. We validated the effectiveness and flexibility of wIoU on a dataset of 33 scenes. We expect the proposed metric to enable more flexible and intuitive evaluation in the semantic segmentation field.
Title: Weighted Intersection over Union (wIoU) for evaluating image segmentation. Pattern Recognition Letters, vol. 185, pp. 101–107.
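A minimal sketch of the idea behind wIoU, under our own assumptions about the weight map (an exponential decay of the boundary-distance map controlled by a boundary importance factor `alpha`; the paper's exact formulation may differ):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def wiou(pred, gt, alpha=0.5):
    """Boundary-weighted IoU between two binary masks.

    Each pixel's weight decays with its distance to the ground-truth
    boundary; larger alpha concentrates the evaluation on the contour,
    alpha -> 0 recovers ordinary IoU. Assumed form, for illustration.
    """
    # Distance of every pixel to the region boundary (inside + outside).
    d = distance_transform_edt(gt) + distance_transform_edt(~gt)
    w = np.exp(-alpha * d)                      # boundary-importance weights
    inter = (w * (pred & gt)).sum()
    union = (w * (pred | gt)).sum()
    return inter / union if union > 0 else 1.0
```

A prediction identical to the ground truth scores 1 regardless of `alpha`; boundary-only errors are penalized more heavily as `alpha` grows.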
In practical scenarios, occlusion caused by various obstacles greatly undermines the accuracy of person re-identification. Most existing methods for occluded person re-identification focus on inferring the visible body parts through auxiliary models, which leads to inaccurate part-level feature matching, and they ignore the shortage of occluded training samples; both issues seriously affect accuracy. To address them, we propose a multi-scale occlusion suppression network (MSOSNet) for occluded person re-identification. Specifically, we first propose a dual occlusion augmentation module (DOAM), which combines random occlusion with our novel cross occlusion to generate more diverse occlusion data. We then design an occluded-aware spatial attention module (OSAM) that lets the network focus on non-occluded areas of pedestrian images and extract discriminative features. Finally, we propose a part feature matching module (PFMM) that uses graph matching to match the non-occluded body parts of pedestrians. Extensive experimental results on both occluded and holistic datasets validate the effectiveness of our method.
Title: Multi-scale occlusion suppression network for occluded person re-identification. Yunzuo Zhang, Yuehui Yang, Weili Kang, Jiawen Zhen. Pattern Recognition Letters, vol. 185, pp. 66–72. Pub Date: 2024-07-15 | DOI: 10.1016/j.patrec.2024.07.009
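Of the two augmentations DOAM combines, random occlusion is a standard technique; a minimal sketch follows (the rectangle placement and zero fill are our assumptions, and the paper's cross occlusion, which mixes patches across images, is not shown):

```python
import numpy as np

def random_occlude(img, frac=0.3, rng=None):
    """Zero out a random rectangle covering roughly `frac` of the image area.

    A simple stand-in for the random-occlusion half of a dual occlusion
    augmentation; the fill value and aspect ratio are illustrative choices.
    """
    rng = np.random.default_rng(rng)
    h, w = img.shape[:2]
    oh = max(1, int(h * frac ** 0.5))   # square-ish occluder
    ow = max(1, int(w * frac ** 0.5))
    y = rng.integers(0, h - oh + 1)
    x = rng.integers(0, w - ow + 1)
    out = img.copy()                    # leave the input untouched
    out[y:y + oh, x:x + ow] = 0
    return out
```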
Multi-view stereo (MVS) is one way to obtain 3D structure from 2D images, and deep learning provides an effective end-to-end approach to it. In previous learning-based MVS methods, the depth interval is tightly coupled with the feature map resolution, so more accurate depth intervals come with higher computational cost. This paper proposes a new deep neural network, HC-MVSNet, which uses a hybrid cascade structure for MVS depth estimation. Unlike previous MVS methods, the new coarse-to-fine depth estimation decouples resolution increase from depth interval reduction through a simple operation, achieving higher reconstruction accuracy and completeness at minimal additional computational cost. In addition, an efficient depth sampling strategy based on the probability distribution is introduced, which allocates a higher hypothesis density to regions with a high probability of containing the ground truth. This sampling makes full use of redundant information that was previously neglected and significantly improves the textural detail of the results. Extensive experiments on the DTU dataset, the Tanks and Temples benchmark, and the BlendedMVS dataset show that the proposed method outperforms existing MVS methods and generalizes better.
Title: HC-MVSNet: A probability sampling-based multi-view-stereo network with hybrid cascade structure for 3D reconstruction. Tianxiang Gao, Zijian Hong, Yixing Tan, Lizhuo Sun, Yichen Wei, Jianwei Ma. Pattern Recognition Letters, vol. 185, pp. 59–65. Pub Date: 2024-07-14 | DOI: 10.1016/j.patrec.2024.07.008
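One common way to realize probability-based depth sampling is inverse-CDF sampling of the previous stage's depth probability volume; the sketch below illustrates this for a single pixel and is our reading of the idea, not necessarily the paper's exact scheme:

```python
import numpy as np

def sample_depth_hypotheses(prob, depth_values, n_hyp=8):
    """Place depth hypotheses where the coarse stage's probability mass is.

    prob         : (D,) probability over coarse depth bins (sums to 1)
    depth_values : (D,) increasing depth value of each coarse bin
    n_hyp        : number of fine-stage hypotheses to allocate

    Evenly spaced quantiles of the CDF land more hypotheses in high-
    probability regions and fewer in unlikely ones.
    """
    cdf = np.cumsum(prob)
    q = (np.arange(n_hyp) + 0.5) / n_hyp          # evenly spaced quantiles
    idx = np.searchsorted(cdf, q)                 # inverse-CDF lookup
    return depth_values[np.clip(idx, 0, len(depth_values) - 1)]
```

With a sharply peaked `prob`, most of the returned hypotheses cluster around the peak depth, which is the intended densification behavior.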
Pub Date: 2024-07-10 | DOI: 10.1016/j.patrec.2024.07.007
Alvari Seppänen, Risto Ojala, Kari Tammi
Snowfall can introduce noise into light detection and ranging (LiDAR) data, which is a problem because LiDAR is used in many outdoor applications, e.g., autonomous driving. We propose the task of multi-echo denoising, where the goal is to pick the echo that represents the object of interest and discard the other echoes. The idea is to select points from alternative echoes that are unavailable in standard strongest-echo point clouds; intuitively, we try to see through the snowfall. To achieve this, we propose a novel self-supervised deep learning method with a characteristics similarity regularization, which exploits noise characteristics to increase performance. Experiments on a real-world multi-echo snowfall dataset demonstrate the efficacy of multi-echo denoising and superior performance over the baseline. Moreover, extensive experiments on a semi-synthetic dataset show that our method outperforms the state of the art in self-supervised snowfall denoising. Our work enables more reliable point cloud acquisition in snowfall. The code is available at https://github.com/alvariseppanen/SMEDen.
Title: Self-supervised multi-echo point cloud denoising in snowfall. Pattern Recognition Letters, vol. 185, pp. 52–58. Open access.
People tend to judge others by assessing their personality traits, relying on life experience. This is especially evident when making an informed hiring decision, which should consider not only skills but also fit with a company's values and culture. Based on this assumption, we use a Siamese Network (SN) to assess five personality traits by analyzing and comparing pairs of people simultaneously. For this, we propose the OCEAN-AI framework based on a Gated Siamese Fusion Network (GSFN), which comprises six modules and fuses hand-crafted and deep features across three modalities (video, audio, and text). Using the ChaLearn First Impressions v2 (FIv2) and Multimodal Personality Traits Assessment (MuPTA) corpora, we find that the six feature sets and their combinations, owing to their differing information content, allow the framework to adjust flexibly to heterogeneous input data. The experimental results show that pairwise comparison of people with the same or different Personality Traits (PT) during training enhances the framework's performance. The framework outperforms State-of-the-Art (SOTA) systems based on three modalities (video-face, audio, and text) by a relative 1.3% (0.928 vs. 0.916) in mean accuracy (mACC) on the FIv2 corpus. We also outperform the SOTA system in the Concordance Correlation Coefficient (CCC) by a relative 8.6% (0.667 vs. 0.614) using two modalities (video and audio) on the MuPTA corpus. We make our framework publicly available for integration into applications such as recruitment, education, and healthcare.
Title: Gated Siamese Fusion Network based on multimodal deep and hand-crafted features for personality traits assessment. Elena Ryumina, Maxim Markitantov, Dmitry Ryumin, Alexey Karpov. Pattern Recognition Letters, vol. 185, pp. 45–51. Pub Date: 2024-07-08 | DOI: 10.1016/j.patrec.2024.07.004
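The Concordance Correlation Coefficient reported above is a standard agreement metric; for reference, it can be computed as:

```python
import numpy as np

def ccc(x, y):
    """Concordance Correlation Coefficient between predictions and targets:
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2).
    Equals 1 only for perfect agreement; penalizes both scale and
    location shifts, unlike plain Pearson correlation."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```

A constant offset between `x` and `y` already lowers the CCC even though their Pearson correlation stays at 1, which is why it is preferred for trait regression.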
Pub Date: 2024-07-08 | DOI: 10.1016/j.patrec.2024.07.003
Bin Li, Jiangjiao Li, Peng Wang
Falls have become one of the main causes of injury and death among the elderly. A high-accuracy fall detection method can effectively detect falls, thereby reducing the probability of injury and mortality. This paper proposes a fall detection algorithm based on global and local feature extraction. Specifically, we design a dual-stream network: one branch, composed of a convolutional neural network and a regional attention module, extracts local features from images; the other branch, an improved Transformer, extracts global features. The local and global features are then fused by a feature fusion module for classification, enabling fall detection. Experimental results show that the proposed approach achieves accuracies of 99.55% and 99.75% on the UP-Fall Detection Dataset and the Le2i Fall Detection Dataset, respectively.
Title: Fall detection algorithm based on global and local feature extraction. Pattern Recognition Letters, vol. 185, pp. 31–37.
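The fuse-then-classify step described above can be sketched as simple late fusion: concatenate the two branches' feature vectors and apply a linear softmax head (`W` and `b` stand in for hypothetical trained parameters; the paper's fusion module may be more elaborate):

```python
import numpy as np

def fuse_and_classify(local_feat, global_feat, W, b):
    """Late fusion sketch: concatenate the CNN-branch (local) and
    Transformer-branch (global) feature vectors, then apply a linear
    classifier with a numerically stable softmax."""
    fused = np.concatenate([local_feat, global_feat])  # feature fusion
    logits = W @ fused + b
    e = np.exp(logits - logits.max())                  # stable softmax
    return e / e.sum()
```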