Pattern Recognition Letters: Latest Articles


MiniMedGPT: Efficient Large Vision–Language Model for medical Visual Question Answering

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2025-01-08 DOI: 10.1016/j.patrec.2025.01.001

Abdel Rahman Alsabbagh , Tariq Mansour , Mohammad Al-Kharabsheh , Abdel Salam Ebdah , Roa’a Al-Emaryeen , Sara Al-Nahhas , Waleed Mahafza , Omar Al-Kadi

While Large Vision–Language Models (LVLMs) such as GPT-4 and Gemini demonstrate significant potential, their use in the medical domain remains largely unexplored. This is due to challenges such as prolonged training and language generation issues. Imbalances within medical Visual Question Answering (VQA) datasets further complicate the integration of LVLMs. In this paper, we present a novel approach named MiniMedGPT (Mini Medical Generative Pretrained Transformer). Inspired by MiniGPT4-v2, MiniMedGPT is specifically designed for efficient medical VQA. The MiniMedGPT framework is built upon both medical and generic pretrained Large Language Models and features an end-to-end, versatile fine-tuning pipeline that aligns medical VQA data in just 30 min within a single-stage framework. To address language generation shortcomings and dataset imbalances, we employ Gemini Vision Pro and MediCap as auxiliary components. In comprehensive benchmarking against 6 prominent medical VQA models on 2 well-known datasets, our approach achieves improved performance across various metrics while using the fewest trainable parameters among the compared models. This work can help train junior clinicians and has the potential to serve as a decision support tool for experienced radiologists.

Volume 189, Pages 8-16

Citations: 0

Multimodal information fusion and artificial intelligence approaches for sustainable computing in data centers

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2025-01-06 DOI: 10.1016/j.patrec.2024.12.006

Xinyi Wu, Aiping He

With the rapid expansion of cloud computing, artificial intelligence, and big data analytics, data centers have become integral to modern digital infrastructure. However, their escalating energy consumption poses significant challenges for sustainable operation. This paper presents a novel multimodal data fusion algorithm aimed at optimizing energy management in data centers. By integrating environmental sensor data, system logs, and visual information, we constructed a comprehensive framework for analyzing energy consumption patterns. Experiments on publicly available datasets validated the algorithm’s effectiveness in predicting energy consumption, improving energy efficiency, and optimizing server loads. The results indicate that the proposed method outperforms traditional baselines such as Support Vector Machines, Random Forest, and Long Short-Term Memory networks across multiple evaluation metrics. The algorithm is also computationally efficient, making it suitable for deployment in large-scale data centers. Our research provides a significant theoretical foundation and practical guidance for sustainable energy management.
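The abstract does not describe how the three modalities are combined. One common baseline, shown here only as a minimal sketch and not as the authors' algorithm, is early (feature-level) fusion: standardize each modality's features and concatenate them per time step before passing them to a predictor. The function names and the toy features (inlet temperature, humidity, CPU utilization, a thermal-camera score) are hypothetical.

```python
from statistics import mean, pstdev

def zscore(column):
    """Standardize one feature column; a constant column maps to zeros."""
    mu, sigma = mean(column), pstdev(column)
    return [0.0 if sigma == 0 else (x - mu) / sigma for x in column]

def standardize(rows):
    """Z-score every feature (column) of a list of per-sample rows."""
    cols = [zscore(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def early_fusion(*modalities):
    """Concatenate standardized features from time-aligned modalities.

    Each modality is a list of rows, one row per sample (e.g. per minute);
    all modalities must describe the same samples in the same order.
    """
    n = len(modalities[0])
    if any(len(m) != n for m in modalities):
        raise ValueError("modalities must cover the same samples")
    scaled = [standardize(m) for m in modalities]
    # Per sample, join the standardized feature lists of every modality.
    return [sum((m[i] for m in scaled), []) for i in range(n)]

# Hypothetical toy inputs, one row per time step.
sensors = [[22.0, 40.0], [24.0, 42.0], [26.0, 44.0]]  # inlet temp (C), humidity (%)
logs = [[0.30], [0.55], [0.80]]                       # CPU utilization
visual = [[0.1], [0.1], [0.1]]                        # thermal-camera hotspot score
fused = early_fusion(sensors, logs, visual)           # 3 rows x 4 features
```

In practice the means and standard deviations would be fitted on training data only and reused at inference time; here they are computed over the whole toy set for brevity.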

Volume 189, Pages 17-22

Citations: 0

EB-CNN: Ensemble of branch convolutional neural network for image classification

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2025-01-04 DOI: 10.1016/j.patrec.2024.12.017

Azizi Abdullah , Wei Soong Wong , Dheeb Albashish

Traditionally, image classifiers based on Convolutional Neural Networks (CNNs) combine all of their outputs in a single output layer. This assumes that all categories are equally distinct and independent. However, some classes are harder to distinguish with a single output layer, because it limits the model’s flexibility to learn complex relationships and representations within the data. Different classes may require different levels of abstraction or representation, which a single output layer cannot adequately capture. This paper proposes an ensemble method that combines different layers, or branches, of a CNN. The approach divides a CNN, namely VGG16, into five distinct branches that capture coarse, intermediate, and fine spatial scales, corresponding to the hierarchical structure of the deep network. However, combining all branch models into a dense pool of candidates for ensemble learning risks a lack of diversity among the classifiers, which can hinder the ensemble’s ability to generalize and lead to suboptimal performance. Therefore, to improve predictive performance, we designed a heuristic ensemble selection method that chooses relevant models from the pool of saved models based on their accuracy. We performed experiments on 6 different datasets. The results show that our approach outperforms the baseline CNN model, which relies on a single layer for its final decision.
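The abstract does not spell out the selection heuristic. The following is a minimal sketch under an assumed rule: keep every branch whose validation accuracy is within a tolerance `tol` of the best branch, then combine the kept branches by hard majority voting. The function names, the `tol` value, and the toy predictions are hypothetical; the lists simply stand in for classifiers attached to different VGG16 depths.

```python
from collections import Counter

def accuracy(preds, labels):
    """Fraction of samples whose predicted label matches the ground truth."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def majority_vote(pred_lists):
    """Per-sample hard vote; ties go to the label voted first
    (Counter.most_common keeps first-encountered order for equal counts)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*pred_lists)]

def select_by_accuracy(model_preds, labels, tol=0.2):
    """Keep every branch whose validation accuracy is within `tol` of the best.

    model_preds: dict mapping a branch name to its predicted labels on a
    validation set. Returns (selected names, ensemble validation accuracy).
    """
    accs = {name: accuracy(p, labels) for name, p in model_preds.items()}
    best = max(accs.values())
    selected = [name for name, a in accs.items() if a >= best - tol]
    ensemble = majority_vote([model_preds[n] for n in selected])
    return selected, accuracy(ensemble, labels)

# Hypothetical validation labels and branch predictions.
labels = [0, 1, 2, 0, 1, 2]
preds = {
    "branch3": [0, 1, 2, 0, 1, 0],  # wrong on sample 5
    "branch4": [0, 1, 2, 0, 0, 2],  # wrong on sample 4
    "branch5": [0, 1, 2, 1, 1, 2],  # wrong on sample 3
    "branch1": [2, 2, 2, 2, 2, 2],  # weak shallow branch, filtered out
}
selected, acc = select_by_accuracy(preds, labels)
# selected == ['branch3', 'branch4', 'branch5'], acc == 1.0
```

Each kept branch errs on a different sample, so the vote corrects all three errors; this is the diversity effect the abstract refers to. A rule based on accuracy alone cannot guarantee such diversity, which is why ensemble selection methods often also measure disagreement between candidates.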

Volume 189, Pages 1-7

Citations: 0

Neural network modelling of kinematic and dynamic features for signature verification

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2025-01-01 DOI: 10.1016/j.patrec.2024.11.021

Moises Diaz , Miguel A. Ferrer , Jose Juan Quintana , Adam Wolniakowski , Roman Trochimczuk , Kanstantsin Miatliuk , Giovanna Castellano , Gennaro Vessio

Online signature parameters, which are based on human characteristics, broaden the applicability of automatic signature verifiers. Although kinematic and dynamic features have been suggested previously, accurately measuring features such as arm and forearm torques remains challenging. We present two approaches for estimating angular velocities, angular positions, and force torques. The first uses a physical UR5e robotic arm to reproduce a signature while capturing these parameters over time. The second, a cost-effective alternative, uses a neural network to estimate the same parameters. Our findings demonstrate that a simple neural network model can extract effective parameters for signature verification. Training the neural network on the MCYT300 dataset and cross-validating it on other databases, namely BiosecurID, Visual, Blind, OnOffSigDevanagari-75, and OnOffSigBengali-75, confirms the model’s generalization capability. The trained model is available at: https://github.com/gvessio/SignatureKinematics.
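Two of the kinematic features named above are related by differentiation: angular velocity is the time derivative of angular position. As a minimal sketch, assuming joint angles sampled at a fixed interval `dt` (the abstract gives no sampling rate), velocities can be approximated by finite differences. This is not the paper's neural estimator, and `angular_velocity` is a hypothetical helper.

```python
def angular_velocity(angles, dt):
    """Estimate angular velocity (rad/s) from joint angles (rad) sampled every dt s.

    Central differences inside the sequence, one-sided differences at the ends.
    """
    n = len(angles)
    if n < 2:
        raise ValueError("need at least two samples")
    omega = [0.0] * n
    omega[0] = (angles[1] - angles[0]) / dt
    omega[-1] = (angles[-1] - angles[-2]) / dt
    for i in range(1, n - 1):
        omega[i] = (angles[i + 1] - angles[i - 1]) / (2 * dt)
    return omega

# Angle theta(t) = t^2 sampled at t = 0..4 s: the interior estimates
# equal the exact derivative 2t, since central differences are exact for quadratics.
print(angular_velocity([0, 1, 4, 9, 16], 1.0))  # [1.0, 2.0, 4.0, 6.0, 7.0]
```

Applying the same function to its own output gives a rough angular acceleration, although finite differences amplify measurement noise, so real pen or robot signals are usually smoothed first.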

Volume 187, Pages 130-136

Citations: 0

A 3D wrist motion-based sign language video summarization technique

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-12-30 DOI: 10.1016/j.patrec.2024.12.015

Evangelos G. Sartinas, Emmanouil Z. Psarakis, Dimitrios I. Kosmopoulos

An interesting problem in many video-based applications is the generation of short synopses by selecting the most informative frames, a procedure known as video summarization. For sign language videos, the benefits of using the $t$-parameterized counterpart of the curvature of the signer’s 2-D wrist trajectory to identify keyframes have been reported in the literature [1]. In this paper we extend these ideas by modeling the 3-D hand motion extracted from each frame of the video. To this end we propose a new informative function based on the $t$-parameterized curvature and torsion of the 3-D trajectory. How video frames are characterized as keyframes depends on whether the motion occurs in 2-D or 3-D space. Specifically, for 3-D motion we look for the maxima of the harmonic mean of the curvature and torsion of the target’s trajectory; for planar motion we seek the maxima of the trajectory’s curvature. The proposed 3-D feature is experimentally evaluated on sign language videos using (1) objective measures based on ground-truth keyframe annotations, (2) a human-based evaluation of understanding, and (3) the gloss classification problem. The results obtained are promising.
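A minimal sketch of the 3-D keyframe criterion, assuming the wrist trajectory is a list of (x, y, z) samples, one per frame, and using the textbook Frenet-Serret formulas with time as the parameter, $\kappa = \lVert \mathbf r' \times \mathbf r'' \rVert / \lVert \mathbf r' \rVert^3$ and $\tau = (\mathbf r' \times \mathbf r'') \cdot \mathbf r''' / \lVert \mathbf r' \times \mathbf r'' \rVert^2$, with derivatives estimated by finite differences. The exact $t$-parameterized quantities used in the paper and in [1] may be defined differently, and all function names are hypothetical.

```python
import math

def gradient(xs, dt):
    """Central differences inside, one-sided differences at both ends."""
    n = len(xs)
    g = [0.0] * n
    g[0] = (xs[1] - xs[0]) / dt
    g[-1] = (xs[-1] - xs[-2]) / dt
    for i in range(1, n - 1):
        g[i] = (xs[i + 1] - xs[i - 1]) / (2 * dt)
    return g

def derivative(points, dt):
    """Differentiate a list of (x, y, z) samples coordinate by coordinate."""
    cols = [gradient(list(c), dt) for c in zip(*points)]
    return list(zip(*cols))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def curvature_torsion(points, dt=1.0, eps=1e-12):
    """Per-frame curvature and torsion of a time-parameterized 3-D trajectory."""
    d1 = derivative(points, dt)
    d2 = derivative(d1, dt)
    d3 = derivative(d2, dt)
    kappa, tau = [], []
    for r1, r2, r3 in zip(d1, d2, d3):
        c = cross(r1, r2)
        c2 = dot(c, c)
        speed = math.sqrt(dot(r1, r1))
        kappa.append(math.sqrt(c2) / speed ** 3 if speed > eps else 0.0)
        # Torsion is undefined on straight segments (c2 ~ 0); use 0 there.
        tau.append(dot(c, r3) / c2 if c2 > eps else 0.0)
    return kappa, tau

def harmonic_score(kappa, tau):
    """Harmonic mean of |curvature| and |torsion|; 0 when both vanish."""
    out = []
    for k, t in zip(kappa, tau):
        k, t = abs(k), abs(t)
        out.append(2 * k * t / (k + t) if k + t > 0 else 0.0)
    return out

def local_maxima(score):
    """Interior indices that are local maxima (first index of a flat peak)."""
    return [i for i in range(1, len(score) - 1)
            if score[i] > score[i - 1] and score[i] >= score[i + 1]]

def keyframes_3d(points, dt=1.0):
    """Candidate keyframe indices for 3-D motion."""
    kappa, tau = curvature_torsion(points, dt)
    return local_maxima(harmonic_score(kappa, tau))
```

Because a harmonic mean is small whenever either term is small, the score vanishes on planar segments (zero torsion), which is consistent with the abstract falling back to curvature alone for planar motion. A helix $(\cos t, \sin t, t)$ has $\kappa = \tau = 1/2$, which gives a quick sanity check.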

Volume 189, Pages 23-30

Citations: 0

GAF-Net: A new automated segmentation method based on multiscale feature fusion and feedback module

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-11-26 DOI: 10.1016/j.patrec.2024.11.025

Long Wen , Yuxing Ye , Lei Zuo

Surface defect detection (SDD) is an essential technique for monitoring the surface quality of products. However, fine-grained defects caused by stress loading, environmental influences, and construction defects remain challenging to detect. In this research, a convolutional neural network for crack segmentation, GAF-Net, is developed based on feature fusion and feedback over global and multi-scale features. First, a multi-scale feature feedback module (MSFF) is proposed, which uses four different scales to refine local features by fusing high-level and sub-high-level features to perform feedback correction. Second, a global feature module (GF) is proposed that generates a fine global information map from local features and fuses it with the correction map through adaptive weighting for crack detection. Finally, GAF-Net is deeply supervised on multi-level feature maps to accelerate its training and improve detection accuracy. GAF-Net is trained and evaluated on three publicly available pavement crack datasets, and the results show that it achieves state-of-the-art IoU segmentation scores compared with other deep learning methods (Crackforest: 53.61 %; Crack500: 65.19 %; DeepCrack: 81.63 %).
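The reported scores are Intersection-over-Union (IoU) values. A minimal sketch of IoU for a single pair of binary crack masks follows; how the paper thresholds predictions and aggregates scores across images is not stated in the abstract, so both are left out, and treating two empty masks as a perfect match is an assumed convention.

```python
def iou(pred, target):
    """IoU of two same-shaped binary masks (nested lists, nonzero = crack)."""
    inter = union = 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p_on, t_on = bool(p), bool(t)
            inter += p_on and t_on   # pixel is crack in both masks
            union += p_on or t_on    # pixel is crack in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union

print(iou([[0, 1, 1], [0, 1, 0]], [[0, 1, 0], [0, 1, 1]]))  # 0.5
```

A dataset-level score is often computed by summing intersections and unions over all images before dividing, which weights large cracks more heavily than averaging per-image IoUs; the abstract does not say which variant is used.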

Volume 187, Pages 86-92

Citations: 0

Bilateral symmetry-based augmentation method for improved tooth segmentation in panoramic X-rays

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-11-26 DOI: 10.1016/j.patrec.2024.11.023

Sanket Wathore, Subrahmanyam Gorthi

Panoramic X-rays are crucial in dental radiology, providing detailed images that are essential for diagnosing and planning treatment for various oral conditions. The advent of automated methods that learn from annotated data promises to significantly aid clinical experts in making accurate diagnoses. However, these methods often require large amounts of annotated data, and generating high-quality annotations for panoramic X-rays is both challenging and time-consuming. This paper introduces a novel bilateral symmetry-based augmentation method specifically designed to enhance tooth segmentation in panoramic X-rays. By exploiting the inherent bilateral symmetry of these images, our method systematically generates augmented data, leading to substantial improvements in the performance of tooth segmentation models. By increasing the training data size fourfold, our approach proportionately reduces the effort required to manually annotate extensive datasets. These findings highlight the potential of leveraging the symmetrical properties of medical images to enhance model performance and accuracy in dental radiology. The effectiveness of the proposed method is evaluated on three widely adopted deep learning models: U-Net, SE U-Net, and TransUNet. The proposed augmentation significantly improves segmentation accuracy across all models. For example, the average Dice Similarity Coefficient (DSC) increases by over 8%, reaching 76.7% for TransUNet. Further, comparisons with existing augmentation methods, including rigid transform-based and elastic grid-based techniques, show that the proposed method consistently outperforms them, with additional improvements of up to 5% in average DSC; the exact improvement depends on the model and the training dataset size. We have made the data augmentation code and tools developed based on our method available at https://github.com/wathoresanket/bilateralsymmetrybasedaugmentation.
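The abstract states that the method quadruples the training set but not how. The sketch below is one plausible construction, offered as an assumption rather than the authors' procedure: for each image and mask, keep the original, add a horizontal flip, and add two synthetic images built by mirroring the left half and the right half about the image's center column. All names are hypothetical, and the sketch assumes the dental midline sits at the center of an even-width image.

```python
def copy2d(img):
    """Deep-copy an image stored as a list of rows."""
    return [row[:] for row in img]

def hflip(img):
    """Mirror an image (list of rows) left to right."""
    return [row[::-1] for row in img]

def mirror_half(img, keep):
    """Make a perfectly symmetric image from one half ('left' or 'right'),
    assuming the symmetry axis is the center of an even-width image."""
    w = len(img[0])
    if w % 2:
        raise ValueError("sketch assumes an even image width")
    half = w // 2
    if keep == "left":
        return [row[:half] + row[:half][::-1] for row in img]
    if keep == "right":
        return [row[half:][::-1] + row[half:] for row in img]
    raise ValueError("keep must be 'left' or 'right'")

def symmetry_augment(image, mask):
    """Return 4 (image, mask) pairs: original, flipped, left-mirrored,
    right-mirrored. The same geometric op is applied to image and mask
    so the labels stay aligned with the pixels."""
    ops = [copy2d, hflip,
           lambda a: mirror_half(a, "left"),
           lambda a: mirror_half(a, "right")]
    return [(op(image), op(mask)) for op in ops]

img = [[1, 2, 3, 4], [5, 6, 7, 8]]
pairs = symmetry_augment(img, img)
# pairs[2][0] == [[1, 2, 2, 1], [5, 6, 6, 5]]  (left half mirrored)
```

With multi-class masks that number teeth in the FDI scheme, mirrored pixels would also need their labels remapped (swapping quadrants 1 and 2, and 3 and 4), since mirroring exchanges the patient's left and right sides. Real panoramic midlines are rarely exactly centered, so a practical implementation would likely locate the symmetry axis first.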

Volume 188, Pages 1-7

Citations: 0

Segmentation of MRI tumors and pelvic anatomy via cGAN-synthesized data and attention-enhanced U-Net

IF 3.9 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date : 2024-11-24 DOI: 10.1016/j.patrec.2024.11.003

Mudassar Ali , Haoji Hu , Tong Wu , Maryam Mansoor , Qiong Luo , Weizeng Zheng , Neng Jin

Accurate tumor segmentation in MRI images is of great importance for both diagnosis and treatment; however, in many cases sufficient annotated datasets may not be available. This paper develops a novel approach to segmenting tumors in the brain, liver, and pelvic regions in MRI images by combining an attention-enhanced U-Net model with a conditional Generative Adversarial Network (cGAN). We introduce three key novelties: a patch discriminator in the cGAN to enhance the realism of generated images, attention mechanisms in the U-Net to improve segmentation accuracy, and an application to pelvic MRI segmentation, which has seen little exploration. Our method addresses the limited availability of annotated data by generating realistic synthetic images to augment training. Our experimental results on brain, liver, and pelvic MRI datasets show that our approach outperforms state-of-the-art methods, with Dice Coefficients of 98.61 % for brain MRI, 88.60 % for liver MRI, and 91.93 % for pelvic MRI. We also observe substantial improvements in the Hausdorff Distance, especially in complex anatomical regions such as tumor boundaries. The proposed combination of synthetic data creation and novel segmentation techniques opens new perspectives for robust medical image segmentation.
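The two reported metrics can be sketched for binary masks as follows. Dice measures overlap (higher is better), while the Hausdorff distance is the largest distance from a point in one mask to the nearest point in the other (lower is better). This brute-force sketch uses all foreground pixels and the plain maximum; many papers instead use boundary pixels or the 95th percentile (HD95), and the abstract does not say which variant was used. Function names are hypothetical.

```python
import math

def _on(mask):
    """Coordinates of foreground (nonzero) pixels."""
    return [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v]

def dice(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|); two empty masks count as 1.0 (assumed)."""
    a, b = set(_on(pred)), set(_on(target))
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2 * len(a & b) / total

def hausdorff(pred, target):
    """Symmetric Hausdorff distance in pixels between foreground sets (brute force)."""
    a, b = _on(pred), _on(target)
    if not a or not b:
        raise ValueError("undefined when a mask is empty")

    def directed(xs, ys):
        # Largest distance from a point in xs to its nearest point in ys.
        return max(min(math.dist(p, q) for q in ys) for p in xs)

    return max(directed(a, b), directed(b, a))

pred = [[0, 1, 1], [0, 1, 0]]
target = [[0, 1, 0], [0, 1, 1]]
# dice(pred, target) is 2 * 2 / 6 = 0.667; hausdorff(pred, target) == 1.0
```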

Volume 187, Pages 100-106

Citations: 0

Incremental component tree contour computation

IF 3.9 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date: 2024-11-23 · DOI: 10.1016/j.patrec.2024.11.019

Dennis J. Silva , Jiří Kosinka , Ronaldo F. Hashimoto , Jos B.T.M. Roerdink , Alexandre Morimitsu , Wonder A.L. Alves

A component tree is a graph representation that encodes the connected components of the upper or lower level sets of a grayscale image. Consequently, each node of a component tree represents a binary image of the corresponding connected component. Various algorithms efficiently extract information and attributes of component-tree nodes by incrementally exploiting the subset relation encoded in the tree. However, to the best of our knowledge, no such incremental approach exists for extracting the contours of the nodes. In this paper, we propose an efficient incremental method that computes the contours of the nodes of a component tree by counting the edges (sides) of contour pixels. We also analyze the method's time complexity and show experimentally that it is faster than the standard approach based on node reconstruction.
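The general idea of maintaining contours incrementally by tracking pixel sides can be illustrated with a short sketch. The Python below is a generic union-find construction, assuming 4-connectivity and the upper-level-set (max-tree) convention; it is not the authors' algorithm, and the function name `upper_set_contours` is hypothetical. Grey levels are swept from high to low; when a pixel p joins the level set, each of its four sides becomes a contour side unless the neighbour across it is already in the set, in which case that neighbour's facing side is removed from the contour instead. Contours are stored at union-find roots and merged when components merge.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]
Side = Tuple[int, int, int]  # (row, col, direction): one side of one pixel

# 4-neighbourhood; direction d and (d + 2) % 4 point in opposite directions
OFFSETS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def upper_set_contours(image: List[List[int]]) -> Dict[int, List[Set[Side]]]:
    """For each grey level t, return the contours (sets of pixel sides) of the
    4-connected components of the upper level set {p : image[p] >= t}."""
    rows, cols = len(image), len(image[0])
    parent: Dict[Pixel, Pixel] = {}
    contour: Dict[Pixel, Set[Side]] = {}  # only meaningful at union-find roots

    def find(p: Pixel) -> Pixel:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    result: Dict[int, List[Set[Side]]] = {}
    for t in sorted({v for row in image for v in row}, reverse=True):
        for p in [(r, c) for r in range(rows) for c in range(cols) if image[r][c] == t]:
            parent[p] = p
            contour[p] = set()
            for d, (dr, dc) in enumerate(OFFSETS):
                q = (p[0] + dr, p[1] + dc)
                if q not in parent:
                    # neighbour outside the set (or outside the image): contour side
                    contour[find(p)].add((p[0], p[1], d))
                    continue
                # q is already in the set: the shared edge becomes interior, so
                # remove q's side facing p (p's own side is never added)
                rq, rp = find(q), find(p)
                contour[rq].discard((q[0], q[1], (d + 2) % 4))
                if rq != rp:
                    # merge the smaller contour set into the larger one
                    if len(contour[rp]) < len(contour[rq]):
                        rp, rq = rq, rp
                    parent[rq] = rp
                    contour[rp] |= contour.pop(rq)
        # snapshot the contour of every component present at level t
        roots = sorted({find(p) for p in list(parent)})
        result[t] = [set(contour[r]) for r in roots]
    return result


img = [
    [0, 0, 0, 0],
    [0, 2, 1, 0],
    [0, 0, 0, 0],
]
for level, contours in upper_set_contours(img).items():
    print(level, [len(c) for c in contours])  # contour sides per component
```

On this toy image the sketch reports one component per level with 4, 6 and 14 contour sides (a single pixel, a 1×2 domino, and the full 3×4 image). For clarity it copies every contour at each level, which costs time proportional to the contour sizes; avoiding such copies is precisely what makes an incremental method attractive.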

Citations: 0

Multichannel image classification based on adaptive attribute profiles

IF 3.9 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters

Pub Date: 2024-11-23 · DOI: 10.1016/j.patrec.2024.11.015

Wonder A.L. Alves , Wander S. Campos , Charles F. Gobber , Dennis J. Silva , Ronaldo F. Hashimoto

Morphological Attribute Profiles serve as powerful tools for extracting meaningful features from remote sensing data. The construction of Morphological Attribute Profiles relies on two primary parameters: the choice of attribute type and the definition of a numerical threshold sequence. However, selecting an appropriate threshold sequence can be a difficult task, as an inappropriate choice can lead to an uninformative feature space. In this paper, we propose a semi-automatic approach based on the theory of Maximally Stable Extremal Regions to address this challenge. Our approach takes an increasing attribute type and an initial sequence of thresholds as input and locally adjusts threshold values based on region stability within the image. Experimental results demonstrate that our method significantly increases classification accuracy through the refinement of threshold values.
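The stability criterion behind this kind of adjustment can be sketched as follows. The Python below computes the classic MSER variation q(i) = (a[i+Δ] - a[i-Δ]) / a[i] along one branch of nested regions whose areas a[i] are listed from the innermost (smallest) region outward, one entry per consecutive grey level. The rule in `adjust_area_threshold` is hypothetical: it moves an initial area threshold to the area of the most stable region within a small window around the point where the branch first reaches that threshold. It only illustrates stability-driven local refinement and is not the authors' procedure; the function names and the `delta` and `window` parameters are assumptions.

```python
from typing import List, Optional


def mser_variation(areas: List[int], delta: int) -> List[Optional[float]]:
    """MSER variation q(i) = (a[i + delta] - a[i - delta]) / a[i] for nested regions
    ordered from smallest to largest; None where the window does not fit."""
    q: List[Optional[float]] = [None] * len(areas)
    for i in range(delta, len(areas) - delta):
        q[i] = (areas[i + delta] - areas[i - delta]) / areas[i]
    return q


def adjust_area_threshold(areas: List[int], threshold: int,
                          delta: int = 1, window: int = 2) -> int:
    """Hypothetical rule: replace an area threshold by the area of the most stable
    (lowest-variation) region within +/- window steps of where the branch first
    reaches the threshold. Keeps the initial value if no variation is defined."""
    # index of the first region whose area reaches the initial threshold
    start = next((i for i, a in enumerate(areas) if a >= threshold), len(areas) - 1)
    q = mser_variation(areas, delta)
    candidates = [i for i in range(start - window, start + window + 1)
                  if 0 <= i < len(areas) and q[i] is not None]
    if not candidates:
        return threshold
    best = min(candidates, key=lambda i: q[i])
    return areas[best]


# Areas of nested regions along one branch; 20 -> 21 -> 22 is a stable plateau.
branch = [5, 6, 20, 21, 22, 60, 200]
print(adjust_area_threshold(branch, threshold=18))  # 21
```

A real attribute profile would apply such a rule per branch of the component tree and for every threshold in the initial sequence; the single-branch version above only shows how a stability measure can pull a threshold toward a plateau of nearly constant area.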

Citations: 0
