Informatics in Medicine Unlocked: Latest Articles

Using Fuzzy C-Means clustering and PCA in public health: A machine learning approach to combat CVD and obesity
Q1 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.imu.2025.101666
Gamal Saad Mohamed Khamis , Nasser S. Alqahtani , Sultan Munadi Alanazi , Mohammed Muharrab Alruwaili , Mariam Shabram Alenazi , Maneaf A. Alrawaili
This study introduces a novel framework that integrates principal component analysis (PCA) with fuzzy c-means (FCM) clustering to enhance the analysis of high-dimensional health data, specifically targeting the identification of at-risk groups for cardiovascular disease (CVD) and obesity. This unique approach, which has not been previously explored in public health, promises to provide new insights and solutions to these pressing health issues.
The proposed PCA-FCM model was applied to a dataset comprising more than 20 health variables from a population sample aged 18–75 years. The analysis identified four distinct clusters, each showing unique risk patterns. For instance, Cluster One (mean age, 29) showed elevated body mass index (BMI) (mean, 33.7 kg/m²), high waist circumference (113 cm), and signs of insulin resistance (FBS, 133 mg/dL; HOMA-IR, 7.12). In contrast, Cluster Two (mean age, 61) exhibited the highest systolic blood pressure (SBP, 143 mmHg), elevated LDL cholesterol (4.27 mmol/L), and triglycerides (2.59 mmol/L), indicating advanced metabolic syndrome. Cluster Three (mean age, 51) presented a healthier metabolic profile with lower HOMA-IR (3.74), normal SBP (127 mmHg), and balanced lipid levels (HDL, 1.36 mmol/L). Cluster Four (mean age, 43) showed elevated SBP (134 mmHg), BMI (32.1 kg/m²), and HOMA-IR (6.05), suggesting a latent risk group.
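For readers unfamiliar with fuzzy c-means, the soft cluster assignment it produces can be sketched as follows. This is a minimal illustration of the standard FCM update rules, assuming Euclidean distance and a fuzzifier of m = 2; the abstract does not report the authors' settings, the PCA step is omitted, and the function name `fuzzy_c_means` and the toy points are hypothetical.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / w_sum
                for d in range(dim)))
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        for i, p in enumerate(points):
            d = [max(sum((p[a] - c[a]) ** 2 for a in range(dim)) ** 0.5, 1e-12)
                 for c in centers]
            for k in range(n_clusters):
                u[i][k] = 1.0 / sum((d[k] / dj) ** (2.0 / (m - 1.0)) for dj in d)
    return centers, u


# Hypothetical toy data: two well-separated groups in two dimensions.
toy_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
              (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
toy_centers, toy_memberships = fuzzy_c_means(toy_points, n_clusters=2)
```

Unlike hard clustering, each individual keeps a graded membership in every cluster, which is what allows borderline profiles to be flagged rather than forced into a single group.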
PCA identified waist circumference, visceral fat, LDL/HDL ratio, non-HDL cholesterol, and waist-to-height ratio as the most influential variables contributing to cluster separation, with loadings above 0.70 on the first two principal components. Meanwhile, exercise, height, family history, and HDL had loadings below 0.30, indicating minimal influence on cluster formation.
The model evaluation supported the selection of the four-cluster solution, with a Silhouette Score of 0.62 and Between-Cluster Variation accounting for 64 % of the total variance, signifying well-defined and cohesive clusters.
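As a reminder of what a silhouette score measures, the following sketch computes the mean silhouette coefficient for hard cluster labels. It assumes Euclidean distance and at least two clusters; how the authors derived hard labels from fuzzy memberships is not stated in the abstract, and the function name and toy data are hypothetical.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from p to itself is 0, so it drops out of the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical one-dimensional example with two tight, distant clusters.
example_score = silhouette_score([(0,), (1,), (10,), (11,)], [0, 0, 1, 1])
```

Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest misassigned points.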
Although this framework enhances clustering precision and uncovers clinically actionable patterns, challenges such as health data privacy concerns, clinicians’ difficulty in interpreting PCA results, validating model generalizability across diverse populations, and technical resource limitations in low-resource settings must be addressed for successful implementation in real-world healthcare systems. Future research should incorporate longitudinal data and explore integration with advanced models, such as deep learning, to improve predictive accuracy and adaptability in real-time clinical environments.
Citations: 0
Revolutionizing heart attack prognosis: Introducing an innovative regression model for prediction
Q1 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.imu.2025.101664
Hanaa Albanna , Madhav Raj Theeng Tamang , Chandan Patel , Mhd Saeed Sharif

Objective:

Heart attack prediction using machine learning is crucial for preemptive action and personalized healthcare. This research aims to predict heart attacks by employing machine learning in healthcare using a diverse range of patient data, including demographic, lifestyle, and physiological factors, which helps to create robust and generalizable predictions. In addition, various models that balance accuracy with interpretability are presented, emphasizing early detection and proactive intervention. It is expected that this cross-disciplinary approach will underline the role of machine learning in mitigating the heart disease burden and optimizing the resources spent on healthcare.

Methods:

This study explores the application of machine learning techniques for predicting heart attack risk using structured clinical data. A range of classification models — Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Decision Tree (DT) — were selected based on their proven effectiveness in prior healthcare prediction studies and their balance between accuracy and interpretability. The methodology involved comprehensive data preprocessing, class imbalance handling, and hyperparameter tuning to optimize model performance. Performance metrics included Accuracy, Precision, Recall, F1-score, and AUC-ROC. Exploratory Data Analysis (EDA) was conducted to assess the role of variables such as BMI, age, and glucose levels in predicting stroke, a proxy used for heart attack due to dataset limitations.

Results:

The SVM and LR models achieved the highest accuracy (95.08%), followed by RF (94.86%) and DT (91.46%). Despite high accuracy, key challenges were observed:
Class Imbalance: Only 249 cases in the dataset represented positive stroke outcomes, resulting in poor recall for minority class predictions. This reduced the model’s sensitivity to actual stroke cases, a significant limitation in clinical scenarios where false negatives can be life-threatening.
Data-Label Inconsistency: Although the study is framed as predicting heart attacks, the dataset pertains to stroke prediction. This misalignment creates confusion in the clinical relevance of the findings and weakens the generalizability of the models for heart attack risk assessment.
Lack of Model Interpretability in Practice: Though LIME and SHAP were cited as tools to ensure model transparency, they were not implemented or evaluated. This limits clinicians’ trust in the model’s predictions—an essential factor for real-world adoption.
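The class-imbalance point above can be made concrete with a short sketch. It assumes a hypothetical dataset of 5,000 records, of which 249 are positive (only the 249 comes from the abstract; the total is an assumption for illustration), and scores a trivial classifier that always predicts the majority class.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels, where 1 is positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Assumed total of 5,000 records; 249 positives as reported in the abstract.
y_true = [1] * 249 + [0] * 4751
# A trivial classifier that never predicts a positive case.
y_pred = [0] * len(y_true)

accuracy, recall = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={accuracy:.4f}, recall={recall:.4f}")
```

Under these assumptions the trivial classifier reaches about 95 % accuracy while detecting no positive cases, which is why accuracy alone is a weak indicator on data this imbalanced.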

Conclusion:

This research shows how machine learning can play a meaningful role in improving how we predict heart attacks and ultimately help improve patient care. The results demonstrated that even well-known models like Support Vector Machine and Logistic Regression can perform very well when applied to structured health data. It is also clear that everyday variables, such as age, body mass index, blood glucose level, and smoking habits, are important signals for assessing cardiovascular risk. However, although these models achieved high accuracy, the study also shows that performance alone is not enough for real-world use. For machine learning to be truly useful in healthcare, models need to handle imbalanced data properly, provide transparent and understandable predictions, and align with clinical needs. This work not only highlights the potential of AI to transform predictive healthcare but also serves as a reminder of the practical challenges that must be addressed along the way. Clear objectives, interpretable results, and thoughtful integration into clinical practice are key to making these tools safe, effective, and trusted by healthcare professionals.
Citations: 0
LMVT: A hybrid vision transformer with attention mechanisms for efficient and explainable lung cancer diagnosis
Q1 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.imu.2025.101669
Jesika Debnath , Al Shahriar Uddin Khondakar Pranta , Amira Hossain , Anamul Sakib , Hamdadur Rahman , Rezaul Haque , Md.Redwan Ahmed , Ahmed Wasif Reza , S M Masfequier.Rahman Swapno , Abhishek Appaji
Lung cancer continues to be a leading cause of cancer-related deaths worldwide due to its high mortality rate and the complexities involved in diagnosis. Traditional diagnostic approaches often face issues such as subjectivity, class imbalance, and limited applicability across different imaging modalities. To tackle these problems, we introduce Lung MobileVIT (LMVT), a lightweight hybrid model that combines a Convolutional Neural Network (CNN) and a Transformer for multiclass lung cancer classification. LMVT utilizes depthwise separable convolutions for local texture extraction while employing multi-head self-attention (MHSA) to capture long-range global dependencies. Furthermore, we integrate attention mechanisms based on the Convolutional Block Attention Module (CBAM) and feature selection techniques derived from the Simple Gray Level Difference Method (SGLDM) to improve discriminative focus and minimize redundancy. LMVT utilizes attention recalibration to enhance the saliency of the minority class, while also incorporating curriculum augmentation strategies that balance representation across underrepresented classes. The model has been trained and validated using two public datasets (IQ-OTH/NCCD and LC25000) and evaluated for both 3-class and 5-class classification tasks. LMVT achieved an impressive 99.61 % accuracy and 99.22 % F1-score for the 3-class classification, along with 99.75 % accuracy and 99.44 % specificity for the 5-class classification. This performance surpasses that of several recent Vision Transformer (ViT) architectures. Statistical significance tests and confidence intervals confirm the reliability of these performance metrics, while an analysis of model complexity supports its capability for potential deployment. To enhance clinical interpretability, the model is integrated with explainable AI (XAI) and is implemented within a web-based diagnostic application for analyzing CT and histopathology images. 
This study highlights the potential of hybrid ViT architectures in creating scalable and interpretable data-driven tools for practical use in lung cancer diagnostics.
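One reason depthwise separable convolutions suit a lightweight model can be shown with simple parameter arithmetic. The sketch below compares weight counts for a standard convolution and a depthwise separable one; the layer sizes are hypothetical and are not taken from LMVT, and bias terms are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = kernel * kernel * c_in   # one spatial filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution mixing the channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
print(standard, separable, round(standard / separable, 2))
```

For this assumed layer the separable form needs roughly one eighth of the weights, and the saving grows with the number of output channels.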
Citations: 0
PhenoQC: An integrated toolkit for quality control of phenotypic data in genomic research
Q1 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.imu.2025.101693
Jorge Miguel Silva, José Luis Oliveira

Background:

Large-scale genomic research requires robust, consistent phenotypic datasets for meaningful genotype–phenotype correlations. However, diverse collection protocols, incomplete entries, and heterogeneous terminologies frequently compromise data quality and slow downstream analysis.

Methodology:

To address these issues, we present PhenoQC, a high-throughput, configuration-driven toolkit that unifies schema validation, ontology-based semantic alignment, and missing-data imputation in a single workflow. Its modular architecture leverages chunk-based parallelism to handle large datasets, while customizable schemas enforce structural and type constraints. PhenoQC applies user-defined and state-of-the-art machine learning-based imputation and performs multi-ontology mapping with fuzzy matching to harmonize phenotype text. It also quantifies potential imputation-induced distributional shifts by reporting standardized mean difference, variance ratio, and Kolmogorov–Smirnov statistics for numeric variables, and population stability index and Cramér’s V for categorical variables, with user-configurable thresholds. The toolkit provides command-line and graphical interfaces for seamless integration into automated pipelines and interactive curation environments.
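Three of the drift diagnostics named above can be sketched with textbook definitions. This is not PhenoQC's own code or API: the function names are hypothetical, the pooled standard deviation uses the simple average of the two sample variances, and the small `eps` floor for empty categories is an assumption.

```python
from math import log, sqrt
from statistics import mean, variance


def standardized_mean_difference(before, after):
    """(mean_after - mean_before) divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(before) + variance(after)) / 2.0)
    return (mean(after) - mean(before)) / pooled_sd


def variance_ratio(before, after):
    """Sample variance after imputation relative to before."""
    return variance(after) / variance(before)


def population_stability_index(before, after, eps=1e-6):
    """PSI over category proportions: sum((q - p) * ln(q / p))."""
    psi = 0.0
    for category in set(before) | set(after):
        p = max(before.count(category) / len(before), eps)
        q = max(after.count(category) / len(after), eps)
        psi += (q - p) * log(q / p)
    return psi


# Hypothetical numeric column: one missing value imputed with the mean (3).
observed = [1, 2, 3, 4, 5]
imputed = [1, 2, 3, 4, 5, 3]
smd = standardized_mean_difference(observed, imputed)
ratio = variance_ratio(observed, imputed)
```

In this toy case mean imputation leaves the mean unchanged but shrinks the variance, which is the kind of shift a variance-ratio threshold is meant to catch.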

Results:

We benchmarked PhenoQC on synthetic datasets with up to 100,000 records, where it demonstrated near-linear scalability and full recovery of artificially missing numeric values. Moreover, PhenoQC’s ontology alignment achieved over 97% accuracy under textual corruption. Finally, using two real clinical datasets, PhenoQC successfully imputed missing values, enforced schema compliance, and flagged data anomalies without significant overhead.

Conclusions:

PhenoQC saves manual curation time and ensures consistent, analysis-ready phenotypic data through its streamlined system. Its adaptable design adjusts to evolving ontologies and domain-specific rules, empowering researchers to conduct more reliable studies.
Citations: 0
Novel comparison of CellaVision DC-1 and microscopic assessment of blood film morphology in paediatrics
Q1 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.imu.2025.101690
Heba Sharif , Denise E. Jackson , Genia Burchall

Background

The aim of this study was to evaluate the blood film assessment of CellaVision DC-1 compared to conventional microscopy in stained peripheral blood (PB) films from paediatric samples.

Methods

Blood films (n = 50), including clinically normal samples as well as common pathological conditions, were collected and examined by conventional microscopy and CellaVision DC-1. Manual microscopy counts and automated WBC differentiation and RBC grading via CellaVision, including manual re-classification, were compared with expert morphologist reporting. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated.
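The four metrics named above all derive from a 2×2 confusion table against the reference reading. The sketch below shows the standard formulas; the counts are hypothetical values chosen to sum to 50 and are not results from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic accuracy metrics against a reference method."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all with the finding
        "specificity": tn / (tn + fp),  # true negatives among all without it
        "ppv": tp / (tp + fp),          # positive calls that are correct
        "npv": tn / (tn + fn),          # negative calls that are correct
    }


# Hypothetical counts for 50 slides (not taken from the study).
metrics = diagnostic_metrics(tp=18, fp=6, fn=2, tn=24)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity describe the method itself, whereas PPV and NPV also depend on how common the finding is in the sample, so they should be compared only between cohorts with similar prevalence.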

Results

The reliability of RBC grading ranged from 60 to 100 % sensitivity and 55 to 74 % specificity for the CellaVision method, compared with 78–93 % sensitivity for manual microscopy, indicating that manual microscopy was the superior method. Additionally, DC-1 confused blasts with lymphocytes, giving 67 % specificity compared with 100 % for gold-standard microscopy. Pre- and post-classification, re-classification, and manual microscopy all showed strong correlations of WBC differential counts with expert/known readings, mainly for neutrophils and lymphocytes (R²: 0.60–0.85). In terms of time, CellaVision took 1 min longer to scan and assess each slide than light microscopy did, which could affect timely diagnosis and treatment decisions.

Conclusion

The use of CellaVision DC-1 may be beneficial to diagnostic laboratories in the adult setting; however, further research should focus on enhancing automated analysis when assessing paediatric samples that demand human intellect and critical thinking. Medical Scientist training and software development are recommended. Manual microscopy is faster and more accurate. Slide signing and DC-1 classifications of unclassified WBCs need scientist intervention.
引用次数: 0
Towards trustworthy AI-driven leukemia diagnosis: A hybrid Hierarchical Federated Learning and explainable AI framework
Q1 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.imu.2025.101618
Khadija Pervez , Syed Irfan Sohail , Faiza Parwez , Muhammad Abdullah Zia
Accurate detection and classification of microscopic cells from acute lymphoblastic leukemia remain challenging because cancerous and healthy cells are difficult to tell apart. This paper proposes a novel approach to identifying and categorizing acute lymphoblastic leukemia that uses explainable artificial intelligence and federated learning to train models across multiple institutions while keeping patient information decentralized and encrypted. The framework trains EfficientNetB3 to classify leukemia cells and incorporates explainability techniques to make the underlying model's decisions transparent and interpretable. It employs a hierarchical federated learning approach that allows distributed learning across clinical centers, ensuring that sensitive patient data remain localized. Explainability techniques such as saliency maps, occlusion sensitivity, and randomized input sampling for explanation, together with relevant evaluation scores, are integrated into the framework to provide visual and textual explanations of the model's predictions. The experiments were carried out on a publicly available dataset of 15,135 microscopic images. The proposed model was benchmarked against traditional centralized models and classical federated learning techniques. Compared with baseline models, it achieved a 2.5% improvement in accuracy (96.5%) and a 5.4% increase in F1-score (94.4%). Hierarchical federated learning reduced communication costs by 15% while maintaining data privacy. The integration of explainable artificial intelligence improved the transparency of model decisions, and the model reached an area under the ROC curve (AUC) of 0.98 for the classification of leukemia cells. These results suggest that the proposed framework offers a robust solution for intelligent medical diagnostic systems and can be extended to other medical imaging tasks.
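The abstract does not state which aggregation rule the hierarchical scheme uses. As a rough illustration only, the sketch below assumes sample-weighted averaging (in the style of FedAvg) applied at two levels: clinical centers are averaged within a region, and regional models are then averaged globally. All names and numbers are hypothetical, and real model weights would be tensors rather than short lists.

```python
from typing import Sequence

Weights = list[float]  # a flattened model parameter vector


def weighted_average(models: Sequence[Weights], sizes: Sequence[int]) -> Weights:
    """FedAvg-style aggregation: average parameters weighted by sample count."""
    total = sum(sizes)
    dim = len(models[0])
    return [
        sum(m[i] * n for m, n in zip(models, sizes)) / total
        for i in range(dim)
    ]


def hierarchical_aggregate(
    regions: Sequence[Sequence[tuple[Weights, int]]],
) -> Weights:
    """Aggregate client updates per region, then aggregate the regional models."""
    regional_models: list[Weights] = []
    regional_sizes: list[int] = []
    for clients in regions:
        models = [w for w, _ in clients]
        sizes = [n for _, n in clients]
        regional_models.append(weighted_average(models, sizes))
        regional_sizes.append(sum(sizes))
    return weighted_average(regional_models, regional_sizes)


# Hypothetical two-parameter updates from three clinical centers in two regions.
regions = [
    [([1.0, 0.0], 100), ([3.0, 2.0], 300)],  # region A: two hospitals
    [([5.0, 4.0], 600)],                     # region B: one hospital
]
global_model = hierarchical_aggregate(regions)
```

With sample-weighted averaging at both levels, the result equals a flat weighted average over all centers, but only one aggregated model per region travels to the top-level server. That is one plausible source of the reduced communication cost the abstract reports, though the paper's actual mechanism may differ.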
Informatics in Medicine Unlocked, Volume 53, Article 101618
Citations: 0
Customer relationship management systems in clinical laboratories: A systematic review
Q1 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.imu.2025.101628
Jonel Bation , Mary Ann Jaro , Lheyniel Jane Nery , Mudjahidin Mudjahidin , Andre Parvian Aristio , Eddie Bouy Palad , Jason Chavez , Lemuel Clark Velasco
Implementing Customer Relationship Management (CRM) systems in clinical laboratories is crucial for improving customer relationships, service quality, and operational efficiency in line with a patient-centric care model. This study follows the PRISMA guidelines to review and synthesize 26 journal articles using the People, Process, Technology (PPT) framework, analyzing the roles of people involved in clinical settings, the processes by which laboratory services are delivered, and the technological considerations that enhance patient care. The results reveal that successful implementation of CRM systems in clinical laboratories depends on the aligned efforts of developers and end-users. Marketing processes and customer service were also found to be crucial for the successful use of CRM systems in clinical laboratories. The features and system integration techniques of CRM systems were found to be vital for developing efficient operations, enhancing data analysis, and extending accessibility. The research gap analysis shows that the effect of CRM systems on patients, the lack of qualitative methods, and the development of corrective actions to increase patient satisfaction are areas needing further research to optimize the implementation of different CRM systems.
Informatics in Medicine Unlocked, Volume 53, Article 101628
Citations: 0
Advancing the Sensitivity Frontier in digital contact tracing: Comparative analysis of proposed methods toward maximized utility
Q1 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.imu.2025.101622
Junko Ami , Yanbo Pang , Hiroshi Masui , Takashi Okumura , Yoshihide Sekimoto
During the COVID-19 pandemic, many countries adopted Digital Contact Tracing (DCT) technology to control infections. However, the widely used Bluetooth Low Energy (BLE)-based DCT requires both the infected individual and the contact to have the application activated to detect exposure. Forcing citizens to install the DCT application could compromise their privacy. Therefore, to make DCT a truly usable tool, it is crucial to develop a DCT system that possesses high sensitivity, without depending on the application usage rate.
The Computation of Infection Risk via Confidential Locational Entries (CIRCLE) is a DCT method that utilizes connection logs from mobile phone base stations, theoretically offering much higher sensitivity than BLE-based DCT. However, its real performance has not been proven, and thus, this paper estimates the sensitivity and specificity of both BLE-based DCT and CIRCLE in a comparative setting. The estimation combines simulated movement patterns of residents with real-world data from app usage in Japan, utilizing both simulation and numerical modeling, with missing data supplemented through sensitivity analysis.
The sensitivity of BLE-based DCT is severely limited by the application’s usage rate, with an estimated baseline of just 10.9%, and even under highly optimistic assumptions, it only reaches 27.0%. In contrast, CIRCLE demonstrated a significantly higher sensitivity of 85.6%, greatly surpassing BLE-based DCT. The specificity of CIRCLE, though, decreased as the number of infected individuals increased, dropping to less than half of BLE-based DCT’s specificity during widespread infection. The BLE-based DCT used during the pandemic suffers from low sensitivity. While CIRCLE has specificity challenges, it provides exceptionally high sensitivity. Integrating these methods could redefine the design of digital contact tracing, leading to better utility for future infection control.
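One way to see why BLE-based sensitivity is so tightly bound to the usage rate: because both the infected individual and the contact must have the app active, a simple back-of-the-envelope model treats the two as independent, so the fraction of contact pairs that can be detected at all is the usage rate squared. The sketch below is only that simplified assumption, not the estimation model used in the paper, and the function names and the 30% figure are hypothetical.

```python
import math


def pair_coverage(usage_rate: float) -> float:
    """Upper bound on detectable contact pairs if app usage is independent
    between the infected individual and the contact (simplifying assumption)."""
    if not 0.0 <= usage_rate <= 1.0:
        raise ValueError("usage_rate must be in [0, 1]")
    return usage_rate ** 2


def usage_rate_needed(target_coverage: float) -> float:
    """Usage rate required to reach a target pair coverage under the same assumption."""
    if not 0.0 <= target_coverage <= 1.0:
        raise ValueError("target_coverage must be in [0, 1]")
    return math.sqrt(target_coverage)


# Hypothetical usage rate of 30%: at most about 9% of contact pairs are detectable.
coverage_at_30_percent = pair_coverage(0.30)
# Under this model, covering half of all pairs would need roughly 71% usage.
rate_for_half_coverage = usage_rate_needed(0.50)
```

The quadratic relationship means coverage stays low until adoption is very high, which is consistent in spirit with the low BLE sensitivities reported above. A method that does not depend on app installation avoids this ceiling.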
Informatics in Medicine Unlocked, Volume 53, Article 101622
Citations: 0
Comparing patient questionnaires and physician documentation in a tertiary hospital setting: A retrospective analysis of subjective information
Q1 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.imu.2025.101707
Nattawipa Thawinwisan , Chang Liu , Goshiro Yamamoto , Kazumasa Kishimoto , Yukiko Mori , Tomohiro Kuroda

Background:

Patient perception is crucial for medical decision-making. However, discrepancies often arise between patients’ reports and physician documentation. This study aims to examine differences in subjective information between patient questionnaires and physician records in a tertiary hospital setting to provide insights for the development of clinical documentation tools.

Methods:

We retrospectively analyzed 500 paired patient questionnaires and corresponding physician records from five departments at Kyoto University Hospital. Subjective information from the History of Present Illness (HPI) was manually extracted. AI-assisted comparison identified discrepancies, with outputs reviewed by trained researchers. Discrepancies were graded by severity and examined for symptom characteristics and documentation patterns. Logistic regression assessed associations between discrepancy and patient demographics, department, and additional content indicators, with results expressed as odds ratios (OR) and 95% confidence intervals (CI).
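
For readers less familiar with how logistic regression output becomes an odds ratio with a 95% confidence interval, the usual conversion exponentiates the coefficient and its Wald interval: OR = exp(β), with bounds exp(β ± 1.96·SE). The sketch below shows only this conversion; the function name and the coefficient and standard error values are hypothetical and do not come from the study.

```python
import math


def odds_ratio_with_ci(
    beta: float, se: float, z: float = 1.96
) -> tuple[float, float, float]:
    """Convert a logistic-regression coefficient and its standard error
    into an odds ratio with a Wald confidence interval (z = 1.96 for 95%)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error; not values from the study.
or_, lower, upper = odds_ratio_with_ci(beta=0.693, se=0.25)
```

An interval that excludes 1 indicates an association at the 5% level under the Wald approximation; whether the authors used Wald or profile-likelihood intervals is not stated in the abstract.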

Results:

HPI-related subjective information appeared in 72.8% of patient questionnaires and 80.6% of physician records. However, 48.6% of physician records missed portions of patient-reported information, while 70.8% included additional information absent from questionnaires. Physicians frequently omitted details, particularly regarding onset, quality, severity, and modifying factors. Vague symptoms were more likely to be omitted. Documentation practices varied across departments. Record alignment improved when the patients themselves referenced investigations or referrals.

Conclusion:

Discrepancies between patient-reported and physician-documented subjective information are common and may affect diagnostic accuracy and care continuity. Enhancing patient questionnaires, supporting interactive history taking, and preserving original patient expressions may help bridge documentation gaps. These findings support the development of tools and strategies that better integrate patient narratives into clinical documentation.
Informatics in Medicine Unlocked, Volume 59, Article 101707
Citations: 0
Cross-national survey on the termination of Digital Contact Tracing apps: Have we killed the goose that lays the golden eggs?
Q1 Medicine Pub Date : 2025-01-01 DOI: 10.1016/j.imu.2025.101694
Yuki Kamei , Wataru Tanabe , Manabu Ichikawa , Takashi Okumura
During the COVID-19 pandemic, Digital Contact Tracing (DCT) tools were deployed worldwide as critical Non-Pharmaceutical Interventions aimed at controlling virus transmission. While simulation studies and some real-world evaluations suggested their potential effectiveness, many were discontinued before the pandemic ended. The reasons behind these decisions and the full lifecycle of these applications remain poorly documented. To address this gap, we conducted a cross-national multi-lingual survey on the status and discontinuation of contact tracing apps.
We developed a registry of countries and their DCT apps by combining existing study results with new online surveys. For each app, we manually collected data on operational status, reasons for termination, and contextual factors. A qualitative analysis was then conducted to identify common patterns and their potential association with national pandemic trajectories.
The registry includes 184 DCT apps across 158 countries and regions. Among these, 45.7% had been terminated by the time of analysis. Termination reasons were categorized into five primary areas: pandemic stage, government policy shifts, privacy concerns, technical challenges, and user acceptance issues. Notably, apps that did not use the Google/Apple Exposure Notification framework were more likely to face privacy and technical barriers, contributing to early shutdowns. We also observed cases where app termination was followed by infection surges.
This study showed that the effectiveness and continuity of DCT apps depend not only on technical performance but also on strategic alignment with infection control measures and adequate supporting resources. Based on these findings, future DCT systems should be designed to remain viable throughout pandemics.
Informatics in Medicine Unlocked, Volume 59, Article 101694
Citations: 0