Biodata Mining: Latest Articles

Genotype subtyping approach to identify unnoticed variants in diseases from GWAS data.
IF 6.1; Zone 3, Biology; Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2026-01-10; DOI: 10.1186/s13040-025-00512-2
Debora Garza-Hernandez, Emmanuel Martinez-Ledesma, Victor Trevino
Biodata Mining; pages: 8; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12875041/pdf/
Citations: 0
Generalization or mirage? Data leakage and reported performance in neonatal EEG seizure detection models: a systematic review.
IF 6.1; Zone 3, Biology; Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2026-01-08; DOI: 10.1186/s13040-025-00516-y
Geletaw Sahle Tegenaw, Hailin Song, Tomas Ward
Citations: 0
Cross-cohort genetic risk prediction for Alzheimer's disease: a transfer learning approach using GWAS and deep learning models.
IF 6.1; Zone 3, Biology; Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-12-22; DOI: 10.1186/s13040-025-00506-0
Isibor Kennedy Ihianle, Wathsala Samarasekara, Keeley Brookes, Pedro Machado
Biodata Mining; pages: 89; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12752400/pdf/
Citations: 0
SBT-Net: a tri-cue guided multimodal fusion framework for depression recognition.
IF 6.1; Zone 3, Biology; Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-12-22; DOI: 10.1186/s13040-025-00498-x
Yujie Huo, Weng Howe Chan, Ahmad Najmi Bin Amerhaider Nuar, Hongyu Gao

Early detection of depression is vital for public health, yet current multimodal methods often struggle with modality incompleteness, semantic inconsistency, and temporal fluctuation of emotion. To address these issues, this paper proposes SBT-Net, a novel Semantic-Bias-Trend guided framework for robust depression detection from audio and text data. The model integrates three modules: a semantically guided cross-modal gating (SGCMG) mechanism that dynamically filters effective modality features based on global semantic cues; a bias-guided tensor product attention (BG-TPA) mechanism that enhances fine-grained fusion and alignment between modalities; and an emotion trend modeling (ETM) module that captures the temporal evolution of depressive emotional states. We evaluate SBT-Net on two widely adopted benchmark datasets: DAIC-WOZ, which contains 189 interview sessions, and EATD-Corpus, which comprises 162 conversational samples. Experimental results show that SBT-Net performs strongly across multiple metrics, including 93.0% accuracy, an F1 score of 0.93, and a recall of 0.92, all of which surpass the competitive baselines. Ablation studies further validate the individual and synergistic contributions of each proposed module. These findings highlight the potential of integrating semantic guidance, bias-aware fusion, and emotional trend modeling to advance multimodal depression detection. Source code: https://github.com/ghy-yhg/SBT-Net
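
The abstract describes a gate that filters modality features using a global semantic cue. The sketch below illustrates that general idea only; it is a generic cue-conditioned gated blend, not the SGCMG module from the paper. The function name `cue_gated_fusion`, the weight layout, and the toy inputs are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def cue_gated_fusion(audio, text, cue, weights, bias):
    """Blend audio and text features with a gate computed from a global cue.

    Hypothetical formulation (not taken from the paper):
        gate[i]  = sigmoid(dot(weights[i], cue) + bias[i])
        fused[i] = gate[i] * audio[i] + (1 - gate[i]) * text[i]
    """
    fused = []
    for a, t, w_row, b in zip(audio, text, weights, bias):
        gate = sigmoid(sum(w * c for w, c in zip(w_row, cue)) + b)
        fused.append(gate * a + (1.0 - gate) * t)
    return fused


# Hypothetical toy inputs: two feature dimensions and a two-value global cue.
audio = [1.0, 2.0]
text = [3.0, 4.0]
cue = [0.5, -0.5]
zero_weights = [[0.0, 0.0], [0.0, 0.0]]

# With zero weights and zero bias every gate is 0.5, so the result is the mean.
fused = cue_gated_fusion(audio, text, cue, zero_weights, bias=[0.0, 0.0])
```

A large positive bias pushes each gate towards 1 and the output towards the audio features; a large negative bias favours the text features. In a trained model the weights and bias would be learned rather than fixed by hand.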

Biodata Mining; volume: 18(1); pages: 86; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12723886/pdf/
Citations: 0
An unsupervised tool for biomarker discovery and cancer subtyping applied to glioblastoma.
IF 6.1; Zone 3, Biology; Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-12-22; DOI: 10.1186/s13040-025-00500-6
Roberta Coletti, J Orestes Cerdeira, Marcos Raydan, Marta B Lopes

Background: High-dimensional omics data often contain more variables than observations, which can lead to overfitting and negatively impact the results of classical data analysis methods. To address the issue, supervised variable selection methods are often used, incorporating penalty terms into the model. While effective for selecting task-specific variables, this approach may not preserve the overall dataset structure for multiple downstream analyses. This study aims to evaluate unsupervised variable selection approaches and introduce a novel tool that improves data interpretability while maintaining biological information.

Results: We assessed multiple unsupervised variable selection techniques to identify a representative subset of the original dataset. Based on this evaluation, we developed TRIM-IT, a computational tool that integrates unsupervised variable selection, clustering, survival analysis, and differential gene expression analysis. TRIM-IT was applied to glioblastoma transcriptomics data, uncovering three distinct patient clusters. These clusters correlated with tumor histology, exhibited significantly different survival outcomes, and revealed molecular profiles that suggest potential biomarker candidates.

Conclusion: TRIM-IT provides a novel approach for analyzing high-dimensional omics data while preserving key biological insights. Its ability to identify meaningful patient subgroups and molecular signatures highlights its applicability across various biomedical research contexts. The tool is implemented in R and the code is publicly available for reproduction and adaptation to other studies.
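
To make "unsupervised variable selection" concrete, the sketch below keeps the columns with the highest variance without using any outcome labels. This is a simple stand-in technique, not the selection method used in TRIM-IT (which is implemented in R); the function names and the toy expression matrix are hypothetical.

```python
from statistics import pvariance


def top_variance_columns(rows, k):
    """Return the sorted indices of the k columns with the largest variance."""
    n_cols = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def select_columns(rows, columns):
    """Keep only the requested columns of a row-major matrix."""
    return [[row[j] for j in columns] for row in rows]


# Hypothetical toy matrix: 3 samples (rows) by 3 variables (columns).
expression = [
    [5.0, 0.1, 9.0],
    [5.1, 0.9, 1.0],
    [4.9, 0.5, 5.0],
]
kept = top_variance_columns(expression, k=2)
reduced = select_columns(expression, kept)
```

Column 0 barely varies across samples, so it is dropped, and the reduced matrix keeps columns 1 and 2. Because no labels are involved, the same reduced matrix could feed several downstream steps such as clustering or survival analysis.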

Biodata Mining; volume: 18(1); pages: 85; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12720448/pdf/
Citations: 0
Deep learning to predict emergency department revisit using static and dynamic features (Deep Revisit): development and validation study.
IF 6.1; Zone 3, Biology; Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-12-20; DOI: 10.1186/s13040-025-00509-x
Su-Yin Hsu, Jhe-Yi Jhu, Jun-Wan Gao, Chien-Hua Huang, Chu-Lin Tsai, Li-Chen Fu

Background: Emergency Department (ED) revisits represent a critical issue in emergency medicine. Identifying high-risk revisit cases (revisits with intensive care unit admissions, cardiac arrest, or requiring emergency surgery) is particularly important. While prior studies have explored machine learning models for ED revisit prediction, few deep learning approaches exist, and dynamic features remain underutilized.

Methods: We used data from National Taiwan University Hospital (NTUH), incorporating both static (e.g., age, sex, triage) and dynamic (vital signs) features. A preprocessing strategy was developed to handle temporal irregularities. We proposed a hybrid deep learning model combining a Temporal Convolutional Network (TCN) and a feature tokenizer (FT)-Transformer to integrate static and short-term dynamic information.

Results: We evaluated our model on NTUH 2016-2019 data, achieving an area under the receiver operating characteristic curve (AUROC) of 0.8453 and an area under the precision-recall curve (AUPRC) of 0.0935 for high-risk revisits (base rate = 0.01), and an AUROC of 0.7250 and an AUPRC of 0.2005 for general revisits (base rate = 0.042). The model maintained robust performance when validated on 2020-2022 data. Compared with the static-only logistic regression baseline, our model improved AUPRC from 0.0288 to 0.0935 and precision from 0.0281 to 0.0428.

Conclusion: Our model significantly outperformed the static-only baseline. It demonstrates the effectiveness of multimodal clinical data fusion in improving ED revisit prediction and supporting clinical decision-making.
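
The AUPRC values above are easier to read against the base rate, because a classifier that ranks cases at random has an expected AUPRC of roughly the base rate. The sketch below computes average precision (a common AUPRC estimate) and the ratio of a reported AUPRC to its base rate. The function names and the toy labels are assumptions; only the four reported numbers come from the abstract.

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision at each true-positive rank."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def lift_over_base_rate(auprc, base_rate):
    """How many times better than a random ranking the AUPRC is."""
    return auprc / base_rate


# Values reported in the abstract.
high_risk_lift = lift_over_base_rate(0.0935, 0.01)   # about 9.35
general_lift = lift_over_base_rate(0.2005, 0.042)    # about 4.77

# Hypothetical toy ranking to exercise average_precision.
toy_ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

Read this way, the high-risk model has the lower absolute AUPRC but the larger improvement over chance, which is a typical pattern when the positive class is rare.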

Biodata Mining; pages: 88; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12750547/pdf/
Citations: 0
AI-Driven SaO2 prediction from pulse oximetry and electronic health records.
IF 6.1; Zone 3, Biology; Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-12-20; DOI: 10.1186/s13040-025-00511-3
JiWon Woo, Orian Stapleton, Jay Luo, Chao Cheng Chuang, Yolanda Su, Sreenidhi Sankararaman, Esanika Mukherjee, Nikita Sivakumar, Katherine Calligy, Summer Duffy, Rebecca Mosier, Joseph Greenstein, Casey Overby Taylor, Danielle Gottlieb Sen
Biodata Mining; pages: 90; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12752230/pdf/
Citations: 0
Subphenotype heterogeneity to guide predictive enrichment in acute kidney injury: insights from machine learning and target trial emulation.
IF 6.1; Zone 3, Biology; Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-12-20; DOI: 10.1186/s13040-025-00514-0
Jiayang Li, Mingyi Zhao, Qingnan He
Biodata Mining; pages: 87; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12752266/pdf/
Citations: 0
Deep survival analysis from adult and pediatric electrocardiograms: a multi-center benchmark study.
IF 6.1; Zone 3, Biology; Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-12-17; DOI: 10.1186/s13040-025-00510-4
Platon Lukyanenko, Joshua Mayourian, Mingxuan Liu, John K Triedman, Sunil J Ghelani, William G La Cava

Background: Artificial intelligence applied to electrocardiography (AI-ECG) has recently shown potential for mortality prediction, but heterogeneous approaches and private datasets have limited generalizable insights into AI methodologies fit for this purpose. To address this, we systematically evaluated model design choices across three large medical center cohorts: Beth Israel Deaconess (MIMIC-IV; n = 795,546 ECGs, United States), Telehealth Network of Minas Gerais (Code-15; n = 345,779, Brazil), and Boston Children's Hospital (BCH; n = 255,379, United States).

Results: We comprehensively evaluated models to predict all-cause mortality, comparing horizon-based classification and deep survival methods across various neural architectures, including convolutional neural networks and transformers. We also benchmarked against demographic-only and gradient boosting baselines. Top models yielded good performance (median concordance, Code-15: 0.83; MIMIC-IV: 0.78; BCH: 0.81). Incorporating age and sex improved performance across all datasets. Classifier-Cox models exhibited site-dependent sensitivity to horizon choice (median Pearson's R, Code-15: 0.35; MIMIC-IV: -0.71; BCH: 0.37). External validation reduced concordance, and in some cases, demographic-only models outperformed externally trained AI-ECG models on Code-15. However, models trained on multi-site data outperformed site-specific models by margins ranging from 5% to 22%.

Conclusions: These findings highlight several key factors for robust AI-ECG deployment. Deep survival methods consistently provided advantages over horizon-based classifiers, while inclusion of demographic covariates such as age and sex improved predictive performance across sites. The sensitivity of classifier-based models to horizon selection underscores the need for site-specific calibration. The multi-site experiment reveals that cross-cohort training, even between adult and pediatric cohorts, can substantially improve performance on those cohorts compared to cohort-specific training. Together, these results emphasize the importance of model type, demographic features, and training data diversity in developing AI-ECG models that can be reliably applied across populations.
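
The concordance figures reported above measure how often a model ranks two patients in the correct order of survival. A minimal sketch of Harrell's concordance index follows, assuming right-censored data and a risk score where higher means earlier death. It is not the evaluation code from the study; the function name and toy data are hypothetical, and pairs with tied times are simply skipped.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    A pair (i, j) is comparable when subject i has the shorter follow-up time
    and an observed event (events[i] == 1). The pair is concordant when the
    subject who died earlier also has the higher risk score. Ties in risk
    count as half-concordant. Pairs with equal times are ignored.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up times, event flags (0 = censored), risks.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
mixed = concordance_index(times, events, [0.9, 0.1, 0.5, 0.7])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported medians of 0.78 to 0.83 sit well above chance.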

Biodata Mining; pages: 6; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12821881/pdf/
Citations: 0
CatBoost with physics-based metaheuristics for thyroid cancer recurrence prediction.
IF 6.1; Zone 3, Biology; Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-12-09; DOI: 10.1186/s13040-025-00494-1
Proshenjit Sarker, Kwonhue Choi, Abdullah-Al Nahid, Md Abdus Samad
Biodata Mining; volume: 18(1); pages: 84; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12687482/pdf/
Citations: 0