Biodata Mining: latest articles

A probabilistic approach for building disease phenotypes across electronic health records.
IF 4; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-06-11; DOI: 10.1186/s13040-025-00454-9
David Vidmar, Jessica De Freitas, Will Thompson, John M Pfeifer, Brandon K Fornwalt, Noah Zimmerman, Riccardo Miotto, Ruijun Chen
Citations: 0
subMG automates data submission for metagenomics studies.
IF 4; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-06-05; DOI: 10.1186/s13040-025-00453-w
Tom Tubbesing, Andreas Schlüter, Alexander Sczyrba

Background: Publicly available metagenomics datasets are crucial for ensuring the reproducibility of scientific findings and supporting contemporary large-scale studies. However, submitting a comprehensive metagenomics dataset is both cumbersome and time-consuming. It requires including sample information, sequencing reads, assemblies, binned contigs, metagenome-assembled genomes (MAGs), and appropriate metadata. As a result, metagenomics studies are often published with incomplete datasets or, in some cases, without any data at all. subMG addresses this challenge by simplifying and automating the data submission process, thereby encouraging broader and more consistent data sharing.

Results: subMG streamlines the process of submitting metagenomics study results to the European Nucleotide Archive (ENA) by allowing researchers to input files and metadata from their studies in a single form and automating downstream tasks that otherwise require extensive manual effort and expertise. The tool comes with comprehensive documentation as well as example data tailored for different use cases and can be operated via the command-line or a graphical user interface (GUI), making it easily deployable to a wide range of potential users.

Conclusions: By simplifying the submission of genome-resolved metagenomics study datasets, subMG significantly reduces the time, effort, and expertise required from researchers, thus paving the way for more numerous and comprehensive data submissions in the future. An increased availability of well-documented and FAIR data can benefit future research, particularly in meta-analyses and comparative studies.

Citations: 0
Network-based analyses of multiomics data in biomedicine.
IF 6.1; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-05-27; DOI: 10.1186/s13040-025-00452-x
Rachit Kumar, Joseph D Romano, Marylyn D Ritchie

Network representations of data are designed to encode relationships between concepts as sets of edges between nodes. Human biology is inherently complex and is represented by data that often exists in a hierarchical nature. One canonical example is the relationship that exists within and between various -omics datasets, including genomics, transcriptomics, and proteomics, among others. Encoding such data in a network-based or graph-based representation allows the explicit incorporation of such relationships into various biomedical big data tasks, including (but not limited to) disease subtyping, interaction prediction, biomarker identification, and patient classification. This review will present various existing approaches in using network representations and analysis of data in multiomics in the framework of deep learning and machine learning approaches, subdivided into supervised and unsupervised approaches, to identify benefits and drawbacks of various approaches as well as the possible next steps for the field.
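
As a minimal illustration of the edge-set encoding this abstract describes (a sketch, not taken from the review itself), a multiomics network can be held as an adjacency map whose nodes are (omics layer, identifier) pairs. The helper names and the gene, transcript, and protein identifiers below are hypothetical placeholders.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node_a, node_b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def neighbors_in_layer(graph, node, layer):
    """Return the sorted neighbors of `node` that belong to one omics layer."""
    return sorted(n for n in graph.get(node, ()) if n[0] == layer)


# Hypothetical nodes: each node is (omics layer, identifier).
edges = [
    (("genomics", "GENE_A"), ("transcriptomics", "TRANSCRIPT_A")),
    (("transcriptomics", "TRANSCRIPT_A"), ("proteomics", "PROTEIN_A")),
    (("proteomics", "PROTEIN_A"), ("proteomics", "PROTEIN_B")),
]

graph = build_graph(edges)
# Within-layer and cross-layer relationships are both just edges.
same_layer = neighbors_in_layer(graph, ("proteomics", "PROTEIN_A"), "proteomics")
cross_layer = neighbors_in_layer(graph, ("proteomics", "PROTEIN_A"), "transcriptomics")
```

Keeping the layer in the node key lets one structure carry relationships both within and between omics datasets, which is the property the abstract highlights.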

Citations: 0
Correction: Learning the therapeutic targets of acute myeloid leukemia through multiscale human interactome network and community analysis.
IF 4; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-05-22; DOI: 10.1186/s13040-025-00451-y
Suruthy Sivanathan, Ting Hu
Citations: 0
Joint models in big data: simulation-based guidelines for required data quality in longitudinal electronic health records.
IF 4; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-05-13; DOI: 10.1186/s13040-025-00450-z
Berit Hunsdieck, Christian Bender, Katja Ickstadt, Johanna Mielke

Background: Over the past decade, an increase in the use of electronic health record (EHR) data by office-based physicians and hospitals has been reported. However, these data come with challenges regarding completeness and data quality, and it is unclear, especially for more complex models, how these characteristics influence model performance.

Methods: In this paper, we focus on joint models, which combine longitudinal modelling with survival modelling to incorporate all available information. The aim of this paper is to establish simulation-based guidelines for the necessary quality of longitudinal EHR data so that joint models perform better than Cox models. We conducted an extensive simulation study by systematically and transparently varying different characteristics of data quality, e.g., measurement frequency, noise, and heterogeneity between patients. We apply the joint models and evaluate their performance relative to traditional Cox survival modelling techniques.

Results: Key findings suggest that biomarker changes before disease onset must be consistent within similar patient groups. With increasing noise and a higher measurement density, the joint model surpasses the traditional Cox regression model in terms of model performance. We illustrate the usefulness and limitations of the guidelines with two real-world examples, namely the influence of serum bilirubin on primary biliary liver cirrhosis and the influence of the estimated glomerular filtration rate on chronic kidney disease.

Citations: 0
Correction: Motif clustering and digital biomarker extraction for free-living physical activity analysis.
IF 4; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-05-12; DOI: 10.1186/s13040-025-00449-6
Ya-Ting Liang, Charlotte Wang
Citations: 0
Revisiting the approaches to DNA damage detection in genetic toxicology: insights and regulatory implications.
IF 4; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-05-06; DOI: 10.1186/s13040-025-00447-8
Sulaiman Mohammed Alnasser

Genetic toxicology is crucial for evaluating the potential risks of chemicals and drugs to human health and the environment. The emergence of high-throughput technologies has transformed this field, providing more efficient, cost-effective, and ethically sound methods for genotoxicity testing. It utilizes advanced screening techniques, including automated in vitro assays and computational models to rapidly assess the genotoxic potential of thousands of compounds simultaneously. This review explores the transformation of traditional in vitro and in vivo methods into computational models for genotoxicity assessment. By leveraging advances in machine learning, artificial intelligence, and high-throughput screening, computational approaches are increasingly replacing conventional methods. Coupling conventional screening with artificial intelligence (AI) and machine learning (ML) models has significantly enhanced their predictive capabilities, enabling the identification of genotoxicity signatures tied to molecular structures and biological pathways. Regulatory agencies increasingly support such methodologies as humane alternatives to traditional animal models, provided they are validated and exhibit strong predictive power. Standardization efforts, including the establishment of common endpoints across testing approaches, are pivotal for enhancing comparability and fostering consensus in toxicological assessments. Initiatives like ToxCast exemplify the successful incorporation of HTS data into regulatory decision-making, demonstrating that well-interpreted in vitro results can align with in vivo outcomes. Innovations in testing methodologies, global data sharing, and real-time monitoring continue to refine the precision and personalization of risk assessments, promising a transformative impact on safety evaluations and regulatory frameworks.

Citations: 0
Learning the therapeutic targets of acute myeloid leukemia through multiscale human interactome network and community analysis.
IF 4; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-05-02; DOI: 10.1186/s13040-025-00444-x
Suruthy Sivanathan, Ting Hu

Acute myeloid leukemia (AML) is caused by proliferation of mutated myeloid progenitor cells. The standard chemotherapy regimen does not efficiently cause remission as there is a high relapse rate. Resistance acquired by leukemic stem cells is suggested to be one of the root causes of relapse. Therefore, there is an urgency to develop new drugs for therapy. Repurposing approved drugs for AML can provide a cost-friendly, time-efficient, and affordable alternative. The multiscale interactome network is a computational tool that can identify potential therapeutic candidates by comparing mechanisms of the drug and disease. Communities that could be potentially experimentally validated are detected in the multiscale interactome network using the algorithm CRank. The results are evaluated through literature search and Gene Ontology (GO) enrichment analysis. In this research, we identify therapeutic candidates for AML and their mechanisms from the interactome, and isolate prioritized communities that are dominant in the therapeutic mechanism that could potentially be used as a prompt for pre-clinical/translational research (e.g. bioinformatics, laboratory research) to focus on biological functions and mechanisms that are associated with the disease and drug. This method may allow for an efficient and accelerated discovery of potential candidates for AML, a rapidly progressing disease.

Citations: 0
Inter-organ correlation based multi-task deep learning model for dynamically predicting functional deterioration in multiple organ systems of ICU patients.
IF 4; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-04-16; DOI: 10.1186/s13040-025-00445-w
Zhixuan Zeng, Yang Liu, Shuo Yao, Minjie Lin, Xu Cai, Wenbin Nan, Yiyang Xie, Xun Gong

Background: Functional deterioration (FD) of various organ systems is the major cause of death in ICU patients, but few studies have proposed an effective multi-task (MT) model to predict FD of multiple organs simultaneously. This study proposes an MT deep learning model, named the inter-organ correlation based multi-task model (IOC-MT), to dynamically predict FD in six organ systems.

Methods: Three public ICU databases were used for model training and validation. The IOC-MT was designed based on the routine MT deep learning framework, but it used a Graph Attention Networks (GAT) module to capture inter-organ correlation and an adaptive adjustment mechanism (AAM) to adjust prediction. We compared the IOC-MT to five single-task (ST) baseline models, including three deep models (LSTM-ST, GRU-ST, Transformer-ST) and two machine learning models (GRU-ST, RF-ST), and performed an ablation study to assess the contribution of important components in IOC-MT. Model discrimination was evaluated by AUROC and AUPRC, and model calibration was assessed by the calibration curve. The attention weight and adjustment coefficient were analyzed at both the overall and individual levels to show the AAM of IOC-MT.
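
For readers unfamiliar with the discrimination metric named above, the following is a minimal sketch of AUROC computed from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This is an illustration only, not the authors' evaluation code; the function name and the toy labels and scores are assumptions.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 outcomes; scores: predicted risks, same length.
    Ties between a positive and a negative score count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): three of the four positive/negative pairs are
# ranked correctly, so the result is 0.75.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)
```

The quadratic loop is kept for clarity; a rank-based formulation gives the same value more efficiently on large cohorts.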

Results: The IOC-MT had comparable discrimination and calibration to LSTM-ST, GRU-ST and Transformer-ST for most organs under different gap windows in the internal and external validation, and clearly outperformed GRU-ST and RF-ST. The ablation study showed that the GAT, AAM and missing indicator could improve the overall performance of the model. Furthermore, the inter-organ correlation and prediction adjustment of IOC-MT were intuitive and comprehensible, and also had biological plausibility.

Conclusions: The IOC-MT is a promising MT model for dynamically predicting FD in six organ systems. It can capture inter-organ correlation and adjust the prediction for one organ based on aggregated information from the other organs.

Citations: 0
Enhancing clinical outcome predictions through effective sample size evaluation in graph-based digital twin modeling.
IF 4, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2025-04-15, DOI: 10.1186/s13040-025-00446-9
Xi Li, Jui-Hsuan Chang, Mythreye Venkatesan, Zhiping Paul Wang, Jason H Moore

Digital twins in healthcare offer an innovative approach to precision diagnosis, prognosis, and treatment. SynTwin, a novel computational methodology to generate digital twins using synthetic data and network science, has previously shown promise for improving prediction of breast cancer mortality. In this study, we validate SynTwin using population-level data for different cancer types from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (USA). We assess its predictive accuracy across cancer types with varying sample sizes (n = 1,000 to 30,000 records), mortality rates (35% to 60%), and study designs, revealing insights into the strengths and limitations of digital twins derived from synthetic data in mortality prediction. We also evaluate the effect of sample size (n = 1,000 to 70,000 records) on predictive accuracy for selected cancers (non-Hodgkin lymphoma, bladder, and colorectal cancers). Our results indicate that for larger datasets (n > 10,000), including digital twins in the nearest network neighbor prediction model significantly improves performance compared with using real patients alone. Specifically, AUROCs ranged from 0.828 to 0.884 for cancers such as cervix uteri and ovarian cancer with digital twins, compared with 0.720 to 0.858 when using real patient data. Similarly, among the three selected cancers, AUROCs using digital twins exceeded AUROCs using real patients alone by at least 0.06, with the variance in performance narrowing as the sample size increased. These results highlight the benefit of network-based digital twins, while emphasizing the importance of considering effective sample size when developing predictive models like SynTwin.
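
The AUROC values quoted in this abstract measure how well predicted risk scores separate patients who died from those who survived. As a minimal sketch of that metric only, assuming binary mortality labels and one numeric risk score per patient, the pairwise definition can be written as below. The function name `auroc` and the toy data are illustrative assumptions and are not taken from SynTwin or the SEER data.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case outscores a random negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died, 0 = survived.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
toy_auc = auroc(toy_labels, toy_scores)
```

This quadratic pairwise form is chosen for readability; on datasets of the size reported in the abstract, a rank-based implementation from an established library would be the more practical choice.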

Biodata Mining, vol. 18, no. 1, p. 30. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11998210/pdf/
Citations: 0