Biodata Mining: latest articles

DBSCAN and DBCV application to open medical records heterogeneous data for identifying clinically significant clusters of patients with neuroblastoma.
IF 4; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-06-12; DOI: 10.1186/s13040-025-00455-8
Davide Chicco, Luca Oneto, Davide Cangelosi

Neuroblastoma is a common pediatric cancer that affects thousands of infants worldwide, especially children under five years of age. Although recovery is possible in 80% of neuroblastoma cases, only 40% of patients with high-risk stage four neuroblastoma survive. Electronic health records of patients with this disease contain valuable data that biomedical informatics researchers can analyze with computational intelligence and statistical software. Unsupervised machine learning methods, in particular, can identify clinically significant subgroups of patients, which can lead to new therapies or medical treatments for future patients belonging to the same subgroups. However, access to these datasets is often restricted, making it difficult to obtain them for independent research projects. In this study, we retrieved three open datasets containing data from patients diagnosed with neuroblastoma: the Genoa dataset and the Shanghai dataset from the Neuroblastoma Electronic Health Records Open Data Repository, and a dataset from the renowned TARGET-NBL program. We analyzed these datasets using several clustering techniques and measured the results with the DBCV (Density-Based Clustering Validation) index. Among these algorithms, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) was the only one that produced meaningful results. We scrutinized the two clusters of patient profiles identified by DBSCAN in the three datasets and recognized several relevant clinical variables that clearly partitioned the patients into two clusters that have clinical meaning in the neuroblastoma literature. Our results can have a significant impact on health informatics, because any computational analyst wishing to cluster small datasets of patients with a rare disease can choose DBSCAN and DBCV rather than more common methods such as k-Means and the Silhouette coefficient.
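
As a rough illustration of the density-based clustering this abstract refers to, the following pure-Python sketch implements textbook DBSCAN on made-up two-feature points. The data and the `eps` and `min_pts` values are hypothetical and are not taken from the study; the abstract does not say which software or parameters the authors used, and DBCV scoring is not implemented here.

```python
from math import dist

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical toy data: two dense groups and one isolated point.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (9.0, 0.5),
]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)
```

Unlike k-Means, this procedure does not need the number of clusters in advance and can leave isolated records unassigned, which is one reason it is often considered for small, heterogeneous datasets.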

Biodata Mining, vol. 18, no. 1, art. 40. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12164137/pdf/
Citations: 0
A probabilistic approach for building disease phenotypes across electronic health records.
IF 4; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-06-11; DOI: 10.1186/s13040-025-00454-9
David Vidmar, Jessica De Freitas, Will Thompson, John M Pfeifer, Brandon K Fornwalt, Noah Zimmerman, Riccardo Miotto, Ruijun Chen
Biodata Mining, vol. 18, no. 1, art. 39. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12153169/pdf/
Citations: 0
subMG automates data submission for metagenomics studies.
IF 4; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-06-05; DOI: 10.1186/s13040-025-00453-w
Tom Tubbesing, Andreas Schlüter, Alexander Sczyrba

Background: Publicly available metagenomics datasets are crucial for ensuring the reproducibility of scientific findings and supporting contemporary large-scale studies. However, submitting a comprehensive metagenomics dataset is both cumbersome and time-consuming. It requires including sample information, sequencing reads, assemblies, binned contigs, metagenome-assembled genomes (MAGs), and appropriate metadata. As a result, metagenomics studies are often published with incomplete datasets or, in some cases, without any data at all. subMG addresses this challenge by simplifying and automating the data submission process, thereby encouraging broader and more consistent data sharing.

Results: subMG streamlines the process of submitting metagenomics study results to the European Nucleotide Archive (ENA) by allowing researchers to input files and metadata from their studies in a single form and automating downstream tasks that otherwise require extensive manual effort and expertise. The tool comes with comprehensive documentation as well as example data tailored for different use cases and can be operated via the command-line or a graphical user interface (GUI), making it easily deployable to a wide range of potential users.

Conclusions: By simplifying the submission of genome-resolved metagenomics study datasets, subMG significantly reduces the time, effort, and expertise required from researchers, thus paving the way for more numerous and comprehensive data submissions in the future. An increased availability of well-documented and FAIR data can benefit future research, particularly in meta-analyses and comparative studies.

Biodata Mining, vol. 18, no. 1, art. 38. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12142852/pdf/
Citations: 0
Network-based analyses of multiomics data in biomedicine.
IF 6.1; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-05-27; DOI: 10.1186/s13040-025-00452-x
Rachit Kumar, Joseph D Romano, Marylyn D Ritchie

Network representations of data are designed to encode relationships between concepts as sets of edges between nodes. Human biology is inherently complex and is represented by data that are often hierarchical in nature. One canonical example is the relationship that exists within and between various -omics datasets, including genomics, transcriptomics, and proteomics, among others. Encoding such data in a network-based or graph-based representation allows the explicit incorporation of such relationships into various biomedical big data tasks, including (but not limited to) disease subtyping, interaction prediction, biomarker identification, and patient classification. This review presents existing approaches to network representation and analysis of multiomics data within deep learning and machine learning frameworks, subdivided into supervised and unsupervised approaches, and identifies their benefits and drawbacks as well as possible next steps for the field.
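
To make the idea of encoding relationships as edges between nodes concrete, here is a minimal sketch using only the Python standard library. The node names (`gene:G1`, `transcript:T1`, `protein:P1`, and so on) and the relation labels are hypothetical placeholders, not identifiers from the review.

```python
from collections import defaultdict, deque

# Hypothetical multi-omics edges: (source node, target node, relation).
edges = [
    ("gene:G1", "transcript:T1", "transcribed_to"),
    ("transcript:T1", "protein:P1", "translated_to"),
    ("protein:P1", "protein:P2", "interacts_with"),
    ("gene:G2", "transcript:T2", "transcribed_to"),
]


def build_adjacency(edge_list):
    """Undirected adjacency map: node -> set of (neighbor, relation)."""
    adj = defaultdict(set)
    for src, dst, rel in edge_list:
        adj[src].add((dst, rel))
        adj[dst].add((src, rel))
    return adj


def reachable(adj, start):
    """All nodes connected to `start`, found by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor, _ in adj[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


adj = build_adjacency(edges)
component = reachable(adj, "gene:G1")
print(sorted(component))
```

Prefixing each node with its omics layer keeps within-layer edges (protein to protein) and between-layer edges (gene to transcript) in one structure, which is the property the abstract highlights.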

Biodata Mining, vol. 18, no. 1, art. 37. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12117783/pdf/
Citations: 0
Correction: Learning the therapeutic targets of acute myeloid leukemia through multiscale human interactome network and community analysis.
IF 4; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-05-22; DOI: 10.1186/s13040-025-00451-y
Suruthy Sivanathan, Ting Hu
Biodata Mining, vol. 18, no. 1, art. 36. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12096567/pdf/
Citations: 0
Joint models in big data: simulation-based guidelines for required data quality in longitudinal electronic health records.
IF 4; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-05-13; DOI: 10.1186/s13040-025-00450-z
Berit Hunsdieck, Christian Bender, Katja Ickstadt, Johanna Mielke

Background: Over the past decade, an increase in the use of electronic health record (EHR) data by office-based physicians and hospitals has been reported. However, these data come with challenges regarding completeness and data quality, and it is unclear, especially for more complex models, how these characteristics influence performance.

Methods: In this paper, we focus on joint models, which combine longitudinal modelling with survival modelling to incorporate all available information. The aim of this paper is to establish simulation-based guidelines for the quality of longitudinal EHR data needed for joint models to perform better than Cox models. We conducted an extensive simulation study by systematically and transparently varying different characteristics of data quality, e.g., measurement frequency, noise, and heterogeneity between patients. We apply the joint models and evaluate their performance relative to traditional Cox survival modelling techniques.
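
For orientation, one common way to write a shared-parameter joint model is sketched below. This is a textbook-style formulation given as an assumption; the abstract does not state which specification the authors used, and all symbols ($m_i$, $b_i$, $\alpha$, $\gamma$, $w_i$, $D$) are introduced here for illustration only.

```latex
\begin{align}
  y_i(t) &= m_i(t) + \varepsilon_i(t), \qquad \varepsilon_i(t) \sim \mathcal{N}(0, \sigma^2), \\
  m_i(t) &= x_i(t)^\top \beta + z_i(t)^\top b_i, \qquad b_i \sim \mathcal{N}(0, D), \\
  h_i(t) &= h_0(t) \exp\!\bigl( \gamma^\top w_i + \alpha\, m_i(t) \bigr).
\end{align}
```

Here $y_i(t)$ is the biomarker measured for patient $i$, $m_i(t)$ is its noise-free trajectory, and $h_i(t)$ is the hazard with baseline $h_0(t)$ and baseline covariates $w_i$. A Cox model keeps the same hazard form but would typically plug in observed biomarker values rather than $m_i(t)$, which is one way to see why measurement noise and measurement frequency matter for the comparison.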

Results: Key findings suggest that biomarker changes before disease onset must be consistent within similar patient groups. With increasing noise and a higher measurement density, the joint model surpasses the traditional Cox regression model in terms of model performance. We illustrate the usefulness and limitations of the guidelines with two real-world examples, namely the influence of serum bilirubin on primary biliary liver cirrhosis and the influence of the estimated glomerular filtration rate on chronic kidney disease.

Biodata Mining, vol. 18, no. 1, art. 35. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12070788/pdf/
Citations: 0
Correction: Motif clustering and digital biomarker extraction for free-living physical activity analysis.
IF 4; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-05-12; DOI: 10.1186/s13040-025-00449-6
Ya-Ting Liang, Charlotte Wang
Biodata Mining, vol. 18, no. 1, art. 34. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12067653/pdf/
Citations: 0
Revisiting the approaches to DNA damage detection in genetic toxicology: insights and regulatory implications.
IF 4; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-05-06; DOI: 10.1186/s13040-025-00447-8
Sulaiman Mohammed Alnasser

Genetic toxicology is crucial for evaluating the potential risks of chemicals and drugs to human health and the environment. The emergence of high-throughput technologies has transformed this field, providing more efficient, cost-effective, and ethically sound methods for genotoxicity testing. High-throughput testing uses advanced screening techniques, including automated in vitro assays and computational models, to rapidly assess the genotoxic potential of thousands of compounds simultaneously. This review explores the transformation of traditional in vitro and in vivo methods into computational models for genotoxicity assessment. By leveraging advances in machine learning, artificial intelligence, and high-throughput screening (HTS), computational approaches are increasingly replacing conventional methods. Coupling conventional screening with artificial intelligence (AI) and machine learning (ML) models has significantly enhanced their predictive capabilities, enabling the identification of genotoxicity signatures tied to molecular structures and biological pathways. Regulatory agencies increasingly support such methodologies as humane alternatives to traditional animal models, provided they are validated and exhibit strong predictive power. Standardization efforts, including the establishment of common endpoints across testing approaches, are pivotal for enhancing comparability and fostering consensus in toxicological assessments. Initiatives like ToxCast exemplify the successful incorporation of HTS data into regulatory decision-making, demonstrating that well-interpreted in vitro results can align with in vivo outcomes. Innovations in testing methodologies, global data sharing, and real-time monitoring continue to refine the precision and personalization of risk assessments, promising a transformative impact on safety evaluations and regulatory frameworks.

Biodata Mining, vol. 18, no. 1, art. 33. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12054138/pdf/
Citations: 0
Learning the therapeutic targets of acute myeloid leukemia through multiscale human interactome network and community analysis.
IF 4; Zone 3 (Biology); Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2025-05-02; DOI: 10.1186/s13040-025-00444-x
Suruthy Sivanathan, Ting Hu

Acute myeloid leukemia (AML) is caused by proliferation of mutated myeloid progenitor cells. The standard chemotherapy regimen does not efficiently cause remission as there is a high relapse rate. Resistance acquired by leukemic stem cells is suggested to be one of the root causes of relapse. Therefore, there is an urgency to develop new drugs for therapy. Repurposing approved drugs for AML can provide a cost-friendly, time-efficient, and affordable alternative. The multiscale interactome network is a computational tool that can identify potential therapeutic candidates by comparing mechanisms of the drug and disease. Communities that could be potentially experimentally validated are detected in the multiscale interactome network using the algorithm CRank. The results are evaluated through literature search and Gene Ontology (GO) enrichment analysis. In this research, we identify therapeutic candidates for AML and their mechanisms from the interactome, and isolate prioritized communities that are dominant in the therapeutic mechanism that could potentially be used as a prompt for pre-clinical/translational research (e.g. bioinformatics, laboratory research) to focus on biological functions and mechanisms that are associated with the disease and drug. This method may allow for an efficient and accelerated discovery of potential candidates for AML, a rapidly progressing disease.

{"title":"Learning the therapeutic targets of acute myeloid leukemia through multiscale human interactome network and community analysis.","authors":"Suruthy Sivanathan, Ting Hu","doi":"10.1186/s13040-025-00444-x","DOIUrl":"10.1186/s13040-025-00444-x","url":null,"abstract":"<p><p>Acute myeloid leukemia (AML) is caused by proliferation of mutated myeloid progenitor cells. The standard chemotherapy regimen does not efficiently cause remission as there is a high relapse rate. Resistance acquired by leukemic stem cells is suggested to be one of the root causes of relapse. Therefore, there is an urgency to develop new drugs for therapy. Repurposing approved drugs for AML can provide a cost-friendly, time-efficient, and affordable alternative. The multiscale interactome network is a computational tool that can identify potential therapeutic candidates by comparing mechanisms of the drug and disease. Communities that could be potentially experimentally validated are detected in the multiscale interactome network using the algorithm CRank. The results are evaluated through literature search and Gene Ontology (GO) enrichment analysis. In this research, we identify therapeutic candidates for AML and their mechanisms from the interactome, and isolate prioritized communities that are dominant in the therapeutic mechanism that could potentially be used as a prompt for pre-clinical/translational research (e.g. bioinformatics, laboratory research) to focus on biological functions and mechanisms that are associated with the disease and drug. 
This method may allow for an efficient and accelerated discovery of potential candidates for AML, a rapidly progressing disease.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"32"},"PeriodicalIF":4.0,"publicationDate":"2025-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12049071/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144052657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Inter-organ correlation based multi-task deep learning model for dynamically predicting functional deterioration in multiple organ systems of ICU patients.
IF 4 Zone 3 Biology Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-04-16 DOI: 10.1186/s13040-025-00445-w
Zhixuan Zeng, Yang Liu, Shuo Yao, Minjie Lin, Xu Cai, Wenbin Nan, Yiyang Xie, Xun Gong

Background: Functional deterioration (FD) of various organ systems is the major cause of death in ICU patients, but few studies have proposed an effective multi-task (MT) model to predict FD of multiple organs simultaneously. This study proposes an MT deep learning model, named the inter-organ correlation based multi-task model (IOC-MT), to dynamically predict FD in six organ systems.

Methods: Three public ICU databases were used for model training and validation. The IOC-MT was designed based on the routine MT deep learning framework, but it used a Graph Attention Network (GAT) module to capture inter-organ correlation and an adaptive adjustment mechanism (AAM) to adjust predictions. We compared the IOC-MT to five single-task (ST) baseline models, including three deep models (LSTM-ST, GRU-ST, Transformer-ST) and two machine learning models (GRU-ST, RF-ST), and performed an ablation study to assess the contribution of important components in IOC-MT. Model discrimination was evaluated by AUROC and AUPRC, and model calibration was assessed by the calibration curve. The attention weights and adjustment coefficients were analyzed at both the overall and individual levels to show the AAM of IOC-MT.
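The two discrimination metrics named in the Methods can be illustrated with a minimal, dependency-free sketch. This is not the authors' evaluation code: the function names `auroc` and `average_precision` and the toy labels and scores are assumptions made for illustration, and AUPRC is approximated here by step-wise average precision.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count pairwise wins; a tie counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Tied scores are not grouped; they keep their input order after sorting.
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank predictions from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: 1 = deterioration observed, 0 = not observed.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))              # 0.75
print(average_precision(toy_labels, toy_scores))  # approximately 0.83
```

AUPRC is usually reported alongside AUROC when positive events are rare, because precision reflects the false-alarm burden that AUROC can hide.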

Results: The IOC-MT had comparable discrimination and calibration to LSTM-ST, GRU-ST and Transformer-ST for most organs under different gap windows in the internal and external validation, and clearly outperformed GRU-ST and RF-ST. The ablation study showed that the GAT, the AAM and the missing indicator could improve the overall performance of the model. Furthermore, the inter-organ correlation and prediction adjustment of IOC-MT were intuitive and comprehensible, and were also biologically plausible.

Conclusions: The IOC-MT is a promising MT model for dynamically predicting FD in six organ systems. It can capture inter-organ correlation and adjust the prediction for one organ based on aggregated information from the other organs.
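How information from the other organs can be aggregated into one organ's representation may be sketched with a generic single-head graph attention pass. This is an illustrative assumption, not the IOC-MT architecture: the abstract does not give the layer details, so the fully connected graph with self-loops, the function name `graph_attention`, and all weights and feature values below are hypothetical placeholders.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def graph_attention(h: Matrix, w: Matrix, a: Vector) -> Tuple[Matrix, Matrix]:
    """One attention pass over a fully connected graph with self-loops.

    h: one feature vector per node (here, one per organ system)
    w: shared linear projection applied to every node
    a: attention vector scoring the concatenation [W h_i || W h_j]
    Returns the updated node features and the weights alpha[i][j].
    """
    z = [matvec(w, hi) for hi in h]
    alpha: Matrix = []
    out: Matrix = []
    for zi in z:
        # Unnormalized score of every node j as seen from node i.
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, zi + zj))) for zj in z]
        weights = softmax(scores)
        alpha.append(weights)
        # New feature of node i: attention-weighted sum over all nodes.
        out.append([sum(wt * zj[k] for wt, zj in zip(weights, z))
                    for k in range(len(zi))])
    return out, alpha


# Six hypothetical organ nodes with two made-up features each.
organ_features = [[0.2, 1.0], [0.5, 0.1], [0.9, 0.3],
                  [0.4, 0.4], [0.0, 0.7], [0.6, 0.8]]
projection = [[1.0, 0.0], [0.0, 1.0]]   # identity, for readability
attention_vector = [0.5, -0.5, 1.0, 0.25]
updated, alpha = graph_attention(organ_features, projection, attention_vector)
```

Each row of `alpha` sums to 1, so it can be read as how strongly one organ's updated representation draws on each of the six organs.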

Citations: 0