Transcriptomics is the study of RNA transcripts, the portion of the genome that is transcribed, in a specific cell, tissue, or organism. It provides insight into gene expression patterns, gene regulation, and the mechanisms underlying cellular processes. Community transcriptomics extends this approach to the RNA transcripts of environmental assemblages of organisms, with the aim of better understanding the interactions among community members. It requires successful extraction of RNA from a diverse set of organisms, followed by sequencing and analysis of the resulting reads, either by mapping them to a reference genome or by de novo assembly. Both the extraction protocols and the analysis steps can pose hurdles for community transcriptomics. This review covers advances in transcriptomic techniques and assesses the viability of applying them to community transcriptomics.
Environmental community transcriptomics: strategies and struggles. Jeanet Mante, Kyra E Groover, Randi M Pullen. Briefings in Functional Genomics (2025). doi:10.1093/bfgp/elae033
Tyler Kolisnik, Faeze Keshavarz-Rahaghi, Rachel V Purcell, Adam N H Smith, Olin K Silander
Random Forest models are widely used in genomic data analysis and can offer insights into complex biological mechanisms, particularly when features influence the target in interactive, nonlinear, or nonadditive ways. Currently, some of the most efficient Random Forest methods in terms of computational speed are implemented in Python. However, many biologists use R for genomic data analysis, as R offers a unified platform for performing additional statistical analysis and visualization. Here, we present an R package, pyRforest, which integrates Python scikit-learn "RandomForestClassifier" algorithms into the R environment. pyRforest inherits the efficient memory management and parallelization of Python, and is optimized for classification tasks on large genomic datasets, such as those from RNA-seq. pyRforest offers several additional capabilities, including a novel rank-based permutation method for biomarker identification. This method can be used to estimate and visualize P-values for individual features, allowing the researcher to identify a subset of features for which there is robust statistical evidence of an effect. In addition, pyRforest includes methods for the calculation and visualization of SHapley Additive exPlanations values. Finally, pyRforest includes support for comprehensive downstream analysis for gene ontology and pathway enrichment. pyRforest thus improves the implementation and interpretability of Random Forest models for genomic data analysis by merging the strengths of Python with R. pyRforest can be downloaded at: https://www.github.com/tkolisnik/pyRforest with an associated vignette at https://github.com/tkolisnik/pyRforest/blob/main/vignettes/pyRforest-vignette.pdf.
pyRforest: a comprehensive R package for genomic data analysis featuring scikit-learn Random Forests in R. Briefings in Functional Genomics (2025). doi:10.1093/bfgp/elae038
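The abstract only summarizes pyRforest's rank-based permutation method. The sketch below therefore shows just the general idea behind permutation P-values for feature importance. It shuffles the class labels many times, recomputes each feature's importance, and counts how often the shuffled importance reaches the observed one. So that it runs without scikit-learn, a simple absolute difference in class means stands in for Random Forest importance. The function names are hypothetical, and this is not the package's API.

```python
import random
from statistics import mean


def mean_diff_importance(X, y):
    """Stand-in importance score: |mean(class 1) - mean(class 0)| for each feature."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return scores


def permutation_pvalues(X, y, n_perm=999, seed=0):
    """Empirical P-value per feature from a label-permutation null distribution."""
    rng = random.Random(seed)
    observed = mean_diff_importance(X, y)
    exceed = [0] * len(observed)
    labels = list(y)
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the feature-label link but keep class sizes
        null = mean_diff_importance(X, labels)
        for j, (obs, nul) in enumerate(zip(observed, null)):
            if nul >= obs:
                exceed[j] += 1
    # The +1 terms count the observed labelling itself, so P is never exactly 0.
    return [(1 + e) / (n_perm + 1) for e in exceed]


if __name__ == "__main__":
    y = [0] * 10 + [1] * 10
    # Feature 0 tracks the label; feature 1 alternates and is unrelated to it.
    X = [[float(label), float(i % 2)] for i, label in enumerate(y)]
    print(permutation_pvalues(X, y, n_perm=199))
```

In this toy example, feature 0 receives a small P-value. Feature 1 has no class difference at all, so every permutation matches it and its P-value is 1.0.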
Molecular epidemiology of foot-and-mouth disease (FMD), which primarily involves determining the serotype, topotype, and lineage of the virus, is crucial for implementing control strategies such as vaccination and containment. Existing approaches, including serotyping, are biological in nature; they are time-consuming and risky because they require handling live virus. Novel computational tools are therefore urgently needed for large-scale molecular epidemiology of the FMD virus. This study reports a comprehensive computational tool for FMD molecular epidemiology. Ten learning algorithms were first evaluated for serotype prediction from sequence-based features. The evaluation used cross-validation and ten independent secondary datasets, with accuracy, sensitivity, and 14 other metrics. The best-performing algorithms, those with the highest serotype prediction accuracy, were then evaluated for topotype and lineage prediction using cross-validation and were implemented in the computational tool. Finally, the performance of the developed approach was assessed on five previously unseen independent secondary datasets and on primary experimental data. Cross-validated and independent evaluation for serotype prediction showed that support vector machine, random forest, XGBoost, and AdaBoost outperformed the other algorithms. For topotype and lineage prediction, these four algorithms achieved accuracy ≥96% and precision ≥95% on cross-validated data. They are implemented in a web server (https://nifmd-bbf.icar.gov.in/MolEpidPred) that enables rapid molecular epidemiology of FMD virus. In independent validation, MolEpidPred achieved accuracies of ≥98%, ≥90%, and ≥80% for serotype, topotype, and lineage prediction, respectively. On wet-lab data, MolEpidPred returned results within seconds. When benchmarked against phylogenetic analysis, it achieved accuracies of 100%, 100%, and 96% for serotype, topotype, and lineage prediction, respectively. MolEpidPred thus provides an innovative platform for large-scale molecular epidemiology of FMD virus, which is crucial for tracking FMD virus infections and implementing control programs.
MolEpidPred: a novel computational tool for the molecular epidemiology of foot-and-mouth disease virus using VP1 nucleotide sequence data. Samarendra Das, Utkal Nayak, Soumen Pal, Saravanan Subramaniam. Briefings in Functional Genomics 24 (2025). doi:10.1093/bfgp/elaf001
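The MolEpidPred abstract mentions "sequence-based features" derived from VP1 nucleotide sequences but does not specify them. The sketch below assumes normalized k-mer frequencies, a common way to turn a nucleotide sequence into a fixed-length vector. Classifiers such as support vector machines or random forests can consume that vector directly. The function name and the choice of k are illustrative.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Return normalized counts of every possible k-mer over `alphabet`.

    K-mers containing other symbols (e.g. N or alignment gaps) are skipped,
    so the output always has len(alphabet) ** k entries in a fixed order.
    """
    seq = seq.upper()
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in vocab)
    if total == 0:
        return {kmer: 0.0 for kmer in vocab}
    return {kmer: counts[kmer] / total for kmer in vocab}


if __name__ == "__main__":
    features = kmer_frequencies("ACGTAC", k=2)
    print(features["AC"])  # 2 of the 5 dinucleotides are AC
```

The output always has the same keys in the same order, regardless of sequence length, so vectors from different isolates can be stacked into one feature matrix.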
Melanoma is characterized by rapid progression and high mortality, making early and accurate detection essential for improving patient outcomes. This paper presents a comprehensive review of significant advances in early melanoma detection, focusing on the integration of computer vision and deep learning techniques. It investigates how cutting-edge neural networks such as YOLO, GAN, Mask R-CNN, ResNet, and DenseNet can be applied to improve early melanoma detection and diagnosis. These models were critically evaluated for their capacity to enhance dermatological imaging and diagnostic accuracy, both of which are crucial for effective melanoma treatment. Our research demonstrates that these AI technologies refine image analysis and feature extraction and enhance processing capabilities in various clinical settings. We also emphasize the importance of comprehensive dermatological datasets, such as PH2, ISIC, DERMQUEST, and MED-NODE, for training and validating these sophisticated models. Integrating these datasets ensures that the AI systems are robust, versatile, and perform well under diverse conditions. The results of this study suggest that integrating AI into melanoma detection marks a significant advance in medical diagnostics. It has the potential to improve patient outcomes through more accurate and earlier detection. Future research should focus on further enhancing these technologies, integrating multimodal data, and improving the interpretability of AI decisions to facilitate clinical adoption. These steps would make melanoma diagnostics a more precise, personalized, and preventive healthcare service.
Advances in computer vision and deep learning-facilitated early detection of melanoma. Yantong Liu, Chuang Li, Feifei Li, Rubin Lin, Dongdong Zhang, Yifan Lian. Briefings in Functional Genomics 24 (2025). doi:10.1093/bfgp/elaf002
When the traditional random forest (RF) algorithm is used to select features from biostatistical data, large amounts of noisy data and the choice of parameters can distort the estimated importance of the selected features. This makes feature selection difficult to control. Preserving the accuracy of results in the presence of noisy data is therefore a challenge for the traditional RF algorithm, and directly removing noisy data generally introduces significant bias. In this study, we develop a new algorithm, standardized threshold and loops-based random forest (STLBRF), and apply it to feature gene selection from gene expression data. Building on the traditional RF algorithm, STLBRF combines backward elimination and K-fold cross-validation in an iterative loop governed by a standardized threshold, the error increment. The algorithm addresses shortcomings of existing gene selection methods. We compare ridge regression, lasso regression, elastic net regression, the traditional RF algorithm, and our improved RF algorithm through a quantitative analysis of three real gene expression datasets. To ensure the reliability of the results, we validate the effectiveness of the genes selected by each method using a Random Forest classifier. The results indicate that, compared with the other methods, STLBRF not only selects feature genes more effectively but also gives better control over the number of selected genes. Our method offers reliable technical support for feature expression analysis and biomarker selection research.
STLBRF: an improved random forest algorithm based on standardized-threshold for feature screening of gene expression data. Huini Feng, Ying Ju, Xiaofeng Yin, Wenshi Qiu, Xu Zhang. Briefings in Functional Genomics (2025). doi:10.1093/bfgp/elae048
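The abstract describes STLBRF as combining backward elimination and K-fold cross-validation in a loop that stops according to an error-increment threshold. The sketch below illustrates only that general control flow and is not the authors' algorithm. To stay self-contained, it replaces the Random Forest with a simple nearest-centroid classifier. At each step it drops whichever feature's removal yields the lowest cross-validated error, and it stops once any further removal would raise the error by more than `threshold`. All names and the exact threshold semantics are assumptions.

```python
from statistics import mean


def kfold_indices(n, k):
    """Split sample indices 0..n-1 into k interleaved folds (i % k == fold)."""
    return [list(range(fold, n, k)) for fold in range(k)]


def cv_error(X, y, features, k=5):
    """K-fold cross-validated error of a nearest-centroid classifier on `features`."""
    n = len(y)
    errors = 0
    for test_idx in kfold_indices(n, k):
        held_out = set(test_idx)
        train = [i for i in range(n) if i not in held_out]
        # Class centroids are computed on the training folds only.
        centroids = {
            label: [mean(X[i][j] for i in train if y[i] == label) for j in features]
            for label in sorted(set(y[i] for i in train))
        }
        for i in test_idx:
            def dist(label):
                return sum((X[i][j] - c) ** 2 for j, c in zip(features, centroids[label]))
            pred = min(centroids, key=dist)
            errors += pred != y[i]
    return errors / n


def backward_eliminate(X, y, threshold=0.0, k=5):
    """Backward elimination with a CV error-increment stopping rule."""
    features = list(range(len(X[0])))
    current = cv_error(X, y, features, k)
    while len(features) > 1:
        # Score every candidate removal; ties go to the lower feature index.
        trials = [
            (cv_error(X, y, [f for f in features if f != j], k), j) for j in features
        ]
        best_err, drop = min(trials)
        if best_err - current > threshold:
            break  # every further removal costs more than the allowed error increment
        features.remove(drop)
        current = best_err
    return features, current


if __name__ == "__main__":
    y = [0] * 10 + [1] * 10
    # Feature 0 tracks the label; features 1 and 2 are unrelated patterns.
    X = [[float(label), float(i % 2), float(i % 3)] for i, label in enumerate(y)]
    print(backward_eliminate(X, y))
```

On this toy data, the loop discards both unrelated features and keeps only feature 0, with zero cross-validated error. Raising `threshold` trades a small error increase for a smaller feature set.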
Hwisoo Choi, Hyeonkyu Kim, Hoebin Chung, Dong-Sung Lee, Junil Kim
Recent advances in single-cell technologies, including single-cell RNA sequencing (scRNA-seq) and single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq), have greatly improved our insight into transcriptomic and epigenomic landscapes across various biological contexts and diseases. This paper reviews key computational tools and machine learning approaches that integrate scRNA-seq and scATAC-seq data by aligning transcriptomic data with chromatin accessibility profiles. These integrated single-cell technologies have been applied to neurodegenerative diseases such as Alzheimer's disease and Parkinson's disease. Such studies show how changes in chromatin accessibility and gene expression can illuminate pathogenic mechanisms and identify potential therapeutic targets. Challenges such as data sparsity and high computational demands remain. Nevertheless, ongoing improvements in scATAC-seq and scRNA-seq technologies, together with better analytical methods, continue to expand their applications. These advances promise to revolutionize medical research and clinical diagnostics by offering a comprehensive view of cellular function and disease pathology.
Application of computational algorithms for single-cell RNA-seq and ATAC-seq in neurodegenerative diseases. Briefings in Functional Genomics (2025). doi:10.1093/bfgp/elae044
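Many integration workflows put scATAC-seq and scRNA-seq on a shared gene-level axis by converting peak accessibility into per-gene "activity" scores. The sketch below assumes a simple rule: sum a cell's counts in peaks that overlap the gene body plus a fixed upstream window. The 2 kb window is an illustrative default, and the gene and peak coordinates are hypothetical. Real tools differ in their windows, weighting, and normalization, so this is not any specific package's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def gene_activity(peak_counts, genes, upstream=2000):
    """Sum one cell's peak counts over each gene body extended upstream.

    peak_counts: {Interval: count} for a single cell.
    genes: {gene_name: (Interval, strand)} with strand "+" or "-".
    """
    scores = {}
    for name, (body, strand) in genes.items():
        if strand == "+":
            region = Interval(body.chrom, max(0, body.start - upstream), body.end)
        else:  # on the minus strand, "upstream" lies past the body's end coordinate
            region = Interval(body.chrom, body.start, body.end + upstream)
        scores[name] = sum(c for peak, c in peak_counts.items() if overlaps(peak, region))
    return scores


if __name__ == "__main__":
    genes = {
        "GENE_A": (Interval("chr1", 10_000, 15_000), "+"),
        "GENE_B": (Interval("chr1", 30_000, 32_000), "-"),
    }
    peaks = {
        Interval("chr1", 8_500, 9_000): 3,    # promoter-proximal to GENE_A
        Interval("chr1", 12_000, 12_500): 2,  # inside GENE_A
        Interval("chr1", 33_000, 33_500): 4,  # upstream of minus-strand GENE_B
        Interval("chr2", 9_000, 9_500): 5,    # different chromosome, ignored
    }
    print(gene_activity(peaks, genes))
```

The resulting cell-by-gene matrix can then be compared with scRNA-seq expression of the same genes, which is the alignment step the reviewed integration methods build on.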
Xiangfei Huang, Zilu Yu, Juan Tian, Tao Chen, Aiping Wei, Chao Mei, Shibiao Chen, Yong Li
Organ fibrosis, a common consequence of chronic tissue injury, presents a significant health challenge. Recent research has revealed the regulatory role of N6-methyladenosine (m6A) RNA modification in fibrosis of various organs, including the lung, liver, kidney, and heart. In this comprehensive review, we summarize the latest findings on the mechanisms and functions of m6A modification in organ fibrosis. By highlighting the potential of m6A modification as a therapeutic target, our goal is to encourage further research in this emerging field and support advancements in the clinical treatment of organ fibrosis.
m6A RNA modification pathway: orchestrating fibrotic mechanisms across multiple organs. Briefings in Functional Genomics (2025). doi:10.1093/bfgp/elae051
Yan Xia, An Xiong, Zilong Zhang, Quan Zou, Feifei Cui
Deep learning models have made significant progress in the biomedical field, particularly in the prediction of drug-drug interactions (DDIs). DDIs are pharmacokinetic or pharmacodynamic interactions between two or more drugs in the body. They may lead to adverse effects and are therefore of great significance for drug development and clinical research. However, identifying DDIs through traditional clinical trials and experiments is both costly and time-consuming. Developers and users who apply advanced artificial intelligence (AI) and deep learning techniques face multiple challenges, including acquiring and encoding data and designing computational methods. In this paper, we review a variety of DDI prediction methods, including similarity-based, network-based, and integration-based approaches, to provide an up-to-date, accessible guide for researchers in different fields. We also provide an in-depth analysis of widely used molecular representations and a systematic exposition of the theoretical framework of models used to extract features from graph data.
A comprehensive review of deep learning-based approaches for drug-drug interaction prediction. Briefings in Functional Genomics 24 (2025). doi:10.1093/bfgp/elae052
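Similarity-based DDI prediction rests on the premise that drugs resembling a known interacting pair are themselves likely to interact. The sketch below is a minimal version of that idea, not any specific method from the review. Drugs are represented as sets of hypothetical substructure indices, standing in for binary molecular fingerprints, and compared with the Tanimoto coefficient. A candidate pair is scored by its best match to any known interacting pair, in either orientation.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def ddi_score(drug_a, drug_b, fingerprints, known_pairs):
    """Score a candidate pair by its most similar known interacting pair."""
    fa, fb = fingerprints[drug_a], fingerprints[drug_b]
    best = 0.0
    for c, d in known_pairs:
        fc, fd = fingerprints[c], fingerprints[d]
        # Interactions are unordered, so try both alignments of the pair.
        s = max(tanimoto(fa, fc) * tanimoto(fb, fd),
                tanimoto(fa, fd) * tanimoto(fb, fc))
        best = max(best, s)
    return best


if __name__ == "__main__":
    fps = {  # hypothetical drugs and substructure bits
        "D1": {1, 2, 3}, "D2": {4, 5},
        "D3": {1, 2, 3, 6}, "D4": {4, 5}, "D5": {7, 8},
    }
    known = [("D1", "D2")]
    print(ddi_score("D3", "D4", fps, known))  # D3 resembles D1, D4 matches D2
    print(ddi_score("D5", "D4", fps, known))  # D5 shares no bits with D1 or D2
```

Taking the product of the two similarities requires both drugs to resemble the known pair. The maximum over orientations makes the score symmetric in the order of the candidate drugs.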
High-throughput gene expression data have been extensively generated and used in investigations of biological mechanisms, biomarker detection, and disease diagnosis and prognosis. These applications encompass not only bulk transcriptome data but also single-cell RNA-seq data. However, extracting reliable biological information from transcriptome data remains challenging because of the constraints of compositional data. Current preprocessing methods, including dataset normalization and batch effect correction, are insufficient to address these issues and improve data quality for downstream analysis. Qualitative methods that focus on the relative order of gene expression (ROGER) offer an alternative and are more informative than quantitative methods that rely on gene expression abundance. The Pairwise Analysis of Gene expression method is an enhancement of ROGER designed for data integration in either sample space or feature space. In this review, we summarize the methods applied to transcriptome data analysis and discuss their potential for predicting clinical outcomes.
Less is more: relative rank is more informative than absolute abundance for compositional NGS data. Xubin Zheng, Nana Jin, Qiong Wu, Ning Zhang, Haonan Wu, Yuanhao Wang, Rui Luo, Tao Liu, Wanfu Ding, Qingshan Geng, Lixin Cheng. Briefings in Functional Genomics (2025). doi:10.1093/bfgp/elae045
Adam Wojtulewski, Aleksandra Sikora, Sean Dineen, Mustafa Raoof, Aleksandra Karolak
Objective: This study investigates applications of artificial intelligence (AI) and statistical methodologies for analyzing and managing peritoneal metastases (PM) arising from gastrointestinal cancers.
Methods: PubMed and Google Scholar were searched comprehensively using relevant keywords and search criteria to identify articles and reviews on the topic. The AI approaches considered were conventional machine learning (ML) and deep learning (DL) models. The statistical approaches included biostatistics and logistic models.
Results: The systematic literature review yielded nearly 30 articles that met the predefined criteria. Across these studies, AI methodologies consistently outperformed traditional statistical approaches. Among the AI approaches, DL produced the most precise results, while classical ML showed more variable performance but maintained high predictive accuracy. Sample size was the recurring factor behind accuracy: for models of the same type, larger samples yielded more accurate predictions.
Conclusions: AI and statistical approaches can detect developing PM in patients with gastrointestinal cancers. If clinicians integrated these approaches into diagnostics and prognostics, they could therefore analyze and manage PM more effectively, improving clinical decision-making and patient outcomes. Collaboration across multiple institutions would also help standardize data-collection methods and produce consistent results.
"Using artificial intelligence and statistics for managing peritoneal metastases from gastrointestinal cancers."
Adam Wojtulewski, Aleksandra Sikora, Sean Dineen, Mustafa Raoof, Aleksandra Karolak
Briefings in Functional Genomics, published 2025-01-15. DOI: 10.1093/bfgp/elae049
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735730/pdf/