Pub Date: 2024-08-21 DOI: 10.1186/s12859-024-05899-z
Jaemin Jeon, Youjeong Suk, Sang Cheol Kim, Hye-Yeong Jo, Kwangsoo Kim, Inuk Jung
Background: Selecting informative genes or eliminating uninformative ones before any downstream gene expression analysis is a standard task with great impact on the results. A carefully curated gene set significantly enhances the likelihood of identifying meaningful biomarkers.
Method: In contrast to conventional forward gene search methods, which focus on selecting highly informative genes, we propose a backward search method, DenoiseIt, that removes potential outlier genes to yield a robust gene set with reduced noise. The gene set constructed by DenoiseIt is expected to capture biologically significant genes while pruning irrelevant ones to the greatest extent possible, which in turn improves the quality of downstream comparative gene expression analysis. DenoiseIt uses non-negative matrix factorization in conjunction with isolation forests to identify outlier rank features and remove their associated genes.
Results: DenoiseIt was applied to both bulk and single-cell RNA-seq data collected from TCGA and a COVID-19 cohort. It identified and removed genes whose expression anomalies were confined to specific samples rather than to a known group. DenoiseIt was also shown to reduce the level of technical noise while preserving a higher proportion of biologically relevant genes than existing methods. The DenoiseIt software is publicly available on GitHub at https://github.com/cobi-git/DenoiseIt.
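The isolation-forest idea named in the Method paragraph can be illustrated on its own. The following is a minimal pure-Python sketch, not the DenoiseIt implementation: it omits the non-negative matrix factorization step entirely and assumes each row of `features` is a hypothetical two-dimensional feature vector. The function names, the sample data, and all parameter values are illustrative assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def expected_depth(n):
    """Average path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def path_length(point, rows, rng, depth=0, limit=10):
    """Depth at which random axis-parallel splits isolate `point` from `rows`."""
    if len(rows) <= 1 or depth >= limit:
        return depth + expected_depth(len(rows))
    dim = rng.randrange(len(point))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:
        return depth + expected_depth(len(rows))
    split = rng.uniform(lo, hi)
    # Keep only the rows that fall on the same side of the split as the point.
    same_side = [r for r in rows if (r[dim] < split) == (point[dim] < split)]
    return path_length(point, same_side, rng, depth + 1, limit)


def anomaly_score(point, rows, n_trees=200, sample_size=64, seed=0):
    """Score in (0, 1]; values near 1 mean the point is isolated quickly.

    Assumes at least two rows, so that the normalising depth is non-zero.
    """
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    total = 0.0
    for _ in range(n_trees):
        total += path_length(point, rng.sample(rows, size), rng)
    return 2.0 ** (-(total / n_trees) / expected_depth(size))


# Hypothetical data: a tight cluster plus one far-away point.
gen = random.Random(1)
features = [(gen.gauss(1.0, 0.1), gen.gauss(1.0, 0.1)) for _ in range(50)]
outlier = (10.0, 10.0)
features.append(outlier)

print(anomaly_score(outlier, features), anomaly_score(features[0], features))
```

Points that need few random splits to be separated from the rest receive scores closer to 1, which is the property an outlier filter of this kind relies on. How DenoiseIt thresholds such scores and maps outlier features back to genes is not described in the abstract and is not modelled here.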
Title: Denoiseit: denoising gene expression data using rank based isolation trees. Journal: BMC Bioinformatics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11340143/pdf/
Pub Date: 2024-08-20 DOI: 10.1186/s12859-024-05816-4
Maxime Estavoyer, Marion Dufeu, Grégoire Ranson, Sylvain Lefort, Thibault Voeltzel, Véronique Maguer-Satta, Olivier Gandrillon, Thomas Lepoutre
Background: In the present work, we aimed to model a relaxation experiment, which consists of selecting a subfraction of a cell population and observing the speed at which the entire initial distribution for a given marker is reconstituted.
Methods: To this end, we first proposed a modification of a previously published mechanistic two-state model of gene expression, to which we added a state-dependent proliferation term. This results in a system of two partial differential equations. Under the assumption that the proliferation rate depends linearly on the marker level, we could derive the asymptotic profile of the solutions of this model.
Results: In order to confront our model with experimental data, we generated a relaxation experiment of the CD34 antigen on the surface of TF1-BA cells, starting either from the highest or the lowest CD34 expression levels. We observed in both cases that after approximately 25 days the distribution of CD34 returns to its initial stationary state. Numerical simulations, based on parameter values estimated from the dataset, have shown that the model solutions closely align with the experimental data from the relaxation experiments.
Conclusion: Altogether our results strongly support the notion that cells should be seen and modeled as probabilistic dynamical systems.
Title: Modeling relaxation experiments with a mechanistic model of gene expression. Journal: BMC Bioinformatics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11334594/pdf/
Pub Date: 2024-08-20 DOI: 10.1186/s12859-024-05894-4
Liqun Zhong, Lingrui Li, Ge Yang
Background: Fluorescence microscopy (FM) is an important and widely adopted biological imaging technique. Segmentation is often the first step in quantitative analysis of FM images. Deep neural networks (DNNs) have become the state-of-the-art tools for image segmentation. However, their performance on natural images may collapse under certain image corruptions or adversarial attacks. This poses real risks to their deployment in real-world applications. Although the robustness of DNN models in segmenting natural images has been studied extensively, their robustness in segmenting FM images remains poorly understood.
Results: To address this deficiency, we have developed an assay that benchmarks robustness of DNN segmentation models using datasets of realistic synthetic 2D FM images with precisely controlled corruptions or adversarial attacks. Using this assay, we have benchmarked robustness of ten representative models such as DeepLab and Vision Transformer. We find that models with good robustness on natural images may perform poorly on FM images. We also find new robustness properties of DNN models and new connections between their corruption robustness and adversarial robustness. To further assess the robustness of the selected models, we have also benchmarked them on real microscopy images of different modalities without using simulated degradation. The results are consistent with those obtained on the realistic synthetic images, confirming the fidelity and reliability of our image synthesis method as well as the effectiveness of our assay.
Conclusions: Based on comprehensive benchmarking experiments, we have found distinct robustness properties of deep neural networks in semantic segmentation of FM images. Based on the findings, we have made specific recommendations on selection and design of robust models for FM image segmentation.
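The general shape of a corruption-robustness benchmark (segment clean and corrupted versions of the same image, then compare a segmentation metric) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' assay: a fixed intensity threshold stands in for a DNN, additive Gaussian noise stands in for the controlled corruptions, intersection over union (IoU) is an assumed metric, and the one-dimensional "image" and all names are hypothetical.

```python
import random


def iou(pred, truth):
    """Intersection over union of two binary masks given as flat lists of 0/1."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return inter / union if union else 1.0


def threshold_segment(image, cutoff=0.5):
    """Stand-in 'model': a pixel is foreground when its intensity exceeds cutoff."""
    return [1 if v > cutoff else 0 for v in image]


def add_gaussian_noise(image, sigma, seed=0):
    """Stand-in corruption: additive zero-mean Gaussian noise of severity sigma."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in image]


# Hypothetical synthetic image: a bright object on a dark background.
truth = [0] * 40 + [1] * 20 + [0] * 40
clean = [0.9 if t else 0.1 for t in truth]

# Report the metric at increasing corruption severity.
for sigma in (0.0, 0.2, 0.4):
    noisy = add_gaussian_noise(clean, sigma)
    print(sigma, round(iou(threshold_segment(noisy), truth), 3))
```

Because the ground-truth mask is known exactly for synthetic images, any drop in the metric can be attributed to the corruption itself, which is the advantage of using controlled synthetic data in a benchmark of this kind.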
Title: Benchmarking robustness of deep neural networks in semantic segmentation of fluorescence microscopy images. Journal: BMC Bioinformatics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11334404/pdf/
Pub Date: 2024-08-19 DOI: 10.1186/s12859-024-05901-8
Junwei Luo, Jiayi Wang, Haixia Zhai, Junfeng Wang
Background: The use of long reads for single nucleotide polymorphism (SNP) phasing has become popular, providing substantial support for research on human diseases and genetic studies in animals and plants. However, due to the complexity of the linkage relationships between SNP loci and sequencing errors in the reads, recent methods still cannot yield satisfactory results.
Results: In this study, we present a graph-based algorithm, GCphase, which uses the minimum cut algorithm to perform phasing. First, based on alignment between long reads and the reference genome, GCphase filters out ambiguous SNP sites and useless read information. Second, GCphase constructs a graph in which a vertex represents alleles of an SNP locus and each edge represents the presence of read support; GCphase then applies a graph minimum-cut algorithm to phase the SNPs. Next, GCphase uses two error correction steps to refine the phasing results obtained from the previous step, effectively reducing the error rate. Finally, GCphase obtains the phase block. GCphase was compared to three other methods, WhatsHap, HapCUT2, and LongPhase, on the Nanopore and PacBio long-read datasets. The code is available from https://github.com/baimawjy/GCphase.
Conclusions: Experimental results show that, across different datasets and sequencing depths, GCphase has the fewest switch errors and the highest accuracy compared with the other methods.
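The switch-error metric mentioned in the Conclusions can be made concrete with a short example. The sketch below is an assumption about how such a count is commonly defined, not code from GCphase: each haplotype of a phase block is written as a string of `0`/`1` alleles at heterozygous sites, and a switch is counted whenever agreement with the truth flips between two consecutive sites. The function name and the example strings are hypothetical.

```python
def count_switch_errors(predicted, truth):
    """Count phase flips between consecutive heterozygous sites of one block.

    Both arguments are equal-length strings over '0'/'1' that describe one
    haplotype. A prediction that is the exact complement of the truth has no
    switch errors, because the labelling of the two haplotypes is arbitrary.
    """
    if len(predicted) != len(truth):
        raise ValueError("haplotypes must cover the same sites")
    agree = [p == t for p, t in zip(predicted, truth)]
    return sum(1 for a, b in zip(agree, agree[1:]) if a != b)


truth = "0101010"
print(count_switch_errors("0101101", truth))  # phase flips once and stays flipped
print(count_switch_errors("1010101", truth))  # complement of the truth
print(count_switch_errors("0100010", truth))  # a single wrong site
```

Evaluation tools differ in how they treat an isolated wrong site: this sketch counts it as two consecutive switches, whereas some tools report it separately as a flip error.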
Title: GCphase: an SNP phasing method using a graph partition and error correction algorithm. Journal: BMC Bioinformatics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11331634/pdf/
Pub Date: 2024-08-19 DOI: 10.1186/s12859-024-05892-6
Egor Guguchkin, Artem Kasianov, Maksim Belenikin, Gaukhar Zobkova, Ekaterina Kosova, Vsevolod Makeev, Evgeny Karpulevich
Title: Correction: Enhancing SNV identification in whole-genome sequencing data through the incorporation of known genetic variants into the minimap2 index. Journal: BMC Bioinformatics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11334595/pdf/
Pub Date: 2024-08-14 DOI: 10.1186/s12859-024-05883-7
Dallace Francis, Fengzhu Sun
Background: Construction of co-occurrence networks in metagenomic data often employs correlation to infer pairwise relationships between microbes. However, biological systems are complex and often display qualities non-linear in nature. Therefore, the reliance on correlation alone may overlook important relationships and fail to capture the full breadth of intricacies presented in underlying interaction networks. It is of interest to incorporate metrics that are not only robust in detecting linear relationships, but non-linear ones as well.
Results: In this paper, we explore the use of various mutual information (MI) estimation approaches for quantifying pairwise relationships in biological data and compare their performance against two traditional measures: Pearson's correlation coefficient, r, and Spearman's rank correlation coefficient, ρ. Metrics are tested both on simulated data designed to mimic pairwise relationships that may be found in ecological systems and on real data from a previous study on C. diff infection. The results demonstrate that, in the case of asymmetric relationships, mutual information estimators can provide better detection ability than Pearson's or Spearman's correlation coefficients. Specifically, we find that these estimators perform better in the detection of exploitative relationships, demonstrating the potential benefit of including them in future metagenomic studies.
Conclusions: Mutual information (MI) can uncover complex pairwise relationships in biological data that may be missed by traditional measures of association. The inclusion of such relationships when constructing co-occurrence networks can result in a more comprehensive analysis than the use of correlation alone.
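A small numerical example shows why a correlation coefficient can miss a relationship that mutual information detects. The sketch below uses a plug-in histogram estimator, which is only one simple way to estimate MI and is not necessarily among the estimators compared in the paper. The data (`y = x^2` on a symmetric grid), the bin count, and the function names are illustrative assumptions.

```python
import math
from collections import Counter


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def binned(values, bins):
    """Assign each value to one of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys, bins=8):
    """Plug-in MI estimate in nats from an equal-width two-dimensional histogram."""
    bx, by = binned(xs, bins), binned(ys, bins)
    n = len(xs)
    pxy = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j])) for (i, j), c in pxy.items()
    )


# A deterministic but non-monotonic relationship on a symmetric grid.
xs = [i / 100 for i in range(-100, 101)]
ys = [x * x for x in xs]

print("Pearson r:", pearson_r(xs, ys))
print("MI (nats):", mutual_information(xs, ys))
```

Here `y` is fully determined by `x`, yet the linear correlation is essentially zero because the positive and negative halves cancel, while the MI estimate is clearly positive. Plug-in estimates are biased upward for small samples, which is one reason more refined estimators exist.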
Title: A comparative analysis of mutual information methods for pairwise relationship detection in metagenomic data. Journal: BMC Bioinformatics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11323399/pdf/
Pub Date: 2024-08-13 DOI: 10.1186/s12859-024-05831-5
Li-Pang Chen, Hsiao-Ting Huang
Background: Survival analysis is used to characterize time-to-event data. In medical studies, a typical application is to analyze the survival time of specific cancers by using high-dimensional gene expressions. The main challenges include the presence of non-informative gene expressions and a possibly nonlinear relationship between survival time and gene expressions. Moreover, due to possibly imprecise data collection or recording errors, measurement error might be ubiquitous in the survival time and its censoring status. Ignoring measurement error effects may lead to biased estimators and wrong conclusions.
Results: To tackle those challenges and derive a reliable estimation with a computationally efficient implementation, we develop the R package AFFECT, which stands for Accelerated Functional Failure time model with Error-Contaminated survival Times.
Conclusions: This package aims to correct for measurement error effects in survival times and implements a boosting algorithm under corrected data to determine informative gene expressions as well as derive the corresponding nonlinear functions.
Title: AFFECT: an R package for accelerated functional failure time model with error-contaminated survival times and applications to gene expression data. Journal: BMC Bioinformatics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11323647/pdf/
Pub Date: 2024-08-10 DOI: 10.1186/s12859-024-05891-7
Xin-Fei Wang, Chang-Qing Yu, Zhu-Hong You, Yan Wang, Lan Huang, Yan Qiao, Lei Wang, Zheng-Wei Li
Circular RNA (circRNA)-microRNA (miRNA) interaction (CMI) is an important model for the regulation of biological processes by non-coding RNA (ncRNA), which provides a new perspective for the study of complex human diseases. However, existing CMI prediction models mainly rely on the nearest-neighbor structure in the biological network and ignore the molecular network topology, which limits their prediction performance. In this paper, we propose a new CMI prediction method, BEROLECMI, which uses molecular sequence attributes, molecular self-similarity, and biological network topology to define a specific role feature representation for molecules and to infer new CMIs. BEROLECMI effectively makes up for the lack of network topology in CMI prediction models and achieves the highest prediction performance on three commonly used data sets. In the case study, 14 of the 15 pairs of unknown CMIs were correctly predicted.
Title: BEROLECMI: a novel prediction method to infer circRNA-miRNA interaction from the role definition of molecular attributes and biological networks. Journal: BMC Bioinformatics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11316391/pdf/
Pub Date: 2024-08-08 DOI: 10.1186/s12859-024-05890-8
Jie Yang, Jiya Tian, Jinchao Miao, Yunsheng Chen
Background: In complex agricultural environments, the presence of shadows, leaf debris, and uneven illumination can hinder the performance of leaf segmentation models for cucumber disease detection. This is further exacerbated by the imbalance in pixel ratios between background and lesion areas, which affects the accuracy of lesion extraction.
Results: To address these challenges, we propose an original image segmentation framework, the LS-ASPP model, which uses a two-stage Atrous Spatial Pyramid Pooling (ASPP) approach combined with an adaptive loss. The Leaf-ASPP stage employs attention modules and residual structures to capture multi-scale semantic information and enhance edge perception, allowing for precise extraction of leaf contours from complex backgrounds. In the Spot-ASPP stage, we adjust the dilation rate of ASPP and introduce a Convolutional Attention Block Module (CABM) to accurately segment lesion areas.
Conclusions: The LS-ASPP model demonstrates improved performance in semantic segmentation accuracy under complex conditions, providing a robust solution for precise cucumber lesion segmentation. By focusing on challenging pixels and adapting to the specific requirements of agricultural image analysis, our framework has the potential to enhance disease detection accuracy and facilitate timely and effective crop management decisions.
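The role of the dilation rate in atrous (dilated) convolution, the building block of ASPP, can be shown with a one-dimensional toy example. This is a sketch under stated assumptions and not the LS-ASPP architecture: real ASPP operates on 2D feature maps with learned weights and padding and fuses its branches afterwards, and the dilation rates used by the authors are not given in the abstract. The kernel, the rates, and the function names below are hypothetical.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1D cross-correlation with `dilation` steps between kernel taps."""
    span = effective_kernel_size(len(kernel), dilation)
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def aspp_1d(signal, kernel, dilations):
    """Apply the same kernel at several dilation rates, one branch per rate."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


signal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for rate, branch in aspp_1d(signal, [1, 0, -1], (1, 2, 4)).items():
    print(rate, effective_kernel_size(3, rate), branch)
```

A larger dilation rate widens the receptive field without adding weights, so branches with different rates respond to structures at different scales. Tuning the rate for small targets such as lesion spots is therefore a natural design lever.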
Title: Adaptive loss-guided multi-stage residual ASPP for lesion segmentation and disease detection in cucumber under complex backgrounds. Authors: Jie Yang, Jiya Tian, Jinchao Miao, Yunsheng Chen. DOI: 10.1186/s12859-024-05890-8. Journal: BMC Bioinformatics. Publication date: 2024-08-08. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11312732/pdf/
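As a rough illustration of the atrous (dilated) convolution idea behind the ASPP stages described in the abstract above, the following minimal Python sketch applies one kernel at several dilation rates in parallel and stacks the results per position. It is a 1-D toy under stated assumptions, not the LS-ASPP implementation: the function names `atrous_conv1d` and `aspp_1d`, the dilation rates (1, 2, 4), and the kernel are hypothetical choices made for illustration, and the attention modules, residual structures, and adaptive loss mentioned by the authors are omitted.

```python
def atrous_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1-D atrous (dilated) convolution.

    Kernel taps are spaced `dilation` positions apart, so a larger
    dilation covers a wider context with the same number of weights.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):  # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def aspp_1d(signal, kernel, dilations=(1, 2, 4)):
    """Run parallel atrous branches and stack their outputs per position.

    The dilation rates here are illustrative assumptions, not the
    rates used in the paper.
    """
    branches = [atrous_conv1d(signal, kernel, d) for d in dilations]
    return [list(feats) for feats in zip(*branches)]


# A single "lesion-like" spike in an otherwise empty signal.
signal = [0, 0, 1, 0, 0, 0, 0, 0]
features = aspp_1d(signal, kernel=[1, 1, 1])

for position, feats in enumerate(features):
    print(position, feats)
```

Each position ends up with one response per dilation rate. Larger rates respond to the spike from farther away, which is the multi-scale context that adjusting the dilation rate is meant to control.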
Pub Date: 2024-08-08 DOI: 10.1186/s12859-024-05870-y
Xing Zhao, Zigui Chen, Huating Wang, Hao Sun
Quantitative measurement of RNA expression levels through RNA-Seq is an ideal replacement for conventional cancer diagnosis via microscope examination. Currently, cancer-related RNA-Seq studies focus on two aspects: classifying the status and tissue of origin of a sample and discovering marker genes. Existing studies typically identify marker genes by statistically comparing healthy and cancer samples. However, this approach overlooks marker genes with low expression level differences and may be influenced by experimental results. This paper introduces "GENESO," a novel framework for pan-cancer classification and marker gene discovery that uses the occlusion method in conjunction with deep learning. We first train a baseline deep LSTM neural network capable of distinguishing the origins and statuses of samples from RNA-Seq data. We then propose a novel marker gene discovery method called "Symmetrical Occlusion (SO)". It works with the baseline LSTM network, mimicking the "gain of function" and "loss of function" of genes to quantitatively evaluate their importance in pan-cancer classification. We then isolate the most important genes and use them to train new neural networks, resulting in higher-performance LSTM models that use only a reduced set of highly relevant genes. The baseline neural network achieves a validation accuracy of 96.59% in pan-cancer classification. With the help of SO, the accuracy of the second network reaches 98.30% while using 67% fewer genes. Notably, our method excels at identifying marker genes that are not differentially expressed. Moreover, we assessed the feasibility of our method using single-cell RNA-Seq data, employing known marker genes as a validation test.
Title: Occlusion enhanced pan-cancer classification via deep learning. Authors: Xing Zhao, Zigui Chen, Huating Wang, Hao Sun. DOI: 10.1186/s12859-024-05870-y. Journal: BMC Bioinformatics. Publication date: 2024-08-08. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11308240/pdf/
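The occlusion idea summarized in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' GENESO code: `toy_model` is a hypothetical logistic stand-in for the trained LSTM, the occlusion values `low=0.0` and `high=1.0` are assumed proxies for "loss of function" and "gain of function", and summing the two absolute changes in the prediction is one plausible scoring rule that the abstract does not specify.

```python
import math


def toy_model(x, weights=(2.0, 0.0, -1.5), bias=0.0):
    """Hypothetical stand-in classifier (logistic regression), NOT the paper's LSTM."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def symmetrical_occlusion(predict, sample, low=0.0, high=1.0):
    """Score each gene by how much the prediction moves when it is occluded.

    Each gene is overwritten once with `low` (assumed "loss of function")
    and once with `high` (assumed "gain of function"); the score is the
    sum of the two absolute changes relative to the unmodified sample.
    """
    base = predict(sample)
    scores = []
    for i in range(len(sample)):
        knocked_down = list(sample)
        knocked_down[i] = low
        boosted = list(sample)
        boosted[i] = high
        change = abs(predict(knocked_down) - base) + abs(predict(boosted) - base)
        scores.append(change)
    return scores


# Three "genes" with normalized expression values (illustrative data only).
sample = [0.5, 0.5, 0.5]
scores = symmetrical_occlusion(toy_model, sample)

# Rank genes from most to least influential on the prediction.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

Because the score depends on the model's response rather than on a group-wise expression difference, a gene can rank highly without being differentially expressed, which is the property the abstract highlights. The top-ranked genes would then be the candidates for retraining a smaller network.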