Pub Date: 2013-05-25; eCollection Date: 2013-01-01; DOI: 10.1155/2013/361321
Bryan M H Keng, Oliver Y W Chan, Sean S J Heng, Maurice H T Ling
The expression of reference genes used in gene expression studies is assumed to be stable under most circumstances. However, studies have demonstrated that genes assumed to be stably expressed in one species are not necessarily stably expressed in other organisms. This study evaluates the likelihood of genus-specific reference genes for liver using comparable microarray datasets from Spermophilus lateralis and Spermophilus tridecemlineatus. The coefficient of variation (CV) of each probe was calculated, and 178 probes were common to the lowest 10% CV lists of both datasets (n = 1258). All 3 lists were analysed by NormFinder. Our results suggest that the most invariant probe for S. tridecemlineatus was 02n12, while that for S. lateralis was 24j21. However, Probes 02n12 and 24j21 rank 8644 and 926 in terms of invariance for S. lateralis and S. tridecemlineatus, respectively. This suggests a lack of common liver-specific reference probes for S. lateralis and S. tridecemlineatus. Given that S. lateralis and S. tridecemlineatus are closely related species and the datasets are comparable, our results do not support the presence of genus-specific reference genes.
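The CV-based screening step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: CV is taken to be the sample standard deviation divided by the mean, the probe IDs and expression values are hypothetical toy data, and the function names are invented for this sketch. It does not reproduce NormFinder or the authors' pipeline.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition of CV)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m

def lowest_cv_probes(expression, fraction=0.10):
    """Return the set of probe IDs whose CV lies in the lowest `fraction` of probes."""
    ranked = sorted(expression, key=lambda probe: coefficient_of_variation(expression[probe]))
    keep = max(1, int(len(ranked) * fraction))
    return set(ranked[:keep])

# Hypothetical toy data: probe ID -> expression values across arrays.
species_a = {
    "p1": [10.0, 10.1, 9.9],
    "p2": [5.0, 9.0, 1.0],
    "p3": [7.0, 7.2, 6.8],
    "p4": [3.0, 6.0, 9.0],
}
species_b = {
    "p1": [8.0, 8.1, 7.9],
    "p2": [4.0, 4.1, 3.9],
    "p3": [2.0, 6.0, 10.0],
    "p4": [1.0, 5.0, 9.0],
}

# Probes that are among the most stable in both datasets.
# A fraction of 0.5 is used here only because the toy data has four probes.
common = lowest_cv_probes(species_a, 0.5) & lowest_cv_probes(species_b, 0.5)
print(sorted(common))
```

Intersecting the two low-CV sets mirrors the idea of looking for probes that are stable in both species; a probe that is stable in only one dataset drops out of the intersection.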
Transcriptome Analysis of Spermophilus lateralis and Spermophilus tridecemlineatus Liver Does Not Suggest the Presence of Spermophilus-Liver-Specific Reference Genes. ISRN bioinformatics, vol. 2013, article 361321. https://doi.org/10.1155/2013/361321
Pub Date: 2013-04-18; eCollection Date: 2013-01-01; DOI: 10.1155/2013/725434
Eran Elhaik, Dan Graur
Eukaryotic genomes, particularly animal genomes, have a complex, nonuniform, and nonrandom internal compositional organization. The compositional organization of animal genomes can be described as a mosaic of discrete genomic regions, called "compositional domains," each with a distinct GC content that significantly differs from those of its upstream and downstream neighboring domains. A typical animal genome consists of a mixture of compositionally homogeneous and nonhomogeneous domains of varying lengths and nucleotide compositions that are interspersed with one another. We have devised IsoPlotter, an unbiased segmentation algorithm for inferring the compositional organization of genomes. IsoPlotter has become an indispensable tool for describing genomic composition and has been used in the analysis of more than a dozen genomes. Applications include describing new genomes, correlating domain composition with gene composition and density, studying the evolution of genomes, testing phylogenomic hypotheses, and detecting regions of potential interbreeding between human and extinct hominines. To extend the use of IsoPlotter, we designed a completely automated pipeline, called IsoPlotter+, to carry out all segmentation analyses, including graphical display, and built a repository for compositional domain maps of all fully sequenced vertebrate and invertebrate genomes. The IsoPlotter+ pipeline and repository offer a comprehensive solution to the study of genome compositional architecture. Here, we demonstrate IsoPlotter+ by applying it to human and insect genomes. The computational tools and data repository are available online.
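To make the idea of segmenting a sequence by GC content concrete, the sketch below finds the single split point that best separates a sequence into two compositionally different parts. It scores each candidate split with the Jensen-Shannon divergence of the GC/AT composition, which is one common criterion for compositional segmentation. The abstract does not state the criterion or stopping rule that IsoPlotter uses, so this should be read as a generic illustration with hypothetical function names, not as IsoPlotter's implementation.

```python
import math

def binary_entropy(p):
    """Shannon entropy in bits of a two-symbol source with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_split(sequence):
    """Return (position, divergence) of the split that maximizes the
    Jensen-Shannon divergence between the GC content of the two halves."""
    is_gc = [base in "GC" for base in sequence.upper()]
    n = len(is_gc)
    if n < 2:
        raise ValueError("need at least two bases to split")
    total_gc = sum(is_gc)
    whole = binary_entropy(total_gc / n)
    best_pos, best_div = None, 0.0
    left_gc = 0
    for pos in range(1, n):
        left_gc += is_gc[pos - 1]
        right_gc = total_gc - left_gc
        left = binary_entropy(left_gc / pos)
        right = binary_entropy(right_gc / (n - pos))
        # Entropy of the whole minus the length-weighted entropies of the parts.
        div = whole - (pos / n) * left - ((n - pos) / n) * right
        if div > best_div:
            best_pos, best_div = pos, div
    return best_pos, best_div

# Hypothetical toy sequence: an AT-only stretch followed by a GC-only stretch.
sequence = "ATATTAATAT" + "GCGGCCGCGC"
position, divergence = best_split(sequence)
print(position, round(divergence, 3))
```

A full segmentation would apply such a split recursively to each resulting segment and stop when the divergence is no longer significant; that stopping rule is deliberately left out here.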
IsoPlotter+: A Tool for Studying the Compositional Architecture of Genomes. ISRN bioinformatics, vol. 2013, article 725434. https://doi.org/10.1155/2013/725434 (PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393066/pdf/)
A haplotype is a pattern of single nucleotide polymorphisms (SNPs) on a single chromosome. Constructing a pair of haplotypes from aligned and overlapping but intermixed and erroneous fragments of the chromosomal sequences is a nontrivial problem. The minimum error correction approach aims to minimize the number of errors to be corrected so that the pair of haplotypes can be constructed through consensus of the fragments. We give a heuristic algorithm (HMEC) that searches through alternative solutions using a gain measure and stops whenever no better solution can be achieved. The time complexity of each iteration is O(m^3 k) for an m × k SNP matrix, where m and k are the number of fragments (rows) and the number of SNP sites (columns), respectively. An alternative gain measure is also given to reduce running time. We have compared our algorithm with other methods in terms of accuracy and running time on both simulated and real data, and our extensive experimental results indicate the superiority of our algorithm over others.
HMEC: A Heuristic Algorithm for Individual Haplotyping with Minimum Error Correction. Md Shamsuzzoha Bayzid, Md Maksudul Alam, Abdullah Mueen, Md Saidur Rahman. ISRN bioinformatics, vol. 2013, article 291741. Published 2013-01-28. https://doi.org/10.1155/2013/291741
Pub Date: 2012-12-12; eCollection Date: 2012-01-01; DOI: 10.5402/2012/371718
Jarrett D Morrow, Brandon W Higgs
Accurate base calls generated from sequencing data are required for downstream biological interpretation, particularly in the case of rare variants. CallSim is a software application that provides evidence for the validity of base calls believed to be sequencing errors, and it is applicable to Ion Torrent and 454 data. The algorithm processes a single read using a Monte Carlo approach to sequencing simulation and does not depend on information from any other read in the data set. Three examples from general read correction, as well as from error-or-variant classification, demonstrate its effectiveness as a robust, low-volume read-processing base corrector. Specifically, correction of errors in Ion Torrent reads from a study involving mutations in multidrug-resistant Staphylococcus aureus illustrates an ability to classify an erroneous homopolymer call. In addition, support for a rare variant in 454 data for a mixed viral population demonstrates "base rescue" capabilities. CallSim provides evidence regarding the validity of base calls in sequences produced by 454 or Ion Torrent systems and is intended for hands-on downstream processing analysis. These downstream efforts, although time-consuming, are necessary steps for accurate identification of rare variants.
CallSim: Evaluation of Base Calls Using Sequencing Simulation. ISRN bioinformatics, vol. 2012, article 371718. https://doi.org/10.5402/2012/371718 (PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393072/pdf/)
Pub Date: 2012-11-21; eCollection Date: 2012-01-01; DOI: 10.5402/2012/696758
Nelson R Salinas, Damon P Little
We present eLAMP, a Perl script with a Tk graphical interface that electronically simulates Loop-mediated AMPlification (LAMP), allowing users to efficiently test putative LAMP primers on a set of target sequences. eLAMP can match primers to templates using either exact matching (via built-in Perl regular expressions) or approximate matching (via the tre-agrep library). Performance was tested on 40 whole genome sequences of Staphylococcus. eLAMP correctly predicted that the two tested primer sets would amplify from S. aureus genomes and not amplify from other Staphylococcus species. Open source (GNU General Public License) Perl scripts are available for download from the New York Botanical Garden's website.
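The exact-matching mode mentioned in this abstract can be illustrated with a small regular-expression sketch. It is written in Python rather than Perl, searches both strands of a template for one primer, and uses a hypothetical template and primer. It is not eLAMP's code and does not model the primer order, spacing, or orientation checks that a full LAMP simulation would need.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]

def find_primer(primer, template):
    """Return (strand, start) for every exact occurrence of the primer.

    '+' hits match the primer as written; '-' hits match its reverse
    complement, that is, the primer anneals to the opposite strand.
    """
    hits = []
    for strand, query in (("+", primer), ("-", reverse_complement(primer))):
        # A lookahead pattern reports overlapping occurrences as well.
        pattern = re.compile("(?=" + re.escape(query) + ")")
        hits.extend((strand, match.start()) for match in pattern.finditer(template))
    return hits

# Hypothetical template and primer.
template = "TTGACCGTAAGGCTTACGGTCAA"
print(find_primer("GACCGTA", template))
```

Approximate matching would replace the exact pattern with an edit-distance search, which is the role the abstract assigns to the tre-agrep library.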
Electric LAMP: Virtual Loop-Mediated Isothermal AMPlification. ISRN bioinformatics, vol. 2012, article 696758. https://doi.org/10.5402/2012/696758 (PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4417551/pdf/)
Pub Date: 2012-11-11; eCollection Date: 2012-01-01; DOI: 10.5402/2012/157135
Debra Knisley, Jeff Knisley, Chelsea Ross, Alissa Rockney
The prediction of secondary RNA folds from primary sequences continues to be an important area of research given the significance of RNA molecules in biological processes such as gene regulation. To facilitate this effort, graph models of secondary structure have been developed to quantify and thereby characterize the topological properties of the secondary folds. In this work we utilize a multigraph representation of a secondary RNA structure to examine the ability of the existing graph-theoretic descriptors to classify all possible topologies as either RNA-like or not RNA-like. We use more than one hundred descriptors and several different machine learning approaches, including nearest neighbor algorithms, one-class classifiers, and several clustering techniques. We predict that many more topologies will be identified as those representing RNA secondary structures than currently predicted in the RAG (RNA-As-Graphs) database. The results also suggest which descriptors and which algorithms are more informative in classifying and exploring secondary RNA structures.
Classifying multigraph models of secondary RNA structure using graph-theoretic descriptors. ISRN bioinformatics, vol. 2012, article 157135. https://doi.org/10.5402/2012/157135
Pub Date: 2012-11-11; eCollection Date: 2012-01-01; DOI: 10.5402/2012/381023
Lars Seemann, Jason Shulman, Gemunu H Gunaratne
Early and accurate diagnosis of cancer can significantly improve the design of personalized therapy and enhance the success of therapeutic interventions. Histopathological approaches, which rely on microscopic examination of malignant tissue, are not conducive to timely diagnosis. High-throughput genomics offers a possible new classification of cancer subtypes. Unfortunately, most clustering algorithms have not been proven sufficiently robust. We propose a novel approach that relies on statistical invariants and persistent homology, one of the most exciting recent developments in topology. It identifies a sufficient but compact set of genes for the analysis as well as a core group of tightly correlated patient samples for each subtype. Partitioning occurs hierarchically and allows for the identification of genetically similar subtypes. We analyzed the gene expression profiles of 202 tumors of the brain cancer glioblastoma multiforme (GBM) available at The Cancer Genome Atlas (TCGA) site. We identify core patient groups associated with the classical, mesenchymal, and proneural subtypes of GBM. In our analysis, the neural subtype consists of several small groups rather than a single component. A subtype prediction model is introduced that partitions tumors in a manner consistent with clustering algorithms but requires the genetic signature of only 59 genes.
A robust topology-based algorithm for gene expression profiling. ISRN bioinformatics, vol. 2012, article 381023. https://doi.org/10.5402/2012/381023 (PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393071/pdf/)
Pub Date: 2012-11-01; eCollection Date: 2012-01-01; DOI: 10.5402/2012/419419
Roozbeh Manshaei, Pooya Sobhe Bidari, Mahdi Aliyari Shoorehdeli, Amir Feizi, Tahmineh Lohrasebi, Mohammad Ali Malboobi, Matthew Kyan, Javad Alirezaie
Reverse engineering of gene regulatory networks (GRNs) is the process of estimating genetic interactions of a cellular system from gene expression data. In this paper, we propose a novel hybrid systematic algorithm based on a neurofuzzy network for reconstructing GRNs from observational gene expression data when only a small-to-medium number of measurements is available. The approach uses fuzzy logic to transform gene expression values into qualitative descriptors that can be evaluated using a set of defined rules. The algorithm uses a neurofuzzy network to model the effects of genes on other genes, followed by four stages of decision making to extract gene interactions. One of the main features of the proposed algorithm is that an optimal number of fuzzy rules can be easily and rapidly extracted without overparameterizing. Data analysis and simulation were conducted on microarray expression profiles of the S. cerevisiae cell cycle and demonstrate that the proposed algorithm not only selects the patterns of the time series gene expression data accurately, but also provides models with better reconstruction accuracy when compared with four published algorithms: DBNs, VBEM, time delay ARACNE, and PF subjected to LASSO. The accuracy of the proposed approach is evaluated in terms of recall and F-score for the network reconstruction task.
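The evaluation metrics named at the end of this abstract, recall and F-score, can be computed for a reconstructed network as sketched below. The sketch assumes that a network is represented as a set of directed (regulator, target) edges and that the F-score is the usual harmonic mean of precision and recall; the gene names and edges are hypothetical, and the abstract does not say how the authors handled edge direction or sign.

```python
def recall_and_f_score(predicted_edges, true_edges):
    """Return (recall, F-score) for a predicted edge set against a reference set."""
    true_positives = len(predicted_edges & true_edges)
    precision = true_positives / len(predicted_edges) if predicted_edges else 0.0
    recall = true_positives / len(true_edges) if true_edges else 0.0
    if precision + recall == 0:
        return recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return recall, f_score

# Hypothetical reference network and reconstruction: (regulator, target) pairs.
true_edges = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")}
predicted_edges = {("A", "B"), ("B", "C"), ("D", "A")}

recall, f_score = recall_and_f_score(predicted_edges, true_edges)
print(round(recall, 3), round(f_score, 3))
```

Because edges are treated as directed here, the predicted edge ("D", "A") does not count as recovering ("A", "D"); an undirected comparison would score this reconstruction higher.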
Hybrid-controlled neurofuzzy networks analysis resulting in genetic regulatory networks reconstruction. ISRN bioinformatics, vol. 2012, article 419419. https://doi.org/10.5402/2012/419419 (PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393070/pdf/)
Pub Date: 2012-10-16; eCollection Date: 2012-01-01; DOI: 10.5402/2012/537217
Lingling An, R W Doerge
It is well accepted that genes are simultaneously involved in multiple biological processes and that genes are coordinated over the duration of such events. Unfortunately, clustering methodologies that group genes for the purpose of novel gene discovery fail to acknowledge the dynamic nature of biological processes and provide static clusters, even when the expression of genes is assessed across time or developmental stages. By taking advantage of techniques and theories from time-frequency analysis, periodic gene expression profiles are dynamically clustered based on the assumption that different spectral frequencies characterize different biological processes. A two-step cluster validation approach is proposed both to statistically estimate the optimal number of clusters and to distinguish significant clusters from noise. The resulting clusters reveal coordinated coexpressed genes. This novel dynamic clustering approach has broad applicability to a vast range of sequential data scenarios where the order of the series is of interest.
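The assumption that different spectral frequencies characterize different processes can be illustrated by grouping periodic profiles according to their dominant frequency. The sketch below uses a plain discrete Fourier transform over the whole series, which is a simplification: it is not the time-frequency method or the two-step validation the abstract describes, and the gene names and profiles are hypothetical.

```python
import cmath
import math

def dominant_frequency(profile):
    """Index k (cycles per series length) of the strongest non-zero DFT component."""
    n = len(profile)
    average = sum(profile) / n
    centered = [x - average for x in profile]  # remove the constant offset
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        power = abs(coefficient) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k

# Hypothetical expression profiles sampled at 16 time points.
profiles = {
    "geneA": [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)],
    "geneB": [math.cos(2 * math.pi * 2 * t / 16) + 5 for t in range(16)],
    "geneC": [math.sin(2 * math.pi * 4 * t / 16) for t in range(16)],
}

# Group genes that share the same dominant frequency.
clusters = {}
for gene, profile in profiles.items():
    clusters.setdefault(dominant_frequency(profile), []).append(gene)
print(clusters)
```

geneA and geneB land in the same group even though they differ in phase and baseline level, which is the kind of relationship a purely amplitude-based static clustering could miss.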
Dynamic clustering of gene expression. ISRN bioinformatics, vol. 2012, article 537217. https://doi.org/10.5402/2012/537217 (PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393063/pdf/)
RNA-Seq is increasingly being used for gene expression profiling. In this approach, next-generation sequencing (NGS) platforms are used for sequencing. Owing to the highly parallel nature of these platforms, millions of reads are generated in a short time and at low cost. Analysis of the data is therefore a major challenge, and the development of statistical and computational methods is essential for drawing meaningful conclusions from such large datasets. Here, we assessed three different types of normalization (transcript parts per million, trimmed mean of M values, and quantile normalization) and evaluated whether normalization reduces technical variability across replicates. In addition, we propose two novel methods for detecting differentially expressed genes between two biological conditions: (i) a likelihood ratio method and (ii) a Bayesian method. Our proposed methods for finding differentially expressed genes were tested on three real datasets. Our methods performed at least as well as, and often better than, the existing methods for analysis of differential expression.
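Of the three normalization schemes listed in this abstract, quantile normalization is the simplest to show in a few lines. The sketch below forces every sample to share the same distribution by replacing each value with the mean of the values that hold the same rank across samples. It assumes no tied values within a sample and uses hypothetical counts; it is a textbook version of the technique, not the authors' implementation.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples.

    `samples` is a list of equal-length lists, one per sample, with genes in
    the same order in every sample. Ties within a sample are not averaged.
    """
    n_genes = len(samples[0])
    if any(len(sample) != n_genes for sample in samples):
        raise ValueError("all samples must cover the same genes")
    # Mean of the values holding each rank across samples.
    sorted_samples = [sorted(sample) for sample in samples]
    rank_means = [sum(column) / len(samples) for column in zip(*sorted_samples)]
    normalized = []
    for sample in samples:
        order = sorted(range(n_genes), key=lambda i: sample[i])
        values = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            values[gene_index] = rank_means[rank]
        normalized.append(values)
    return normalized

# Hypothetical expression values: three replicates, four genes each.
replicates = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 3.0, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(replicates)
for row in normalized:
    print([round(value, 3) for value in row])
```

After normalization every replicate contains the same set of values, so remaining differences between replicates reflect only the ordering of genes, which is why the method is used to reduce technical variability.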
Differential Expression Analysis for RNA-Seq Data. Rashi Gupta, Isha Dewan, Richa Bharti, Alok Bhattacharya. ISRN bioinformatics, vol. 2012, article 817508. Published 2012-09-20. https://doi.org/10.5402/2012/817508 (PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393055/pdf/)