
ISRN Bioinformatics: Latest Articles

Transcriptome Analysis of Spermophilus lateralis and Spermophilus tridecemlineatus Liver Does Not Suggest the Presence of Spermophilus-Liver-Specific Reference Genes.
Pub Date : 2013-05-25 eCollection Date: 2013-01-01 DOI: 10.1155/2013/361321
Bryan M H Keng, Oliver Y W Chan, Sean S J Heng, Maurice H T Ling

The expression of reference genes used in gene expression studies is assumed to be stable under most circumstances. However, studies have demonstrated that genes assumed to be stably expressed in one species are not necessarily stably expressed in other organisms. This study evaluates the likelihood of genus-specific reference genes for liver using comparable microarray datasets from Spermophilus lateralis and Spermophilus tridecemlineatus. The coefficient of variation (CV) of each probe was calculated, and 178 probes were common to the lowest 10% CV of both datasets (n = 1258). All 3 lists were analysed by NormFinder. Our results suggest that the most invariant probe for S. tridecemlineatus was 02n12, while that for S. lateralis was 24j21. However, Probes 02n12 and 24j21 ranked 8644 and 926 in terms of invariance for S. lateralis and S. tridecemlineatus, respectively. This suggests a lack of common liver-specific reference probes for S. lateralis and S. tridecemlineatus. Given that S. lateralis and S. tridecemlineatus are closely related species and the datasets are comparable, our results do not support the presence of genus-specific reference genes.
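The CV screening step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the probe IDs and expression values are hypothetical toy data, the population standard deviation is an arbitrary choice, and the NormFinder ranking is not reproduced.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """CV = standard deviation / mean for one probe's expression values."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / abs(m)

def lowest_cv_probes(expression, fraction=0.10):
    """Return the set of probe IDs whose CV falls in the lowest `fraction`."""
    ranked = sorted(expression, key=lambda p: coefficient_of_variation(expression[p]))
    keep = max(1, int(len(ranked) * fraction))
    return set(ranked[:keep])

# Hypothetical toy data: probe ID -> expression values across arrays.
dataset_a = {"p1": [10, 10, 10.1], "p2": [5, 9, 1], "p3": [7, 7.2, 6.9], "p4": [1, 8, 3]}
dataset_b = {"p1": [20, 20.1, 19.9], "p2": [4, 4, 4.1], "p3": [2, 9, 5], "p4": [6, 1, 9]}

# Probes that are among the most stable in both datasets
# (fraction=0.5 only because the toy data has four probes).
shared = lowest_cv_probes(dataset_a, 0.5) & lowest_cv_probes(dataset_b, 0.5)
print(sorted(shared))
```

With real data the fraction would be 0.10, matching the lowest 10% CV lists in the abstract.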

Citations: 6
IsoPlotter(+): A Tool for Studying the Compositional Architecture of Genomes.
Pub Date : 2013-04-18 eCollection Date: 2013-01-01 DOI: 10.1155/2013/725434
Eran Elhaik, Dan Graur

Eukaryotic genomes, particularly animal genomes, have a complex, nonuniform, and nonrandom internal compositional organization. The compositional organization of animal genomes can be described as a mosaic of discrete genomic regions, called "compositional domains," each with a distinct GC content that differs significantly from those of its upstream and downstream neighboring domains. A typical animal genome consists of a mixture of compositionally homogeneous and nonhomogeneous domains of varying lengths and nucleotide compositions that are interspersed with one another. We have devised IsoPlotter, an unbiased segmentation algorithm for inferring the compositional organization of genomes. IsoPlotter has become an indispensable tool for describing genomic composition and has been used in the analysis of more than a dozen genomes. Applications include describing new genomes, correlating domain composition with gene composition and density, studying the evolution of genomes, testing phylogenomic hypotheses, and detecting regions of potential interbreeding between human and extinct hominines. To extend the use of IsoPlotter, we designed a completely automated pipeline, called IsoPlotter(+), to carry out all segmentation analyses, including graphical display, and built a repository for compositional domain maps of all fully sequenced vertebrate and invertebrate genomes. The IsoPlotter(+) pipeline and repository offer a comprehensive solution to the study of genome compositional architecture. Here, we demonstrate IsoPlotter(+) by applying it to human and insect genomes. The computational tools and data repository are available online.
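The quantity that compositional domains are defined on, GC content, can be illustrated with a small sketch. This is not IsoPlotter's segmentation algorithm, which the abstract does not describe. The fixed window size and the toy sequence are assumptions made for illustration.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous A/C/G/T bases of `seq`."""
    seq = seq.upper()
    gc = sum(seq.count(b) for b in "GC")
    total = gc + sum(seq.count(b) for b in "AT")
    return gc / total if total else 0.0

def windowed_gc(seq, window):
    """GC content of consecutive, non-overlapping windows of length `window`."""
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq), window)]

# Hypothetical toy sequence: an AT-rich stretch, a GC-rich stretch, a mixed stretch.
toy = "ATATATATAT" + "GCGCGCGCGC" + "ATGCATGCAT"
print(windowed_gc(toy, 10))
```

A segmentation algorithm would place domain boundaries where such values shift, rather than relying on a fixed window grid.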

Citations: 6
HMEC: A Heuristic Algorithm for Individual Haplotyping with Minimum Error Correction.
Pub Date : 2013-01-28 eCollection Date: 2013-01-01 DOI: 10.1155/2013/291741
Md Shamsuzzoha Bayzid, Md Maksudul Alam, Abdullah Mueen, Md Saidur Rahman

A haplotype is a pattern of single nucleotide polymorphisms (SNPs) on a single chromosome. Constructing a pair of haplotypes from aligned and overlapping but intermixed and erroneous fragments of the chromosomal sequences is a nontrivial problem. The minimum error correction approach aims to minimize the number of errors to be corrected so that the pair of haplotypes can be constructed through consensus of the fragments. We give a heuristic algorithm (HMEC) that searches through alternative solutions using a gain measure and stops whenever no better solution can be achieved. The time complexity of each iteration is O(m^3 k) for an m × k SNP matrix, where m and k are the number of fragments (rows) and the number of SNP sites (columns), respectively. An alternative gain measure is also given to reduce running time. We have compared our algorithm with other methods in terms of accuracy and running time on both simulated and real data, and our extensive experimental results indicate the superiority of our algorithm over others.
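The minimum error correction objective can be made concrete with a short sketch. It only scores one given split of the fragments into two groups. The HMEC search and its gain measure are not reproduced, and the fragment encoding ('0'/'1' alleles, '-' for uncovered sites) and the tie-breaking rule are assumptions.

```python
def consensus(fragments, k):
    """Majority allele per SNP column; '-' entries are ignored, ties go to '0'."""
    hap = []
    for j in range(k):
        ones = sum(f[j] == "1" for f in fragments)
        zeros = sum(f[j] == "0" for f in fragments)
        hap.append("1" if ones > zeros else "0")
    return "".join(hap)

def mec_score(group_a, group_b, k):
    """Count fragment entries that disagree with their group's consensus haplotype."""
    errors = 0
    for group in (group_a, group_b):
        hap = consensus(group, k)
        for f in group:
            errors += sum(f[j] != "-" and f[j] != hap[j] for j in range(k))
    return errors

# Hypothetical fragments over k = 4 SNP sites, already split into two groups.
group_a = ["0101", "01-1", "0111"]
group_b = ["1010", "10-0", "-010"]
print(consensus(group_a, 4), consensus(group_b, 4), mec_score(group_a, group_b, 4))
```

A heuristic in this setting would move fragments between the two groups and keep the moves that lower this score.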

Citations: 8
CallSim: Evaluation of Base Calls Using Sequencing Simulation.
Pub Date : 2012-12-12 eCollection Date: 2012-01-01 DOI: 10.5402/2012/371718
Jarrett D Morrow, Brandon W Higgs

Accurate base calls generated from sequencing data are required for downstream biological interpretation, particularly in the case of rare variants. CallSim is a software application that provides evidence for the validity of base calls believed to be sequencing errors, and it is applicable to Ion Torrent and 454 data. The algorithm processes a single read using a Monte Carlo approach to sequencing simulation and does not depend on information from any other read in the data set. Three examples, drawn from general read correction and from error-or-variant classification, demonstrate its effectiveness as a robust, low-volume read-processing base corrector. Specifically, correction of errors in Ion Torrent reads from a study involving mutations in multidrug-resistant Staphylococcus aureus illustrates an ability to classify an erroneous homopolymer call. In addition, support for a rare variant in 454 data for a mixed viral population demonstrates "base rescue" capabilities. CallSim provides evidence regarding the validity of base calls in sequences produced by 454 or Ion Torrent systems and is intended for hands-on downstream processing analysis. These downstream efforts, although time-consuming, are necessary steps for accurate identification of rare variants.

Citations: 1
Electric LAMP: Virtual Loop-Mediated Isothermal AMPlification.
Pub Date : 2012-11-21 eCollection Date: 2012-01-01 DOI: 10.5402/2012/696758
Nelson R Salinas, Damon P Little

We present eLAMP, a PERL script with a Tk graphical interface that electronically simulates Loop-mediated AMPlification (LAMP), allowing users to efficiently test putative LAMP primers on a set of target sequences. eLAMP can match primers to templates using either exact matching (via built-in PERL regular expressions) or approximate matching (via the tre-agrep library). Performance was tested on 40 whole genome sequences of Staphylococcus. eLAMP correctly predicted that the two tested primer sets would amplify from S. aureus genomes and not amplify from other Staphylococcus species. Open source (GNU Public License) PERL scripts are available for download from the New York Botanical Garden's website.
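Exact primer-to-template matching with regular expressions can be sketched as below. eLAMP itself is a PERL script, so this is an illustrative Python analogue and not its code. The sequences are hypothetical, and neither the approximate matching via tre-agrep nor the ordering constraints among a full LAMP primer set are modeled.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_primer(primer, template):
    """Return (strand, start) for every exact primer match on either strand.

    Starts are 0-based positions on the given (forward) template."""
    hits = []
    template = template.upper()
    for strand, query in (("+", primer.upper()), ("-", reverse_complement(primer))):
        # A lookahead pattern reports overlapping matches as well.
        for m in re.finditer(f"(?={re.escape(query)})", template):
            hits.append((strand, m.start()))
    return hits

# Hypothetical template containing the primer and its reverse complement.
template = "TTGACCGTAAGGCTTACGGTCAA"
print(find_primer("GACCGTAAG", template))
```

A virtual amplification check would then require hits for every primer in the set, in a compatible order and orientation.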

Citations: 15
Classifying multigraph models of secondary RNA structure using graph-theoretic descriptors.
Pub Date : 2012-11-11 eCollection Date: 2012-01-01 DOI: 10.5402/2012/157135
Debra Knisley, Jeff Knisley, Chelsea Ross, Alissa Rockney

The prediction of secondary RNA folds from primary sequences continues to be an important area of research given the significance of RNA molecules in biological processes such as gene regulation. To facilitate this effort, graph models of secondary structure have been developed to quantify and thereby characterize the topological properties of the secondary folds. In this work we utilize a multigraph representation of a secondary RNA structure to examine the ability of the existing graph-theoretic descriptors to classify all possible topologies as either RNA-like or not RNA-like. We use more than one hundred descriptors and several different machine learning approaches, including nearest neighbor algorithms, one-class classifiers, and several clustering techniques. We predict that many more topologies will be identified as those representing RNA secondary structures than currently predicted in the RAG (RNA-As-Graphs) database. The results also suggest which descriptors and which algorithms are more informative in classifying and exploring secondary RNA structures.
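To give a sense of what a graph-theoretic descriptor of a multigraph looks like, here is a small sketch. The abstract does not list the descriptors the authors used, so the four computed here are illustrative assumptions, and the edge list is a hypothetical example rather than an entry from the RAG database.

```python
from collections import Counter

def descriptors(num_vertices, edges):
    """A few simple descriptors of an undirected multigraph given as an edge list."""
    degree = [0] * num_vertices
    multiplicity = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        multiplicity[frozenset((u, v))] += 1
    return {
        "degree_sequence": sorted(degree, reverse=True),
        "max_degree": max(degree),
        "num_edges": len(edges),
        # Vertex pairs joined by more than one edge.
        "num_parallel_pairs": sum(1 for c in multiplicity.values() if c > 1),
    }

# Hypothetical 3-vertex multigraph with a doubled edge between vertices 0 and 1.
example = descriptors(3, [(0, 1), (0, 1), (1, 2)])
print(example)
```

Vectors of such descriptors, one per topology, are the kind of input a nearest neighbor or one-class classifier could consume.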

Citations: 9
A robust topology-based algorithm for gene expression profiling.
Pub Date : 2012-11-11 eCollection Date: 2012-01-01 DOI: 10.5402/2012/381023
Lars Seemann, Jason Shulman, Gemunu H Gunaratne

Early and accurate diagnoses of cancer can significantly improve the design of personalized therapy and enhance the success of therapeutic interventions. Histopathological approaches, which rely on microscopic examinations of malignant tissue, are not conducive to timely diagnoses. High-throughput genomics offers a possible new classification of cancer subtypes. Unfortunately, most clustering algorithms have not been proven sufficiently robust. We propose a novel approach that relies on the use of statistical invariants and persistent homology, one of the most exciting recent developments in topology. It identifies a sufficient but compact set of genes for the analysis as well as a core group of tightly correlated patient samples for each subtype. Partitioning occurs hierarchically and allows for the identification of genetically similar subtypes. We analyzed the gene expression profiles of 202 tumors of the brain cancer glioblastoma multiforme (GBM) available from the Cancer Genome Atlas (TCGA) site. We identify core patient groups associated with the classical, mesenchymal, and proneural subtypes of GBM. In our analysis, the neural subtype consists of several small groups rather than a single component. A subtype prediction model is introduced which partitions tumors in a manner consistent with clustering algorithms but requires the genetic signature of only 59 genes.
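Persistent homology tracks which topological features survive as a distance threshold grows. A minimal sketch of its simplest case, 0-dimensional persistence (connected components), is given below. It is not the authors' pipeline: the union-find formulation, the one-dimensional toy points, and the absolute-difference distance are all assumptions for illustration.

```python
def h0_persistence(dist):
    """Death scales of connected components for a symmetric distance matrix.

    Every component is born at scale 0; a component dies when it merges
    into another one. One component never dies and is not listed."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths

# Hypothetical samples on a line: two tight pairs separated by a wide gap.
points = [0, 1, 5, 6]
dist = [[abs(a - b) for b in points] for a in points]
deaths = h0_persistence(dist)
print(deaths)
```

A death value much larger than the others marks a split that persists across scales, which is the kind of signal a robust grouping of samples would rely on.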

Citations: 10
Hybrid-controlled neurofuzzy networks analysis resulting in genetic regulatory networks reconstruction.
Pub Date : 2012-11-01 eCollection Date: 2012-01-01 DOI: 10.5402/2012/419419
Roozbeh Manshaei, Pooya Sobhe Bidari, Mahdi Aliyari Shoorehdeli, Amir Feizi, Tahmineh Lohrasebi, Mohammad Ali Malboobi, Matthew Kyan, Javad Alirezaie

Reverse engineering of gene regulatory networks (GRNs) is the process of estimating genetic interactions of a cellular system from gene expression data. In this paper, we propose a novel hybrid systematic algorithm based on a neurofuzzy network for reconstructing GRNs from observational gene expression data when only a medium-small number of measurements are available. The approach uses fuzzy logic to transform gene expression values into qualitative descriptors that can be evaluated by using a set of defined rules. The algorithm uses a neurofuzzy network to model genes' effects on other genes, followed by four stages of decision making to extract gene interactions. One of the main features of the proposed algorithm is that an optimal number of fuzzy rules can be easily and rapidly extracted without overparameterizing. Data analysis and simulation are conducted on microarray expression profiles of the S. cerevisiae cell cycle and demonstrate that the proposed algorithm not only selects the patterns of the time series gene expression data accurately, but also provides models with better reconstruction accuracy when compared with four published algorithms: DBNs, VBEM, time delay ARACNE, and PF subjected to LASSO. The accuracy of the proposed approach is evaluated in terms of recall and F-score for the network reconstruction task.
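Recall and F-score for a network reconstruction task can be computed by comparing predicted edges against a reference network, as in the sketch below. The gene names and both edge sets are hypothetical, and treating edges as directed pairs is an assumption. The neurofuzzy model itself is not reproduced.

```python
def edge_scores(predicted, reference):
    """Precision, recall, and F-score of predicted regulatory edges.

    Edges are (regulator, target) tuples and are treated as directed."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)  # correctly recovered edges
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical reference network and reconstruction.
reference = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g1", "g4")]
predicted = [("g1", "g2"), ("g2", "g3"), ("g4", "g1")]
precision, recall, f_score = edge_scores(predicted, reference)
print(precision, recall, f_score)
```

Here ("g4", "g1") counts as a miss because its direction is reversed relative to the reference. An undirected comparison would score it as correct.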

Citations: 0
Dynamic clustering of gene expression.
Pub Date : 2012-10-16 eCollection Date: 2012-01-01 DOI: 10.5402/2012/537217
Lingling An, R W Doerge

It is well accepted that genes are simultaneously involved in multiple biological processes and that genes are coordinated over the duration of such events. Unfortunately, clustering methodologies that group genes for the purpose of novel gene discovery fail to acknowledge the dynamic nature of biological processes and provide static clusters, even when gene expression is assessed across time or developmental stages. By taking advantage of techniques and theories from time-frequency analysis, periodic gene expression profiles are dynamically clustered based on the assumption that different spectral frequencies characterize different biological processes. A two-step cluster validation approach is proposed both to estimate statistically the optimal number of clusters and to distinguish significant clusters from noise. The resulting clusters reveal coordinated, coexpressed genes. This novel dynamic clustering approach is broadly applicable to sequential data in which the order of the series is of interest.
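The assumption that different spectral frequencies characterize different biological processes can be illustrated with a small sketch. This is not the authors' algorithm: it uses a plain discrete Fourier transform instead of a time-frequency analysis, it omits the two-step cluster validation, and the function names and toy profiles are hypothetical. It only groups periodic profiles by the frequency bin that carries the most power.

```python
import cmath
import math
from collections import defaultdict


def dominant_frequency(profile):
    """Return the DFT bin index k (1 .. n//2) that carries the most power."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [x - mean for x in profile]  # remove the constant (DC) offset
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k


def group_by_dominant_frequency(profiles):
    """Group gene names by the dominant frequency bin of their profile."""
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        groups[dominant_frequency(profile)].append(gene)
    return dict(groups)


# Toy profiles (hypothetical): two genes cycling once per series, one cycling four times.
n = 16
profiles = {
    "geneA": [math.sin(2 * math.pi * 1 * t / n) for t in range(n)],
    "geneB": [2.0 + 0.5 * math.sin(2 * math.pi * 1 * t / n + 0.3) for t in range(n)],
    "geneC": [math.cos(2 * math.pi * 4 * t / n) for t in range(n)],
}
groups = group_by_dominant_frequency(profiles)
print(groups)
```

Grouping on a single dominant bin ignores amplitude, phase, and any change of frequency over time, which is exactly the dynamic behaviour the abstract says a time-frequency approach is meant to capture.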

Citations: 0
Differential Expression Analysis for RNA-Seq Data.
Pub Date : 2012-09-20 eCollection Date: 2012-01-01 DOI: 10.5402/2012/817508
Rashi Gupta, Isha Dewan, Richa Bharti, Alok Bhattacharya

RNA-Seq is increasingly being used for gene expression profiling. In this approach, next-generation sequencing (NGS) platforms are used for sequencing. Owing to the highly parallel nature of these platforms, millions of reads are generated in a short time and at low cost. Analysis of the data is therefore a major challenge, and the development of statistical and computational methods is essential for drawing meaningful conclusions from such large datasets. Here, we assessed three types of normalization (transcript parts per million, trimmed mean of M values, and quantile normalization) and evaluated whether normalization reduces technical variability across replicates. We also proposed two novel methods for detecting differentially expressed genes between two biological conditions: (i) a likelihood ratio method and (ii) a Bayesian method. The proposed methods were tested on three real datasets and performed at least as well as, and often better than, existing methods for differential expression analysis.
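Two of the normalizations named above can be sketched on toy counts. The sketch assumes that per-million scaling means rescaling each replicate to a library size of one million (any transcript-length adjustment is not modelled), and it uses the textbook form of quantile normalization with ties broken by sort order. The trimmed mean of M values is omitted. Function names and counts are hypothetical, not taken from the paper.

```python
def per_million(counts):
    """Scale one replicate's counts so that they sum to one million."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]


def quantile_normalize(samples):
    """Force every replicate to share the same value distribution.

    samples: list of equal-length lists, one list of gene values per replicate.
    Each value is replaced by the mean, across replicates, of the values
    holding the same rank. Ties are broken by sort order, not averaged.
    """
    n_genes = len(samples[0])
    sorted_cols = [sorted(s) for s in samples]
    rank_means = [sum(col[r] for col in sorted_cols) / len(samples)
                  for r in range(n_genes)]
    normalized = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda g: s[g])  # gene indices by rank
        out = [0.0] * n_genes
        for rank, g in enumerate(order):
            out[g] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy technical replicates (hypothetical) sequenced at different depths.
replicate_1 = [10, 200, 30, 760]
replicate_2 = [40, 400, 160, 1400]

scaled = [per_million(replicate_1), per_million(replicate_2)]
qn = quantile_normalize(scaled)
print(scaled)
print(qn)
```

Per-million scaling removes only the difference in sequencing depth, whereas quantile normalization also forces the replicates onto an identical distribution. Whether either step reduces technical variability on real data is the question the abstract evaluates, and the toy numbers here say nothing about that.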

Citations: 33