Journal of bioinformatics: latest articles, page 2

ISCB/Springer series in computational biology

Journal of bioinformatics

Pub Date : 2013-12-15 DOI: 10.1093/bioinformatics/btt630

A. Dress, M. Linial, O. Troyanskaya, M. Vingron

In late 2012, the International Society for Computational Biology (ISCB) and Springer partnered to enhance the Springer book series in computational biology. The two worked closely to develop a strategy for bringing ISCB members and the community at large educational materials that would not only educate the community but also help advance the science. Sponsored by ISCB, the computational biology series publishes the latest high-quality research devoted to specific issues in computer-assisted analysis of biological data. The main emphasis is on current scientific developments and innovative techniques in computational biology (bioinformatics), bringing to light methods from mathematics, statistics and computer science that directly address biological problems currently under investigation. The series offers publications that present the state of the art regarding the problems in question, show computational biology/bioinformatics methods at work and discuss anticipated demands regarding developments in future methodology. Titles can range from focused monographs to undergraduate and graduate textbooks and professional text/reference works. Additionally, ISCB members will receive a 25% discount on book purchases within the series. Springer is seeking to publish quality books in areas including, but not limited to, databases, data analysis and ontologies; functional and comparative genomics; gene regulation and transcriptomics; protein interactions and networks; data, literature and text mining; molecular sequence analysis; biological networks; sequencing and genotyping technologies; population genetics; systems biology; imaging and visualization; computational proteomics; molecular structural biology; evolution and phylogenetics; metagenomics; biomedical applications; high performance biocomputing; and synthetic biological systems. Book proposal submission details can be found at the book series Web site (http://www.springer.com/series/5769).


Pages : 3246-3247

Citations: 0

Response to 'Comments on "MMFPh: A Maximal Motif Finder for Phosphoproteomics Datasets"'

Journal of bioinformatics

Pub Date : 2012-08-01 DOI: 10.1093/bioinformatics/bts347

Tuobin Wang, A. Kettenbach, S. Gerber, C. Bailey-Kellogg

Recently, two new approaches to finding overrepresented motifs in phosphoproteomics datasets have been introduced: our MMFPh (Wang et al., 2012) and He et al.’s Motif-All (BMC Bioinformatics 2011). Both methods espouse the importance of completeness (finding all motifs supported by the data), in contrast to previous approaches that may miss some motifs due to algorithmic choices. As we discuss in the Introduction of our article, however, while both methods seek to identify all significant motifs, they employ different significance assessments. In some cases, the difference matters little, if at all. However, as we show in the Results, in some cases the difference leads to Motif-All finding many more motifs than MMFPh, and many more than those that are biologically supported, including known false positives planted in synthetic datasets. The difference also leads to Motif-All occasionally missing a motif found by MMFPh, though not for the datasets and parameter settings employed in the presented examples. Since MMFPh and Motif-All employ different notions of significance, it is not surprising that empirically they do not find exactly the same sets of motifs. He et al. (submitted for publication) elaborate on this finding by providing a theoretical characterization with respect to their notion of significance, which is a global assessment of an entire peptide. In contrast, MMFPh employs a local assessment of individual amino acid/position pairs during construction of a motif [our Equation (1)], as introduced by the popular Motif-X approach to phosphorylation motif discovery


Pages : 2213

Citations: 1

Response to Letter to the Editor by Philip Good on To Permute or Not to Permute

Journal of bioinformatics

Pub Date : 2010-09-01 DOI: 10.1093/bioinformatics/btq313

V. Calian, J. Hsu

In current practice, such as GWAS (genome-wide association studies), permutation is often applied to multiple testing for association between a large number of features [e.g. single nucleotide polymorphisms (SNPs)] and phenotypes (Hahn et al., 2008). Inferring that there is a difference between the phenotypic groups X and Y in some of the features is not very useful. One has to know for which features there is a difference. Exchangeability, a necessary condition for the validity of permutation tests, might be applicable if subjects are assigned randomly to treatments and the treatment is totally innocuous. However, bioinformatics discovery studies are mostly retrospective rather than randomized, controlled clinical trials. Huang et al. (2006) give examples of how permutation testing may fail to control Type I error when exchangeability does not hold. Equally important, Theorem 2.2 of that paper gives a succinct condition under which permutation testing is valid even when exchangeability fails. This condition is as follows. In testing the null hypothesis that there is no difference in an entire set of features between groups X and Y, when the sample sizes are equal, even if the data distributions F_X and F_Y have unequal even-order cumulants, so long as they have equal odd higher-order (third order and higher) cumulants, permutation testing controls the Type I error rate. This precise condition is the basis on which the subsequent papers Xu and Hsu (2007) and Calian et al. (2008) uncover the Marginal-Determines-the-Joint (MDJ) distribution condition needed for permutation multiple testing to control multiple testing error rates. Regardless of sample sizes, permutation multiple tests may not control false discoveries of which features are predictive of phenotype, unless it is assumed that the joint distributions of nonpredictive features are identical between the X and Y groups. Checking this assumption on the joint distribution using the data
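For readers unfamiliar with the procedure under discussion, a two-group permutation test for a single feature can be sketched as follows. This is a generic textbook version, not the authors' multiple-testing method; the function name `permutation_p_value`, the mean-difference statistic, and the defaults `n_perm` and `seed` are assumptions made for illustration. Its validity rests on exactly the exchangeability (or weaker cumulant) conditions the abstract describes.

```python
import random
from statistics import mean


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration; assumes group labels are
    exchangeable under the null hypothesis.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random by shuffling the pooled sample.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Applying such a test to each feature separately and then correcting for multiplicity is where the joint-distribution assumption discussed above becomes relevant; the sketch does not check that assumption.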


Pages : 2215

Citations: 0

Papers on normalization, variable selection, classification or clustering of microarray data

Journal of bioinformatics

Pub Date : 2009-03-01 DOI: 10.1093/bioinformatics/btp038

David M. Rocke, T. Ideker, O. Troyanskaya, John Quackenbush, J. Dopazo

Over the last decade or so, a large number of methods have been published for normalization, variable (gene) selection, classification, and clustering of microarray data. As indicated in the scope document for Bioinformatics, this requires papers describing new methods for these problems to meet a very high standard, showing important improvement in results for real biological data, as well as novelty. In this editorial, we describe some standards that need to be met for papers in these areas to be seriously considered. We ask that prospective authors consider these points carefully before submitting their papers to Bioinformatics. The Role of Simulation. Simulation can be useful in investigating the properties of various methods of data analysis. Yet there are important barriers to credible use of simulation in microarray studies, largely due to what we don’t know about the statistical distribution of measured gene expression levels. First, the distribution across transcripts of true expression values is dependent on the biological state of the tissue or cell, and for a given state this is unknown, even in distributional form, and may further exhibit gene-specific and platform-specific effects. Second, the correlation within biological replicates of true expression is unknown, and is likely unknowable in detail given that it is


Pages : 701-702

Citations: 58

In response to comment on 'A congruence index for testing topological similarity between trees'

Journal of bioinformatics

Pub Date : 2009-01-01 DOI: 10.1093/BIOINFORMATICS/BTN535

D. M. Vienne, T. Giraud, O. Martin

Damien M. de Vienne (1,*), Tatiana Giraud (1) and Olivier C. Martin (2,3). (1) Univ Paris-Sud, Laboratoire de Recherche en Informatique, UMR8623, Orsay F-91405; CNRS, Orsay F-91405; (2) Univ Paris-Sud, UMR8626, LPTMS, Orsay F-91405; CNRS, Orsay F-91405; and (3) Univ Paris-Sud, UMR8120, Laboratoire de Genetique Vegetale du Moulon, Gif-sur-Yvette F-91190, France


Citations: 7

In response to "On E-value for tandem MS scoring schemes"

Journal of bioinformatics

Pub Date : 2008-07-15 DOI: 10.1093/bioinformatics/btn252

Jainab Khatun, Morgan C. Giddings

We thank Mark Segal for raising the issue of interpreting MS/MS scores. As he noted, we used a method proposed by Fenyo and Beavis (FB) (2003) to assess the significance of identification using HMM_Score. In his letter, Segal makes two basic assertions about this use: (1) that the extreme value distribution does not apply to the MS/MS database scoring systems used by FB and our HMM and (2) that the linear tail fitting of the log survival function is not robust. He proposes a method that he authored as an alternative for estimating EVD parameters, which he says may be more robust, and also points to a method by Shen et al. that is specific to assessing the significance of protein/peptide identifications using MS/MS data. While it is valuable to examine whether there exist better ways of statistically interpreting the results of an MS/MS search, in his letter Segal did not provide any clear supporting evidence for his claim that MS/MS scorers cannot use E-values. In our case, we calculate a score distribution for all random matches on the fly, then derive the survival function, s (the cumulative probability distribution), and finally fit a line to the log of this function for the high-scoring portion of s. We verified the methodology for a series of randomly chosen HMM_Score search results, observing that in all cases the fit had very high correlation values (R2 > 0.9). All subsequent validation of HMM_Score was performed using the E-values produced, and as reported the system performs well.
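The tail-fitting step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the HMM_Score implementation: the survival function is taken here as the fraction of random-match scores at or above a given score, the "high-scoring portion" is taken as the scores whose survival is at most `tail_fraction`, and the E-value is taken as the number of candidates times the extrapolated survival. The names `fit_log_survival_tail`, `e_value`, `tail_fraction` and `n_candidates` are hypothetical.

```python
import math
from bisect import bisect_left


def fit_log_survival_tail(random_scores, tail_fraction=0.5):
    """Least-squares line through log s(x) over the high-scoring tail.

    Returns (intercept, slope, r_squared). Illustrative helper only.
    """
    ordered = sorted(random_scores)
    n = len(ordered)
    points = []
    for x in sorted(set(ordered)):
        # Empirical survival: fraction of random scores >= x.
        survival = (n - bisect_left(ordered, x)) / n
        if survival <= tail_fraction:
            points.append((x, math.log(survival)))
    if len(points) < 2:
        raise ValueError("need at least two distinct tail scores to fit a line")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared


def e_value(score, intercept, slope, n_candidates):
    """Extrapolate the fitted line to `score` and scale by candidate count."""
    return n_candidates * math.exp(intercept + slope * score)
```

The returned `r_squared` corresponds to the goodness-of-fit check mentioned in the reply; whether a linear fit is appropriate for a given scoring scheme is the point in dispute and is not settled by the sketch.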


Pages : 1654

Citations: 0

Reply to "Comment on causality and pathway search in microarray time series experiment"

Journal of bioinformatics

Pub Date : 2008-04-01 DOI: 10.1093/bioinformatics/btn019

N. Mukhopadhyay, Snigdhansu Chatterjee

We thank Professors Nagarajan and Upreti for their interest in our paper, Mukhopadhyay and Chatterjee (2007). There, we propose using Granger causality-based pathway detection in an acyclic, homoscedastic framework for microarray time-series expressions, which are generally short-duration time series involving a very large number of genes. Professors Nagarajan and Upreti point out that in the presence of heteroscedasticity, and of a cycle like ‘gene x regulates the expression of gene y and simultaneously gene y regulates the expression of gene x’, Granger causality tests may not be informative. Here, we adopt the term ‘heteroscedasticity’ (‘homoscedasticity’) to mean that the unconditional variance of the white noise, represented as a bivariate vector in the Euclidean co-ordinate system, is different (the same) in different co-ordinate directions. Thus, in essence, if the assumptions about the acyclic and homoscedastic nature of the time series are violated, tests for causality detection may fail. This is an important point, since when a contemporaneous cyclic relationship is present, the notion of causality makes little sense. In the context of economics, Eichler (2007) presents a treatment of contemporaneous correlation as well as Granger causality. Extreme heteroscedasticity may be indicative of improper normalization of gene expressions. At the end of their letter, Dr Nagarajan and Dr Upreti mention the normalization step. Proper normalization should remove wide discrepancies in noise variance, hence nowadays microarray datasets are typically available in a de facto normalized version. The data used in Mukhopadhyay and Chatterjee (2007) are also normalized. However, differences in technical variance, as indicated by Professors Nagarajan and Upreti, may still be present, and that would violate the assumption of our method (as well as of many other statistical comparison methods relying on a common unknown variance).

Professor Nagarajan, in review, kindly suggested references for two-gene systems whose time profile may not fit into a homoscedastic, cause-effect framework. Thus, a full vector autoregression structure may be needed to capture their mutual dependence at various lags (including lag zero). It can be guessed that multi-gene systems exist whose temporal codependency is extremely complex. Although current knowledge about gene regulatory networks is limited, some biology experts we consulted believe that cyclical patterns may be found in large multi-gene networks as part of a feedback procedure, if they are studied over long enough time spans. A proper approach to eliciting such patterns would be to conduct multivariate, possibly non-stationary, time-series analysis with all the genes over a long time horizon. This is not feasible currently, since present state-of-the-art microarray time series experiments are of short duration and typically involve a very large number of genes. Hence, restricting the network to acyclic ones is, in our opinion, a small price to pay for an informative analysis. Future microarray experiments of longer duration, and discoveries of biological and chemical properties relating to gene and protein interactions, will undoubtedly contribute to a better understanding of gene networks.

We would like to point out that in Model 1 (Equation 2), the parameters [symbols garbled in the source: 12, 21, 2 and 2 #] need to be known constants for the mathematical displays (4)-(7) to hold. If some (or all) of these parameters are estimated from the data, displays (4)-(7) are missing an O (n!1) term [as rendered in the source] for each estimated parameter, where n is the length of the time-series data. In addition, the equation for s1 does not account for the fact that, as univariate time series, both xt and yt are AR(2) (autoregressive of order 2) processes rather than AR(1). Similar comments apply to Model 2 (Equation 11). In the human cell-cycle data considered in Mukhopadhyay and Chatterjee (2007), n is 12 in one experiment while the time series itself is 802-dimensional, which illustrates the difficulty of modeling microarray time series.
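To make the notion of a Granger causality test concrete, the following is a minimal textbook sketch for a single pair of series with one lag: it compares a regression of y on its own past with one that also includes the past of x, and returns the F statistic for the added term. It is not the pathway-search procedure of Mukhopadhyay and Chatterjee (2007); the function names `ols_rss` and `granger_f_lag1`, the single lag, and the use of ordinary least squares with a common error variance (the homoscedasticity assumption discussed above) are assumptions made for illustration.

```python
def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit via normal equations."""
    k = len(rows[0])
    # Build the augmented system [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum(
        (t - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, t in zip(rows, targets)
    )


def granger_f_lag1(x, y):
    """F statistic for 'x Granger-causes y' using one lag of each series."""
    targets = y[1:]
    # Restricted model: y_t ~ 1 + y_{t-1}
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    # Full model: y_t ~ 1 + y_{t-1} + x_{t-1}
    full = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = ols_rss(restricted, targets)
    rss_f = ols_rss(full, targets)
    dof = len(targets) - 3  # observations minus parameters in the full model
    return (rss_r - rss_f) / (rss_f / dof)
```

Under the usual assumptions the statistic is compared with an F distribution with 1 and `dof` degrees of freedom. With series as short as those mentioned in the reply, the residual degrees of freedom are small, which is one reason such tests are delicate on microarray time series.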

Pages : 1033

Citations: 23

Metabolic systems cost-benefit analysis for interpreting network structure and regulation - Erratum

Journal of bioinformatics

Pub Date : 2007-08-05 DOI: 10.1093/bioinformatics/btm318

R. Carlson

Citations: 2

Response to comments on "Bayesian Hierarchical Error Model for Analysis of Gene Expression Data"

Journal of bioinformatics

Pub Date : 2006-09-15 DOI: 10.1093/bioinformatics/btl333

HyungJun Cho, Jae K. Lee

Citations: 0

Software patents in Bioinformatics

Journal of bioinformatics

Pub Date : 2006-06-15 DOI: 10.1093/bioinformatics/btl166

A. Valencia, A. Bateman

Bioinformatics has published papers describing new software for over 20 years (Nilsson and Klein 1985). During this time the world of software has changed considerably, particularly with the irresistible rise of initiatives to build freely accessible software and to open access to data resources. The Internet and the Web have also changed the way we use and distribute software. This social and technical revolution is also changing the structure of the relations between commercial and academic software-based activities, for which patents and software protection are key elements. In this and the following issue we publish two editorials addressing the general topics of software accessibility, patents and intellectual property. In this issue, Steven L. Salzberg and John Quackenbush (past and present Associate Editors, respectively) present one perspective on the issue. In the next issue another of our Associate Editors, Jonathan D. Wren, will put forward a different perspective. We welcome additional contributions to this discussion from our readership, which will help the journal in the process of adapting our publication guidelines to better serve the development of Bioinformatics. Reference: Nilsson and Klein (1985) SEQ-ED: an interactive computer program for editing, analysis and storage of long DNA sequences.

Citations: 1