Pub Date : 2013-12-15 DOI: 10.1093/bioinformatics/btt630
A. Dress, M. Linial, O. Troyanskaya, M. Vingron
In late 2012, the International Society for Computational Biology (ISCB) and Springer partnered to enhance the Springer book series in computational biology. The two worked closely to develop a strategy for bringing ISCB members and the community at large educational materials that would not only educate the community but also help advance the science. Sponsored by ISCB, the computational biology series publishes the latest high-quality research devoted to specific issues in computer-assisted analysis of biological data. The main emphasis is on current scientific developments and innovative techniques in computational biology (bioinformatics), bringing to light methods from mathematics, statistics and computer science that directly address biological problems currently under investigation. The series offers publications that present the state of the art regarding the problems in question, show computational biology/bioinformatics methods at work and discuss anticipated demands regarding developments in future methodology. Titles range from focused monographs to undergraduate and graduate textbooks and professional text/reference works. Additionally, ISCB members will receive a 25% discount on book purchases within the series. Springer is seeking to publish quality books in areas including, but not limited to, databases, data analysis and ontologies; functional and comparative genomics; gene regulation and transcriptomics; protein interactions and networks; data, literature and text mining; molecular sequence analysis; biological networks; sequencing and genotyping technologies; population genetics; systems biology; imaging and visualization; computational proteomics; molecular structural biology; evolution and phylogenetics; metagenomics; biomedical applications; high performance biocomputing; and synthetic biological systems. Book proposal submission details can be found at the book series Web site (http://www.springer.com/series/5769).
Title: ISCB/SPRINGER series in computational biology. Journal of bioinformatics, pp. 3246-3247.
Pub Date : 2012-08-01 DOI: 10.1093/bioinformatics/bts347
Tuobin Wang, A. Kettenbach, S. Gerber, C. Bailey-Kellogg
Recently, two new approaches to finding overrepresented motifs in phosphoproteomics datasets have been introduced: our MMFPh (Wang et al., 2012) and He et al.’s Motif-All (BMC Bioinformatics 2011). Both methods espouse the importance of completeness (finding all motifs supported by the data), in contrast to previous approaches that may miss some motifs due to algorithmic choices. As we discuss in the Introduction of our article, however, while both methods seek to identify all significant motifs, they employ different significance assessments. In some cases, the difference matters little, if at all. However, as we show in the Results, in other cases the difference leads to Motif-All finding many more motifs than MMFPh, and many more than are biologically supported, including known false positives planted in synthetic datasets. The difference also leads to Motif-All occasionally missing a motif found by MMFPh, though not for the datasets and parameter settings employed in the presented examples. Since MMFPh and Motif-All employ different notions of significance, it is not surprising that empirically they do not find exactly the same sets of motifs. He et al. (submitted for publication) elaborate on this finding by providing a theoretical characterization with respect to their notion of significance, which is a global assessment of an entire peptide. In contrast, MMFPh employs a local assessment of individual amino acid/position pairs during construction of a motif [our Equation (1)], as introduced by the popular Motif-X approach to phosphorylation motif discovery
Title: Response to 'Comments on "MMFPh: A Maximal Motif Finder for Phosphoproteomics Datasets"'. Journal of bioinformatics, p. 2213.
Pub Date : 2010-09-01 DOI: 10.1093/bioinformatics/btq313
V. Calian, J. Hsu
In current practice, such as GWAS (genome-wide association studies), permutation is often applied to multiple testing for association between a large number of features [e.g. single nucleotide polymorphisms (SNPs)] and phenotypes (Hahn et al., 2008). Inferring that there is a difference between the phenotypic groups X and Y in some of the features is not very useful. One has to know for which features there is a difference. Exchangeability, a necessary condition for the validity of permutation tests, might be applicable if subjects are assigned randomly to treatments and the treatment is totally innocuous. However, instead of randomized, controlled clinical trials, bioinformatics discovery studies are mostly retrospective. Huang et al. (2006) give examples of how permutation testing may fail to control Type I error when exchangeability does not hold. Equally important, Theorem 2.2 of that paper gives a succinct condition on when permutation testing is valid, even when exchangeability fails. This condition is as follows. In testing the null hypothesis that there is no difference in an entire set of features between groups X and Y, when the sample sizes are equal, even if the data distributions F_X and F_Y have unequal even-order cumulants, so long as they have equal odd higher-order (third order and higher) cumulants, permutation testing controls the Type I error rate. This precise condition is the basis for the subsequent papers Xu and Hsu (2007) and Calian et al. (2008) to uncover the Marginal-Determines-the-Joint (MDJ) distribution condition needed for permutation multiple testing to control multiple testing error rates. Regardless of sample sizes, permutation multiple tests may not control false discoveries of which features are predictive of phenotype, unless it is assumed that the joint distributions of nonpredictive features are identical between the X and Y groups. Checking this assumption on the joint distribution using the data
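For readers unfamiliar with the mechanics, a single-feature two-sample permutation test can be sketched as follows. This is only an illustrative sketch, not the procedure of any paper cited above: the function name `permutation_p_value`, the difference-in-means statistic, the number of permutations and the seed are all assumptions. The p-value it returns is meaningful only when group labels are exchangeable under the null hypothesis, which is exactly the condition the letter warns may fail in retrospective studies.

```python
import random
from statistics import mean


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration; assumes labels are exchangeable
    under the null hypothesis.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign the group labels by shuffling the pooled sample.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Applying such a test feature by feature does not by itself address the multiple-testing concern raised in the text, which depends on the joint distribution of the nonpredictive features.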
Title: Response to Letter to the Editor by Philip Good on To Permute or Not to Permute. Journal of bioinformatics, p. 2215.
Pub Date : 2009-03-01 DOI: 10.1093/bioinformatics/btp038
David M. Rocke, T. Ideker, O. Troyanskaya, John Quackenbush, J. Dopazo
Over the last decade or so, a large number of methods have been published for normalization, variable (gene) selection, classification, and clustering of microarray data. As indicated in the scope document for Bioinformatics, this requires papers describing new methods for these problems to meet a very high standard, showing important improvement in results for real biological data, as well as novelty. In this editorial, we describe some standards that need to be met for papers in these areas to be seriously considered. We ask that prospective authors consider these points carefully before submission of their papers to Bioinformatics. The Role of Simulation. Simulation can be useful in investigating the properties of various methods of data analysis. Yet there are important barriers to credible use of simulation in microarray studies, largely due to what we don’t know about the statistical distribution of measured gene expression levels. First, the distribution across transcripts of true expression values is dependent on the biological state of the tissue or cell, and for a given state this is unknown, even in distributional form, and may further exhibit gene-specific and platform-specific effects. Second, the correlation within biological replicates of true expression is unknown, and is likely unknowable in detail given that it is
Title: Papers on normalization, variable selection, classification or clustering of microarray data. Journal of bioinformatics, pp. 701-702.
Pub Date : 2009-01-01 DOI: 10.1093/BIOINFORMATICS/BTN535
D. M. Vienne, T. Giraud, O. Martin
In response to comment on ‘A congruence index for testing topological similarity between trees’. Damien M. de Vienne (1,*), Tatiana Giraud (1) and Olivier C. Martin (2,3). (1) Univ Paris-Sud, Laboratoire de Recherche en Informatique, UMR8623, Orsay F-91405; CNRS, Orsay F-91405. (2) Univ Paris-Sud, UMR8626, LPTMS, Orsay F-91405; CNRS, Orsay F-91405. (3) Univ Paris-Sud, UMR8120, Laboratoire de Genetique Vegetale du Moulon, Gif-sur-Yvette F-91190, France.
Title: In response to comment on 'A congruence index for testing topological similarity between trees'. Journal of bioinformatics, pp. 150-151.
Pub Date : 2008-07-15 DOI: 10.1093/bioinformatics/btn252
Jainab Khatun, Morgan C. Giddings
We thank Mark Segal for raising the issue of interpreting MS/MS scores. As he noted, we used a method proposed by Fenyo and Beavis (FB) (2003) to assess the significance of identification using HMM_Score. In his letter, Segal makes two basic assertions about this use: (1) that the extreme value distribution (EVD) does not apply to the MS/MS database scoring systems used by FB and our HMM and (2) that the linear tail fitting of the log survival function is not robust. He proposes a method that he authored as an alternative for estimating EVD parameters that he says may be more robust, and also points to a method by Shen et al. that is specific to assessing the significance of protein/peptide identifications using MS/MS data. While it is valuable to examine whether there exist better ways of statistically interpreting the results of MS/MS search, Segal did not provide in his letter any clear supporting evidence for his claim that the MS/MS scorers cannot use E-values. In our case, we calculate a score distribution for all random matches on the fly, then derive the survival function, s (one minus the cumulative probability distribution), and finally fit a line to the log of this function for the high-scoring portion of s. We verified the methodology for a series of randomly chosen HMM_Score search results, observing that in all cases the fit had very high correlation values (R^2 > 0.9). All subsequent validation of HMM_Score was performed using the E-values produced, and as reported the system performs well.
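The tail-fitting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the HMM_Score implementation: the function names, the 10% tail fraction, the base-10 logarithm and the way the fitted line is scaled into an E-value are all hypothetical choices, and the random-match scores are assumed to be distinct.

```python
import math


def fit_log_survival_tail(random_scores, tail_fraction=0.1):
    """Least-squares line through log10(survival) over the high-scoring tail.

    Returns (intercept, slope, r_squared). Hypothetical helper; needs at
    least a few distinct scores.
    """
    ordered = sorted(random_scores)
    n = len(ordered)
    k = max(2, int(n * tail_fraction))  # number of tail points used in the fit
    xs, ys = [], []
    for i in range(n - k, n):
        xs.append(ordered[i])
        # Empirical survival: fraction of random scores >= ordered[i].
        ys.append(math.log10((n - i) / n))
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


def e_value(score, intercept, slope, n_candidates):
    """Expected number of random matches scoring at least `score`."""
    return n_candidates * 10 ** (intercept + slope * score)
```

The returned `r_squared` plays the role of the correlation check mentioned in the text: a low value would signal that a straight line describes the log survival tail poorly for that search.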
Title: In response to "On E-value for tandem MS scoring schemes". Journal of bioinformatics, p. 1654.
Pub Date : 2008-04-01 DOI: 10.1093/bioinformatics/btn019
N. Mukhopadhyay, Snigdhansu Chatterjee
We thank Professors Nagarajan and Upreti for their interest in our paper, Mukhopadhyay and Chatterjee (2007). There, we propose using Granger causality-based pathway detection in an acyclic, homoscedastic framework for microarray time-series expressions, which are generally short-duration time series involving a very large number of genes. Professors Nagarajan and Upreti point out that in the presence of heteroscedasticity, and a cycle like ‘gene x regulates the expression of gene y and simultaneously gene y regulates the expression of gene x’, Granger causality tests may not be informative. Here, we adopt the term ‘heteroscedasticity’ (‘homoscedasticity’) to mean that the unconditional variance of the white noise, represented as a bivariate vector in the Euclidean co-ordinate system, is different (the same) in different co-ordinate directions. Thus, in essence, if the assumptions about the acyclic and homoscedastic nature of the time series are violated, tests for causality detection may fail. This is an important point, since when a contemporaneous cyclic relationship is present, the notion of causality makes little sense. In the context of economics, Eichler (2007) presents a treatment of contemporaneous correlation as well as Granger causality. Extreme heteroscedasticity may be indicative of improper normalization of gene expressions. At the end of their letter, Dr Nagarajan and Dr Upreti mention the normalization step. Proper normalization should remove wide discrepancies in noise variance; hence, microarray datasets are nowadays typically available in a de facto normalized version. The data used in Mukhopadhyay and Chatterjee (2007) are also normalized. However, differences in technical variance, as indicated by Professors Nagarajan and Upreti, may still be present. That would violate the assumption of our method (as well as of many other statistical comparison methods relying on a common unknown variance). 
Professor Nagarajan, in review, kindly suggested references for two-gene systems whose time-profile may not fit into a homoscedastic, cause-effect framework. Thus, a full vector autoregression structure may be needed to capture their mutual dependence at various lags (including lag zero). It can be guessed that multi-gene systems exist whose temporal codependency nature is extremely complex. Although current knowledge about gene regulatory networks is limited, some biology experts we consulted believe that cyclical patterns may be found in large multi-gene networks as a part of a feedback procedure, if they are studied over long enough time spans. A proper approach to elicit such patterns would be to conduct multivariate, possibly non-stationary, time-series analysis with all the genes over a long time horizon. This is not feasible currently, since present state-of-the-art microarray time series experiments are of short duration and typically involve a very large number of genes. Hence, restricting the network to acyclic ones is, in our opinion, a small price to pay for producing an informative analysis. Future microarray experiments of longer duration, together with the discovery of biological and chemical properties relating to gene and protein interactions, will undoubtedly lead to a better understanding of gene networks. We would like to point out that in Model 1 (Equation 2), the four model parameters need to be known constants for mathematical displays (4)-(7) to hold. If some (or all) of these parameters are estimated from the data, displays (4)-(7) are missing an O(n^{-1}) term for each estimated parameter, where n is the length of the time series. Moreover, the equation for s1 does not account for the fact that, as univariate time series, both x_t and y_t are AR(2) (second-order autoregressive) processes, not AR(1). Similar comments apply to Model 2 (Equation 11). In the human cell-cycle data considered in Mukhopadhyay and Chatterjee (2007), n is 12 in one experiment while the time series itself is 802-dimensional, which illustrates the difficulty of modeling microarray time series.
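To make the discussion concrete, a pairwise lag-1 Granger causality F statistic can be sketched as below. This is an illustrative sketch only, not the pathway-search method of Mukhopadhyay and Chatterjee (2007): the lag order, the helper names (`solve_linear`, `rss`, `granger_f`, `simulate_pair`) and the simulated data are assumptions. The F statistic relies on a common error variance and on one-directional dependence, so it inherits the homoscedastic, acyclic assumptions debated in the reply.

```python
import random


def solve_linear(a, b):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def rss(design, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(design, target)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, t in zip(design, target))


def granger_f(cause, effect):
    """Lag-1 F statistic for the hypothesis that `cause` Granger-causes `effect`."""
    target = effect[1:]
    # Restricted model: effect_t ~ 1 + effect_{t-1}
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    # Unrestricted model adds the lagged candidate cause.
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r = rss(restricted, target)
    rss_u = rss(unrestricted, target)
    dof = len(target) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)


def simulate_pair(n, seed=0):
    """Toy acyclic system: x drives y with a one-step delay (assumed data)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.0]
    for t in range(1, n):
        y.append(0.8 * x[t - 1] + 0.1 * rng.gauss(0.0, 1.0))
    return x, y
```

On the toy system, `granger_f(x, y)` should be far larger than `granger_f(y, x)`. With series as short as the n = 12 mentioned in the reply, such statistics are much less stable, which is part of the difficulty the authors describe.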
Title: Reply to "Comment on causality and pathway search in microarray time series experiment". Journal of bioinformatics, p. 1033.
Pub Date : 2007-08-05 DOI: 10.1093/bioinformatics/btm318
R. Carlson
{"title":"Metabolic systems cost-benefit analysis for interpreting network structure and regulation - Erratum","authors":"R. Carlson","doi":"10.1093/bioinformatics/btm318","DOIUrl":"https://doi.org/10.1093/bioinformatics/btm318","url":null,"abstract":"","PeriodicalId":90576,"journal":{"name":"Journal of bioinformatics","volume":"27 1","pages":"2202"},"PeriodicalIF":0.0,"publicationDate":"2007-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75053687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2006-09-15 DOI: 10.1093/bioinformatics/btl333
HyungJun Cho, Jae K. Lee
{"title":"Response to comments on \"Bayesian Hierarchical Error Model for Analysis of Gene Expression Data\"","authors":"HyungJun Cho, Jae K. Lee","doi":"10.1093/bioinformatics/btl333","DOIUrl":"https://doi.org/10.1093/bioinformatics/btl333","url":null,"abstract":"","PeriodicalId":90576,"journal":{"name":"Journal of bioinformatics","volume":"32 5","pages":"2452"},"PeriodicalIF":0.0,"publicationDate":"2006-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1093/bioinformatics/btl333","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72397829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2006-06-15 DOI: 10.1093/bioinformatics/btl166
A. Valencia, A. Bateman
Bioinformatics has published papers describing new software for over 20 years (Nilsson and Klein 1985). During this time the world of software has changed considerably, particularly with the irresistible rise of initiatives to build freely accessible software and to open access to data resources. The Internet and the Web have also changed the way we use and distribute software. This social and technical revolution is also changing the structure of the relations between commercial and academic software-based activities, for which patents and software protection are key elements. In this and the following issue we publish two editorials addressing the general topics of software accessibility, patents and intellectual property. In this issue, Steven L. Salzberg and John Quackenbush (past and present Associate Editors, respectively) present one perspective on the issue. In the next issue another of our Associate Editors, Jonathan D. Wren, will put forward a different perspective. We welcome additional contributions to this discussion from our readership, which will help the journal in the process of adapting our publication guidelines to better serve the development of Bioinformatics.
Nilsson and Klein (1985) SEQ-ED: an interactive computer program for editing, analysis and storage of long DNA sequences.
{"title":"Software patents in Bioinformatics","authors":"A. Valencia, A. Bateman","doi":"10.1093/bioinformatics/btl166","DOIUrl":"https://doi.org/10.1093/bioinformatics/btl166","url":null,"abstract":"Bioinformatics has published papers describing new software for over 20 years (Nilsson and Klein 1985). During this time the world of software has changed considerably, particularly with the irresistible rise of initiatives to build freely accessible software and to open access to data resources. The Internet and the Web have also changed the way we use and distribute software. This social and technical revolution is also changing the structure of the relations between commercial and academic software-based activities, for which patents and software protection are key elements. In this and the following issue we publish two editorials addressing the general topics of software accessibility, patents and intellectual property. In this issue, Steven L. Salzberg and John Quackenbush (past and present Associate Editors, respectively) present one perspective on the issue. In the next issue another of our Associate Editors, Jonathan D. Wren, will put forward a different perspective. We welcome additional contributions to this discussion from our readership, which will help the journal in the process of adapting our publication guidelines to better serve the development of Bioinformatics.
Nilsson and Klein (1985) SEQ-ED: an interactive computer program for editing, analysis and storage of long DNA sequences.","PeriodicalId":90576,"journal":{"name":"Journal of bioinformatics","volume":"38 11 1","pages":"1415"},"PeriodicalIF":0.0,"publicationDate":"2006-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75863423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}