Genome informatics. International Conference on Genome Informatics: latest articles, page 3

Genome-wide analysis of plant UGT family based on sequence and substrate information.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2010-01-01

Yosuke Nishimura, Toshiaki Tokimatsu, Masaaki Kotera, Susumu Goto, Minoru Kanehisa

UGTs (UDP glycosyltransferases) are the largest glycosyltransferase gene family in higher plants, modifying secondary metabolites, hormones, and xenobiotics. This gene family plays an important role in the vast, species-specific diversity of plant secondary metabolites. Experimental data on the biochemical activities and physiological roles of plant UGTs are increasing, but most UGTs are still not functionally characterized. To understand their catalytic specificity and function from sequence data, phylogenetic analyses have been carried out mainly in Arabidopsis, but a large-scale, comprehensive approach covering various species has not yet been applied. In this study, we collected 733 UGT sequences derived from 96 plant species and 252 substrate specificity data points. We constructed a phylogenetic tree and divided most of these genes into nine sequence groups, which are characterized by biochemical specificity. Furthermore, we performed a genome-wide analysis of the UGTs of seven plant species by mapping them into these groups. We propose that this is the first step toward understanding the full set of glycosylated secondary metabolites of each plant species from its genome information.

Volume: 24. Pages: 127-38.

Citations: 0

Strategies of non-sequential protein structure alignments.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2010-01-01

Aysam Guerler, Ernst-Walter Knapp

Due to the large number of available protein structure alignment algorithms, a lot of effort has been made to define robust measures to evaluate their performances and the quality of generated alignments. Most quality measures involve the number of aligned residues and the RMSD. In this work, we analyze how these two properties are influenced by different residue assignment strategies as employed in common non-sequential structure alignment algorithms. Therefore, we implemented different residue assignment strategies into our non-sequential structure alignment algorithm GANGSTA+. We compared the resulting numbers of aligned residues and RMSDs for each residue assignment strategy and different alignment algorithms on a benchmark set of circular-permuted protein pairs. Unfortunately, differences in the residue assignment strategies are often ignored when comparing the performances of different algorithms. However, our results clearly show that this may strongly bias the observations. Bringing residue assignment strategies in line can explain observed performance differences between entirely different alignment algorithms. Our results suggest that performance comparison of non-sequential protein structure alignment algorithms should be based on the same residue assignment strategy.
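
The dependence of both quality measures on the residue assignment can be illustrated with a small sketch. This is not GANGSTA+ or any published algorithm: the function names, the greedy closest-first assignment, the distance cutoff, and the toy coordinates are all assumptions made for illustration, and the two structures are assumed to be already superposed.

```python
import math


def rmsd(coords_a, coords_b, pairs):
    """RMSD over assigned residue pairs (i in A, j in B).

    Assumes both structures are already superposed.
    """
    if not pairs:
        raise ValueError("no aligned residue pairs")
    total = 0.0
    for i, j in pairs:
        total += sum((a - b) ** 2 for a, b in zip(coords_a[i], coords_b[j]))
    return math.sqrt(total / len(pairs))


def assign_greedy(coords_a, coords_b, cutoff):
    """Hypothetical assignment strategy: pair the closest residues first,
    one-to-one, accepting only pairs within `cutoff`. The order of residues
    along the chain is ignored, so the result may be non-sequential."""
    candidates = []
    for i, a in enumerate(coords_a):
        for j, b in enumerate(coords_b):
            d = math.dist(a, b)
            if d <= cutoff:
                candidates.append((d, i, j))
    candidates.sort()
    used_a, used_b, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
    return sorted(pairs)


# Toy coordinates (made up): B is a permuted, slightly shifted copy of A.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
structure_b = [(2.0, 0.0, 0.5), (0.0, 0.0, 0.5), (1.0, 0.0, 3.0)]

for cutoff in (1.0, 3.5):
    pairs = assign_greedy(structure_a, structure_b, cutoff)
    print(cutoff, len(pairs), round(rmsd(structure_a, structure_b, pairs), 3))
```

With the strict cutoff the sketch aligns fewer residues at a lower RMSD; with the loose cutoff it aligns more residues at a higher RMSD, so the two settings are not directly comparable.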

Volume: 22. Pages: 21-9.

Citations: 0

Characterization and classification of adverse drug interactions.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2010-01-01

Masataka Takarabe, Daichi Shigemizu, Masaaki Kotera, Susumu Goto, Minoru Kanehisa

Drug interactions that may cause harmful events are important for our health and for new drug development. In previous work, we extracted drug interaction data from Japanese drug package inserts and generated a drug interaction network. The network contains a large number of drugs densely connected to each other, where drug targets and drug-metabolizing enzymes were shared in the drug interactions. In this study, we further analyzed the obtained drug interaction network by merging drugs into drug categories based on the Anatomical Therapeutic Chemical (ATC) classification. The merged data of drug interactions indicated drug properties that are related to drug interaction mechanisms or symptoms. We investigated the relationships between the drug groups and drug interaction mechanisms or symptoms.

Citations: 0

On the performance of methods for finding a switching mechanism in gene expression.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2010-01-01 DOI: 10.1142/9781848166585_0006

Mitsunori Kayano, Ichigaku Takigawa, Motoki Shiga, K. Tsuda, Hiroshi Mamitsuka

We address the issue of detecting a switching mechanism in gene expression, where two genes are positively correlated under one experimental condition while they are negatively correlated under another. We compare the performance of existing methods for this issue, roughly divided into two types: the interaction test (IT) and the difference of correlation coefficients. The interaction test, currently a standard approach for detecting epistasis in genetics, is the log-likelihood ratio test between two logistic regressions with and without an interaction term, which checks the strength of interaction between two genes. On the other hand, two correlation coefficients can be computed for two experimental conditions, and their difference shows the alteration of expression trends in a more straightforward manner. In our experiments, we tested three different types of correlation coefficients: Pearson, Spearman and a midcorrelation (biweight midcorrelation). The experiment was performed using ~2.3 × 10^9 combinations selected from the GEO (Gene Expression Omnibus) database. We sorted all combinations according to the p-values of IT or by the absolute values of the difference of correlation coefficients and then visually evaluated the top-ranked combinations in terms of the switching mechanism. The results showed that 1) combinations detected by IT included non-switching combinations and 2) Pearson was easily affected by outliers, while Spearman and the midcorrelation seemed likely to avoid them.
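
The difference-of-correlations score described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the 0/1 condition labels, and the toy expression values are assumptions, and the biweight midcorrelation and the interaction test are not covered.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            result[k] = mean_rank
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def switching_score(gene_a, gene_b, condition, corr=spearman):
    """Absolute difference between the correlation under condition 0
    and the correlation under condition 1 (labels are an assumption)."""
    groups = {0: ([], []), 1: ([], [])}
    for a, b, c in zip(gene_a, gene_b, condition):
        groups[c][0].append(a)
        groups[c][1].append(b)
    return abs(corr(*groups[0]) - corr(*groups[1]))


# Toy data (made up): positively correlated under condition 0,
# negatively correlated under condition 1.
gene_a = [1, 2, 3, 4, 1, 2, 3, 4]
gene_b = [1, 2, 3, 4, 4, 3, 2, 1]
condition = [0, 0, 0, 0, 1, 1, 1, 1]
print(switching_score(gene_a, gene_b, condition, corr=pearson))
print(switching_score(gene_a, gene_b, condition, corr=spearman))
```

The score ranges from 0 (same correlation under both conditions) to 2 (a perfect sign flip), so gene pairs can be sorted by it in descending order.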

Volume: 24 1. Pages: 69-83.

Citations: 1

Annotating gene functions with integrative spectral clustering on microarray expressions and sequences.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2010-01-01 DOI: 10.1142/9781848165786_0009

Limin Li, Motoki Shiga, W. Ching, Hiroshi Mamitsuka

Annotating genes is a fundamental issue in the post-genomic era. A typical procedure for this issue is to first cluster genes by their features and then assign functions to unknown genes by using known genes in the same cluster. A lot of genomic information is available for this issue, but the two major types of data that can be measured for any gene are microarray expressions and sequences, both of which, however, have their own flaws. Thus a natural and promising approach for gene annotation is to integrate these two data sources, especially in terms of their costs to be optimized in clustering. We develop an efficient three-step gene annotation method that includes spectral clustering over the integrated cost, based on the idea of network modularity. We rigorously examined the performance of our proposed method from three different viewpoints. All experimental results indicate the performance advantage of our method over possible clustering/classification-based approaches to gene function annotation using expressions and/or sequences.

Citations: 13

Predicting protein complex geometries with linear scoring functions.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2010-01-01 DOI: 10.1142/9781848166585_0002

Ozgur Demir-Kavuk, Florian Krull, Myong-Ho Chae, E. Knapp

Protein-protein interactions play an important role in many cellular processes. However, experimental determination of the protein complex structure is quite difficult and time-consuming. Hence, there is a need for fast and accurate in silico protein docking methods. These methods generally consist of two stages: (i) a sampling algorithm that generates a large number of candidate complex geometries (decoys), and (ii) a scoring function that ranks these decoys such that near-native decoys are ranked higher than other decoys. We have recently developed a neural network based scoring function that performed better than other state-of-the-art scoring functions on a benchmark of 65 protein complexes. Here, we use similar ideas to develop a method that is based on linear scoring functions. We compare the linear scoring function of the present study with other knowledge-based scoring functions such as ZDOCK 3.0, ZRANK and the previously developed neural network. Despite its simplicity, the linear scoring function performs as well as the compared state-of-the-art methods, and predictions are simple and rapid to compute.
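
The second stage, ranking decoys with a linear scoring function, can be sketched as a weighted sum of per-decoy features. The abstract does not specify the features or the weights, so the two features (a contact count and a clash count), the weight values, and the decoy names below are purely hypothetical.

```python
def linear_score(weights, features):
    """Linear scoring function: weighted sum of a decoy's features."""
    return sum(w * f for w, f in zip(weights, features))


def rank_decoys(weights, decoys):
    """Return decoy names sorted from highest to lowest score."""
    return sorted(
        decoys,
        key=lambda name: linear_score(weights, decoys[name]),
        reverse=True,
    )


# Hypothetical features per decoy: [interface contacts, steric clashes].
weights = [1.0, -0.5]
decoys = {
    "decoy_1": [10, 4],
    "decoy_2": [12, 2],
    "decoy_3": [7, 0],
}
print(rank_decoys(weights, decoys))
```

Because each score is a single dot product, ranking a large decoy set costs one pass over the features, which is consistent with the abstract's point that predictions are simple and rapid to compute.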

Volume: 59 1. Pages: 21-30.

Citations: 3

CaMPDB: a resource for calpain and modulatory proteolysis.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2010-01-01 DOI: 10.1142/9781848165786_0017

David duVerle, Ichigaku Takigawa, Y. Ono, H. Sorimachi, Hiroshi Mamitsuka

While the importance of modulatory proteolysis in research has steadily increased, knowledge on this process has remained largely disorganized, with the nature and role of entities composing modulatory proteolysis still uncertain. We built CaMPDB, a resource on modulatory proteolysis, with a focus on calpain, a well-studied intracellular protease which regulates substrate functions by proteolytic processing. CaMPDB contains sequences of calpains, substrates and inhibitors as well as substrate cleavage sites, collected from the literature. Some cleavage efficiencies were evaluated by biochemical experiments and a cleavage site prediction tool is provided to assist biologists in understanding calpain-mediated cellular processes. CaMPDB is freely accessible at http://calpain.org.

Citations: 51

Integer programming-based method for completing signaling pathways and its application to analysis of colorectal cancer.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2010-01-01 DOI: 10.1142/9781848166585_0016

Takeyuki Tamura, Yoshihiro Yamanishi, M. Tanabe, S. Goto, M. Kanehisa, K. Horimoto, T. Akutsu

Signaling pathways are often represented by networks where each node corresponds to a protein and each edge corresponds to a relationship between nodes such as activation, inhibition and binding. However, such signaling pathways in a cell may be affected by genetic and epigenetic alterations: some edges may be deleted and some edges may be newly added. Current knowledge about known signaling pathways is available in some public databases, but most signaling pathways, including the changes that accompany alterations of the cell state, remain largely unknown. In this paper, we develop an integer programming-based method for inferring such changes by using gene expression data. We test our method on its ability to reconstruct the pathway of colorectal cancer in the KEGG database.

Citations: 3

Predicting protein complex geometries with linear scoring functions.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2010-01-01

Ozgur Demir-Kavuk, Florian Krull, Myong-Ho Chae, Ernst-Walter Knapp

Protein-protein interactions play an important role in many cellular processes. However, experimental determination of the protein complex structure is quite difficult and time-consuming. Hence, there is a need for fast and accurate in silico protein docking methods. These methods generally consist of two stages: (i) a sampling algorithm that generates a large number of candidate complex geometries (decoys), and (ii) a scoring function that ranks these decoys such that near-native decoys are ranked higher than other decoys. We have recently developed a neural network based scoring function that performed better than other state-of-the-art scoring functions on a benchmark of 65 protein complexes. Here, we use similar ideas to develop a method that is based on linear scoring functions. We compare the linear scoring function of the present study with other knowledge-based scoring functions such as ZDOCK 3.0, ZRANK and the previously developed neural network. Despite its simplicity, the linear scoring function performs as well as the compared state-of-the-art methods, and predictions are simple and rapid to compute.

Citations: 0

On the performance of methods for finding a switching mechanism in gene expression.

Genome informatics. International Conference on Genome Informatics

Pub Date : 2010-01-01

Mitsunori Kayano, Ichigaku Takigawa, Motoki Shiga, Koji Tsuda, Hiroshi Mamitsuka

We address the issue of detecting a switching mechanism in gene expression, where two genes are positively correlated under one experimental condition while they are negatively correlated under another. We compare the performance of existing methods for this issue, roughly divided into two types: the interaction test (IT) and the difference of correlation coefficients. The interaction test, currently a standard approach for detecting epistasis in genetics, is the log-likelihood ratio test between two logistic regressions with and without an interaction term, which checks the strength of interaction between two genes. On the other hand, two correlation coefficients can be computed for two experimental conditions, and their difference shows the alteration of expression trends in a more straightforward manner. In our experiments, we tested three different types of correlation coefficients: Pearson, Spearman and a midcorrelation (biweight midcorrelation). The experiment was performed using ~2.3 × 10^9 combinations selected from the GEO (Gene Expression Omnibus) database. We sorted all combinations according to the p-values of IT or by the absolute values of the difference of correlation coefficients and then visually evaluated the top-ranked combinations in terms of the switching mechanism. The results showed that 1) combinations detected by IT included non-switching combinations and 2) Pearson was easily affected by outliers, while Spearman and the midcorrelation seemed likely to avoid them.

Volume: 24. Pages: 69-83.

Citations: 0