In Silico Biology: Latest Articles

A prior knowledge based approach to infer gene regulatory networks
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722069
M. Hasan, N. Noman, H. Iba
In this research, we use the S-system model and a Differential Evolution based inference method to capture cellular dynamics using available information on mutual interactions among genes. We propose a new fitness function that effectively incorporates a priori information and guides the inference method to deduce the correct skeletal structure of the network with more accurate parameter values. The proposed fitness function mirrors the user's confidence in the validity of the knowledge and helps narrow the search range of the model parameters for highly confident knowledge. We investigate the potency of the method in terms of data quality and required data size. The proposed method is shown to perform better on inherently noisy data and when only a small amount of time-dynamics data is available. We also investigate how the inference method performs under iterative incorporation of knowledge. In inferring from cell-cycle data of budding yeast (Saccharomyces cerevisiae), guided by knowledge, the inference method predicts 17 and 23 correct regulations in the first and second iterations, respectively, which is significantly higher than some other existing methods. Along with finding the parameter values more accurately, it predicts some new regulations and helps reveal the underlying network structure.
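For readers unfamiliar with the model named in this abstract, the S-system describes each gene's rate of change as the difference of two power-law terms, dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij. The sketch below only evaluates that right-hand side. It is not the fitness function or the Differential Evolution procedure of the paper, and the two-gene network and all parameter values are hypothetical.

```python
from math import prod

def s_system_rhs(x, alpha, beta, g, h):
    """Return dX_i/dt for every gene i under the S-system model.

    x     : current expression levels (positive floats)
    alpha : production rate constants
    beta  : degradation rate constants
    g, h  : kinetic-order matrices; g[i][j] is the effect of gene j on the
            production of gene i, h[i][j] its effect on the degradation
    """
    n = len(x)
    rates = []
    for i in range(n):
        production = alpha[i] * prod(x[j] ** g[i][j] for j in range(n))
        degradation = beta[i] * prod(x[j] ** h[i][j] for j in range(n))
        rates.append(production - degradation)
    return rates

# Hypothetical two-gene network: gene 1 represses gene 0 (g[0][1] < 0).
x = [1.0, 2.0]
alpha = [3.0, 2.0]
beta = [1.0, 1.0]
g = [[0.0, -1.0], [0.5, 0.0]]
h = [[1.0, 0.0], [0.0, 1.0]]
rates = s_system_rhs(x, alpha, beta, g, h)
```

A zero entry in g or h means "no regulation", which is why inferring the network skeleton amounts to deciding which kinetic orders are non-zero.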
Citations: 9
SpiceRDb: an integrated knowledgebase of "spice-disease" remedies
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722057
R. Pauly, M. Pradhan, M. Palakal
The SpiceRDb knowledgebase is a unique attempt to elucidate the science behind the action of spices on various disease pathways, using text mining and molecular modeling tools. These spice remedies have been demonstrated to mediate therapeutic benefits for a wide spectrum of diseases ranging from multiple sclerosis to colorectal cancer. Furthermore, the docking studies identified curcumin, a component of turmeric, as a potential disease biomarker for colorectal neoplasm. The usefulness of the SpiceRDb knowledgebase motivates us to make it available to the public, so that the community can benefit from the vast knowledge available about alternative medicine projects and the recent scientific evidence supporting the benefits of spice remedies.
Citations: 0
Prediction of toxicity and pharmacological potential of selected spice compounds
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722060
A. Riju, K. Sithara, S. S. Nair, S. Eapen
The use of computational tools to predict the ADME/Tox properties of compounds is growing rapidly in drug discovery as the benefits they provide in high throughput and early application in drug design are realized. Numerous examples exist of drugs that had to be withdrawn because of unacceptable toxicity, in clinical trials and even after reaching the market. In this study, phytochemicals from selected spices were assessed by predicting their rodent carcinogenicity, mutagenicity, plasma protein binding (PPB) and blood-brain barrier (BBB) penetration. Out of 108 compounds analysed, we found only five to be non-mutagenic and non-carcinogenic; all the remaining compounds were toxic from a pharmacological perspective. The five non-toxic compounds are alpha-zingiberene, delphinidin, laurotetanine, malabaricone-B and malabaricone-C. The PPB values of alpha-zingiberene, delphinidin and laurotetanine are in the <90% range (57.58, 88.41 and 52.59, respectively), indicating that these three compounds bind weakly to plasma proteins, whereas the other two (malabaricone-B and malabaricone-C) bind strongly. The identification of delphinidin as a naturally occurring inhibitor of VEGF (vascular endothelial growth factor) receptors suggests that this molecule possesses important antiangiogenic properties that may be helpful for the prevention and treatment of cancer. The healing activity of malabaricone-B and malabaricone-C, the major antioxidant constituents of the Myristicaceae family, against indomethacin-induced gastric ulceration in mice has been studied. Though spices are well known for their antioxidant, antimicrobial and anti-inflammatory properties, this study clearly indicates that many spice compounds exhibit carcinogenic behaviour.
Citations: 5
Biomedical association mining and validation
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722035
P. Gandra, M. Pradhan, M. Palakal
During the last decade, the data published in the biomedical literature has increased exponentially. With this growth, it has become hard to manually read all the papers for required information. Many text mining algorithms and approaches have been developed to extract information from the existing literature. One important kind of information is the association between functional terms such as genes, proteins, drugs and diseases. These associations can be causal, explicit or implicit. One of the most common applications is mining protein-protein interactions from PubMed. The focus of the present study is to identify and validate implicit protein-protein associations, as these are hard to identify from the literature. These associations, when detected automatically, are noisy and need to be validated for their biological significance. During validation, the associations were passed through a series of filters and an algorithm to remove the noise present in the data. In this study, we used 16 gene IDs to retrieve 32,693 documents with 193,738 sentences related to regenerative biology from the PubMed database. From these sentences, BioMap found 10,004 explicit and 30,000 implicit protein interaction pairs, which were validated using the proposed methodology. Finally, 308 implicit pairs were identified as the outcome of this methodology. These results indicate that the proposed methods can be used effectively for the biological verification of implicit protein-protein interactions obtained through literature mining.
Citations: 2
A novel system for predicting plant protein kinase superfamily by using machine learning methodology
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722064
V. Mallika, K. Sivakumar, E. Soniya
Protein kinases are one of the largest superfamilies of proteins and are involved in almost every cellular process. In plants, because of their important roles in cellular communication, growth and development, these proteins are the subject of increasing research. Developing a tool to estimate the probability that a sequence is a plant protein kinase will simplify these efforts and accelerate experimental characterization. In this approach, a high-performance prediction server, 'PhytokinaseSVM', has been developed and implemented; it is available at http://type3pks.in/kinase. A support vector machine, a kernel-based supervised learning technique, and compositional properties including dipeptide and multiplet frequencies were used in the development procedure. Based on the limited available data, the tool provides a simple, unique platform to estimate whether a particular sequence is a plant protein kinase with moderately high accuracy (98%). PhytokinaseSVM achieved 96% specificity and 100% sensitivity when tested with 500 protein kinases and 500 non-protein kinases that were not part of the training dataset. We expect that this freely available tool may serve as a useful resource for plant protein kinase researchers. The tool also allows the prediction of other eukaryotic protein kinases. Work is currently in progress to further improve prediction accuracy by including more sequence features in the training dataset.
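As an illustration of the "dipeptide frequency" features mentioned in this abstract, the sketch below turns a protein sequence into a 400-dimensional vector of relative dipeptide frequencies, the kind of fixed-length input a support vector machine can consume. This is a generic construction and an assumption about the feature layout, not the PhytokinaseSVM pipeline; the function name and the example sequence are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered pairs of standard amino acids.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

def dipeptide_composition(sequence):
    """Return the relative frequency of each of the 400 dipeptides.

    Overlapping pairs are counted; pairs containing a non-standard
    residue (for example 'X') are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]

# Hypothetical toy sequence with four overlapping pairs: MK, KK, KL, LA.
features = dipeptide_composition("MKKLA")
```

Because the vector length does not depend on sequence length, proteins of different sizes can be compared by the same classifier.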
Citations: 0
Extending Bafna-Pevzner algorithm
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722051
Ulisses Dias, Zanoni Dias
Genome rearrangement is a field that addresses the problem of finding the minimum number of global operations, such as transpositions, reversals, fusions and fissions, that transform a given genome into another. In this work we deal with transposition events, which exchange the positions of two contiguous blocks of genes in the same chromosome. Several approximation algorithms for this problem have been published so far. Bafna and Pevzner [1] proposed the first 1.5-approximation algorithm for the transposition distance problem, and recently Elias and Hartman [4] presented a 1.375-approximation algorithm, which is currently the best known performance ratio. Many other algorithms achieve good performance in experiments and provide new insights into the problem [2, 5, 8, 9, 11]. In this paper we present two main results. The first result is an implementation of the 1.375-approximation algorithm described by Elias and Hartman [4], whose experimental results we compare with those of other approaches. It is important to note that no implementation of the Elias-Hartman algorithm had been provided before this work and that its approximation proof was assisted by a computer program. Although the approximation ratio is an important issue, we need to know how the algorithm behaves in practical experiments. For this reason, we show the experimental results of the Elias-Hartman algorithm on our datasets. The second result is the description of our algorithm, based on the 1.5-approximation algorithm of Bafna and Pevzner [1]. Our algorithm uses a set of heuristics that improve the solution quality of the original algorithm while keeping the original 1.5-approximation ratio. We compare our experimental results with the best results published so far. The results indicate that our algorithm performs best in practice. The solution quality analysis also shows that our algorithm outperforms the Elias and Hartman 1.375-approximation algorithm on longer permutations, despite its weaker approximation ratio. We delineate an algorithm for the transposition distance problem. Our algorithm is the first polynomial time algorithm that sorts by transposition any permutation π, for |π| = 9. We show that our algorithm is better than the other algorithms on permutations π with |π| < 11. We also show that our algorithm keeps its good performance on longer permutations. We claim that the heuristics proposed in this work contribute to understanding the complexity of sorting by transpositions, which remains open.
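To make the transposition operation concrete, the sketch below applies one transposition to a permutation and counts breakpoints, a quantity commonly used to bound the transposition distance from below (a single transposition removes at most three breakpoints). It illustrates the problem setting only; it is neither the Bafna-Pevzner nor the Elias-Hartman algorithm, and the function names and the example permutation are hypothetical.

```python
def transpose(perm, i, j, k):
    """Exchange the adjacent blocks perm[i:j] and perm[j:k] (0-based, i < j < k)."""
    if not 0 <= i < j < k <= len(perm):
        raise ValueError("indices must satisfy 0 <= i < j < k <= len(perm)")
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]

def breakpoints(perm):
    """Count adjacent pairs (a, b) of the extended permutation with b != a + 1.

    The permutation of 1..n is extended with 0 in front and n + 1 at the end,
    so the identity permutation has zero breakpoints.
    """
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b != a + 1)

perm = [3, 4, 1, 2]
before = breakpoints(perm)           # breakpoints at (0,3), (4,1), (2,5)
sorted_perm = transpose(perm, 0, 2, 4)  # exchange blocks [3, 4] and [1, 2]
after = breakpoints(sorted_perm)
lower_bound = -(-before // 3)        # ceil(before / 3)
```

Here one transposition sorts the permutation, which matches the lower bound of one obtained from its three breakpoints.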
Citations: 11
A novel measure for evaluating an ordered list: application in microRNA target prediction
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722067
Debarka Sengupta, S. Bandyopadhyay, U. Maulik
Sensitivity and specificity are the most widely used statistics for measuring the performance of a binary classification test. They are highly meaningful for a variety of use cases where the classifying tests are affordable. Unfortunately, many problems arising in different branches of the natural sciences involve screening tests that are too expensive to perform for all the predicted objects. Thus, the trend has been for scientists to calculate the sensitivity and specificity of a binary classification test based on a handful of experimentally proven facts, which is theoretically uncertain. In this article, a novel measure is proposed that assigns importance to multiple ordered lists, taking into account the share of majority-voted ranked pairs of elements that a list contains. A real-life bioinformatics application is demonstrated in the domain of microRNA target prediction, where a number of algorithms exist. Using the proposed measure, we aim to assign a weight to each algorithm that conveys its reliability with respect to the rest.
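The two baseline statistics this abstract starts from can be computed directly from a confusion matrix. The sketch below shows the standard definitions only; it does not implement the ordered-list measure proposed in the paper, and the counts are made-up illustrative values.

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives that are predicted positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: share of actual negatives that are predicted negative."""
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical validation set: 50 true targets and 100 non-targets.
tp, fn = 45, 5
tn, fp = 90, 10
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
acc = accuracy(tp, tn, fp, fn)
```

All three values depend on knowing the true label of every tested object, which is exactly what becomes impractical when each experimental validation is expensive.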
Citations: 6
A fast bit-parallel multi-patterns string matching algorithm for biological sequences
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722077
R. Prasad, S. Agarwal, I. Yadav, Bharat Singh
The problem of searching for occurrences of a pattern P[0...m-1] in a text T[0...n-1] with m ≤ n, where the symbols of P and T are drawn from some alphabet Σ of size σ, is called the exact string matching problem. Today, pattern matching is a powerful tool for locating nucleotide or amino acid sequence patterns in biological sequence databases. The problem of searching for a set of patterns P0, P1, P2, ..., Pr-1, r ≥ 1, in the given text T is called the multi-pattern string matching problem. The multi-pattern string matching problem has previously been solved by efficient bit-parallel string matching algorithms: shift-or and BNDM. Many other types of algorithms exist for the same purpose, but bit-parallelism has been shown to be more efficient than the others. In this paper, we extend the BNDM algorithm with q-grams (B. Durian et al., 2008) to multiple patterns, where each pattern is an arbitrary DNA pattern. We assume that each pattern is of equal size m and that the total length of the patterns is less than or equal to the word length (w) of the computer used. Since the BNDM algorithm has been shown to be faster than any other bit-parallel string matching algorithm (G. Navarro, 2000), we compare the performance of the multi-pattern q-gram BNDM algorithm with the existing BNDM algorithm for different values of q and numbers of patterns (r).
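The idea of bit-parallel matching can be seen in the classic single-pattern shift-or algorithm that this abstract cites as prior work. The sketch below is a minimal version for one pattern; it is not the multi-pattern q-gram BNDM algorithm proposed in the paper. Python integers are unbounded, so the constraint that the pattern fits in one machine word (m ≤ w) is not enforced here, and the function name and test strings are hypothetical.

```python
def shift_or_search(pattern, text):
    """Return the start positions of all exact occurrences of pattern in text.

    Bit i of `state` is 0 exactly when pattern[0..i] matches the text
    ending at the current position, so bit m-1 being 0 signals a match.
    """
    m = len(pattern)
    if m == 0:
        return []
    all_ones = (1 << m) - 1
    # masks[c] has bit i cleared exactly when pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, all_ones) & ~(1 << i)
    match_bit = 1 << (m - 1)
    state = all_ones
    hits = []
    for pos, c in enumerate(text):
        # Shifting brings a 0 into bit 0; OR-ing with the mask keeps a 0 in
        # bit i only if the previous prefix matched and pattern[i] == c.
        state = ((state << 1) | masks.get(c, all_ones)) & all_ones
        if (state & match_bit) == 0:
            hits.append(pos - m + 1)
    return hits

hits = shift_or_search("ACA", "GACACAT")
```

Each text character costs one shift, one OR and one test, regardless of the pattern length, which is where the speed of bit-parallel methods comes from on short patterns such as DNA motifs.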
Citations: 4
Mining SSR and SNP/Indel sites in expressed sequence tag libraries of Radopholus similis
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722042
A. Riju, P. Lakshmi, P. Nima, N. Reena, S. Eapen
The objective of this study is to explore the simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) in expressed sequence tags (ESTs) of Radopholus similis. We retrieved 7380 EST sequences, from libraries covering different tissues and conditions, from dbEST at the National Center for Biotechnology Information (NCBI). A total of 1449 SSRs were detected with the MISA Perl script. Hexanucleotide repeats (836), followed by mononucleotide repeats (207), were more abundant than other repeat types. Putative SNPs/indels were identified with AutoSNP. As many as 1038 SNPs and 108 small indels (insertions/deletions) were found, at a density of one SNP per 191 bp and one indel per 1.8 kbp. Candidate SNPs were categorized by nucleotide substitution as either transitions (C↔T or G↔A) or transversions (C↔G, A↔T, C↔A or T↔G). We observed more transversions (537) than transitions (501). Considering individual substitutions, however, G↔A (281) and C↔T (220) were found to predominate over the purine-to-pyrimidine base substitutions. Since SSR and SNP markers are invaluable tools for genetic analysis, the identified SSRs and SNPs of R. similis could be used in diversity analysis, genetic trait mapping, association studies and marker-assisted selection.
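The transition/transversion split described in this abstract can be sketched in a few lines of Python. The function name `classify_substitution` is hypothetical and this is not the AutoSNP procedure; the counts are the ones reported in the abstract, used here only to check the arithmetic.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref, alt):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    pair = {ref.upper(), alt.upper()}
    if len(pair) != 2 or not pair <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different bases from A, C, G, T")
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    # Transversion: purine <-> pyrimidine.
    return "transversion"


# Counts as reported in the abstract.
transitions = 281 + 220   # G<->A plus C<->T
transversions = 537

print(classify_substitution("G", "A"))            # transition
print(classify_substitution("C", "A"))            # transversion
print(transitions + transversions)                # 1038
print(f"{transitions / transversions:.2f}")       # 0.93
```

The two transition counts sum to the 501 transitions reported, and transitions plus transversions give the 1038 SNPs, so the figures in the abstract are internally consistent.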
Citations: 0
Analysis of disordered regions in protein kinase subfamilies of Homo sapiens and Coenorhabditis elegans
Q2 Medicine Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722028
K. Kurup, J. Natarajan
A protein kinase is an enzyme that modifies other proteins by chemically adding phosphate groups to them. In this work, the protein kinases of Caenorhabditis elegans and Homo sapiens with three or more domains in common were first grouped, and the disordered regions of the protein kinases in each group were predicted. The similarities between the disordered regions of the two organisms were then determined. Linear motifs present in these similar disordered regions were identified and tested for conservation in both Homo sapiens and Caenorhabditis elegans. Although the similarity between the disordered regions is high, the linear motifs are not well conserved in these distantly related organisms.
Citations: 0