In Silico Biology: Latest Articles

Gene regulatory network from microarray data using dynamic neural fuzzy approach

Q2 Medicine

In Silico Biology

Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722044

S. Vineetha, C. Chandra Shekara Bhat, S. M. Idicula

The paper presents a multilayered dynamic neural fuzzy network (DNFN) to extract regulatory relationships among genes and reconstruct a gene regulatory network from circulating plasma RNA data from colon cancer patients. The method combines the merits of connectionist and fuzzy approaches: it encodes the learned knowledge as fuzzy rules and processes data following fuzzy reasoning principles. The dynamic aspect of gene regulation is taken into account through on-line learning of fuzzy rules, while structure learning together with parameter learning forms a fast learning algorithm for building a small yet powerful dynamic neural fuzzy network. One of the main advantages of DNFN is that the hidden nodes need not be predetermined, since the network finds its optimal structure automatically and quickly. The knowledge inferred with this network may provide biological insights that can be used to design and interpret further experiments.
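
The abstract does not spell out the DNFN equations. The sketch below illustrates how a fuzzy network can grow its rule layer on-line instead of fixing the number of hidden nodes in advance. It is a minimal zero-order Takagi-Sugeno model with Gaussian memberships. The class name `OnlineFuzzyNet`, the shared width, the rule-creation threshold and the learning rate are all hypothetical choices for this example. It is not the authors' DNFN.

```python
import math


class OnlineFuzzyNet:
    """Toy online zero-order Takagi-Sugeno fuzzy model (hypothetical, not the DNFN).

    Each rule is a Gaussian cluster in input space with a constant consequent.
    A new rule is created whenever no existing rule fires above `threshold`,
    so the number of hidden (rule) nodes is not fixed in advance.
    """

    def __init__(self, width=0.5, threshold=0.3, lr=0.5):
        self.width = width          # shared Gaussian width (assumed)
        self.threshold = threshold  # minimum firing strength before a new rule is added
        self.lr = lr                # learning rate for consequent updates
        self.centers = []           # rule centres, one list of floats per rule
        self.outputs = []           # constant consequent of each rule

    def _firing(self, x):
        # Product of per-dimension Gaussian memberships: exp(-||x - c||^2 / (2 * width^2))
        return [
            math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * self.width ** 2))
            for c in self.centers
        ]

    def predict(self, x):
        w = self._firing(x)
        total = sum(w)
        if total == 0:
            return 0.0
        # Defuzzification: weighted average of rule consequents
        return sum(wi * yi for wi, yi in zip(w, self.outputs)) / total

    def learn(self, x, y):
        w = self._firing(x)
        if not w or max(w) < self.threshold:
            # Structure learning: add a rule centred on the unexplained sample
            self.centers.append(list(x))
            self.outputs.append(float(y))
            return
        # Parameter learning: gradient step on squared error for each consequent
        total = sum(w)
        err = y - self.predict(x)
        for i, wi in enumerate(w):
            self.outputs[i] += self.lr * err * wi / total
```

Each call to `learn` does one of two things. It adds a rule (structure learning), or it adjusts the existing consequents (parameter learning). A fuller model would also adapt the centres and widths.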

Citations: 4

A prior knowledge based approach to infer gene regulatory networks

Q2 Medicine

In Silico Biology

Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722069

M. Hasan, N. Noman, H. Iba

In this research, we use the S-System model and a Differential Evolution based inference method to capture cellular dynamics, using available information on mutual interactions among genes. We propose a new fitness function that effectively incorporates a priori information. It guides the inference method to deduce the correct skeletal structure of the network with more accurate parameter values. The proposed fitness function reflects the user's confidence in the validity of the knowledge. For highly confident knowledge, it helps narrow the search range of the model parameters. We investigate the effectiveness of the method in terms of data quality and required data size. The proposed method is shown to perform better on inherently noisy data and when only a small amount of time-course data is available. We also investigate how the inference method performs when knowledge is incorporated iteratively. In inferring the cell-cycle network of budding yeast (Saccharomyces cerevisiae), guided by knowledge, the inference method predicts 17 and 23 correct regulations in the first and second iterations, respectively. This is significantly higher than some other existing methods. Along with finding the parameter values more accurately, it predicts some new regulations and helps reveal the underlying network structure.
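
In the S-System formalism, each gene i obeys dX_i/dt = α_i ∏_j X_j^g_ij - β_i ∏_j X_j^h_ij. Inference means searching for α, β, g and h; here the search uses Differential Evolution. The sketch below rests on three assumptions:

- a forward-Euler simulator;
- the common squared relative error as the data term;
- a hypothetical confidence-weighted penalty on g_ij, showing how prior knowledge could enter the fitness.

It is not the authors' fitness function, and the DE search loop itself is omitted.

```python
def s_system_rhs(x, alpha, beta, g, h):
    """dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij."""
    n = len(x)
    rates = []
    for i in range(n):
        prod_g = 1.0
        prod_h = 1.0
        for j in range(n):
            prod_g *= x[j] ** g[i][j]
            prod_h *= x[j] ** h[i][j]
        rates.append(alpha[i] * prod_g - beta[i] * prod_h)
    return rates


def simulate(x0, params, dt, steps):
    """Forward-Euler integration; a real study would use a stiff ODE solver."""
    x = list(x0)
    traj = [list(x)]
    for _ in range(steps):
        dx = s_system_rhs(x, *params)
        # Clamp to a small positive value so fractional powers stay defined
        x = [max(xi + dt * di, 1e-9) for xi, di in zip(x, dx)]
        traj.append(x)
    return traj


def fitness(params, observed, dt, prior=None):
    """Squared relative error plus a hypothetical prior-knowledge penalty.

    `prior` maps (i, j) -> (present, confidence). Only the production-term
    kinetic orders g_ij are penalised here, as a simplification.
    """
    sim = simulate(observed[0], params, dt, len(observed) - 1)
    err = sum(((s - o) / o) ** 2
              for srow, orow in zip(sim, observed)
              for s, o in zip(srow, orow))
    penalty = 0.0
    g = params[2]
    for (i, j), (present, conf) in (prior or {}).items():
        if present:
            # Known regulation: discourage a (near-)zero kinetic order
            penalty += conf * max(0.0, 0.1 - abs(g[i][j]))
        else:
            # Known absence of regulation: push the kinetic order toward zero
            penalty += conf * abs(g[i][j])
    return err + penalty
```

A DE optimiser would minimise `fitness` over candidate parameter tuples `(alpha, beta, g, h)`. Scaling the penalty by the user's confidence lets firm knowledge constrain the search more than tentative knowledge.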

Citations: 9

Improving motif refinement using hybrid expectation maximization and random projection

Q2 Medicine

In Silico Biology

Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722048

H. S. Shashidhara, Prince Joseph, K. Srinivasa

The main goal of the motif finding problem is to detect novel, over-represented, unknown signals in a set of sequences. Popular algorithms such as Expectation Maximization (EM) and Gibbs sampling are sensitive to the initial guesses and are known to converge quickly to the nearest local maximum. A novel optimization framework systematically searches the neighborhood regions of the initial alignments to explore multiple locally optimal solutions. This search is achieved by transforming the original optimization problem into its corresponding dynamical system and estimating the practical stability boundary of the local maximum. This work implements the hybrid algorithm and attempts to enhance it by trying different global methods and other techniques. Aggregation methods are then tried in place of projection methods.
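
Random projection, in the sense of Buhler and Tompa's PROJECTION algorithm, is one way to seed EM. It hashes every l-mer on k randomly chosen positions and keeps the buckets that collect many l-mers. The sketch below makes two assumptions: the input is DNA, and a simple pseudocount profile serves as the EM starting point. The function names and parameters (`l`, `k`, `threshold`, `pseudocount`) are illustrative and not taken from the paper.

```python
import random
from collections import defaultdict


def random_projection_buckets(seqs, l, k, threshold, seed=0):
    """Hash every l-mer on k random positions; keep buckets with >= threshold hits."""
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(l), k))  # the random projection
    buckets = defaultdict(list)
    for s_idx, s in enumerate(seqs):
        for start in range(len(s) - l + 1):
            lmer = s[start:start + l]
            key = "".join(lmer[p] for p in positions)
            buckets[key].append((s_idx, start))
    enriched = {key: hits for key, hits in buckets.items() if len(hits) >= threshold}
    return positions, enriched


def profile_from_bucket(seqs, hits, l, alphabet="ACGT", pseudocount=0.5):
    """Position weight matrix built from the l-mers in one bucket (EM starting point)."""
    counts = [{a: pseudocount for a in alphabet} for _ in range(l)]
    for s_idx, start in hits:
        for pos, ch in enumerate(seqs[s_idx][start:start + l]):
            counts[pos][ch] += 1
    return [{a: col[a] / sum(col.values()) for a in alphabet} for col in counts]
```

Each enriched bucket yields one starting profile. EM would then be run from each profile, keeping the best-scoring result. Choosing k < l lets l-mers that differ at unprojected (possibly mutated) positions still land in the same bucket.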

Citations: 2

SpiceRDb: an integrated knowledgebase of "spice-disease" remedies

Q2 Medicine

In Silico Biology

Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722057

R. Pauly, M. Pradhan, M. Palakal

The SpiceRDb knowledgebase is a unique attempt to elucidate, using text mining and molecular modeling tools, the science behind the action of spices on various disease pathways. These spice remedies have been shown to mediate therapeutic benefits for a wide spectrum of diseases, ranging from multiple sclerosis to colorectal cancer. Furthermore, docking studies identified curcumin, a component of turmeric, as a potential disease biomarker for colorectal neoplasm. The usefulness of the SpiceRDb knowledgebase motivates us to make it publicly available. The community can then benefit from the vast knowledge available about alternative medicine projects and from the recent scientific evidence supporting the benefits of spice remedies.

Citations: 0

Biomedical association mining and validation

Q2 Medicine

In Silico Biology

Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722035

P. Gandra, M. Pradhan, M. Palakal

Over the last decade, the volume of data published in the biomedical literature has increased exponentially. With this growth, it has become hard to read all the papers manually for the required information. Many text mining algorithms and approaches have therefore been developed to extract information from the existing literature. One important type of information is the association between functional terms such as genes, proteins, drugs and diseases. These associations can be causal, explicit or implicit. One of the most common applications is mining protein-protein interactions from PubMed. The focus of the present study is to identify and validate implicit protein-protein associations, as these are hard to identify from the literature. When detected automatically, these associations are noisy and need to be validated for their biological significance. During validation, the associations were passed through a series of filters and an algorithm to remove the noise present in the data. In this study, we used 16 gene IDs to retrieve 32,693 documents with 193,738 sentences related to regenerative biology from the PubMed database. From these sentences, BioMap found 10,004 explicit and 30,000 implicit protein interaction pairs, which were validated using the proposed methodology. Finally, 308 implicit pairs were identified as the outcome of this methodology. These results indicate that the proposed methods can be used effectively for the biological verification of implicit protein-protein interactions obtained through literature mining.
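
The abstract does not describe how BioMap derives implicit pairs. One common assumption in literature-based discovery is the transitive "ABC" pattern. If A co-occurs with B and B with C, but A and C never co-occur, then A-C is an implicit candidate. The sketch below implements that assumption. A hypothetical `min_shared` filter (minimum number of shared intermediates) stands in for the paper's noise filters.

```python
from collections import defaultdict
from itertools import combinations


def implicit_pairs(explicit_pairs, min_shared=1):
    """Return implicit (A, C) pairs linked through shared intermediates B.

    explicit_pairs: iterable of (protein, protein) co-occurrence pairs.
    Result maps each sorted implicit pair to the set of intermediates linking it.
    """
    neighbours = defaultdict(set)
    for a, b in explicit_pairs:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    explicit = {frozenset(p) for p in explicit_pairs}
    candidates = {}
    for b, linked in neighbours.items():
        # Every two partners of b form an A-B-C chain through b
        for a, c in combinations(sorted(linked), 2):
            pair = frozenset((a, c))
            if pair not in explicit:
                candidates.setdefault(pair, set()).add(b)
    # Keep only pairs supported by enough distinct intermediates
    return {tuple(sorted(p)): shared
            for p, shared in candidates.items() if len(shared) >= min_shared}
```

Raising `min_shared` trades recall for precision. This is one simple way to cut down a large candidate set before any biological validation.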

Citations: 2

Extending Bafna-Pevzner algorithm

Q2 Medicine

In Silico Biology

Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722051

Ulisses Dias, Zanoni Dias

Genome rearrangement is a field that addresses the problem of finding the minimum number of global operations that transform a given genome into another. Such operations include transpositions, reversals, fusions and fissions. In this work we deal with transposition events, which exchange the positions of two contiguous blocks of genes on the same chromosome.

Several approximation algorithms for this problem have been published. Bafna and Pevzner [1] proposed the first 1.5-approximation algorithm for the transposition distance problem. More recently, Elias and Hartman [4] described a 1.375-approximation algorithm, which currently has the best known performance ratio. Many other algorithms achieve good experimental performance and provide new insights into the problem [2, 5, 8, 9, 11].

In this paper we present two main results. The first is an implementation of the 1.375-approximation algorithm described by Elias and Hartman [4]. We also compare its experimental results with other approaches. No implementation of the Elias-Hartman algorithm was available before this work, and its approximation proof was computer-assisted. Although the approximation ratio is important, we also need to know how the algorithm behaves in practice. We therefore report experimental results for the Elias-Hartman algorithm on our datasets.

The second result is a new algorithm based on the Bafna and Pevzner [1] 1.5-approximation algorithm. It uses a set of heuristics that improve the solution quality of the original algorithm while keeping the original 1.5 approximation ratio. We compare our experimental results with the best results published so far, and they indicate that our algorithm performs best in practice. The solution quality analysis also shows that our algorithm outperforms the Elias and Hartman 1.375-approximation algorithm on longer permutations, despite its weaker approximation ratio.

In summary, we describe an algorithm for the transposition distance problem. It is the first polynomial-time algorithm that sorts by transpositions any permutation π with |π| = 9. We show that it outperforms the other algorithms on permutations π with |π| < 11 and keeps its good performance on longer permutations. We claim that the heuristics proposed in this work contribute to determining the complexity of sorting by transpositions, which remains open.
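
A small sketch makes the transposition operation and the distance concrete. A transposition exchanges two adjacent blocks. Counting breakpoints on π framed by 0 and n+1, a transposition changes only three adjacencies, so it removes at most three breakpoints. This gives the lower bound d(π) ≥ ⌈b(π)/3⌉. The sketch computes exact transposition distance by breadth-first search. That search is exponential and only usable for very small permutations, for instance to check approximation algorithms against exact values. It is not the Bafna-Pevzner or Elias-Hartman algorithm, and the function names are illustrative.

```python
from collections import deque


def transpose(perm, i, j, k):
    """Exchange adjacent blocks perm[i:j] and perm[j:k] (0 <= i < j < k <= n)."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def breakpoints(perm):
    """Count adjacent pairs (a, b) with b != a + 1 in the framed permutation 0, perm, n+1."""
    ext = (0,) + tuple(perm) + (len(perm) + 1,)
    return sum(1 for a, b in zip(ext, ext[1:]) if b != a + 1)


def transposition_lower_bound(perm):
    """ceil(b(perm) / 3): each transposition removes at most three breakpoints."""
    return -(-breakpoints(perm) // 3)


def transposition_distance(perm):
    """Exact distance by breadth-first search over all transpositions (tiny n only)."""
    start = tuple(perm)
    target = tuple(sorted(start))
    n = len(start)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for i in range(n):
            for j in range(i + 1, n):
                for k in range(j + 1, n + 1):
                    nxt = transpose(cur, i, j, k)
                    if nxt not in dist:
                        dist[nxt] = dist[cur] + 1
                        queue.append(nxt)
    return None
```

For example, (3, 1, 2) is sorted by a single transposition that moves the block (3) to the end. No single transposition sorts (3, 2, 1), so its distance is 2.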

Citations: 11

A novel system for predicting plant protein kinase superfamily by using machine learning methodology

Q2 Medicine

In Silico Biology

Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722064

V. Mallika, K. Sivakumar, E. Soniya

Protein kinases form one of the largest protein superfamilies and are involved in almost every cellular process. In plants, their important roles in cellular communication, growth and development have made them the subject of increasing research. A tool that estimates the probability that a sequence is a plant protein kinase would simplify this effort and accelerate experimental characterization. In this approach, a high-performance prediction server, 'PhytokinaseSVM', has been developed and made available at http://type3pks.in/kinase. The development procedure used a support vector machine, a kernel-based supervised learning technique, together with compositional properties that include dipeptide and multiplet frequencies. Based on the limited available data, the tool provides a simple, unique platform for estimating the probability that a particular sequence is a plant protein kinase, with moderately high accuracy (98%). PhytokinaseSVM was tested on 500 protein kinases and 500 non-protein kinases that were not part of the training dataset. On this set, it achieved 96% specificity and 100% sensitivity. As it is freely available, we expect this tool to serve as a useful resource for plant protein kinase researchers. The tool also allows the prediction of other eukaryotic protein kinases. Work is currently in progress to further improve prediction accuracy by including more sequence features in the training dataset.
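
Dipeptide composition, one of the compositional features mentioned above, maps a protein to a 400-dimensional vector of adjacent amino-acid pair frequencies. That vector can then be fed to an SVM. The sketch below assumes the 20 standard amino acids and ignores other residue codes. SVM training and the multiplet features are not shown, and the function names are illustrative.

The helpers also spell out sensitivity, specificity and accuracy. On a test set of 500 kinases and 500 non-kinases, 100% sensitivity and 96% specificity correspond to 500 true positives and 480 true negatives. That is 980/1000, or 98%, accuracy on that set.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Fixed ordering of all 400 dipeptides, so every sequence maps to the same feature layout
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Relative frequency of each of the 400 dipeptides among adjacent residue pairs."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def accuracy(tp, fn, tn, fp):
    """Fraction of all test sequences classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)
```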

Citations: 0

A novel measure for evaluating an ordered list: application in microRNA target prediction

Q2 Medicine

In Silico Biology

Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722067

Debarka Sengupta, S. Bandyopadhyay, U. Maulik

Sensitivity and specificity are the most widely used statistics for measuring the performance of a binary classification test. They are highly meaningful in the many use cases where the classifying tests are affordable. Unfortunately, many problems in different branches of the natural sciences involve a screening test that is too expensive to perform on all predicted objects. The trend has therefore been for scientists to calculate sensitivity and specificity from a handful of experimentally proven facts, which makes the estimates theoretically uncertain. In this article, we propose a novel measure that assigns importance to each of multiple ordered lists. It takes into account the share of majority-voted ranked pairs of elements that each list contains. A real-life bioinformatics application is demonstrated in microRNA target prediction, a domain where a number of algorithms exist. Using the proposed measure, we aim to assign each algorithm a weight that conveys its reliability relative to the rest.
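
The abstract gives only the idea of the measure. One plausible reading, offered here as an assumption, works in two steps. First, for every pair of elements, take the order favoured by the majority of lists. Then score each list by the fraction of its ranked pairs that agree with that majority. The function below implements that reading; its name, the tie handling (ties are skipped) and the use of sortable element labels are hypothetical choices.

```python
from itertools import combinations


def majority_pair_scores(lists):
    """Score each ranked list by the share of its element pairs that agree with the majority order."""
    positions = [{item: rank for rank, item in enumerate(lst)} for lst in lists]

    # votes[(a, b)] > 0 means most lists containing both rank a above b
    votes = {}
    for pos in positions:
        for a, b in combinations(sorted(pos), 2):
            votes[(a, b)] = votes.get((a, b), 0) + (1 if pos[a] < pos[b] else -1)

    scores = []
    for pos in positions:
        agree = total = 0
        for a, b in combinations(sorted(pos), 2):
            v = votes[(a, b)]
            if v == 0:
                continue  # tie: no majority order for this pair
            total += 1
            if (v > 0) == (pos[a] < pos[b]):
                agree += 1
        scores.append(agree / total if total else 0.0)
    return scores
```

Dividing each score by the sum of all scores would turn them into per-algorithm weights.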

Citations: 6

A fast bit-parallel multi-patterns string matching algorithm for biological sequences

Q2 Medicine

In Silico Biology

Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722077

R. Prasad, S. Agarwal, I. Yadav, Bharat Singh

The problem of searching for occurrences of a pattern P[0...m-1] in a text T[0...n-1], with m ≤ n, is called the exact string matching problem. The symbols of P and T are drawn from an alphabet Σ of size σ. Today, pattern matching is a powerful tool for locating nucleotide or amino acid sequence patterns in biological sequence databases. The problem of searching for a set of patterns P0, P1, P2, ..., Pr-1, r ≥ 1, in a given text T is called the multi-pattern string matching problem. It has previously been solved by efficient bit-parallel string matching algorithms: Shift-Or and BNDM. Many other types of algorithms exist for the same purpose, but bit-parallelism has been shown to be much more efficient than the others. In this paper, we extend the BNDM algorithm with q-grams (B. Durian et al., 2008) to multiple patterns, where each pattern can be any DNA pattern. We assume that every pattern has the same length m and that the total length of the patterns does not exceed the word length (w) of the computer used. BNDM has been shown to be faster than any other bit-parallel string matching algorithm (G. Navarro, 2000). We therefore compare the multi-pattern q-gram BNDM algorithm with the existing BNDM algorithm for different values of q and numbers of patterns (r).
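
The core bit-parallel idea behind multi-pattern matching is to pack the r patterns side by side in one machine word. All of them are then updated with one shift and one AND per text character, hence the assumption r·m ≤ w. The sketch below uses the Shift-And formulation, a close relative of Shift-Or. It is not BNDM or the q-gram BNDM extension studied in the paper. Python integers are unbounded, so the word-length limit is only conceptual here, and the function name is illustrative.

```python
def multi_shift_and(patterns, text):
    """Report (start, pattern_index) for every occurrence of any equal-length pattern."""
    m = len(patterns[0])
    assert all(len(p) == m for p in patterns), "patterns are assumed to share length m"

    masks = {}       # masks[c]: bits set where character c occurs in the packed patterns
    start_bits = 0   # first bit of each pattern's m-bit slot
    end_bits = 0     # last bit of each slot; a set end bit signals a match
    for idx, p in enumerate(patterns):
        offset = idx * m
        start_bits |= 1 << offset
        end_bits |= 1 << (offset + m - 1)
        for pos, ch in enumerate(p):
            masks[ch] = masks.get(ch, 0) | (1 << (offset + pos))

    state = 0
    hits = []
    for i, ch in enumerate(text):
        # Extend every partial match by one character; OR-ing start_bits restarts each
        # pattern, which also overrides bits shifted across slot boundaries.
        state = ((state << 1) | start_bits) & masks.get(ch, 0)
        found = state & end_bits
        if found:
            for idx in range(len(patterns)):
                if (found >> (idx * m + m - 1)) & 1:
                    hits.append((i - m + 1, idx))
    return hits
```

For example, searching "TACGTAC" for ["ACG", "GTA"] reports "ACG" at position 1 and "GTA" at position 3.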

Citations: 4

DNA sequence representation methods

Q2 Medicine

In Silico Biology

Pub Date : 2010-02-15 DOI: 10.1145/1722024.1722073

G. Santhosh Kumar, S. Shiji

DNA sequence representation methods are used to represent gene structure effectively and to help analyze similarities and dissimilarities between coding sequences. Many different kinds of representations have been proposed in the literature. They can be broadly classified into numerical, graphical, geometrical and hybrid representation methods. Graphical and geometrical methods make DNA structure and function analysis easier because they give a visual representation of DNA structure. In numerical methods, numerical values are assigned to a sequence and digital signal processing methods are used to analyze it. Hybrid approaches for analyzing DNA sequences have also been reported in the literature. This paper reviews the latest developments in DNA sequence representation methods and presents a taxonomy of the various methods. It also compares these methods wherever possible.
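
The Voss mapping is one example of a numerical representation. It turns a DNA string into four binary indicator sequences, one per nucleotide, after which standard signal processing tools apply. The sketch below computes the total DFT power spectrum S[k] = Σ_b |U_b[k]|² with a naive O(n²) DFT; an FFT would be used in practice. It exposes the period-3 peak at k = n/3, a well-known signal associated with protein-coding regions. The function names are illustrative and not taken from the review.

```python
import cmath


def voss_indicators(seq):
    """One 0/1 indicator sequence per nucleotide; other symbols (e.g. N) map to all zeros."""
    seq = seq.upper()
    return {b: [1 if ch == b else 0 for ch in seq] for b in "ACGT"}


def power_spectrum(seq):
    """Total power S[k] = sum over bases of |DFT of that base's indicator at frequency k|^2."""
    indicators = voss_indicators(seq)
    n = len(seq)
    spectrum = []
    for k in range(n):
        total = 0.0
        for u in indicators.values():
            coeff = sum(u[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            total += abs(coeff) ** 2
        spectrum.append(total)
    return spectrum
```

For a perfectly 3-periodic sequence such as "ACG" repeated four times (n = 12), the power is concentrated at k = 0, 4 and 8. Away from k = 0, the peak sits at k = n/3 = 4.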

Citations: 1
