In this research, we use an S-System model and a Differential Evolution based inference method to capture cellular dynamics using available information on mutual interactions among genes. We propose a new fitness function that effectively incorporates a priori information and guides the inference method to deduce the correct skeletal structure of the network with more accurate parameter values. The proposed fitness function mirrors the user's confidence in the validity of the knowledge and helps narrow the search range of the model parameters for highly confident knowledge. We investigate the potency of the method in terms of data quality and required data size. The proposed method is shown to perform better on inherently noisy data and when only a small number of time-dynamics data are available. We also investigate how the inference method performs under iterative incorporation of knowledge. In inferring cell-cycle data of budding yeast (Saccharomyces cerevisiae), guided by knowledge, the inference method predicts 17 and 23 correct regulations in the first and second iterations, respectively, which is significantly higher than some other existing methods. Along with finding the parameter values more accurately, it predicts some new regulations and helps reveal the underlying network structure.
{"title":"A prior knowledge based approach to infer gene regulatory networks","authors":"M. Hasan, N. Noman, H. Iba","doi":"10.1145/1722024.1722069","DOIUrl":"https://doi.org/10.1145/1722024.1722069","url":null,"abstract":"In this research, we use S-System model and Differential Evolution based inference method to capture cellular dynamics using available mutual interaction information among genes. We propose a new fitness function, effectively incorporating a priori information, which guides the inference method to deduce correct skeletal structure of the network with more accurate parameter values. Proposed fitness function mirrors user's confidence in the validity of knowledge and helps in narrowing down the search range of the model parameters for highly confident knowledge. We investigate the potency of the method in terms of quality of data and required data size. The proposed method is shown to perform better in inherent noisy data and in presence of small number of time-dynamics data. We also investigate how the inference method performs in terms of iterative incorporation of knowledge. In inferring cell-cycle data of budding yeast (Saccharomyces cerevisiae), guided by knowledge, the inference method predicts 17 and 23 correct regulations in first and second iteration, respectively which is significantly higher than some other existing methods. 
Along with finding the parameter values more accurately, it predicts some new regulations and helps in revealing the underlying network structure.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"1 1","pages":"39"},"PeriodicalIF":0.0,"publicationDate":"2010-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/1722024.1722069","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64108442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
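As a rough illustration of the S-System model named in the abstract above, the sketch below evaluates the standard S-System rate equations, dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij, and integrates them with a simple forward-Euler step. The function names, the Euler integrator, the step size and the clamping constant are assumptions made for illustration; the paper's Differential Evolution search and its knowledge-based fitness function are not reproduced here.

```python
import math


def s_system_rates(x, alpha, beta, g, h):
    """Return dX_i/dt for every gene i under the S-System form.

    x      : current expression levels (all positive)
    alpha  : production rate constants
    beta   : degradation rate constants
    g, h   : kinetic-order matrices for production and degradation
    """
    n = len(x)
    rates = []
    for i in range(n):
        production = alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        degradation = beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        rates.append(production - degradation)
    return rates


def simulate(x0, alpha, beta, g, h, dt=0.01, steps=100):
    """Forward-Euler integration (an illustrative choice, not the paper's)."""
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        dx = s_system_rates(x, alpha, beta, g, h)
        # Clamp to a small positive value so that negative kinetic
        # orders never raise zero to a negative power.
        x = [max(xi + dt * dxi, 1e-9) for xi, dxi in zip(x, dx)]
        trajectory.append(list(x))
    return trajectory
```

In an inference setting, a candidate parameter set (alpha, beta, g, h) would be simulated in this way and compared against the observed time series; a non-zero entry g[i][j] or h[i][j] corresponds to a regulation of gene i by gene j.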
The SpiceRDb knowledgebase is a unique attempt to elucidate the science behind the action of spices on various disease pathways, using text mining and molecular modeling tools. These spice-remedies have been demonstrated to mediate therapeutic benefits for a wide spectrum of diseases ranging from multiple sclerosis to colorectal cancer. Furthermore, the docking studies identified curcumin, a component of turmeric, to be a potential disease biomarker for colorectal neoplasm. Thus, the usefulness of the SpiceRDb knowledgebase motivates us to make it publicly available, so that the community can benefit from the vast knowledge available about alternative medicine projects and the recent scientific evidence supporting the benefits of spice-remedies.
{"title":"SpiceRDb: an integrated knowledgebase of \"spice-disease\" remedies","authors":"R. Pauly, M. Pradhan, M. Palakal","doi":"10.1145/1722024.1722057","DOIUrl":"https://doi.org/10.1145/1722024.1722057","url":null,"abstract":"SpiceRDb knowledgebase is a unique attempt to elucidate the science behind the action of spices on various disease pathways, using text mining and molecular modeling tools. These spice-remedies have been demonstrated to mediate therapeutic benefits for wide spectra of diseases ranging from multiple sclerosis to colorectal cancer. Furthermore, the docking studies identified curcumin, a component of turmeric, to be a potential disease biomarker for colorectal neoplasm. Thus, the usefulness of the SpiceRDb knowledgebase motivates us to make it available to the public community in order to benefit from the vast knowledge available about alternative medicine projects and the recent scientific evidences supporting the benefits of spice-remedies.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"395 1","pages":"28"},"PeriodicalIF":0.0,"publicationDate":"2010-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/1722024.1722057","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64108502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The use of computational tools in the prediction of ADME/Tox properties of compounds is growing rapidly in drug discovery as the benefits they provide in high throughput and early application in drug design are realized. Numerous examples exist of drugs that have had to be withdrawn because of unacceptable toxicity, in clinical trials and even after reaching the market. In this study, phytochemicals from selected spices were used to predict their rodent carcinogenicity, mutagenicity, plasma protein binding (PPB) and blood-brain barrier (BBB) penetration. Out of 108 compounds analysed, we found only five compounds to be non-mutagenic and non-carcinogenic; all the remaining compounds were toxic from a pharmacological perspective. The five non-toxic compounds are alpha-zingiberene, delphinidin, laurotetanine, malabaricone-B and malabaricone-C. The PPB values of alpha-zingiberene, delphinidin and laurotetanine are in the <90% range (57.58, 88.41 and 52.59, respectively), indicating that these three compounds are weakly bound to plasma proteins, whereas the other two (malabaricone-B and malabaricone-C) bind strongly to plasma proteins. The identification of delphinidin as a naturally occurring inhibitor of VEGF (vascular endothelial growth factor) receptors suggests that this molecule possesses important antiangiogenic properties that may be helpful for the prevention and treatment of cancer. The healing activity of malabaricone B and malabaricone C, the major antioxidant constituents of the Myristaceae family, against indomethacin-induced gastric ulceration in mice has been studied. Though spices are well known for their antioxidant, antimicrobial and anti-inflammatory properties, this study indicates that carcinogenic behaviour is predicted for a large number of spice compounds.
{"title":"Prediction of toxicity and pharmacological potential of selected spice compounds","authors":"A. Riju, K. Sithara, S. S. Nair, S. Eapen","doi":"10.1145/1722024.1722060","DOIUrl":"https://doi.org/10.1145/1722024.1722060","url":null,"abstract":"The use of computational tools in the prediction of ADME/Tox properties of compounds is growing rapidly in drug discovery as the benefits they provide in high throughput and early application in drug design are realized. Numerous examples exist of drugs that have had to be withdrawn, because of unacceptable toxicity, in clinical trials and even after reaching the market. In this study phytochemicals from selected spices were used to predict their rodent carcinogenicity, mutagenicity, PPB and BBB. Out of 108 compounds analysed, we found that only five compounds as non-mutagenic and non-carcinogenic and all the remaining were toxic in a pharmacological perspective. The five non-toxic compounds are alpha-zingiberene, delphinidin, laurotetanine, malabaricone-B and malabaricone-C. The PPB values of alpha-zingiberene, delphinidin and laurotetanine are in the <90% range (57.58, 88.41, 52.59, respectively) indicating that the three compounds were weakly bound to plasma proteins and the other two (malabaricone-B and malabaricone-C) strongly binds to plasma protein. The identification of delphinidin as a naturally occurring inhibitor of VEGF (vascular endothelial growth factor) receptors suggests that this molecule possesses important antiangiogenic properties that may be helpful for the prevention and treatment of cancer. The healing activity of malabaricone B and malabaricone C, the major antioxidant constituents of Myristaceae family, against indomethacin-induced gastric ulceration in mice has been studied. 
Though spices are well known for their antioxidant, antimicrobial, antinflammatory properties etc., this study clearly indicates the plethora of carcinogenic behaviour of spice compounds.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"1 1","pages":"31"},"PeriodicalIF":0.0,"publicationDate":"2010-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/1722024.1722060","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64108599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
During the last decade, the amount of data published in the biomedical literature has increased exponentially. With this growth, it has become hard to manually read all the papers for the required information. Many text mining algorithms and approaches have been developed to extract information from the existing literature. One important task is to find the associations between functional terms such as genes, proteins, drugs and diseases. These associations can be causal, explicit or implicit. One of the most common applications is to mine protein-protein interactions from PubMed. The focus of the present study is to identify and validate implicit protein-protein associations, as these are hard to identify from the literature. These associations, when detected automatically, are noisy and need to be validated for their biological significance. In the validation process, these associations were passed through a series of filters and an algorithm to remove the noise present in the data. In this study, we used 16 gene ids to retrieve 32,693 documents with 193,738 sentences related to regenerative biology from the PubMed database. From these sentences, BioMap found 10,004 explicit and 30,000 implicit protein interaction pairs that were validated using the proposed methodology. Finally, 308 implicit pairs were identified as the outcome of this methodology. These results indicate that the proposed methods can be effectively used for biological verification of implicit protein-protein interactions that are obtained through literature mining.
{"title":"Biomedical association mining and validation","authors":"P. Gandra, M. Pradhan, M. Palakal","doi":"10.1145/1722024.1722035","DOIUrl":"https://doi.org/10.1145/1722024.1722035","url":null,"abstract":"During last decade, the data published in biomedical literature has increased exponentially. With this growth, it has become hard to manually read all the papers for required information. Many text mining algorithms and approaches have been developed to extract information from the existing literature. One such important information is to find the associations between functional terms like genes, proteins, drugs, diseases etc. These associations can be casual, explicit or implicit. One of the most common applications is to mine protein-protein interactions from Pubmed. The focus of the present study is to identify and validate implicit protein -- protein associations as these are hard to identify from literature. These associations, when detected automatically, are noisy and need to be validated for their biological significance. In the process of validating, these associations were passed through series of filters and an algorithm to remove the noise present in the data. In this study, we used 16 gene ids to retrieve 32,693 documents with 193,738 sentences related to regenerative biology from the Pubmed database. From these sentences, BioMap found 10004 explicit and 30,000 implicit protein interaction pairs that were validated using the proposed methodology. Finally 308 implicit pairs were identified as outcome of this methodology. 
These results indicate that the proposed methods can be effectively used for biological verification of implicit protein-protein interactions that are obtained through literature mining.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"1 1","pages":"9"},"PeriodicalIF":0.0,"publicationDate":"2010-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/1722024.1722035","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64107838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Protein kinases form one of the largest superfamilies of proteins and are involved in almost every cellular process. In plants, because of their important roles in cellular communication, growth and development, these proteins are the subject of increasing research. A tool that estimates the probability of a sequence being a plant protein kinase would simplify this effort and accelerate experimental characterization. In this approach, a high-performance prediction server, 'PhytokinaseSVM', has been developed and implemented; it is available at http://type3pks.in/kinase. A support vector machine, a kernel-based supervised learning technique, and compositional properties including dipeptide and multiplet frequency were used in its development. Based on the limited available data, the tool provides a simple, unique platform to estimate the probability that a particular sequence is a plant protein kinase, with high accuracy (98%). PhytokinaseSVM achieved 96% specificity and 100% sensitivity when tested with 500 protein kinases and 500 non-protein kinases that were not part of the training dataset. We expect that this tool may serve as a useful resource for plant protein kinase researchers, as it is freely available. The tool also allows the prediction of other eukaryotic protein kinases. Work is currently in progress to further improve prediction accuracy by including more sequence features in the training dataset.
{"title":"A novel system for predicting plant protein kinase superfamily by using machine learning methodology","authors":"V. Mallika, K. Sivakumar, E. Soniya","doi":"10.1145/1722024.1722064","DOIUrl":"https://doi.org/10.1145/1722024.1722064","url":null,"abstract":"Protein kinases, one of the largest superfamily of proteins which involved in almost every cellular processes. In plants, due to their important roles in cellular communication, growth and development more researches are going on in this particular protein. Developing a tool to identify the probability of the sequence being a plant protein kinase will simplify the efforts and accelerate the experimental characterization. In this approach, a high performance prediction server 'PhytokinaseSVM' has been developed and implemented which is available at http://type3pks.in/kinase. Support vector machine, a kernel based supervised learning technology and compositional properties including dipeptide and multiplet frequency were used in the developmental procedure. Based on the limited available data, the tool provides a simple unique platform to identify the probability of a particular sequence, being a plant protein kinase or not with moderately high accuracy (98%). PhytokinaseSVM achieved 96% specificity and 100% sensitivity when tested with 500 protein kinases and 500 non-protein kinases that were not the part of the training dataset. We expect that this tool may serve as a useful resource for plant protein kinase researchers as it is freely available. The tool also allows the prediction of other eukaryotic protein kinases. 
Currently work is being progressed for further betterment of prediction accuracy by including more sequence features in the training dataset.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"1 1","pages":"34"},"PeriodicalIF":0.0,"publicationDate":"2010-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/1722024.1722064","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64108219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
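To make the "dipeptide frequency" feature mentioned in the PhytokinaseSVM abstract concrete, here is a minimal sketch of how a 400-dimensional dipeptide composition vector could be computed from a protein sequence. The function name, the handling of non-standard residues and the normalisation are assumptions; the abstract does not specify the exact feature encoding, and neither the multiplet frequency feature nor the SVM training step is shown.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies.

    Frequencies are ordered alphabetically by dipeptide ("AA", "AC", ...).
    Overlapping pairs are counted; pairs containing a non-standard
    residue (e.g. "X") are skipped, which is an illustrative choice.
    """
    seq = sequence.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [counts[key] / total for key in sorted(counts)]
```

A vector like this, computed for each kinase and non-kinase sequence, is the kind of fixed-length input a kernel-based classifier expects, regardless of the original sequence length.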
Genome Rearrangement is a field that addresses the problem of finding the minimum number of global operations, such as transpositions, reversals, fusions and fissions, that transform a given genome into another. In this work we deal with transposition events, which are events that exchange the positions of two contiguous blocks of genes in the same chromosome.
Several approximation algorithms for this problem have been published so far. Bafna and Pevzner [1] proposed the first 1.5-approximation algorithm for the transposition distance problem, and recently Elias and Hartman [4] delineated a 1.375-approximation algorithm, which is currently the best performance ratio known. Many other algorithms achieve good performance in experimental results and provide new insights for solving the problem [2, 5, 8, 9, 11].
In this paper we present two main results. The first result is an implementation of the 1.375-approximation algorithm described by Elias and Hartman [4]. We also compare the experimental results of the Elias-Hartman algorithm with those of other approaches. It is important to realize that no implementation of the Elias-Hartman algorithm was provided before this work and that the approximation proof was assisted by a computer program. Although the approximation ratio is an important issue, we need to know how the algorithm behaves in practical experiments. For this reason, we show the experimental results of the Elias-Hartman algorithm using our datasets.
The second result is the description of our algorithm, based on the 1.5-approximation algorithm of Bafna and Pevzner [1]. Our algorithm uses a set of heuristics that allowed us to improve the solution quality of the original algorithm while keeping the original 1.5-approximation ratio. We compare our experimental results with the best results published so far. The results indicate that our algorithm performs best in practice. The solution quality analysis also shows that our algorithm outperforms the Elias and Hartman 1.375-approximation algorithm on longer permutations, despite the approximation ratio.
We delineate an algorithm for the transposition distance problem. Our algorithm is the first polynomial time algorithm that sorts by transposition any permutation π, for |π| = 9. We show that our algorithm is better than the other algorithms using sequences π, for |π| < 11. We also show that our algorithm keeps its good performance on longer permutations. We claim that the heuristics proposed in this work contribute to determining the complexity of sorting by transposition, which remains open.
{"title":"Extending Bafna-Pevzner algorithm","authors":"Ulisses Dias, Zanoni Dias","doi":"10.1145/1722024.1722051","DOIUrl":"https://doi.org/10.1145/1722024.1722051","url":null,"abstract":"Genome Rearrangement is a field that addresses the problem of finding the minimum number of global operations, such as transpositions, reversals, fusions and fissions that transform a given genome into another. In this work we deal with transposition events, which are events that change the position of two contiguous block of genes in the same chromosome.\u0000 Some approximation algorithms for this problem were published so far. Bafna and Pevzner [1] proposed the first 1.5-approximation algorithm for the transposition distance problem and recently Elias and Hartman [4] delineated a 1.375-approximation algorithm, which is currently the best performance ratio known. Many other algorithms achieve good performance on experimental results and provide new insights to solve the problem [2, 5, 8, 9, 11].\u0000 In this paper we present two main results. The first result is the implementation of the 1.375-algorithm described by Elias and Hartman [4]. We also compared the experimental results from Elias-Hartman algorithm with other approaches. It is important to realize that no implementation of Elias-Hartman algorithm was provided before this work and the approximation proof was assisted by a computer program. Although the approximation ratio is an important issue, we need to know how the algorithm behaves on practical experiments. For this reason, we show the experimental results of Elias-Hartman algorithm using our datasets.\u0000 The second result is the description of our algorithm based on Bafna and Pevzner [1] 1.5-approximation algorithm. Our algorithm uses a set of heuristics that allowed us to improve the solution quality of the original algorithm, but keeping the original 1.5-approximation ratio. We compare our experimental results with the best results published so far. 
The results indicate that our algorithm performs best in practice. The solution quality analysis also shows that our algorithm outperforms Elias and Hartman 1.375-approximation algorithm on longer permutations, despite the approximation ratio.\u0000 We delineate an algorithm for the transposition distance problem. Our algorithm is the first polynomial time algorithm that sorts by transposition any permutation π, for |π| = 9. We show that our algorithm is better than the other algorithms using sequences π, for π < 11. We also show that our algorithm keeps the good performance on longer permutations. We claim that the heuristics proposed in this work contribute for discovering the complexity of sorting by transposition, which remains open.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"1 1","pages":"23"},"PeriodicalIF":0.0,"publicationDate":"2010-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/1722024.1722051","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64108271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
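The transposition operation described in the abstract above, which exchanges two contiguous blocks of a permutation, can be sketched in a few lines. The example also counts breakpoints and derives the classical lower bound on the transposition distance, which follows from the fact that a single transposition removes at most three breakpoints. The function names and the 0-based index convention are assumptions for illustration; neither the Bafna-Pevzner nor the Elias-Hartman algorithm is implemented here.

```python
import math


def apply_transposition(perm, i, j, k):
    """Exchange the adjacent blocks perm[i:j] and perm[j:k].

    Indices are 0-based with 0 <= i < j < k <= len(perm).
    """
    if not (0 <= i < j < k <= len(perm)):
        raise ValueError("need 0 <= i < j < k <= len(perm)")
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def breakpoints(perm):
    """Count adjacent positions in 0, perm, n+1 that are not consecutive."""
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b != a + 1)


def transposition_lower_bound(perm):
    """A transposition removes at most 3 breakpoints, so d >= ceil(b / 3)."""
    return math.ceil(breakpoints(perm) / 3)
```

For example, the permutation [3, 1, 2] has three breakpoints, so at least one transposition is needed, and apply_transposition([3, 1, 2], 0, 1, 3) does sort it in a single step.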
Sensitivity and specificity are the most widely used statistics for measuring the performance of a binary classification test. They are highly meaningful for a variety of use cases in which the classifying tests are affordable. Unfortunately, there are many problems arising from different branches of the natural sciences in which the screening test is too expensive to perform for all the predicted objects. Thus, the trend has been for scientists to calculate the sensitivity and the specificity of a binary classification test based on a handful of experimentally proven facts, an approach that is theoretically uncertain. In this article, a novel measure is proposed that assigns importance to multiple ordered lists, taking into account the share of majority-voted ranked pairs of elements that a list contains. A real-life bioinformatics application is demonstrated in the domain of microRNA target prediction, where a number of algorithms exist. Using the proposed measure, we aim to assign a weight to each algorithm that conveys its reliability with respect to the rest.
{"title":"A novel measure for evaluating an ordered list: application in microRNA target prediction","authors":"Debarka Sengupta, S. Bandyopadhyay, U. Maulik","doi":"10.1145/1722024.1722067","DOIUrl":"https://doi.org/10.1145/1722024.1722067","url":null,"abstract":"Sensitivity and specificity are the most widely used statistics for measuring the performance of a binary classification test. They stand vastly meaningful for variety of use cases where the classifying tests are affordable. But unfortunately, there is a legion of problems arriving from different streams of natural sciences where the screening test is too expensive to render for all the predicted objects. Thus, the trend has been for scientists to calculate the sensitivity and the specificity of a binary classification test based on a handful of experimentally proven facts, which is theoretically uncertain. In this article a novel measure is proposed that assigns importance to multiple ordered lists, taking into account the share of majority voted ranked pairs of elements a list contains. A real life bioinformatic application is demonstrated in the domain of microRNA target prediction where a number of algorithms exist. Using the proposed measure, we aim to assign certain weight to each algorithm that conveys its reliability with respect to the rest.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"8 1","pages":"37"},"PeriodicalIF":0.0,"publicationDate":"2010-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/1722024.1722067","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64108362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
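The abstract above does not give the formula of the proposed measure, so the following is only one possible reading of "the share of majority voted ranked pairs of elements a list contains", not the authors' definition. In this sketch, every pair of items is ordered by majority vote across all lists, and each list is scored by the fraction of decided pairs on which it agrees with the majority. The function name, the assumption that all lists rank the same items, and the treatment of ties are hypothetical.

```python
from itertools import combinations


def majority_agreement_scores(rankings):
    """Score each ranking by its agreement with pairwise majority votes.

    rankings : list of lists, each an ordering of the same set of items.
    Returns one score in [0, 1] per ranking. Pairs on which the vote is
    tied are ignored (an illustrative choice).
    """
    items = sorted(rankings[0])
    positions = [{item: idx for idx, item in enumerate(r)} for r in rankings]
    agreements = [0] * len(rankings)
    decided_pairs = 0
    for a, b in combinations(items, 2):
        votes_a_first = sum(1 for pos in positions if pos[a] < pos[b])
        votes_b_first = len(rankings) - votes_a_first
        if votes_a_first == votes_b_first:
            continue  # no majority for this pair
        decided_pairs += 1
        a_wins = votes_a_first > votes_b_first
        for k, pos in enumerate(positions):
            if (pos[a] < pos[b]) == a_wins:
                agreements[k] += 1
    if decided_pairs == 0:
        return [0.0] * len(rankings)
    return [count / decided_pairs for count in agreements]
```

Under this reading, a prediction algorithm whose ranked list of targets mostly agrees with the consensus ordering receives a weight close to 1, and an outlier receives a weight close to 0, without requiring experimentally validated labels.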
The problem of searching for occurrences of a pattern P[0...m-1] in a text T[0...n-1] with m ≤ n, where the symbols of P and T are drawn from some alphabet Σ of size σ, is called the exact string matching problem. At present, pattern matching is a powerful tool for locating nucleotide or amino acid sequence patterns in biological sequence databases. The problem of searching for a set of patterns P0, P1, P2...Pr-1, r ≥ 1, in a given text T is called the multi-pattern string matching problem. The multi-pattern string matching problem has previously been solved by efficient bit-parallel string matching algorithms: shift-or and BNDM. Many other types of algorithms also exist for the same purpose, but bit-parallelism has been shown to be more efficient than the others. In this paper, we extend the BNDM algorithm with q-grams (B. Durian et al., 2008) to multiple patterns, where the patterns are arbitrary DNA patterns. We assume that each pattern is of equal size m and that the total length of the patterns is less than or equal to the word length (w) of the computer used. Since the BNDM algorithm has been shown to be faster than other bit-parallel string matching algorithms (G. Navarro, 2000), we compare the performance of the multi-pattern q-gram BNDM algorithm with the existing BNDM algorithm for different values of q and numbers of patterns (r).
{"title":"A fast bit-parallel multi-patterns string matching algorithm for biological sequences","authors":"R. Prasad, S. Agarwal, I. Yadav, Bharat Singh","doi":"10.1145/1722024.1722077","DOIUrl":"https://doi.org/10.1145/1722024.1722077","url":null,"abstract":"The problem of searching occurrences of a pattern P[0...m-1] in the text T[0...n-1>with m ≤ n, where the symbols of P and T are drawn from some alphabet Σ of size σ, is called exact string matching problem. In the present day, pattern matching is a powerful tool in locating nucleotide or amino acid sequence patterns in the biological sequence database. The problem of searching a set of patterns P0, P1, P2...Pr-1, r ≥ 1, in the given text T is called multi-pattern string matching problem. The multi-patterns string matching problem has been previously solved by efficient bit-parallel strings matching algorithms: shift-or and BNDM. Many other types of algorithms also exist for the same purpose, but bit-parallelism has been shown to be very efficient than the others. In this paper, we extend BNDM algorithm with q-gram (B. Durian et al., 2008) for multiple patterns, where each multi-patterns are any DNA patterns. We assume that each pattern is of equal size m and total length of pattern is less than or equal to word length (w) of computer used. Since BNDM algorithm has been shown to be faster than any other bit-parallel string matching algorithm (G. 
Navarro, 2000), therefore, we compare the performance of multi-patterns q-gram BNDM algorithm with existing BNDM algorithm for different value of q and number of patterns (r).","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"1 1","pages":"46"},"PeriodicalIF":0.0,"publicationDate":"2010-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/1722024.1722077","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64108640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
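For readers unfamiliar with BNDM, the sketch below shows the basic single-pattern algorithm that the abstract above builds on: the window is scanned right to left while a bit mask tracks which factors of the pattern are still compatible with the characters read so far. This is the plain textbook form, not the multi-pattern q-gram extension the paper proposes. The function name is an assumption, and because Python integers are unbounded the mask is truncated to m bits explicitly, so the machine word-length restriction mentioned in the abstract is not enforced here.

```python
def bndm_search(pattern, text):
    """Return the start indices of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    full = (1 << m) - 1   # m one-bits
    high = 1 << (m - 1)   # set when the scanned suffix is a pattern prefix
    # masks[c] has bit (m - 1 - i) set when pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))

    occurrences = []
    pos = 0
    while pos <= n - m:
        j, last = m, m
        state = full
        while state != 0:
            # Read the window from right to left.
            state &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if state & high:
                if j > 0:
                    last = j          # remember the longest prefix seen
                else:
                    occurrences.append(pos)
            state = (state << 1) & full
        pos += last                   # safe shift to the next window
    return occurrences
```

As a usage example, bndm_search("ACG", "TACGACGT") scans the DNA text and reports the two occurrences at positions 1 and 4.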
The objective of this study is to explore the simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) in expressed sequence tags (ESTs) of Radopholus similis. We retrieved 7380 EST sequences, comprising libraries from different tissues and conditions, from dbEST of the National Center for Biotechnology Information (NCBI). A total of 1449 SSRs were detected by the MISA Perl script. Hexa-nucleotide repeats (836), followed by mononucleotide repeats (207), were found to be more abundant than other types of repeats. Putative SNPs and indels were identified with the help of AutoSNP. As many as 1038 SNPs and 108 small indels (insertions/deletions) were found, with a density of one SNP per 191 bp and one indel per 1.8 kbp. Candidate SNPs were categorized according to nucleotide substitution as either transitions (C↔T or G↔A) or transversions (C↔G, A↔T, C↔A or T↔G). We observed a higher number of transversion-type substitutions (537) than transitions (501). However, considering the individual substitutions, G↔A (281) and C↔T (220) were found to be more frequent than the purine-to-pyrimidine base substitutions. Since SSR and SNP markers are invaluable tools for genetic analysis, the identified SSRs and SNPs of R. similis could be used in diversity analysis, genetic trait mapping, association studies and marker-assisted selection.
{"title":"Mining SSR and SNP/Indel sites in expressed sequence tag libraries of Radopholus similis","authors":"A. Riju, P. Lakshmi, P. Nima, N. Reena, S. Eapen","doi":"10.1145/1722024.1722042","DOIUrl":"https://doi.org/10.1145/1722024.1722042","url":null,"abstract":"The objective of this study is to explore the single sequence repeats (SSRs) and single nucleotide polymorphims (SNPs) in expressed sequence tags (ESTs) of Radopholus similis. We retrieved 7380 EST sequences consisting different tissues/condition libraries from dbEST of National Centre for Biotechnology Information (NCBI). A total of 1449 SSRs were detected by MISA perl script. Hexa-nucleotide repeats (836 nos.) followed by mononucleotide repeats (207 nos.) were found to be more abundant than other types of repeats. Putative SNP/Indels were found out with the help of AutoSNP. As many as 1038 SNPs and 108 small indels (insertion/deletion) were found with a density of one SNP/191 bp and one indel/1.8 kbp. Candidate SNPs were categorized according to nucleotide substitution as either transition (C↔T or G↔A) or transversion (C↔G, A↔T, C↔A or T↔G). We observed a higher number of transversions type substitution (537) than transitions (501). However considering the individual substitutions, G↔A (281) and C↔T (220) were found to be predominant than purine to pyrimidine base substitutions. Since the SSR and SNP markers are invaluable tools for genetic analysis, the identified SSRs and SNPs of R. 
similis could be used in diversity analysis, genetic trait mapping, association studies and marker assisted selection.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"1 1","pages":"15"},"PeriodicalIF":0.0,"publicationDate":"2010-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/1722024.1722042","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64107884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
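The transition/transversion categorization described in the Radopholus similis abstract above can be illustrated with a minimal Python sketch. This is not the AutoSNP implementation; the function name classify_substitution and the constants are hypothetical, and the only data used are the counts reported in the abstract.

```python
# Illustrative sketch only: names are hypothetical, not taken from AutoSNP.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion.

    A transition swaps bases within one chemical class (A<->G or C<->T);
    a transversion swaps a purine for a pyrimidine or vice versa.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError(f"expected single bases from ACGT, got {ref!r} and {alt!r}")
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Counts as reported in the abstract (not recomputed from sequence data).
transitions = 281 + 220   # G<->A plus C<->T
transversions = 537
ts_tv_ratio = transitions / transversions

print(classify_substitution("C", "T"))
print(classify_substitution("C", "G"))
print(f"transitions={transitions}, Ts/Tv={ts_tv_ratio:.3f}")
```

With the reported counts, the two transition types sum to 501 and the transition-to-transversion ratio falls slightly below 1, which matches the abstract's statement that transversions outnumber transitions overall.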
Protein kinases are enzymes that modify other proteins by chemically adding phosphate groups to them. In this work, the protein kinases of Caenorhabditis elegans and Homo sapiens sharing three or more common domains were first grouped, and the disordered regions of the protein kinases in each group were predicted. The similarities of the disordered regions between the two organisms were then determined. Linear motifs present in these similar disordered regions were identified and tested for conservation in both Homo sapiens and Caenorhabditis elegans. Although the disordered regions are highly similar, the linear motifs are poorly conserved between these distantly related organisms.
{"title":"Analysis of disordered regions in protein kinase subfamilies of Homo sapiens and Coenorhabditis elegans","authors":"K. Kurup, J. Natarajan","doi":"10.1145/1722024.1722028","DOIUrl":"https://doi.org/10.1145/1722024.1722028","url":null,"abstract":"Protein kinase is a kinase enzyme that modifies other proteins by chemically adding phosphate groups to them. In this work, first the protein kinases of Coenorhabditis elegans and Homo sapiens with three or more common domain were grouped and disorder regions of protein kinases in each group were predicted. Then the similarities of the disordered regions among the organisms were found. Linear motifs present in these similar disorder regions were identified and tested for their conservation in both Homo sapiens and Coenorhabditis elegans. It is found that, though the similarities in disorder regions are high, the linear motifs are not conserved much in these distantly related organisms.","PeriodicalId":39379,"journal":{"name":"In Silico Biology","volume":"1 1","pages":"3"},"PeriodicalIF":0.0,"publicationDate":"2010-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/1722024.1722028","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64107973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}