QSAR and its Role in Target-Ligand Interaction
Pub Date: 2013-12-27 | DOI: 10.2174/1875036201307010063
Anamika Singh, Rajeev Singh
Every molecule has a characteristic structure and function, and molecules combine to form compounds. Structure and function are closely related: QSARs (Quantitative Structure-Activity Relationships) rest on the premise that the structure of a molecule contains the features responsible for its physical, chemical, and biological properties, and that the chemical can be represented by one or more numerical descriptors. With QSAR models, the biological activity of a new or untested chemical can be inferred from the molecular structures of similar compounds whose activities have already been assessed. QSARs attempt to relate the physical and chemical properties of molecules to their biological activities. To this end, many descriptors are available (for example, molecular weight, number of rotatable bonds, and log P), and simple statistical methods such as Multiple Linear Regression (MLR) are used to build a predictive model. Such models describe the activity of the data set and can predict activities for further sets of (untested) compounds. Descriptors of this type are simple to calculate and allow relatively fast analysis. 3D-QSAR uses probe-based sampling within a molecular lattice to determine three-dimensional properties of molecules (particularly steric and electrostatic values) and then correlates these 3D descriptors with biological activity. Physicochemical descriptors include hydrophobicity, topology, electronic properties, and steric effects; they can be calculated empirically, statistically, or with more recent computational methods. QSARs are currently applied in many disciplines, most notably drug design and environmental risk assessment. Keywords: QSAR, Ligand Designing, LogP, Cheminformatics, Docking.
{"title":"QSAR and its Role in Target-Ligand Interaction","authors":"Anamika Singh, Rajeev Singh","doi":"10.2174/1875036201307010063","DOIUrl":"https://doi.org/10.2174/1875036201307010063","url":null,"abstract":"Each molecule has its own specialty, structure and function and when these molecules are combined together they form a compound. Structure and function of a molecule are related to each other and QSARs (Quantitative Structure- Activity relationships) are based on the criteria that the structure of a molecule must contain the features responsible for its physical, chemical, and biological properties, and on the ability to represent the chemical by one, or more, numerical descriptor(s). By QSAR models, the biological activity of a new or untested chemical can be inferred from the molecular structure of similar compounds whose activities have already been assessed. QSARs attempt to relate physical and chemical properties of molecules to their biological activities. For this there are so many descriptors (for example, molecular weight, number of rotatable bonds, Log P) and simple statistical methods such as Multiple Linear Regression (MLR) are used to predict a model. These models describe the activity of the data set and can predict activities for further sets of (untested) compounds. These types of descriptors are simple to calculate and allow for a relatively fast analysis. 3D-QSAR uses probe-based sampling within a molecular lattice to determine three-dimensional properties of molecules (particularly steric and electrostatic values) and can then correlate these 3D descriptors with biological activity. Physicochemical descriptors, include hydrophobicity, topology, electronic properties, and steric effects etc. These descriptors can be calculated empirically, statistically or through more recent computational methods. QSARs are currently being applied in many disciplines, with many pertaining to drug design and environmental risk assessment. Key word: QSAR, Ligand Designing, LogP, Cheminformatics, Docking.","PeriodicalId":38956,"journal":{"name":"Open Bioinformatics Journal","volume":"7 1","pages":"63-67"},"PeriodicalIF":0.0,"publicationDate":"2013-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68106654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistical Methods for Overdispersion in mRNA-Seq Count Data
Pub Date: 2013-12-13 | DOI: 10.2174/1875036201307010034
Hui Zhang, S. Pounds, Li Tang
Recent developments in Next-Generation Sequencing (NGS) technologies have opened doors for ultra-high-throughput sequencing of mRNA (mRNA-seq) across the whole transcriptome. mRNA-seq has enabled researchers to comprehensively search for the underlying biological determinants of disease and, ultimately, to discover novel preventive and therapeutic solutions. Unfortunately, given the complexity of mRNA-seq data, data generation has outgrown current analytical capacity, hindering the pace of research in this area. Thus, there is an urgent need to develop novel statistical methodology that addresses problems specific to mRNA-seq data. This review addresses a common challenge: the presence of overdispersion in mRNA count data. We review current methods for modeling overdispersion, such as the negative binomial model, the quasi-likelihood Poisson method, and the two-stage adaptive method; introduce the related statistical theory; and discuss applications to mRNA-seq count data.
{"title":"Statistical Methods for Overdispersion in mRNA-Seq Count Data","authors":"Hui Zhang, S. Pounds, Li Tang","doi":"10.2174/1875036201307010034","DOIUrl":"https://doi.org/10.2174/1875036201307010034","url":null,"abstract":"Recent developments in Next-Generation Sequencing (NGS) technologies have opened doors for ultra high throughput sequencing mRNA (mRNA-seq) of the whole transcriptome. mRNA-seq has enabled researchers to comprehensively search for underlying biological determinants of diseases and ultimately discover novel preventive and therapeutic solutions. Unfortunately, given the complexity of mRNA-seq data, data generation has outgrown current analytical capacity, hindering the pace of research in this area. Thus, there is an urgent need to develop novel statistical methodology that addresses problems related to mRNA-seq data. This review addresses the common challenge of the presence of overdispersion in mRNA count data. We review current methods for modeling overdispersion, such as negative binomial, quasi-likelihood Poisson method, and the two-stage adaptive method; introduce related statistical theories; and discuss their applications to mRNA-seq count data.","PeriodicalId":38956,"journal":{"name":"Open Bioinformatics Journal","volume":"60 1","pages":"34-40"},"PeriodicalIF":0.0,"publicationDate":"2013-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68106753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Short Survey on Genetic Sequences, Chou's Pseudo Amino Acid Composition and its Combination with Fuzzy Set Theory
Pub Date: 2013-12-13 | DOI: 10.2174/1875036201307010041
D. Georgiou, T. Karakasidis, A. Megaritis
The study of genetic sequences is of great importance in biology and medicine, and sequence analysis and taxonomy are two major fields of application for bioinformatics. In this survey, we present results concerning genetic sequences and Chou's pseudo amino acid composition, as well as methodologies that combine this concept with elements of fuzzy set theory, with emphasis on fuzzy clustering and its application to the analysis of genetic sequences.
{"title":"A Short Survey on Genetic Sequences, Chou’s Pseudo Amino Acid Composition and its Combination with Fuzzy Set Theory","authors":"D. Georgiou, T. Karakasidis, A. Megaritis","doi":"10.2174/1875036201307010041","DOIUrl":"https://doi.org/10.2174/1875036201307010041","url":null,"abstract":"The study of genetic sequences is of great importance in biology and medicine. Sequence analysis and taxonomy are two major fields of application of bioinformatics. In this survey, we present results concerning genetic sequences and Chou's pseudo amino acid composition as well as methodologies developed based on this concept along with elements of fuzzy set theory, and emphasize on fuzzy clustering and its application in analysis of genetic sequences.","PeriodicalId":38956,"journal":{"name":"Open Bioinformatics Journal","volume":"7 1","pages":"41-48"},"PeriodicalIF":0.0,"publicationDate":"2013-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68106240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artificial Neural Network in Drug Delivery and Pharmaceutical Research
Pub Date: 2013-12-13 | DOI: 10.2174/1875036201307010049
V. Sutariya, A. Groshev, Prabodh Sadana, D. Bhatia, Y. Pathak
Artificial neural network (ANN) technology models the pattern-recognition capabilities of the neural networks of the brain. Like a single neuron in the brain, an artificial neuron receives inputs from many external sources, processes them, and makes a decision. ANNs thus simulate the biological nervous system and draw on analogues of adaptive biological neurons. They do not require rigidly structured experimental designs and can map functions using historical or incomplete data, which makes them a powerful tool for the simulation of various non-linear systems. ANNs have applications in many fields, including engineering, psychology, medicinal chemistry, and pharmaceutical research. Because of their capacity for prediction, pattern recognition, and modeling, ANNs have proven useful in many aspects of pharmaceutical research, including modeling of the brain's neural network, analytical data analysis, drug modeling, protein structure and function, dosage optimization and manufacturing, pharmacokinetic and pharmacodynamic modeling, and in vitro-in vivo correlations. This review discusses the applications of ANNs in drug delivery and pharmacological research.
{"title":"Artificial Neural Network in Drug Delivery and Pharmaceutical Research","authors":"V. Sutariya, A. Groshev, Prabodh Sadana, D. Bhatia, Y. Pathak","doi":"10.2174/1875036201307010049","DOIUrl":"https://doi.org/10.2174/1875036201307010049","url":null,"abstract":"Artificial neural networks (ANNs) technology models the pattern recognition capabilities of the neural networks of the brain. Similarly to a single neuron in the brain, artificial neuron unit receives inputs from many external sources, processes them, and makes decisions. Interestingly, ANN simulates the biological nervous system and draws on analogues of adaptive biological neurons. ANNs do not require rigidly structured experimental designs and can map functions using historical or incomplete data, which makes them a powerful tool for simulation of various non-linear systems.ANNs have many applications in various fields, including engineering, psychology, medicinal chemistry and pharmaceutical research. Because of their capacity for making predictions, pattern recognition, and modeling, ANNs have been very useful in many aspects of pharmaceutical research including modeling of the brain neural network, analytical data analysis, drug modeling, protein structure and function, dosage optimization and manufacturing, pharmacokinetics and pharmacodynamics modeling, and in vitro in vivo correlations. This review discusses the applications of ANNs in drug delivery and pharmacological research.","PeriodicalId":38956,"journal":{"name":"Open Bioinformatics Journal","volume":"7 1","pages":"49-62"},"PeriodicalIF":0.0,"publicationDate":"2013-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68106249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genetic Studies: The Linear Mixed Models in Genome-wide Association Studies
Pub Date: 2013-12-13 | DOI: 10.2174/1875036201307010027
Gengxin Li, Hongjiang Zhu
With the availability of high-density genomic data containing millions of single nucleotide polymorphisms for tens or hundreds of thousands of individuals, genetic association studies can identify variants contributing to complex traits on a genome-wide scale. However, genome-wide association studies are confounded by spurious associations when sample structure (population structure, family structure, and cryptic relatedness) is not properly accounted for. Because the complete genealogy of the study population is absent from the genome-wide association model, new methods are needed to correct the resulting inflation of false positives. Here, linear mixed model based approaches, with their advantage of capturing multiple levels of relatedness, have gained considerable ground. We summarize the current literature on handling sample structure, and our review focuses on four areas: (i) approaches handling population structure in genome-wide association studies; (ii) linear mixed model based approaches in genome-wide association studies; (iii) the performance of linear mixed model based approaches; and (iv) unsolved issues and future work for linear mixed model based approaches.
{"title":"Genetic Studies: The Linear Mixed Models in Genome-wide Association Studies","authors":"Gengxin Li, Hongjiang Zhu","doi":"10.2174/1875036201307010027","DOIUrl":"https://doi.org/10.2174/1875036201307010027","url":null,"abstract":"With the availability of high-density genomic data containing millions of single nucleotide polymorphisms and tens or hundreds of thousands of individuals, genetic association study is likely to identify the variants contributing to complex traits in a genome-wide scale. However, genome-wide association studies are confounded by some spurious associations due to not properly interpreting sample structure (containing population structure, family structure and cryptic relatedness). The absence of complete genealogy of population in the genome-wide association studies model greatly motivates the development of new methods to correct the inflation of false positive. In this process, linear mixed model based approaches with the advantage of capturing multilevel relatedness have gained large ground. We summarize current literatures dealing with sample structure, and our review focuses on the following four areas: (i) The approaches handling population structure in genome-wide association studies; (ii) The linear mixed model based approaches in genome-wide association studies; (iii) The performance of linear mixed model based approaches in genome-wide association studies and (iv) The unsolved issues and future work of linear mixed model based approaches.","PeriodicalId":38956,"journal":{"name":"Open Bioinformatics Journal","volume":"7 1","pages":"27-33"},"PeriodicalIF":0.0,"publicationDate":"2013-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68106735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haplotype Classification Using Copy Number Variation and Principal Components Analysis
Pub Date: 2013-11-29 | DOI: 10.2174/1875036201307010019
K. Blighe
Elaborate downstream methods are required to analyze large microarray data sets. When the end goal is to look for relationships between (or patterns within) different subgroups, or even individual samples, large data sets must first be filtered using statistical thresholds to reduce their overall volume. In anthropological microarray studies, for example, such 'dimension reduction' techniques are essential to elucidate links between polymorphisms and phenotypes in given populations. In such large data sets, a subset can be taken to represent the larger data set, much as polling results taken during elections are used to infer the opinions of the population at large. What, then, is the best and easiest way to capture a subset of variation that represents the overall portrait of variation in a data set? In this article, principal components analysis (PCA) is discussed in detail, including its history, the mathematics behind the process, and the ways in which it can be applied to modern large-scale biological data sets. New methods of analysis using PCA are also suggested, with tentative results outlined.
{"title":"Haplotype Classification Using Copy Number Variation and Principal Components Analysis","authors":"K. Blighe","doi":"10.2174/1875036201307010019","DOIUrl":"https://doi.org/10.2174/1875036201307010019","url":null,"abstract":"Elaborate downstream methods are required to analyze large microarray data-sets. At times, where the end goal is to look for relationships between (or patterns within) different subgroups or even just individual samples, large data-sets must first be filtered using statistical thresholds in order to reduce their overall volume. As an example, in anthropological microarray studies, such 'dimension reduction' techniques are essential to elucidate any links between polymorphisms and phenotypes for given populations. In such large data-sets, a subset can first be taken to represent the larger data-set. For example, polling results taken during elections are used to infer the opinions of the population at large. However, what is the best and easiest method of capturing a sub-set of variation in a data-set that can represent the overall portrait of variation? In this article, principal components analysis (PCA) is discussed in detail, including its history, the mathematics behind the process, and in which ways it can be applied to modern large-scale biological datasets. New methods of analysis using PCA are also suggested, with tentative results outlined.","PeriodicalId":38956,"journal":{"name":"Open Bioinformatics Journal","volume":"7 1","pages":"19-24"},"PeriodicalIF":0.0,"publicationDate":"2013-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68106651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Community annotation and the evolution of cooperation: How patience matters
Pub Date: 2013-07-26 | DOI: 10.2174/1875036201307010009
A. Basuchoudhary, Vahan Simoyan, R. Mazumder
We investigate why biologists fail to contribute to biological databases even though almost all of them use these databases for research. Using evolutionary game theory and computer simulations, we find that (a) the initial distribution of patient contributors determines whether a culture of contribution will prevail; (b) institutions (where an institution is "a significant practice, relationship, or organization in a society or culture") that incentivize patience, and therefore limit free riding, make contribution more likely; and (c) a stable institution, whether or not it incentivizes patience, will increase contribution. We therefore suggest there is a trade-off between the benefits of changing institutions to incentivize patience and the costs of the change itself. Moreover, even when it is possible to create institutions that incentivize patience among scientists, such institutions may nevertheless fail. We create a computer simulation of a population of biologists based on our theory. These simulations suggest that institutions should focus on rewards rather than penalties to incentivize a culture of contribution. Our approach thus provides a methodology for developing a practical blueprint for organizing scientists to encourage cooperation and maximize scientific output.
{"title":"Community annotation and the evolution of cooperation: How patience matters","authors":"A. Basuchoudhary, Vahan Simoyan, R. Mazumder","doi":"10.2174/1875036201307010009","DOIUrl":"https://doi.org/10.2174/1875036201307010009","url":null,"abstract":"We investigate why biologists fail to contribute to biological databases although almost all of them use these databases for research. We find, using evolutionary game theory and computer simulations, that (a) the initial distribution of contributors who are patient determines whether a culture of contribution will prevail or not (b) institutions (where institution means \"a significant practice, relationship, or organization in a society or culture\") that incentivize patience and therefore limit free riding make contribution more likely and, (c) a stable institution, whether it incentivizes patience or not, will increase contribution. As a result we suggest there is a trade-off between the benefits of changing institutions to incentivize patience and the costs of the change itself. Moreover, even if it is possible to create institutions that incentivize patience among scientists such institutions may nevertheless fail. We create a computer simulation of a population of biologists based on our theory. These simulations suggest that institutions should focus more on rewards rather than penalties to incentivize a culture of contribution. Our approach therefore provides a methodology for developing a practical blueprint for organizing scientists to encourage cooperation and maximizing scientific output.","PeriodicalId":38956,"journal":{"name":"Open Bioinformatics Journal","volume":"7 1","pages":"9-18"},"PeriodicalIF":0.0,"publicationDate":"2013-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68106642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparison of Sequencing Utility Programs
Pub Date: 2013-01-31 | DOI: 10.2174/1875036201307010001
Erik Aronesty
High-throughput sequencing (HTS) has resulted in extreme growth rates of sequencing data; at our lab, we generate terabytes of data every day. Sequencing output usually must be "cleaned" and processed in various ways before use in common tasks such as variant calling, expression quantification, and assembly. Two common tasks associated with HTS are adapter trimming and paired-end joining. I have developed two tools at Expression Analysis, Inc., fastq-mcf and fastq-join, to address these tasks, and I compared their performance to that of similar open-source utilities in terms of both resource efficiency and effectiveness.
{"title":"Comparison of Sequencing Utility Programs","authors":"Erik Aronesty","doi":"10.2174/1875036201307010001","DOIUrl":"https://doi.org/10.2174/1875036201307010001","url":null,"abstract":"High throughput sequencing (HTS) has resulted in extreme growth rates of sequencing data. At our lab, we generate terabytes of data every day. It is usually seen as required for data output to be \"cleaned\" and processed in various ways prior to use for common tasks such as variant calling, expression quantification and assembly. Two common tasks associated with HTS are adapter trimming and paired-end joining. I have developed two tools at Expression Analysis, Inc. to address these common tasks. The names of these programs are fastq-mcf and fastq-join. I compared the performance of these tools to similar open-source utilities, both in terms of resource efficiency, and effectiveness.","PeriodicalId":38956,"journal":{"name":"Open Bioinformatics Journal","volume":"7 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2013-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68106525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Primer1: Primer Design Web Service for Tetra-Primer ARMS-PCR
Pub Date: 2012-11-30 | DOI: 10.2174/1875036201206010055
A. Collins, X. Ke
Tetra-primer ARMS-PCR is used extensively as a low-cost, single-reaction PCR assay requiring no post-PCR manipulation. The design of successful primers depends on a number of variables, such as melting temperatures, GC content, complementarity, and the selection of mismatch bases. Optimal primers can be selected automatically by a program that evaluates candidate primers for a given sequence. The Primer1 software was developed originally for use in restriction fragment length polymorphism analysis with gel electrophoresis, but recent applications, reviewed here, have been more diverse. We present an overview of the Primer1 primer-design software and web service, describe our updates to the program with more complete details of the implementation, and provide test data and output. The program is now available on a new, efficient LAMP web service at: http://primer1.soton.ac.uk/primer1.html
{"title":"Primer1: Primer Design Web Service for Tetra-Primer ARMS-PCR","authors":"A. Collins, X. Ke","doi":"10.2174/1875036201206010055","DOIUrl":"https://doi.org/10.2174/1875036201206010055","url":null,"abstract":"Tetra-primer ARMS-PCR is used extensively as a low cost, single PCR assay requiring no post-PCR manipulation. The design of successful primers depends on a number of variables such as melting temperatures, GC content, complementarity and selection of mismatch bases. The optimal selection of primers can be achieved in an automated way using a program which evaluates candidate primers for a given sequence. The Primer1 software was developed originally for use in the context of restriction fragment length polymorphism analysis using gel electrophoresis. However, recent applications have been more diverse, reviewed here, and we present an overview of the Primer1 software for primer design and web-service. We have updated the Primer1 program, and provide more complete details of the implementation. We also provide test data and output. The program is now available on a new, efficient, LAMP web service for users at: http://primer1.soton.ac.uk/primer1.html","PeriodicalId":38956,"journal":{"name":"Open Bioinformatics Journal","volume":"6 1","pages":"55-58"},"PeriodicalIF":0.0,"publicationDate":"2012-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68106515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}