Advances in Bioinformatics: Latest Articles

Identification of Novel Inhibitors for Tobacco Mosaic Virus Infection in Solanaceae Plants
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2015-10-18 DOI: 10.1155/2015/198214
Archana Prabahar, S. Swaminathan, Arul Loganathan, R. Jegadeesan
Tobacco mosaic virus (TMV) infects several economically important crops (e.g., tomato) and remains one of the major concerns for farmers. TMV enters the host cell and produces the capping enzyme RNA polymerase. The viral genome then replicates to produce multiple mRNAs, which encode several proteins, including the coat protein, an RNA-dependent RNA polymerase (RdRp), and the movement protein. The TMV replicase domain was chosen for virtual screening against small molecules from ligand databases such as PubChem and ChemBank. Catalytic sites of the RdRp domain were identified and subjected to docking analysis with ligands obtained from virtual screening with LigandFit. Small molecules that interact with the target at the GDD amino acids of the catalytic domain were chosen as the best inhibitors for controlling TMV replicase activity.
Citations: 5
Tyrosine Kinase Ligand-Receptor Pair Prediction by Using Support Vector Machine.
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2015-01-01 Epub Date: 2015-08-11 DOI: 10.1155/2015/528097
Masayuki Yarimizu, Cao Wei, Yusuke Komiyama, Kokoro Ueki, Shugo Nakamura, Kazuya Sumikoshi, Tohru Terada, Kentaro Shimizu

Receptor tyrosine kinases are essential proteins involved in cellular differentiation and proliferation in vivo and are heavily involved in allergic diseases, diabetes, and the onset and proliferation of cancerous cells. Identifying the interacting partner of such a protein, a growth factor ligand, will provide a deeper understanding of cellular proliferation, differentiation, and other cell processes. In this study, we developed a method for predicting tyrosine kinase ligand-receptor pairs from their amino acid sequences. We collected tyrosine kinase ligand-receptor pairs from the Database of Interacting Proteins (DIP) and UniProtKB, filtered them by removing sequence redundancy, and used them as a dataset for machine learning and assessment of predictive performance. Our prediction method is based on support vector machines (SVMs); we evaluated several input features suited to tyrosine kinases and compared and analyzed the results. Using sequence pattern information and domain information extracted from sequences as input features, we obtained an area under the receiver operating characteristic curve (AUC) of 0.996. This accuracy is higher than that obtained from general protein-protein interaction pair predictions.
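The area under the ROC curve reported above can be read as the probability that a randomly chosen true ligand-receptor pair receives a higher score than a randomly chosen non-interacting pair. The sketch below computes that quantity directly from labels and classifier scores. The function name and the toy data are illustrative assumptions; this is not the authors' evaluation code and does not involve their SVM.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC as the fraction of (positive, negative) pairs
    in which the positive example scores higher; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): 1 = interacting pair, 0 = non-interacting pair.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of examples, which is fine for small evaluation sets; a rank-based computation would be preferable for large ones.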

Citations: 4
Development of a machine learning method to predict membrane protein-ligand binding residues using basic sequence information.
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2015-01-01 Epub Date: 2015-01-31 DOI: 10.1155/2015/843030
M Xavier Suresh, M Michael Gromiha, Makiko Suwa

Locating ligand binding sites and finding the functionally important residues from protein sequences and structures has become one of the challenges in understanding protein function. Hence, a Naïve Bayes classifier has been trained to predict whether a given amino acid residue in a membrane protein sequence is a ligand binding residue, using only sequence-based information. The input to the classifier consists of the features of the target residue and two sequence neighbors on each side of it. The classifier is trained and evaluated on a nonredundant set of 42 sequences (chains with at least one transmembrane domain) from 31 alpha-helical membrane proteins. The classifier achieves an overall accuracy of 70.7%, with 72.5% specificity and 61.1% sensitivity, in identifying ligand binding residues from sequence. The classifier performs better when the sequence is encoded by PSI-BLAST-generated PSSM profiles. Assessment of the predictions in the context of three-dimensional protein structures reveals the effectiveness of this method in identifying ligand binding sites from sequence information. In 83.3% (35 out of 42) of the proteins, the classifier identifies the ligand binding sites by correctly recognizing more than half of the binding residues. This will be useful to protein engineers in exploiting potential residues for functional assessment.
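The classifier input described above, a target residue plus two neighbors on each side, corresponds to a sliding window of five residues. A minimal sketch of that windowing step follows. The padding character "X" for positions beyond the sequence ends, the function name, and the example sequence are assumptions made for illustration; the abstract does not say how the authors handle terminal residues.

```python
def residue_windows(sequence, flank=2, pad="X"):
    """Return one window per residue: the residue itself plus `flank`
    neighbors on each side, padded with `pad` at the sequence ends."""
    padded = pad * flank + sequence + pad * flank
    width = 2 * flank + 1
    return [padded[i:i + width] for i in range(len(sequence))]


# Hypothetical five-residue sequence; each window is centered on one residue.
windows = residue_windows("MKTAY")
```

Each window would then be turned into a feature vector (for example, per-position residue properties or PSSM rows) before being passed to a classifier.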

Citations: 18
A Review of Feature Selection and Feature Extraction Methods Applied on Microarray Data.
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2015-01-01 Epub Date: 2015-06-11 DOI: 10.1155/2015/198363
Zena M Hira, Duncan F Gillies

We summarise various ways of performing dimensionality reduction on high-dimensional microarray data. Many different feature selection and feature extraction methods exist and are widely used. All these methods aim to remove redundant and irrelevant features so that classification of new instances will be more accurate. A popular source of data is microarrays, a biological platform for gathering gene expressions. Analysing microarrays can be difficult due to the size of the data they provide. In addition, the complicated relations among the different genes make analysis more difficult, and removing excess features can improve the quality of the results. We present some of the most popular methods for selecting significant features and provide a comparison between them. Their advantages and disadvantages are outlined in order to provide a clearer idea of when to use each of them to save computational time and resources.
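As one concrete instance of feature selection, a simple filter can rank features (genes) by their variance across samples and keep the top k. The abstract does not single out this method; it is chosen here only because it is short and easy to follow. The function name and the toy expression values are illustrative assumptions.

```python
from statistics import pvariance


def top_variance_features(samples, k):
    """Return the column indices of the k features with the highest
    population variance across samples, in ascending index order."""
    n_features = len(samples[0])
    variances = [pvariance([row[j] for row in samples]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


# Toy expression matrix (made-up values): rows are samples, columns are genes.
toy_expression = [
    [1.0, 5.0, 0.0],
    [1.0, 9.0, 0.5],
    [1.0, 1.0, 1.0],
]
kept = top_variance_features(toy_expression, k=2)
```

A variance filter ignores class labels, so in practice it is usually only a first pass before a supervised selection or extraction step.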

Citations: 0
A Highly Conserved GEQYQQLR Epitope Has Been Identified in the Nucleoprotein of Ebola Virus by Using an In Silico Approach.
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2015-01-01 Epub Date: 2015-02-01 DOI: 10.1155/2015/278197
Mohammad Tuhin Ali, Md Ohedul Islam

Ebola virus (EBOV) is a deadly virus that has caused several fatal outbreaks. Recently it caused another outbreak that resulted in thousands of cases. An effective, approved vaccine or therapeutic treatment against this virus is still lacking. In this study, we aimed to predict B-cell epitopes from several EBOV-encoded proteins, which may aid in developing new antibody-based therapeutics or viral antigen detection methods against this virus. Multiple sequence alignment (MSA) was performed to identify conserved regions among the glycoprotein (GP), nucleoprotein (NP), and viral structural proteins (VP40, VP35, and VP24) of EBOV. Next, different consensus immunogenic and conserved sites were predicted from the conserved region(s) using various computational tools available in the Immune Epitope Database (IEDB). Among the GP, NP, VP40, VP35, and VP30 proteins, only NP gave a 100% conserved GEQYQQLR B-cell epitope that fulfills the ideal features of an effective B-cell epitope and could open a path toward Ebola treatment. However, successful in vivo and in vitro studies are a prerequisite for determining the actual potency of our predicted epitope and establishing it as a preventive medication against all the fatal strains of EBOV.
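The conserved-region step can be pictured as scanning the columns of a multiple sequence alignment for runs in which every sequence carries the same residue. The sketch below does only that; it does not perform the alignment or the IEDB epitope prediction. The function name, the minimum run length, and the three toy sequences (built around the GEQYQQLR motif named in the abstract, with made-up flanking residues) are assumptions for illustration and are not real EBOV sequences.

```python
def conserved_segments(aligned, min_len=8):
    """Return (start_column, segment) for every maximal run of columns that
    is identical and gap-free in all aligned sequences, at least min_len long."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    reference = aligned[0]
    length = len(reference)
    segments = []
    start = None
    # Iterate one position past the end so a run reaching the last column is closed.
    for i in range(length + 1):
        conserved = (
            i < length
            and reference[i] != "-"
            and all(seq[i] == reference[i] for seq in aligned)
        )
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                segments.append((start, reference[start:i]))
            start = None
    return segments


# Toy alignment with made-up flanking residues around the reported motif.
toy_alignment = ["AAGEQYQQLRTT", "CAGEQYQQLRTA", "ACGEQYQQLRST"]
found = conserved_segments(toy_alignment)
```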

Citations: 15
CISAPS: Complex Informational Spectrum for the Analysis of Protein Sequences.
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2015-01-01 Epub Date: 2015-01-06 DOI: 10.1155/2015/909765
Charalambos Chrysostomou, Huseyin Seker, Nizamettin Aydin

Complex informational spectrum analysis for protein sequences (CISAPS) and its web-based server are developed and presented. Recent studies show that using only the absolute spectrum in informational spectrum analysis of protein sequences is insufficient. Therefore, CISAPS is developed to consider and provide results in three forms: the absolute, real, and imaginary spectra. In the influenza A subtype case study presented here, biologically relevant features can also appear individually in either the real or the imaginary spectrum. The results show that protein classes can exhibit similarities or differences according to the features extracted by the CISAPS web server. These associations are likely related to the protein property that the specific amino acid index represents. In addition, various technical issues that may affect the analysis, such as zero-padding and windowing, are also addressed. CISAPS uses an expanded list of 611 unique amino acid indices, each representing a different property, to perform the analysis. This web-based server enables researchers with little knowledge of signal processing methods to apply complex informational spectrum analysis in their work.
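The three spectra mentioned above can be illustrated with a plain discrete Fourier transform: each residue is mapped to a number through an amino acid index, the resulting signal is optionally zero-padded, and the real part, imaginary part, and magnitude of each DFT coefficient are reported. This is a minimal sketch under stated assumptions, not the CISAPS implementation: the function name is invented, windowing is omitted, and the two-letter toy index uses made-up values rather than any of the 611 indices the abstract refers to.

```python
import cmath


def complex_spectrum(sequence, index, n_points=None):
    """Map residues to numbers via `index`, zero-pad to n_points, and return
    the real, imaginary, and absolute parts of the DFT coefficients."""
    signal = [float(index[aa]) for aa in sequence]
    n = len(signal) if n_points is None else n_points
    if n < len(signal):
        raise ValueError("n_points must be at least the sequence length")
    signal = signal + [0.0] * (n - len(signal))  # zero-padding
    coeffs = [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]
    return {
        "real": [c.real for c in coeffs],
        "imag": [c.imag for c in coeffs],
        "abs": [abs(c) for c in coeffs],
    }


# Hypothetical index values chosen only so the result is easy to check by hand.
toy_index = {"A": 1.0, "G": 0.0}
toy_spec = complex_spectrum("AGAG", toy_index)
```

The direct sum is quadratic in the signal length; an FFT routine would be the usual choice for long sequences or large padding.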

Citations: 14
Discovering Alzheimer Genetic Biomarkers Using Bayesian Networks.
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2015-01-01 Epub Date: 2015-08-23 DOI: 10.1155/2015/639367
Fayroz F Sherif, Nourhan Zayed, Mahmoud Fakhr

Single nucleotide polymorphisms (SNPs) contribute most of the genetic variation in the human genome. SNPs are associated with many complex and common diseases such as Alzheimer's disease (AD). Discovering SNP biomarkers at different loci can improve early diagnosis and treatment of these diseases. Bayesian networks (BNs) provide a comprehensible and modular framework for representing interactions between genes or single SNPs. Here, different Bayesian network structure learning algorithms have been applied to whole genome sequencing (WGS) data to detect causal AD SNPs and gene-SNP interactions. We focused on polymorphisms in the top ten genes associated with AD that were identified by genome-wide association (GWA) studies. New SNP biomarkers were observed to be significantly associated with Alzheimer's disease. These SNPs are rs7530069, rs113464261, rs114506298, rs73504429, rs7929589, rs76306710, and rs668134. The results demonstrate the effectiveness of using BNs for identifying AD causal SNPs with acceptable accuracy. The results indicate that the SNP set detected by Markov blanket based methods has a strong association with AD and achieves better performance than both naïve Bayes and tree augmented naïve Bayes. The minimal augmented Markov blanket reaches an accuracy of 66.13% and a sensitivity of 88.87%, versus 61.58% and 59.43%, respectively, for naïve Bayes.
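In a Bayesian network, the Markov blanket of a node consists of its parents, its children, and the other parents of its children; given those nodes, the target is conditionally independent of the rest of the network. The sketch below reads the blanket off a directed graph that is already known. It does not learn the structure from data, which is the hard part addressed by the algorithms in the abstract. The function name, node names, and edges are hypothetical.

```python
def markov_blanket(edges, node):
    """Return the Markov blanket of `node` in a DAG given as (parent, child)
    pairs: its parents, its children, and its children's other parents."""
    parents = {a for a, b in edges if b == node}
    children = {b for a, b in edges if a == node}
    co_parents = {a for a, b in edges if b in children and a != node}
    return parents | children | co_parents


# Hypothetical toy network; names are placeholders, not findings from the study.
toy_edges = [
    ("snp1", "AD"),
    ("AD", "biomarker"),
    ("snp2", "biomarker"),
    ("snp3", "snp1"),
]
blanket = markov_blanket(toy_edges, "AD")
```

Note that snp3 is excluded from the blanket of AD in this toy graph because it influences AD only through snp1.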

Citations: 30
Semantic annotation for biological information retrieval system.
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2015-01-01 Epub Date: 2015-02-09 DOI: 10.1155/2015/597170
Mohamed Marouf Z Oshaiba, Enas M F El Houby, Akram Salah

Online literature is growing at a tremendous rate, and the biological domain is one of the fastest growing. Biological researchers have difficulty finding what they are searching for effectively and efficiently. The aim of this research is to find documents that contain any combination of biological process, molecular function, and cellular component terms. This research proposes a framework that helps researchers retrieve meaningful documents related to their asserted terms based on the Gene Ontology (GO). The system utilizes GO by semantically decomposing it into three subontologies (cellular component, biological process, and molecular function). Researchers have the flexibility to choose search terms from any combination of the three subontologies. Document annotation is used in this research to create an index of the biological terms in documents and so speed up the search process. Query expansion is used to infer terms that are semantically related to the asserted terms; it increases the number of meaningful search results by using term synonyms and term relationships. The system uses a ranking method to order the retrieved documents based on ranking weights. The proposed system meets researchers' need to find documents that fit the asserted terms semantically.
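Query expansion and ranking of the kind described above can be sketched with two lookup tables, one for synonyms and one for related terms, followed by a count of how many expanded terms each document's annotation index contains. Everything in this sketch is an assumption made for illustration: the function names, the tiny hand-written term tables, the document identifiers, and the match-count ranking, which stands in for whatever weighting the authors actually use.

```python
def expand_query(terms, synonyms, related):
    """Return the asserted terms plus their synonyms and related terms."""
    expanded = set()
    for term in terms:
        expanded.add(term)
        expanded.update(synonyms.get(term, ()))
        expanded.update(related.get(term, ()))
    return expanded


def rank_documents(annotations, query_terms):
    """Order document ids by how many query terms their annotations contain,
    dropping documents with no match; ties are broken by document id."""
    scores = {doc: len(terms & query_terms) for doc, terms in annotations.items()}
    hits = [doc for doc, score in scores.items() if score > 0]
    return sorted(hits, key=lambda doc: (-scores[doc], doc))


# Hand-written toy tables; a real system would derive these from the ontology.
toy_synonyms = {"apoptotic process": {"apoptosis"}}
toy_related = {"apoptotic process": {"programmed cell death"}}
toy_annotations = {
    "doc1": {"apoptosis", "programmed cell death"},
    "doc2": {"apoptosis"},
    "doc3": {"membrane"},
}
query = expand_query(["apoptotic process"], toy_synonyms, toy_related)
ranking = rank_documents(toy_annotations, query)
```

Building the annotation index once, ahead of query time, is what lets the search step reduce to set intersections.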

Citations: 0
FN-Identify: Novel Restriction Enzymes-Based Method for Bacterial Identification in Absence of Genome Sequencing.
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2015-01-01 Epub Date: 2015-12-31 DOI: 10.1155/2015/303605
Mohamed Awad, Osama Ouda, Ali El-Refy, Fawzy A El-Feky, Kareem A Mosa, Mohamed Helmy

Sequencing and restriction analysis of genes such as 16S rRNA and HSP60 are widely used for molecular identification in microbial communities. Aided by rapid progress in bioinformatics, genome sequencing has become the method of choice for bacterial identification. However, genome sequencing technology is still out of reach in developing countries. In this paper, we propose FN-Identify, a sequencing-free method for bacterial identification. FN-Identify uses the gene sequence data available in GenBank and other databases, together with two algorithms that we developed, CreateScheme and GeneIdentify, to create a restriction enzyme-based identification scheme. FN-Identify was tested in silico on three diverse bacterial populations (members of the Lactobacillus, Pseudomonas, and Mycobacterium groups) using restriction enzymes and 16S rRNA gene sequences. Analysis of the restriction maps, using either fragment-number information alone or fragment numbers together with fragment sizes, successfully identified all members of the three groups with a minimum of four and a maximum of eight restriction enzymes. Our results demonstrate the utility and accuracy of the FN-Identify method and its two algorithms as an alternative that relies on standard microbiology laboratory techniques when genome sequencing is not available.
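
The abstract does not describe how CreateScheme and GeneIdentify work internally, so the following is only a minimal Python sketch of the general idea of identifying sequences by their fragment numbers, not the authors' algorithms. It assumes linear sequences, counts recognition sites without modeling cut positions or fragment sizes, and uses made-up toy sequences; the function names and the `SEQS` data are hypothetical.

```python
import re
from itertools import combinations

# Recognition sites of four well-known restriction enzymes.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "HaeIII": "GGCC",
}

# Toy sequences (hypothetical, for illustration only).
SEQS = {
    "strainA": "AAGAATTCAAGGCCAA",
    "strainB": "AAGAATTCAAAAAAAA",
    "strainC": "AAGGATCCAAGGCCAA",
}


def fragment_count(seq, site):
    """Fragments from digesting a linear sequence: number of sites + 1."""
    return len(re.findall(f"(?={site})", seq.upper())) + 1


def signature(seq, enzyme_names):
    """Tuple of fragment numbers, one entry per enzyme."""
    return tuple(fragment_count(seq, ENZYMES[name]) for name in enzyme_names)


def smallest_distinguishing_set(seqs, enzyme_names):
    """Brute-force search for the fewest enzymes giving unique signatures."""
    for k in range(1, len(enzyme_names) + 1):
        for combo in combinations(enzyme_names, k):
            signatures = {signature(s, combo) for s in seqs.values()}
            if len(signatures) == len(seqs):
                return combo
    return None  # no combination separates all sequences


if __name__ == "__main__":
    print(smallest_distinguishing_set(SEQS, list(ENZYMES)))
```

On these toy sequences no single enzyme separates all three strains, and the first pair that does is `("EcoRI", "HaeIII")`. The exhaustive search is only practical for small enzyme panels; a real scheme would presumably also use fragment sizes, as the abstract mentions.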

Citations: 0
PhosphoHunter: An Efficient Software Tool for Phosphopeptide Identification.
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2015-01-01 Epub Date: 2015-01-12 DOI: 10.1155/2015/382869
Alessandra Tiengo, Lorenzo Pasotti, Nicola Barbarini, Paolo Magni

Phosphorylation is a protein posttranslational modification. Acting as a "molecular switch," it is responsible for the activation/inactivation of disease-related pathways. The study of phosphorylated proteins is therefore a key point for proteomic analyses focused on identifying diagnostic/therapeutic targets. Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is the most widely used analytical approach. Although unmodified peptides are automatically identified by consolidated algorithms, phosphopeptides still require automated tools to avoid time-consuming manual interpretation. To improve phosphopeptide identification efficiency, a novel procedure was developed and implemented in a Perl/C tool called PhosphoHunter, which is proposed and evaluated here. It includes a preliminary heuristic step that filters out the MS/MS spectra produced by nonphosphorylated peptides before sequence identification. A method to assess the statistical significance of identified phosphopeptides was also formulated. PhosphoHunter was tested on a dataset of 1500 MS/MS spectra and compared with two other tools, Mascot and Inspect. The comparisons showed that sensitivity is a strong point of PhosphoHunter, suggesting that it identifies real phosphopeptides with superior performance. Performance indexes depend on a single parameter (intensity threshold) that users can tune according to the study aim. All three tools localized >90% of phosphosites.
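
The abstract does not say which heuristic PhosphoHunter uses to discard spectra of nonphosphorylated peptides. As a hedged illustration only, the Python sketch below shows one commonly used cue, the neutral loss of phosphoric acid from the precursor ion, combined with a tunable relative intensity threshold. This is an assumed stand-in rather than the tool's actual filter (which is written in Perl/C); the function name, parameters, and default values are hypothetical.

```python
H3PO4_MASS = 97.9769  # monoisotopic mass of phosphoric acid, in Da


def has_neutral_loss_peak(peaks, precursor_mz, charge,
                          rel_threshold=0.3, tol=0.5):
    """Return True if the spectrum shows a strong H3PO4 neutral-loss peak.

    peaks         -- list of (m/z, intensity) pairs
    precursor_mz  -- m/z of the precursor ion
    charge        -- precursor charge state (positive integer)
    rel_threshold -- minimum intensity as a fraction of the base peak
    tol           -- m/z tolerance for matching the expected peak
    """
    if not peaks:
        return False
    base_intensity = max(intensity for _, intensity in peaks)
    expected_mz = precursor_mz - H3PO4_MASS / charge
    return any(
        abs(mz - expected_mz) <= tol and intensity >= rel_threshold * base_intensity
        for mz, intensity in peaks
    )


if __name__ == "__main__":
    # Toy spectrum: doubly charged precursor at m/z 700.0, so the
    # neutral-loss peak is expected near m/z 651.01.
    spectrum = [(300.1, 50.0), (651.0, 900.0), (800.4, 1000.0)]
    print(has_neutral_loss_peak(spectrum, precursor_mz=700.0, charge=2))
```

Raising `rel_threshold` makes the filter stricter and lowering it makes it more permissive, which mirrors the kind of single-parameter trade-off the abstract describes for the intensity threshold.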

Citations: 2