
Journal of Data Mining in Genomics & Proteomics: Latest Articles

Review Paper: Data Mining of Fungal Secondary Metabolites Using Genomics and Proteomics
Pub Date : 2015-09-04 DOI: 10.4172/2153-0602.1000178
Ruchi Sethi Gutch, Kaushal Sharma, Aditi Tiwari
Fungi are versatile organisms; they exist on earth under all extremes of conditions. Fungi are sources of important chemical entities, which may be either beneficial or deleterious. Biotechnology has helped to harness this potential of fungi in a positive direction. Advances in genomics and proteomics have opened up new horizons in research. Improved molecular biology technologies have boosted our understanding of genes and helped us to exploit the full potential of fungi. Bioinformatics and statistics are indispensable in this regard. Databases are available that provide fast, efficient, and meaningful interpretation and analysis of the vast amounts of data generated in scientific laboratories.
Citations: 5
Syntrophics Bridging the Gap of Methanogenesis in the Jharia Coal Bed Basin
Pub Date : 2015-08-21 DOI: 10.4172/2153-0602.1000177
P. Jha, Sujit Ghosh, K. Mukhopadhyay, A. Sachan, A. S. Vidyarthi
Bituminous and sub-bituminous coals are produced from the Jharia basin of Jharkhand, which is the largest producer of coal bed methane (CBM) in India. Although there have been many reports on methanogenesis from Jharia, the present study places special emphasis on the syntrophic microbes that can act as catalysts for hydrogenotrophic methanogenesis. Using a metagenomic approach followed by 454 pyrosequencing, the presence of a syntrophic community has been deciphered for the first time from formation water samples of the Jharia coal bed basin. Taxonomic assignment of unassembled clean metagenomic sequences was performed using BLASTX against the GenBank database through the MG-RAST server. The class Clostridia revealed a sequence affiliation to the family Syntrophomonadaceae, and the class Deltaproteobacteria to the families Desulfobacteraceae, Pelobacteraceae, Syntrophaceae, and Syntrophobacteraceae. The results revealed the possibility of thermobiogenic methanogenesis in the coal bed, owing to the presence of syntrophs related to the genus Syntrophothermus. The presence of such communities can aid the biotransformation of coal to methane, leading to enhanced energy production.
Citations: 1
mBLAST: Keeping up with the sequencing explosion for (meta)genome analysis.
Pub Date : 2015-08-01 Epub Date: 2013-07-31 DOI: 10.4172/2153-0602.1000135
Curtis Davis, Karthik Kota, Venkat Baldhandapani, Wei Gong, Sahar Abubucker, Eric Becker, John Martin, Kristine M Wylie, Radhika Khetani, Matthew E Hudson, George M Weinstock, Makedonka Mitreva

Recent advances in next-generation sequencing technologies require alignment algorithms and software that can keep pace with the heightened data production. Standard algorithms, especially protein similarity searches, represent significant bottlenecks in analysis pipelines. For metagenomic approaches in particular, it is now often necessary to search hundreds of millions of sequence reads against large databases. Here we describe mBLAST, an accelerated search algorithm for translated and/or protein alignments to large datasets based on the Basic Local Alignment Search Tool (BLAST) and retaining the high sensitivity of BLAST. The mBLAST algorithms achieve substantial speed up over the National Center for Biotechnology Information (NCBI) programs BLASTX, TBLASTX and BLASTP for large datasets, allowing analysis within reasonable timeframes on standard computer architectures. In this article, the impact of mBLAST is demonstrated with sequences originating from the microbiota of healthy humans from the Human Microbiome Project. mBLAST is designed as a plug-in replacement for BLAST for any study that involves short-read sequences and includes high-throughput analysis. The mBLAST software is freely available to academic users at www.multicorewareinc.com.

Citations: 28
Prevalence and Molecular Epidemiology of Human Papillomavirus in Ecuadorian Women with Cervical Cytological Abnormalities
Pub Date : 2015-07-23 DOI: 10.4172/2153-0602.1000174
Guido Silva Francisco Altamirano, Walter Montenegro, Ricardo Silva
The relationship between human papillomavirus (HPV) and cervical cancer remains a topic of extensive research. This virus is responsible for mild and severe abnormalities that can slowly trigger some type of carcinoma strongly associated with sexual practice. The availability of new techniques for HPV typing makes it possible to better establish the most common virus types associated with this neoplasia. The article presents prevalence and molecular epidemiology (PCR results) from 1000 female patients affiliated with the Ecuadorian Institute of Social Security (IESS) who attended Teodoro Maldonado Carbo Hospital in the city of Guayaquil, Ecuador, from July 2011 to August 2013. The results show that the most prevalent HPV types are HPV-16 (29.77%), HPV-52 (16.18%), HPV-51 (12.30%), HPV-6 (9.71%), and HPV-59 (8.74%). This molecular epidemiology is quite distinct from that found in other parts of the world. Ecuador is importing papillomavirus vaccines, and the general view among health authorities is that these vaccines offer protection against 75% of papillomavirus infections. The results presented in this study suggest that this protection is less than 30% for women in the province of Guayas.
Citations: 2
Protein Functional Site Prediction Using a Conservative Grade and a Proximate Grade
Pub Date : 2015-07-16 DOI: 10.4172/2153-0602.1000175
Yosuke Kondo, S. Miyazaki
Many computational methods have been developed to predict the important sites of a protein. In the era of big data, existing methods need to be improved and refined by integrating sequence data with structural data. In this paper, we aim at two things: improving sequence-based methods and developing a new method that uses both sequence and structural data. We therefore developed a modified evolutionary trace method in which we defined conservative grades, calculated from a given multiple sequence alignment, and a proximate grade, which uses three-dimensional structures to evaluate predicted active sites from the viewpoint of protein-ion, protein-ligand, protein-nucleic acid, and protein-protein interactions. In other words, the proximate grade can also evaluate an amino acid residue. When we applied our method to translation elongation factor Tu/1A proteins, it showed that the conservative grades are evaluated accurately by the proximate grade. Our approach thus has two advantages. One is that various cocrystal structures can be taken into account for evaluation. The other is that, by calculating the fitness between a given conservative grade and the proximate grade, we can select the best conservative grade.
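To make the idea of a per-site grade computed from a multiple sequence alignment more concrete, here is a minimal Python sketch. It is not the authors' conservative grade, whose definition the abstract does not give; as an assumption, it scores each alignment column by the fraction of sequences that share the most common residue. The function name `column_conservation` and the toy alignment are hypothetical.

```python
from collections import Counter


def column_conservation(alignment):
    """Score each alignment column by the share of sequences carrying its
    most common residue. Gaps ("-") are never counted as a residue, but
    they still lower the score because we divide by the number of sequences.
    Assumption: this simple score stands in for a real conservation grade."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)  # column made only of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(alignment))
    return scores


# Hypothetical four-sequence alignment, four columns wide.
toy_alignment = ["GHKT", "GHRT", "GDK-", "GHKS"]
scores = column_conservation(toy_alignment)
print(scores)
```

A structure-based grade such as the proximate grade would then be compared against these per-column values; that comparison is not sketched here because the abstract does not describe how it is computed.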
Citations: 3
Is it Time for Cognitive Bioinformatics?
Pub Date : 2015-07-14 DOI: 10.4172/2153-0602.1000173
A. Lisitsa, Elizabeth Stewart, E. Kolker
The concept of cognitive bioinformatics has been proposed for structuring knowledge in the field of molecular biology. While cognitive science is regarded as “thinking about the process of thinking”, cognitive bioinformatics strives to capture the process of thought and analysis as applied to the challenging intersection of diverse fields, such as biology, informatics, and computer science, collectively known as bioinformatics. Ten years ago, cognitive bioinformatics was introduced as a model of the analysis performed by scientists working with molecular biology and biomedical web resources. At present, the concept can be examined in the context of the opportunities represented by the “data deluge” of life sciences technologies. The unbalanced nature of accumulating information, along with other challenges, poses currently intractable problems for researchers. Solutions to these problems at the micro- and macro-levels are considered with regard to the role of cognitive approaches in the field of bioinformatics.
Citations: 0
PGR: A Novel Graph Repository of Protein 3D-Structures
Pub Date : 2015-07-09 DOI: 10.4172/2153-0602.1000172
Wajdi Dhifli, Abdoulaye Baniré Diallo
Graph theory and graph mining constitute rich fields of computational techniques for studying the structures, topologies, and properties of graphs. These techniques are a valuable asset in bioinformatics, provided that efficient methods exist for transforming biological data into graphs. In this paper, we present the Protein Graph Repository (PGR), a novel database of protein 3D-structures transformed into graphs, allowing the large repertoire of graph theory techniques to be used in protein mining. This repository contains graph representations of all currently known protein 3D-structures described in the Protein Data Bank (PDB). PGR also provides an efficient online converter of protein 3D-structures into graphs, biological and graph-based descriptions, pre-computed protein graph attributes and statistics, a visualization of each protein graph, and a graph-based protein similarity search tool. Such a repository enriches existing online databases and will help bridge the gap between graph mining and protein structure analysis. PGR data and features are unique and not included in any other protein database. The repository is available at http://wjdi.bioinfo.uqam.ca/
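As an illustration of what transforming a protein 3D-structure into a graph can mean, the following Python sketch builds a residue contact graph from C-alpha coordinates. The abstract does not say how PGR constructs its graphs, so everything here is an assumption: one node per residue, an edge when two C-alpha atoms lie within a hypothetical 7 Å cutoff, and the function name `contact_graph` and the toy coordinates are invented for the example.

```python
import math
from itertools import combinations


def contact_graph(ca_coords, cutoff=7.0):
    """Build an undirected residue graph as an adjacency dict.
    Nodes are residue indices; an edge joins two residues whose C-alpha
    atoms are at most `cutoff` angstroms apart (assumed threshold)."""
    graph = {i: set() for i in range(len(ca_coords))}
    for i, j in combinations(range(len(ca_coords)), 2):
        if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Hypothetical C-alpha coordinates (in angstroms) for four residues.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
g = contact_graph(coords)

# A simple graph attribute: the degree of each residue node.
degrees = {node: len(neighbours) for node, neighbours in g.items()}
print(degrees)
```

Once a structure is in this form, generic graph measures such as node degree can be computed on it without any protein-specific code, which is the kind of reuse the abstract points to.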
Citations: 4
Promoter Prediction in Bacterial DNA Sequences Using Expectation Maximization and Support Vector Machine Learning Approach
Pub Date : 2015-07-08 DOI: 10.4172/2153-0602.1000171
Ahmad Maleki, Vahid Vaezinia, A. Fekri
A promoter is a part of the DNA sequence that precedes a gene and plays a key role in regulating it. Promoter prediction helps determine gene position and analyze gene expression; hence, it is of great importance in bioinformatics. In bioinformatics research, a number of machine learning approaches are applied to discover new, meaningful knowledge from biological databases. In this study, two learning approaches, expectation maximization clustering and a support vector machine classifier, are combined (EMSVM) to perform promoter detection. In the first stage, the expectation maximization (EM) algorithm is used to identify groups of samples that behave similarly and dissimilarly, such as the activity of promoters and non-promoters, while in the second stage the support vector machine (SVM) is used to classify all the data into the correct class. We applied this method to datasets corresponding to σ24, σ32, σ38, and σ70 promoters, and its effectiveness was demonstrated on a range of different promoter regions. Furthermore, it was compared with other classification algorithms to assess the performance of the proposed algorithm. Test results show that EMSVM performs better than the other methods.
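The first, clustering stage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' EMSVM: it fits a two-component one-dimensional Gaussian mixture by EM to a hypothetical numeric score per sequence, and it omits the second-stage SVM entirely. The function name `em_two_gaussians`, the initialisation, the variance floor, and the toy scores are all invented for the example.

```python
import math
import statistics


def em_two_gaussians(values, iterations=50, var_floor=1e-3):
    """Fit a two-component 1-D Gaussian mixture by EM; return (means, labels).
    Assumption: each sample is summarised by a single numeric score."""
    # Start the components at the extremes, sharing the overall variance.
    means = [min(values), max(values)]
    spread = max(var_floor, statistics.pvariance(values))
    variances = [spread, spread]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            variances[k] = max(
                var_floor,
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk,
            )
    # Hard assignment: pick the component with the larger responsibility.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, labels


# Hypothetical per-sequence scores forming two well-separated groups.
scores = [0.10, 0.20, 0.15, 0.90, 0.80, 0.85]
means, labels = em_two_gaussians(scores)
print(labels)
```

In a two-stage design like the one the abstract describes, cluster assignments of this kind would be passed on to a supervised classifier; how EMSVM actually couples the two stages is not stated in the abstract.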
Citations: 5
Trichoderma species Cellulases Produced by Solid State Fermentation
Pub Date : 2015-07-02 DOI: 10.4172/2153-0602.1000170
P. Sonika, Ey, M. Srivastava, M. Shahid, Vipul Kumar, A. Singh, Shubham Trivedi, Y. K. Srivastava
The main aim of this study was to analyze eight species of Trichoderma for cellulase enzyme production by solid state fermentation. Different carbon sources, such as wheat bran, corn cob, sucrose, maltose, and filter paper, were used. The highest cellulase production was achieved with T. harzianum on media supplemented with corn cob. The optimum pH, temperature, and thermal stability of the isolated enzymes were also analyzed. The best pH for enzyme production was found to be between 4 and 6, and the optimum temperature for cellulase production ranged between 30 and 40°C. Choosing the optimum pH, temperature, and carbon source is essential for enzyme production. Compared to other fungal genera, Trichoderma spp. were found to have greater potential to synthesize cellulase.
Citations: 29
Nutrigenomics: Just another "Omic"
Pub Date : 2015-03-30 DOI: 10.4172/2153-0602.10000S1
S. MirajkarSJKalePBBangarS, Dipika A Padole
Flax (Linum usitatissimum L., 2n = 30) belongs to the family Linaceae and is a dual-purpose crop, used both as an oilseed (linseed) and for stem fiber (linen). It is emerging as one of the key sources of phytochemicals in the functional food arena. It is clinically proven that consumption of flax seed reduces the risk of heart attack, inflammatory disorders and atherosclerosis, and inhibits the growth of prostate and breast cancers. Flax seed is the richest agricultural source of the essential omega-3 fatty acid α-linolenic acid (ALA) and of lignans, along with high quality proteins, soluble fibers and phenolic compounds. Oil and lignans are important nutraceuticals that accumulate in the endosperm and seed coat, respectively, during seed development. In our study, a high-throughput proteomics approach was employed to determine the expression profile and identity of hundreds of proteins during seed filling in flax. The proteins were analyzed at 4, 8, 12, 16, 22, 30 and 48 days after flowering using one-dimensional SDS-PAGE as the prefractionation method, followed by nLC-ESI-MS/MS. Relative protein concentration was determined by spiking samples with 50 fmol of standard BSA tryptic digest. Spectral counting of standard BSA peptides was used for relative quantification of unknown proteins. A database was developed from predicted gene models of the flax whole genome sequence; raw data were searched against it to identify the proteins, and the identifications were confirmed by BLAST analysis. We identified 965 non-redundant proteins, which were classified into 14 major functional categories. The proteins involved in metabolism, protein destination and storage, metabolite transport and disease/defense were the most abundant. For each functional category, a composite expression profile has been presented to gain insight into seed physiology and the general regulation of proteins associated with each functional class. Using this approach, the metabolism-related proteins were found to decrease, while the proteins associated with destination and storage increased during seed filling.
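The spike-in quantification described in this abstract can be illustrated with a minimal sketch. It assumes the simplest proportional model, in which a protein's amount is estimated as its spectral count divided by the BSA spectral count, scaled by the 50 fmol spike. The abstract does not state the exact formula the authors used, and the function name and example counts below are hypothetical.

```python
BSA_SPIKE_FMOL = 50.0  # BSA tryptic digest spiked into each sample (value from the abstract)


def estimate_fmol(protein_counts, bsa_count, spike_fmol=BSA_SPIKE_FMOL):
    """Estimate protein amounts relative to a spiked-in standard.

    Assumed model (not confirmed by the source):
        amount = spectral_count / bsa_count * spike_fmol
    """
    if bsa_count <= 0:
        raise ValueError("BSA spectral count must be positive")
    return {
        name: count / bsa_count * spike_fmol
        for name, count in protein_counts.items()
    }


# Hypothetical spectral counts for one time point; not data from the study.
example_counts = {"protein_A": 40, "protein_B": 10}
estimates = estimate_fmol(example_counts, bsa_count=20)
print(estimates)
```

Under this assumption, a protein with twice the spectral count of the BSA standard would be estimated at twice the spiked amount. Real spectral-counting workflows often also normalize for protein length, which this sketch leaves out.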
Citations: 1