Journal of Data Mining in Genomics & Proteomics: Latest Articles

Genome Mining and Comparative Genomic Analysis of Five Coagulase-Negative Staphylococci (CNS) Isolated from Human Colon and Gall Bladder

Journal of Data Mining in Genomics & Proteomics

Pub Date : 2016-02-28 DOI: 10.4172/2153-0602.1000192

R. G. Nair, G. Kaur, Indu Khatri, N. Singh, S. K. Maurya, Srikrishna Subramanian, A. Behera, D. Dahiya, J. Agrewala, S. Mayilraj

Coagulase-negative Staphylococci (CNS) are known to cause distinct types of infections in humans, such as endocarditis and urinary tract infections (UTI). Surprisingly, genome analysis data on CNS, particularly of human origin, are lacking in the literature. In light of this, we performed genome mining and comparative genomic analysis of five CNS strains: Staphylococcus cohnii subsp. cohnii strain GM22B2, Staphylococcus equorum subsp. equorum strain G8HB1 and Staphylococcus pasteuri strain BAB3, isolated from the gall bladder, and Staphylococcus haemolyticus strain 1HT3 and Staphylococcus warneri strain 1DB1, isolated from the colon. We found that 29% of the virulence determinants were shared among the CNS strains; these involved resistance to antibiotics and toxic compounds, bacteriocins and ribosomally synthesized peptides, adhesion, invasion, intracellular resistance, prophage regions and pathogenicity islands. Ten unique virulence factors, involved in adhesion, negative transcriptional regulation, resistance to copper and cadmium, and phage maturation, were also present in our strains. Apart from comparing genome homology, size and G + C content, we also showed the presence of 10 different CRISPR-cas genes in the CNS strains. Further, KAAS-based annotation revealed CNS genes in different pathways involved in human diseases. In conclusion, this study is a first attempt to unveil the pathogenomics of CNS isolated from two distinct body organs, and it highlights the importance of CNS as emerging pathogens in the health care sector.
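
Of the comparisons listed, genome size and G + C content are the most mechanical to compute. The Python sketch below computes both and assumes each draft genome is supplied as multi-contig FASTA text. The function names and the toy sequence are hypothetical; this is not the authors' pipeline.

```python
from __future__ import annotations


def parse_fasta(text: str) -> dict[str, str]:
    """Map each record ID (first word of the '>' header) to its concatenated sequence."""
    records: dict[str, str] = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases; N and other IUPAC codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def genome_stats(fasta_text: str) -> dict[str, float]:
    """Contig count, total assembly length (all symbols) and G+C percent."""
    contigs = parse_fasta(fasta_text)
    genome = "".join(contigs.values())
    return {"contigs": len(contigs), "size_bp": len(genome), "gc_percent": gc_content(genome)}


toy = ">contig1 draft\nATGC\nGC\n>contig2\nAATT\n"
print(genome_stats(toy))  # {'contigs': 2, 'size_bp': 10, 'gc_percent': 40.0}
```

Ambiguous bases such as N count toward genome size but are excluded from the G + C denominator. This is one common convention; other tools may handle them differently.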

Citations: 7

Learning Machine Implementation for Big Data Analytics, Challenges and Solutions

Journal of Data Mining in Genomics & Proteomics

Pub Date : 2016-02-22 DOI: 10.4172/2153-0602.1000190

A. N. Al-Masri, Manal M. Nasir

Big Data analytics is one of the great challenges for Learning Machine (LM) algorithms, because most real-life applications involve massive amounts of information or a big data knowledge base. At the same time, an Artificial Intelligence (AI) system with a data knowledge base is expected to compute results accurately and quickly. This study focused on the challenges of, and solutions for, using LM with Big Data. Data processing is a mandatory step in any LM module to transform unstructured Big Data into a meaningful, optimized data set. However, the optimized data set must also be deployed in a way that supports distributed processing and real-time applications. This work also reviewed the technologies currently used in Big Data analysis and LM computation, and emphasized that using different solutions for particular applications could increase LM performance. Recent developments, especially in cloud computing and data transaction speed, offer significant advantages for the practical use of AI applications.

Citations: 1

Physicochemical Properties of Proteomic Ionic Liquids Matrices for MALDI-MS

Journal of Data Mining in Genomics & Proteomics

Pub Date : 2016-02-17 DOI: 10.4172/2153-0602.1000189

H. Abdelhamid

Ionic liquid matrices (ILMs) have contributed greatly to, and substantially improved, protein analysis by matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). The physicochemical properties of these materials are important for understanding ILM performance and for designing effective ILMs. The present study examined the relationships between the chemical structure and the physicochemical properties of ILMs. Properties were calculated for ionic liquids formed from different organic bases with two common organic matrices, 2,5-dihydroxybenzoic acid (DHB) and 3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid, SA). The two series showed the same profile for molar refractivity, molar volume, parachor, index of refraction, polarizability and surface tension. However, ionic liquids based on sinapinic acid showed higher values than those based on DHB for all parameters except index of refraction and surface tension. These parameters may explain the higher performance of SA-ILs compared with DHB-ILs for protein analysis. The present results should be useful for designing new ILMs.
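
The abstract does not say which software produced these values. For orientation, several of the listed parameters are linked by standard relations:

- molar volume: V_m = M/rho
- Lorentz-Lorenz molar refraction: R_m = V_m (n^2 - 1)/(n^2 + 2)
- mean polarizability volume: alpha = 3 R_m/(4 pi N_A)
- Sugden's parachor: P = V_m gamma^(1/4)

The Python sketch below evaluates these relations for water, purely as a sanity check. The inputs are illustrative and are not data from the study.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1 (exact SI value)


def molar_volume(molar_mass: float, density: float) -> float:
    """V_m = M / rho, in cm^3/mol for M in g/mol and rho in g/cm^3."""
    return molar_mass / density


def molar_refractivity(n: float, molar_mass: float, density: float) -> float:
    """Lorentz-Lorenz molar refraction R_m = (n^2 - 1)/(n^2 + 2) * V_m, in cm^3/mol."""
    return (n ** 2 - 1) / (n ** 2 + 2) * molar_volume(molar_mass, density)


def polarizability_volume(r_m: float) -> float:
    """Mean polarizability volume per molecule, alpha = 3 R_m / (4 pi N_A), in cm^3."""
    return 3 * r_m / (4 * math.pi * AVOGADRO)


def parachor(molar_mass: float, density: float, surface_tension: float) -> float:
    """Sugden parachor P = V_m * gamma^(1/4), with gamma in dyn/cm (= mN/m)."""
    return molar_volume(molar_mass, density) * surface_tension ** 0.25


# Illustrative inputs for liquid water near 25 C (not data from the study).
M, RHO, N_D, GAMMA = 18.015, 0.9970, 1.333, 72.0
r_m = molar_refractivity(N_D, M, RHO)
print(f"V_m = {molar_volume(M, RHO):.2f} cm3/mol")              # ~18.07
print(f"R_m = {r_m:.2f} cm3/mol")                                # ~3.72
print(f"alpha = {polarizability_volume(r_m) / 1e-24:.2f} A^3")  # ~1.47
print(f"P = {parachor(M, RHO, GAMMA):.1f}")                      # ~52.6
```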

Citations: 11

Conceptual Aspects of Causal Networks in an Applied Context

Journal of Data Mining in Genomics & Proteomics

Pub Date : 2016-02-17 DOI: 10.4172/2153-0602.1000188

A. Yazdani, A. Yazdani, E. Boerwinkle

Causal inference is conceptually straightforward in the setting of a randomized intervention, such as a clinical trial. However, in observational studies, which make up the majority of large-scale epidemiologic studies, causal inference is complicated by confounding and by the lack of clear directionality underlying an observed association. In most large-scale biomedical applications, causal inference is embodied in directed acyclic graphs (DAGs), which depict causal relationships (arrows) among variables (nodes). A key concept for causal inference in observational studies is the assignment mechanism (AM), by which some individuals are treated and others are not. This perspective provides a framework for thinking about causal networks in the context of the assignment mechanism. The estimation of effect sizes for the observed directed relationships is presented and discussed.
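
A toy simulation makes the contrast between randomized and observational assignment concrete. The sketch below assumes a hypothetical three-node DAG (Z -> T, Z -> Y, T -> Y) in which the true effect of T on Y is 2. It illustrates confounding and back-door adjustment by stratification in general terms; it is not the estimation approach of the paper.

```python
import random


def simulate(n: int, randomized: bool, effect: float = 2.0, seed: int = 0):
    """Draw (z, t, y) triples from the toy DAG Z -> T, Z -> Y, T -> Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0  # binary confounder
        # Assignment mechanism: a coin flip under randomization, Z-dependent otherwise.
        p_treat = 0.5 if randomized else (0.8 if z else 0.2)
        t = 1 if rng.random() < p_treat else 0
        y = effect * t + 3.0 * z + rng.gauss(0.0, 1.0)  # Z also raises the outcome
        rows.append((z, t, y))
    return rows


def _mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """E[Y | T=1] - E[Y | T=0], ignoring Z."""
    return (_mean([y for _, t, y in rows if t == 1])
            - _mean([y for _, t, y in rows if t == 0]))


def backdoor_adjusted(rows):
    """Stratify on Z and weight the stratum-specific differences by P(Z = z)."""
    total = 0.0
    for z in (0, 1):
        stratum = [(t, y) for zz, t, y in rows if zz == z]
        diff = (_mean([y for t, y in stratum if t == 1])
                - _mean([y for t, y in stratum if t == 0]))
        total += len(stratum) / len(rows) * diff
    return total


trial = simulate(20000, randomized=True)
observational = simulate(20000, randomized=False)
print(round(naive_difference(trial), 2))           # close to the true effect, 2
print(round(naive_difference(observational), 2))   # biased upward, near 2 + 3 * 0.6 = 3.8
print(round(backdoor_adjusted(observational), 2))  # close to 2 again
```

Stratifying on Z is enough here only because Z is the sole confounder and is observed. In a real DAG, the adjustment set has to be read off the graph.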

Citations: 7

Imex Based Analysis of Repeat Sequences in Flavivirus Genomes, Including Dengue Virus

Journal of Data Mining in Genomics & Proteomics

Pub Date : 2016-02-06 DOI: 10.4172/2153-0602.1000187

Chaudhary Mashhood Alam, A. Iqbal, Babita Thadari, Safdar Ali

Simple sequence repeats (SSRs), also known as microsatellites, are repeat motifs of 1-6 nucleotides, present in varying numbers of iterations across the coding and non-coding regions of prokaryotes, eukaryotes and viruses. The present study focuses on SSRs in 27 Flavivirus genomes, including dengue virus. Comparative viral genomics based on SSRs could help us understand viral diversity and adaptability to new hosts. A total of 1164 SSRs and 53 compound SSRs (cSSRs) were uncovered from the 27 studied genomes. Mononucleotide A was the most prevalent repeat motif, with an average distribution of around 6, followed by G (average distribution of 2). Among the dinucleotides, the AG/GA repeat motif was the most prevalent, with an average distribution of 14 across the studied genomes. The Flavivirus genomes lacked two essential features responsible for genome evolution: the dinucleotide repeat motif AT/TA (least represented, with an average distribution of ~0.5) and cSSRs in non-coding regions. This suggests either a stable genome or evolution by hitherto unexplained mechanisms. The conserved sequences uncovered in the dengue virus isolates suggest a basis for developing biomarkers for viral diagnostics.
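
The study used IMEx, whose exact detection rules are not described here. As a simplified illustration of what an SSR scan does, the Python sketch below finds maximal perfect repeats of 1-6 nt motifs and then groups neighboring SSRs into compound SSRs. The per-length copy thresholds and the dmax gap are hypothetical choices. Imperfect repeats are not handled.

```python
# Hypothetical minimum copy numbers per motif length (1-6 nt); not IMEx's defaults.
MIN_COPIES = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


def primitive_period(motif: str) -> int:
    """Length of the shortest unit whose repetition spells the motif ("ATAT" -> 2)."""
    k = len(motif)
    for p in range(1, k + 1):
        if k % p == 0 and motif[:p] * (k // p) == motif:
            return p
    return k


def find_ssrs(seq: str, min_copies=MIN_COPIES):
    """Return (start, end, motif, copies) for maximal perfect SSRs (0-based, end exclusive)."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for k, needed in sorted(min_copies.items()):
        i = 0
        while i + k <= n:
            # Extend rightwards while the sequence stays periodic with period k.
            j = i + k
            while j < n and seq[j] == seq[j - k]:
                j += 1
            copies = (j - i) // k
            motif = seq[i:i + k]
            # Skip motifs that are really shorter repeats, e.g. "AA" or "AGAG".
            if copies >= needed and set(motif) <= set("ACGT") and primitive_period(motif) == k:
                hits.append((i, i + copies * k, motif, copies))
            # Any start in (i, j - k] would only rediscover a sub-run of this one.
            i = max(i + 1, j - k + 1)
    return sorted(hits)


def compound_ssrs(hits, dmax=10):
    """Group SSRs separated by at most dmax bases (a hypothetical dMAX) into cSSRs."""
    groups, current, current_end = [], [], 0
    for hit in sorted(hits):
        if current and hit[0] - current_end <= dmax:
            current.append(hit)
            current_end = max(current_end, hit[1])
        else:
            if len(current) > 1:
                groups.append(current)
            current, current_end = [hit], hit[1]
    if len(current) > 1:
        groups.append(current)
    return groups


demo = "AAAAAAATTCAGAGAGAG"
ssrs = find_ssrs(demo)
print(ssrs)                 # [(0, 7, 'A', 7), (10, 18, 'AG', 4)]
print(compound_ssrs(ssrs))  # one cSSR made of both repeats (gap of 3 bases)
```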

Citations: 6

Family history, genetics services, and cancer risk perception at Brazilian unified health system

Journal of Data Mining in Genomics & Proteomics

Pub Date : 2016-02-04 DOI: 10.4172/2153-0602.S1.004

Milena FloriaSantos

Citations: 0

Pharmacogenomics of AML: A road towards personalized medicine

Journal of Data Mining in Genomics & Proteomics

Pub Date : 2016-02-04 DOI: 10.4172/2153-0602.S1.002

J. Lamba

Acute myeloid leukemia (AML) is the second most common form of childhood leukemia and has the worst prognosis of all major childhood cancers. Improving the treatment outcome for patients with AML remains a major clinical challenge. The nucleoside analog cytarabine (ara-C) has been the mainstay of AML chemotherapy for more than 40 years. However, wide inter-patient variation in treatment response, development of resistance and severe toxicity remain major hurdles to effective ara-C chemotherapy. Ara-C is a prodrug that must be activated to ara-CTP through multiple phosphorylation steps. Incorporation of ara-CTP in place of dCTP results in chain termination, thereby blocking DNA and RNA synthesis and causing leukemic cell death. Thus, the cellular pathways involved in ara-CTP formation and metabolism, as well as in ara-CTP-mediated cell death, are likely to be significant determinants of ara-C treatment response. Inter-patient variation in relevant pharmacokinetic (PK) and pharmacodynamic (PD) genes may affect clinical response and toxicity among patients receiving ara-C. We have evaluated genes important in ara-C chemotherapy and found that genetic variation in ara-C pathway genes had prognostic relevance similar to that of well-established prognostic factors. We will share our results on ara-C pharmacogenomics and its impact on clinical outcome in AML. Overall, our results indicate that understanding genetic variation in key ara-C metabolic pathway genes might be clinically relevant, because it can explain part of the variability in clinical response beyond known prognostic factors. Such variants might also serve as additional prognostic markers of clinical outcome.

Citations: 3

Comparative genomics: From genome sequences to genome biology

Journal of Data Mining in Genomics & Proteomics

Pub Date : 2016-02-04 DOI: 10.4172/2153-0602.C1.003

Michael Y. Galperin

We have developed an in vitro model system consisting of 300 human lymphoblastoid cell lines (LCLs) with extensive high-throughput genomic data, including genome-wide SNPs, basal expression arrays, CpG methylation and microRNA data, together with drug cytotoxicity data. The system helps identify candidate genes that might contribute to chemoresistance and response to targeted therapy, and at the same time helps us understand the mechanisms underlying variation in drug response. We have used this system to identify genes such as NT5C3 and FKBP5 that are important for response to the cytidine analogues gemcitabine and AraC, drugs commonly used to treat pancreatic, breast and hematological tumors. In addition, this system allows us to biologically validate any signals identified in our clinical pharmacogenomic GWA studies. We performed a GWAS using DNA samples from the largest breast cancer prevention trials, the NSABP P-1 and P-2 SERM (selective estrogen receptor modulator) prevention trials. The aim was to identify markers associated with breast cancer risk after exposure to the SERMs tamoxifen and raloxifene. In the course of that study, we identified SNP signals in two genes, ZNF423 and CTSO. The risk alleles of these two SNP signals were associated with an odds ratio of 5.71 for breast cancer risk. Functional genomic studies further revealed SNP-dependent, estrogen- or SERM-dependent regulation of ZNF423 and CTSO expression, paralleling the regulation of BRCA1 expression. These approaches using the LCL model allowed us not only to identify potential candidate genes that could function as biomarkers of response to therapy, but also to reveal novel biology underlying these genes. This includes their involvement in regulating important pathways in tumor growth and response to therapy.
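
An odds ratio such as the 5.71 reported above compares allele carriage between cases and controls. The Python sketch below shows the unadjusted 2x2 calculation with a Woolf (log-scale) 95% confidence interval. The counts are hypothetical. A GWAS would normally estimate ORs with regression models that include covariates, so this sketch does not reproduce the study's figure.

```python
import math


def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.959963984540054):
    """Odds ratio and Woolf (log-scale) confidence interval from a 2x2 table.

    a = cases carrying the risk allele,    b = cases without it,
    c = controls carrying the risk allele, d = controls without it.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: add a continuity correction (e.g. +0.5) first")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts, not data from the study.
or_, lower, upper = odds_ratio_with_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")  # OR = 2.25 (95% CI 0.99-5.09)
```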

Citations: 2

Identification of Host-Specific Genetic Markers within 16S rDNA Intervening Sequences of 73 Genera of Fecal Bacteria

Journal of Data Mining in Genomics & Proteomics

Pub Date : 2016-01-31 DOI: 10.4172/2153-0602.1000186

Zhenyu Shen, Ning Zhang, A. Mustapha, Mengshi Lin, Dong Xu, Daiyong Deng, M. Reed, Guolu Zheng

Ribosomal intervening sequences (IVSs) were recently proposed as genetic markers for microbial source tracking (MST). This study comprehensively investigated the host specificity of IVSs within the 16S rDNA of 73 genera of dominant fecal bacteria using bioinformatics and next-generation sequencing (NGS). Thirteen types of IVSs were identified in silico as associated with particular host species. They were found in bacteria of the genera Anaerovibrio, Bacteroides, Faecalibacterium, Mitsuokella, Peptostreptococcus, Phascolarctobacterium and Subdoligranulum. Polymerase chain reaction (PCR) assays were developed based on the DNA sequences of these thirteen IVS types. PCR amplification of fecal DNA samples from target and non-target host species showed that eight of the 13 IVSs were highly associated with human, chicken/turkey, beef cattle/pig or horse/pig/human feces. Exploiting IVS polymorphisms, NGS was then used to search for single-host-associated IVSs among those linked to multiple host species. As a result, a new type of IVS specific to beef cattle was found and confirmed by PCR amplification of cattle and non-cattle fecal samples. The results suggest that some IVSs may be used as genetic markers for MST and that NGS may be useful for identifying novel host-specific genetic markers.
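
The PCR validation step amounts to scoring each marker on two measures. Sensitivity is detection in target-host feces; specificity is absence in non-target feces. The minimal Python sketch below uses a hypothetical PcrResult record and a made-up sample panel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrResult:
    sample_id: str
    host: str        # host species the fecal sample came from
    positive: bool   # whether the marker assay amplified


def marker_performance(results, target_hosts):
    """Sensitivity and specificity of one marker against a set of intended hosts."""
    tp = sum(r.positive for r in results if r.host in target_hosts)
    fn = sum(not r.positive for r in results if r.host in target_hosts)
    fp = sum(r.positive for r in results if r.host not in target_hosts)
    tn = sum(not r.positive for r in results if r.host not in target_hosts)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical panel: 4 human, 5 cattle and 5 pig samples tested with one marker.
panel = (
    [PcrResult(f"h{i}", "human", i < 3) for i in range(4)]
    + [PcrResult(f"c{i}", "cattle", False) for i in range(5)]
    + [PcrResult(f"p{i}", "pig", i == 0) for i in range(5)]
)
sens, spec = marker_performance(panel, target_hosts={"human"})
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")  # sensitivity = 0.75, specificity = 0.90
```

Because target_hosts is a set, multi-host markers such as the beef cattle/pig group can be scored the same way.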

Citations: 7

Mining Datasets for Molecular Subtyping in Cancer

Journal of Data Mining in Genomics & Proteomics

Pub Date : 2016-01-20 DOI: 10.4172/2153-0602.1000185

Sally Yepes, M. M. Torres

Cancer patients with identical histopathological diagnoses can differ widely in clinical behavior. It is therefore necessary to search for unrecognized molecular subtypes and subtype-specific markers, and to evaluate their clinical and biological relevance. This task now benefits from high-throughput genomic technologies and from free access to the datasets generated by international genomic projects and held in public data repositories. Machine learning strategies have proven useful for identifying hidden trends in large datasets, contributing to the understanding of the molecular mechanisms and subtyping of cancer. However, translating new molecular subclasses and biomarkers into clinical settings requires analytical validation and clinical trials to determine their clinical utility. Here, we provide an overview of the workflow to identify and confirm cancer subtypes, summarize a variety of methodological principles, and highlight representative studies. The generation of public big data on the most common malignancies is turning molecular pathology into a database-driven discipline.
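
As a minimal illustration of unsupervised subtyping, the Python sketch below z-scores each gene across samples. It then runs Lloyd's k-means with farthest-first seeding. The toy expression matrix is hypothetical. Published subtyping studies often use more robust schemes, such as consensus or hierarchical clustering, plus validation steps that this sketch omits.

```python
import random


def standardize(matrix):
    """Z-score each column (gene) across rows (samples); constant genes get sd = 1."""
    stats = []
    for col in zip(*matrix):
        mu = sum(col) / len(col)
        sd = (sum((x - mu) ** 2 for x in col) / len(col)) ** 0.5 or 1.0
        stats.append((mu, sd))
    return [[(x - mu) / sd for x, (mu, sd) in zip(row, stats)] for row in matrix]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _farthest_first(points, k, rng):
    """Seed with one random sample, then repeatedly add the sample farthest from all seeds."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means; returns one cluster label per sample and the final centroids."""
    centroids = _farthest_first(points, k, random.Random(seed))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical expression matrix: 6 samples (rows) x 3 genes (columns).
expr = [
    [8.0, 2.0, 6.0], [8.2, 2.1, 6.3], [7.9, 1.8, 5.9],
    [3.0, 7.0, 1.0], [3.1, 7.2, 1.2], [2.8, 6.9, 0.9],
]
labels, _ = kmeans(standardize(expr), k=2)
print(labels)  # the first three samples share one label, the last three the other
```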

Citations: 3
