Pub Date: 2016-02-28 | DOI: 10.4172/2153-0602.1000192
R. G. Nair, G. Kaur, Indu Khatri, N. Singh, S. K. Maurya, Srikrishna Subramanian, A. Behera, D. Dahiya, J. Agrewala, S. Mayilraj
Coagulase-negative Staphylococci (CNS) are known to cause distinct types of infections in humans, such as endocarditis and urinary tract infections (UTI). Surprisingly, genome analysis data on CNS, particularly of human origin, are lacking in the literature. We therefore performed genome mining and comparative genomic analysis of five CNS strains: Staphylococcus cohnii subsp. cohnii strain GM22B2, Staphylococcus equorum subsp. equorum strain G8HB1 and Staphylococcus pasteuri strain BAB3, isolated from the gall bladder, and Staphylococcus haemolyticus strain 1HT3 and Staphylococcus warneri strain 1DB1, isolated from the colon. We found that 29% of the virulence determinants were shared among the CNS strains; these shared determinants were involved in resistance to antibiotics and toxic compounds, bacteriocins and ribosomally synthesized peptides, adhesion, invasion, intracellular resistance, prophage regions and pathogenicity islands. Ten unique virulence factors, involved in adhesion, negative transcriptional regulation, resistance to copper and cadmium, and phage maturation, were also present in our strains. Apart from comparing genome homology, size and G + C content, we also showed the presence of 10 different CRISPR-cas genes in the CNS strains. Further, KAAS-based annotation revealed CNS genes in different pathways involved in human diseases. In conclusion, this study is a first attempt to unveil the pathogenomics of CNS isolated from two distinct body organs, and it highlights the importance of CNS as emerging pathogens in the health care sector.
Title: Genome Mining and Comparative Genomic Analysis of Five Coagulase-Negative Staphylococci (CNS) Isolated from Human Colon and Gall Bladder
Journal: Journal of Data Mining in Genomics & Proteomics, pp. 1-15
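Genome size and G + C content are two of the features compared above. The sketch below shows one way to compute both from an assembly, assuming the genome is available as multi-contig FASTA text; `parse_fasta` and `genome_stats` are hypothetical helpers, not the authors' pipeline, and excluding ambiguous bases (such as N) from the G + C denominator is our own choice.

```python
def parse_fasta(text):
    """Parse FASTA text into {record_id: uppercase sequence}."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record ID = first whitespace-delimited token of the header line.
            fields = line[1:].split()
            current = fields[0] if fields else "record%d" % (len(records) + 1)
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {rid: "".join(parts) for rid, parts in records.items()}


def genome_stats(seqs):
    """Return (total length in bp, G+C percent over unambiguous A/C/G/T bases)."""
    total = sum(len(s) for s in seqs.values())
    gc = sum(s.count("G") + s.count("C") for s in seqs.values())
    acgt = gc + sum(s.count("A") + s.count("T") for s in seqs.values())
    gc_percent = 100.0 * gc / acgt if acgt else 0.0
    return total, gc_percent


if __name__ == "__main__":
    demo = ">contig1\nATGC\nggcc\n>contig2 plasmid\nATATN\n"
    print(genome_stats(parse_fasta(demo)))
```

Summing over all contigs treats a draft assembly as one genome; per-contig values can be obtained by calling `genome_stats` on a single-entry dictionary.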
Pub Date: 2016-02-22 | DOI: 10.4172/2153-0602.1000190
A. N. Al-Masri, Manal M. Nasir
Big Data analytics is one of the great challenges for Learning Machine (LM) algorithms, because most real-life applications involve massive amounts of information or a big data knowledge base. At the same time, an Artificial Intelligence (AI) system with a data knowledge base should be able to compute results accurately and quickly. This study focused on the challenges and solutions of using LM with Big Data. Data processing is a mandatory step in any LM module to transform unstructured Big Data into a meaningful and optimized data set. However, the optimized data set must be deployed in a way that supports distributed processing and real-time applications. This work also reviewed the technologies currently used in Big Data analysis and LM computation and emphasized that using different solutions for particular applications could increase LM performance. New developments, especially in cloud computing and data transaction speed, offer significant advantages for the practical use of AI applications.
Title: Learning Machine Implementation for Big Data Analytics, Challenges and Solutions
Journal: Journal of Data Mining in Genomics & Proteomics, pp. 1-4
Pub Date: 2016-02-17 | DOI: 10.4172/2153-0602.1000189
H. Abdelhamid
Ionic liquid matrices (ILMs) have contributed greatly to, and substantially improved, protein analysis by matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). The physicochemical properties of these materials are important for understanding ILM performance and for designing effective ILMs. The present study describes the relationships between the chemical structure and the physicochemical properties of ILMs. The properties were calculated for ionic liquids formed by pairing different organic bases with two common organic matrices, 2,5-dihydroxybenzoic acid (DHB) and 3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid, SA). The two series showed the same profile for molar refractivity, molar volume, parachor, index of refraction, polarizability and surface tension. However, ionic liquids based on sinapinic acid showed higher values than those based on DHB for all parameters except index of refraction and surface tension. These parameters may explain the high performance of SA-ILs for protein analysis compared to DHB-ILs. The present results should be useful to anyone designing new ILMs.
Title: Physicochemical Properties of Proteomic Ionic Liquids Matrices for MALDI-MS
Journal: Journal of Data Mining in Genomics & Proteomics, pp. 1-6
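Several of the parameters listed above are linked by standard textbook relations: molar volume V_m = M/ρ, Lorentz-Lorenz molar refractivity R_m = V_m(n^2 - 1)/(n^2 + 2), polarizability volume α = 3R_m/(4πN_A), and Sugden's parachor P = V_m γ^(1/4). The minimal sketch below implements these relations; it is not the software used in the study, and the water values in the usage lines are illustrative inputs, not ILM data.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1


def molar_volume(molar_mass, density):
    """V_m = M / rho in cm^3/mol (M in g/mol, rho in g/cm^3)."""
    return molar_mass / density


def molar_refractivity(n, vm):
    """Lorentz-Lorenz molar refractivity R_m = V_m (n^2 - 1) / (n^2 + 2), cm^3/mol."""
    return vm * (n ** 2 - 1) / (n ** 2 + 2)


def polarizability_A3(rm):
    """Mean polarizability volume alpha = 3 R_m / (4 pi N_A), returned in Angstrom^3."""
    alpha_cm3 = 3 * rm / (4 * math.pi * AVOGADRO)
    return alpha_cm3 * 1e24  # 1 cm^3 = 1e24 Angstrom^3


def parachor(vm, surface_tension):
    """Sugden parachor P = V_m * gamma^(1/4), gamma in dyn/cm (vapor density neglected)."""
    return vm * surface_tension ** 0.25


if __name__ == "__main__":
    # Illustrative inputs for liquid water near room temperature.
    vm = molar_volume(18.015, 0.997)
    rm = molar_refractivity(1.333, vm)
    print(round(vm, 2), round(rm, 3), round(polarizability_A3(rm), 3), round(parachor(vm, 72.0), 1))
```

Comparing DHB- and SA-based ionic liquids would then amount to evaluating these functions with each compound's calculated or measured n, M, ρ and γ; the relations show, for instance, that at fixed n a larger molar volume raises both molar refractivity and polarizability.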
Pub Date: 2016-02-17 | DOI: 10.4172/2153-0602.1000188
A. Yazdani, A. Yazdani, E. Boerwinkle
Causal inference is conceptually straightforward in the setting of a randomized intervention, such as a clinical trial. However, in observational studies, which make up the majority of large-scale epidemiologic studies, causal inference is complicated by confounding and by the lack of clear directionality underlying an observed association. In most large-scale biomedical applications, causal inference is embodied in directed acyclic graphs (DAGs), which depict causal relationships (arrows) among variables (nodes). A key concept for causal inference in observational studies is the assignment mechanism (AM), the process that determines which individuals are treated and which are not. This perspective provides a structure for thinking about causal networks in the context of the AM. Estimation of the effect sizes of the observed directed relationships is presented and discussed.
Title: Conceptual Aspects of Causal Networks in an Applied Context
Journal: Journal of Data Mining in Genomics & Proteomics, pp. 1-3
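The confounding problem described above can be illustrated with a small simulation. This sketch assumes a linear Gaussian DAG Z → X, Z → Y, X → Y, in which the covariate Z drives both the binary treatment assignment X and the outcome Y; all coefficients are hypothetical. The naive treated-versus-untreated contrast comes out well above the true effect of 2.0, while adjusting for Z (here via Frisch-Waugh-Lovell residualization, one of several possible adjustment strategies) recovers it.

```python
import random


def simulate_confounded(n, beta_xy=2.0, beta_zx=1.5, beta_zy=3.0, seed=0):
    """Simulate the DAG Z -> X, Z -> Y, X -> Y with a binary treatment X."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assignment mechanism: units with larger Z are more likely to be treated.
    x = [1.0 if beta_zx * zi + rng.gauss(0.0, 1.0) > 0 else 0.0 for zi in z]
    y = [beta_xy * xi + beta_zy * zi + rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]
    return z, x, y


def mean(v):
    return sum(v) / len(v)


def ols_slope(u, v):
    """Slope from the simple regression v ~ 1 + u (a difference in means if u is 0/1)."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var = sum((a - mu) ** 2 for a in u)
    return cov / var


def residualize(v, on):
    """Residuals of v after regressing it on `on` (with intercept)."""
    b = ols_slope(on, v)
    mo, mv = mean(on), mean(v)
    return [(vi - mv) - b * (oi - mo) for vi, oi in zip(v, on)]


def adjusted_effect(x, y, z):
    """Coefficient of X in y ~ 1 + x + z, computed via Frisch-Waugh-Lovell."""
    return ols_slope(residualize(x, z), residualize(y, z))


if __name__ == "__main__":
    z, x, y = simulate_confounded(50_000)
    print("naive contrast:", round(ols_slope(x, y), 2))
    print("Z-adjusted effect:", round(adjusted_effect(x, y, z), 2))
```

Adjustment works here only because Z is observed and the outcome model is correctly specified; an unmeasured confounder would leave the adjusted estimate biased as well.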
Pub Date: 2016-02-06 | DOI: 10.4172/2153-0602.1000187
Chaudhary Mashhood Alam, A. Iqbal, Babita Thadari, Safdar Ali
Simple sequence repeats (SSRs), also known as microsatellites, are repeat motifs of 1-6 nucleotides, present in varying numbers of iterations across the coding and non-coding regions of prokaryotes, eukaryotes and viruses. The present study focuses on SSRs in 27 Flavivirus genomes, including dengue virus. Comparative viral genomics in the light of SSRs could help us understand viral diversity and adaptability to new hosts. A total of 1164 SSRs and 53 compound SSRs (cSSRs) were uncovered from the 27 studied genomes. Mononucleotide A was the most prevalent repeat motif, with an average distribution of around 6, followed by G (average distribution of 2). Among the dinucleotides, the AG/GA repeat motif was the most prevalent, with an average distribution of 14 across the studied genomes. The Flavivirus genomes lacked two essential features responsible for genome evolution: the dinucleotide repeat motif AT/TA (least represented, with an average distribution of ~0.5) and cSSRs in non-coding regions. This suggests either a stable genome or evolution by hitherto unexplained mechanisms. The conserved sequences uncovered in the isolates of dengue virus suggest a basis for developing biomarkers for viral diagnostics.
Title: Imex Based Analysis of Repeat Sequences in Flavivirus Genomes, Including Dengue Virus
Journal: Journal of Data Mining in Genomics & Proteomics, pp. 1-7
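A search for perfect SSRs of the kind summarized above can be sketched with regular-expression backreferences. This is not the IMEx algorithm, which also handles imperfect repeats; the minimum copy numbers in `DEFAULT_MIN_REPEATS` are hypothetical rather than the settings used in the study, and compound SSRs are not detected.

```python
import re

# Hypothetical minimum copy numbers per motif length (1-6 nt).
DEFAULT_MIN_REPEATS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit ('ATAT' is not)."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_perfect_ssrs(seq, min_repeats=DEFAULT_MIN_REPEATS):
    """Return sorted (0-based start, motif, copy number) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, min_rep in sorted(min_repeats.items()):
        # A k-nt unit captured in group 1, followed by at least min_rep - 1 exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. 'AA' x3, which is already reported as a mononucleotide run.
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    print(find_perfect_ssrs("TTAAAAAAACGAGAGAGCC"))
```

For the usage string, the search reports a mononucleotide A run at position 2 and a GA dinucleotide repeat at position 10 (0-based). Because the phase of a dinucleotide repeat depends on where the match starts, AG and GA are usually pooled, as in the AG/GA class above.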
Pub Date: 2016-02-04 | DOI: 10.4172/2153-0602.S1.004
Milena FloriaSantos
Title: Family history, genetics services, and cancer risk perception at Brazilian unified health system
Journal: Journal of Data Mining in Genomics & Proteomics
Pub Date: 2016-02-04 | DOI: 10.4172/2153-0602.S1.002
J. Lamba
Acute myeloid leukemia (AML) is the second most common form of childhood leukemia and has the worst prognosis of all major childhood cancers. Improving the treatment outcome for patients with AML remains a major clinical challenge. The nucleoside analog cytarabine (ara-C) has been the mainstay of AML chemotherapy for more than 40 years. However, wide inter-patient variation in treatment response, development of resistance, and severe toxicity remain major hurdles to effective ara-C chemotherapy. Ara-C is a prodrug that must be activated to ara-CTP through multiple phosphorylation steps. Incorporation of ara-CTP in place of dCTP results in chain termination, thereby blocking DNA and RNA synthesis and causing leukemic cell death. Thus, the cellular pathways involved in ara-CTP formation and metabolism, as well as in ara-CTP-mediated cell death, are likely to be significant determinants of ara-C treatment response. Inter-patient variation in relevant pharmacokinetic (PK) and pharmacodynamic (PD) genes may affect clinical response and toxicity among patients receiving ara-C. We have evaluated genes of importance in ara-C chemotherapy and found that genetic variation in ara-C pathway genes had prognostic relevance similar to that of well-established prognostic factors. We will share our results on ara-C pharmacogenomics and its impact on clinical outcome in AML. Overall, our results indicate that understanding genetic variation in key ara-C metabolic pathway genes might be clinically relevant, as it provides additional explanation of the variability in clinical response beyond known prognostic factors, and that such variants might serve as additional prognostic markers of clinical outcome.
Title: Pharmacogenomics of AML: A road towards personalized medicine
Journal: Journal of Data Mining in Genomics & Proteomics
Pub Date: 2016-02-04 | DOI: 10.4172/2153-0602.C1.003
Michael Y. Galperin
We have developed an in vitro model system consisting of 300 human lymphoblastoid cell lines (LCLs) with extensive high-throughput genomic data, including genome-wide SNPs, basal expression arrays, CpG methylation and microRNA data, together with drug cytotoxicity data. The system is intended to help identify candidate genes that might contribute to chemoresistance and response to targeted therapy and, at the same time, to help understand the mechanisms underlying variation in drug response. We have used this system to identify genes such as NT5C3 and FKBP5 that are important for response to the cytidine analogues gemcitabine and AraC, drugs commonly used to treat pancreatic, breast and hematological tumors. In addition, this system allows us to biologically validate signals identified in our clinical pharmacogenomic GWA studies. We performed a GWAS using DNA samples from the largest breast cancer prevention trials, the NSABP P-1 and P-2 SERM (selective estrogen receptor modulator) prevention trials, to identify markers associated with breast cancer risk after exposure to the SERMs tamoxifen and raloxifene. In that study, we identified SNP signals in two genes, ZNF423 and CTSO, and the risk alleles of these two SNP signals were associated with an odds ratio of 5.71 for breast cancer risk. Functional genomic studies further revealed SNP-dependent and estrogen- or SERM-dependent regulation of ZNF423 and CTSO expression, paralleling the regulation of BRCA1 expression. Using the LCL model, these approaches allowed us not only to identify potential candidate genes that could function as biomarkers of response to therapy, but also to reveal novel biology underlying these genes and their involvement in regulating important pathways in tumor growth and response to therapy.
Title: Comparative genomics: From genome sequences to genome biology
Journal: Journal of Data Mining in Genomics & Proteomics
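An odds ratio such as the 5.71 reported above summarizes a case-control association. Below is a minimal sketch of an allelic odds ratio computed from a 2 × 2 table of allele counts, with a Woolf (log-scale) 95% confidence interval. The counts in the usage line are hypothetical. GWAS analyses typically estimate odds ratios with covariate-adjusted logistic regression instead, so this need not match how the reported value was obtained.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Odds ratio for a 2x2 allele-count table with a Woolf (log) confidence interval."""
    if min(case_risk, case_other, ctrl_risk, ctrl_other) == 0:
        raise ValueError("zero cell count; apply a continuity correction first")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) under Woolf's method.
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: risk/other alleles in cases, then in controls.
    print(allelic_odds_ratio(40, 60, 20, 80))
```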
Pub Date: 2016-01-31 | DOI: 10.4172/2153-0602.1000186
Zhenyu Shen, Ning Zhang, A. Mustapha, Mengshi Lin, Dong Xu, Daiyong Deng, M. Reed, Guolu Zheng
Ribosomal intervening sequences (IVSs) were recently proposed as genetic markers for microbial source tracking (MST). This study comprehensively investigated the host specificity of IVSs within the 16S rDNA of 73 genera of dominant fecal bacteria using bioinformatics and next-generation sequencing (NGS). Thirteen types of IVSs were identified in silico as associated with particular host species; they were found in bacteria of the genera Anaerovibrio, Bacteroides, Faecalibacterium, Mitsuokella, Peptostreptococcus, Phascolarctobacterium, and Subdoligranulum. Polymerase chain reaction (PCR) assays were developed based on the DNA sequences of these 13 IVS types. PCR amplification of fecal DNA samples from target and non-target host species showed that eight of the 13 IVSs were highly associated with human, chicken/turkey, beef cattle/pig, or horse/pig/human feces. Based on IVS polymorphisms, NGS was then applied to search for single-host-associated IVSs among those linked to multiple host species. As a result, a new type of IVS specific to beef cattle was found and confirmed by PCR amplification of cattle and non-cattle fecal samples. The results suggest that some IVSs may be used as genetic markers for MST and that NGS may be useful for identifying novel host-specific genetic markers.
Title: Identification of Host-Specific Genetic Markers within 16S rDNA Intervening Sequences of 73 Genera of Fecal Bacteria
Journal: Journal of Data Mining in Genomics & Proteomics, pp. 1-9
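Host-association results like those above are often summarized as sensitivity (the fraction of target-host samples that test positive) and specificity (the fraction of non-target samples that test negative). The sketch below computes both from PCR presence/absence calls; the function name and the sample results are hypothetical, not data from the study.

```python
def marker_performance(results, target_hosts):
    """Sensitivity and specificity of a host-associated PCR marker.

    results: iterable of (host_species, pcr_positive) pairs, one per fecal sample.
    target_hosts: set of host species the marker is meant to detect.
    """
    tp = fn = tn = fp = 0
    for host, positive in results:
        if host in target_hosts:
            if positive:
                tp += 1
            else:
                fn += 1
        elif positive:
            fp += 1  # cross-reaction with a non-target host
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical calls for a cattle-associated marker.
    calls = [("cattle", True)] * 9 + [("cattle", False)] + [("pig", False)] * 5
    calls += [("human", False)] * 4 + [("human", True)]
    print(marker_performance(calls, {"cattle"}))
```

Markers associated with several hosts (for example beef cattle/pig) can be evaluated the same way by passing all associated species in `target_hosts`.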
Pub Date: 2016-01-20 | DOI: 10.4172/2153-0602.1000185
Sally Yepes, M. M. Torres
Given the heterogeneity in clinical behavior among cancer patients with identical histopathological diagnoses, it is necessary to search for unrecognized molecular subtypes and subtype-specific markers and to evaluate their clinical and biological relevance. This task now benefits from high-throughput genomic technologies and from free access to the datasets generated by international genomic projects and information repositories. Machine learning strategies have proven useful for identifying hidden trends in large datasets, contributing to the understanding of the molecular mechanisms and subtyping of cancer. However, translating new molecular subclasses and biomarkers into clinical settings requires their analytic validation and clinical trials to determine their clinical utility. Here, we provide an overview of the workflow for identifying and confirming cancer subtypes, summarize a variety of methodological principles, and highlight representative studies. The generation of public big data on the most common malignancies is turning molecular pathology into a database-driven discipline.
Title: Mining Datasets for Molecular Subtyping in Cancer
Journal: Journal of Data Mining in Genomics & Proteomics, pp. 1-7
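One of the simplest unsupervised strategies for proposing candidate subtypes is k-means clustering of expression profiles. The pure-Python sketch below (Lloyd's algorithm) is illustrative only. The toy two-gene profiles are hypothetical, and k must be chosen in advance. Real subtyping workflows also add normalization, feature selection and cluster-stability assessment before any clinical validation.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means: return (labels, centroids) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Initialize centroids with k input points chosen at random (without replacement).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member profiles.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical expression values for two genes in six tumor samples.
    profiles = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeans(profiles, 2))
```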