Background: Numerous biomedical software applications access databases maintained by the US National Center for Biotechnology Information (NCBI). To ease software automation, NCBI provides a powerful but complex Web-service-based programming interface, eUtils. This paper describes a toolset that simplifies eUtils use through a graphical front-end that non-programmers can use to construct data-extraction pipelines. The front-end relies on a code library that provides high-level wrappers around eUtils functions and is distributed as open source, allowing customization and enhancement by individuals with programming skills.
Methods: We initially created an application that queried eUtils to retrieve nephrology-specific biomedical literature citations for a user-definable set of genes. We later augmented the application code to create a general-purpose library that exposes eUtils capabilities as individual functions that can be combined into user-defined pipelines.
Results: The toolset's use is illustrated with an application that serves as a front-end to the library and can be used by non-programmers to construct user-defined pipelines. The operation of the library is illustrated for the literature-surveillance application, which serves as a case study. An overview of the library is also provided.
Conclusions: The library simplifies use of the eUtils service by operating at a higher level. It also transparently handles robustness concerns that each application would otherwise have to implement individually, such as error recovery and prevention of overloading of the eUtils service.
Prakash M Nadkarni, Chirag R Parikh. An eUtils toolset and its use for creating a pipeline to link genomics and proteomics analyses to domain-specific biomedical literature. Journal of clinical bioinformatics 2(1):9. Published 2012-04-16. https://doi.org/10.1186/2043-9113-2-9
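The two robustness concerns named in the Conclusions, error recovery and avoiding overload of the eUtils service, can be sketched in a few lines. This is a minimal illustration and not the authors' library: the class name `ThrottledClient`, its parameters, and the retry policy are all hypothetical. The only external facts assumed are the public eUtils base URL and NCBI's guideline of at most three requests per second without an API key.

```python
import time
import urllib.parse
import urllib.request

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


class ThrottledClient:
    """Hypothetical wrapper: spaces out requests and retries failed ones."""

    def __init__(self, fetch=None, min_interval=0.34, max_retries=3,
                 sleep=time.sleep, clock=time.monotonic):
        # fetch, sleep and clock are injectable so the logic can be
        # exercised without touching the network.
        self._fetch = fetch or self._http_get
        self.min_interval = min_interval   # seconds between requests
        self.max_retries = max_retries
        self._sleep = sleep
        self._clock = clock
        self._last_call = None

    @staticmethod
    def _http_get(url):
        with urllib.request.urlopen(url, timeout=30) as response:
            return response.read().decode("utf-8")

    def _wait_for_slot(self):
        # Throttling: never issue two requests closer than min_interval.
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            if elapsed < self.min_interval:
                self._sleep(self.min_interval - elapsed)
        self._last_call = self._clock()

    def call(self, utility, **params):
        """Call one eUtils utility (e.g. 'esearch') and return the raw text."""
        url = EUTILS_BASE + utility + ".fcgi?" + urllib.parse.urlencode(params)
        last_error = None
        for attempt in range(self.max_retries):
            self._wait_for_slot()
            try:
                return self._fetch(url)
            except OSError as error:       # URLError is a subclass of OSError
                last_error = error
                if attempt < self.max_retries - 1:
                    self._sleep(2 ** attempt)   # exponential backoff
        raise RuntimeError(
            f"eUtils call failed after {self.max_retries} attempts"
        ) from last_error
```

A caller would write something like `ThrottledClient().call("esearch", db="pubmed", term="...")` and receive the service's response text. Keeping the pacing and retry logic in one place is what lets higher-level pipeline steps ignore both concerns.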
Background: 3D domain swapping is a novel structural phenomenon observed in a diverse set of protein structures in oligomeric conformations. It is characterized by a distinct structural feature: structural segments in a protein dimer or higher oligomer are shared between two or more chains of the protein structure. 3D domain swapping has been observed as a key mediator of numerous functional mechanisms and plays a pathogenic role in various diseases, including conformational diseases such as amyloidosis, Alzheimer's disease, Parkinson's disease and prion diseases. We report the first study with a focus on identifying functional classes, pathways and diseases mediated by 3D domain swapping in the human proteome.
Methods: We used a panel of four enrichment tools with two different ontologies and two annotation databases to derive biologically and clinically relevant information associated with 3D domain swapping. Protein domain enrichment analysis followed by Gene Ontology (GO) term enrichment analysis revealed the functional repertoire of proteins involved in swapping. Pathway analysis using KEGG annotations revealed diverse pathway associations of human proteins involved in 3D domain swapping. Disease Ontology was used to find statistically significant associations between proteins in swapped conformation and various disease categories (P-value < 0.05).
Results: We report meta-analysis results of a literature-curated dataset of human gene products involved in 3D domain swapping and discuss new insights about the functional repertoire, pathway associations and disease implications of proteins involved in 3D domain swapping.
Conclusions: Our integrated bioinformatics pipeline, comprising four different enrichment tools, two ontologies and two annotation databases, revealed new insights into the functional and disease correlations of 3D domain swapping. GO term enrichment was used to infer terms associated with three different GO categories. Protein domain enrichment was used to identify conserved domains enriched in swapped proteins. Pathway enrichment analysis using KEGG annotations revealed that proteins with swapped conformations are present in all six classes of the KEGG BRITE hierarchy, and significantly enriched KEGG pathways were observed in five classes. Functional Disease Ontology based enrichment analysis found five major classes of human disease to be significantly associated with 3D domain swapping: cancer, diseases of the respiratory or pulmonary system, degenerative diseases of the central nervous system, vascular disease and encephalitis. In conclusion, our study shows that bioinformatics-based analytical approaches using curated data can enhance the understanding of the functional and disease implications of 3D domain swapping.
Khader Shameer, Ramanathan Sowdhamini. Functional repertoire, molecular pathways and diseases associated with 3D domain swapping in the human proteome. Journal of clinical bioinformatics 2(1):8. Published 2012-04-03. https://doi.org/10.1186/2043-9113-2-8
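The abstract does not say which statistic its four enrichment tools apply. As an illustration only, the sketch below uses the one-sided hypergeometric test, a common choice for term enrichment, to ask whether an annotation term appears in a protein list more often than chance would predict. The function name and all counts in the usage note are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric P(X >= hits).

    population: number of proteins in the background set
    annotated:  background proteins carrying the term
    sample:     size of the protein list under study
    hits:       proteins in the list carrying the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probabilities of observing `hits` or more annotated proteins.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

For example, `enrichment_p_value(10, 5, 5, 5)` is the probability that a random draw of 5 proteins from a background of 10 picks exactly the 5 annotated ones, which is 1/252. A term would then be called enriched when this value falls below the chosen cut-off, such as the P-value < 0.05 quoted in the Methods.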
Francisco J Azuaje, Michaël Heymann, Anne-Marie Ternes, Anke Wienecke-Baldacchino, Daniel Struck, Danièle Moes, Reinhard Schneider
The 6th Benelux Bioinformatics Conference (BBC11), held in Luxembourg on 12 and 13 December 2011, attracted around 200 participants, including internationally renowned guest speakers, and more than 100 peer-reviewed submissions from 3 continents. Researchers from the public and private sectors convened at BBC11 to discuss advances and challenges in a wide spectrum of application areas. A key theme of the conference was the contribution of bioinformatics to enabling and accelerating translational and clinical research. BBC11 stressed the need for stronger collaborative efforts across disciplines and institutions. Demonstrating the clinical relevance of systems approaches and of next-generation sequencing-based measurement technologies is among the existing opportunities for increasing impact in translational research. Translational bioinformatics will benefit from research models that strike a balance between the importance of protecting intellectual property and the need for open access to scientific and technological advances. The full conference proceedings are freely available at http://www.bbc11.lu.
Francisco J Azuaje, Michaël Heymann, Anne-Marie Ternes, Anke Wienecke-Baldacchino, Daniel Struck, Danièle Moes, Reinhard Schneider. Bioinformatics as a driver, not a passenger, of translational biomedical research: Perspectives from the 6th Benelux bioinformatics conference. Journal of clinical bioinformatics, p. 7. Published 2012-03-13. https://doi.org/10.1186/2043-9113-2-7
Background: Peripheral arterial disease (PAD) is a relatively common manifestation of systemic atherosclerosis that leads to progressive narrowing of the lumen of leg arteries. Circulating monocytes are in contact with the arterial wall and can serve as reporters of vascular pathology in the setting of PAD. We performed gene expression analysis of peripheral blood mononuclear cells (PBMC) in patients with PAD and controls without PAD to identify differentially regulated genes.
Methods: PAD was defined as an ankle brachial index (ABI) ≤0.9 (n = 19), while age- and gender-matched controls had an ABI > 1.0 (n = 18). Microarray analysis was performed using Affymetrix HG-U133 plus 2.0 gene chips, and the data were analyzed using GeneSpring GX 11.0. Gene expression data were normalized using the Robust Multichip Analysis (RMA) normalization method. Differential expression was defined as a fold change ≥1.5, followed by an unpaired Mann-Whitney test (P < 0.05) and correction for multiple testing by the Benjamini and Hochberg False Discovery Rate. Meta-analysis of differentially expressed genes was performed using an integrated bioinformatics pipeline with tools for enrichment analysis using Gene Ontology (GO) terms, pathway analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG), molecular event enrichment using Reactome annotations and network analysis using the Ingenuity Pathway Analysis suite. Extensive biocuration was also performed to understand the functional context of genes.
Results: We identified 87 genes differentially expressed in the setting of PAD; 40 genes were upregulated and 47 genes were downregulated. We employed an integrated bioinformatics pipeline coupled with literature curation to characterize the functional coherence of differentially regulated genes.
Conclusion: Notably, upregulated genes mediate immune response, inflammation, apoptosis, stress response, phosphorylation, hemostasis, platelet activation and platelet aggregation. Downregulated genes included several genes from the zinc finger family that are involved in transcriptional regulation. These results provide insights into molecular mechanisms relevant to the pathophysiology of PAD.
Rizwan Masud, Khader Shameer, Aparna Dhar, Keyue Ding, Iftikhar J Kullo. Gene expression profiling of peripheral blood mononuclear cells in the setting of peripheral arterial disease. Journal of clinical bioinformatics, p. 6. Published 2012-03-12. doi: 10.1186/2043-9113-2-6. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3381689/pdf/
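Two of the filtering steps named in the Methods, the fold-change cut-off and the Benjamini and Hochberg correction, are simple enough to sketch directly. This is an illustrative reimplementation, not the GeneSpring procedure. It assumes that expression values are on the log2 scale (as RMA produces) and that the fold change is taken between group means; the function names are hypothetical, and the per-gene P values are assumed to have been computed already by the Mann-Whitney test.

```python
import math


def passes_fold_change(mean_log2_cases, mean_log2_controls, threshold=1.5):
    """True if the fold change between group means is >= threshold in
    either direction. Inputs are assumed to be log2-scale group means."""
    return abs(mean_log2_cases - mean_log2_controls) >= math.log2(threshold)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With the hypothetical input `[0.01, 0.04, 0.03, 0.005]`, the adjusted values are approximately `[0.02, 0.04, 0.04, 0.02]`. A gene would be reported as differentially expressed when it passes the fold-change filter and its adjusted P value stays below the chosen false discovery rate.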
Background: In the biological sciences the TCID50 (median tissue culture infective dose) assay is often used to determine the strength of a virus.
Methods: When the so-called Spearman-Kaerber calculation is used, the ratio between the pfu (the number of plaque forming units, the effective number of virus particles) and the TCID50 theoretically approaches a simple function of Euler's constant. Further, the standard deviation of the logarithm of the TCID50 approaches a simple function of the dilution factor and the number of wells used for determining the ratios in the assay. However, these theoretical calculations assume that the dilutions of the assay are independent, and in practice this is not completely correct. The assay was therefore simulated using Monte Carlo techniques.
Results: Our simulation studies show that the theoretical results hold true for practical implementations of the assay. Furthermore, the simulation studies show that the distribution of (the log of) the TCID50, although discrete in nature, has a close relationship to the normal distribution.
Conclusion: The pfu is proportional to the TCID50 titre with a factor of about 0.56 when using the Spearman-Kaerber calculation method. The normal distribution can be used for statistical inference, and ANOVA on (the log of) the TCID50 values is meaningful with group sizes of 5 and above.
Niels H Wulff, Maria Tzatzaris, Philip J Young. Monte Carlo simulation of the Spearman-Kaerber TCID50. Journal of clinical bioinformatics 2(1):5. Published 2012-02-13. https://doi.org/10.1186/2043-9113-2-5
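The factor of about 0.56 can be checked with a small simulation. The sketch below is not the authors' code and makes its own simplifying assumptions: each well receives a Poisson-distributed number of infectious particles, a well is positive if it receives at least one, and the Spearman-Kaerber estimate is written as the log10 of the first dilution plus the log10 dilution step times (sum of positive proportions minus 0.5). Under those assumptions the expected pfu/TCID50 ratio works out to exp(-γ) ≈ 0.5615, where γ is the Euler-Mascheroni constant, which is presumably the "simple function of Euler's constant" the abstract refers to. All function and parameter names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def spearman_kaerber_log10(proportions, log10_step, log10_first_dilution=0.0):
    """log10 of the 50% endpoint dilution from per-dilution positive
    proportions, assuming the series runs from all-positive to all-negative."""
    return log10_first_dilution + log10_step * (sum(proportions) - 0.5)


def simulate_log10_tcid50(pfu_per_well_undiluted, dilution_factor=10.0,
                          n_dilutions=12, wells=8, rng=random):
    """Simulate one assay plate and return its log10 TCID50 estimate."""
    proportions = []
    for i in range(n_dilutions):
        expected_particles = pfu_per_well_undiluted / dilution_factor ** i
        # Poisson model: a well is infected if it gets at least one particle.
        p_infected = 1.0 - math.exp(-expected_particles)
        positives = sum(rng.random() < p_infected for _ in range(wells))
        proportions.append(positives / wells)
    return spearman_kaerber_log10(proportions, math.log10(dilution_factor))


if __name__ == "__main__":
    rng = random.Random(2012)
    true_pfu = 1e5
    estimates = [simulate_log10_tcid50(true_pfu, rng=rng) for _ in range(5000)]
    mean_log10 = sum(estimates) / len(estimates)
    print("simulated pfu/TCID50 ratio:", true_pfu / 10 ** mean_log10)
    print("exp(-gamma):", math.exp(-EULER_GAMMA))
```

With ten-fold dilutions the simulated ratio should land within a few percent of 0.56. The small residual depends on where the true titre falls relative to the dilution grid, which is one reason a simulation is a useful complement to the limiting formula.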
Manuela Hische, Abdelhalim Larhlimi, Franziska Schwarz, Antje Fischer-Rosinský, Thomas Bobbert, Anke Assmann, Gareth S Catchpole, Andreas Fh Pfeiffer, Lothar Willmitzer, Joachim Selbig, Joachim Spranger
Background: High blood glucose and diabetes are amongst the conditions causing the greatest losses in years of healthy life worldwide. Therefore, numerous studies aim to identify reliable risk markers for the development of impaired glucose metabolism and type 2 diabetes. However, the molecular basis of impaired glucose metabolism is so far insufficiently understood. The development of so-called 'omics' approaches in recent years promises to identify molecular markers and to further the understanding of the molecular basis of impaired glucose metabolism and type 2 diabetes. Although univariate statistical approaches are often applied, we demonstrate here that multivariate statistical approaches are needed to fully capture the complexity of data gained using high-throughput methods.
Methods: We took blood plasma samples from 172 subjects who participated in the prospective Metabolic Syndrome Berlin Potsdam follow-up study (MESY-BEPO Follow-up). We analysed these samples using Gas Chromatography coupled with Mass Spectrometry (GC-MS), and measured 286 metabolites. Furthermore, fasting glucose levels were measured using standard methods at baseline and after an average of six years. We performed correlation analysis and built linear regression models as well as Random Forest regression models to identify metabolites that predict the development of fasting glucose in our cohort.
Results: We found a metabolic pattern consisting of nine metabolites that predicted fasting glucose development with an accuracy of 0.47 in tenfold cross-validation using Random Forest regression. We also showed that adding established risk markers did not improve the model accuracy. External validation nevertheless remains desirable. Although not all metabolites belonging to the final pattern have been identified yet, the pattern directs attention to amino acid metabolism, energy metabolism and redox homeostasis.
Conclusions: We demonstrate that metabolites identified using a high-throughput method (GC-MS) perform well in predicting the development of fasting plasma glucose over several years. Notably, it is not a single metabolite but a complex pattern of metabolites that drives the prediction, which reflects the complexity of the underlying molecular mechanisms. This result could only be captured by applying multivariate statistical approaches. Therefore, we highly recommend the use of statistical methods that capture the complexity of the information provided by high-throughput methods.
Manuela Hische, Abdelhalim Larhlimi, Franziska Schwarz, Antje Fischer-Rosinský, Thomas Bobbert, Anke Assmann, Gareth S Catchpole, Andreas Fh Pfeiffer, Lothar Willmitzer, Joachim Selbig, Joachim Spranger. A distinct metabolic signature predicts development of fasting plasma glucose. Journal of clinical bioinformatics 2:3. Published 2012-02-02. https://doi.org/10.1186/2043-9113-2-3
Vladimir Lazarevic, Katrine Whiteson, Nadia Gaïa, Yann Gizard, David Hernandez, Laurent Farinelli, Magne Osterås, Patrice François, Jacques Schrenzel
Background: The salivary microbiota is a potential diagnostic indicator of several diseases. Culture-independent techniques are required to study the salivary microbial community since many of its members have not been cultivated.
Methods: We explored the bacterial community composition in the saliva sample using metagenomic whole genome shotgun (WGS) sequencing, the extraction of 16S rRNA gene fragments from metagenomic sequences (16S-WGS) and high-throughput sequencing of PCR-amplified bacterial 16S rDNA gene (16S-HTS) regions V1 and V3.
Results: The hierarchical clustering of data based on the relative abundance of bacterial genera revealed that distances between 16S-HTS datasets for V1 and V3 regions were greater than those obtained for the same V region with different numbers of PCR cycles. Datasets generated by 16S-HTS and 16S-WGS were even more distant. Finally, comparison of WGS and 16S-based datasets revealed the highest dissimilarity.The analysis of the 16S-HTS, WGS and 16S-WGS datasets revealed 206, 56 and 39 bacterial genera, respectively, 124 of which have not been previously identified in salivary microbiomes. A large fraction of DNA extracted from saliva corresponded to human DNA. Based on sequence similarity search against completely sequenced genomes, bacterial and viral sequences represented 0.73% and 0.0036% of the salivary metagenome, respectively. Several sequence reads were identified as parts of the human herpesvirus 7.
Conclusions: Analysis of the salivary metagenome may have implications in diagnostics, e.g. in the detection of microorganisms and viruses without designing specific tests for each pathogen.
{"title":"Analysis of the salivary microbiome using culture-independent techniques.","authors":"Vladimir Lazarevic, Katrine Whiteson, Nadia Gaïa, Yann Gizard, David Hernandez, Laurent Farinelli, Magne Osterås, Patrice François, Jacques Schrenzel","doi":"10.1186/2043-9113-2-4","DOIUrl":"https://doi.org/10.1186/2043-9113-2-4","url":null,"abstract":"<p><strong>Background: </strong>The salivary microbiota is a potential diagnostic indicator of several diseases. Culture-independent techniques are required to study the salivary microbial community since many of its members have not been cultivated.</p><p><strong>Methods: </strong>We explored the bacterial community composition in the saliva sample using metagenomic whole genome shotgun (WGS) sequencing, the extraction of 16S rRNA gene fragments from metagenomic sequences (16S-WGS) and high-throughput sequencing of PCR-amplified bacterial 16S rDNA gene (16S-HTS) regions V1 and V3.</p><p><strong>Results: </strong>The hierarchical clustering of data based on the relative abundance of bacterial genera revealed that distances between 16S-HTS datasets for V1 and V3 regions were greater than those obtained for the same V region with different numbers of PCR cycles. Datasets generated by 16S-HTS and 16S-WGS were even more distant. Finally, comparison of WGS and 16S-based datasets revealed the highest dissimilarity.The analysis of the 16S-HTS, WGS and 16S-WGS datasets revealed 206, 56 and 39 bacterial genera, respectively, 124 of which have not been previously identified in salivary microbiomes. A large fraction of DNA extracted from saliva corresponded to human DNA. Based on sequence similarity search against completely sequenced genomes, bacterial and viral sequences represented 0.73% and 0.0036% of the salivary metagenome, respectively. Several sequence reads were identified as parts of the human herpesvirus 7.</p><p><strong>Conclusions: </strong>Analysis of the salivary metagenome may have implications in diagnostics e.g. 
in detection of microorganisms and viruses without designing specific tests for each pathogen.</p>","PeriodicalId":73663,"journal":{"name":"Journal of clinical bioinformatics","volume":"2 ","pages":"4"},"PeriodicalIF":0.0,"publicationDate":"2012-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/2043-9113-2-4","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30432034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Raffaele Fronza, Michele Tramonti, William R Atchley, Christine Nardini
Translational and evidence-based medicine can take advantage of biotechnology advances that offer a fast-growing variety of high-throughput data for screening molecular activities at the genomic, transcriptional, post-transcriptional and translational levels. The clinical information hidden in these data can be clarified with clinical bioinformatics approaches. We have recently proposed a method to analyze different layers of high-throughput (omic) data to preserve the emergent properties that appear in the cellular system when all molecular levels are interacting. We show here that this method applied to brain cancer data can uncover properties (i.e. molecules related to protective versus risky features in different types of brain cancers) that have been independently validated as survival markers, with potentially important applications in clinical practice.
{"title":"Brain cancer prognosis: independent validation of a clinical bioinformatics approach.","authors":"Raffaele Fronza, Michele Tramonti, William R Atchley, Christine Nardini","doi":"10.1186/2043-9113-2-2","DOIUrl":"https://doi.org/10.1186/2043-9113-2-2","url":null,"abstract":"<p><p> Translational and evidence based medicine can take advantage of biotechnology advances that offer a fast growing variety of high-throughput data for screening molecular activities of genomic, transcriptional, post-transcriptional and translational observations. The clinical information hidden in these data can be clarified with clinical bioinformatics approaches. We have recently proposed a method to analyze different layers of high-throughput (omic) data to preserve the emergent properties that appear in the cellular system when all molecular levels are interacting. We show here that this method applied to brain cancer data can uncover properties (i.e. molecules related to protective versus risky features in different types of brain cancers) that have been independently validated as survival markers, with potential important application in clinical practice.</p>","PeriodicalId":73663,"journal":{"name":"Journal of clinical bioinformatics","volume":"2 ","pages":"2"},"PeriodicalIF":0.0,"publicationDate":"2012-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/2043-9113-2-2","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30430227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Systematic drug discovery is an emerging discipline within systems biology research. It aims at integrating interaction data and experimental data to elucidate diseases, and it also raises new issues in drug discovery for cancer treatment. However, drug target discovery is still at a trial-and-error experimental stage, and it is challenging to develop a prediction model that can systematically detect possible drug targets for complex diseases.
Methods: We integrate gene expression, disease genes and interaction networks to identify effective drug targets that have a strong influence on disease genes, using a network flow approach. In the experiments, our test data comprise a microarray dataset containing 62 prostate cancer samples and 41 normal samples, 108 known prostate cancer genes, and 322 approved human drug targets extracted from the DrugBank database, which serve as candidate proteins. Using our method, we prioritize the candidate proteins and validate them against the known prostate cancer drug targets.
Results: We successfully identify potential drug targets that are strongly related to well-known drugs for prostate cancer treatment, and we also discover further potential drug targets that are currently attracting the attention of biologists. We note that it is hard to discover drug targets based only on differential expression changes, because genes used as drug targets may not always have significant expression changes. Previous methods depend on network topology attributes, yet genes with potential as drug targets turn out to be only weakly correlated with critical points in a network. In comparison with previous methods, our results have the highest mean average precision and also rank the true drug targets higher, which verifies the effectiveness of our method.
Conclusions: Our method does not know the true ideal routes in the disease network, but it seeks a feasible flow that exerts a strong influence on the disease genes through possible paths. We successfully formulate drug target prediction as a maximum flow problem on biological networks and discover potential drug targets accurately.
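The maximum flow formulation mentioned above can be illustrated with a small sketch. This is not the authors' implementation: it uses the textbook Edmonds-Karp algorithm, and the toy network, the edge capacities, the helper names (`max_flow`, `rank_candidates`) and the artificial sink node are all assumptions made for illustration. How the paper derives capacities from expression and interaction data is not stated in the abstract.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    # Build residual capacities, adding reverse edges with capacity 0.
    residual = {}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)
    if source == sink or source not in residual or sink not in residual:
        return 0
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

def rank_candidates(capacity, candidates, disease_genes):
    """Rank candidates by max flow to a hypothetical sink behind all disease genes."""
    sink = "_disease_sink"  # artificial node, assumed not to clash with gene names
    big = 1 + sum(c for nbrs in capacity.values() for c in nbrs.values())
    scores = {}
    for cand in candidates:
        net = {u: dict(nbrs) for u, nbrs in capacity.items()}
        for gene in disease_genes:
            net.setdefault(gene, {})[sink] = big
        scores[cand] = max_flow(net, cand, sink)
    return sorted(scores, key=scores.get, reverse=True), scores

# Toy network (hypothetical): T1 and T2 are candidate targets, D1 and D2 disease genes.
toy = {"T1": {"X": 3, "Y": 2}, "X": {"D1": 2}, "Y": {"D2": 2}, "T2": {"Y": 1}}
ranking, scores = rank_candidates(toy, ["T2", "T1"], ["D1", "D2"])
```

In this toy network a candidate ranks higher when more flow can reach the disease genes from it, which is the intuition of "strong influence through possible paths" in the Conclusions.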
{"title":"A network flow approach to predict drug targets from microarray data, disease genes and interactome network - case study on prostate cancer.","authors":"Shih-Heng Yeh, Hsiang-Yuan Yeh, Von-Wun Soo","doi":"10.1186/2043-9113-2-1","DOIUrl":"10.1186/2043-9113-2-1","url":null,"abstract":"<p><strong>Background: </strong>Systematic approach for drug discovery is an emerging discipline in systems biology research area. It aims at integrating interaction data and experimental data to elucidate diseases and also raises new issues in drug discovery for cancer treatment. However, drug target discovery is still at a trial-and-error experimental stage and it is a challenging task to develop a prediction model that can systematically detect possible drug targets to deal with complex diseases.</p><p><strong>Methods: </strong>We integrate gene expression, disease genes and interaction networks to identify the effective drug targets which have a strong influence on disease genes using network flow approach. In the experiments, we adopt the microarray dataset containing 62 prostate cancer samples and 41 normal samples, 108 known prostate cancer genes and 322 approved drug targets treated in human extracted from DrugBank database to be candidate proteins as our test data. Using our method, we prioritize the candidate proteins and validate them to the known prostate cancer drug targets.</p><p><strong>Results: </strong>We successfully identify potential drug targets which are strongly related to the well known drugs for prostate cancer treatment and also discover more potential drug targets which raise the attention to biologists at present. We denote that it is hard to discover drug targets based only on differential expression changes due to the fact that those genes used to be drug targets may not always have significant expression changes. 
Comparing to previous methods that depend on the network topology attributes, they turn out that the genes having potential as drug targets are weakly correlated to critical points in a network. In comparison with previous methods, our results have highest mean average precision and also rank the position of the truly drug targets higher. It thereby verifies the effectiveness of our method.</p><p><strong>Conclusions: </strong>Our method does not know the real ideal routes in the disease network but it tries to find the feasible flow to give a strong influence to the disease genes through possible paths. We successfully formulate the identification of drug target prediction as a maximum flow problem on biological networks and discover potential drug targets in an accurate manner.</p>","PeriodicalId":73663,"journal":{"name":"Journal of clinical bioinformatics","volume":"2 1","pages":"1"},"PeriodicalIF":0.0,"publicationDate":"2012-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3285036/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30381758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mark Tw Ebbert, Roy Rl Bastien, Kenneth M Boucher, Miguel Martín, Eva Carrasco, Rosalía Caballero, Inge J Stijleman, Philip S Bernard, Julio C Facelli
Background: Multivariate assays (MVAs) for assisting clinical decisions are becoming commonly available but, due to their complexity, are often considered a high-risk approach. A key concern is that uncertainty in the assay's final results is not well understood. This study focuses on developing a process to characterize the error introduced into the MVA's results by the intrinsic error in the laboratory process: sample preparation and measurement of the contributing factors, such as gene expression.
Methods: Using the PAM50 Breast Cancer Intrinsic Classifier, we show how to characterize error within an MVA, and how these errors may affect results reported to clinicians. First we estimated the error distribution for measured factors within the PAM50 assay by performing repeated measures on four archetypal samples representative of the major breast cancer tumor subtypes. Then, using the error distributions and the original archetypal sample data, we used Monte Carlo simulations to generate a sufficient number of simulated samples. The effect of these errors on the PAM50 tumor subtype classification was estimated by measuring subtype reproducibility after classifying all simulated samples. Subtype reproducibility was measured as the percentage of simulated samples classified identically to the parent sample. The simulation was thereafter repeated on a large, independent data set of samples from the GEICAM 9906 clinical trial. Simulated samples from the GEICAM sample set were used to explore a more realistic scenario where, unlike archetypal samples, many samples are not easily classified.
Results: All simulated samples derived from the archetypal samples were classified identically to the parent sample. Subtypes for simulated samples from the GEICAM set were also highly reproducible, but there was a non-negligible number of samples that exhibited significant variability in their classification.
Conclusions: We have developed a general methodology to estimate the effects of intrinsic errors within MVAs. We have applied the method to the PAM50 assay, showing that the PAM50 results are resilient to intrinsic errors within the assay, but also finding that in non-archetypal samples, experimental errors can lead to quite different classifications of a tumor. Finally, we propose a way to present the uncertainty information to clinicians in a usable form.
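The simulation procedure described in the Methods can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the PAM50 pipeline: measurement error is assumed to be independent Gaussian noise per factor, classification is assumed to be nearest-centroid with squared Euclidean distance, and the centroids, sample values and function names are hypothetical.

```python
import random

def nearest_centroid(sample, centroids):
    """Return the label of the centroid closest to `sample` (squared Euclidean)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(sample, centroids[label]))

def subtype_reproducibility(parent, centroids, sigma, n_sim=1000, seed=0):
    """Percentage of simulated samples classified identically to the parent sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    parent_label = nearest_centroid(parent, centroids)
    same = 0
    for _ in range(n_sim):
        # Perturb each measured factor with its own (assumed Gaussian) error.
        simulated = [x + rng.gauss(0.0, s) for x, s in zip(parent, sigma)]
        if nearest_centroid(simulated, centroids) == parent_label:
            same += 1
    return 100.0 * same / n_sim

# Hypothetical two-factor example with two subtypes.
centroids = {"A": [0.0, 0.0], "B": [10.0, 10.0]}
archetypal = subtype_reproducibility([0.0, 0.0], centroids, [0.1, 0.1])
borderline = subtype_reproducibility([5.0, 5.0], centroids, [0.1, 0.1])
```

A sample sitting on a centroid keeps its label under small noise, whereas a sample midway between centroids flips between subtypes, which mirrors the contrast between archetypal and non-archetypal samples reported in the Results.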
{"title":"Characterization of uncertainty in the classification of multivariate assays: application to PAM50 centroid-based genomic predictors for breast cancer treatment plans.","authors":"Mark Tw Ebbert, Roy Rl Bastien, Kenneth M Boucher, Miguel Martín, Eva Carrasco, Rosalía Caballero, Inge J Stijleman, Philip S Bernard, Julio C Facelli","doi":"10.1186/2043-9113-1-37","DOIUrl":"https://doi.org/10.1186/2043-9113-1-37","url":null,"abstract":"<p><strong>Background: </strong>Multivariate assays (MVAs) for assisting clinical decisions are becoming commonly available, but due to complexity, are often considered a high-risk approach. A key concern is that uncertainty on the assay's final results is not well understood. This study focuses on developing a process to characterize error introduced in the MVA's results from the intrinsic error in the laboratory process: sample preparation and measurement of the contributing factors, such as gene expression.</p><p><strong>Methods: </strong>Using the PAM50 Breast Cancer Intrinsic Classifier, we show how to characterize error within an MVA, and how these errors may affect results reported to clinicians. First we estimated the error distribution for measured factors within the PAM50 assay by performing repeated measures on four archetypal samples representative of the major breast cancer tumor subtypes. Then, using the error distributions and the original archetypal sample data, we used Monte Carlo simulations to generate a sufficient number of simulated samples. The effect of these errors on the PAM50 tumor subtype classification was estimated by measuring subtype reproducibility after classifying all simulated samples. Subtype reproducibility was measured as the percentage of simulated samples classified identically to the parent sample. The simulation was thereafter repeated on a large, independent data set of samples from the GEICAM 9906 clinical trial. 
Simulated samples from the GEICAM sample set were used to explore a more realistic scenario where, unlike archetypal samples, many samples are not easily classified.</p><p><strong>Results: </strong>All simulated samples derived from the archetypal samples were classified identically to the parent sample. Subtypes for simulated samples from the GEICAM set were also highly reproducible, but there were a non-negligible number of samples that exhibit significant variability in their classification.</p><p><strong>Conclusions: </strong>We have developed a general methodology to estimate the effects of intrinsic errors within MVAs. We have applied the method to the PAM50 assay, showing that the PAM50 results are resilient to intrinsic errors within the assay, but also finding that in non-archetypal samples, experimental errors can lead to quite different classification of a tumor. Finally we propose a way to provide the uncertainty information in a usable way for clinicians.</p>","PeriodicalId":73663,"journal":{"name":"Journal of clinical bioinformatics","volume":"1 ","pages":"37"},"PeriodicalIF":0.0,"publicationDate":"2011-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/2043-9113-1-37","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30347520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}