Journal of clinical bioinformatics: latest articles, page 9

An eUtils toolset and its use for creating a pipeline to link genomics and proteomics analyses to domain-specific biomedical literature.

Journal of clinical bioinformatics

Pub Date : 2012-04-16 DOI: 10.1186/2043-9113-2-9

Prakash M Nadkarni, Chirag R Parikh

Background: Numerous biomedical software applications access databases maintained by the US National Center for Biotechnology Information (NCBI). To ease software automation, NCBI provides a powerful but complex Web-service-based programming interface, eUtils. This paper describes a toolset that simplifies eUtils use through a graphical front-end that can be used by non-programmers to construct data-extraction pipelines. The front-end relies on a code library that provides high-level wrappers around eUtils functions, and which is distributed as open-source, allowing customization and enhancement by individuals with programming skills.

Methods: We initially created an application that queried eUtils to retrieve nephrology-specific biomedical literature citations for a user-definable set of genes. We later augmented the application code to create a general-purpose library that accesses eUtils capability as individual functions that could be combined into user-defined pipelines.

Results: The toolset's use is illustrated with an application that serves as a front-end to the library and can be used by non-programmers to construct user-defined pipelines. The operation of the library is illustrated for the literature-surveillance application, which serves as a case-study. An overview of the library is also provided.

Conclusions: The library simplifies use of the eUtils service by operating at a higher level, and also transparently addresses robustness issues that would need to be individually implemented otherwise, such as error recovery and prevention of overloading of the eUtils service.
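The robustness concerns named in the Conclusions (error recovery and avoiding overload of the eUtils service) can be illustrated with a minimal Python sketch. This is not the authors' library: `ThrottledClient`, `build_esearch_url`, the attempt count and the backoff schedule are hypothetical choices made here. The base URL and the `esearch.fcgi` endpoint with its `db`, `term` and `retmax` parameters belong to NCBI's public E-utilities interface; the 0.34 s spacing assumes NCBI's guideline of at most three requests per second without an API key.

```python
import time
import urllib.parse
import urllib.request

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="pubmed", retmax=20):
    """Build an ESearch URL for a query term (helper name is hypothetical)."""
    query = urllib.parse.urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


def http_get(url):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


class ThrottledClient:
    """Spaces out requests and retries transient failures with backoff."""

    def __init__(self, fetch=http_get, min_interval=0.34, max_attempts=3,
                 sleep=time.sleep, clock=time.monotonic):
        self.fetch = fetch
        self.min_interval = min_interval  # assumed: about 3 requests/second
        self.max_attempts = max_attempts  # assumed value, must be >= 1
        self.sleep = sleep
        self.clock = clock
        self.last_call = float("-inf")    # no request has been sent yet

    def get(self, url):
        last_error = None
        for attempt in range(self.max_attempts):
            # Throttle: wait until min_interval has passed since the last call.
            wait = self.min_interval - (self.clock() - self.last_call)
            if wait > 0:
                self.sleep(wait)
            self.last_call = self.clock()
            try:
                return self.fetch(url)
            except OSError as err:  # urllib's URLError is a subclass of OSError
                last_error = err
                if attempt < self.max_attempts - 1:
                    self.sleep(2 ** attempt)  # back off 1 s, 2 s, ...
        raise last_error
```

A caller would build a query such as `build_esearch_url("NGAL AND kidney")` (the gene symbol is only an example) and pass it to `ThrottledClient().get(...)`. Injecting `fetch`, `sleep` and `clock` keeps the throttling logic testable without network access.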

Citations: 8

Functional repertoire, molecular pathways and diseases associated with 3D domain swapping in the human proteome.

Journal of clinical bioinformatics

Pub Date : 2012-04-03 DOI: 10.1186/2043-9113-2-8

Khader Shameer, Ramanathan Sowdhamini

Background: 3D domain swapping is a novel structural phenomenon observed in a diverse set of protein structures in oligomeric conformations. It is characterized by a distinct structural feature: structural segments of a protein dimer or higher oligomer are shared between two or more chains of the protein structure. 3D domain swapping has been observed as a key mediator of numerous functional mechanisms and plays a pathogenic role in various diseases, including conformational diseases such as amyloidosis, Alzheimer's disease, Parkinson's disease and prion diseases. We report the first study focused on identifying functional classes, pathways and diseases mediated by 3D domain swapping in the human proteome.

Methods: We used a panel of four enrichment tools with two different ontologies and two annotation databases to derive biologically and clinically relevant information associated with 3D domain swapping. Protein domain enrichment analysis followed by Gene Ontology (GO) term enrichment analysis revealed the functional repertoire of proteins involved in swapping. Pathway analysis using KEGG annotations revealed diverse pathway associations of human proteins involved in 3D domain swapping. Disease Ontology was used to find statistically significant associations between proteins in swapped conformation and various disease categories (P-value < 0.05).

Results: We report meta-analysis results of a literature-curated dataset of human gene products involved in 3D domain swapping and discuss new insights about the functional repertoire, pathway associations and disease implications of proteins involved in 3D domain swapping.

Conclusions: Our integrated bioinformatics pipeline, comprising four different enrichment tools, two ontologies and two annotation databases, revealed new insights into the functional and disease correlations with 3D domain swapping. GO term enrichment was used to infer terms associated with three different GO categories. Protein domain enrichment was used to identify conserved domains enriched in swapped proteins. Pathway enrichment analysis using KEGG annotations revealed that proteins with swapped conformations are present in all six classes of the KEGG BRITE hierarchy, and significantly enriched KEGG pathways were observed in five classes. Using functional Disease Ontology based enrichment analysis, five major classes of human disease were found to be significantly associated with 3D domain swapping: cancer, diseases of the respiratory or pulmonary system, degenerative diseases of the central nervous system, vascular disease and encephalitis. In conclusion, our study shows that bioinformatics-based analytical approaches using curated data can enhance the understanding of the functional and disease implications of 3D domain swapping.
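Term and pathway enrichment of the kind described in this abstract is commonly scored with a one-sided hypergeometric test. The abstract does not state which statistic each of the four tools applies, so the Python sketch below is only an illustration of that common approach; the function name and the counts in the usage note are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, observed):
    """One-sided enrichment p-value P(X >= observed).

    population: number of background proteins
    annotated:  background proteins carrying the term of interest
    drawn:      size of the study set (e.g. swapped proteins)
    observed:   study-set proteins carrying the term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # Sum the hypergeometric probability mass from `observed` up to the maximum.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return tail / total
```

For example, with a made-up background of 10 proteins of which 4 carry a term, a study set of 3 proteins containing at least 2 annotated ones gives `hypergeom_enrichment_p(10, 4, 3, 2)`, which works out to 40/120. A term would pass the abstract's cutoff when this value is below 0.05, ideally after correction for multiple testing.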

Citations: 11

Bioinformatics as a driver, not a passenger, of translational biomedical research: Perspectives from the 6th Benelux bioinformatics conference.

Journal of clinical bioinformatics

Pub Date : 2012-03-13 DOI: 10.1186/2043-9113-2-7

Francisco J Azuaje, Michaël Heymann, Anne-Marie Ternes, Anke Wienecke-Baldacchino, Daniel Struck, Danièle Moes, Reinhard Schneider

The 6th Benelux Bioinformatics Conference (BBC11), held in Luxembourg on 12 and 13 December 2011, attracted around 200 participants, including internationally renowned guest speakers, and more than 100 peer-reviewed submissions from 3 continents. Researchers from the public and private sectors convened at BBC11 to discuss advances and challenges in a wide spectrum of application areas. A key theme of the conference was the contribution of bioinformatics to enabling and accelerating translational and clinical research. BBC11 stressed the need for stronger collaborative efforts across disciplines and institutions. Demonstrating the clinical relevance of systems approaches and of next-generation sequencing-based measurement technologies is among the existing opportunities for increasing impact in translational research. Translational bioinformatics will benefit from research models that strike a balance between the importance of protecting intellectual property and the need to openly access scientific and technological advances. The full conference proceedings are freely available at http://www.bbc11.lu.

Citations: 11

Gene expression profiling of peripheral blood mononuclear cells in the setting of peripheral arterial disease.

Journal of clinical bioinformatics

Pub Date : 2012-03-12 DOI: 10.1186/2043-9113-2-6

Rizwan Masud, Khader Shameer, Aparna Dhar, Keyue Ding, Iftikhar J Kullo

Background: Peripheral arterial disease (PAD) is a relatively common manifestation of systemic atherosclerosis that leads to progressive narrowing of the lumen of leg arteries. Circulating monocytes are in contact with the arterial wall and can serve as reporters of vascular pathology in the setting of PAD. We performed gene expression analysis of peripheral blood mononuclear cells (PBMC) in patients with PAD and controls without PAD to identify differentially regulated genes.

Methods: PAD was defined as an ankle brachial index (ABI) ≤0.9 (n = 19), while age- and gender-matched controls had an ABI > 1.0 (n = 18). Microarray analysis was performed using Affymetrix HG-U133 plus 2.0 gene chips and analyzed using GeneSpring GX 11.0. Gene expression data were normalized using the Robust Multichip Analysis (RMA) normalization method. Differential expression was defined as a fold change ≥1.5, followed by an unpaired Mann-Whitney test (P < 0.05) and correction for multiple testing by the Benjamini and Hochberg False Discovery Rate. Meta-analysis of differentially expressed genes was performed using an integrated bioinformatics pipeline with tools for enrichment analysis using Gene Ontology (GO) terms, pathway analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG), molecular event enrichment using Reactome annotations and network analysis using the Ingenuity Pathway Analysis suite. Extensive biocuration was also performed to understand the functional context of genes.

Results: We identified 87 genes differentially expressed in the setting of PAD; 40 genes were upregulated and 47 genes were downregulated. We employed an integrated bioinformatics pipeline coupled with literature curation to characterize the functional coherence of differentially regulated genes.

Conclusion: Notably, upregulated genes mediate immune response, inflammation, apoptosis, stress response, phosphorylation, hemostasis, platelet activation and platelet aggregation. Downregulated genes included several genes from the zinc finger family that are involved in transcriptional regulation. These results provide insights into molecular mechanisms relevant to the pathophysiology of PAD.
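Two of the filtering steps named in the Methods, the fold-change cutoff and the Benjamini and Hochberg correction, can be sketched in a few lines of Python. This is an illustration, not the GeneSpring implementation: the function names are hypothetical, the fold change is assumed to be computed from group means of log2-scale expression values (the scale RMA produces), and the Mann-Whitney test itself is left out.

```python
import math


def passes_fold_change(mean_log2_cases, mean_log2_controls, threshold=1.5):
    """True if the fold change between group means is >= threshold in
    either direction. Inputs are assumed to be log2 expression values."""
    return abs(mean_log2_cases - mean_log2_controls) >= math.log2(threshold)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A gene would then be retained when it passes the fold-change filter and its adjusted p-value stays below the chosen false discovery rate.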

Citations: 0

Monte Carlo simulation of the Spearman-Kaerber TCID50.

Journal of clinical bioinformatics

Pub Date : 2012-02-13 DOI: 10.1186/2043-9113-2-5

Niels H Wulff, Maria Tzatzaris, Philip J Young

Background: In the biological sciences the TCID50 (median tissue culture infective dose) assay is often used to determine the strength of a virus.

Methods: When the so-called Spearman-Kaerber calculation is used, the ratio between the pfu (the number of plaque forming units, the effective number of virus particles) and the TCID50 theoretically approaches a simple function of Euler's constant. Further, the standard deviation of the logarithm of the TCID50 approaches a simple function of the dilution factor and the number of wells used for determining the ratios in the assay. However, these theoretical calculations assume that the dilutions of the assay are independent, and in practice this is not completely correct. The assay was simulated using Monte Carlo techniques.

Results: Our simulation studies show that the theoretical results actually hold true for practical implementations of the assay. Furthermore, the simulation studies show that the distribution of (the log of) the TCID50, although discrete in nature, has a close relationship to the normal distribution.

Conclusion: The pfu is proportional to the TCID50 titre with a factor of about 0.56 when using the Spearman-Kaerber calculation method. The normal distribution can be used for statistical inferences, and ANOVA on (the log of) TCID50 values is meaningful with group sizes of 5 and above.
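The relationship between pfu and the Spearman-Kaerber TCID50 can be explored with a small Monte Carlo sketch in Python. It is not the authors' simulation. It assumes the idealized case in which every well receives an independent Poisson-distributed number of particles, so it does not model the dependence between serial dilutions that the Methods mention. It also assumes that the first dilution is fully positive and the last fully negative. The function names, the tenfold dilution series, 8 wells per dilution and the starting titre are all illustrative choices.

```python
import math
import random


def simulate_log10_tcid50(pfu_per_well, dilution_factor=10.0,
                          n_dilutions=12, wells=8, rng=random):
    """Simulate one assay and return the Spearman-Kaerber log10 TCID50
    (per inoculum volume of undiluted stock)."""
    positive_fraction_sum = 0.0
    for step in range(1, n_dilutions + 1):
        expected_particles = pfu_per_well / dilution_factor ** step
        # A well is infected if it receives at least one particle (Poisson).
        p_infected = 1.0 - math.exp(-expected_particles)
        positives = sum(rng.random() < p_infected for _ in range(wells))
        positive_fraction_sum += positives / wells
    # Spearman-Kaerber: endpoint (in dilution steps) = 0.5 + sum of the
    # positive fractions, counting from the first dilution step.
    steps_to_endpoint = 0.5 + positive_fraction_sum
    return steps_to_endpoint * math.log10(dilution_factor)


def estimate_pfu_to_tcid50_ratio(pfu_per_well=1e6, runs=2000, seed=1):
    """Ratio of the true pfu to the geometric-mean simulated TCID50."""
    rng = random.Random(seed)
    mean_log10 = sum(
        simulate_log10_tcid50(pfu_per_well, rng=rng) for _ in range(runs)
    ) / runs
    return pfu_per_well / 10 ** mean_log10
```

Under these assumptions the infection probability follows a Gumbel-type curve in log dilution, whose mean gives a theoretical ratio of exp(-γ) ≈ 0.56, with γ being Euler's constant. That agrees with the factor quoted in the Conclusion, and `estimate_pfu_to_tcid50_ratio()` should land close to it, up to simulation noise and a small discretization effect from the tenfold steps.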

Citations: 68

A distinct metabolic signature predicts development of fasting plasma glucose.

Journal of clinical bioinformatics

Pub Date : 2012-02-02 DOI: 10.1186/2043-9113-2-3

Manuela Hische, Abdelhalim Larhlimi, Franziska Schwarz, Antje Fischer-Rosinský, Thomas Bobbert, Anke Assmann, Gareth S Catchpole, Andreas Fh Pfeiffer, Lothar Willmitzer, Joachim Selbig, Joachim Spranger

Background: High blood glucose and diabetes are amongst the conditions causing the greatest losses in years of healthy life worldwide. Therefore, numerous studies aim to identify reliable risk markers for development of impaired glucose metabolism and type 2 diabetes. However, the molecular basis of impaired glucose metabolism is so far insufficiently understood. The development of so-called 'omics' approaches in recent years promises to identify molecular markers and to further the understanding of the molecular basis of impaired glucose metabolism and type 2 diabetes. Although univariate statistical approaches are often applied, we demonstrate here that the application of multivariate statistical approaches is highly recommended to fully capture the complexity of data gained using high-throughput methods.

Methods: We took blood plasma samples from 172 subjects who participated in the prospective Metabolic Syndrome Berlin Potsdam follow-up study (MESY-BEPO Follow-up). We analysed these samples using Gas Chromatography coupled with Mass Spectrometry (GC-MS), and measured 286 metabolites. Furthermore, fasting glucose levels were measured using standard methods at baseline, and after an average of six years. We did correlation analysis and built linear regression models as well as Random Forest regression models to identify metabolites that predict the development of fasting glucose in our cohort.

Results: We found a metabolic pattern consisting of nine metabolites that predicted fasting glucose development with an accuracy of 0.47 in tenfold cross-validation using Random Forest regression. We also showed that adding established risk markers did not improve the model accuracy. However, external validation is eventually desirable. Although not all metabolites belonging to the final pattern are identified yet, the pattern directs attention to amino acid metabolism, energy metabolism and redox homeostasis.

Conclusions: We demonstrate that metabolites identified using a high-throughput method (GC-MS) perform well in predicting the development of fasting plasma glucose over several years. Notably, it is not a single metabolite but a complex pattern of metabolites that drives the prediction, which reflects the complexity of the underlying molecular mechanisms. This result could only be captured by application of multivariate statistical approaches. Therefore, we strongly recommend the use of statistical methods that capture the complexity of the information provided by high-throughput methods.
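The tenfold cross-validation mentioned in the Results can be sketched as follows. To stay dependency-free, this Python example swaps the Random Forest regressor for a one-variable least-squares line, so it shows only the validation scheme and not the authors' model. The abstract does not define its "accuracy" figure, so scoring by the coefficient of determination (R²) is an assumption, and all function names are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def cross_validated_r2(xs, ys, k=10, seed=0):
    """R^2 computed from predictions made only on held-out folds."""
    predictions = [0.0] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in fold:
            predictions[i] = slope * xs[i] + intercept
    my = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

Because every prediction comes from a model that never saw that sample, the score is a less optimistic estimate than a fit on the full data set, which is the reason for reporting cross-validated performance.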

Citations: 8

Analysis of the salivary microbiome using culture-independent techniques.

Journal of clinical bioinformatics

Pub Date : 2012-02-02 DOI: 10.1186/2043-9113-2-4

Vladimir Lazarevic, Katrine Whiteson, Nadia Gaïa, Yann Gizard, David Hernandez, Laurent Farinelli, Magne Osterås, Patrice François, Jacques Schrenzel

Background: The salivary microbiota is a potential diagnostic indicator of several diseases. Culture-independent techniques are required to study the salivary microbial community since many of its members have not been cultivated.

Methods: We explored the bacterial community composition in the saliva sample using metagenomic whole genome shotgun (WGS) sequencing, the extraction of 16S rRNA gene fragments from metagenomic sequences (16S-WGS) and high-throughput sequencing of PCR-amplified bacterial 16S rDNA gene (16S-HTS) regions V1 and V3.

Results: The hierarchical clustering of data based on the relative abundance of bacterial genera revealed that distances between 16S-HTS datasets for V1 and V3 regions were greater than those obtained for the same V region with different numbers of PCR cycles. Datasets generated by 16S-HTS and 16S-WGS were even more distant. Finally, comparison of WGS and 16S-based datasets revealed the highest dissimilarity. The analysis of the 16S-HTS, WGS and 16S-WGS datasets revealed 206, 56 and 39 bacterial genera, respectively, 124 of which have not been previously identified in salivary microbiomes. A large fraction of DNA extracted from saliva corresponded to human DNA. Based on sequence similarity search against completely sequenced genomes, bacterial and viral sequences represented 0.73% and 0.0036% of the salivary metagenome, respectively. Several sequence reads were identified as parts of the human herpesvirus 7.

Conclusions: Analysis of the salivary metagenome may have implications for diagnostics, e.g. in the detection of microorganisms and viruses without the need to design specific tests for each pathogen.
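
As a rough illustration of the dataset comparison reported in the Results above, the sketch below converts genus-level read counts to relative abundances and computes a pairwise dissimilarity between datasets, which is the kind of distance a hierarchical clustering would start from. The abstract does not name the distance measure, so Bray-Curtis dissimilarity is an assumption here, and the genus labels and counts are hypothetical placeholders rather than data from the study.

```python
def relative_abundance(counts):
    """Convert raw per-genus read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("dataset contains no reads")
    return {genus: n / total for genus, n in counts.items()}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    genera = set(p) | set(q)
    diff = sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in genera)
    total = sum(p.get(g, 0.0) + q.get(g, 0.0) for g in genera)
    return diff / total


# Hypothetical toy counts: two runs of the same region and one run of another region.
v1_run_a = relative_abundance({"genus_A": 60, "genus_B": 30, "genus_C": 10})
v1_run_b = relative_abundance({"genus_A": 58, "genus_B": 32, "genus_C": 10})
v3_run = relative_abundance({"genus_A": 30, "genus_B": 50, "genus_D": 20})

same_region = bray_curtis(v1_run_a, v1_run_b)
cross_region = bray_curtis(v1_run_a, v3_run)
print(same_region, cross_region)
```

In this toy case the two runs of the same region are closer to each other than to the run of the other region. A full analysis would feed the complete pairwise distance matrix into an agglomerative clustering routine.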

Citations: 62

Brain cancer prognosis: independent validation of a clinical bioinformatics approach.

Journal of clinical bioinformatics

Pub Date : 2012-02-01 DOI: 10.1186/2043-9113-2-2

Raffaele Fronza, Michele Tramonti, William R Atchley, Christine Nardini

Translational and evidence-based medicine can take advantage of biotechnology advances that offer a fast-growing variety of high-throughput data for screening molecular activity at the genomic, transcriptional, post-transcriptional and translational levels. The clinical information hidden in these data can be clarified with clinical bioinformatics approaches. We have recently proposed a method to analyze different layers of high-throughput (omic) data while preserving the emergent properties that appear in the cellular system when all molecular levels interact. We show here that this method, applied to brain cancer data, can uncover properties (i.e. molecules related to protective versus risky features in different types of brain cancers) that have been independently validated as survival markers, with potentially important applications in clinical practice.

Citations: 3

A network flow approach to predict drug targets from microarray data, disease genes and interactome network - case study on prostate cancer.

Journal of clinical bioinformatics

Pub Date : 2012-01-13 DOI: 10.1186/2043-9113-2-1

Shih-Heng Yeh, Hsiang-Yuan Yeh, Von-Wun Soo

Background: A systematic approach to drug discovery is an emerging discipline in systems biology research. It aims to integrate interaction data and experimental data to elucidate diseases, and it raises new issues in drug discovery for cancer treatment. However, drug target discovery is still at a trial-and-error experimental stage, and developing a prediction model that can systematically detect possible drug targets for complex diseases remains a challenging task.

Methods: We integrate gene expression, disease genes and interaction networks to identify effective drug targets that have a strong influence on disease genes, using a network flow approach. In the experiments, our test data comprise a microarray dataset containing 62 prostate cancer samples and 41 normal samples, 108 known prostate cancer genes, and 322 approved drug targets in humans extracted from the DrugBank database, which serve as candidate proteins. Using our method, we prioritize the candidate proteins and validate them against the known prostate cancer drug targets.

Results: We successfully identify potential drug targets that are strongly related to well-known drugs for prostate cancer treatment, and we also discover further potential drug targets that are of current interest to biologists. We note that it is hard to discover drug targets based only on differential expression, because genes that are drug targets may not always show significant expression changes. In contrast to previous methods that depend on network topology attributes, we find that genes with potential as drug targets are only weakly correlated with critical points in a network. Compared with previous methods, our results have the highest mean average precision and rank the true drug targets higher. This verifies the effectiveness of our method.

Conclusions: Our method does not know the true ideal routes in the disease network, but it tries to find a feasible flow that exerts a strong influence on the disease genes through possible paths. We successfully formulate drug target prediction as a maximum flow problem on biological networks and discover potential drug targets accurately.
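
To make the maximum flow formulation mentioned above more concrete, the following is a minimal sketch of the standard Edmonds-Karp algorithm applied to a toy directed network. It is not the authors' implementation: the abstract does not describe how edge capacities are derived from expression data, so the node names (`T1`, `T2` for candidate targets, `P1`, `P2` for intermediate proteins, `D` for a disease gene) and the capacities are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow; `capacity` maps u -> {v: capacity of edge u->v}."""
    if source == sink:
        raise ValueError("source and sink must differ")
    # Build a residual graph that also holds zero-capacity reverse edges.
    residual = {source: {}, sink: {}}
    for u, neighbours in capacity.items():
        for v, cap in neighbours.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + cap
            residual[v].setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for the shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left
        # Find the bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical toy interaction network.
toy_network = {
    "T1": {"P1": 3, "P2": 2},
    "T2": {"P2": 1},
    "P1": {"D": 2},
    "P2": {"D": 2},
}
# Rank candidates by how much flow they can deliver to the disease gene.
scores = {t: max_flow(toy_network, t, "D") for t in ("T1", "T2")}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Under these assumed capacities, `T1` can deliver more flow to `D` than `T2` and is therefore ranked first. Ranking by total flow is one plausible reading of "strong influence on disease genes", not a detail confirmed by the abstract.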

Citations: 0

Characterization of uncertainty in the classification of multivariate assays: application to PAM50 centroid-based genomic predictors for breast cancer treatment plans.

Journal of clinical bioinformatics

Pub Date : 2011-12-23 DOI: 10.1186/2043-9113-1-37

Mark Tw Ebbert, Roy Rl Bastien, Kenneth M Boucher, Miguel Martín, Eva Carrasco, Rosalía Caballero, Inge J Stijleman, Philip S Bernard, Julio C Facelli

Background: Multivariate assays (MVAs) for assisting clinical decisions are becoming commonly available but, owing to their complexity, are often considered a high-risk approach. A key concern is that uncertainty in the assay's final results is not well understood. This study focuses on developing a process to characterize the error introduced into an MVA's results by the intrinsic error of the laboratory process, namely sample preparation and measurement of the contributing factors, such as gene expression.

Methods: Using the PAM50 Breast Cancer Intrinsic Classifier, we show how to characterize error within an MVA, and how these errors may affect results reported to clinicians. First we estimated the error distribution for measured factors within the PAM50 assay by performing repeated measures on four archetypal samples representative of the major breast cancer tumor subtypes. Then, using the error distributions and the original archetypal sample data, we used Monte Carlo simulations to generate a sufficient number of simulated samples. The effect of these errors on the PAM50 tumor subtype classification was estimated by measuring subtype reproducibility after classifying all simulated samples. Subtype reproducibility was measured as the percentage of simulated samples classified identically to the parent sample. The simulation was thereafter repeated on a large, independent data set of samples from the GEICAM 9906 clinical trial. Simulated samples from the GEICAM sample set were used to explore a more realistic scenario where, unlike archetypal samples, many samples are not easily classified.

Results: All simulated samples derived from the archetypal samples were classified identically to the parent sample. Subtypes for simulated samples from the GEICAM set were also highly reproducible, but a non-negligible number of samples exhibited significant variability in their classification.

Conclusions: We have developed a general methodology to estimate the effects of intrinsic errors within MVAs. We have applied the method to the PAM50 assay, showing that the PAM50 results are resilient to intrinsic errors within the assay, but also finding that in non-archetypal samples, experimental errors can lead to quite different classifications of a tumor. Finally, we propose a way to present the uncertainty information to clinicians in a usable form.
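
The simulation procedure described in the Methods above can be sketched as follows: perturb a parent sample with measurement noise, classify each simulated sample against fixed centroids, and report the percentage that keep the parent's subtype call. This is a simplified illustration under stated assumptions. The noise is taken to be independent Gaussian per gene, the classifier uses squared Euclidean distance to the centroid as a stand-in for whatever similarity measure the actual PAM50 classifier uses, and the two-gene centroids, subtype labels and sigma values are hypothetical.

```python
import random


def nearest_centroid(sample, centroids):
    """Return the label of the centroid closest to `sample` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(sample, centroids[label]))


def subtype_reproducibility(parent, centroids, sigma, n_sim=1000, seed=0):
    """Percentage of noisy copies of `parent` classified identically to `parent`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    parent_label = nearest_centroid(parent, centroids)
    same = 0
    for _ in range(n_sim):
        # Add independent Gaussian measurement error to each gene (assumed error model).
        simulated = [x + rng.gauss(0.0, s) for x, s in zip(parent, sigma)]
        if nearest_centroid(simulated, centroids) == parent_label:
            same += 1
    return 100.0 * same / n_sim


# Hypothetical two-gene centroids for two made-up subtypes.
centroids = {"subtype_A": [0.0, 0.0], "subtype_B": [10.0, 10.0]}
sigma = [0.1, 0.1]

archetypal = subtype_reproducibility([0.5, -0.2], centroids, sigma)
borderline = subtype_reproducibility([5.0, 4.9], centroids, sigma)
print(archetypal, borderline)
```

In this toy setup, a sample sitting close to one centroid keeps its call in every simulation, while a sample near the boundary between the two centroids changes class in a noticeable share of runs. This mirrors the qualitative contrast the abstract draws between archetypal and non-archetypal samples.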

Citations: 27