
Journal of Clinical Bioinformatics: Latest Articles

Huvariome: a web server resource of whole genome next-generation sequencing allelic frequencies to aid in pathological candidate gene selection.
Pub Date : 2012-11-19 DOI: 10.1186/2043-9113-2-19
Andrew Stubbs, Elizabeth A McClellan, Sebastiaan Horsman, Saskia D Hiltemann, Ivo Palli, Stephan Nouwens, Anton Hj Koning, Frits Hoogland, Joke Reumers, Daphne Heijsman, Sigrid Swagemakers, Andreas Kremer, Jules Meijerink, Diether Lambrechts, Peter J van der Spek

Background: Next-generation sequencing provides clinical research scientists with a direct readout of innumerable variants, including personal, pathological and common benign variants. The aim of resequencing studies is to determine the candidate pathogenic variants from individual genomes, or from family-based or tumor/normal genome comparisons. Whilst appropriate controls within the experimental design will minimize the number of false positive variations selected, this number can be reduced further by using high-quality whole genome reference data to filter out false positive variants prior to candidate gene selection. In addition, platform-related sequencing error models can help recover ambiguous genotypes from lower-coverage data.

Description: We have developed a whole genome database of human genetic variations, Huvariome, determined from whole genome deep sequencing data with high coverage and low error rates. The database was designed to be sequencing-technology independent but is currently populated with 165 individual whole genomes, consisting of small pedigrees and matched tumor/normal samples, sequenced with the Complete Genomics sequencing platform. Common variants have been determined for a Benelux population cohort and are represented as genotypes alongside the results of two sets of control data (73 of the 165 genomes): Huvariome Core, which comprises 31 healthy individuals from the Benelux region, and the Diversity Panel, consisting of 46 healthy individuals representing 10 different populations and 21 samples in three pedigrees. Users can query the database by gene or position via a web interface, and the results are displayed as the frequency of the variations as detected in the datasets. We demonstrate that Huvariome can provide accurate reference allele frequencies to disambiguate sequencing inconsistencies produced in resequencing experiments. Huvariome has been used to support the selection of candidate cardiomyopathy-related genes which have a homozygous genotype in the reference cohorts. The database allows users to see which selected variants are common variants (> 5% minor allele frequency) in the Huvariome Core samples, thus aiding the selection of potentially pathogenic variants by filtering out common variants that are not listed in one of the other public genomic variation databases. The no-call rate and the accuracy of allele calling in Huvariome give the user the possibility of identifying platform-dependent errors associated with specific regions of the human genome.

Conclusion: Huvariome is a simple-to-use resource for validation of resequencing results obtained by NGS experiments. The high sequence coverage and low error rates provide scientists with the ability to remove false positive results from pedigree studies. Results are returned via a web interface that displays location-based genetic variation frequency, impact on protein function, association with known genetic variations and a quality score of the variation base derived from Huvariome Core and the Diversity Panel data. These results may be used to identify and prioritize rare variants that, for example, might be disease relevant. In testing the accuracy of the Huvariome database, alleles of a selection of ambiguously called coding single nucleotide variants were successfully predicted in all cases. Data protection of individuals is ensured by restricted access to patient-derived genomes from the host institution, which is relevant for future molecular diagnostics.
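The common-variant filter described in the Huvariome abstract (variants above 5% frequency in the reference cohort are treated as common) can be sketched roughly as follows. This is a minimal illustration, not Huvariome's implementation: the genotype encoding ("AG", with "NN" for a no-call), the site keys and the function names are all assumptions made for the example, and the sketch uses the frequency of the candidate allele as a stand-in for minor allele frequency.

```python
def allele_frequency(genotypes, allele):
    """Frequency of `allele` among called diploid genotypes.

    `genotypes` is a list of 2-character strings such as "AA" or "AG";
    "NN" marks a no-call (hypothetical encoding) and is skipped.
    Returns None when no genotype was called at the site.
    """
    called = [gt for gt in genotypes if "N" not in gt]
    if not called:
        return None
    return sum(gt.count(allele) for gt in called) / (2 * len(called))


def split_common_rare(candidates, cohort, threshold=0.05):
    """Split candidate (site, allele) pairs into common and rare sites.

    `cohort` maps a site key to the reference cohort's genotypes.
    A candidate is 'common' when its allele frequency exceeds `threshold`;
    sites absent from the cohort are kept as rare (still candidates).
    """
    common, rare = [], []
    for site, allele in candidates:
        freq = allele_frequency(cohort.get(site, []), allele)
        if freq is not None and freq > threshold:
            common.append(site)
        else:
            rare.append(site)
    return common, rare


# Toy reference cohort (made-up data, for illustration only).
cohort = {
    "chr1:100": ["AA", "AG", "GG", "NN", "AA"],
    "chr2:200": ["CC"] * 20,
}
candidates = [("chr1:100", "G"), ("chr2:200", "T"), ("chr3:300", "A")]
common, rare = split_common_rare(candidates, cohort)
```

Skipping no-calls rather than counting them as reference alleles keeps poorly covered sites from looking artificially rare, which is in the spirit of the abstract's remark about no-call rates and platform-dependent errors.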
Citations: 20
The H-factor as a novel quality metric for homology modeling.
Pub Date : 2012-11-02 DOI: 10.1186/2043-9113-2-18
Eric di Luccio, Patrice Koehl

Background: Drug discovery typically starts with the identification of a potential target that is then tested and validated either through high-throughput screening against a library of drug compounds or by rational drug design. When the putative target is a protein, the latter approach requires knowledge of its structure. Finding the structure of a protein is, however, a difficult task. Significant progress has come from high-resolution techniques such as X-ray crystallography and NMR; there are, however, many proteins whose structures have not yet been solved. Computational techniques for structure prediction are viable alternatives to experimental techniques in these cases. However, the proper validation of the structural models they generate remains an issue.

Findings: In this report, we focus on homology modeling techniques and introduce the H-factor, a new indicator for assessing the quality of protein structure models generated with these techniques. The H-factor is meant to mimic the R-factor used in X-ray crystallography. The method for computing the H-factor is fully described with a demonstration of its effectiveness on a test set of target proteins.

Conclusions: We have developed a web service for computing the H-factor for models of a protein structure. This service is freely accessible at http://koehllab.genomecenter.ucdavis.edu/toolkit/h-factor.
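The abstract does not define the H-factor itself, so it is not reproduced here. As background, the crystallographic R-factor that it is said to mimic is commonly written as R = Σ| |F_obs| − |F_calc| | / Σ|F_obs|, where F_obs and F_calc are observed and model-derived structure factor amplitudes. A minimal sketch of that reference quantity, with made-up amplitudes and a hypothetical function name:

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R-factor: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Lower values mean the model agrees better with the observed data.
    This is the standard R-factor, NOT the H-factor from the paper.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(f) for f in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / denominator


# Made-up amplitudes, for illustration only.
perfect = r_factor([10.0, 20.0, 30.0], [10.0, 20.0, 30.0])
imperfect = r_factor([10.0, 20.0, 30.0], [12.0, 18.0, 30.0])
```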

Citations: 5
Modeling autism: a systems biology approach.
Pub Date : 2012-10-08 DOI: 10.1186/2043-9113-2-17
Mary Randolph-Gips, Pramila Srinivasan

Autism is the fastest growing developmental disorder in the world today. The prevalence of autism in the US has risen from 1 in 2500 in 1970 to 1 in 88 children today. People with autism present with repetitive movements and with social and communication impairments. These impairments can range from mild to profound. The estimated total lifetime societal cost of caring for one individual with autism is US$3.2 million. With the rapid growth in this disorder and the great expense of caring for those with autism, it is imperative for both individuals and society that techniques be developed to model and understand autism. There is increasing evidence that individuals diagnosed with autism present with a highly diverse set of abnormalities affecting multiple systems of the body. To date, little to no work has been done using a whole-body systems biology approach to model the characteristics of this disorder. Identifying and modeling these systems might lead to new and improved treatment protocols and to better diagnosis and treatment of the affected systems, which might by itself improve quality of life. In addition, it might also help the core symptoms of autism, given the potential interconnections between the brain and nervous system and all the other systems being modeled. This paper first reviews research showing that autism impacts many systems in the body, including the metabolic, mitochondrial, immunological, gastrointestinal and neurological systems. These systems interact in complex and highly interdependent ways. Many of these disturbances have effects in most of the systems of the body. In particular, clinical evidence exists for increased oxidative stress, inflammation, and immune and mitochondrial dysfunction, which can affect almost every cell in the body. Three promising research areas are discussed: hierarchical, subgroup analysis and modeling over time. This paper reviews some of the systems disturbed in autism and suggests several systems biology research areas. Autism poses a rich test bed for systems biology modeling techniques.

Citations: 34
Cancer classification: Mutual information, target network and strategies of therapy.
Pub Date : 2012-10-02 DOI: 10.1186/2043-9113-2-16
Wen-Chin Hsu, Chan-Cheng Liu, Fu Chang, Su-Shing Chen

Background: Cancer therapy is a challenging research area because side effects often occur in chemotherapy and radiation therapy. We intend to study a multi-target, multi-component design that will provide synergistic results to improve the efficiency of cancer therapy.

Methods: We have developed a general methodology, AMFES (Adaptive Multiple FEature Selection), for ranking and selecting important cancer biomarkers based on SVM (Support Vector Machine) classification. In particular, we exemplify this method with three datasets: a prostate cancer dataset (three stages), a breast cancer dataset (four subtypes), and another prostate cancer dataset (normal vs. cancerous). Moreover, we have computed the target networks of these biomarkers as the signatures of the cancers, with additional information (mutual information between biomarkers of the network). We then propose a robust framework for a synergistic therapy design approach that includes various existing mechanisms.

Results: These methodologies were applied to three GEO datasets: GSE18655 (three prostate stages), GSE19536 (four breast cancer subtypes) and GSE21036 (prostate cancer cells and normal cells). We selected 96 biomarkers for the first prostate cancer dataset (three prostate stages), 72 for breast cancer (luminal A vs. luminal B), 68 for breast cancer (basal-like vs. normal-like), and 22 for the other prostate cancer dataset (cancerous vs. normal). In addition, we obtained statistically significant results of mutual information, which demonstrate that the dependencies among these biomarkers can be positive or negative.

Conclusions: We proposed an efficient feature ranking and selection scheme, AMFES, to select an important subset from a large number of features for any cancer dataset. We thus obtained the signatures of these cancers by building their target networks. Finally, we proposed a robust framework of synergistic therapy for cancer patients. Our framework is not only supported by real GEO datasets but also aims at a multi-target/multi-component drug design tool, which improves on the traditional single-target/single-component analysis methods. This framework builds a computational foundation which can provide a clear classification of cancers and lead to an efficient cancer therapy.
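The mutual information between two biomarkers mentioned in this abstract can be illustrated with the standard plug-in estimate for discrete variables. The abstract does not say how expression values were discretized or how a positive or negative sign was assigned to a dependency, so the sketch below assumes already-discretized levels (0 = low, 1 = high) and computes plain mutual information, which is non-negative and carries no sign. The function name and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of mutual information (in bits) between two
    equally long sequences of discrete values."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    joint = Counter(zip(x, y))   # counts of co-occurring value pairs
    count_x = Counter(x)         # marginal counts for x
    count_y = Counter(y)         # marginal counts for y
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with counts
        mi += (c / n) * log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Toy discretized expression levels for three hypothetical biomarkers.
marker_a = [0, 0, 1, 1]
marker_b = [1, 1, 0, 0]   # perfectly anti-correlated with marker_a
marker_c = [0, 1, 0, 1]   # independent of marker_a in this sample

mi_ab = mutual_information(marker_a, marker_b)
mi_ac = mutual_information(marker_a, marker_c)
```

Note that `marker_a` and `marker_b` move in opposite directions yet share one full bit of information; distinguishing positive from negative dependence needs an extra signed measure, such as a correlation coefficient, alongside mutual information.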

Citations: 9
High-prevalence and broad spectrum of Cell Adhesion and Extracellular Matrix gene pathway mutations in epithelial ovarian cancer.
Pub Date : 2012-09-24 DOI: 10.1186/2043-9113-2-15
Arash Rafii, Najeeb M Halabi, Joel A Malek

Background: Ovarian cancer is the most deadly gynecological cancer because of late diagnosis, frequently with diffuse peritoneal metastases. Recent findings have shown that serous epithelial ovarian cancer has a narrow mutational spectrum with TP53 being the most frequently targeted when single genes are considered. It is, however, important to understand which pathways as a whole may be targeted for mutation.

Findings: Previously published mutational data from The Cancer Genome Atlas network's findings on ovarian cancer were searched for statistically significant enrichment of genes in pathways. These pathways were then searched in all patients to identify the spectrum of mutations. Statistical significance was further shown through in-silico permutations of exome sequences using empirically observed mutation rates. We detected mutations in the cell adhesion pathway genes in more than 89% of serous epithelial ovarian cancer patients. This level of near-universal mutational targeting of the cell adhesion pathway, including the extracellular matrix pathway, is previously unreported in epithelial ovarian cancer.

Conclusions: Taken together with previous studies on the role of cell adhesion and extracellular matrix gene expression in ovarian cancer and metastasis, our results identify pathways for which the mutational prevalence has previously been overlooked using single gene approaches. Analysis of mutations at the pathway level will be critical in studying heterogeneous diseases such as ovarian cancer.
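The pathway-level question in this abstract (what fraction of patients carry at least one mutation in a pathway, and is that more than chance?) can be sketched with a simple permutation test. This is a deliberately simplified stand-in, not the authors' procedure: they permuted exome sequences using empirically observed mutation rates, whereas the sketch below just compares the real pathway against random gene sets of the same size. Gene names, patient data and function names are all made up.

```python
import random


def fraction_hit(patients, gene_set):
    """Fraction of patients with at least one mutated gene in `gene_set`.

    `patients` is a list of sets of mutated gene names."""
    if not patients:
        return 0.0
    return sum(1 for genes in patients if genes & gene_set) / len(patients)


def permutation_p_value(patients, pathway, universe, n_perm=1000, seed=0):
    """Compare the observed hit fraction with random same-size gene sets.

    Returns (observed_fraction, p_value); the +1 terms keep the
    p-value estimate strictly above zero."""
    rng = random.Random(seed)        # fixed seed for reproducibility
    observed = fraction_hit(patients, pathway)
    genes = sorted(universe)         # stable order for sampling
    at_least_as_extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(genes, len(pathway)))
        if fraction_hit(patients, random_set) >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Toy data: 10 patients, 9 of whom carry a mutation in the toy pathway.
universe = {f"G{i}" for i in range(200)}
pathway = {"G0", "G1", "G2"}
patients = [{"G0"}] * 5 + [{"G1"}] * 4 + [{"G50"}]
observed, p_value = permutation_p_value(patients, pathway, universe)
```

A random-gene-set null ignores gene length and local mutation rate, which is presumably why the authors permuted sequences with observed mutation rates instead; the sketch only conveys the general shape of the test.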

Citations: 7
A model-based statistic for detecting molecular markers associated with complex survival patterns in early-stage cancer.
Pub Date : 2012-08-06 DOI: 10.1186/2043-9113-2-14
Philippe Broët, Thierry Moreau

Background: In early-stage cancer, primary treatment can be considered effective at eliminating the tumor for a non-negligible proportion of patients, whereas for the others it leads to a lower tumor burden and thereby potentially prolonged survival. In this mixed population of patients, it is of great interest to detect complex differences in survival distributions associated with molecular markers that potentially activate latent downstream pathways implicated in tumor progression.

Method: We propose a novel model-based score test designed for identifying molecular markers with complex effects on survival in early-stage cancer. From a biological point of view, the proposed score test allows detection of complex changes in the survival distributions linked to either the tumor burden or its dynamic growth.

Results: Simulation results show that the proposed statistic is powerful at identifying departures from the null hypothesis of no survival difference. The practical use of the proposed statistic is exemplified by analyzing the prognostic impact of Kras mutation in early-stage lung adenocarcinomas. This analysis leads to the conclusion that Kras mutation has a significant negative prognostic impact on survival. Moreover, it emphasizes that the complex role of Kras mutation in survival would have been overlooked had only the results of the classical logrank test been considered.

Conclusion: With the growing number of biological markers to be tested in early-stage cancer, the proposed score test statistic is a powerful tool for detecting molecular markers associated with complex survival patterns.
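The abstract contrasts the proposed score test with the classical logrank test but does not give the form of the new statistic, so only the classical baseline is sketched here. The two-group logrank statistic sums, over distinct event times, the difference between observed and expected events in one group and divides its square by the hypergeometric variance; under the null it is approximately chi-square with one degree of freedom. The function name and data layout are assumptions for illustration.

```python
def logrank_statistic(times, events, groups):
    """Two-group logrank chi-square statistic (1 degree of freedom).

    times  : follow-up time per patient
    events : 1 if the event was observed, 0 if censored
    groups : 0 or 1 (e.g. marker wild-type vs. mutated)
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)              # at risk
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, group 1
        d = sum(1 for ti, e, _ in data if ti == t and e)        # events at t
        d1 = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; groups cannot be compared")
    return observed_minus_expected ** 2 / variance


# Toy data: group 1 fails early, group 0 fails late, no censoring.
stat = logrank_statistic([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0])
# Identical survival experience in both groups gives a statistic of zero.
stat_equal = logrank_statistic([1, 1, 2, 2], [1, 1, 1, 1], [0, 1, 0, 1])
```

Because the logrank statistic weights all event times equally, it has most power when hazards are roughly proportional; effects that show up only in part of the follow-up, as in the cure-like mixture described in the Background, can be diluted, which is consistent with the abstract's remark that the logrank test would have overlooked the role of Kras mutation.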

Citations: 0
A dynamic model for tumour growth and metastasis formation.
Pub Date : 2012-07-05 DOI: 10.1186/2043-9113-2-11
Volker Haustein, Udo Schumacher

A simple and fast computational model to describe the dynamics of tumour growth and metastasis formation is presented. The model is based on the calculation of successive generations of tumour cells and makes it possible to describe biologically important quantities such as tumour volume, the time point of first metastatic growth or the number of metastatic colonies at a given time. The model relies entirely on the chronology of these successive events of the metastatic cascade. The simulation calculations were performed for two embedded growth models to describe the Gompertzian-like growth behaviour of tumours. The initial training of the models was carried out using an analytical solution for the size distribution of metastases of a hepatocellular carcinoma. We then show the applicability of our models to clinical data from the Munich Cancer Registry. Growth and dissemination characteristics of metastatic cells originating from cells in the primary breast cancer can be modelled, showing the model's ability to support systematic analyses relevant for clinical breast cancer research and treatment. In particular, our calculations show that metastasis formation has generally already been initiated before the primary tumour can be detected clinically.
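
For readers unfamiliar with Gompertzian growth, the following is a minimal sketch of the standard closed-form Gompertz curve, not the authors' generation-based model. The function names and all parameter values (`n0`, `a`, `b`) are illustrative assumptions and are not taken from the paper.

```python
import math


def gompertz_size(t, n0=1.0, a=0.1, b=0.01):
    """Cell count at time t under the standard Gompertz curve.

    n0: initial cell count, a: initial growth rate, b: rate at which
    growth slows down. Defaults are illustrative assumptions only.
    """
    return n0 * math.exp((a / b) * (1.0 - math.exp(-b * t)))


def time_to_reach(target, n0=1.0, a=0.1, b=0.01):
    """Invert the Gompertz curve: time at which the size equals target."""
    plateau = n0 * math.exp(a / b)  # limiting size as t grows large
    if not n0 < target < plateau:
        raise ValueError("target must lie strictly between n0 and the plateau")
    return -math.log(1.0 - (b / a) * math.log(target / n0)) / b


if __name__ == "__main__":
    # Time (in arbitrary units) for one cell to grow to 1000 cells.
    print(time_to_reach(1e3))
```

Growth is nearly exponential at first and then saturates at `n0 * exp(a / b)`, which is the qualitative behaviour the abstract refers to as Gompertzian-like.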

Citations: 23
Effects of autoclaving and high pressure on allergenicity of hazelnut proteins.
Pub Date : 2012-05-22 DOI: 10.1186/2043-9113-2-12
Elena López, Carmen Cuadrado, Carmen Burbano, Maria Aranzazu Jiménez, Julia Rodríguez, Jesús F Crespo

Background: Hazelnut is reported as a causative agent of allergic reactions. However, it is also an edible nut with health benefits. The allergenic characteristics of hazelnut samples after autoclaving (AC) and high-pressure (HHP) processing have been studied and are presented here. Previous studies demonstrated that AC treatments were responsible for structural transformation of protein structure motifs. Thus, structural analyses of allergen proteins from hazelnut were carried out to observe what happens to the specific IgE recognition of the related allergenic proteins. The aims of this work are to evaluate the effect of AC and HHP processing on the in vitro allergenicity of hazelnut using human sera and to analyse the complexity of hazelnut allergen protein structures.

Methods: Hazelnut samples were subjected to AC and HHP processing. Specific IgE reactivity was studied in 15 allergic clinic patients via western blotting. A series of homology-based 3D models (Cor a 1, Cor a 8, Cor a 9 and Cor a 11) were generated for the antigens included in the study to analyse the complexity of their protein structure. This study is supported by the Declaration of Helsinki and subsequent ethical guidelines.

Results: A severe reduction in the in vitro allergenicity of hazelnut after AC processing was observed in the allergic clinic patients studied. Specific IgE binding to some of the described immunoreactive hazelnut protein bands (Cor a 1, ~18 kDa; Cor a 8, ~9 kDa; Cor a 9, ~35-40 kDa; and Cor a 11, ~47-48 kDa) decreased. Furthermore, a relevant glycosylation was assigned and visualized for the first time in the allergen Cor a 11 via structural analysis of proteins (3D modelling), suggesting a new role that could open new avenues for unravelling allergenicity.

Conclusion: In vivo studies of hazelnut allergenicity after AC processing, via prick-prick testing and other means, are crucial to verify the data we observed in vitro. Glycosylation studies provided us with clues to elucidate, in the near future, the structural mechanisms that contribute to hazelnut allergenicity, which in turn could help to alleviate food allergy.

Citations: 25
Splitting random forest (SRF) for determining compact sets of genes that distinguish between cancer subtypes.
Pub Date : 2012-05-22 DOI: 10.1186/2043-9113-2-13
Xiaowei Guan, Mark R Chance, Jill S Barnholtz-Sloan

Background: The identification of very small subsets of predictive variables is an important topic that has not often been considered in the literature. In order to discover highly predictive yet compact gene set classifiers from whole genome expression data, a non-parametric, iterative algorithm, Splitting Random Forest (SRF), was developed to robustly identify genes that distinguish between molecular subtypes. The goal is to improve the prediction accuracy while considering sparsity.

Results: The optimal SRF 50 run (SRF50) gene classifiers for glioblastoma (GB), breast (BC) and ovarian cancer (OC) subtypes had overall prediction rates comparable to those from published datasets upon validation (80.1%-91.7%). The SRF50 sets outperformed other methods by identifying compact gene sets needed for distinguishing between tested cancer subtypes (10-200 fold fewer genes than ANOVA or published gene sets). The SRF50 sets achieved superior and robust overall and subtype prediction accuracies when compared with single random forest (RF) and the Top 50 ANOVA results (80.1% vs 77.8% for GB; 84.0% vs 74.1% for BC; 89.8% vs 88.9% for OC in SRF50 vs single RF comparison; 80.1% vs 77.2% for GB; 84.0% vs 82.7% for BC; 89.8% vs 87.0% for OC in SRF50 vs Top 50 ANOVA comparison). There was significant overlap between SRF50 and published gene sets, showing that SRF identifies the relevant subsets of important gene lists. Through Ingenuity Pathway Analysis (IPA), the "hub" genes shared between the SRF50 and published gene sets were RB1, PIK3R1, PDGFBB and ERK1/2 for GB; ESR1, MYC, NFkB and ERK1/2 for BC; and Akt, FN1, NFkB, PDGFBB and ERK1/2 for OC.

Conclusions: The SRF approach is an effective driver of biomarker discovery research that reduces the number of genes needed for robust classification, dissects complex, high-dimensional "omic" data and provides novel insights into the cellular mechanisms that define cancer subtypes.

Citations: 44
A comparative analysis of protein targets of withdrawn cardiovascular drugs in human and mouse.
Pub Date : 2012-05-01 DOI: 10.1186/2043-9113-2-10
Yuqi Zhao, Jingwen Wang, Yanjie Wang, Jingfei Huang

Background: The mouse is widely used in animal testing for cardiovascular disease. However, a large number of cardiovascular drugs that were experimentally shown to work well in mice were withdrawn because they caused adverse side effects in humans.

Methods: In this study, we investigate whether the binding patterns of withdrawn cardiovascular drugs are conserved between mouse and human, using computational docking and molecular dynamics simulations. In addition, we measured the level of conservation of the gene expression patterns of the drug targets and their interacting partners by analyzing microarray data.

Results: The results show that target proteins of withdrawn cardiovascular drugs are functionally conserved between human and mouse. However, all the binding patterns of withdrawn drugs we retrieved show striking differences due to sequence divergence in the drug-binding pocket, mainly through loss or gain of hydrogen bond donors and distinct drug-binding pockets. The binding affinities of withdrawn drugs to their receptors tend to be reduced from mouse to human. In contrast, the FDA-approved and best-selling drugs are little affected.

Conclusions: Our analysis suggests that sequence divergence in the drug-binding pocket may be a reasonable explanation for the discrepancy in drug effects between animal models and humans.

Citations: 4