Rishabh Kala, Gregory W Peek, Tabitha M Hardy, Trygve O Tollefsbol
MicroRNAs (miRNAs) are remarkable molecules that appear to have a fundamental role in the biology of the cell. They constitute a class of non-protein-coding RNA molecules that have emerged as key regulators of mRNA activity. miRNAs are small RNA molecules, around 22 nucleotides in length, that affect the activity of specific mRNAs by directly degrading them and/or preventing their translation into protein. miRNAs are candidate biomarkers for the early detection and management of cancer. There is also considerable excitement about the use of miRNAs as a novel class of therapeutic targets and as a new class of therapeutic agents for the treatment of cancers. From a clinical perspective, miRNAs can induce a number of effects and may have diverse applications in biomedical research. This review highlights the general mode of action of miRNAs, their biogenesis, the effect of diet on miRNA expression, and the impact of miRNAs on cancer epigenetics and drug resistance in various cancers. We also highlight bioinformatics software that can be used to determine potential miRNA targets.
Citation: MicroRNAs: an emerging science in cancer epigenetics. Journal of Clinical Bioinformatics, p. 6, published 2013-03-16. https://doi.org/10.1186/2043-9113-3-6
Li-Yeh Chuang, Hsueh-Wei Chang, Ming-Cheng Lin, Cheng-Hong Yang
Background: Single nucleotide polymorphisms (SNPs) in genes from distinct pathways are associated with breast cancer risk. Identifying possible SNP-SNP interactions in genome-wide case-control studies is an important task when investigating genetic factors that influence common complex traits, and the effects of these interactions need to be characterized. Furthermore, observing the complex interplay between SNPs in high-dimensional combinations remains computationally and methodologically challenging. An improved branch and bound algorithm with feature selection (IBBFS) is introduced to identify SNP combinations with a maximal difference in allele frequencies between the case and control groups in breast cancer, i.e., the high/low risk combinations of SNPs.
Results: Data from 220 breast cancer cases and 334 controls were used to test IBBFS and identify significant SNP combinations. We used the odds ratio (OR) as a quantitative measure of the cancer risk associated with multiple SNP combinations, in order to identify the complex biological relationships underlying the progression of breast cancer, i.e., the most likely SNP combinations. In the low-risk groups, the estimated odds ratios of the best SNP combinations with genotypes are significantly smaller than 1 (between 0.165 and 0.657). In the high-risk groups, the odds ratios of the predicted SNP combinations with genotypes are significantly greater than 1 (between 2.384 and 6.167).
Conclusions: This study proposes an effective high-speed method to analyze SNP-SNP interactions in breast cancer association studies. A number of important SNPs are found to be significant for the high/low risk groups. They can thus be considered potential predictors for breast cancer association.
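The odds ratio used above as the risk measure can be illustrated with a short calculation. The following is a minimal sketch, not the authors' IBBFS code: the function names are hypothetical, the carrier counts in the usage note are invented for illustration (only the group sizes of 220 cases and 334 controls come from the abstract), and the confidence interval uses the standard log-scale (Woolf) approximation, which the abstract does not mention.

```python
import math


def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds ratio from a 2x2 table of carrier counts for one SNP combination.

    OR = (cases with / cases without) / (controls with / controls without)
    """
    if case_without == 0 or control_with == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (case_with * control_without) / (case_without * control_with)


def odds_ratio_ci(case_with, case_without, control_with, control_without, z=1.96):
    """Approximate 95% confidence interval on the log scale (Woolf method).

    Assumes all four cells are non-zero.
    """
    or_value = odds_ratio(case_with, case_without, control_with, control_without)
    standard_error = math.sqrt(
        1 / case_with + 1 / case_without + 1 / control_with + 1 / control_without
    )
    lower = math.exp(math.log(or_value) - z * standard_error)
    upper = math.exp(math.log(or_value) + z * standard_error)
    return lower, upper


# Hypothetical counts: 40 of 220 cases and 120 of 334 controls carry the combination.
example_or = odds_ratio(40, 180, 120, 214)
example_ci = odds_ratio_ci(40, 180, 120, 214)
```

With these invented counts the odds ratio falls below 1, which is the pattern the abstract describes for low-risk combinations; an interval that excludes 1 is the usual informal reading of "significantly smaller than 1".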
Citation: Improved branch and bound algorithm for detecting SNP-SNP interactions in breast cancer. Journal of Clinical Bioinformatics, p. 4, published 2013-02-14. https://doi.org/10.1186/2043-9113-3-4
Perry G Ridge, Christine Miller, Pinar Bayrak-Toydemir, D Hunter Best, Rong Mao, Jeffrey J Swensen, Elaine Lyon, Karl V Voelkerding
Background: The recent introduction of high-throughput sequencing technologies into clinical genetics has made it practical to sequence many genes simultaneously. In contrast, previous technologies limited sequencing-based tests to only a handful of genes. While the ability to diagnose inherited diseases more accurately is a great benefit, it introduces specific challenges. Interpretation of missense mutations continues to be challenging, and the number of variants of uncertain significance continues to grow.
Results: We leveraged the data available at ARUP Laboratories, a major reference laboratory, for the CFTR gene to explore specific challenges related to variant interpretation, including a focus on understanding ethnic-specific variants and an evaluation of existing databases for clinical interpretation of variants. In this study we analyzed 555 patients representing eight different ethnic groups. We observed 184 different variants, most of which were specific to one ethnic group. Eighty-five percent of these variants were present in the Cystic Fibrosis Mutation Database, whereas the Human Mutation Database and dbSNP/1000 Genomes had far fewer of the observed variants. Finally, 21 of the variants were novel, and we report these variants and their clinical classifications.
Conclusions: Based on our analyses of data from six years of CFTR testing at ARUP Laboratories, a more comprehensive, clinical-grade database is needed for the accurate interpretation of observed variants. Furthermore, there is a particular need for more and better information regarding variants from individuals of non-Caucasian ethnicity.
Citation: Cystic fibrosis testing in a referral laboratory: results and lessons from a six-year period. Journal of Clinical Bioinformatics, p. 3, published 2013-01-23. https://doi.org/10.1186/2043-9113-3-3 (full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3563502/pdf/)
Dario Di Silvestre, Italo Zoppis, Francesca Brambilla, Valeria Bellettato, Giancarlo Mauri, Pierluigi Mauri
Background: Mass spectrometry is an important analytical tool for clinical proteomics. Primarily employed for biomarker discovery, it is increasingly used to develop methods that may help to provide unambiguous diagnosis of biological samples. In this context, we investigated the classification of phenotypes by applying a support vector machine (SVM) to experimental data obtained by the MudPIT approach. In particular, we compared the performance of SVM using two independent collections of complex samples and different data types, such as mass spectra (m/z), peptides and proteins.
Results: Overall, protein and peptide data provided better discriminant information than experimental mass spectra (overall accuracy higher than 87% in both collection 1 and collection 2). These results indicate that sequencing of peptides and proteins reduces the experimental noise affecting the raw mass spectra and allows the extraction of more informative features for the effective classification of samples. In addition, 80% of the protein and peptide features selected by SVM matched the differentially expressed proteins identified by the MAProMa software.
Conclusions: These findings confirm the applicability of label-free quantitative methods based on processing of spectral count and SEQUEST-based SCORE values. They also stress the usefulness of MudPIT data for correctly grouping sample phenotypes by applying both supervised and unsupervised learning algorithms. This capacity permits the evaluation of actual samples and is a good starting point for translating proteomic methodology to clinical application.
Citation: Availability of MudPIT data for classification of biological samples. Journal of Clinical Bioinformatics, p. 1, published 2013-01-14. https://doi.org/10.1186/2043-9113-3-1
Walter Arancio, Carla Giordano, Giuseppe Pizzolanti
Background: Hutchinson-Gilford progeria syndrome is a rare dominant human disease of genetic origin. The average life expectancy is about 20 years, patients' quality of life is still very poor, and no efficient therapy has yet been developed. It is caused by mutation of the LMNA gene, which results in accumulation in the nuclear membrane of a particular splicing form of Lamin-A called progerin. The mechanism by which progerin perturbs cellular homeostasis and leads to the symptoms is still under debate. Micro-RNAs are able to negatively regulate gene expression by coupling with the 3' UnTranslated Region of messenger RNAs. Several Micro-RNAs can recognize the same 3' UnTranslated Region, and each Micro-RNA can recognize multiple 3' UnTranslated Regions of different messenger RNAs. When different messenger RNAs are co-regulated via a similar panel of micro-RNAs, these messengers are called Competing Endogenous RNAs, or ceRNAs. The 3' UnTranslated Region of the longest LMNA transcript was analysed to look for its ceRNAs. The aim of this study was to search for candidate genes and gene ontology functions possibly influenced by LMNA mutations that may play a role in progeria development.
Results: Eleven miRNAs were identified as potential LMNA regulators. By computational analysis, the miRNAs pointed to 17 putative LMNA ceRNAs. Gene ontology analysis of the identified ceRNAs showed an enrichment in RNA interference and control of cell cycle functions.
Conclusion: This study identified novel genes and functions potentially involved in the LMNA regulatory network that could be involved in laminopathies such as the Hutchinson-Gilford progeria syndrome.
Citation: A ceRNA analysis on LMNA gene focusing on the Hutchinson-Gilford progeria syndrome. Journal of Clinical Bioinformatics, p. 2, published 2013-01-14. https://doi.org/10.1186/2043-9113-3-2
Background: Identification of prognostic biomarkers is a hallmark of cancer genomics. Since miRNAs regulate expression of multiple genes, they act as potent biomarkers in several cancers. Prognostically important miRNAs have been identified sporadically, but to date no resource is available that allows users to study the prognostics of miRNAs of interest, utilizing the wealth of available data, in major cancer types.
Description: In this paper, we present a web-based tool that allows users to study prognostic properties of miRNAs in several cancer types, using publicly available data. We have compiled data from Gene Expression Omnibus (GEO) and the recently developed The Cancer Genome Atlas (TCGA) to create this tool. The tool is called "PROGmiR" and it is available at http://www.compbio.iupui.edu/progmir. Currently, our tool can be used to study overall survival implications for approximately 1050 human miRNAs in 16 major cancer types.
Conclusions: We believe this resource, as a hypothesis generation tool, will help researchers to link miRNA expression with cancer outcome and to design mechanistic studies. We studied the performance of our tool using miRNA biomarkers identified in published studies. The prognostic plots created using our tool for specific miRNAs in specific cancer types corroborated the findings of those studies.
Citation: Chirayu Pankaj Goswami, Harikrishna Nakshatri. PROGmiR: a tool for identifying prognostic miRNA biomarkers in multiple cancers using publicly available data. Journal of Clinical Bioinformatics, p. 23, published 2012-12-28. https://doi.org/10.1186/2043-9113-2-23
Chenwei Wang, Alperen Taciroglu, Stefan R Maetschke, Colleen C Nelson, Mark A Ragan, Melissa J Davis
Background: Cancer outlier profile analysis (COPA) has proven to be an effective approach to analyzing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers, and the currently available R package implementing the method is similarly restricted to the analysis of over-expressed outliers. Here we present a modified outlier detection method, mCOPA, which contains refinements to the outlier-detection algorithm, identifies both over- and under-expressed outliers, is freely available, and can be applied to any expression dataset.
Results: We compare our method to other feature-selection approaches, and demonstrate that mCOPA frequently selects more-informative features than do differential expression or variance-based feature selection approaches, and is able to recover observed clinical subtypes more consistently. We demonstrate the application of mCOPA to prostate cancer expression data, and explore the use of outliers in clustering, pathway analysis, and the identification of tumour suppressors. We analyse the under-expressed outliers to identify known and novel prostate cancer tumour suppressor genes, validating these against data in Oncomine and the Cancer Gene Index. We also demonstrate how a combination of outlier analysis and pathway analysis can identify molecular mechanisms disrupted in individual tumours.
Conclusions: We demonstrate that mCOPA offers advantages, compared to differential expression or variance, in selecting outlier features, and that the features so selected are better able to assign samples to clinically annotated subtypes. Further, we show that the biology explored by outlier analysis differs from that uncovered in differential expression or variance analysis. mCOPA is an important new tool for the exploration of cancer datasets and the discovery of new cancer subtypes, and can be combined with pathway and functional analysis approaches to discover mechanisms underpinning heterogeneity in cancers.
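The outlier idea described above can be sketched for a single gene. This is an illustration of the classic COPA-style transform (centre each gene on its median and scale by the median absolute deviation), extended to flag both tails; it is not the mCOPA implementation. The function names and the fixed cutoff of 3.0 are assumptions made for the example, and published COPA variants rank genes by percentiles of the transformed values rather than by a fixed cutoff.

```python
from statistics import median


def copa_transform(values):
    """Centre one gene's expression values on the median and scale by the MAD."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        raise ValueError("median absolute deviation is zero; gene cannot be scaled")
    return [(v - centre) / mad for v in values]


def outlier_samples(values, cutoff=3.0):
    """Return (over, under): sample indices with scores beyond +cutoff or -cutoff.

    The cutoff is a hypothetical choice for illustration only.
    """
    scores = copa_transform(values)
    over = [i for i, score in enumerate(scores) if score > cutoff]
    under = [i for i, score in enumerate(scores) if score < -cutoff]
    return over, under


# Invented expression values for one gene across eight samples.
expression = [5.0, 5.2, 4.9, 5.1, 5.0, 9.8, 5.1, 0.4]
over_expressed, under_expressed = outlier_samples(expression)
```

Because the median and MAD are robust to a few extreme samples, a small subset of tumours with aberrant expression stands out clearly, whereas a mean and standard deviation would be pulled towards the outliers themselves.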
Citation: mCOPA: analysis of heterogeneous features in cancer expression data. Journal of Clinical Bioinformatics, vol. 2, no. 1, p. 22, published 2012-12-10. https://doi.org/10.1186/2043-9113-2-22
Subhashini Srinivasan, Arun H Patil, Mohit Verma, Jonathan L Bingham, Raghunathan Srivatsan
Background: Second generation RNA sequencing technology (RNA-seq) offers the potential to interrogate genome-wide differential RNA splicing in cancer. However, since short RNA reads spanning spliced junctions cannot be mapped contiguously onto to the chromosomes, there is a need for methods to profile splicing from RNA-seq data. Before the invent of RNA-seq technologies, microarrays containing probe sequences representing exon-exon junctions of known genes have been used to hybridize cellular RNAs for measuring context-specific differential splicing. Here, we extend this approach to detect tumor-specific splicing in prostate cancer from a RNA-seq dataset.
Method: A database, SPEventH, representing probe sequences of under a million non-redundant splice events in human is created with exon-exon junctions of optimized length for use as virtual microarray. SPEventH is used to map tens of millions of reads from matched tumor-normal samples from ten individuals with prostate cancer. Differential counts of reads mapped to each event from tumor and matched normal is used to identify statistically significant tumor-specific splice events in prostate.
Results: We find sixty-one (61) splice events that are differentially expressed, with a p-value of less than 0.0001 and a fold change of greater than 1.5, in prostate tumor compared to the respective matched normal samples. Interestingly, the only evidence in the public database, an EST (BF372485), for one of the tumor-specific splice events, which joins one of the introns of the KLK3 gene to an intron of KLK2, is also derived from prostate tumor tissue. Also, the 765 events with a p-value of less than 0.001 are shown to cluster all twenty samples in a context-specific fashion, with few exceptions stemming from low coverage of samples.
Conclusions: We demonstrate that virtual microarray experiments using a non-redundant database of human splice events are both an efficient and a sensitive way to profile genome-wide splicing in biological samples and to detect tumor-specific splicing signatures in RNA-seq datasets. The signature from the large number of splice events that could cluster tumor and matched-normal samples into two tight, separate clusters suggests that differential splicing is yet another RNA phenotype, alongside gene expression and SNPs, that can be exploited for tumor stratification.
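The virtual-microarray idea in the Method paragraph can be illustrated with a minimal sketch. It is not the SPEventH implementation: the function names, the toy exon sequences, the flank length, the overhang requirement, and the pseudocount are all assumptions made for illustration, and exact substring matching stands in for whatever read mapper the authors used. The abstract does not name the statistical test behind its p-values, so the sketch stops at a fold-change filter.

```python
from collections import Counter


def junction_probe(upstream_exon, downstream_exon, flank):
    """Build a probe from the end of one exon and the start of the next.

    Returns (probe_sequence, junction_offset).
    """
    left = upstream_exon[-flank:]
    right = downstream_exon[:flank]
    return left + right, len(left)


def count_probe_hits(reads, probes, min_overhang=4):
    """Count reads that match a probe exactly and span its junction.

    Only the first occurrence of a read within each probe is considered.
    """
    counts = Counter()
    for read in reads:
        for event, (seq, junction) in probes.items():
            start = seq.find(read)
            if start == -1:
                continue
            end = start + len(read)
            # Require min_overhang bases on each side of the junction.
            if start <= junction - min_overhang and end >= junction + min_overhang:
                counts[event] += 1
    return counts


def fold_change(tumor_count, normal_count, pseudocount=1.0):
    """Tumor/normal ratio; the pseudocount (an assumption) avoids division by zero."""
    return (tumor_count + pseudocount) / (normal_count + pseudocount)


# Toy data (hypothetical sequences, not real exons).
exon_a, exon_b, exon_c = "ACGTACGTAC", "TTGACCGGTA", "GGCATCGATC"
probes = {
    "A-B": junction_probe(exon_a, exon_b, 8),
    "A-C": junction_probe(exon_a, exon_c, 8),
}
tumor_reads = ["CGTACTTGAC"] * 3 + ["GTACGGCATC", "GTACGTAC"]
normal_reads = ["CGTACTTGAC", "GTACGGCATC"]

tumor = count_probe_hits(tumor_reads, probes)
normal = count_probe_hits(normal_reads, probes)
# Keep events whose fold change exceeds 1.5, the cutoff quoted in the Results.
changed = sorted(e for e in probes if fold_change(tumor[e], normal[e]) > 1.5)
```

In this toy run the read `GTACGTAC` lies entirely within one exon, so it is found in both probes but counted for neither; only reads that cross a junction contribute to an event's count.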
{"title":"Genome-wide Profiling of RNA splicing in prostate tumor from RNA-seq data using virtual microarrays.","authors":"Subhashini Srinivasan, Arun H Patil, Mohit Verma, Jonathan L Bingham, Raghunathan Srivatsan","doi":"10.1186/2043-9113-2-21","DOIUrl":"https://doi.org/10.1186/2043-9113-2-21","journal":{"name":"Journal of clinical bioinformatics","volume":"2 1","pages":"21"},"publicationDate":"2012-11-26","publicationTypes":"Journal Article"}
Philip S Crooke, John T Tossberg, Sara N Horst, John L Tauscher, Melodie A Henderson, Dawn B Beaulieu, David A Schwartz, Nancy J Olsen, Thomas M Aune
Background: The inflammatory bowel diseases, ulcerative colitis and Crohn's disease, are considered to be of autoimmune origin, but the etiology of irritable bowel syndrome remains elusive. Furthermore, distinguishing irritable bowel syndrome from inflammatory bowel disease can be difficult without invasive testing, and the distinction has important treatment implications. Our aim was to assess the ability of gene expression profiling in blood to differentiate among these subject groups.
Methods: Transcript levels of a total of 45 genes in blood were determined by quantitative real-time polymerase chain reaction (RT-PCR). We applied three separate analytic approaches: one used a scoring system derived from combinations of ratios of expression levels of two genes, and the other two used different support vector machines.
Results: All methods discriminated between the subject cohorts (irritable bowel syndrome from control, inflammatory bowel disease from control, irritable bowel syndrome from inflammatory bowel disease, and ulcerative colitis from Crohn's disease) with high degrees of sensitivity and specificity.
Conclusions: These results suggest these approaches may provide clinically useful prediction of the presence of these gastro-intestinal diseases and syndromes.
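The ratio-based scoring approach mentioned in the Methods can be sketched as follows. The abstract does not give the genes, cutoffs, or the way ratios were combined, so everything here is a hypothetical illustration: the gene names, the cutoff values, the "one point per satisfied rule" combination, and the `min_score` threshold are assumptions, not the authors' scoring system.

```python
def ratio_score(expression, rules):
    """Count how many gene-pair ratio rules a sample satisfies.

    expression: dict mapping gene name to a positive expression level.
    rules: list of (numerator_gene, denominator_gene, cutoff) tuples.
    """
    score = 0
    for numerator, denominator, cutoff in rules:
        if expression[numerator] / expression[denominator] > cutoff:
            score += 1
    return score


def classify(expression, rules, min_score):
    """Label a sample 'case' when enough ratio rules are satisfied."""
    return "case" if ratio_score(expression, rules) >= min_score else "control"


# Hypothetical rules and sample; gene names and cutoffs are placeholders.
rules = [("GENE_A", "GENE_B", 2.0), ("GENE_C", "GENE_D", 0.5)]
sample = {"GENE_A": 8.0, "GENE_B": 2.0, "GENE_C": 1.0, "GENE_D": 4.0}

score = ratio_score(sample, rules)
label = classify(sample, rules, min_score=1)
```

Using ratios of two genes measured in the same sample is one way to make a score less sensitive to sample-wide differences in input amount, since a common scaling factor cancels in each ratio.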
{"title":"Using gene expression data to identify certain gastro-intestinal diseases.","authors":"Philip S Crooke, John T Tossberg, Sara N Horst, John L Tauscher, Melodie A Henderson, Dawn B Beaulieu, David A Schwartz, Nancy J Olsen, Thomas M Aune","doi":"10.1186/2043-9113-2-20","DOIUrl":"https://doi.org/10.1186/2043-9113-2-20","journal":{"name":"Journal of clinical bioinformatics","volume":"2 1","pages":"20"},"publicationDate":"2012-11-21","publicationTypes":"Journal Article"}