Background: Advances in sequencing technologies have enabled the collection of massive genome-wide data that substantially advance lung cancer diagnosis and prognosis. Identifying influential markers for clinical endpoints of interest is an indispensable component of the statistical analysis pipeline. However, classical variable selection methods are neither feasible nor reliable for high-throughput genetic data. Our objective is to propose a model-free gene screening procedure for high-throughput right-censored data and to use it to develop a predictive gene signature for lung squamous cell carcinoma (LUSC).
Methods: A gene screening procedure was developed based on a recently proposed independence measure. The procedure was applied to The Cancer Genome Atlas (TCGA) LUSC data, narrowing the set of influential genes to 378 candidates. A penalized Cox model was then fitted to the reduced set, which further identified a 6-gene signature for LUSC prognosis. The 6-gene signature was validated on datasets from the Gene Expression Omnibus.
Results: Both model-fitting and validation results show that our method selected influential genes that led to biologically sensible findings and better predictive performance than existing alternatives. In multivariable Cox regression analysis, the 6-gene signature was a significant prognostic factor (p-value < 0.001) after controlling for clinical covariates.
Conclusions: Gene screening, as a fast dimension reduction technique, plays an important role in analyzing high-throughput data. The main contributions of this paper are to introduce a fundamental yet pragmatic model-free gene screening approach that aids the statistical analysis of right-censored cancer data, and to provide a side-by-side comparison with other available methods in the context of LUSC.
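The screening step described in the Methods can be illustrated with a minimal sketch of marginal screening for right-censored data. The abstract does not specify the independence measure, so the code below substitutes a different, model-based utility (the univariate Cox score statistic) purely as a stand-in. The function names, the simulated data, and the cutoff `keep=20` are assumptions made for illustration; they are not the authors' procedure or the TCGA data.

```python
import numpy as np


def cox_score_statistics(X, time, event):
    """Univariate Cox score test statistic (at beta = 0) for every column of X.

    Stand-in marginal utility, NOT the independence measure used in the paper.
    X: (n_samples, n_genes) expression matrix
    time: observed follow-up times; event: True if the event was observed.
    """
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    # at_risk[i, k] is True when subject k is still under observation at time[i].
    at_risk = time[None, :] >= time[:, None]
    # Only rows belonging to observed events contribute to the score.
    at_risk = at_risk[event].astype(float)
    n_at_risk = at_risk.sum(axis=1, keepdims=True)  # always >= 1

    # Risk-set means of x and x^2 at each event time, for all genes at once.
    mean_x = at_risk @ X / n_at_risk
    mean_x2 = at_risk @ X**2 / n_at_risk

    score = (X[event] - mean_x).sum(axis=0)
    variance = (mean_x2 - mean_x**2).sum(axis=0)
    return score**2 / np.maximum(variance, 1e-12)


def screen_genes(X, time, event, keep):
    """Return the column indices of the `keep` genes with the largest statistic."""
    stats = cox_score_statistics(X, time, event)
    order = np.argsort(stats)[::-1]
    return order[:keep]


# Simulated example (hypothetical data): genes 0, 1 and 2 drive the hazard.
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 500
X = rng.standard_normal((n_samples, n_genes))
log_hazard = X[:, 0] - X[:, 1] + X[:, 2]
event_time = rng.exponential(scale=np.exp(-log_hazard))
censor_time = rng.exponential(scale=2.0, size=n_samples)
time = np.minimum(event_time, censor_time)
event = event_time <= censor_time

selected = screen_genes(X, time, event, keep=20)
print("retained gene indices:", np.sort(selected))
```

In a full pipeline, a penalized Cox model would then be fitted to the retained columns only; that second step is omitted here.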
Our main objective was to identify abundantly expressed tyrosine kinases in multiple myeloma (MM) as potential therapeutic targets. We first compared the transcriptomes of malignant plasma cells from newly diagnosed MM patients, risk-categorized by their patient-specific EMC-92/SKY-92 gene expression signature values, with those of normal plasma cells from healthy volunteers. For this comparison we used archived datasets from the HOVON65/GMMG-HD4 randomized Phase 3 study, which evaluated the clinical efficacy of bortezomib induction/maintenance versus classic cytotoxic drugs and thalidomide maintenance. In particular, EGFR/ERBB1 was significantly overexpressed in MM cells compared with normal control plasma cells, and it was differentially overexpressed in MM cells from high-risk patients. Amplified expression of EGFR/ERBB1 mRNA in MM cells was positively correlated with increased expression levels of mRNAs for several DNA binding proteins and transcription factors known to upregulate EGFR/ERBB1 gene expression. MM patients with the highest EGFR/ERBB1 expression level had significantly shorter progression-free survival (PFS) and overall survival (OS) times than patients with the lowest EGFR/ERBB1 expression level. High expression levels of EGFR/ERBB1 were associated with significantly increased hazard ratios for unfavorable PFS and OS outcomes in both univariate and multivariate Cox proportional hazards models. The impact of high EGFR/ERBB1 expression on PFS and OS outcomes remained significant even after accounting for the prognostic effects of other covariates. These results regarding the prognostic effect of EGFR/ERBB1 expression were validated using the MMRF-CoMMpass RNAseq dataset, generated in patients treated with the more recent drug combinations included in contemporary induction regimens. Our findings provide new insights into the molecular mechanism and potential clinical significance of upregulated EGFR/ERBB1 expression in MM.
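The comparison of survival between patients with the highest and lowest expression levels can be sketched with a two-group log-rank test. The abstract does not state which test or which expression cutoffs were used, so the quartile split, the function name `logrank_test`, and the simulated data below are assumptions for illustration only; none of the numbers come from the HOVON65/GMMG-HD4 or MMRF-CoMMpass datasets.

```python
import math

import numpy as np


def logrank_test(time, event, group):
    """Two-group log-rank test for right-censored data.

    time: follow-up times; event: True if the event was observed;
    group: True for the group of interest (e.g. high expression).
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)

    observed = expected = variance = 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()                # subjects at risk just before t
        n1 = (at_risk & group).sum()     # of which in the group of interest
        died = event & (time == t)
        d = died.sum()                   # events at t
        d1 = (died & group).sum()        # events at t in the group of interest
        observed += d1
        expected += d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the margins.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Simulated example (hypothetical data): higher expression raises the hazard.
rng = np.random.default_rng(1)
n_patients = 400
expression = rng.standard_normal(n_patients)
event_time = rng.exponential(scale=np.exp(-0.8 * expression))
censor_time = rng.exponential(scale=3.0, size=n_patients)
time = np.minimum(event_time, censor_time)
event = event_time <= censor_time

# Assumed cutoffs: compare the top quartile against the bottom quartile.
low_cut, high_cut = np.quantile(expression, [0.25, 0.75])
extreme = (expression <= low_cut) | (expression >= high_cut)
chi2, p_value = logrank_test(
    time[extreme], event[extreme], expression[extreme] >= high_cut
)
print(f"log-rank chi-square = {chi2:.2f}, p = {p_value:.3g}")
```

A log-rank test only compares the two groups; adjusting for other covariates, as in the multivariate analysis mentioned above, would require a Cox proportional hazards model.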