[This corrects the article DOI: 10.1177/11769351231177267.].
Lung cancer is considered the most common and the deadliest cancer type. Lung cancer is mainly of 2 types: small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). NSCLC accounts for about 85% of cases, while SCLC accounts for only about 14%. Over the last decade, functional genomics has emerged as a revolutionary tool for studying genetics and uncovering changes in gene expression. RNA-Seq has been applied to investigate rare and novel transcripts that aid in discovering the genetic changes occurring in tumours of different lung cancer types. Although RNA-Seq helps to understand and characterise the gene expression changes relevant to lung cancer diagnostics, discovering biomarkers remains a challenge. Classification models help uncover and classify biomarkers based on gene expression levels across the different lung cancers. The current research concentrates on computing transcript statistics, including the normalised fold change of genes, from gene transcript files and on identifying quantifiable differences in gene expression levels between the reference genome and lung cancer samples. The collected data were analysed, and machine learning models were developed to classify genes as causing NSCLC, causing SCLC, causing both, or causing neither. An exploratory data analysis was performed to identify the probability distribution and principal features. Because only a limited number of features were available, all of them were used to predict the class. To address the class imbalance in the dataset, the Near Miss under-sampling algorithm was applied. For classification, the research primarily focused on 4 supervised machine learning algorithms (Logistic Regression, KNN, SVM, and Random Forest classifiers) and additionally considered 2 ensemble algorithms: XGBoost and AdaBoost. 
Of these, based on the weighted metrics considered, the Random Forest classifier, with 87% accuracy, was the best-performing algorithm and was therefore used to predict the biomarkers causing NSCLC and SCLC. The class imbalance and limited features in the dataset restrict further improvement in the model's accuracy or precision. In the present study, using gene expression values (LogFC, P value) as the feature set in the Random Forest classifier, BRAF, KRAS, NRAS, and EGFR were predicted from the transcriptome analysis to be possible biomarkers causing NSCLC, and ATF6, ATF3, PDGFA, PDGFD, PDGFC, and PIP5K1C to be possible biomarkers causing SCLC. After fine-tuning, the classifier gave a precision of 91.3% and a recall of 91%. Some of the biomarkers predicted for both NSCLC and SCLC were CDK4, CDK6, BAK1, CDKN1A, and DDB2.
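The abstract does not state which Near Miss variant or neighbourhood size was used. As a hedged illustration, the sketch below implements NearMiss-1 in plain Python: every non-minority class is reduced to the size of the smallest class by keeping the samples whose mean distance to their k nearest minority-class samples is smallest. The function names, k = 2 in the usage lines, and the toy (LogFC, P value) points and labels are hypothetical. The imbalanced-learn library offers a ready-made `NearMiss` class for real use.

```python
import math
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


def _mean_knn_distance(p: Point, minority: Sequence[Point], k: int) -> float:
    """Mean Euclidean distance from p to its k nearest minority-class points."""
    dists = sorted(math.dist(p, m) for m in minority)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def near_miss_1(X: Sequence[Point], y: Sequence[Hashable], k: int = 3):
    """Under-sample every class down to the size of the smallest class (NearMiss-1)."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(tuple(xi))

    minority_label = min(by_class, key=lambda c: len(by_class[c]))
    minority = by_class[minority_label]
    target = len(minority)

    X_res: List[Point] = []
    y_res: List[Hashable] = []
    for label, points in by_class.items():
        if label == minority_label:
            kept = points
        else:
            # Keep the points that lie closest, on average, to the minority class.
            kept = sorted(points, key=lambda p: _mean_knn_distance(p, minority, k))[:target]
        X_res.extend(kept)
        y_res.extend([label] * len(kept))
    return X_res, y_res


if __name__ == "__main__":
    # Hypothetical (LogFC, P value) features; labels are illustrative only.
    X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (9.0, 9.0), (0.5, 0.5)]
    y = ["SCLC", "SCLC", "neither", "neither", "neither", "neither"]
    print(near_miss_1(X, y, k=2))
```

Keeping the majority samples nearest the minority class concentrates training data near the class boundary, which is the intent of NearMiss-1; other variants (NearMiss-2, NearMiss-3) select samples differently.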
Iron is an essential cofactor for the normal function of various enzymes; its depletion increases DNA damage and genomic instability, impairs innate and adaptive immunity, and promotes tumor development. Iron depletion has also been linked to breast cancer tumorigenesis through enhanced mammary tumor growth and metastasis. There are insufficient data describing this association in Saudi Arabia. This study aims to determine the prevalence of iron deficiency and its association with breast cancer among premenopausal and postmenopausal women referred to a breast cancer screening center in Al Ahsa, Eastern Province of Saudi Arabia. Age, hemoglobin (Hb) level, iron level, and history of anemia or iron deficiency were collected from patients' medical records. Participants were grouped by age as premenopausal (<50 years) or postmenopausal (⩾50 years). Low Hb was defined as Hb below 12 g/dL, and low total serum iron as below 8 μmol/L. Logistic regression was used to assess the association between a positive cancer screening test (radiological or histocytological) and participants' laboratory results. The results are presented as odds ratios (ORs) and 95% confidence intervals (CIs). Three hundred fifty-seven women were included, 77% (n = 274) of whom were premenopausal. This group more often had a history of iron deficiency than the postmenopausal group (149 [60%] vs 25 [30%], P = .001). Among the entire cohort, the risk of a positive radiological cancer screening test was positively associated with age (OR = 1.04, 95% CI 1.02-1.06) and negatively associated with iron level (OR = 0.9, 95% CI 0.86-0.97). This study is the first to propose an association between iron deficiency and breast cancer among young Saudi females. This may suggest iron level as a new risk factor that clinicians could use to assess breast cancer risk.
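The study's cut-offs and its odds-ratio reporting can be made concrete. The sketch below encodes the stated thresholds (age <50 years, Hb <12 g/dL, serum iron <8 μmol/L) and converts a fitted logistic regression coefficient β with standard error SE into an odds ratio exp(β) with a 95% interval exp(β ± 1.96·SE). The Wald interval is an assumption, since the abstract does not say how the intervals were computed; function names and the example coefficient are hypothetical, not values from the study.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def menopausal_group(age_years: float) -> str:
    """Age-based grouping used in the study: <50 premenopausal, >=50 postmenopausal."""
    return "premenopausal" if age_years < 50 else "postmenopausal"


def is_low_hb(hb_g_dl: float) -> bool:
    """Low hemoglobin: below 12 g/dL."""
    return hb_g_dl < 12.0


def is_low_iron(iron_umol_l: float) -> bool:
    """Low total serum iron: below 8 umol/L."""
    return iron_umol_l < 8.0


def odds_ratio_ci(beta: float, se: float, z: float = Z_95) -> Tuple[float, float, float]:
    """Convert a logistic regression coefficient and its standard error
    into an odds ratio with a Wald confidence interval (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Hypothetical coefficient and standard error, not values from the study.
    or_, lo, hi = odds_ratio_ci(beta=0.2, se=0.05)
    print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An OR below 1 (β < 0) corresponds to lower odds of a positive screening test as the predictor increases, which is the direction reported for iron level.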
Background: The Regional Basis of Solid Tumor (RBST), a clinical data warehouse, centralizes information related to cancer patient care in 5 health establishments in 2 French departments.
Purpose: To develop algorithms matching heterogeneous data to "real" patients and "real" tumors with respect to patient identification (PI) and tumor identification (TI).
Methods: A graph database (Neo4j, programmed in Java) was used to build the RBST with data from ~20 000 patients. The PI algorithm, based on the Levenshtein distance, relied on the regulatory criteria for identifying a patient. The TI algorithm was built on 6 characteristics: tumor location, laterality, date of diagnosis, histology, primary status, and metastatic status. Given the heterogeneous nature and semantics of the collected data, repositories (organ, synonym, and histology) had to be created. The TI algorithm used the Dice coefficient to match tumors.
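The abstract states that the PI algorithm uses the Levenshtein distance and, in the results, gives the weights of each identity field, but not how distance is turned into a score. A minimal sketch, assuming names earn partial credit through a length-normalised Levenshtein similarity while sex and birth-date parts must match exactly, is shown below. The record layout, function names, and partial-credit rule are hypothetical; only the weights and the "complete agreement" match rule come from the abstract.

```python
import math
from dataclasses import dataclass
from datetime import date

# Weights (in %) reported for the PI algorithm; they sum to 100.
WEIGHTS = {
    "given_name": 28.0,
    "surname": 28.0,
    "sex": 21.0,
    "birth_year": 18.0,
    "birth_month": 2.5,
    "birth_day": 2.5,
}


@dataclass
class PatientRecord:
    given_name: str
    surname: str
    sex: str
    birth_date: date


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """1.0 for identical names, decreasing with edit distance (case-insensitive)."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def pi_score(p: PatientRecord, q: PatientRecord) -> float:
    """Weighted agreement score in [0, 100]."""
    agreement = {
        "given_name": name_similarity(p.given_name, q.given_name),
        "surname": name_similarity(p.surname, q.surname),
        "sex": float(p.sex == q.sex),
        "birth_year": float(p.birth_date.year == q.birth_date.year),
        "birth_month": float(p.birth_date.month == q.birth_date.month),
        "birth_day": float(p.birth_date.day == q.birth_date.day),
    }
    return sum(WEIGHTS[k] * agreement[k] for k in WEIGHTS)


def is_same_patient(p: PatientRecord, q: PatientRecord) -> bool:
    """Match only on complete agreement, i.e. a score of 100."""
    return math.isclose(pi_score(p, q), 100.0)


if __name__ == "__main__":
    a = PatientRecord("Marie", "Dupont", "F", date(1950, 3, 14))
    b = PatientRecord("marie", "Dupond", "F", date(1950, 3, 14))
    print(round(pi_score(a, b), 2), is_same_patient(a, b))  # prints: 95.33 False
```

Under this assumed scoring, a one-letter surname discrepancy lowers the score but still blocks the match, consistent with the complete-agreement rule; the partial score could flag such pairs for manual review.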
Results: Patients were matched if there was complete agreement on given name, surname, sex, and day/month/year of birth. These parameters were assigned weights of 28%, 28%, 21%, and 23% (18% for year, 2.5% for month, and 2.5% for day), respectively. The algorithm had a sensitivity of 99.69% (95% confidence interval [CI] [98.89%, 99.96%]) and a specificity of 100% (95% CI [99.72%, 100%]). The TI algorithm used the repositories, and weights were assigned to the diagnosis date and associated organ (37.5% each), laterality (16%), histology (5%), and metastatic status (4%). This algorithm had a sensitivity of 71% (95% CI [62.68%, 78.25%]) and a specificity of 100% (95% CI [94.31%, 100%]).
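The abstract names the Dice coefficient and the TI weights but not what the coefficient is applied to or the match threshold. The sketch below assumes Dice similarity on character bigrams for the organ and histology labels (after repository normalisation, taken as already done), exact agreement for diagnosis date, laterality, and metastatic status, and returns only the weighted score. The record layout and function names are hypothetical; only the weights come from the abstract.

```python
from dataclasses import dataclass
from datetime import date
from typing import Set

# Weights (in %) reported for the TI algorithm; they sum to 100.
TI_WEIGHTS = {
    "diagnosis_date": 37.5,
    "organ": 37.5,
    "laterality": 16.0,
    "histology": 5.0,
    "metastatic": 4.0,
}


@dataclass
class TumorRecord:
    diagnosis_date: date
    organ: str        # assumed already normalised via the organ/synonym repositories
    laterality: str
    histology: str    # assumed already normalised via the histology repository
    metastatic: bool


def bigrams(text: str) -> Set[str]:
    """Set of adjacent character pairs of a lower-cased string."""
    t = text.strip().lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) on character bigrams."""
    A, B = bigrams(a), bigrams(b)
    if not A and not B:
        # Strings shorter than 2 characters: fall back to exact comparison.
        return 1.0 if a.strip().lower() == b.strip().lower() else 0.0
    return 2 * len(A & B) / (len(A) + len(B))


def ti_score(s: TumorRecord, t: TumorRecord) -> float:
    """Weighted agreement score in [0, 100] between two tumor records."""
    agreement = {
        "diagnosis_date": float(s.diagnosis_date == t.diagnosis_date),
        "organ": dice(s.organ, t.organ),
        "laterality": float(s.laterality == t.laterality),
        "histology": dice(s.histology, t.histology),
        "metastatic": float(s.metastatic == t.metastatic),
    }
    return sum(TI_WEIGHTS[k] * agreement[k] for k in TI_WEIGHTS)


if __name__ == "__main__":
    t1 = TumorRecord(date(2020, 5, 4), "lung", "left", "adenocarcinoma", False)
    t2 = TumorRecord(date(2020, 5, 4), "lung", "right", "adenocarcinoma", False)
    print(ti_score(t1, t2))  # laterality disagrees, so 16 points are lost
```

Because date and organ together carry 75% of the weight, disagreement on either one outweighs agreement on all the remaining characteristics combined.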
Conclusion: The RBST encompasses 2 quality controls: PI and TI. It facilitates the implementation of transversal structuring and the assessment of the performance of the care provided.
Tumour volume is typically calculated using only length and width measurements, with width used as a proxy for height in a 1:1 ratio. When tumour growth is tracked over time, ignoring height, which we show is a distinct variable, loses important morphological information and measurement accuracy. The lengths, widths, and heights of 9522 subcutaneous tumours in mice were measured using 3D and thermal imaging. The average height:width ratio was 1:3, showing that using width as a proxy for height overestimates tumour volume. Indeed, comparing volumes calculated with and without tumour height against the true volumes of excised tumours showed that the formula including height produced volumes 36 times more accurate (based on percentage difference). Monitoring the height:width relationship (prominence) across tumour growth curves showed that prominence varied and that height could change independently of width. Twelve cell lines were investigated individually; the degree of tumour prominence was cell line-dependent, with relatively less prominent tumours (MC38, BL2, LL/2) and more prominent tumours (RENCA, HCT116) detected. Prominence trends across the growth cycle also depended on cell line: prominence was correlated with tumour growth in some cell lines (4T1, CT26, LNCaP) but not in others (MC38, TC-1, LL/2). When pooled, invasive cell lines produced tumours that were significantly less prominent at volumes >1200 mm³ than non-invasive cell lines (P < .001). Modelling was used to show the impact of the accuracy gained by including height in volume calculations on several efficacy study outcomes. Variations in measurement accuracy contribute to experimental variation and irreproducibility of data; we therefore strongly advise researchers to measure height to improve accuracy in tumour studies.
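The abstract does not give the volume formula. Assuming the ellipsoid form V = (π/6)·L·W·H and its two-measurement variant in which W replaces H, the constant cancels in the ratio, so the width-proxy volume exceeds the three-measurement volume by a factor of exactly W/H. For a tumour with a 1:3 height:width ratio that factor is 3, as the sketch below shows. The function names and dimensions are hypothetical, and the percentage-difference definition is an assumption.

```python
import math


def ellipsoid_volume(length: float, width: float, height: float) -> float:
    """Ellipsoid volume from the three measured axes: (pi / 6) * L * W * H."""
    return math.pi / 6.0 * length * width * height


def volume_width_as_height(length: float, width: float) -> float:
    """Two-measurement estimate in which width substitutes for height (1:1)."""
    return ellipsoid_volume(length, width, width)


def percent_difference(estimate: float, reference: float) -> float:
    """Absolute difference of an estimate from a reference, as a percentage of the reference."""
    return abs(estimate - reference) / reference * 100.0


if __name__ == "__main__":
    # Hypothetical tumour (mm) with a 1:3 height:width ratio.
    L, W, H = 12.0, 9.0, 3.0
    v2 = volume_width_as_height(L, W)
    v3 = ellipsoid_volume(L, W, H)
    print(f"width-as-height: {v2:.1f} mm3, with height: {v3:.1f} mm3")
    print(f"overestimate factor {v2 / v3:.1f}, percentage difference {percent_difference(v2, v3):.0f}%")
```

The same cancellation holds for any formula of the form c·L·W·H, so the overestimate depends only on how far the height:width ratio departs from 1:1.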
Different tumor types are characterized by unique histopathological patterns, including distinctive nuclear architectures. I hypothesized that the difference in nuclear appearance is reflected in different nuclear maps of chromosome territories, the discrete regions occupied by individual chromosomes in the interphase nucleus. To test this hypothesis, I used interchromosomal translocations (ITLs) as an analytical tool to map chromosome territories in 11 different tumor types from the TCGA PanCancer database, encompassing 6003 tumors with 5295 ITLs. For each chromosome, I determined the number and percentage of all ITLs for any given tumor type. Chromosomes were ranked according to the frequency and percentage of ITLs per chromosome. The ranking showed similar patterns for all tumor types. Chromosomes 1, 8, 11, 17, and 19 were ranked in the top quarter, accounting for 35.2% of the 5295 ITLs, whereas chromosomes 13, 15, 18, 21, and X were in the bottom quarter, accounting for only 10.5% of ITLs. The correlation between the chromosome ranking in the total group of 6003 tumors and the ranking in individual tumor types was significant, with P values ranging from < .0001 to .0033. Thus, contrary to my hypothesis, different tumor types share a common nuclear map of chromosome territories. Based on the large number of ITLs in 11 different types of malignancy, one can discern a shared pattern of chromosome territories in cancer and propose a probabilistic model with chromosomes 1, 8, 11, 17, and 19 in the center of the nucleus and chromosomes 13, 15, 18, 21, and X at the periphery.
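The abstract does not name the correlation statistic used to compare rankings. Assuming Spearman's rank correlation, the sketch below ranks chromosomes by ITL count (rank 1 = most ITLs, ties sharing their mean rank) and correlates the pooled ranking with that of a single tumor type. Function names and the counts in the usage lines are hypothetical, and no P value is computed.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values in descending order (1 = most ITLs); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the rank vectors (Spearman's rho with tie handling)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def compare_rankings(pooled: Dict[str, int], tumor_type: Dict[str, int]) -> float:
    """Correlate per-chromosome ITL counts over the chromosomes present in both."""
    chroms = sorted(set(pooled) & set(tumor_type))
    return spearman_rho([pooled[c] for c in chroms], [tumor_type[c] for c in chroms])


if __name__ == "__main__":
    pooled = {"1": 400, "8": 350, "13": 60, "X": 40}   # hypothetical counts
    one_type = {"1": 50, "8": 45, "13": 5, "X": 8}     # hypothetical counts
    print(round(compare_rankings(pooled, one_type), 3))
```

Rank correlation compares only the order of chromosomes, so tumor types with very different total ITL counts can still be compared directly.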
Motivation: The PAM50 signature/method is widely used for intrinsic subtyping of breast cancer samples. However, depending on the number and composition of the samples included in a cohort, the method may assign different subtypes to the same sample. This lack of robustness arises mainly because PAM50 subtracts a reference profile, computed from all samples in the cohort, from each sample before classification. In this paper, we propose modifications to PAM50 to develop a simple and robust single-sample classifier, called MPAM50, for intrinsic subtyping of breast cancer. Like PAM50, the modified method uses a nearest centroid approach for classification, but the centroids are computed differently, and the distances to the centroids are determined using an alternative method. Additionally, MPAM50 uses unnormalized expression values for classification and does not subtract a reference profile from the samples. In other words, MPAM50 classifies each sample independently and so avoids the robustness issue described above.
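The abstract does not specify how MPAM50 builds its centroids or which alternative distance it uses, so the sketch below shows only the generic idea it rests on: single-sample nearest-centroid classification with no cohort-derived reference profile subtracted. Spearman rank correlation as the similarity is an assumption here, and the centroids and gene names (`G1` to `G4`) are hypothetical, not the MPAM50 centroids.

```python
import math
from typing import Dict, List, Sequence


def _ranks(values: Sequence[float]) -> List[float]:
    """Ascending ranks with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def _spearman(x: Sequence[float], y: Sequence[float]) -> float:
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    norm = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return cov / norm if norm else 0.0


def classify_single_sample(sample: Dict[str, float],
                           centroids: Dict[str, Dict[str, float]]) -> str:
    """Assign the subtype whose centroid is most rank-correlated with the sample.

    Only this sample's own expression values are used: no cohort-wide reference
    profile is subtracted, so the call does not depend on any other sample."""
    if not centroids:
        raise ValueError("at least one centroid is required")
    best_label, best_rho = "", -math.inf
    for label, centroid in centroids.items():
        genes = sorted(set(sample) & set(centroid))
        rho = _spearman([sample[g] for g in genes], [centroid[g] for g in genes])
        if rho > best_rho:
            best_label, best_rho = label, rho
    return best_label


if __name__ == "__main__":
    centroids = {"LumA": {"G1": 1.0, "G2": 2.0, "G3": 3.0, "G4": 4.0},
                 "Basal": {"G1": 4.0, "G2": 3.0, "G3": 2.0, "G4": 1.0}}
    print(classify_single_sample({"G1": 9.0, "G2": 5.0, "G3": 2.0, "G4": 0.5}, centroids))
```

With a rank-based similarity, monotone rescaling of a sample (for example a log transform of unnormalized values) cannot change its assigned subtype; this is a property of the assumed similarity, not a claim about MPAM50's distance.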
Results: A training set was employed to find the new MPAM50 centroids. MPAM50 was then tested on 19 independent datasets (obtained using various expression profiling technologies) containing 9637 samples. Overall, good agreement was observed between the PAM50- and MPAM50-assigned subtypes, with a median accuracy of 0.792, which we show is comparable to the median concordance between various implementations of PAM50. Additionally, MPAM50- and PAM50-assigned intrinsic subtypes agreed comparably with the reported clinical subtypes. Survival analyses also indicated that MPAM50 preserves the prognostic value of the intrinsic subtypes. These observations demonstrate that MPAM50 can replace PAM50 without loss of performance. Furthermore, MPAM50 was compared with 2 previously published single-sample classifiers and with 3 alternative modified PAM50 approaches, and the results indicated superior performance by MPAM50.
Conclusions: MPAM50 is a robust, simple, and accurate single-sample classifier of intrinsic subtypes of breast cancer.
Objectives: This study examined prescription NSAIDs as one of the leading predictors of incident depression and assessed the direction of the association among older cancer survivors with osteoarthritis.
Methods: This study used a retrospective cohort (N = 14 992) of older adults with incident cancer (breast, prostate, or colorectal cancer, or non-Hodgkin's lymphoma) and osteoarthritis (OA). We used longitudinal data from the linked Surveillance, Epidemiology, and End Results (SEER)-Medicare data for the study period from 2006 through 2016, with a 12-month baseline and a 12-month follow-up period. Cumulative NSAIDs days was assessed during the baseline period, and incident depression was assessed during the follow-up period. An eXtreme Gradient Boosting (XGBoost) model was built on the training dataset with repeated stratified 10-fold cross-validation and hyperparameter tuning. The final model selected from the training data demonstrated high performance on the test data (Accuracy: 0.82, Recall: 0.75, Precision: 0.75). SHapley Additive exPlanations (SHAP) was used to interpret the output of the XGBoost model.
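The abstract does not define how cumulative NSAIDs days were derived from prescription claims. A minimal sketch, assuming each fill is recorded as a dispensing date plus days supplied and that overlapping fills are not double-counted, counts the distinct days covered inside the 12-month baseline window; the count could then be compared with thresholds such as the 90 and 120 days mentioned in the results. The data layout, function name, and dates are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple

Fill = Tuple[date, int]  # (dispensing date, days supplied); hypothetical layout


def cumulative_days(fills: Iterable[Fill], start: date, end: date) -> int:
    """Count distinct calendar days in [start, end] covered by at least one fill.

    Overlapping fills are not double-counted, and supply falling outside
    the window is ignored."""
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if start <= day <= end:
                covered.add(day)
    return len(covered)


if __name__ == "__main__":
    baseline_start, baseline_end = date(2010, 1, 1), date(2010, 12, 31)
    fills = [(date(2010, 1, 1), 30), (date(2010, 1, 20), 30), (date(2010, 12, 20), 30)]
    print(cumulative_days(fills, baseline_start, baseline_end))  # prints: 61
```

Summing raw days supplied instead would give 90 for the same fills; which convention the study used is not stated.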
Results: Over 50% of the study cohort had at least one NSAID prescription. Nearly 13% of the cohort were diagnosed with incident depression, with rates ranging from 7.4% for prostate cancer to 17.0% for colorectal cancer. The highest incident depression rate, 25%, was observed at the 90- and 120-day cumulative NSAIDs days thresholds. Cumulative NSAIDs days was the sixth leading predictor of incident depression among older adults with OA and cancer. Age, education, care fragmentation, polypharmacy, and zip code-level poverty were the top 5 predictors of incident depression.
Conclusion: Overall, 1 in 8 older adults with cancer and OA were diagnosed with incident depression. Cumulative NSAIDs days was the sixth leading predictor with an overall positive association with incident depression. However, the association was complex and varied by the cumulative NSAIDs days.
Histone methyltransferases (HMTs) comprise a subclass of epigenetic regulators. Dysregulation of these enzymes results in aberrant epigenetic regulation, commonly observed in various tumor types, including hepatocellular carcinoma (HCC). These epigenetic changes may contribute to tumorigenesis. To predict how HMT genes and their genetic alterations (somatic mutations, somatic copy number alterations, and gene expression changes) are involved in HCC, we performed an integrated computational analysis of genetic alterations in 50 HMT genes in HCC. Biological data were obtained from a public repository and comprised 360 samples from patients with HCC. From these data, we identified 10 HMT genes (SETDB1, ASH1L, SMYD2, SMYD3, EHMT2, SETD3, PRDM14, PRDM16, KMT2C, and NSD3) with a significant genetic alteration rate (14%) across the 360 samples. Of these 10 HMT genes, KMT2C and ASH1L had the highest mutation rates in HCC samples, 5.6% and 2.8%, respectively. Regarding somatic copy number alterations, ASH1L and SETDB1 were amplified in several samples, while SETD3, PRDM14, and NSD3 showed a high rate of large deletions. Finally, SETDB1, SETD3, PRDM14, and NSD3 could play an important role in HCC progression, since patients with alterations in these genes had shorter survival than patients without such alterations. Our computational analysis provides new insights into how HMTs are associated with HCC and provides a basis for future experimental investigations using HMTs as genetic targets against HCC.
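The alteration rates reported above are cohort frequencies. As a sketch, assuming per-sample alteration calls (mutation, copy number change, or expression change) are already available, the per-gene rate is the fraction of samples in which that gene is altered, and the gene-set rate is the fraction of samples with at least one altered gene in the set. The data structure, function name, and toy samples are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def alteration_rates(altered: Dict[str, Set[str]], genes: Iterable[str],
                     n_samples: int) -> Tuple[Dict[str, float], float]:
    """Per-gene and any-gene alteration frequencies across a cohort.

    `altered` maps sample IDs to the set of genes altered in that sample;
    samples absent from it are treated as unaltered."""
    gene_set = set(genes)
    per_gene = {g: sum(g in s for s in altered.values()) / n_samples
                for g in sorted(gene_set)}
    any_gene = sum(bool(s & gene_set) for s in altered.values()) / n_samples
    return per_gene, any_gene


if __name__ == "__main__":
    # Hypothetical calls for a 4-sample toy cohort; s3 has no alterations.
    calls = {"s1": {"KMT2C"}, "s2": {"ASH1L", "SETDB1"}, "s4": {"TP53"}}
    print(alteration_rates(calls, ["KMT2C", "ASH1L", "SETDB1"], n_samples=4))
```

Passing the cohort size explicitly ensures that unaltered samples, which often do not appear in alteration exports, still count in the denominator.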