Ryan J. Kramer, Kristen E. Rhodin, Aaron Therien, Vignesh Raman, Austin Eckhoff, Camryn Thompson, Betty C. Tong, Dan G. Blazer III, Michael E. Lidsky, Thomas D’Amico, Daniel P. Nussbaum
Surgical Oncology Insight. Published 2024-02-03. DOI: 10.1016/j.soi.2024.100009. https://www.sciencedirect.com/science/article/pii/S2950247024000057
Unsupervised clustering using multiple correspondence analysis reveals clinically-relevant demographic variables across multiple gastrointestinal cancers
Objective
Patients with gastrointestinal malignancies represent a heterogeneous population, even among those with similar stage and treatment pathways. Here, we used dimensionality reduction in the National Cancer Database (NCDB) to inform unsupervised clustering of patients with three gastrointestinal malignancies and examined outcomes among these computationally derived groups.
Methods
The NCDB was queried for three cohorts of patients receiving multimodal therapy: stage II/III esophageal cancer, stage II/III gastric cancer, and stage III colon cancer. Multiple correspondence analysis (MCA), a dimensionality reduction technique well suited to categorical variables such as the demographic data in the NCDB, was performed on each cohort using demographic and tumor characteristics as input variables. Principal components were analyzed to derive clusters. Outcomes for each cluster were compared using Kaplan-Meier survival methods.
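As a rough illustration of the MCA step described above, the following sketch one-hot encodes a small categorical table, forms the standardized residual matrix used in correspondence analysis, and extracts the first dimension by power iteration. The toy patients, the function names, and the rule of clustering by the sign of the first coordinate are all assumptions made for illustration; the abstract does not specify the software or the clustering rule the authors used.

```python
import math

def indicator_matrix(rows):
    """One-hot encode categorical rows into a 0/1 indicator matrix."""
    labels = [(j, level)
              for j in range(len(rows[0]))
              for level in sorted({row[j] for row in rows})]
    col = {label: k for k, label in enumerate(labels)}
    Z = [[0.0] * len(labels) for _ in rows]
    for i, row in enumerate(rows):
        for j, level in enumerate(row):
            Z[i][col[(j, level)]] = 1.0
    return Z

def mca_first_dimension(rows, iters=100):
    """Row coordinates on the first MCA dimension, found by power iteration."""
    Z = indicator_matrix(rows)
    n, m = len(Z), len(Z[0])
    total = sum(sum(row) for row in Z)
    r = [sum(row) / total for row in Z]                             # row masses
    c = [sum(Z[i][k] for i in range(n)) / total for k in range(m)]  # column masses
    # Standardized residuals: S = D_r^(-1/2) (P - r c^T) D_c^(-1/2), with P = Z / total
    S = [[(Z[i][k] / total - r[i] * c[k]) / math.sqrt(r[i] * c[k])
          for k in range(m)] for i in range(n)]
    v = [float(k + 1) for k in range(m)]  # arbitrary starting vector
    for _ in range(iters):
        u = [sum(S[i][k] * v[k] for k in range(m)) for i in range(n)]  # u = S v
        w = [sum(S[i][k] * u[i] for i in range(n)) for k in range(m)]  # w = S^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Principal row coordinates: F = D_r^(-1/2) S v
    coords = [sum(S[i][k] * v[k] for k in range(m)) / math.sqrt(r[i])
              for i in range(n)]
    if coords[0] < 0:  # the sign of an axis is arbitrary; orient it on the first row
        coords = [-x for x in coords]
    return coords

# Hypothetical toy data: (income, education, age, insurance) per patient.
profile_a = ("high", "high", "younger", "private")
profile_b = ("low", "low", "older", "public")
patients = [profile_a] * 4 + [profile_b] * 4

coords = mca_first_dimension(patients)
# Illustrative clustering rule only: split on the sign of the first coordinate.
clusters = [0 if x > 0 else 1 for x in coords]
print(clusters)
```

On real data the categories would not separate this cleanly, several dimensions would usually be retained, and a dedicated clustering method would normally be applied to the retained coordinates.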
Results
For esophageal (n = 11,399), gastric (n = 2,033), and colon (n = 72,057) cancer, the same four variables were identified as highly representative. The principal variables were income quartile, education quartile, age quartile, and insurance type. Survival analysis demonstrated significant differences in overall survival between clusters in esophageal (p < 0.0001) and colon (p < 0.0001) cancer, but not gastric cancer (p = 0.56). Clusters defined by high income, high education, younger age, and private insurance fared better.
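To make the per-cluster survival comparison concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimator applied to two clusters. The follow-up times and event flags are invented for illustration and are not NCDB data; the function name is hypothetical, and the sketch does not compute p-values, since the abstract does not state which test produced them.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival)] at each observed event time.

    times:  follow-up durations (e.g. months)
    events: 1 if death was observed at that time, 0 if the patient was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Invented toy follow-up data for two hypothetical clusters.
curve_a = kaplan_meier([6, 12, 18, 24, 30], [1, 0, 1, 0, 0])
curve_b = kaplan_meier([3, 6, 9, 12, 15], [1, 1, 1, 0, 1])

for name, curve in (("cluster A", curve_a), ("cluster B", curve_b)):
    print(name, [(t, round(s, 3)) for t, s in curve])
```

Censored patients leave the risk set without lowering the curve, which is why the estimate for cluster A stays well above zero even though most of its patients were not followed to death.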
Conclusions
Using MCA, we identified combinations of four demographic variables among NCDB patients with stage II/III esophageal cancer, stage II/III gastric cancer, and stage III colon cancer. These groupings had significantly different survival outcomes in colon and esophageal cancer. This work serves as a proof of concept for the utility of unsupervised clustering in outcomes research on surgical malignancies and identifies at-risk populations.