Identification and validation of a seven-gene prognostic marker in colon cancer based on single-cell transcriptome analysis

IF 1.9 · CAS Zone 4 (Biology) · JCR Q4 (Cell Biology) · IET Systems Biology · Pub Date: 2022-03-30 · DOI: 10.1049/syb2.12041
Yang Zhou, Yang Guo, Yuanhe Wang
IET Systems Biology, vol. 16, no. 2, pp. 72-83 (Journal Article)
Citations: 3

Abstract

Colon cancer (CC) is one of the most commonly diagnosed tumours worldwide. Single-cell RNA sequencing (scRNA-seq) can capture heterogeneity both within and between tumour cells and identify genes associated with cancer development and growth. In this study, scRNA-seq data were used to identify reliable prognostic biomarkers in CC. First, scRNA-seq data of CC before and after 5-fluorouracil treatment were downloaded from the Gene Expression Omnibus database. After pre-processing, dimensionality reduction was performed using principal component analysis and t-distributed stochastic neighbour embedding. Transcriptome data, somatic variant data, and clinical records of patients with CC were also obtained from The Cancer Genome Atlas database. Seven key genes were identified using Cox regression analysis and the least absolute shrinkage and selection operator (LASSO) method to establish a signature associated with CC prognosis. The signature was validated on independent datasets, and somatic mutations and potential oncogenic pathways were further explored. By combining the gene signature with other clinical variables, a nomogram was constructed as a more effective predictive model for patients with CC, and decision curve analysis was performed to assess its utility. A prognostic signature consisting of seven genes, namely CAV2, EREG, NGFRAP1, WBSCR22, SPINT2, CCDC28A, and BCL10, was constructed and validated. The performance and reliability of the signature were verified in both internal and external datasets, and the results showed that the seven-gene signature could effectively predict the prognosis of patients with CC across various clinical conditions. The nomogram, built on the RiskScore, patient age, neoplasm stage, and tumour (T), node (N), and metastasis (M) classification, showed good clinical utility. Higher RiskScores were associated with a higher tumour mutational burden, which was confirmed to be a prognostic risk factor. Gene set enrichment analysis showed that the high-score group was enriched in ‘cytoplasmic DNA sensing’, ‘extracellular matrix receptor interactions’, and ‘focal adhesion’, whereas the low-score group was enriched in ‘natural killer cell-mediated cytotoxicity’ and ‘T-cell receptor signalling pathways’, among other pathways. A robust seven-gene marker for CC was identified from scRNA-seq data and validated in multiple independent cohorts. These findings provide a potential new marker for predicting the prognosis of patients with CC.
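The abstract does not state the RiskScore formula or the fitted coefficients. LASSO-Cox signatures of this kind commonly define the RiskScore as the linear predictor, the sum of each gene's coefficient times its expression, and split patients at the median score. The sketch below assumes exactly that form; the coefficient values and patient data are hypothetical placeholders, not the published model. It also shows the standard decision-curve net benefit, NB(pₜ) = TP/N − (FP/N) · pₜ / (1 − pₜ), in a simplified binary-outcome form that ignores censoring, which a survival nomogram's decision curve would need to account for.

```python
from statistics import median

# HYPOTHETICAL coefficients for the seven signature genes. The abstract does not
# report the fitted LASSO-Cox values; these numbers are placeholders only.
COEFFICIENTS: dict[str, float] = {
    "CAV2": 0.10,
    "EREG": 0.20,
    "NGFRAP1": -0.15,
    "WBSCR22": 0.05,
    "SPINT2": -0.10,
    "CCDC28A": 0.12,
    "BCL10": -0.08,
}


def risk_score(expression: dict[str, float],
               coefficients: dict[str, float] = COEFFICIENTS) -> float:
    """Assumed form: linear predictor sum(beta_i * x_i) over the signature genes."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def stratify(scores: dict[str, float]) -> dict[str, str]:
    """Split patients at the median score (assumed cutoff); above it is 'high'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


def net_benefit(predicted_risk: list[float], event: list[bool],
                threshold: float) -> float:
    """Decision-curve net benefit at threshold probability pt.

    NB = TP/N - (FP/N) * pt / (1 - pt), where a patient counts as 'treated'
    when predicted_risk >= pt. Simplified binary-outcome version: censoring
    is ignored here.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(event)
    treated = [e for r, e in zip(predicted_risk, event) if r >= threshold]
    tp = sum(1 for e in treated if e)  # treated patients who had the event
    fp = len(treated) - tp             # treated patients who did not
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Three hypothetical patients with uniform, made-up expression levels.
    patients = {
        "P1": {gene: 1.0 for gene in COEFFICIENTS},
        "P2": {gene: 2.0 for gene in COEFFICIENTS},
        "P3": {gene: 0.0 for gene in COEFFICIENTS},
    }
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    print(stratify(scores))  # {'P1': 'low', 'P2': 'high', 'P3': 'low'}
    print(net_benefit([0.9, 0.6, 0.4, 0.2], [True, False, True, False], 0.25))
```

A median cutoff keeps the high- and low-score groups roughly balanced; some studies instead search for an optimal cutpoint, and the abstract does not say which rule was used.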
