MVCLST: A spatial transcriptome data analysis pipeline for cell type classification based on multi-view comparative learning
Wei Peng, Zhihao Zhang, Wei Dai, Zhihao Ping, Xiaodong Fu, Li Liu, Lijun Liu, Ning Yu
Methods, vol. 232, pp. 115-128 (published 2024-11-13). DOI: 10.1016/j.ymeth.2024.11.001
https://www.sciencedirect.com/science/article/pii/S104620232400238X
Citations: 0
Abstract
Recent advances in spatial transcriptomics sequencing technologies not only measure gene expression within individual cells or cell clusters (spots) in a tissue but also pinpoint where that expression occurs and produce detailed images of stained tissue sections, offering valuable insights for cell type identification and the exploration of cell function. However, effectively integrating the gene expression data, spatial location information, and tissue images from spatial transcriptomics data remains a significant challenge for computational cell classification methods. In this work, we propose MVCLST, a multi-view comparative learning method that analyzes spatial transcriptomics data for accurate cell type classification. MVCLST constructs two views from gene expression profiles, cell coordinates, and image features. This multi-view design can significantly improve feature extraction while avoiding the impact of erroneous information in the tissue image or gene expression data. The model employs four separate encoders to capture the shared and private (view-specific) features of each view. To keep the two views consistent and enable information exchange between them, MVCLST incorporates a contrastive learning loss. The shared and private features extracted from both views are then fused by corresponding decoders. Finally, the model applies the Leiden algorithm to cluster the learned features for cell type identification. Additionally, we build MVCLST-CCFS, a framework for spatial transcriptomics data analysis based on MVCLST and consistent clustering. Our method achieves excellent clustering results on human dorsolateral prefrontal cortex data and mouse brain tissue data, and it outperforms state-of-the-art techniques in the downstream search for highly variable genes across cell types on mouse olfactory bulb data.
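The abstract does not specify the exact form of MVCLST's contrastive loss. As a minimal sketch, assuming an InfoNCE (NT-Xent-style) objective in which the shared embeddings of the same spot in the two views form a positive pair and every other spot acts as a negative, the cross-view consistency term could look like the code below. The function names, the temperature `tau`, and the toy vectors are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]  # one embedding row per spot (hypothetical layout)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (epsilon avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def info_nce(anchors: Matrix, positives: Matrix, tau: float = 0.5) -> float:
    """One-directional InfoNCE: row i of `anchors` should match row i of `positives`.

    All other rows of `positives` serve as negatives for anchor i.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / tau for j in range(n)]
        # Numerically stable log-sum-exp over the positive and all negatives.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive pair
    return total / n


def cross_view_contrastive_loss(shared_v1: Matrix, shared_v2: Matrix, tau: float = 0.5) -> float:
    """Hypothetical symmetric loss pulling each spot's shared embeddings from both views together."""
    return 0.5 * (info_nce(shared_v1, shared_v2, tau) + info_nce(shared_v2, shared_v1, tau))


if __name__ == "__main__":
    # Toy shared embeddings for three spots (made-up values, not data from the paper).
    view1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
    shuffled = [aligned[1], aligned[2], aligned[0]]
    print(cross_view_contrastive_loss(view1, aligned) < cross_view_contrastive_loss(view1, shuffled))  # → True
```

Shuffling the view-2 rows so that spots no longer line up raises the loss, which is the signal a contrastive term of this kind uses to keep the two views consistent. In a trained model the inputs would be the outputs of the shared encoders rather than hand-written vectors.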
About the journal:
Methods focuses on rapidly developing techniques in the experimental biological and medical sciences.
Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.