Batch effects correction in scRNA-seq based on biological-noise decoupling autoencoder and central-cross loss
Zhangjie Di, Bo Yang, Meng Li, Yue Wu, Hong Ji
Computational Biology and Chemistry, volume 113, Article 108261. Published 2024-10-19.
Impact factor: 2.6 (JCR Q2, Biology)
DOI: 10.1016/j.compbiolchem.2024.108261
URL: https://www.sciencedirect.com/science/article/pii/S1476927124002494
Citations: 0
Abstract
Technical or biologically irrelevant differences arising from different experiments, times, or sequencing platforms can produce batch effects that mask true biological information. Batch effects are therefore typically removed before single-cell RNA sequencing (scRNA-seq) datasets are used in downstream tasks. Existing batch correction methods usually mitigate batch effects by projecting data from different batches into a lower-dimensional space before clustering, which can lead to the loss of rare cell types. To address this problem, we introduce BDACL, a novel batch effect correction model for single-cell data that combines a Biological-noise Decoupling Autoencoder (BDA) with a Central-cross Loss (CL). The model first reconstructs the raw data using an autoencoder and performs preliminary clustering. We then construct a similarity matrix and a hierarchical clustering tree to delineate relationships within and between batches. Finally, we introduce the Central-cross Loss. This loss uses cross-entropy to help the model distinguish between cluster labels and employs the Central Loss to encourage samples to form more compact clusters in the embedding space, thereby improving the consistency and interpretability of the clustering results and mitigating differences between batches. The primary innovation of the model lies in reconstructing data with an autoencoder and gradually merging smaller clusters into larger ones using the hierarchical clustering tree. By using the reallocated cluster labels as training labels together with the Central-cross Loss, the model effectively removes batch effects in an unsupervised manner. Compared with current methods, BDACL can mitigate batch effects without losing rare cell types.
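The abstract describes the Central-cross Loss only in words: a cross-entropy term over cluster labels plus a Central Loss term that pulls embeddings toward their cluster centers. The NumPy sketch below illustrates one plausible reading of that description. It is not the authors' implementation: the additive form, the weight `lam`, the 0.5 scaling, and all function names are assumptions made here for illustration.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy between logits and integer cluster labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def center_loss(embeddings, labels, centers):
    """Half the mean squared distance from each embedding to its cluster center."""
    diffs = embeddings - centers[labels]
    return 0.5 * (diffs ** 2).sum(axis=1).mean()

def central_cross_loss(logits, embeddings, labels, centers, lam=0.1):
    """Assumed combination: cross-entropy plus a weighted center term.

    `lam` is a hypothetical trade-off weight; the abstract does not specify one.
    """
    return cross_entropy(logits, labels) + lam * center_loss(embeddings, labels, centers)

# Toy data: 4 cells, 3-dimensional embeddings, 2 clusters (labels are
# placeholders for the reallocated cluster labels mentioned in the abstract).
rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1])
embeddings = rng.normal(size=(4, 3))
logits = rng.normal(size=(4, 2))

# Here the centers are simply per-cluster means of the embeddings.
centers = np.stack([embeddings[labels == k].mean(axis=0) for k in range(2)])

loss = central_cross_loss(logits, embeddings, labels, centers)
print(f"central-cross loss on toy data: {loss:.4f}")
```

In this reading, the cross-entropy term separates clusters while the center term tightens each cluster in the embedding space; a larger `lam` would favor compactness over separability.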
About the journal:
Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High-quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high-quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered.
Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, complementary techniques, for example molecular dynamics simulations along with detailed free energy calculations, should be used to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered.
Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However, prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.