A novel coarsened graph learning method for scalable single-cell data analysis
Authors: Mohit Kataria, Ekta Srivastava, Kumar Arjun, Sandeep Kumar, Ishaan Gupta, Jayadeva
Journal: Computers in Biology and Medicine, vol. 188, Article 109873 (published 2025-04-01; Epub 2025-02-24)
DOI: 10.1016/j.compbiomed.2025.109873
URL: https://www.sciencedirect.com/science/article/pii/S0010482525002240
The emergence of single-cell technologies, including flow and mass cytometry and single-cell RNA sequencing, has revolutionized the study of cellular heterogeneity and generated vast datasets rich in biological insight. Graph-based analyses are effective at deciphering the complexities of these datasets, but managing large-scale graph representations of single-cell data remains computationally challenging. Graph coarsening has been employed to tackle this difficulty. However, current coarsening techniques such as Cytocoarsening, Heavy Edge Matching (HEM), and Locally Variable Edges (LVE) often suffer from slow processing speeds and limited adaptability. To address these challenges, we propose Feature-Aware Graph Coarsening via Hashing (FACH), which integrates locality-sensitive hashing for scalable and efficient single-cell data analysis. The method directly extracts informative, low-dimensional cell representations from raw single-cell RNA sequencing and mass cytometry data, significantly improving processing speed while preserving critical biological features such as transcriptional signatures and network topology. We demonstrate its effectiveness in downstream tasks, such as scalable graph neural network training on coarsened single-cell data. It reduces computational time by at least 50% compared to existing methods and achieves superior classification accuracy, such as 88.1% on the Baron Human dataset, underscoring its efficiency and precision in large-scale single-cell analysis.
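The abstract does not describe how FACH constructs its hashes, so the following is only a generic sketch of hashing-based graph coarsening, not the authors' implementation. It assumes random-projection locality-sensitive hashing on cell features, merges cells that share a hash code into one supernode, and forms the coarsened adjacency as P^T A P. The function name `lsh_coarsen` and the parameters `n_projections`, `bin_width`, and `seed` are hypothetical.

```python
import numpy as np


def lsh_coarsen(X, A, n_projections=8, bin_width=1.0, seed=0):
    """Group nodes whose features hash to the same bucket (illustrative only).

    X: (n, d) node feature matrix, e.g. cells x genes or cells x markers.
    A: (n, n) symmetric adjacency matrix of the cell graph.
    Returns (labels, X_coarse, A_coarse).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumption: Gaussian random projections with quantization,
    # a standard LSH family for Euclidean distance.
    W = rng.standard_normal((d, n_projections))
    b = rng.uniform(0.0, bin_width, size=n_projections)
    # Nearby points tend to receive the same integer code per projection.
    codes = np.floor((X @ W + b) / bin_width).astype(np.int64)
    # Every distinct code row becomes one supernode.
    _, labels = np.unique(codes, axis=0, return_inverse=True)
    labels = labels.reshape(-1)
    k = int(labels.max()) + 1
    # Partition matrix P: P[i, c] = 1 if node i belongs to supernode c.
    P = np.zeros((n, k))
    P[np.arange(n), labels] = 1.0
    sizes = P.sum(axis=0)
    # Supernode feature = mean of its members' features.
    X_coarse = (P.T @ X) / sizes[:, None]
    # Supernode edge weight = summed weights between member sets;
    # the diagonal accumulates intra-supernode weight.
    A_coarse = P.T @ A @ P
    return labels, X_coarse, A_coarse
```

A larger `bin_width` or fewer projections yields fewer, bigger supernodes; a smaller `bin_width` or more projections keeps the coarsened graph closer to the original. The coarsened pair (`X_coarse`, `A_coarse`) could then be passed to any downstream graph learner in place of the full graph.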
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.