CytoEMD
Haidong Yi, Natalie Stanley
Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2022-08-07
DOI: 10.1145/3535508.3545525
Citations: 0
Abstract
Modern single-cell technologies, such as Cytometry by Time of Flight (CyTOF), measure the simultaneous expression of multiple protein markers per cell and have enabled the characterization of the immune system at unparalleled depth across numerous clinical applications. Many bioinformatics techniques now succeed at automatically assigning cells to particular immune cell types. However, methods that encode variation across heterogeneous cellular landscapes, and with respect to a clinical outcome of interest, are still lacking. To summarize and unravel the immunological variation across multiple samples profiled with CyTOF, we developed CytoEMD, a fast and scalable metric-based method that encodes a compact vector representation of each profiled sample. CytoEMD uses the earth mover's distance (EMD) to quantify differences between pairs of profiled samples; the resulting pairwise distances can then be projected into a latent space for visualization and interpretation. We compared CytoEMD with gating-based and deep-learning-based set autoencoder methods and found that CytoEMD 1) correctly captures between-patient variation and 2) is more efficient, requiring significantly fewer parameters. CytoEMD further promotes interpretability by providing insight into the cell types driving variation between samples. CytoEMD is available as an open-source Python package at https://github.com/CompCy-lab/CytoEMD.
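The abstract does not specify how the sample-level EMD is computed or which projection produces the latent space. The sketch below shows the general idea under explicit assumptions. Each sample is treated as a cells x markers matrix over a shared marker panel. The distance between two samples is taken as the sum over markers of one-dimensional EMDs (Wasserstein-1 distances). Classical multidimensional scaling (MDS) serves as the projection. These choices and the helper names `emd_1d`, `pairwise_sample_distances`, and `classical_mds` are hypothetical illustrations, not the CytoEMD package's API.

```python
import numpy as np


def emd_1d(u: np.ndarray, v: np.ndarray) -> float:
    """Earth mover's (Wasserstein-1) distance between two 1-D empirical
    distributions with uniform weights, computed as the area between CDFs."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    grid = np.sort(np.concatenate([u, v]))
    deltas = np.diff(grid)
    # Empirical CDFs of u and v on each interval [grid[i], grid[i+1]).
    cdf_u = np.searchsorted(u, grid[:-1], side="right") / u.size
    cdf_v = np.searchsorted(v, grid[:-1], side="right") / v.size
    return float(np.sum(np.abs(cdf_u - cdf_v) * deltas))


def pairwise_sample_distances(samples: list[np.ndarray]) -> np.ndarray:
    """Hypothetical sample-level distance: the sum over markers of per-marker
    1-D EMDs. Each sample is a (cells x markers) array; all samples share the
    same marker order, but cell counts may differ."""
    n = len(samples)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = sum(
                emd_1d(samples[i][:, m], samples[j][:, m])
                for m in range(samples[i].shape[1])
            )
            dist[i, j] = dist[j, i] = d  # symmetric, zero diagonal
    return dist


def classical_mds(dist: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Embed samples in a low-dimensional latent space from a distance matrix
    (classical MDS, used here as a stand-in projection)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centered Gram matrix: B = -1/2 * J D^2 J.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues (non-Euclidean distances) to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three toy "samples" of 500 cells x 4 markers; the third is shifted.
    samples = [rng.normal(0.0, 1.0, size=(500, 4)) for _ in range(2)]
    samples.append(rng.normal(1.5, 1.0, size=(500, 4)))
    D = pairwise_sample_distances(samples)
    coords = classical_mds(D, n_components=2)
    print(D.round(2))
    print(coords.shape)  # (3, 2)
```

Each row of `coords` acts as a compact vector for one sample. The shifted toy sample should land far from the other two. Summing per-marker one-dimensional EMDs avoids solving a full multivariate transport problem, and the cost per marker is dominated by sorting. Whether CytoEMD makes the same trade-off should be checked against the repository.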