SpaFlow: a Nextflow pipeline for QC and clustering of MxIF datasets
Brenna C Novotny, Raymond Moore, Lynn Langit, David Haley, Rachel L Maus, Jun Jiang, Caitlin Ward, Ray Guo, Ellen L Goode, Svetomir N Markovic, Chen Wang
Bioinformatics Advances, 5(1), vbaf032. Published 2025-02-14. DOI: https://doi.org/10.1093/bioadv/vbaf032
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11879158/pdf/
Abstract
Motivation: Multiplex immunofluorescence (MxIF) enables the quantification of multiple protein markers at a single-cell level while preserving spatial information, offering a powerful tool for studying tissue microenvironments. However, the flexibility in MxIF panel design poses challenges in standardizing cell phenotyping.
Results: We present SpaFlow, an efficient, customizable pipeline for unsupervised clustering and classification of MxIF data, implemented using Nextflow. SpaFlow performs quality control, clustering, and postclustering analysis on segmented and quantified MxIF data, facilitating reproducible and scalable analyses across various computing platforms. The pipeline integrates three clustering and classification packages (Seurat, SCIMAP, and CELESTA), each providing a distinct methodology for identifying cell types based on phenotypic markers. A novel "meta-clustering" approach condenses clusters across multiple regions of interest (ROIs) into common meta-clusters, streamlining cell-type identification in large datasets. SpaFlow's quality control steps, including signal summation and cell density filtering, mitigate artifacts that may impact clustering accuracy. We demonstrate the utility of SpaFlow in a case study involving 297 ovarian tumor cores, in which it rapidly identified biologically meaningful cell populations, including tumor-infiltrating lymphocytes. Additionally, SpaFlow's reproducibility is validated using serial tonsil sections, confirming that it consistently identifies distinctive cell populations across matched ROIs.
Availability and implementation: SpaFlow is freely available with detailed documentation and examples at https://github.com/dimi-lab/SpaFlow.
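Illustrative sketch: the abstract describes meta-clustering only at a high level (clusters from many ROIs are condensed into shared meta-clusters). The Python below is one way such a step could look, not SpaFlow's actual implementation. It assumes each per-ROI cluster is summarized by its mean marker vector, and it groups those summaries with a simple greedy distance threshold chosen here purely for illustration. The function names, the `max_distance` parameter, the marker names, and the toy values are all hypothetical.

```python
import math
from collections import defaultdict


def cluster_centroids(cells):
    """Return the mean marker vector for each (roi, cluster) pair.

    cells: iterable of (roi, cluster_id, marker_values) tuples.
    """
    sums = {}
    counts = defaultdict(int)
    for roi, cluster, values in cells:
        key = (roi, cluster)
        if key not in sums:
            sums[key] = [0.0] * len(values)
        for i, v in enumerate(values):
            sums[key][i] += v
        counts[key] += 1
    return {key: [s / counts[key] for s in total] for key, total in sums.items()}


def meta_cluster(centroids, max_distance):
    """Greedy grouping of per-ROI centroids (an assumption, not SpaFlow's method).

    Each centroid joins the first meta-cluster whose representative lies within
    max_distance (Euclidean); otherwise it starts a new meta-cluster.
    """
    leaders = []      # one representative centroid per meta-cluster
    assignment = {}   # (roi, cluster) -> meta-cluster id
    for key in sorted(centroids):
        vec = centroids[key]
        for meta_id, leader in enumerate(leaders):
            if math.dist(vec, leader) <= max_distance:
                assignment[key] = meta_id
                break
        else:
            leaders.append(vec)
            assignment[key] = len(leaders) - 1
    return assignment


# Toy data (hypothetical): two ROIs, two markers per cell, e.g. (CD3, PanCK).
cells = [
    ("roi1", 0, [0.9, 0.1]),
    ("roi1", 0, [0.8, 0.2]),
    ("roi1", 1, [0.1, 0.9]),
    ("roi2", 0, [0.2, 0.8]),
    ("roi2", 1, [0.85, 0.1]),
]
centroids = cluster_centroids(cells)
meta = meta_cluster(centroids, max_distance=0.3)
for key in sorted(meta):
    print(key, "-> meta-cluster", meta[key])
```

In this toy run, the CD3-high clusters from both ROIs land in one meta-cluster and the PanCK-high clusters in another, even though their per-ROI cluster IDs differ. That is why pooling helps: cell types need to be annotated once per meta-cluster rather than once per ROI.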