Abdur Rahman M A Basher, Caleb Hallinan, Kwonmoo Lee
Heterogeneity-Preserving Discriminative Feature Selection for Disease-Specific Subtype Discovery
DOI: 10.1101/2023.05.14.540686
Journal: Expert Opinion on Therapeutic Patents (publication date: 2025-03-05)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10769187/pdf/
Citations: 0
Abstract
The identification of disease-specific subtypes can provide valuable insights into disease progression and potential individualized therapies, both important aspects of precision medicine given the complex nature of disease heterogeneity. The advent of high-throughput technologies has enabled the large-scale generation and analysis of various molecular data types, such as single-cell RNA-seq, proteomic, and imaging datasets. While these datasets offer opportunities for subtype discovery, their high dimensionality makes subtype signatures difficult to find. Feature selection, a key step in the machine learning pipeline, selects signatures that reduce the number of features for more efficient downstream computational analysis. Although many existing methods focus on selecting features that differentiate known diseases or cell states, they often struggle to identify features that both preserve heterogeneity and reveal subtypes. To address this, we used deep metric learning-based feature embedding to explore the statistical properties of features crucial for preserving heterogeneity. Our analysis indicated that features with a notable difference in interquartile range (IQR) between classes hold important subtype information. Guided by this insight, we developed a statistical method called PHet (Preserving Heterogeneity), which employs iterative subsampling and differential analysis of IQR combined with Fisher's method to identify a small set of features that preserve heterogeneity and enhance subtype clustering quality. Validation on public single-cell RNA-seq and microarray datasets demonstrated PHet's ability to maintain sample heterogeneity while distinguishing known disease/cell states, with a tendency to outperform previous differential expression and outlier-based methods.
Furthermore, using PHet-based features, an analysis of a single-cell RNA-seq dataset from mouse tracheal epithelial cells identified two distinct basal cell subtypes differentiating towards a luminal secretory phenotype, demonstrating promising results in a real-data application. These results highlight PHet's potential to enhance our understanding of disease mechanisms and cell differentiation, contributing to the field of personalized medicine.
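The abstract names three ingredients of PHet: iterative subsampling, differential analysis of the IQR between classes, and Fisher's method. The minimal Python sketch below illustrates how such ingredients could fit together. It is not the authors' implementation: the function names (`fisher_combine`, `iqr_subsampling_scores`, `select_features`), the use of Welch's t-test as the per-iteration test, the subsample size, and the ranking rule are all hypothetical choices made for illustration. Fisher's method combines k p-values as X = -2 Σ ln p_i, which follows a chi-squared distribution with 2k degrees of freedom when the p-values are independent and the null hypothesis holds.

```python
import numpy as np
from scipy import stats


def fisher_combine(pvals):
    """Fisher's method over axis 0 of a (k, n_features) array of p-values.

    X = -2 * sum(ln p_i) follows a chi-squared distribution with 2k degrees of
    freedom when the k p-values are independent and the null hypothesis holds.
    """
    pvals = np.clip(np.asarray(pvals, dtype=float), 1e-300, 1.0)  # avoid log(0)
    k = pvals.shape[0]
    statistic = -2.0 * np.log(pvals).sum(axis=0)
    return stats.chi2.sf(statistic, df=2 * k)


def iqr_subsampling_scores(X, y, n_iter=100, subsample_size=10, seed=0):
    """Hypothetical PHet-style scoring; illustrative only, not the authors' code.

    X: (n_samples, n_features) matrix; y: binary labels (0 = control, 1 = case).
    Returns, per feature, the mean absolute IQR difference between classes and
    a Fisher-combined p-value from per-subsample Welch t-tests.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    idx0, idx1 = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
    m = min(subsample_size, idx0.size, idx1.size)  # equal-sized subsamples
    delta_iqr = np.empty((n_iter, X.shape[1]))
    pvals = np.empty((n_iter, X.shape[1]))
    for t in range(n_iter):
        a = X[rng.choice(idx0, size=m, replace=False)]
        b = X[rng.choice(idx1, size=m, replace=False)]
        # IQR = Q3 - Q1 for every feature (column) within each subsample
        iqr_a = np.subtract(*np.percentile(a, [75, 25], axis=0))
        iqr_b = np.subtract(*np.percentile(b, [75, 25], axis=0))
        delta_iqr[t] = np.abs(iqr_b - iqr_a)
        # Welch's t-test per feature (a stand-in test); NaN from constant columns -> p = 1
        p = stats.ttest_ind(a, b, axis=0, equal_var=False).pvalue
        pvals[t] = np.nan_to_num(p, nan=1.0)
    return delta_iqr.mean(axis=0), fisher_combine(pvals)


def select_features(X, y, top_k=50, **kwargs):
    """Hypothetical ranking rule: largest mean |delta IQR| first, combined p breaks ties."""
    delta, p = iqr_subsampling_scores(X, y, **kwargs)
    order = np.lexsort((p, -delta))  # the last key is the primary sort key
    return order[:top_k]


# Usage on synthetic data: feature 0 has the same center but a wider spread in cases.
rng = np.random.default_rng(1)
X0 = rng.normal(0.0, 1.0, size=(60, 5))
X1 = rng.normal(0.0, 1.0, size=(60, 5))
X1[:, 0] *= 5.0
X = np.vstack([X0, X1])
y = np.array([0] * 60 + [1] * 60)
selected = select_features(X, y, top_k=2, subsample_size=20)
print(selected)  # feature 0 is expected to come first
```

Because the subsamples are drawn from the same pool of samples, the per-iteration p-values in this sketch are not independent, so the Fisher-combined value is better read as a ranking score than as a calibrated p-value.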
About the journal:
Expert Opinion on Therapeutic Patents (ISSN 1354-3776 [print], 1744-7674 [electronic]) is a MEDLINE-indexed, peer-reviewed, international journal publishing review articles on recent pharmaceutical patent claims, providing expert opinion on the scope for future development in the context of the scientific literature.
The Editors welcome:
Reviews covering recent patent claims on compounds or applications with therapeutic potential, including biotherapeutics and small-molecule agents with specific molecular targets; and patenting trends in a particular therapeutic area
Patent Evaluations examining the aims and chemical and biological claims of individual patents
Perspectives on issues relating to intellectual property
The audience consists of scientists, managers and decision-makers in the pharmaceutical industry and others closely involved in R&D