Batch Discovery of Recurring Rare Classes toward Identifying Anomalous Samples.

Murat Dundar, Halid Ziya Yerebakan, Bartek Rajwa
{"title":"Batch Discovery of Recurring Rare Classes toward Identifying Anomalous Samples.","authors":"Murat Dundar, Halid Ziya Yerebakan, Bartek Rajwa","doi":"10.1145/2623330.2623695","DOIUrl":null,"url":null,"abstract":"<p><p>We present a clustering algorithm for discovering rare yet significant recurring classes across a batch of samples in the presence of random effects. We model each sample data by an infinite mixture of Dirichlet-process Gaussian-mixture models (DPMs) with each DPM representing the noisy realization of its corresponding class distribution in a given sample. We introduce dependencies across multiple samples by placing a global Dirichlet process prior over individual DPMs. This hierarchical prior introduces a sharing mechanism across samples and allows for identifying local realizations of classes across samples. We use collapsed Gibbs sampler for inference to recover local DPMs and identify their class associations. We demonstrate the utility of the proposed algorithm, processing a flow cytometry data set containing two extremely rare cell populations, and report results that significantly outperform competing techniques. The source code of the proposed algorithm is available on the web via the link: http://cs.iupui.edu/~dundar/aspire.htm.</p>","PeriodicalId":74037,"journal":{"name":"KDD : proceedings. International Conference on Knowledge Discovery & Data Mining","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2014-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9963292/pdf/nihms-1875696.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"KDD : proceedings. International Conference on Knowledge Discovery & Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2623330.2623695","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2014/8/24 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We present a clustering algorithm for discovering rare yet significant recurring classes across a batch of samples in the presence of random effects. We model each sample data by an infinite mixture of Dirichlet-process Gaussian-mixture models (DPMs) with each DPM representing the noisy realization of its corresponding class distribution in a given sample. We introduce dependencies across multiple samples by placing a global Dirichlet process prior over individual DPMs. This hierarchical prior introduces a sharing mechanism across samples and allows for identifying local realizations of classes across samples. We use collapsed Gibbs sampler for inference to recover local DPMs and identify their class associations. We demonstrate the utility of the proposed algorithm, processing a flow cytometry data set containing two extremely rare cell populations, and report results that significantly outperform competing techniques. The source code of the proposed algorithm is available on the web via the link: http://cs.iupui.edu/~dundar/aspire.htm.

Abstract Image

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
批量发现重复出现的罕见类别,从而识别异常样本。
我们提出了一种聚类算法,用于在存在随机效应的情况下发现一批样本中罕见但重要的重复类。我们用 Dirichlet 过程高斯混合物模型(DPM)的无限混合物对每个样本数据进行建模,每个 DPM 代表其相应类别分布在给定样本中的噪声实现。我们通过在单个 DPM 上放置全局 Dirichlet 过程先验来引入多个样本之间的依赖关系。这种分层先验引入了跨样本共享机制,并允许识别跨样本的局部类实现。我们使用折叠吉布斯采样器进行推理,以恢复局部 DPM 并识别其类别关联。我们通过处理包含两种极其罕见细胞群的流式细胞仪数据集,展示了所提算法的实用性,并报告了明显优于其他竞争技术的结果。建议算法的源代码可通过以下链接获取:http://cs.iupui.edu/~dundar/aspire.htm。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Predicting Age-Related Macular Degeneration Progression with Contrastive Attention and Time-Aware LSTM. MolSearch: Search-based Multi-objective Molecular Generation and Property Optimization. Deconfounding Actor-Critic Network with Policy Adaptation for Dynamic Treatment Regimes. MoCL: Data-driven Molecular Fingerprint via Knowledge-aware Contrastive Learning from Molecular Graph. Federated Adversarial Debiasing for Fair and Transferable Representations.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1