Personalized anomaly detection using deep active learning

A. V. Sadr, Bruce A Bassett, Emmanuel Sekyi
Published in RAS Techniques and Instruments, 2023-09-06. DOI: 10.1093/rasti/rzad032 (https://doi.org/10.1093/rasti/rzad032)

Abstract

Anomaly detection algorithms are typically applied to static, unchanging, data features hand-crafted by the user. But how does a user systematically craft good features for anomalies that have never been seen? Here we couple deep learning with active learning – in which an Oracle iteratively labels small amounts of data selected algorithmically over a series of rounds – to automatically and dynamically improve the data features for efficient outlier detection. This approach, AHUNT, shows excellent performance on MNIST, CIFAR10, and Galaxy-DECaLS data, significantly outperforming both standard anomaly detection and active learning algorithms with static feature spaces. Beyond improved performance, AHUNT also allows the number of anomaly classes to grow organically in response to the Oracle’s evaluations. Extensive ablation studies explore the impact of Oracle question selection strategy and loss function on performance. We illustrate how the dynamic anomaly class taxonomy represents another step towards fully personalized rankings of different anomaly classes that reflect a user’s interests, allowing the algorithm to learn to ignore statistically significant but uninteresting outliers (e.g. noise). This should prove useful in the era of massive astronomical datasets serving diverse sets of users who can only review a tiny subset of the incoming data.
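
The round-based Oracle loop described in the abstract can be illustrated with a small, generic sketch. This is not the AHUNT implementation: the function name `active_anomaly_rounds`, its parameters, the convention that label 0 means "normal or uninteresting", and the nearest-centroid scoring are all assumptions made for illustration. In particular, the feature space here is static, whereas the abstract describes retraining a deep network so that the features themselves improve between rounds.

```python
import numpy as np


def active_anomaly_rounds(X, oracle, n_rounds=5, n_queries=10):
    """Generic Oracle-in-the-loop sketch (hypothetical, not the AHUNT code).

    X       : (n_samples, n_features) array of fixed features.
    oracle  : callable mapping a sample index to an integer label;
              0 is assumed to mean "normal or uninteresting", any other
              value names an anomaly class of interest.
    Returns the labels collected so far and the final per-sample scores.
    """
    n = len(X)
    labels = {}  # sample index -> Oracle label

    # Before any labels exist, fall back to distance from the data mean.
    scores = np.linalg.norm(X - X.mean(axis=0), axis=1)

    for _ in range(n_rounds):
        # Query the highest-scoring samples that are still unlabelled.
        unlabelled = np.array([i for i in range(n) if i not in labels])
        if unlabelled.size == 0:
            break
        order = unlabelled[np.argsort(scores[unlabelled])[::-1]]
        for i in order[:n_queries]:
            labels[int(i)] = oracle(int(i))

        # Refit one centroid per label seen so far. The set of classes
        # grows whenever the Oracle reports a label not seen before.
        idx = np.array(list(labels))
        lab = np.array([labels[i] for i in idx])
        centroids = {c: X[idx[lab == c]].mean(axis=0) for c in np.unique(lab)}

        interesting = [c for c in centroids if c != 0]
        if not interesting:
            continue  # nothing of interest found yet; keep previous scores

        # Score each sample by closeness to the nearest interesting class,
        # so outliers unlike anything the Oracle flagged rank lower.
        dists = [np.linalg.norm(X - centroids[c], axis=1) for c in interesting]
        scores = -np.min(dists, axis=0)

    return labels, scores
```

A call such as `active_anomaly_rounds(X, lambda i: int(y[i]), n_rounds=3, n_queries=5)` simulates an Oracle from known labels `y`. Replacing the centroid step with a network that is fine-tuned on the collected labels each round would be closer in spirit to what the abstract describes.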