High-bypass Learning: Automated Detection of Tumor Cells That Significantly Impact Drug Response

IF 65.3 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Foundations and Trends in Machine Learning Pub Date : 2020-11-01 DOI:10.1109/MLHPCAI4S51975.2020.00012
J. Wozniak, H. Yoo, J. Mohd-Yusof, Bogdan Nicolae, Nicholson T. Collier, J. Ozik, T. Brettin, Rick L. Stevens
Volume: 82 1. Pages: 1-10. Publication type: Journal Article. Open access: no.
Citations: 6

Abstract

Machine learning in biomedicine is reliant on the availability of large, high-quality data sets. These corpora are used for training statistical or deep learning-based models that can be validated against other data sets and ultimately used to guide decisions. The quality of these data sets is an essential component of the quality of the models and their decisions. Thus, identifying and inspecting outlier data is critical for evaluating, curating, and using biomedical data sets. Many techniques are available to look for outlier data, but it is not clear how to evaluate the impact on highly complex deep learning methods. In this paper, we use deep learning ensembles and workflows to construct a system for automatically identifying data subsets that have a large impact on the trained models. These effects can be quantified and presented to the user for further inspection, which could improve data quality overall. We then present results from running this method on the near-exascale Summit supercomputer.
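
The idea of quantifying how much a data subset affects a trained model can be illustrated with a small hold-one-subset-out sketch. This is not the paper's workflow: the abstract does not give its architecture, metrics, or ensemble design, so everything below is an assumption made for illustration. A least-squares fit stands in for a deep learning model, the names `fit_linear`, `mse`, `subset_impact`, and `make_demo` are hypothetical, and the synthetic data (with one deliberately corrupted group) is invented. Impact is defined here as the change in validation error when a subset is withheld from training.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with intercept; a stand-in for training one model."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef


def mse(coef, X, y):
    """Mean squared error of the fitted model on (X, y)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean((Xb @ coef - y) ** 2))


def subset_impact(X, y, groups, X_val, y_val):
    """Map each group id to (validation error without that group) - (baseline error).

    A strongly negative value means the model improves when the group is
    withheld, which flags the group for manual inspection.
    """
    baseline = mse(fit_linear(X, y), X_val, y_val)
    impact = {}
    for g in np.unique(groups):
        keep = groups != g  # train on everything except group g
        held_out_error = mse(fit_linear(X[keep], y[keep]), X_val, y_val)
        impact[int(g)] = held_out_error - baseline
    return impact


def make_demo(seed=0):
    """Synthetic data (assumption): six groups of 50 samples, group 4 corrupted."""
    rng = np.random.default_rng(seed)
    w = np.array([1.0, -2.0, 0.5])
    X = rng.normal(size=(300, 3))
    y = X @ w + 0.01 * rng.normal(size=300)
    groups = np.repeat(np.arange(6), 50)
    y[groups == 4] += 5.0  # inject a systematic error into one subset
    X_val = rng.normal(size=(200, 3))
    y_val = X_val @ w
    return X, y, groups, X_val, y_val


if __name__ == "__main__":
    X, y, groups, X_val, y_val = make_demo()
    for g, delta in sorted(subset_impact(X, y, groups, X_val, y_val).items()):
        print(f"group {g}: change in validation MSE = {delta:+.4f}")
```

In this toy setting each held-out run is independent of the others, so the loop over groups could be distributed across many workers; that is the kind of workload a large machine would absorb, though how the authors actually schedule their runs is not stated in the abstract.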
Source journal
Foundations and Trends in Machine Learning (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
CiteScore: 108.50
Self-citation rate: 0.00%
Articles published: 5
Journal description: Each issue of Foundations and Trends® in Machine Learning comprises a monograph of at least 50 pages written by research leaders in the field. We aim to publish monographs that provide an in-depth, self-contained treatment of topics where there have been significant new developments. Typically, this means that the monographs we publish will contain a significant level of mathematical detail (to describe the central methods and/or theory for the topic at hand), and will not eschew these details by simply pointing to existing references. Literature surveys and original research papers do not fall within these aims.