Participatory Science and Machine Learning Applied to Millions of Sources in the Hobby-Eberly Telescope Dark Energy Experiment

Lindsay R. House, Karl Gebhardt, Keely Finkelstein, Erin Mentuch Cooper, Dustin Davis, Daniel J. Farrow, Donald P. Schneider
arXiv - PHYS - Physics Education. Published 2024-09-12. DOI: arxiv-2409.08359 (https://doi.org/arxiv-2409.08359)

Abstract

We are merging a large participatory science effort with machine learning to enhance the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX). Our overall goal is to remove false positives, allowing us to use lower signal-to-noise data and sources with low goodness-of-fit. With six million classifications through Dark Energy Explorers, we can determine that a source is not real at over 94% confidence when it has been classified by at least ten individuals; this confidence level increases for higher signal-to-noise sources. To date, we have been able to apply this direct analysis to only 190,000 sources. The full HETDEX sample will contain around 2-3 million sources, including nearby galaxies ([O II] emitters), distant galaxies (Lyman-alpha emitters, or LAEs), false positives, and contamination from instrument issues. We can accommodate this tenfold increase by using machine learning with visually vetted samples from Dark Energy Explorers. This paper expands on our previous pilot study, which had only 14,000 visually vetted LAE candidates, by increasing the visually vetted sample more than tenfold, to 190,000 sources. In addition, using our current visually vetted sample, we generate a real or false positive classification for the full candidate sample of 1.2 million LAEs. We currently have approximately 17,000 volunteers from 159 countries. Thus, we are applying participatory, or citizen science, analysis to our full HETDEX dataset, creating a free educational opportunity that requires no prior technical knowledge.
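As a rough illustration of the "classified by at least ten individuals" criterion, the sketch below tallies volunteer votes per source and keeps only sources that meet a minimum vote count. This is not the authors' pipeline: the input layout (pairs of a source ID and a not-real flag), the function name `aggregate_votes`, and the use of a simple vote fraction as the score are all assumptions made for illustration. The abstract does not say how the 94% confidence level is computed, so the fraction here should not be read as that confidence.

```python
from collections import defaultdict

# Assumed threshold, taken from the abstract's "at least ten individuals".
MIN_VOTES = 10


def aggregate_votes(classifications, min_votes=MIN_VOTES):
    """Return {source_id: fraction of 'not real' votes} for well-sampled sources.

    `classifications` is a hypothetical iterable of (source_id, is_not_real)
    pairs, one pair per volunteer classification.
    """
    counts = defaultdict(lambda: [0, 0])  # source_id -> [not_real_votes, total_votes]
    for source_id, is_not_real in classifications:
        counts[source_id][0] += int(is_not_real)
        counts[source_id][1] += 1
    # Keep only sources with enough independent classifications.
    return {
        source_id: not_real / total
        for source_id, (not_real, total) in counts.items()
        if total >= min_votes
    }


# Hypothetical votes: source "a" has ten classifications, source "b" only three.
votes = [("a", True)] * 9 + [("a", False)] + [("b", True)] * 3
fractions = aggregate_votes(votes)
```

In this toy input, source "b" is dropped because it has fewer than ten votes. In a workflow like the one the abstract describes, labels for well-sampled sources could then serve as a visually vetted training set for a machine learning classifier applied to the remaining candidates.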