Participatory Science and Machine Learning Applied to Millions of Sources in the Hobby-Eberly Telescope Dark Energy Experiment
Lindsay R. House, Karl Gebhardt, Keely Finkelstein, Erin Mentuch Cooper, Dustin Davis, Daniel J. Farrow, Donald P. Schneider
arXiv - PHYS - Physics Education, 2024-09-12, DOI: arxiv-2409.08359
Abstract
We are merging a large participatory science effort with machine learning to
enhance the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX). Our overall
goal is to remove false positives, allowing us to use lower signal-to-noise
data and sources with low goodness-of-fit. With six million classifications
through Dark Energy Explorers, we can determine that a source is not real
at a confidence level above 94% when it is classified by at least ten individuals;
this confidence level increases for higher signal-to-noise sources. To date, we
have only been able to apply this direct analysis to 190,000 sources. The full
sample of HETDEX will contain around 2-3M sources, including nearby galaxies
([O II] emitters), distant galaxies (Lyman-alpha emitters or LAEs), false
positives, and contamination from instrument issues. We can accommodate this
tenfold increase by using machine learning with visually-vetted samples from
Dark Energy Explorers. This paper expands on our previous pilot study, which
had only 14,000 visually vetted LAE candidates, increasing the visually-vetted
sample more than tenfold to 190,000 sources. In
addition, using our currently visually-vetted sample, we generate a real or
false positive classification for the full candidate sample of 1.2 million
LAEs. We currently have approximately 17,000 volunteers from 159 countries
around the world. Thus, we are applying participatory (citizen science)
analysis to our full HETDEX dataset, creating a free educational opportunity
that requires no prior technical knowledge.
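
The abstract states that a source can be flagged as not real once at least ten individuals have classified it. As a rough illustration of that kind of vote aggregation, the sketch below collapses per-volunteer votes into one label per source. It is not the HETDEX pipeline: the function name, the source IDs, the label strings, and the 0.5 cutoff on the "not real" fraction are all assumptions made for this example. Only the ten-classification minimum comes from the abstract.

```python
from collections import defaultdict

MIN_VOTES = 10          # from the abstract: at least ten classifiers per source
FALSE_FRACTION = 0.5    # hypothetical cutoff, not taken from the paper


def aggregate_votes(classifications, min_votes=MIN_VOTES,
                    false_fraction=FALSE_FRACTION):
    """Collapse (source_id, is_real) votes into one label per source.

    Returns a dict mapping source_id to "real", "false_positive", or
    "insufficient" (fewer than min_votes classifications).
    """
    # source_id -> [number of "real" votes, total number of votes]
    tallies = defaultdict(lambda: [0, 0])
    for source_id, is_real in classifications:
        tallies[source_id][0] += int(is_real)
        tallies[source_id][1] += 1

    labels = {}
    for source_id, (real, total) in tallies.items():
        if total < min_votes:
            labels[source_id] = "insufficient"
        elif (total - real) / total > false_fraction:
            labels[source_id] = "false_positive"
        else:
            labels[source_id] = "real"
    return labels


# Toy input: hypothetical source IDs and votes, for illustration only.
votes = (
    [("src_a", False)] * 9 + [("src_a", True)] * 1    # 10 votes, 90% "not real"
    + [("src_b", True)] * 8 + [("src_b", False)] * 4  # 12 votes, 33% "not real"
    + [("src_c", False)] * 3                          # only 3 votes
)
labels = aggregate_votes(votes)
```

Labels produced this way for the vetted subset could then serve as training targets for a machine-learning classifier applied to the remaining candidates, which is the general strategy the abstract describes. The choice of model and features is not specified here.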