{"title":"Constraining Galaxy-Halo connection using machine learning","authors":"A. Jana , L. Samushia","doi":"10.1016/j.ascom.2024.100883","DOIUrl":null,"url":null,"abstract":"<div><div>We investigate the potential of machine learning (ML) methods to model small-scale galaxy clustering for constraining Halo Occupation Distribution (HOD) parameters. Our analysis reveals that while many ML algorithms report good statistical fits, they often yield likelihood contours that are significantly biased in both mean values and variances relative to the true model parameters. This highlights the importance of careful data processing and algorithm selection in ML applications for galaxy clustering, as even seemingly robust methods can lead to biased results if not applied correctly. ML tools offer a promising approach to exploring the HOD parameter space with significantly reduced computational costs compared to traditional brute-force methods if their robustness is established. Using our ANN-based pipeline, we successfully recreate some standard results from recent literature. Properly restricting the HOD parameter space, transforming the training data, and carefully selecting ML algorithms are essential for achieving unbiased and robust predictions. Among the methods tested, artificial neural networks (ANNs) outperform random forests (RF) and ridge regression in predicting clustering statistics, when the HOD prior space is appropriately restricted. We demonstrate these findings using the projected two-point correlation function (<span><math><mrow><msub><mrow><mi>w</mi></mrow><mrow><mi>p</mi></mrow></msub><mrow><mo>(</mo><msub><mrow><mi>r</mi></mrow><mrow><mi>p</mi></mrow></msub><mo>)</mo></mrow></mrow></math></span>), angular multipoles of the correlation function (<span><math><mrow><msub><mrow><mi>ξ</mi></mrow><mrow><mi>ℓ</mi></mrow></msub><mrow><mo>(</mo><mi>r</mi><mo>)</mo></mrow></mrow></math></span>), and the void probability function (VPF) of Luminous Red Galaxies from Dark Energy Spectroscopic Instrument mocks. Our results show that while combining <span><math><mrow><msub><mrow><mi>w</mi></mrow><mrow><mi>p</mi></mrow></msub><mrow><mo>(</mo><msub><mrow><mi>r</mi></mrow><mrow><mi>p</mi></mrow></msub><mo>)</mo></mrow></mrow></math></span> and VPF improves parameter constraints, adding the multipoles <span><math><msub><mrow><mi>ξ</mi></mrow><mrow><mn>0</mn></mrow></msub></math></span>, <span><math><msub><mrow><mi>ξ</mi></mrow><mrow><mn>2</mn></mrow></msub></math></span>, and <span><math><msub><mrow><mi>ξ</mi></mrow><mrow><mn>4</mn></mrow></msub></math></span> to <span><math><mrow><msub><mrow><mi>w</mi></mrow><mrow><mi>p</mi></mrow></msub><mrow><mo>(</mo><msub><mrow><mi>r</mi></mrow><mrow><mi>p</mi></mrow></msub><mo>)</mo></mrow></mrow></math></span> does not significantly improve the constraints.</div></div>","PeriodicalId":48757,"journal":{"name":"Astronomy and Computing","volume":"49 ","pages":"Article 100883"},"PeriodicalIF":1.9000,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Astronomy and Computing","FirstCategoryId":"101","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2213133724000982","RegionNum":4,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ASTRONOMY & ASTROPHYSICS","Score":null,"Total":0}
Citations: 0
Abstract
We investigate the potential of machine learning (ML) methods to model small-scale galaxy clustering for constraining Halo Occupation Distribution (HOD) parameters. Our analysis reveals that while many ML algorithms report good statistical fits, they often yield likelihood contours that are significantly biased in both mean values and variances relative to the true model parameters. This highlights the importance of careful data processing and algorithm selection in ML applications for galaxy clustering, as even seemingly robust methods can lead to biased results if not applied correctly. ML tools offer a promising approach to exploring the HOD parameter space with significantly reduced computational costs compared to traditional brute-force methods if their robustness is established. Using our ANN-based pipeline, we successfully recreate some standard results from recent literature. Properly restricting the HOD parameter space, transforming the training data, and carefully selecting ML algorithms are essential for achieving unbiased and robust predictions. Among the methods tested, artificial neural networks (ANNs) outperform random forests (RF) and ridge regression in predicting clustering statistics, when the HOD prior space is appropriately restricted. We demonstrate these findings using the projected two-point correlation function ($w_p(r_p)$), angular multipoles of the correlation function ($\xi_\ell(r)$), and the void probability function (VPF) of Luminous Red Galaxies from Dark Energy Spectroscopic Instrument mocks. Our results show that while combining $w_p(r_p)$ and VPF improves parameter constraints, adding the multipoles $\xi_0$, $\xi_2$, and $\xi_4$ to $w_p(r_p)$ does not significantly improve the constraints.
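To make the emulation idea concrete, the following is a minimal sketch of the kind of ANN emulator the abstract describes: a network that maps HOD parameters to a clustering statistic such as $w_p(r_p)$, which can then replace brute-force mock population inside a likelihood analysis. This is not the authors' pipeline; the data are synthetic placeholders and the architecture, bin counts, and transformations are illustrative assumptions only.

```python
# Minimal sketch (not the authors' pipeline): an ANN emulator mapping HOD
# parameters to a clustering statistic such as w_p(r_p).
# All data, dimensions, and hyperparameters below are illustrative placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Placeholder training set: HOD parameter draws (e.g. log M_min, sigma_logM,
# log M_1, alpha, kappa) and corresponding w_p(r_p) vectors, which in practice
# would come from populating simulation halo catalogues.
n_train, n_params, n_bins = 2000, 5, 18
theta = rng.uniform(0.0, 1.0, size=(n_train, n_params))      # HOD parameters
wp = np.exp(theta @ rng.normal(size=(n_params, n_bins)))     # stand-in for measured w_p

# Transform the training data: standardise the inputs and emulate log w_p,
# which behaves better for a steeply falling correlation function.
x_scaler, y_scaler = StandardScaler(), StandardScaler()
X = x_scaler.fit_transform(theta)
Y = y_scaler.fit_transform(np.log(wp))

ann = MLPRegressor(hidden_layer_sizes=(64, 64), activation="relu",
                   max_iter=2000, random_state=0)
ann.fit(X, Y)

# Cheap prediction of w_p(r_p) at a new HOD parameter point; this is the step
# a sampler would call instead of running a full mock population.
theta_new = rng.uniform(0.0, 1.0, size=(1, n_params))
wp_pred = np.exp(y_scaler.inverse_transform(
    ann.predict(x_scaler.transform(theta_new)).reshape(1, -1)))
```

The same structure would apply to the other statistics mentioned in the abstract (the multipoles $\xi_0$, $\xi_2$, $\xi_4$ and the VPF), with the output vector extended or separate emulators trained per statistic.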
Journal: Astronomy and Computing
Categories: Astronomy & Astrophysics; Computer Science, Interdisciplinary Applications
CiteScore
4.10
Self-citation rate
8.00%
Articles published
67
Journal introduction:
Astronomy and Computing is a peer-reviewed journal that focuses on the broad area between astronomy, computer science and information technology. The journal aims to publish the work of scientists and (software) engineers in all aspects of astronomical computing, including the collection, analysis, reduction, visualisation, preservation and dissemination of data, and the development of astronomical software and simulations. The journal covers applications of academic computer science techniques to astronomy, as well as novel applications of information technologies within astronomy.