TT-HEALpix: A New Data Indexing Strategy for Efficient Cross-match of Large-scale Astronomical Catalogs

Publications of the Astronomical Society of the Pacific (IF 3.3; Zone 3, Physics and Astrophysics; JCR Q2, Astronomy & Astrophysics). Pub Date: 2024-03-01. DOI: 10.1088/1538-3873/ad2721
Qing Zhao, Chengkui Zhang, Hao Li, Tingting Zhao, Chenzhou Cui, Dongwei Fan
Citations: 0

Abstract

Cross-matching is an indispensable operation in data preparation, analysis, and research for multi-band and time-domain astronomy, and multi-catalog time-series reconstruction is an important part of time-domain astronomy. In large-scale distributed reconstruction, boundary problems have persistently degraded the accuracy of the time-series data. To mitigate these boundary problems and improve data accuracy, this paper proposes a new hybrid astronomical data indexing method, the Translated Transformation based HEALPix Dual Index (TT-HEALPix). At a suitable HEALPix division level, a translation transformation is applied, and the two indexes computed before and after the transformation together form a pseudo-hybrid index strategy. This strategy retains the suitability of hybrid index schemes for large-scale parallel computing while compensating for their high omission rate at block boundaries. Using TT-HEALPix, the paper implements multi-catalog time-series reconstruction on the Spark platform and compares it with the HEALPix+HTM hybrid indexing strategy. The experiments show that TT-HEALPix has significant advantages over the traditional HEALPix+HTM hybrid indexing method in both data accuracy and cross-matching efficiency. At HEALPix level 9, TT-HEALPix improves cross-matching efficiency in a distributed environment by 6% to 19% compared with HEALPix+HTM. For the AST3-II dataset at level 9, TT-HEALPix improves accuracy by 62.2% over HEALPix and by 45.5% over HEALPix+HTM. In conclusion, TT-HEALPix is better suited to the efficiency and accuracy requirements of cross-matching.
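The boundary omission described in the abstract, and the way a second, translated index can recover part of it, can be illustrated with a toy example. The sketch below is not the paper's implementation: it replaces HEALPix with a flat square grid, assumes a half-cell shift as the translation, and uses hypothetical names (`cell_key`, `match_in_partitions`, `dual_index_match`, `CAT_A`, `CAT_B`). The abstract does not specify the actual transformation, so all of these are assumptions made purely for illustration.

```python
import math
from collections import defaultdict

def cell_key(x, y, cell, shift):
    """Partition key on a flat square grid (a stand-in for a HEALPix pixel id)."""
    return (math.floor((x + shift) / cell), math.floor((y + shift) / cell))

def match_in_partitions(cat_a, cat_b, radius, cell, shift=0.0):
    """Match only sources that fall in the same grid cell, as a partitioned job would."""
    buckets = defaultdict(list)
    for bid, bx, by in cat_b:
        buckets[cell_key(bx, by, cell, shift)].append((bid, bx, by))
    matches = set()
    for aid, ax, ay in cat_a:
        for bid, bx, by in buckets.get(cell_key(ax, ay, cell, shift), []):
            if math.hypot(ax - bx, ay - by) <= radius:
                matches.add((aid, bid))
    return matches

def dual_index_match(cat_a, cat_b, radius, cell):
    """Union of matches under the original grid and a grid translated by half a cell.

    The half-cell shift is an assumption of this sketch, not a value from the paper.
    """
    primary = match_in_partitions(cat_a, cat_b, radius, cell, 0.0)
    translated = match_in_partitions(cat_a, cat_b, radius, cell, cell / 2)
    return primary | translated

def brute_force_match(cat_a, cat_b, radius):
    """Ground truth: compare every pair, ignoring partitions."""
    return {
        (aid, bid)
        for aid, ax, ay in cat_a
        for bid, bx, by in cat_b
        if math.hypot(ax - bx, ay - by) <= radius
    }

# Hypothetical sample data: (id, x, y) in arbitrary units, cell size 1.0.
CELL, RADIUS = 1.0, 0.02
CAT_A = [("a1", 0.30, 0.30), ("a2", 0.995, 0.30), ("a3", 0.995, 0.495)]
CAT_B = [("b1", 0.31, 0.30), ("b2", 1.005, 0.30), ("b3", 1.005, 0.505)]

if __name__ == "__main__":
    print("single index:", sorted(match_in_partitions(CAT_A, CAT_B, RADIUS, CELL)))
    print("dual index:  ", sorted(dual_index_match(CAT_A, CAT_B, RADIUS, CELL)))
    print("brute force: ", sorted(brute_force_match(CAT_A, CAT_B, RADIUS)))
```

In this sample, the single index finds only the pair that lies inside one cell, the dual index also recovers the pair that straddles a boundary of the original grid, and the third pair, which straddles a boundary of both grids, is still missed. In this toy grid, then, a translated second index reduces boundary omissions without eliminating them; how far the actual TT-HEALPix scheme closes that gap is a question for the paper itself.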
Source journal
Publications of the Astronomical Society of the Pacific (Earth Science and Astronomy: Astronomy and Astrophysics)
CiteScore: 6.70
Self-citation rate: 5.70%
Articles published: 103
Review time: 4-8 weeks
About the journal: The Publications of the Astronomical Society of the Pacific (PASP), the technical journal of the Astronomical Society of the Pacific (ASP), has been published regularly since 1889, and is an integral part of the ASP's mission to advance the science of astronomy and disseminate astronomical information. The journal provides an outlet for astronomical results of a scientific nature and serves to keep readers in touch with current astronomical research. It contains refereed research and instrumentation articles, invited and contributed reviews, tutorials, and dissertation summaries.