ARIM-mdx Data System: Towards a Nationwide Data Platform for Materials Science
Masatoshi Hanai, Ryo Ishikawa, Mitsuaki Kawamura, Masato Ohnishi, Norio Takenaka, Kou Nakamura, Daiju Matsumura, Seiji Fujikawa, Hiroki Sakamoto, Yukinori Ochiai, Tetsuo Okane, Shin-Ichiro Kuroki, Atsuo Yamada, Toyotaro Suzumura, Junichiro Shiomi, Kenjiro Taura, Yoshio Mita, Naoya Shibata, Yuichi Ikuhara
arXiv - CS - Distributed, Parallel, and Cluster Computing, 2024-09-08, arxiv-2409.06734
Abstract
In modern materials science, effective and high-volume data management across
leading-edge experimental facilities and world-class supercomputers is
indispensable for cutting-edge research. Such facilities and supercomputers are
typically utilized by a wide range of researchers across different fields and
organizations in academia and industry. However, existing integrated systems
that handle data from these resources have focused primarily on
smaller-scale cross-institutional or single-domain operations. As a result,
they often lack the scalability, efficiency, agility, and interdisciplinarity
needed to handle substantial volumes of data from various researchers.

In this paper, we introduce the ARIM-mdx data system, a nationwide data platform
for materials science in Japan. The platform involves 8 universities and
institutes all over Japan through the governmental materials science project.
Currently in its trial phase, the ARIM-mdx data system is utilized by over 800
researchers from around 140 organizations in academia and industry, and is
intended to gradually expand its reach. The system employs a hybrid
architecture, combining a peta-scale dedicated storage system for security and
stability with a high-performance academic cloud for efficiency and
scalability. Through direct network connections between them, the system
achieves a 4.7x latency reduction compared to a conventional approach, resulting
in near real-time interactive data analysis. It also utilizes specialized IoT
devices for secure data transfer from equipment computers and connects to
multiple supercomputers via an academic ultra-fast network, achieving 4x faster
data transfer compared to the public Internet. The ARIM-mdx data system, as a
pioneering nationwide data platform, has the potential to contribute to the
creation of new research communities and to accelerate innovation.