Juanle Wang, Kun Bu, Dongmei Yan, Jingyue Wang, Bowen Duan, M. Zhang, Guojin He
Classification framework and semantic labeling for Big Earth Data
Big Earth Data, vol. 34 1, pp. 886-903. Published 2022-10-04. DOI: 10.1080/20964471.2022.2123946 (https://doi.org/10.1080/20964471.2022.2123946)
Impact factor: 4.2. JCR: Q1 (Computer Science, Information Systems). Region category: Earth Sciences (region 3).
Citations: 1
Abstract
Big Earth Data refers to the multidimensional integration and association of scientific data spanning geography, resources, environment, ecology, and biology. An effective data classification system and label management strategy are important foundations for the long-term management of data resources. The objective of this study was to construct a classification system and realize multidimensional semantic label management for the Big Earth Data Science Engineering Program (CASEarth). The study constructed two classification and coding systems that are mapped to each other: a geosphere-level classification and a Sustainable Development Goals (SDGs) indicator classification. The approach was based on natural language processing and addressed subject-word segmentation, weight calculation, and dynamic matching. A prototype system for classification and label management was built on more than 1,100 existing CASEarth datasets. We expect this study to provide methodology and technical support for user-oriented classification and label management services for Big Earth Data.
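The abstract names three steps (subject-word segmentation, weight calculation, and dynamic matching) without giving their details. The following is a minimal Python sketch of how such a pipeline could look, not the authors' implementation. The label vocabulary, the stopword list, the relative-frequency weighting, and the function names `segment`, `term_weights`, and `match_labels` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical label vocabulary; the paper's actual code lists are not given here.
LABEL_KEYWORDS = {
    "hydrosphere": {"water", "river", "lake", "precipitation"},
    "biosphere": {"vegetation", "forest", "species"},
    "SDG 6": {"water", "sanitation"},
}
# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"the", "of", "and", "in", "for", "a", "an", "on"}


def segment(text):
    """Split text into lowercase word tokens, dropping stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def term_weights(tokens):
    """Weight each term by its relative frequency (an assumed weighting scheme)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def match_labels(text, label_keywords=LABEL_KEYWORDS):
    """Score each label by the summed weight of its keywords found in the text."""
    weights = term_weights(segment(text))
    scores = {
        label: sum(weights.get(k, 0.0) for k in keywords)
        for label, keywords in label_keywords.items()
    }
    # Keep only labels with a nonzero score, highest score first.
    return sorted(
        ((label, score) for label, score in scores.items() if score > 0),
        key=lambda item: item[1],
        reverse=True,
    )
```

Calling `match_labels("Lake water quality in the river basin")` ranks the hypothetical `hydrosphere` label ahead of `SDG 6`, because three of its keywords appear in the title against one. A single dataset can carry labels from both classification systems at once, which is the kind of cross-mapping the abstract describes.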