
Data Intelligence: Latest Articles

Role of Sports Based on Big Data Analysis in Promoting the Physique and Health of Children and Adolescents
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2023-06-13; DOI: 10.1162/dint_a_00207
Pengfei Wen, Jinsong Wu
A healthy body is the foundation of young people's growth. With the popularization and globalization of the Internet, multimedia technology is rapidly changing the influence that traditional media have on young people's growth. The current health of young people gives little cause for optimism: declining physical fitness, obesity, and poor psychological development among adolescents have aroused concern across society. In recent years, the emergence and spread of big data (BD) has added a new dimension to the value of data applications, and combining BD with youth health services offers young people better opportunities for good health. By recording, analyzing, and releasing adolescent physical health data, the system establishes an extensive knowledge base on adolescent physical and mental health, thereby improving adolescents' physical health. This paper reviews the concept and applications of BD and analyzes the reasons for the continuous decline in young people's physique. Based on an analysis of how BD is applied to promoting young people's physical health, it proposes more achievable improvement strategies and plans, and then summarizes and discusses the experiment. Following the survey and experiment, a random simulation algorithm was introduced into daily exercise, diet, and lifestyle preferences. The new BD-based system and health improvement strategy designed for teenagers' physical health could help students improve their physical health by 55%.
Citations: 1
Predicting an Optimal Virtual Data Model for Uniform Access to Large Heterogeneous Data
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2023-06-13; DOI: 10.1162/dint_a_00216
Chahrazed B. Bachir Belmehdi, A. Khiat, Nabil Keskes
The growth of data generated in industry calls for new, efficient big data integration approaches that give end-users uniform data access and support better business operations. Data virtualization systems, including Ontology-Based Data Access (OBDA), query data on-the-fly against the original data sources without any prior data materialization. Existing approaches use, by design, a fixed model (e.g., TABULAR) as the only Virtual Data Model, a uniform schema built on-the-fly to load, transform, and join relevant data. Other data models, such as GRAPH or DOCUMENT, are more flexible and can therefore be more suitable for some common types of queries, such as join or nested queries. Which model suits such queries is hard to predict because it depends on many criteria, such as query plan, data model, data size, and operations. To address the problem of selecting the optimal virtual data model for queries on large datasets, we present a new approach that (1) builds on the principle of OBDA to query and join large heterogeneous data in a distributed manner and (2) calls a deep learning method to predict the optimal virtual data model using features extracted from SPARQL queries. OPTIMA, the implementation of our approach, currently leverages state-of-the-art Big Data technologies (Apache Spark and GraphX), implements two virtual data models (GRAPH and TABULAR), and supports five data source models out of the box: property graph, document-based, wide-columnar, relational, and tabular, stored in Neo4j, MongoDB, Cassandra, MySQL, and CSV, respectively. Extensive experiments show that our approach returns the optimal virtual model with an accuracy of 0.831, reducing query execution time by over 40% when the tabular model is selected and by over 30% when the graph model is selected.
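The abstract says the model choice is predicted from features extracted from SPARQL queries but does not list them. As a rough illustration only, the sketch below derives a few surface features from a query string with regular expressions. The feature names, the `sparql_features` helper, and the example query are all hypothetical and are not taken from OPTIMA.

```python
import re

def sparql_features(query):
    """Return a small dict of surface features for one SPARQL query string.

    Illustrative assumption: these are NOT the features used by OPTIMA.
    """
    def count(pattern):
        # Case-insensitive count of a keyword pattern in the query text.
        return len(re.findall(pattern, query, flags=re.IGNORECASE))

    return {
        # Distinct variable names such as ?name or ?city.
        "n_variables": len(set(re.findall(r"\?\w+", query))),
        "n_filter": count(r"\bFILTER\b"),
        "n_optional": count(r"\bOPTIONAL\b"),
        "has_order_by": count(r"\bORDER\s+BY\b") > 0,
        "has_limit": count(r"\bLIMIT\b") > 0,
    }

# Hypothetical query used only to exercise the helper.
example = """
SELECT ?name ?city WHERE {
  ?p :name ?name .
  ?p :livesIn ?city .
  FILTER(?city != "X")
} ORDER BY ?name LIMIT 10
"""
features = sparql_features(example)
```

A feature dictionary like this could be turned into a numeric vector and passed to any classifier; a regex-based extractor is a simplification, since a real system would more likely work from a parsed query.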
Citations: 0
Total Electricity Consumption Forecasting Based on Temperature Composite Index and Mixed-Frequency Models
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2023-06-13; DOI: 10.1162/dint_a_00215
Xuerong Li, W. Shang, Xun Zhang, Baoguo Shan, Xiang Wang
Total electricity consumption (TEC) can accurately reflect the operation of the national economy, and forecasting TEC can help predict economic development trends as well as provide insights for the formulation of macro policies. Nowadays, high-frequency and massive multi-source data provide a new way to predict TEC. In this paper, a “seasonal-cumulative temperature index” is constructed from high-frequency temperature data, and a mixed-frequency prediction model based on multi-source big data (Mixed Data Sampling with Monthly Temperature and Daily Temperature index, MIDAS-MT-DT) is proposed. Experimental results show that the MIDAS-MT-DT model achieves higher prediction accuracy, and that the “seasonal-cumulative temperature index” can improve prediction accuracy.
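Models in the MIDAS family commonly bring daily data into a monthly regression by collapsing each month's daily values with a parametric weight function, such as the exponential Almon lag. The sketch below illustrates that general idea only. The parameter values and the function names are assumptions, and it reproduces neither the authors' MIDAS-MT-DT specification nor their seasonal-cumulative temperature index.

```python
import math

def exp_almon_weights(n_lags, theta1, theta2):
    """Normalized exponential Almon weights for lags 1..n_lags (they sum to 1)."""
    raw = [math.exp(theta1 * k + theta2 * k * k) for k in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def midas_aggregate(daily_values, theta1=0.05, theta2=-0.01):
    """Collapse one month's daily values into a single monthly regressor.

    The most recent day is treated as lag 1. The default theta values are
    arbitrary placeholders; in practice they would be estimated with the model.
    """
    weights = exp_almon_weights(len(daily_values), theta1, theta2)
    newest_first = list(reversed(daily_values))
    return sum(w * x for w, x in zip(weights, newest_first))
```

Because the weights are normalized, a month with constant daily temperature aggregates to that same temperature, which gives a quick sanity check on the weighting.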
Data Intelligence, vol. 5, pp. 750-766.
Citations: 0
RCMR 280k: Refined Corpus for Move Recognition Based on PubMed Abstracts
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2023-04-27; DOI: 10.1162/dint_a_00214
Jie Li, Gaihong Yu, Zhixiong Zhang
Existing datasets for move recognition, such as PubMed 200k RCT, exhibit several problems that significantly impair recognition performance, especially for the Background and Objective labels. To improve move recognition performance, we introduce a method and construct a refined corpus based on PubMed, named RCMR 280k. This corpus comprises approximately 280,000 structured abstracts totaling 3,386,008 sentences, each labeled with one of five categories: Background, Objective, Method, Result, or Conclusion. We also construct a subset of RCMR, named RCMR_RCT, corresponding to the medical subdomain of RCTs. We conduct comparison experiments pitting our RCMR and RCMR_RCT against PubMed 380k and PubMed 200k RCT, respectively. The best results, obtained using the MSMBERT model, show that: (1) our RCMR outperforms PubMed 380k by 0.82%, while our RCMR_RCT outperforms PubMed 200k RCT by 9.35%; (2) compared with PubMed 380k, our corpus achieves better results on the Results and Conclusions categories, with average F1 improving by 1% and 0.82%, respectively; (3) compared with PubMed 200k RCT, our corpus significantly improves performance in the Background and Objective categories, with average F1 scores improving by 28.31% and 37.22%, respectively. To the best of our knowledge, our RCMR is among the few high-quality, resource-rich refined PubMed corpora available. The work in this paper has been applied in SciAIEngine, which is openly accessible for researchers to conduct move recognition tasks.
Data Intelligence, vol. 5, pp. 511-536.
Citations: 0
Association discovery and outlier detection of air pollution emissions from industrial enterprises driven by big data
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2023-04-27; DOI: 10.1162/dint_a_00205
Zhen Peng, Yunxiao Zhang, Yunchong Wang, Tianle Tang
Air pollution is a major issue for the national economy and people's livelihood. At present, research on air pollution mostly focuses on pollutant emissions in a specific industry or region as a whole and pays little attention to enterprise pollutant emissions at the micro level. Limited by the volume and time granularity of enterprise data, enterprise pollutant emissions are still understudied. Driven by big data on air pollution emissions from industrial enterprises monitored in Beijing-Tianjin-Hebei, this paper mines enterprise pollution emission data through association analysis between different features based on grey association, association mining between different data based on association rules, and outlier detection based on clustering. The results show that: (1) the industries mainly affecting NOx and SO2 in Beijing-Tianjin-Hebei are the electric power and heat production and supply industry and the metal smelting and processing industries; (2) districts near the cities of Hengshui and Shijiazhuang in Hebei province form strong association rules; (3) the industrial enterprises in Beijing-Tianjin-Hebei are divided into six clusters, three of which are outliers with excessive emissions of total VOCs, PM, and NH3, respectively.
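Association rules are usually ranked by support (how often an itemset occurs) and confidence (how often the consequent occurs given the antecedent). The sketch below computes both for a toy dataset. Treating each day as a "transaction" of districts with elevated emissions is an assumption made for illustration, and the district labels are hypothetical, not data from the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base > 0 else 0.0

# Hypothetical data: districts flagged with elevated emissions on each day.
days = [
    {"district_A", "district_B"},
    {"district_A", "district_B", "district_C"},
    {"district_A"},
    {"district_C"},
]

rule_support = support(days, {"district_A", "district_B"})
rule_confidence = confidence(days, {"district_A"}, {"district_B"})
```

A rule such as district_A → district_B would be called "strong" when both values exceed chosen thresholds; the thresholds themselves are a modeling decision not stated in the abstract.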
Data Intelligence, vol. 5, pp. 438-456.
Citations: 0
Building Community Consensus for Scientific Metadata with YAMZ
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2023-03-01; DOI: 10.1162/dint_a_00211
Jane Greenberg, Scott McClellan, Christopher B. Rauch, Xintong Zhao, Mat Kelly, Yuan An, J. Kunze, Rachel Orenstein, Claire E. Porter, V. Meschke, Eric Toberer
This paper reports on a demonstration of YAMZ (Yet Another Metadata Zoo) as a mechanism for building community consensus around metadata terms. The demonstration is motivated by the complexity of the metadata standards environment and the need for more user-friendly approaches for researchers to achieve vocabulary consensus. The paper reviews a series of metadata standardization challenges, explores crowdsourcing factors that offer possible solutions, and introduces the YAMZ system. A YAMZ demonstration is presented with members of the Toberer materials science laboratory at the Colorado School of Mines, where there is a need to confirm and maintain a shared understanding of the vocabulary supporting research documentation, data management, and their larger metadata infrastructure. The demonstration involves three key steps: 1) sampling terms for the demonstration, 2) engaging graduate student researchers in the demonstration, and 3) reflecting on the demonstration. The results of these steps, including examples of the dialog provenance among lab members and voting, show the ease with which YAMZ can facilitate building metadata vocabulary consensus. The conclusion discusses implications and highlights next steps.
Data Intelligence, vol. 5, pp. 242-260.
Citations: 2
Improving Domain Repository Connectivity
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2023-03-01; DOI: 10.1162/dint_a_00120
T. Habermann
Domain repositories, i.e., repositories that store, manage, and persist data pertaining to a specific scientific domain, are common and growing in the research landscape. Many of these repositories develop close, long-term communities made up of individuals and organizations that collect, analyze, and publish results based on the data in the repositories. Connections between these datasets, papers, people, and organizations are an important part of the knowledge infrastructure surrounding the repository. All these research objects, people, and organizations can now be identified using various unique and persistent identifiers (PIDs), and domain repositories can build on their existing communities to facilitate and accelerate the identifier adoption process. Because community members contribute to multiple datasets and articles, identifiers for them, once found, can be used multiple times. We explore this idea by defining a connectivity metric and applying it to datasets collected and papers published by members of the UNAVCO community. Finding identifiers in DataCite and Crossref metadata and spreading those identifiers through the UNAVCO DataCite metadata can increase connectivity from less than 10% to close to 50% for people and organizations.
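The abstract does not give the definition of its connectivity metric. One plausible reading, assumed here purely for illustration, is the share of creator entries in a set of metadata records that carry a persistent identifier. The sketch below computes that share and then "spreads" any identifier found for a name to the other records that mention the same name. The record layout, the function names, and the placeholder identifier are hypothetical.

```python
def connectivity(records):
    """Share of creator entries that carry an identifier (0.0 if none exist)."""
    entries = [c for r in records for c in r["creators"]]
    if not entries:
        return 0.0
    return sum(1 for c in entries if c.get("id")) / len(entries)

def spread_identifiers(records):
    """Reuse an identifier found for a name wherever that name appears.

    Exact string matching on names is a deliberate simplification; real
    metadata would need disambiguation of name variants and homonyms.
    """
    known = {c["name"]: c["id"]
             for r in records for c in r["creators"] if c.get("id")}
    return [
        {"creators": [
            {"name": c["name"], "id": c.get("id") or known.get(c["name"])}
            for c in r["creators"]
        ]}
        for r in records
    ]

# Hypothetical records: one identifier is known for "Author One".
records = [
    {"creators": [{"name": "Author One", "id": "pid:example-1"},
                  {"name": "Author Two", "id": None}]},
    {"creators": [{"name": "Author One", "id": None}]},
    {"creators": [{"name": "Author One", "id": None}]},
]

before = connectivity(records)
after = connectivity(spread_identifiers(records))
```

In this toy case a single identifier found once lifts connectivity from one in four entries to three in four, which mirrors the reuse effect the abstract describes for contributors who appear on many datasets.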
Data Intelligence, vol. 5, pp. 6-26.
Citations: 2
Metadata as Data Intelligence
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2023-03-01; DOI: 10.1162/dint_e_00212
Jane Greenberg, Mingfang Wu, Wei Liu, Fenghong Liu
Metadata, as a type of data, describes content, provides context, documents transactions, and situates data. Interest in metadata has grown steadily over the last several decades, motivated initially by the increase in digital information, open access, early data sharing policies, and interoperability goals. This growth has accelerated in more recent times, owing to the increase in research data management policies and advances in AI. Specific to research data management, one of the larger factors has been the global adoption of the FAIR (findable, accessible, interoperable, and reusable) data principles [1, 2], which are highly metadata-driven. Additionally, researchers across nearly every domain are interested in leveraging metadata for machine learning and other AI applications. The accelerated interest in metadata extends to other communities as well. For example, industry seeks metadata to meet company goals, and users of information systems and social computing applications wish to know how their metadata is being used and demand greater control over who has access to their data and metadata. All of these developments underscore the fact that metadata is intelligent data, or what Riley has called value-added data [3]. Overall, this intense and growing interest in metadata helps to frame the contributions included in this special issue of Data Intelligence.
Data Intelligence, vol. 5, pp. 1-5.
Citations: 0
Multi-view Feature Learning for the Over-penalty in Adversarial Domain Adaptation
IF 3.9 Zone 3 Computer Science Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-02-22 DOI: 10.1162/dint_a_00199
Yuhong Zhang, Jianqing Wu, Qi Zhang, Xuegang Hu
Domain adaptation aims to transfer knowledge from a labeled source domain to an unlabeled target domain that follows a similar but different distribution. Recently, adversarial methods have achieved remarkable success owing to the strong performance of domain-invariant feature representation learning. However, adversarial methods learn transferability at the expense of discriminability in the feature representation, which leads to poor generalization to the target domain. To this end, we propose a Multi-view Feature Learning method for the Over-penalty in Adversarial Domain Adaptation. Specifically, multi-view representation learning is proposed to enrich the discriminative information contained in the domain-invariant feature representation, which counters the over-penalty on discriminability in adversarial training. In addition, the intra-domain class distribution is proposed as a replacement for the inter-domain class distribution, so that more discriminative information is captured while learning transferable features. Extensive experiments show that our method improves discriminability while maintaining transferability and outperforms state-of-the-art methods on domain adaptation benchmark datasets.
Citations: 2
Integrating Functional Status Information into Knowledge Graphs to Support Self-Health Management
IF 3.9 Zone 3 Computer Science Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-02-22 DOI: 10.1162/dint_a_00203
M. Dragoni, Tania Bailoni, Ivan Donadello, Jean-Claude Martin, H. Lindgren
Functional Status Information (FSI) describes physical and mental wellness at the whole-person level. It includes information on activity performance, social role participation, and the environmental and personal factors that affect well-being and quality of life. Collecting and analyzing this information is critical to meeting the care needs of an aging global population and to providing effective care for individuals with chronic conditions, multi-morbidity, and disability. Personal knowledge graphs (PKGs) are a suitable way to represent, completely and in a structured form, all information related to people's FSI, and to reason over it in order to build tailored coaching solutions that support healthy living in daily life. In this paper, we present the development process for creating a PKG, starting from the HeLiS ontology, in order to enable the design of an AI-enabled system that aims to increase people's self-awareness of their own functional status. In particular, we focus on three modules that extend the HeLiS ontology to represent (i) enablers and (ii) barriers that may play a role in improving (or worsening) a person's functional status, and (iii) arguments that drive the FSI collection process. Finally, we show how these modules have been instantiated in real-world scenarios.
Data Intelligence 5(1), pp. 636-662.
Citations: 0