A healthy body is the foundation of young people's growth. With the popularization and globalization of the Internet, multimedia technology is rapidly changing how traditional media affect the growth of young people. The current health situation of young people is not optimistic: declining physical fitness, obesity, and psychological development problems among adolescents have aroused concern across all sectors of society. In recent years, the emergence and dissemination of big data (BD) has brought a new dimension to the value of data applications, and the combination of BD with youth health services offers young people good opportunities for better health. Through the recording, analysis, and release of adolescent physical health data, the system establishes an extensive knowledge base on adolescent physical and mental health, thereby improving adolescents' physical health. This paper summarizes the fundamentals and applications of BD, and analyzes the reasons for the continuous decline of young people's physique. Based on an analysis of BD applications in promoting young people's physical health, it proposes achievable improvement strategies and plans, and then summarizes and discusses the experiment. In the survey and experiment, a random simulation algorithm was applied to daily exercise, diet, and lifestyle preferences. The new system and health improvement strategy designed for teenagers' physical health using BD could help students improve their physical health by 55%.
Pengfei Wen, Jinsong Wu. "Role of Sports Based on Big Data Analysis in Promoting the Physique and Health of Children and Adolescents." Data Intelligence (2023). DOI: 10.1162/dint_a_00207.
Chahrazed B. Bachir Belmehdi, A. Khiat, Nabil Keskes
The growth of data generated in industry requires new, efficient big data integration approaches that give end-users uniform data access for better business operations. Data virtualization systems, including Ontology-Based Data Access (OBDA), query data on-the-fly against the original data sources without any prior data materialization. Existing approaches by design use a fixed model, e.g., TABULAR, as the only virtual data model: a uniform schema built on-the-fly to load, transform, and join relevant data. Yet other data models, such as GRAPH or DOCUMENT, are more flexible and can be more suitable for some common types of queries, such as join or nested queries. Which model is optimal is hard to predict because it depends on many criteria, such as the query plan, data model, data size, and operations. To address the problem of selecting the optimal virtual data model for queries on large datasets, we present a new approach that (1) builds on the principle of OBDA to query and join large heterogeneous data in a distributed manner and (2) applies a deep learning method to predict the optimal virtual data model from features extracted from SPARQL queries. OPTIMA, the implementation of our approach, currently leverages the state-of-the-art big data technologies Apache Spark and GraphX, implements two virtual data models, GRAPH and TABULAR, and supports five data source models out of the box (property graph, document-based, wide-columnar, relational, and tabular), stored in Neo4j, MongoDB, Cassandra, MySQL, and CSV, respectively. Extensive experiments show that our approach returns the optimal virtual model with an accuracy of 0.831, yielding a reduction in query execution time of over 40% for tabular model selection and over 30% for graph model selection.
Chahrazed B. Bachir Belmehdi, A. Khiat, Nabil Keskes. "Predicting an Optimal Virtual Data Model for Uniform Access to Large Heterogeneous Data." Data Intelligence (2023). DOI: 10.1162/dint_a_00216.
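As a rough illustration of step (2), the sketch below extracts simple structural features from a SPARQL query string and applies a stand-in decision rule. The feature set and the heuristic classifier are invented for illustration; OPTIMA's actual model is a trained deep network over a richer feature set.

```python
import re

def extract_features(query: str) -> dict:
    """Toy structural features from a SPARQL query string."""
    variables = set(re.findall(r"\?\w+", query))
    return {
        "n_triple_patterns": query.count(" ."),  # crude proxy for triple patterns
        "n_variables": len(variables),
        "has_optional": int("OPTIONAL" in query.upper()),
        "has_filter": int("FILTER" in query.upper()),
        "nesting_depth": query.count("{"),
    }

def predict_model(features: dict) -> str:
    """Stand-in for the paper's deep-learning classifier: a hand-written
    heuristic so the sketch runs without a trained network."""
    # Deeply nested, variable-heavy queries tend to favour the GRAPH model.
    score = features["nesting_depth"] + features["n_variables"] / 4
    return "GRAPH" if score > 3 else "TABULAR"
```

For example, `predict_model(extract_features("SELECT ?name WHERE { ?p a :Person . ?p :name ?name . }"))` selects the TABULAR model for this flat two-variable query.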
Xuerong Li, W. Shang, Xun Zhang, Baoguo Shan, Xiang Wang
ABSTRACT The total electricity consumption (TEC) can accurately reflect the operation of the national economy, and the forecasting of the TEC can help predict the economic development trend, as well as provide insights for the formulation of macro policies. Nowadays, high-frequency and massive multi-source data provide a new way to predict the TEC. In this paper, a “seasonal-cumulative temperature index” is constructed based on high-frequency temperature data, and a mixed-frequency prediction model based on multi-source big data (Mixed Data Sampling with Monthly Temperature and Daily Temperature index, MIDAS-MT-DT) is proposed. Experimental results show that the MIDAS-MT-DT model achieves higher prediction accuracy, and the “seasonal-cumulative temperature index” can improve prediction accuracy.
Xuerong Li, W. Shang, Xun Zhang, Baoguo Shan, Xiang Wang. "Total Electricity Consumption Forecasting Based on Temperature Composite Index and Mixed-Frequency Models." Data Intelligence 5(1): 750-766 (2023). DOI: 10.1162/dint_a_00215.
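The mixed-frequency idea can be sketched with the exponential Almon lag polynomial commonly used in MIDAS regressions to weight daily observations into a single monthly regressor. The parameter values below are illustrative placeholders, not the values estimated in the paper.

```python
import math

def exp_almon_weights(n_lags: int, theta1: float, theta2: float) -> list:
    """Exponential Almon lag polynomial: positive weights that sum to one,
    used in MIDAS regressions to aggregate high-frequency observations."""
    raw = [math.exp(theta1 * k + theta2 * k * k) for k in range(1, n_lags + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def midas_aggregate(daily_values, theta1=0.05, theta2=-0.01):
    """Collapse one month of daily data (e.g., temperatures) into a single
    regressor value for a monthly-frequency equation."""
    weights = exp_almon_weights(len(daily_values), theta1, theta2)
    return sum(w * x for w, x in zip(weights, daily_values))
```

Because the weights sum to one, a constant daily series aggregates to that same constant, which is a quick sanity check on any weighting scheme.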
ABSTRACT Existing datasets for move recognition, such as PubMed 200k RCT, exhibit several problems that significantly impact recognition performance, especially for the Background and Objective labels. To improve move recognition performance, we introduce a method and construct a refined corpus based on PubMed, named RCMR 280k. This corpus comprises approximately 280,000 structured abstracts totaling 3,386,008 sentences, each labeled with one of five categories: Background, Objective, Method, Result, or Conclusion. We also construct a subset of RCMR, named RCMR_RCT, corresponding to the medical subdomain of RCTs. We conduct comparison experiments, evaluating RCMR against PubMed 380k and RCMR_RCT against PubMed 200k RCT. The best results, obtained using the MSMBERT model, show that: (1) RCMR outperforms PubMed 380k by 0.82%, while RCMR_RCT outperforms PubMed 200k RCT by 9.35%; (2) compared with PubMed 380k, our corpus achieves greater improvement on the Result and Conclusion categories, with average F1 gains of 1% and 0.82%, respectively; (3) compared with PubMed 200k RCT, our corpus significantly improves performance on the Background and Objective categories, with average F1 gains of 28.31% and 37.22%, respectively. To the best of our knowledge, RCMR is among the few high-quality, resource-rich refined PubMed corpora available. The work in this paper has been applied in the SciAIEngine, which is openly accessible for researchers to conduct move recognition tasks.
Jie Li, Gaihong Yu, Zhixiong Zhang. "RCMR 280k: Refined Corpus for Move Recognition Based on PubMed Abstracts." Data Intelligence 5(1): 511-536 (2023). DOI: 10.1162/dint_a_00214.
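Corpora like this are typically built by mapping the section headings of PubMed structured abstracts onto the five move labels. The sketch below shows the idea with a small, invented heading list; the actual RCMR construction pipeline uses a far fuller mapping and additional filtering.

```python
# Map structured-abstract section headings to the five move labels.
# Heading list is illustrative only.
HEADING_TO_MOVE = {
    "BACKGROUND": "Background", "INTRODUCTION": "Background",
    "OBJECTIVE": "Objective", "AIM": "Objective", "PURPOSE": "Objective",
    "METHODS": "Method", "METHOD": "Method", "DESIGN": "Method",
    "RESULTS": "Result", "FINDINGS": "Result",
    "CONCLUSION": "Conclusion", "CONCLUSIONS": "Conclusion",
}

def label_sentences(structured_abstract: dict) -> list:
    """Turn {heading: [sentences]} into (sentence, move-label) pairs,
    dropping sections whose heading has no mapped label."""
    pairs = []
    for heading, sentences in structured_abstract.items():
        move = HEADING_TO_MOVE.get(heading.upper())
        if move:
            pairs.extend((s, move) for s in sentences)
    return pairs
```

Sections with unmapped headings (e.g., funding statements) are simply skipped, which is one way such pipelines keep label noise down.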
ABSTRACT Air pollution is a major issue affecting the national economy and people's livelihood. At present, most research on air pollution focuses on pollutant emissions in a specific industry or region as a whole, with little attention to enterprise-level pollutant emissions at the micro level. Limited by the amount and time granularity of data available from enterprises, enterprise pollutant emissions remain understudied. Driven by big data on the air pollution emissions of industrial enterprises monitored in Beijing-Tianjin-Hebei, this paper carries out data mining of enterprise pollution emissions, including association analysis between different features based on grey association, association mining between different data based on association rules, and outlier detection based on clustering. The results show that: (1) the industries mainly affecting NOx and SO2 in Beijing-Tianjin-Hebei are electric power, heat production and supply, and metal smelting and processing; (2) districts near Hengshui and Shijiazhuang in Hebei province form strong association rules; (3) the industrial enterprises in Beijing-Tianjin-Hebei fall into six clusters, of which three are outliers with excessive emissions of total VOCs, PM, and NH3, respectively.
Zhen Peng, Yunxiao Zhang, Yunchong Wang, Tianle Tang. "Association discovery and outlier detection of air pollution emissions from industrial enterprises driven by big data." Data Intelligence 5(1): 438-456 (2023). DOI: 10.1162/dint_a_00205.
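The grey association step can be illustrated with Deng's grey relational degree, a standard formulation of grey association analysis. This minimal version assumes the series are already normalised and uses the conventional distinguishing coefficient rho = 0.5; the paper's exact preprocessing may differ.

```python
def grey_relational_degree(reference, comparison, rho=0.5):
    """Deng's grey relational degree between a reference series and a
    comparison series (both assumed pre-normalised). Returns a value in
    (0, 1]; higher means the series move more similarly."""
    deltas = [abs(r - c) for r, c in zip(reference, comparison)]
    dmin, dmax = min(deltas), max(deltas)
    if dmax == 0:
        return 1.0  # identical series
    coeffs = [(dmin + rho * dmax) / (d + rho * dmax) for d in deltas]
    return sum(coeffs) / len(coeffs)
```

Ranking candidate features (e.g., industry category, fuel type) by their relational degree to an emissions series is how grey association surfaces the most influential features.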
Jane Greenberg, Scott McClellan, Christopher B. Rauch, Xintong Zhao, Mat Kelly, Yuan An, J. Kunze, Rachel Orenstein, Claire E. Porter, V. Meschke, Eric Toberer
ABSTRACT This paper reports on a demonstration of YAMZ (Yet Another Metadata Zoo) as a mechanism for building community consensus around metadata terms. The demonstration is motivated by the complexity of the metadata standards environment and the need for more user-friendly approaches for researchers to achieve vocabulary consensus. The paper reviews a series of metadata standardization challenges, explores crowdsourcing factors that offer possible solutions, and introduces the YAMZ system. A YAMZ demonstration is presented with members of the Toberer materials science laboratory at the Colorado School of Mines, where there is a need to confirm and maintain a shared understanding of the vocabulary supporting research documentation, data management, and the larger metadata infrastructure. The demonstration involves three key steps: 1) sampling terms for the demonstration, 2) engaging graduate student researchers in the demonstration, and 3) reflecting on the demonstration. The results of these steps, including examples of the dialog provenance among lab members and voting, show the ease with which YAMZ can facilitate building metadata vocabulary consensus. The conclusion discusses implications and highlights next steps.
Jane Greenberg, Scott McClellan, Christopher B. Rauch, Xintong Zhao, Mat Kelly, Yuan An, J. Kunze, Rachel Orenstein, Claire E. Porter, V. Meschke, Eric Toberer. "Building Community Consensus for Scientific Metadata with YAMZ." Data Intelligence 5(1): 242-260 (2023). DOI: 10.1162/dint_a_00211.
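The voting mechanism behind such consensus-building can be sketched as a simple up/down-vote tally with a promotion threshold. The two-thirds threshold and the vote encoding here are illustrative; YAMZ's actual promotion rule may differ.

```python
def consensus(votes: list, threshold: float = 2 / 3) -> bool:
    """Return True when the share of up-votes ('+1') among all votes
    reaches the threshold, i.e., the candidate term is promoted."""
    up = votes.count("+1")
    total = len(votes)
    return total > 0 and up / total >= threshold
```

A term with two up-votes and one down-vote just meets a two-thirds threshold, while an empty vote list never promotes.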
ABSTRACT Domain repositories, i.e. repositories that store, manage, and persist data pertaining to a specific scientific domain, are common and growing in the research landscape. Many of these repositories develop close, long-term communities made up of individuals and organizations that collect, analyze, and publish results based on the data in the repositories. Connections between these datasets, papers, people, and organizations are an important part of the knowledge infrastructure surrounding the repository. All these research objects, people, and organizations can now be identified using various unique and persistent identifiers (PIDs) and it is possible for domain repositories to build on their existing communities to facilitate and accelerate the identifier adoption process. As community members contribute to multiple datasets and articles, identifiers for them, once found, can be used multiple times. We explore this idea by defining a connectivity metric and applying it to datasets collected and papers published by members of the UNAVCO community. Finding identifiers in DataCite and Crossref metadata and spreading those identifiers through the UNAVCO DataCite metadata can increase connectivity from less than 10% to close to 50% for people and organizations.
T. Habermann. "Improving Domain Repository Connectivity." Data Intelligence 5(1): 6-26 (2023). DOI: 10.1162/dint_a_00120.
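A connectivity metric of this kind can be read as the fraction of metadata records that carry a persistent identifier in a given field. The sketch below is a minimal version in that spirit; the paper's exact definition over DataCite metadata is richer (it distinguishes people, organizations, and related works).

```python
def connectivity(records: list, id_field: str) -> float:
    """Fraction of metadata records with a non-empty persistent
    identifier (PID) in the given field, between 0.0 and 1.0."""
    if not records:
        return 0.0
    with_pid = sum(1 for r in records if r.get(id_field))
    return with_pid / len(records)
```

Running this before and after an identifier-enrichment pass (e.g., filling in ORCIDs found via Crossref) quantifies the kind of improvement the abstract reports, from under 10% to near 50%.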
Jane Greenberg, Mingfang Wu, Wei Liu, Fenghong Liu
Metadata, as a type of data, describes content, provides context, documents transactions, and situates data. Interest in metadata has steadily grown over the last several decades, motivated initially by the increase in digital information, open access, early data sharing policies, and interoperability goals. This foundation has accelerated in more recent times, due to the increase in research data management policies and advances in AI. Specific to research data management, one of the larger factors has been the global adoption of the FAIR (findable, accessible, interoperable, and reusable) data principles [1, 2], which are highly metadata-driven. Additionally, researchers across nearly every domain are interested in leveraging metadata for machine learning and other AI applications. The accelerated interest in metadata expands across other communities as well. For example, industry seeks metadata to meet company goals; and users of information systems and social computing applications wish to know how their metadata is being used and demand greater control of who has access to their data and metadata. All of these developments underscore the fact that metadata is intelligent data, or what Riley has called value added data [3]. Overall, this intense and growing interest in metadata helps to frame the contributions included in this special issue of Data Intelligence.
Jane Greenberg, Mingfang Wu, Wei Liu, Fenghong Liu. "Metadata as Data Intelligence." Data Intelligence 5(1): 1-5 (2023). DOI: 10.1162/dint_e_00212.
Domain adaptation aims to transfer knowledge from a labeled source domain to an unlabeled target domain that follows a similar but different distribution. Recently, adversarial-based methods have achieved remarkable success due to the excellent performance of domain-invariant feature representation learning. However, adversarial methods learn transferability at the expense of discriminability in the feature representation, leading to poor generalization to the target domain. To this end, we propose a Multi-view Feature Learning method for the Over-penalty in Adversarial Domain Adaptation. Specifically, multi-view representation learning is proposed to enrich the discriminative information contained in the domain-invariant feature representation, countering the over-penalty on discriminability in adversarial training. In addition, the intra-domain class distribution is proposed to replace the inter-domain one to capture more discriminative information when learning transferable features. Extensive experiments show that our method improves discriminability while maintaining transferability and exceeds the most advanced methods on domain adaptation benchmark datasets.
{"title":"Multi-view Feature Learning for the Over-penalty in Adversarial Domain Adaptation","authors":"Yuhong Zhang, Jianqing Wu, Qi Zhang, Xuegang Hu","doi":"10.1162/dint_a_00199","DOIUrl":"https://doi.org/10.1162/dint_a_00199","url":null,"abstract":"Domain adaptation aims to transfer knowledge from a labeled source domain to an unlabeled target domain that follows a similar but different distribution. Recently, adversarial methods have achieved remarkable success thanks to domain-invariant feature representation learning. However, these methods learn transferability at the expense of discriminability in the feature representation, leading to poor generalization to the target domain. To this end, we propose a Multi-view Feature Learning method for the Over-penalty in Adversarial Domain Adaptation. Specifically, multi-view representation learning is proposed to enrich the discriminative information contained in the domain-invariant feature representation, countering the over-penalty on discriminability in adversarial training. Besides, the intra-domain class distribution is used in place of the inter-domain class distribution to capture more discriminative information when learning transferable features.
Extensive experiments show that our method improves discriminability while maintaining transferability and outperforms state-of-the-art methods on domain adaptation benchmark datasets.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":" ","pages":""},"PeriodicalIF":3.9,"publicationDate":"2023-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43526918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
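The abstract above turns on the tension between transferability and discriminability in adversarial training. The paper's own multi-view method is not specified here, but the adversarial baseline it builds on is commonly implemented with a DANN-style gradient-reversal layer: the forward pass is the identity, while the backward pass negates (and scales) the domain classifier's gradient before it reaches the feature extractor. A minimal sketch, with `lam` as the usual trade-off coefficient (names are illustrative, not from the paper):

```python
def grad_reverse_forward(x):
    # Identity in the forward pass: features flow to the domain
    # classifier unchanged.
    return x

def grad_reverse_backward(grad_output, lam=1.0):
    # Backward pass: negate and scale the domain-classifier gradient,
    # so the feature extractor learns to *confuse* the domain classifier
    # (this is the adversarial pressure that can over-penalize
    # discriminability).
    return [-lam * g for g in grad_output]

# Toy gradient from a domain classifier over a 3-dim feature vector:
domain_grad = [0.4, -0.8, 0.2]
print(grad_reverse_backward(domain_grad, lam=0.5))  # [-0.2, 0.4, -0.1]
```

In a real framework this pair of functions is registered as a custom autograd operation between the feature extractor and the domain classifier; the task classifier bypasses it, which is why only the domain-alignment signal is reversed.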
M. Dragoni, Tania Bailoni, Ivan Donadello, Jean-Claude Martin, H. Lindgren
ABSTRACT Functional Status Information (FSI) describes physical and mental wellness at the whole-person level. It includes information on activity performance, social role participation, and the environmental and personal factors that affect well-being and quality of life. Collecting and analyzing this information is critical to addressing the care needs of an aging global population and to providing effective care for individuals with chronic conditions, multi-morbidity, and disability. Personal knowledge graphs (PKGs) offer a suitable way of modeling, in a complete and structured form, all information related to a person's FSI and of reasoning over it to build tailored coaching solutions that support healthy living in daily life. In this paper, we present the development process of a PKG built on the HeLiS ontology, enabling the design of an AI-enabled system aimed at increasing people's self-awareness of their own functional status. In particular, we focus on three modules extending the HeLiS ontology that represent (i) enablers and (ii) barriers playing potential roles in improving (or deteriorating) one's functional status, and (iii) arguments driving the FSI collection process. Finally, we show how these modules have been instantiated in real-world scenarios.
{"title":"Integrating Functional Status Information into Knowledge Graphs to Support Self-Health Management","authors":"M. Dragoni, Tania Bailoni, Ivan Donadello, Jean-Claude Martin, H. Lindgren","doi":"10.1162/dint_a_00203","DOIUrl":"https://doi.org/10.1162/dint_a_00203","url":null,"abstract":"ABSTRACT Functional Status Information (FSI) describes physical and mental wellness at the whole-person level. It includes information on activity performance, social role participation, and the environmental and personal factors that affect well-being and quality of life. Collecting and analyzing this information is critical to addressing the care needs of an aging global population and to providing effective care for individuals with chronic conditions, multi-morbidity, and disability. Personal knowledge graphs (PKGs) offer a suitable way of modeling, in a complete and structured form, all information related to a person's FSI and of reasoning over it to build tailored coaching solutions that support healthy living in daily life. In this paper, we present the development process of a PKG built on the HeLiS ontology, enabling the design of an AI-enabled system aimed at increasing people's self-awareness of their own functional status. In particular, we focus on three modules extending the HeLiS ontology that represent (i) enablers and (ii) barriers playing potential roles in improving (or deteriorating) one's functional status, and (iii) arguments driving the FSI collection process.
Finally, we show how these modules have been instantiated in real-world scenarios.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"5 1","pages":"636-662"},"PeriodicalIF":3.9,"publicationDate":"2023-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49224482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
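The abstract above describes a PKG that records enablers, barriers, and arguments about a person's functional status. The actual HeLiS extensions are not given here, so as a minimal sketch, a PKG can be pictured as a set of subject-predicate-object triples; every IRI and property name below is a hypothetical illustration, not the authors' vocabulary:

```python
# Hypothetical namespaces; the real HeLiS extension IRIs are not stated
# in the abstract, so these are illustrative assumptions only.
HELIS = "https://example.org/helis#"   # assumed ontology namespace
PKG = "http://example.org/pkg#"        # assumed personal-data namespace

# A tiny PKG for one (fictional) user, covering the three modules:
# (i) enablers, (ii) barriers, (iii) arguments driving FSI collection.
triples = [
    (PKG + "user42", HELIS + "hasEnabler", PKG + "walkableNeighbourhood"),
    (PKG + "user42", HELIS + "hasBarrier", PKG + "kneePain"),
    (PKG + "user42", HELIS + "motivatedBy", PKG + "clinicianArgument1"),
]

def objects_of(predicate_suffix):
    """Return objects of triples whose predicate IRI ends with the suffix."""
    return [o for _, p, o in triples if p.endswith(predicate_suffix)]

print(objects_of("hasBarrier"))  # ['http://example.org/pkg#kneePain']
```

In a production system these triples would live in an RDF store (e.g. via rdflib or a triplestore) where the ontology's class hierarchy lets a reasoner infer, for instance, which barriers deteriorate which aspects of functional status; the list-of-tuples form is only meant to make the data shape concrete.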