
Data and information management: latest articles

DeepFake video detection: Insights into model generalisation — A Systematic review
Pub Date : 2025-03-28 DOI: 10.1016/j.dim.2025.100099
Ramcharan Ramanaharan, Deepani B. Guruge, Johnson I. Agbinya
Deep learning generative models have progressed to a stage where distinguishing fake images and videos has become difficult, posing risks to personal integrity, potentially leading to social instability, and disrupting government functioning. Existing reviews have mainly focused on the approaches used to detect DeepFakes and the datasets used for those approaches. However, challenges persist when attempting to generalise detection techniques to previously unseen datasets. The purpose of this systematic review is to explore state-of-the-art frameworks for DeepFake detection and provide readers with an understanding of the strengths and weaknesses of current approaches, as well as the generalisability of existing detection techniques. The study indicates that generalising DeepFake detection remains a challenge that requires further research. Moreover, 46.3% of the selected publications agreed that DeepFake detection techniques could be generalised to identify various types of DeepFakes. A key limitation in achieving generalisation is the tendency of models to overfit to the available datasets, reducing their effectiveness in adapting to new or unseen types of DeepFakes. This review emphasises the need for the development of extensive and diverse datasets that more accurately reflect the wide range of DeepFake manipulations encountered in real-world applications. Lastly, the paper explores potential advancements that could pave the way to the next generation of solutions against DeepFakes.
Data and information management, vol. 9, no. 4, Article 100099
Citations: 0
Evaluating explainability in language classification models: A unified framework incorporating feature attribution methods and key factors affecting faithfulness
Pub Date : 2025-03-22 DOI: 10.1016/j.dim.2025.100101
Tahereh Dehdarirad
This paper presents a unified framework for evaluating explainability methods in language classification models, integrating feature attribution and interaction approaches while considering key factors impacting faithfulness: model architecture, dataset characteristics, and evidence type. The study compares classical models (Logistic Regression and Random Forest) with transformer models (RoBERTa and DistilBERT) and examines the faithfulness of SHAP, LIME, and Integrated Gradients (IG) across positive, negative, and all evidence types. In classical models, SHAP and LIME generally provide faithful explanations for positive and all evidence types, with SHAP and Random Forest best handling negative evidence.
For transformer models, the faithfulness of LIME and SHAP varies by model, dataset, and evidence type. LIME performs consistently well in complex models like RoBERTa and DistilBERT, while SHAP excels with positive evidence across datasets and is most effective in RoBERTa for negative evidence. For all evidence, SHAP also shows broader applicability across evidence types, whereas LIME is suited to specific datasets, such as Brexit, especially in RoBERTa and DistilBERT. For longer texts, IG and SHAP outperform LIME, with SHAP excelling in complex architectures like RoBERTa. When using IG, RoBERTa provides slightly more faithful explanations than DistilBERT for positive evidence, though only DistilBERT aligns with expected trends for negative evidence.
Feature interaction analyses using Shapley Taylor Interaction (STI) and Archipelago reveal that RoBERTa consistently provides more cohesive explanations than DistilBERT across datasets, especially with Archipelago. STI-based models produce more interpretable, human-relevant phrases, achieving higher relevance ratings, especially when evaluated within full text contexts.
Data and information management, vol. 9, no. 4, Article 100101
Citations: 0
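The abstract above does not say which faithfulness metric the framework uses. Purely as an illustration, the sketch below shows one common deletion-based check (often called comprehensiveness): remove the tokens that an explanation ranks highest as positive evidence and measure how far the predicted probability drops. The toy bag-of-words model, its weights, and every function name here are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical weights of a toy bag-of-words sentiment classifier (assumption).
WEIGHTS = {"great": 2.0, "good": 1.0, "boring": -1.5, "the": 0.0, "plot": 0.1}


def predict_proba(tokens):
    """Probability of the positive class under the toy logistic model."""
    logit = sum(WEIGHTS.get(token, 0.0) for token in tokens)
    return 1.0 / (1.0 + math.exp(-logit))


def attributions(tokens):
    """Per-token attribution; for a linear logit this is simply the weight."""
    return [WEIGHTS.get(token, 0.0) for token in tokens]


def comprehensiveness(tokens, scores, k):
    """Drop in predicted probability after deleting the k highest-scored tokens."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    removed = set(ranked[:k])
    reduced = [token for i, token in enumerate(tokens) if i not in removed]
    return predict_proba(tokens) - predict_proba(reduced)


tokens = ["the", "plot", "great", "good", "boring"]
faithful = comprehensiveness(tokens, attributions(tokens), k=1)
# A deliberately wrong explanation: the same scores with the sign flipped.
unfaithful = comprehensiveness(tokens, [-s for s in attributions(tokens)], k=1)
print(f"faithful drop: {faithful:.3f}, unfaithful drop: {unfaithful:.3f}")
```

A faithful explanation should produce the larger drop, because the token it ranks first really is the strongest positive evidence for the prediction.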
Doctor-patient or patient-patient interaction? Relationships and differences of their roles in influencing patient satisfaction in online health communities
Pub Date : 2025-03-21 DOI: 10.1016/j.dim.2025.100098
Yaqi Huang, Wenhao Wang, Junjie Zhou
Exploring patients’ satisfaction with online health communities (OHCs) is essential to understanding how OHCs can empower patients. Existing studies primarily investigated the role of external support, while this study explores how doctor-patient (DP) and patient-patient (PP) interactions influence patient satisfaction, with a particular emphasis on intrinsic motivation. Additionally, it investigates the relationships and differences between DP and PP interactions. This study conceptualized patient empowerment as perceived control (PC), perceived support (PS), and shared decision-making (SDM). We developed a model with eight hypotheses to explore these dynamics. The model was tested using structural equation modeling with 470 valid samples, as well as a qualitative post-hoc analysis to provide additional evidence. The empirical results indicate that both DP and PP interactions directly and indirectly influence patient satisfaction, with PC, PS, and SDM acting as mediating factors. DP interaction has a stronger impact on SDM compared to PP interaction, whereas PP interaction has a stronger influence on PC and PS than DP interaction. Our findings reveal that PC, PS, and SDM play a partial mediating role between user interactions and patient satisfaction. DP interaction has a greater influence on patients through SDM, while PP interaction has a greater effect on patients through PC and PS. These insights clarify the distinct roles and impacts of DP and PP interactions on patient satisfaction and offer valuable guidance for the effective operation of OHCs.
Data and information management, vol. 9, no. 4, Article 100098
Citations: 0
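The partial mediation reported in the entry above can be illustrated with the classic product-of-coefficients decomposition, in which the total effect of an interaction on satisfaction splits into a direct effect and an indirect effect through a mediator. The sketch below uses ordinary least squares on a single mediator rather than structural equation modeling, and the six data points and all names are invented for illustration; it does not reproduce the paper's model or data.

```python
def mean(values):
    return sum(values) / len(values)


def simple_slope(x, y):
    """OLS slope of y regressed on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_predictor_slopes(x, m, y):
    """OLS slopes of y on x and m (with intercept), via centred normal equations."""
    mx, mm, my = mean(x), mean(m), mean(y)
    xc = [a - mx for a in x]
    mc = [a - mm for a in m]
    yc = [a - my for a in y]
    sxx = sum(a * a for a in xc)
    smm = sum(a * a for a in mc)
    sxm = sum(a * b for a, b in zip(xc, mc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    smy = sum(a * b for a, b in zip(mc, yc))
    det = sxx * smm - sxm * sxm
    slope_x = (sxy * smm - smy * sxm) / det
    slope_m = (smy * sxx - sxy * sxm) / det
    return slope_x, slope_m


# Hypothetical survey scores (assumption): interaction intensity, a mediator
# such as shared decision-making, and patient satisfaction.
interaction = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mediator = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
satisfaction = [2.0, 2.9, 4.1, 5.2, 5.8, 7.1]

total = simple_slope(interaction, satisfaction)        # c
a_path = simple_slope(interaction, mediator)           # a
direct, b_path = two_predictor_slopes(interaction, mediator, satisfaction)  # c', b
indirect = a_path * b_path                             # a * b
print(f"total={total:.3f} direct={direct:.3f} indirect={indirect:.3f}")
```

In this linear setting the total effect equals the direct effect plus the indirect effect; mediation is called partial when both parts are non-zero.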
Redefining research metrics: Introducing the inverse-H-index and efficacy equation in scholarly publication analysis
Pub Date : 2025-03-20 DOI: 10.1016/j.dim.2025.100100
Fuyi Wei, Mengjia Yuan, Le Yang, Hong Zhu
This exploratory study examines the distribution of scholarly publications and citations within a geometric Graph-H framework, focusing on identifying and addressing gaps in the distribution matrix. Additionally, the paper introduces the economic concept of paper publications. Utilizing the geometric principles foundational to the original H-index, this study proposes the inverse-H-index as a metric to quantify discrepancies between citation counts and optimal capacity. Based on this quantified framework, the paper introduces an efficacy concept and equation to evaluate the effectiveness of researchers in achieving their H-index values through their publications and citations. The validity of this equation is assessed through Pearson testing and validation using datasets from Clarivate's report. By comparing the H-index, inverse-H-index, and efficacy measures across different groups of scholars, this study finds that the proposed equation effectively captures researchers' publication activities and identifies attributes not highlighted by the traditional H-index.
Data and information management, vol. 9, no. 4, Article 100100
Citations: 0
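For readers unfamiliar with the baseline metric that the entry above builds on, the traditional H-index can be sketched as follows: it is the largest h such that h papers each have at least h citations, which geometrically is the side of the largest square that fits under the ranked citation curve. The abstract does not define the inverse-H-index or the efficacy equation, so they are not reproduced here; the function name and sample citation counts are illustrative only.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the square of side `rank` still fits under the curve
        else:
            break         # citations are sorted, so no later rank can qualify
    return h


# Two hypothetical researchers with different citation profiles.
print(h_index([10, 8, 5, 4, 3]))
print(h_index([25, 8, 5, 3, 3]))
```

The second profile has more total citations than the first but a lower H-index, which is the kind of gap between citation counts and index value that complementary metrics try to capture.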
Improving structural learning in Bayesian networks: Stationarity analysis for algorithm choice
Pub Date : 2025-03-19 DOI: 10.1016/j.dim.2025.100097
German Cuaya-Simbro, Manolo Tellez Meneses, Elías Ruiz Hernández
Structural learning in Bayesian networks is crucial for accurate modeling of complex systems. However, the performance of structural learning algorithms is significantly influenced by data characteristics. This study investigates the impact of data stationarity on the performance of structural learning algorithms and proposes a measure for selecting the most appropriate algorithm based on stationarity analysis. We compared the performance of various algorithms on both stationary and non-stationary datasets, using the KPSS test to assess stationarity. Our findings indicate that Max-Min Hill Climbing (MMHC) is particularly effective for stationary data, while Hill Climbing performs better for non-stationary data. These results highlight the importance of tailoring algorithm selection to data characteristics and provide practical guidelines for researchers and practitioners. Future research could explore the development of more adaptive algorithms and delve deeper into the relationship between data stationarity and algorithm performance.
Data and information management, vol. 9, no. 4, Article 100097
Citations: 0
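The idea in the entry above, testing a series for stationarity and then picking a structure-learning algorithm, can be sketched in plain Python. The code below computes the level-stationarity KPSS statistic with a Bartlett-weighted long-run variance and compares it with the standard 5% critical value of 0.463. The lag rule, the 5% level, and the `choose_algorithm` helper are assumptions made for illustration; the abstract does not describe the paper's actual selection measure, and the mapping to MMHC or Hill Climbing simply follows the reported finding.

```python
import math

# 5% critical value for the level-stationarity KPSS test (Kwiatkowski et al., 1992).
KPSS_CRIT_5PCT = 0.463


def kpss_level_statistic(series, lags=None):
    """KPSS statistic for the null hypothesis that the series is level-stationary."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    if lags is None:
        # Assumed rule of thumb for the truncation lag.
        lags = int(math.ceil(12.0 * (n / 100.0) ** 0.25))
    lags = min(lags, n - 1)

    level = sum(series) / n
    resid = [y - level for y in series]

    # Partial sums of the residuals.
    partial_sums, running = [], 0.0
    for e in resid:
        running += e
        partial_sums.append(running)

    # Newey-West long-run variance with Bartlett weights.
    long_run_var = sum(e * e for e in resid) / n
    for s in range(1, lags + 1):
        weight = 1.0 - s / (lags + 1.0)
        autocov = sum(resid[t] * resid[t - s] for t in range(s, n)) / n
        long_run_var += 2.0 * weight * autocov
    if long_run_var <= 0.0:
        raise ValueError("long-run variance is not positive (constant series?)")

    return sum(p * p for p in partial_sums) / (n * n * long_run_var)


def choose_algorithm(series):
    """Hypothetical selector: MMHC if stationarity is not rejected at 5%, else HC."""
    return "MMHC" if kpss_level_statistic(series) <= KPSS_CRIT_5PCT else "HC"


oscillating = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]  # stable level
trending = [float(t) for t in range(100)]                        # drifting level
print(choose_algorithm(oscillating), choose_algorithm(trending))
```

In practice a tested library routine such as the KPSS implementation in statsmodels would be preferable to a hand-rolled statistic, and a multivariate dataset would need a rule for combining per-variable results.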
Open access data policies and technologies: Introduction to special issue
Pub Date : 2025-03-01 DOI: 10.1016/j.dim.2025.100093
Di Wang, Misita Anwar, Rong Tang
This special issue of the journal “Data and Information Management” looks into the global development of open access data (OAD) from the perspectives of policies and technologies. The papers included in this issue explore barriers and factors for OAD practices, identify policy gaps, and propose technological solutions to improve data accessibility and interoperability. This introduction situates these contributions within broader discussions of the evolving landscape of OAD, highlights its societal and economic benefits, and identifies future directions for studies of OAD policies and technologies to advance practices globally for a more sustainable and equal OAD ecosystem.
Data and information management, vol. 9, no. 1, Article 100093
Citations: 0
Identifying factors and configurations influencing the effectiveness of government data openness in China based on fsQCA
Pub Date : 2025-03-01 DOI: 10.1016/j.dim.2024.100071
Xu Chen, Muhua Hu
Government data openness is of great significance to economic development and social services. As the government data openness process continues to deepen in China, it is worth studying the factors that affect government data openness and the development paths leading to high performance in data opening. Based on the Technology-Organization-Environment (TOE) theory, this paper proposes an analysis framework for government data openness that includes five condition variables (i.e., data support, technical support, government support, economic development, and social development). Using Fuzzy-set Qualitative Comparative Analysis (fsQCA) to analyze data from 25 provincial governments, we discover the key influencing factors and configurations leading to high-level and non-high-level data openness. The results show that no single factor determines the level of government data opening. Instead, it is jointly affected by multiple factors in technology, organization, and environment. Three configuration paths are found in developing China’s provincial government data openness: technology-environment-driven, technology-organization-environment-driven, and technology-organization-driven modes. The analysis results of this paper provide inspiration and suggestions for provincial governments to improve the level of government data opening according to local characteristics.
Data and information management, vol. 9, no. 1, Article 100071
Citations: 0
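The configurational logic behind the fsQCA entry above can be sketched with the two standard set-theoretic measures: a case's membership in a configuration is the minimum of its memberships in the combined conditions (fuzzy AND), consistency is the sum of min(x, y) divided by the sum of x, and coverage is the same numerator divided by the sum of y. The membership scores below are made up for four imaginary provinces and are not the paper's calibrated data; the variable names are likewise hypothetical.

```python
def configuration_membership(*conditions):
    """Fuzzy AND: a case belongs to a configuration as much as to its weakest condition."""
    return [min(values) for values in zip(*conditions)]


def consistency(config, outcome):
    """Degree to which membership in the configuration is a subset of the outcome."""
    overlap = sum(min(x, y) for x, y in zip(config, outcome))
    return overlap / sum(config)


def coverage(config, outcome):
    """Share of the outcome that the configuration accounts for."""
    overlap = sum(min(x, y) for x, y in zip(config, outcome))
    return overlap / sum(outcome)


# Hypothetical calibrated fuzzy-set scores for four cases (assumption).
technical_support = [0.9, 0.7, 0.2, 0.4]
economic_development = [0.8, 0.9, 0.6, 0.1]
high_openness = [0.9, 0.6, 0.3, 0.2]

config = configuration_membership(technical_support, economic_development)
print(f"consistency={consistency(config, high_openness):.3f}")
print(f"coverage={coverage(config, high_openness):.3f}")
```

A configuration is usually treated as a sufficient path to the outcome only when its consistency passes a threshold chosen by the analyst; the threshold used in the paper is not stated in the abstract.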
Metadata application profile as a mechanism for semantic interoperability in FAIR and open data publishing
Pub Date : 2025-03-01 DOI: 10.1016/j.dim.2024.100068
Nishad Thalhath, Mitsuharu Nagamori, Tetsuo Sakaguchi
Application profiles, also known as metadata application profiles, are customised collections of vocabularies adapted from various namespaces and tailored for specific local applications. These profiles act as constrainers and explainers for the (meta)data. Semantic interoperability is the ability of computer systems to exchange data in a mutually understandable manner, facilitating data sharing across diverse platforms and applications without compromising its meaning. As a critical component of semantic interoperability, application profiles enforce semantics on (meta)data, enhancing its openness, interoperability, and reusability. This study assesses the feasibility of representing a comprehensive application profile in a format aligned with the semantic web, ensuring interoperability between profiles and datasets. Dublin Core Description Set Profiles (DSP) is adapted as the modeling framework for metadata application profiles, steering the associated datasets toward RDF compliance. The research outcomes include “Yet Another Metadata Application Profiles” (YAMA) as a preprocessor grounded in the DSP framework for developing and managing metadata application profiles. YAMA facilitates the generation of various standard formats of application profiles, ensuring they are represented in human-readable documentation, machine-actionable forms, and even data validation languages. A data mapping extension to YAMA is proposed to ensure the semantic interoperability of open data, bridging non-RDF data structures to RDF, thus enabling the publication of 5-star open data. This ensures smooth dataset integration and the creation of linkable, semantically rich open datasets. The work emphasizes the pivotal role of application profiles in fortifying the semantic interoperability of (meta)data, thereby elevating dataset openness.
应用程序配置文件(也称为元数据应用程序配置文件)是自定义的词汇表集合,这些词汇表来自各种名称空间,并针对特定的本地应用程序进行了定制。这些概要文件充当(元)数据的约束和解释器。语义互操作性是计算机系统以相互理解的方式交换数据的能力,促进数据在不同平台和应用程序之间共享而不损害其含义。作为语义互操作性的关键组件,应用程序概要文件将语义强制到(元)数据,增强其开放性、互操作性和可重用性。本研究评估了以与语义网一致的格式表示综合应用程序概要的可行性,确保概要和数据集之间的互操作性。都柏林核心描述集概要文件(DSP)被用作元数据应用程序概要文件的建模框架,引导相关数据集遵从RDF。研究成果包括“Yet Another Metadata Application Profiles”(YAMA)作为一个基于DSP框架的预处理器,用于开发和管理元数据应用profile。YAMA促进了应用程序概要文件的各种标准格式的生成,确保它们以人类可读的文档、机器可操作的形式,甚至数据验证语言表示。提出了对YAMA的数据映射扩展,以确保开放数据的语义互操作性,将非RDF数据结构桥接到RDF,从而实现五星级开放数据的发布。这确保了平稳的数据集集成和创建可链接的、语义丰富的开放数据集。这项工作强调了应用程序配置文件在加强(元)数据的语义互操作性方面的关键作用,从而提高了数据集的开放性。
Citations: 0
Understanding barriers affecting the adoption and usage of open access data in the context of organizations
Pub Date : 2025-03-01 DOI: 10.1016/j.dim.2023.100049
Murat Tahir Çaldağ, Ebru Gökalp
Data and information are growing exponentially due to advances and innovations in information and communication technologies. This growth forces organizations to gain new capabilities to compete or stay current in their marketplace. Open Access Data (OAD), resulting from the open source movement, is seen as a double-edged sword from the management perspective. It can potentially provide enormous social and economic value, such as transparency, participation culture, innovativeness, and accountability, to organizations, governments, and, more importantly, citizens. Although the benefits of organizational adoption are significant, most OAD-related projects fail because of organizational barriers and resistance to adoption. This study first aims to identify the organizational barriers to adopting OAD in order to raise awareness of the obstacles organizations must overcome. Towards this aim, after conducting a systematic literature review (SLR) and convening an expert panel, a research model based on the Technology – Organization – Environment (TOE) framework is proposed in this study. The SLR identified 97 barriers from ten primary studies. After critically examining these barriers, a research model classifying 22 crucial barriers to organizational OAD adoption based on the TOE framework is proposed. Another significant contribution of this study is to draw attention to under-researched barriers in the literature, such as Power and Control, Political Commitment, Inter-Organizational Trust, IT Governance, and Competitive Pressure.
Citations: 0
Data use policies on state COVID-19 dashboards in the United States: Key characteristics, topical focus, and identifiable gaps
Pub Date : 2025-03-01 DOI: 10.1016/j.dim.2023.100050
Rong Tang, Zhan Hu, Yishan Zhang
In this paper, we report the findings of an investigation into the data use policies published on the COVID-19 dashboards developed by the 50 state governments of the United States as well as the government of the District of Columbia. Specifically, we examined the key attributes of the dashboard data notes, such as data source, update frequency, and data suppression disclaimers. We also studied the terms and phrases used, as well as the topical themes of the data policy texts. Using a data policy analysis model, our results revealed a series of gaps and inconsistencies in the policy statements. Connecting these gaps and inconsistencies with potential problems that could violate individual Open Data Principles (ODP) and the FAIR principles, we made recommendations to help fill the gaps and fix the inconsistencies, so that open government data can be managed and used to further the very core of open data practice. Further research that we plan to carry out includes confirmation and validation of our analysis model and our approach of linking the examination and assessment of open data policy with ODP and the FAIR principles.
Citations: 0