ACM Journal of Data and Information Quality: Latest Articles

Text2EL+: Expert Guided Event Log Enrichment using Unstructured Text
IF 2.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-01-10 DOI: 10.1145/3640018
D. T. K. Geeganage, M. Wynn, A. Hofstede
Through the application of process mining, business processes can be improved on the basis of process execution data captured in event logs. Naturally, the quality of this data determines the quality of the improvement recommendations. Improving data quality is non-trivial, and there is great potential to exploit unstructured text, e.g. from notes, reviews, and comments, for this purpose and to enrich event logs. To this end, this paper introduces Text2EL+, a three-phase approach to enrich event logs using unstructured text. In its first phase, events and (case and event) attributes are derived from unstructured text linked to organisational processes. In its second phase, these events and attributes undergo semantic and contextual validation before they are incorporated into the event log. In its third and final phase, recognising the importance of human domain expertise, expert guidance is used to further improve data quality by removing redundant and irrelevant events. Expert input is used to train a Named Entity Recognition (NER) model with customised tags to detect event log elements. The approach applies natural language processing techniques, sentence embeddings, training pipelines and models, as well as contextual and expression validation. Various unstructured clinical notes associated with a healthcare case study were analysed, and the completeness, concordance, and correctness of the derived event log elements were evaluated through experiments. The results show that the proposed method is feasible and applicable.
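The semantic validation in the second phase can be illustrated with a small sketch. It is not the authors' implementation: the abstract mentions sentence embeddings, whereas this example substitutes plain bag-of-words vectors so that it runs without any model. The function names, the 0.5 threshold, and the example activities are all assumptions made for illustration.

```python
import math
from collections import Counter


def bow_vector(text):
    """Stand-in for a sentence embedding: lower-cased token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def validate_events(candidates, known_activities, threshold=0.5):
    """Keep candidate events that resemble at least one known activity.

    `threshold` is a hypothetical cut-off, not a value from the paper.
    """
    known = [bow_vector(k) for k in known_activities]
    kept = []
    for candidate in candidates:
        vec = bow_vector(candidate)
        if any(cosine(vec, k) >= threshold for k in known):
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    extracted = ["blood test ordered", "patient likes tea"]
    activities = ["order blood test", "administer medication"]
    print(validate_events(extracted, activities))
```

With a real sentence-embedding model, `bow_vector` would be replaced by the model's encoding call and the threshold would need to be tuned on labelled data.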
Citations: 0
A Catalog of Consumer IoT Device Characteristics for Data Quality Estimation
IF 2.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-01-09 DOI: 10.1145/3639708
Valentina Golendukhina, Harald Foidl, Daniel Hörl, Michael Felderer
The Internet of Things (IoT) is rapidly growing and spreading across different markets, including the customer market and consumer IoT (CIoT). The large variety of gadgets and their availability make CIoT more and more influential, especially in the wearable and smart home domains. However, the large variety of devices and their inconsistent quality due to varying hardware costs influence the data produced by such devices. In this article, a catalog of CIoT properties is introduced, which enables the prediction of data quality. The data quality catalog contains six categories and 21 properties with descriptions and trust score calculation methods. A diagramming tool is implemented to support and facilitate the evaluation process. The tool was assessed in an experimental setting with 14 users and received positive feedback. Additionally, we provide an exemplary application for smartwatch devices and compare the results obtained with the approach against users' evaluations based on feedback from 158 smartwatch owners. The method-based ranking does not match the ranking given by regular users. However, it yields outcomes comparable to the assessment conducted by experienced users.
Citations: 0
AI explainibility and acceptance; a case study for underwater mine hunting
IF 2.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-12-21 DOI: 10.1145/3635113
Gj. Richard (Thales DMS, IMT Atlantique, France), J. Habonneau (France), D. Gueriot (France)
In critical operational contexts such as mine warfare, Automatic Target Recognition (ATR) algorithms still struggle to gain acceptance. The complexity of their decision-making hampers understanding of their predictions, even though their performance approaches that of human experts. Much research has been done in the field of Explainable Artificial Intelligence (XAI) to avoid this "black box" effect. This field of research attempts to provide explanations for the decision-making of complex networks to promote their acceptability. Most of the explanation methods applied to image classifier networks provide heat maps. These maps highlight pixels according to their importance in decision-making. In this work, we first implement different XAI methods for the automatic classification of Synthetic Aperture Sonar (SAS) images by convolutional neural networks (CNN). These methods are all based on a post-hoc approach. We study and compare the different heat maps obtained. Secondly, we evaluate the benefits and the usefulness of explainability in an operational framework for collaboration. To do this, user tests are carried out with different levels of assistance, ranging from classification by an unaided operator to classification with explained ATR. These tests allow us to study whether heat maps are useful in this context. The results show that operators dispute the utility of heat-map explanations. The presence of heat maps does not improve the quality of the classifications and even increases the response time. Nevertheless, half of the operators see some usefulness in heat-map explanations.
Citations: 0
Data quality assessment through a preference model
IF 2.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-11-29 DOI: 10.1145/3632407
Julian Le Deunf, Arwa Khannoussi, Laurent Lecornu, Patrick Meyer, John Puentes
Evaluating the quality of data is a multi-dimensional problem that quite frequently depends on the expected use or final purpose of the data. Numerous works have explored the well-known specification of data quality dimensions in various application domains without addressing the inter-dependencies and aggregation of quality attributes for decision support. In this work we therefore propose a context-dependent formal process to evaluate the quality of data, which integrates a preference model from the field of Multi-Criteria Decision Aiding. The parameters of this preference model are determined through interviews with work-domain experts. We demonstrate the value of the proposal in a case study on evaluating the quality of hydrographic survey data.
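To make the aggregation idea concrete, the following sketch combines per-dimension quality scores into a single value. The abstract does not state which preference model the authors use, so a normalised weighted sum is assumed here purely as the simplest Multi-Criteria Decision Aiding model; the dimension names, scores, and weights are hypothetical stand-ins for values that would be elicited from domain experts.

```python
def aggregate_quality(scores, weights):
    """Weighted-sum aggregation of per-dimension quality scores in [0, 1].

    `scores` and `weights` map dimension names to numbers. The weighted sum
    is an assumed model for illustration, not the one from the paper.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(scores[dim] * weights[dim] for dim in scores)
    return weighted / total_weight


if __name__ == "__main__":
    # Hypothetical scores for one survey dataset and expert-elicited weights.
    survey_scores = {"completeness": 0.9, "accuracy": 0.6, "timeliness": 0.8}
    expert_weights = {"completeness": 2, "accuracy": 5, "timeliness": 3}
    print(round(aggregate_quality(survey_scores, expert_weights), 2))
```

A weighted sum lets a high score on one dimension compensate for a low score on another; outranking or rule-based models avoid that compensation at the cost of more parameters to elicit.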
Citations: 0
Editorial: Special Issue on Quality Aspects of Data Preparation
IF 2.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-11-01 DOI: 10.1145/3626461
Marco Console, Maurizio Lenzerini
This Special Issue of the Journal of Data and Information Quality (JDIQ) contains novel theoretical and methodological contributions as well as state-of-the-art reviews and research perspectives on quality aspects of data preparation. In this editorial, we summarize the scope of the issue and briefly describe its content.
Citations: 0
Enhancing Human-in-the-Loop Ontology Curation Results through Task Design
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-10-06 DOI: 10.1145/3626960
Stefani Tsaneva, Marta Sabou
The success of artificial intelligence (AI) applications is heavily dependent on the quality of the data they rely on. Thus, data curation, which deals with cleaning, organising and managing data, has become a significant research area. Increasingly, semantic data structures such as ontologies and knowledge graphs empower the new generation of AI systems. In this paper, we focus on ontologies as a special type of data. Ontologies are conceptual data structures representing a domain of interest and are often used as a backbone for knowledge-based intelligent systems or as an additional input for machine learning algorithms. Low-quality ontologies, containing incorrectly represented information or controversial concepts modelled from a single viewpoint, can lead to invalid application outputs and biased systems. Thus, we focus on the curation of ontologies as a crucial factor for ensuring trust in the enabled AI systems. While some ontology quality aspects can be automatically evaluated, others require a human-in-the-loop evaluation. Yet, despite the importance of the field, several ontology quality aspects have not yet been addressed, and there is a lack of guidelines for the optimal design of human computation tasks to perform such evaluations. In this paper, we advance the state of the art by making two novel contributions. First, we propose a human-computation (HC)-based approach for the verification of ontology restrictions, an ontology evaluation aspect that has not yet been addressed with HC techniques. Second, by performing two controlled experiments with a junior expert crowd, we empirically derive task design guidelines for achieving high-quality evaluation results related to i) the formalism for representing ontology axioms and ii) crowd qualification testing. We find that the representation format of the ontology does not significantly influence the campaign results; nevertheless, contributors expressed a preference for working with a graphical ontology representation. Additionally, we show that an objective qualification test is better suited to assessing contributors' prior knowledge than a subjective self-assessment and that contributors' prior modelling knowledge had a positive effect on their judgements. We make all artefacts designed and used in the experimental campaign publicly available.
Citations: 0
Editorial: Multimodality, Multidimensional Representation, and Multimedia Quality Assessment Toward Information Quality in Social Web of Things
IF 2.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-09-30 DOI: 10.1145/3625102
Chinmay Chakraborty, Mohammad Hossein Khosravi, Muhammad Khurram Khan, Houbing Song
This editorial summarizes the content of the collection on Multimodality, Multidimensional Representation, and Multimedia Quality Assessment Toward Information Quality in Social Web of Things for the Journal of Data and Information Quality.
Citations: 0
Validating Synthetic Usage Data in Living Lab Environments
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-09-24 DOI: 10.1145/3623640
Timo Breuer, Norbert Fuhr, Philipp Schaer
Evaluating retrieval performance without editorial relevance judgments is challenging, but user interactions can be used as relevance signals instead. Living labs offer a way for small-scale platforms to validate information retrieval systems with real users. If enough user interaction data are available, click models can be parameterized from historical sessions to evaluate systems before exposing users to experimental rankings. However, interaction data are sparse in living labs, and little is known about how click models can be validated for reliable user simulations when click data are available only in moderate amounts. This work introduces an evaluation approach for validating synthetic usage data generated by click models in data-sparse human-in-the-loop environments like living labs. We base our methodology on a click model's estimates for a system ranking compared to a reference ranking whose relative performance is known. Our experiments compare different click models and their reliability and robustness as more session log data become available. In our setup, simple click models can reliably determine the relative system performance with as few as 20 logged sessions for 50 queries. In contrast, more complex click models require more session data for reliable estimates, but they are a better choice in simulated interleaving experiments when enough session data are available. While it is easier for click models to distinguish between more diverse systems, it is harder to reproduce the system ranking when systems are based on the same retrieval algorithm with different interpolation weights. Our setup is entirely open, and we share the code to reproduce the experiments.
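The validation idea can be sketched in a few lines, assuming a document-based click-through-rate model, which is one of the simplest click models. This is an illustration and not the authors' released code: the session log format of (doc_id, clicked) pairs, the function names, and the toy data are all hypothetical.

```python
from collections import defaultdict


def fit_dctr(sessions):
    """Estimate a click-through rate per document from logged sessions.

    Each session is a list of (doc_id, clicked) pairs, a log format
    assumed for this sketch.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for session in sessions:
        for doc_id, was_clicked in session:
            shown[doc_id] += 1
            clicked[doc_id] += int(was_clicked)
    return {doc: clicked[doc] / shown[doc] for doc in shown}


def expected_clicks(ranking, ctr, default=0.0):
    """Sum of estimated click probabilities over the documents in a ranking."""
    return sum(ctr.get(doc, default) for doc in ranking)


def prefers_reference(reference, candidate, ctr):
    """True if the click model scores the reference ranking above the candidate."""
    return expected_clicks(reference, ctr) > expected_clicks(candidate, ctr)


if __name__ == "__main__":
    logs = [
        [("a", True), ("b", False)],
        [("a", True), ("c", False)],
        [("b", True), ("a", False)],
    ]
    ctr = fit_dctr(logs)
    # The reference ranking is assumed to be the known-better system.
    print(prefers_reference(["a", "b"], ["b", "c"], ctr))
```

Repeating the comparison while fitting on a growing number of sessions shows how much log data the model needs before its preference agrees with the known relative performance.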
Citations: 0
Experience: Data Management for delivering COVID-19 relief in Panama
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-09-22 DOI: 10.1145/3623511
Luis Del Vasto-Terrientes
A data-driven public sector recognizes data as a key element for implementing evidence-based policies. The open data movement has been a major catalyst for elevating data to a privileged position in many governments around the globe. In Panama, open data has enabled each institution to improve its data management. However, further steps are required to create an integrated data-driven government with a common objective. Public institutions collect a huge amount of data that may never be used, and some of the data are not of sufficient quality to provide trustworthy results. The state of emergency caused by COVID-19 showed the necessity of establishing a common digital government vision for planning, delivering, and monitoring public services, as well as strengthening the technical foundation in the public sector to improve the data value cycle: acquisition, storage, and exploitation. This paper reports, from a data custodian's perspective, how the state of emergency worked as a catalyst to boost government data management, specifically for the Vale Digital program, a social relief scheme linked to the identity card and implemented by the Panamanian government during the COVID-19 pandemic. It may be the greatest government data integration to date in terms of impact, data volume, rapid implementation, and institutions involved.
Citations: 0
Process-Data Quality: The True Frontier of Process Mining
IF 2.1 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-08-23 DOI: 10.1145/3613247
A. T. Ter Hofstede, A. Koschmider, Andrea Marrella, R. Andrews, D. Fischer, Sareh Sadeghianasl, M. Wynn, M. Comuzzi, Jochen De Weerdt, Kanika Goel, Niels Martin, P. Soffer
Since its emergence over two decades ago, process mining has flourished as a discipline, with numerous contributions to its theory, widespread practical applications, and mature support by commercial tooling environments. However, its potential for significant organisational impact is hampered by poor quality event data. Process mining starts with the acquisition and preparation of event data coming from different data sources. These are then transformed into event logs, consisting of process execution traces including multiple events. In real-life scenarios, event logs suffer from significant data quality problems, which must be recognised and effectively resolved for obtaining meaningful insights from process mining analysis. Despite its importance, the topic of data quality in process mining has received limited attention. In this paper, we discuss the emerging challenges related to process-data quality from both a research and practical point of view. Additionally, we present a corresponding research agenda with key research directions.
Citations: 0