2022 26th International Conference Information Visualisation (IV): Latest Papers

Traffic Flow Indicator: Predicting Jams in a City

2022 26th International Conference Information Visualisation (IV)

Pub Date : 2022-07-01 DOI: 10.1109/IV56949.2022.00056

Joao Vaz, Nuno Datia, Matilde Pato, J. Pires

Road traffic inside cities is responsible for noise and pollution, which cause health problems, as well as for fuel consumption and time wasted in jams. Mitigation solutions are usually used to soften the impact of this problem in most cities. In particular, the city of Lisbon has taken measures to reduce pollution by closing areas of the city to the most polluting cars - the zero emission zones. However, the city still lacks visual analytics support for traffic decisions in real-time. In this paper we present a traffic flow indicator that can indicate the road traffic fluidity inside a region of interest for a given time frame, and integrate it into an interactive dashboard supported by a predictive model. With this solution, decision makers can analyse historical data and predict short-term traffic behaviour.

Citations: 0

Visualizing Temporal Data using Time-dependent Non-decreasing Monotone Functions

2022 26th International Conference Information Visualisation (IV)

Pub Date : 2022-07-01 DOI: 10.1109/IV56949.2022.00015

Maria D'Amaral Ferreira, João Moura Pires, C. Damásio

The occurrence of seasonal natural phenomena depends on the conditions leading to them and not directly on the progression of time, meaning their context varies across time and space. Examples of this include comparing plant growth, insect development or wildfire risk during the same time period at different locations or in different time periods at the same location. However, visualizing and comparing such phenomena usually implies plotting them along the time axis, as they are perceived as temporal data. Since they are not directly dependent on time, identifying patterns of recurrence using this technique is inefficient. Because of this, we proposed transforming (when needed) the dependent function to a non-decreasing monotone one, in order to preserve the monotonic property of time progression. Then we used the resulting function as a time axis replacement to achieve an equal ground of comparison between the different contexts in which the phenomenon occurs. We applied this technique to real data from seasonal natural phenomena, such as plant and insect growth, to compare their progression in different temporal and spatial contexts. Since the dependent function of the phenomenon was scientifically known, we were able to directly use the technique to infer its seasonality patterns. Furthermore, we applied the technique to real data from the coronavirus worldwide pandemic by hypothesizing its dependent function and analysing whether it was able to reduce the existing temporal misalignment between different contexts, like years and countries. The results achieved were positive, although not as remarkable as when the dependent function was known.
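
The idea of replacing the time axis with a non-decreasing monotone function can be illustrated with a minimal sketch. It assumes, purely for illustration, that the dependent function is an accumulated growing-degree-day total; the function names, the base temperature of 10 and the sample temperatures are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def monotone_axis(daily_values):
    """Turn daily contributions into a non-decreasing sequence.

    Negative contributions are clamped to zero before accumulating,
    so the result can never decrease (the monotone property of time).
    """
    return list(accumulate(max(0.0, v) for v in daily_values))


def growing_degree_days(mean_temps, base=10.0):
    """Hypothetical dependent function: accumulated heat above a base temperature."""
    return monotone_axis(t - base for t in mean_temps)


# Two made-up contexts (e.g. two locations) observed over the same five days.
site_a_temps = [8, 12, 15, 15, 20]
site_b_temps = [14, 16, 9, 18, 22]

# These sequences would replace calendar days on the horizontal axis,
# so observations are compared at equal accumulated heat, not equal dates.
axis_a = growing_degree_days(site_a_temps)
axis_b = growing_degree_days(site_b_temps)
```

Plotting a measured quantity against `axis_a` and `axis_b` instead of calendar days is one way to put both contexts on the common ground of comparison the abstract describes.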

Citations: 0

Evaluation of Deep Learning Context-Sensitive Visualization Models

2022 26th International Conference Information Visualisation (IV)

Pub Date : 2022-07-01 DOI: 10.1109/IV56949.2022.00066

A. Dunn, D. Inkpen, Razvan Andonie

The introduction of Transformer neural networks has changed the landscape of Natural Language Processing (NLP) in recent years. These models are very complex, and therefore hard to debug and explain. In this context, visual explanation has become an attractive approach. The visualization of the path that leads to certain outputs of a model is at the core of visual explanation, as this illuminates the features or parts of the model that may need to be changed to achieve the desired results. In particular, one goal of an NLP visual explanation is to highlight the most significant parts of the text that have the greatest impact on the model output. Several visual explanation methods for NLP models were recently proposed. A major challenge is how to compare the performance of such methods, since we cannot simply use the usual classification accuracy measures to evaluate the quality of visualizations. We need good metrics and rigorous criteria to measure how useful the extracted knowledge is for explaining the models. In addition, we want to visualize the differences between the knowledge extracted by different models, in order to be able to rank them. In this paper, we investigate how to evaluate explanations/visualizations resulting from machine learning models for text classification. The goal is not to improve the accuracy of a particular NLP classifier, but to assess the quality of the visualizations that explain its decisions. We describe several methods for evaluating the quality of NLP visualizations, including both automated techniques based on quantifiable measures and subjective techniques based on human judgements.

Citations: 0

Comparative evaluation of the Scatter Plot Matrix and Parallel Coordinates Plot Matrix

2022 26th International Conference Information Visualisation (IV)

Pub Date : 2022-07-01 DOI: 10.1109/IV56949.2022.00027

Hugh Garner, S. Fernstad

The Scatter Plot Matrix (SPLOM) and the Parallel Coordinates Plot Matrix (PCPM) are frequently used in exploratory data analysis for multivariate data to explore pairwise relationships, clustering and outliers. The SPLOM and PCPM are complex visualization methods with many potential interactions between data, task and visual representation. While numerous studies exist evaluating the SPLOM and the Parallel Coordinates Plot (PCP), there is, to the best of our knowledge, no existing study evaluating the PCPM. This pilot study presents an evaluation of the performance of the SPLOM and PCPM for a set of common explorative tasks and identifies key directions for future work. The overall results indicate a minimal performance difference between the visualization methods for most tasks, but with significant variance between users, interactions between data features and response by method, and strong user preferences depending on task. As such, we recommend careful consideration of the background of potential users when choosing a method, and/or the use of complementary or linked views. Further work is required to understand the particular mechanisms impacting users' highly variable performance with the PCPM.

Citations: 0

The Eye of the Rider. Visualization and data-driven heuristics for the critical analysis of gig economy

2022 26th International Conference Information Visualisation (IV)

Pub Date : 2022-07-01 DOI: 10.1109/IV56949.2022.00068

N. Lettieri, Delfina Malandrino, Alfonso Guarino, R. Zaccagnino

The digital evolution of economies and markets brings changes that go largely beyond growth and efficiency. In the gig economy, also fueled by algorithmic management solutions, digital labour platforms (DPLs) raise significant issues that include power asymmetries, new forms of workers' abuse and discrimination, algorithms' opacity and over-control. In such a scenario, while normative frameworks evolve novel safeguards, a crucial challenge is that of feeding public (social, institutional) oversight on the dynamics that, at various levels, affect the gig work world. In this paper, we show how the combination of visual analytics and data-driven heuristics can be used to offer new insights into, and promote higher levels of transparency and awareness about, the fairness of DPLs' activity seen, in the first place, from the workers' perspective. Solutions will be presented as part of GigAdvisor, an experimental cross-platform application developed within ongoing research that draws on computational social science methods to enable new critical approaches to the digital economy.

Citations: 0

Improving Cybersecurity Incident Analysis Workflow with Analytical Provenance

2022 26th International Conference Information Visualisation (IV)

Pub Date : 2022-07-01 DOI: 10.1109/IV56949.2022.00058

Vít Rusňák, L. Janečková, Filip Drgon, Anna-Marie Dombajova, Veronika Kudelkova

Cybersecurity incident analysis is an exploratory, data-driven process over records and logs from network monitoring tools. The process is rarely linear and frequently breaks down into multiple investigation branches. Analysts document all the steps and lessons learned and suggest mitigations. However, current tools provide only limited support for analytical provenance. As a result, analysts have to record all the details of the performed steps, along with their notes, in separate documents. Such a procedure increases their cognitive load and is naturally error-prone. This paper proposes a conceptual design of an analytical tool implementing means of analytical provenance in cybersecurity incident analysis workflows. We identified the user requirements and designed and implemented a proof-of-concept prototype application, Incident Analyzer. Qualitative feedback from four domain experts confirmed that our approach is promising and can significantly improve current cybersecurity and network incident analysis practices.

Citations: 0

VennSOM: A SOM-Assisted Visualization of Binary Data

2022 26th International Conference Information Visualisation (IV)

Pub Date : 2022-07-01 DOI: 10.1109/IV56949.2022.00072

M. Trutschl, P. Kilgore, Billy A. Tran, Hyun-Woong Nam, Eric Clifford, Adesewa Akande, U. Cvek

Venn diagrams are a useful method of visualizing Boolean data; however, their data aggregation causes fine detail about the data to be lost. In this paper, we present a method of augmenting Venn diagrams, so that they may depict similarity relationships among individual records in the data using the Self-Organizing Map. We applied this method to a synthetic data set and an empirical proteomics data set. We found that we were able to separate data within each region of the Venn diagram based on dimensional values, and that we can highlight the clustering of $p$-values in the empirical set.

Citations: 0

Design Thinking at a glance - An overview of models along with enablers and barriers of bringing it to the workplace and life

2022 26th International Conference Information Visualisation (IV)

Pub Date : 2022-07-01 DOI: 10.1109/IV56949.2022.00046

S. Kernbach, Anja Svetina Nabergoj, Anastasia Liakhavets, Andrei Petukh

Today's rapidly changing environment requires individuals, organizations and society at large to react and act faster, and to proactively shape the future. Design thinking has become a popular innovation method to help proactively design a future to look forward to. However, many design thinking efforts do not go beyond short-term workshops and mini-projects and neglect the longer-term successful implementation of design thinking in organizations, which is often challenging. Therefore, this paper aims to shed light on the enablers and barriers of successfully implementing design thinking in organizations by providing a first conceptual overview of the enablers and barriers on an organizational and individual level, to inform, educate and motivate researchers, practitioners, and educational institutions to discuss and implement design thinking programs more carefully in the future.

Citations: 3

Explainable Mixed Data Representation and Lossless Visualization Toolkit for Knowledge Discovery

2022 26th International Conference Information Visualisation (IV)

Pub Date : 2022-06-13 DOI: 10.1109/IV56949.2022.00060

B. Kovalerchuk, Elijah McCoy

Developing Machine Learning (ML) algorithms for heterogeneous/mixed data is a longstanding problem. Many ML algorithms cannot be applied to mixed data, which include numeric and non-numeric data, text, graphs and so on, to generate interpretable models. Another longstanding problem is developing algorithms for lossless visualization of multidimensional mixed data. Further progress in ML heavily depends on the success of interpretable ML algorithms for mixed data and lossless interpretable visualization of multidimensional data. The latter allows developing interpretable ML models using visual knowledge discovery by end-users, who can bring valuable domain knowledge which is absent in the training data. The challenges for mixed data include: (1) generating numeric coding schemes for non-numeric attributes for numeric ML algorithms to provide accurate and interpretable ML models, (2) generating methods for lossless visualization of n-D non-numeric data and visual rule discovery in these visualizations. This paper presents a classification of mixed data types, analyzes their importance for ML and presents the developed experimental toolkit to deal with mixed data. It combines the Data Types Editor and the VisCanvas data visualization and rule discovery system, which is available on GitHub.

Citations: 1

Interpretable Machine Learning for Self-Service High-Risk Decision-Making

2022 26th International Conference Information Visualisation (IV)

Pub Date : 2022-05-09 DOI: 10.1109/IV56949.2022.00061

Charles Recaido, B. Kovalerchuk

This paper contributes to interpretable machine learning via visual knowledge discovery in general line coordinates (GLC). The concepts of hyperblocks as interpretable dataset units and general line coordinates are combined to create a visual self-service machine learning model. Dynamic Scaffolding Coordinates (DSC) are proposed as lossless multidimensional coordinate systems, and their applications as visual models are shown. DSC1 and DSC2 can map multiple dataset attributes to a single two-dimensional (X, Y) Cartesian plane using a graph construction algorithm. The hyperblock analysis was used to determine visually appealing dataset attribute orders and to reduce line occlusion. It is shown that hyperblocks can generalize decision tree rules and that a series of DSC1 or DSC2 plots can visualize a decision tree. The DSC1 and DSC2 plots were tested on benchmark datasets from the UCI ML repository. They allowed for visual classification of data. Additionally, areas of hyperblock impurity were discovered and used to establish dataset splits that highlight the upper estimate of worst-case model accuracy to guide model selection for high-risk decision-making. A major benefit of DSC1 and DSC2 is their highly interpretable nature. They allow domain experts to control or establish new machine learning models through visual pattern discovery.
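
The notion of a hyperblock and of hyperblock impurity can be made concrete with a small sketch. It assumes, since the abstract does not define the term, that a hyperblock is an axis-aligned box given by one closed interval per attribute and carrying a class label; the class `Hyperblock`, the function `impurity` and the sample data are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperblock:
    """Assumed form: one closed interval [lower[i], upper[i]] per attribute."""
    lower: tuple
    upper: tuple
    label: str

    def contains(self, point):
        # A point is inside when every attribute falls in its interval,
        # which reads like a conjunction of decision-tree style conditions.
        return all(lo <= x <= hi
                   for lo, x, hi in zip(self.lower, point, self.upper))


def impurity(block, points, labels):
    """Fraction of points inside the block whose class differs from its label."""
    inside = [lab for p, lab in zip(points, labels) if block.contains(p)]
    if not inside:
        return 0.0
    return sum(lab != block.label for lab in inside) / len(inside)


# Made-up two-attribute data set.
points = [(1.0, 2.0), (1.5, 2.5), (3.0, 1.0), (1.2, 2.2)]
labels = ["a", "a", "b", "b"]

block = Hyperblock(lower=(1.0, 2.0), upper=(2.0, 3.0), label="a")
block_impurity = impurity(block, points, labels)  # 1 of the 3 points inside is "b"
```

Under this assumed definition, a block with non-zero impurity marks a region where a rule of the form "1.0 <= x1 <= 2.0 and 2.0 <= x2 <= 3.0 implies class a" misclassifies some cases, which is the kind of area the abstract says was used to guide dataset splits.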

Citations: 2
