2022 IEEE Visualization and Visual Analytics (VIS): Latest Articles

Visual Auditor: Interactive Visualization for Detection and Summarization of Model Biases

2022 IEEE Visualization and Visual Analytics (VIS)

Pub Date : 2022-06-25 DOI: 10.1109/VIS54862.2022.00018

David Munechika, Zijie J. Wang, Jack Reidy, Josh Rubin, Krishna Gade, K. Kenthapadi, Duen Horng Chau

As machine learning (ML) systems become increasingly widespread, it is necessary to audit these systems for biases prior to their deployment. Recent research has developed algorithms for effectively identifying intersectional bias in the form of interpretable, underperforming subsets (or slices) of the data. However, these solutions and their insights are limited without a tool for visually understanding and interacting with the results of these algorithms. We propose Visual Auditor, an interactive visualization tool for auditing and summarizing model biases. Visual Auditor assists model validation by providing an interpretable overview of intersectional bias (bias that is present when examining populations defined by multiple features), details about relationships between problematic data slices, and a comparison between underperforming and overperforming data slices in a model. Our open-source tool runs directly in both computational notebooks and web browsers, making model auditing accessible and easily integrated into current ML development workflows. An observational user study in collaboration with domain experts at Fiddler AI highlights that our tool can help ML practitioners identify and understand model biases.
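To make the notion of an underperforming slice concrete, the following is a minimal sketch, not the slice-finding algorithm used by Visual Auditor (the abstract does not specify one). It assumes categorical features and per-row correctness flags, and it simply enumerates every slice defined by up to two features. The function name, parameters, and toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_underperforming_slices(rows, correct, max_features=2, min_size=2):
    """Return overall accuracy and the slices whose accuracy falls below it.

    rows: list of dicts mapping feature name -> categorical value.
    correct: list of bools, True where the model's prediction was right.
    Each slice is a tuple of (feature, value) pairs.
    """
    overall = sum(correct) / len(correct)
    features = sorted(rows[0])
    stats = defaultdict(lambda: [0, 0])  # slice -> [hits, count]
    for row, ok in zip(rows, correct):
        for k in range(1, max_features + 1):
            for combo in combinations(features, k):
                key = tuple((f, row[f]) for f in combo)
                stats[key][0] += ok
                stats[key][1] += 1
    results = []
    for key, (hits, count) in stats.items():
        accuracy = hits / count
        if count >= min_size and accuracy < overall:
            results.append((key, count, accuracy))
    # Worst accuracy first; larger slices first among equals.
    results.sort(key=lambda r: (r[2], -r[1]))
    return overall, results


# Toy data (hypothetical): errors are concentrated in one intersection.
rows = [
    {"sex": "F", "age": "young"}, {"sex": "F", "age": "young"},
    {"sex": "F", "age": "old"}, {"sex": "F", "age": "old"},
    {"sex": "M", "age": "young"}, {"sex": "M", "age": "young"},
    {"sex": "M", "age": "old"}, {"sex": "M", "age": "old"},
]
correct = [False, False, True, True, True, True, True, True]

overall, slices = find_underperforming_slices(rows, correct)
for key, size, accuracy in slices:
    print(key, size, accuracy)
```

In this toy data the two-feature slice performs worse than either single-feature slice that contains it, which is the kind of intersectional effect the abstract describes. Exhaustive enumeration grows quickly with the number of features, so a real tool would need a pruned search.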

Citations: 9

Plotly-Resampler: Effective Visual Analytics for Large Time Series

2022 IEEE Visualization and Visual Analytics (VIS)

Pub Date : 2022-06-17 DOI: 10.1109/VIS54862.2022.00013

Jonas Van Der Donckt, Jeroen Van Der Donckt, Emiel Deprost, S. Hoecke

Visual analytics is arguably the most important step in getting acquainted with your data. This is especially the case for time series, as this data type is hard to describe and cannot be fully understood using, for example, summary statistics. To realize effective time series visualization, four requirements have to be met: a tool should be (1) interactive, (2) scalable to millions of data points, (3) integrable in conventional data science environments, and (4) highly configurable. We observe that open source Python visualization toolkits empower data scientists in most visual analytics tasks, but lack the combination of scalability and interactivity to realize effective time series visualization. To meet these requirements, we created Plotly-Resampler, an open source Python library. Plotly-Resampler is an add-on for Plotly's Python bindings, enhancing line chart scalability on top of an interactive toolkit by aggregating the underlying data depending on the current graph view. Plotly-Resampler is built to be snappy, as the reactivity of a tool qualitatively affects how analysts visually explore and analyze data. A benchmark task highlights how our toolkit scales better than alternatives in terms of number of samples and time series. Additionally, Plotly-Resampler's flexible data aggregation functionality paves the path towards researching novel aggregation techniques. Plotly-Resampler's integrability, together with its configurability, convenience, and high scalability, allows you to effectively analyze high-frequency data in your day-to-day Python environment.
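The idea of aggregating data according to the current graph view can be illustrated with a short sketch. This is not Plotly-Resampler's API or its actual aggregation algorithm. It assumes a sorted x-axis and uses a generic min-max binning scheme, and the function names `minmax_downsample` and `resample_view` are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right


def minmax_downsample(xs, ys, n_out):
    """Reduce a series to at most n_out points by keeping each bin's min and max."""
    if n_out < 2:
        raise ValueError("n_out must be at least 2")
    n = len(xs)
    if n <= n_out:
        return list(xs), list(ys)
    n_bins = n_out // 2
    out_x, out_y = [], []
    for b in range(n_bins):
        lo, hi = b * n // n_bins, (b + 1) * n // n_bins
        idx = range(lo, hi)
        i_min = min(idx, key=lambda i: ys[i])
        i_max = max(idx, key=lambda i: ys[i])
        # Emit the extremes in x order so the line is drawn left to right.
        for i in sorted({i_min, i_max}):
            out_x.append(xs[i])
            out_y.append(ys[i])
    return out_x, out_y


def resample_view(xs, ys, x_start, x_end, n_out=1000):
    """Aggregate only the samples inside the visible range [x_start, x_end]."""
    lo = bisect_left(xs, x_start)
    hi = bisect_right(xs, x_end)
    return minmax_downsample(xs[lo:hi], ys[lo:hi], n_out)


# Toy series (hypothetical): a sine wave with a single spike.
xs = list(range(100_000))
ys = [math.sin(i / 500) for i in xs]
ys[12_345] = 10.0

full_x, full_y = resample_view(xs, ys, 0, 99_999)        # zoomed out
zoom_x, zoom_y = resample_view(xs, ys, 12_000, 12_999)   # zoomed in
print(len(full_x), len(zoom_x))
```

Zoomed out, the series is reduced to a fixed point budget while the spike survives because each bin keeps its extremes. Zoomed in, the visible range already fits the budget, so the raw samples are returned unchanged.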

Citations: 10

Uniform Manifold Approximation with Two-phase Optimization

2022 IEEE Visualization and Visual Analytics (VIS)

Pub Date : 2022-05-01 DOI: 10.1109/VIS54862.2022.00025

Hyeon Jeon, Hyung-Kwon Ko, S. Lee, Jaemin Jo, Jinwook Seo

We introduce Uniform Manifold Approximation with Two-phase Optimization (UMATO), a dimensionality reduction (DR) technique that improves UMAP to capture the global structure of high-dimensional data more accurately. In UMATO, optimization is divided into two phases so that the resulting embeddings can depict the global structure reliably while preserving the local structure with sufficient accuracy. In the first phase, hub points are identified and projected to construct a skeletal layout for the global structure. In the second phase, the remaining points are added to the embedding, preserving the regional characteristics of local areas. Through quantitative experiments, we found that UMATO (1) outperformed widely used DR techniques in preserving the global structure while (2) producing competitive accuracy in representing the local structure. We also verified that UMATO is preferable in terms of robustness over diverse initialization methods, numbers of epochs, and subsampling techniques.

Citations: 6

Toward Systematic Considerations of Missingness in Visual Analytics

2022 IEEE Visualization and Visual Analytics (VIS)

Pub Date : 2021-08-10 DOI: 10.1109/VIS54862.2022.00031

Maoyuan Sun, Yue Ma, Yuanxin Wang, Tianyi Li, Jian Zhao, Yujun Liu, Ping-Shou Zhong

Data-driven decision making has been a common task in today's big data era, from simple choices such as finding a fast way to drive home, to complex decisions on medical treatment. It is often supported by visual analytics. For various reasons (e.g., system failure, interrupted network, intentional information hiding, or bias), visual analytics for sensemaking of data involves missingness (e.g., data loss and incomplete analysis), which impacts human decisions. For example, missing data can cost a business millions of dollars, and failing to recognize key evidence can put an innocent person in jail. Being aware of missingness is critical to avoid such catastrophes. To fulfill this, as an initial step, we consider missingness in visual analytics from two aspects: data-centric and human-centric. The former emphasizes missingness in three data-related categories: data composition, data relationship, and data usage. The latter focuses on the human-perceived missingness at three levels: observed-level, inferred-level, and ignored-level. Based on them, we discuss possible roles of visualizations for handling missingness, and conclude our discussion with future research opportunities.

Citations: 0

Exploring D3 Implementation Challenges on Stack Overflow

2022 IEEE Visualization and Visual Analytics (VIS)

Pub Date : 2021-08-04 DOI: 10.1109/VIS54862.2022.00009

L. Battle, Danni Feng, Kelli Webber

Visualization languages help to standardize the process of designing effective visualizations, one of the most prominent being D3. However, few researchers have analyzed at scale how users incorporate these languages into existing visualization programming processes, i.e., implementation workflows. In this paper, we present an analysis of the experiences of D3 users as observed through Stack Overflow, summarizing common D3 implementation workflows and challenges discussed online. Our results show how the visualization community may be limiting its understanding of users' visualization implementation challenges by ignoring the larger context in which languages such as D3 are used. Based on our findings, we suggest new research directions to enhance the user experience with visualization languages. All our data and code are available at: https://osf.io/fup48/.

Citations: 8

Guided Data Discovery in Interactive Visualizations via Active Search

2022 IEEE Visualization and Visual Analytics (VIS)

Pub Date : 2020-10-16 DOI: 10.1109/VIS54862.2022.00023

S. Monadjemi, Sunwoo Ha, Quan Nguyen, Henry Chai, R. Garnett, Alvitta Ottley

Recent advances in visual analytics have enabled us to learn from user interactions and uncover analytic goals. These innovations set the foundation for actively guiding users during data exploration. Providing such guidance will become more critical as datasets grow in size and complexity, precluding exhaustive investigation. Meanwhile, the machine learning community also struggles with datasets growing in size and complexity, precluding exhaustive labeling. Active learning is a broad family of algorithms developed for actively guiding models during training. We will consider the intersection of these analogous research thrusts. First, we discuss the nuances of matching the choice of an active learning algorithm to the task at hand. This is critical for performance, a fact we demonstrate in a simulation study. We then present results of a user study for the particular task of data discovery guided by an active learning algorithm specifically designed for this task.
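As an illustration of discovery guided by a model, the sketch below shows a one-step greedy search loop. It is not the algorithm evaluated in the paper, whose details the abstract does not give. It assumes a simple kernel-weighted relevance estimate and a labeling oracle that stands in for the user. The names `relevance_probability` and `greedy_active_search`, the prior, and the bandwidth are all hypothetical choices.

```python
import math


def relevance_probability(point, labeled, prior=0.1, bandwidth=1.0):
    """Kernel-weighted estimate of P(relevant), smoothed toward a prior."""
    numerator, denominator = prior, 1.0
    for other, label in labeled:
        weight = math.exp(-math.dist(point, other) ** 2 / (2 * bandwidth ** 2))
        numerator += weight * label
        denominator += weight
    return numerator / denominator


def greedy_active_search(points, oracle, seed_index, budget):
    """Query `budget` points, each time picking the most probably relevant one.

    The seed is assumed to be a known relevant point. Returns the list of
    (index, label) pairs in query order.
    """
    labeled = [(points[seed_index], 1)]
    unlabeled = [i for i in range(len(points)) if i != seed_index]
    queried = []
    for _ in range(min(budget, len(unlabeled))):
        best = max(unlabeled,
                   key=lambda i: relevance_probability(points[i], labeled))
        label = oracle(best)  # in an interactive tool, the user supplies this
        labeled.append((points[best], label))
        unlabeled.remove(best)
        queried.append((best, label))
    return queried


# Toy data (hypothetical): relevant points cluster near the origin.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (5.0, 5.0), (5.0, 0.0), (0.0, 5.0), (3.0, 3.0), (4.0, 1.0)]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

queried = greedy_active_search(points, lambda i: labels[i],
                               seed_index=0, budget=4)
print(queried)
```

The loop aims to find as many relevant points as possible within the query budget, which differs from active learning strategies that query the most uncertain points in order to improve a classifier.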

Citations: 1
