Pub Date: 2022-06-25 DOI: 10.1109/VIS54862.2022.00018
David Munechika, Zijie J. Wang, Jack Reidy, Josh Rubin, Krishna Gade, K. Kenthapadi, Duen Horng Chau
As machine learning (ML) systems become increasingly widespread, it is necessary to audit these systems for biases prior to their deployment. Recent research has developed algorithms for effectively identifying intersectional bias in the form of interpretable, underperforming subsets (or slices) of the data. However, these solutions and their insights are limited without a tool for visually understanding and interacting with the results of these algorithms. We propose Visual Auditor, an interactive visualization tool for auditing and summarizing model biases. Visual Auditor assists model validation by providing an interpretable overview of intersectional bias (bias that is present when examining populations defined by multiple features), details about relationships between problematic data slices, and a comparison between underperforming and overperforming data slices in a model. Our open-source tool runs directly in both computational notebooks and web browsers, making model auditing accessible and easily integrated into current ML development workflows. An observational user study in collaboration with domain experts at Fiddler AI highlights that our tool can help ML practitioners identify and understand model biases.
Title: Visual Auditor: Interactive Visualization for Detection and Summarization of Model Biases. Venue: 2022 IEEE Visualization and Visual Analytics (VIS). URL: https://doi.org/10.1109/VIS54862.2022.00018
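The abstract describes intersectional bias as underperformance on data slices defined by multiple features. The following is a minimal sketch of that idea only, not the slice-finding algorithm or API used by Visual Auditor: it assumes categorical features, uses accuracy as the metric, and enumerates feature combinations by brute force. The function name `find_underperforming_slices` and its parameters are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_underperforming_slices(rows, labels, preds, max_features=2, min_size=2):
    """Return (overall_accuracy, slices) for a classifier's predictions.

    rows   : list of dicts mapping feature name -> categorical value
    labels : ground-truth labels, one per row
    preds  : model predictions, one per row
    Each reported slice is a (definition, accuracy, size) tuple whose
    accuracy is below the overall accuracy. Brute-force illustration only.
    """
    overall = sum(y == p for y, p in zip(labels, preds)) / len(labels)
    features = sorted(rows[0])
    slices = []
    for k in range(1, max_features + 1):
        for combo in combinations(features, k):
            # slice key -> [number correct, number of rows]
            stats = defaultdict(lambda: [0, 0])
            for row, y, p in zip(rows, labels, preds):
                key = tuple((f, row[f]) for f in combo)
                stats[key][0] += int(y == p)
                stats[key][1] += 1
            for key, (correct, total) in stats.items():
                accuracy = correct / total
                if total >= min_size and accuracy < overall:
                    slices.append((dict(key), accuracy, total))
    # Worst accuracy first; larger slices first among ties.
    slices.sort(key=lambda s: (s[1], -s[2]))
    return overall, slices


if __name__ == "__main__":
    # Toy data: the model is wrong only for young women.
    rows = [
        {"sex": "F", "age": "young"}, {"sex": "F", "age": "young"},
        {"sex": "F", "age": "old"}, {"sex": "F", "age": "old"},
        {"sex": "M", "age": "young"}, {"sex": "M", "age": "young"},
        {"sex": "M", "age": "old"}, {"sex": "M", "age": "old"},
    ]
    labels = [1] * 8
    preds = [0, 0, 1, 1, 1, 1, 1, 1]
    overall, slices = find_underperforming_slices(rows, labels, preds)
    print(f"overall accuracy: {overall:.2f}")
    for definition, accuracy, size in slices:
        print(definition, f"accuracy={accuracy:.2f}", f"n={size}")
```

In the toy data, the single-feature slices (young, female) look only moderately worse than average, while the two-feature slice isolates the failure completely. That gap is why auditing only one feature at a time can understate a problem.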
Pub Date: 2022-06-17 DOI: 10.1109/VIS54862.2022.00013
Jonas Van Der Donckt, Jeroen Van Der Donckt, Emiel Deprost, S. Hoecke
Visual analytics is arguably the most important step in getting acquainted with your data. This is especially the case for time series, as this data type is hard to describe and cannot be fully understood using, for example, summary statistics alone. To realize effective time series visualization, four requirements have to be met: a tool should be (1) interactive, (2) scalable to millions of data points, (3) integrable in conventional data science environments, and (4) highly configurable. We observe that open source Python visualization toolkits empower data scientists in most visual analytics tasks, but lack the combination of scalability and interactivity needed to realize effective time series visualization. To meet these requirements, we created Plotly-Resampler, an open source Python library. Plotly-Resampler is an add-on for Plotly's Python bindings that enhances line chart scalability on top of an interactive toolkit by aggregating the underlying data depending on the current graph view. Plotly-Resampler is built to be snappy, as the reactivity of a tool qualitatively affects how analysts visually explore and analyze data. A benchmark task highlights how our toolkit scales better than alternatives in terms of number of samples and time series. Additionally, Plotly-Resampler's flexible data aggregation functionality paves the way for research into novel aggregation techniques. Plotly-Resampler's integrability, together with its configurability, convenience, and high scalability, allows users to effectively analyze high-frequency data in their day-to-day Python environment.
Title: Plotly-Resampler: Effective Visual Analytics for Large Time Series. Venue: 2022 IEEE Visualization and Visual Analytics (VIS). URL: https://doi.org/10.1109/VIS54862.2022.00013
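The abstract's core mechanism is aggregating the underlying data depending on the current graph view. The sketch below illustrates that idea with a simple min-max aggregation in pure Python. It is not Plotly-Resampler's implementation or API: the function name `minmax_downsample`, its parameters, and the choice of min-max binning are assumptions made for illustration, and `x` is assumed to be sorted in ascending order.

```python
from bisect import bisect_left, bisect_right


def minmax_downsample(x, y, x_start, x_end, n_bins):
    """Reduce the samples visible in [x_start, x_end] to at most 2 * n_bins points.

    The visible range is split into n_bins bins of (nearly) equal sample
    count, and only the minimum and maximum sample of each bin is kept, so
    peaks and troughs survive the reduction. If the view already contains
    2 * n_bins samples or fewer, the raw samples are returned unchanged.
    """
    lo = bisect_left(x, x_start)   # first index inside the view
    hi = bisect_right(x, x_end)    # one past the last index inside the view
    n = hi - lo
    if n <= 2 * n_bins:
        return x[lo:hi], y[lo:hi]

    keep = []
    for b in range(n_bins):
        start = lo + b * n // n_bins
        end = lo + (b + 1) * n // n_bins
        i_min = min(range(start, end), key=y.__getitem__)
        i_max = max(range(start, end), key=y.__getitem__)
        keep.extend(sorted({i_min, i_max}))  # preserve time order
    return [x[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    x = list(range(1000))
    y = [i % 10 for i in x]  # sawtooth signal

    # Zoomed out: 1000 samples are reduced to 10 points.
    xs, ys = minmax_downsample(x, y, 0, 999, n_bins=5)
    print(len(xs), xs)

    # Zoomed in: the view holds few samples, so the raw data is shown.
    zx, zy = minmax_downsample(x, y, 100, 109, n_bins=5)
    print(len(zx), zx)
```

Re-running the aggregation whenever the view changes (for example, on zoom or pan) keeps the number of drawn points bounded while revealing more detail as the user zooms in.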
Pub Date: 2022-05-01 DOI: 10.1109/VIS54862.2022.00025
Hyeon Jeon, Hyung-Kwon Ko, S. Lee, Jaemin Jo, Jinwook Seo
We introduce Uniform Manifold Approximation with Two-phase Optimization (UMATO), a dimensionality reduction (DR) technique that improves UMAP to capture the global structure of high-dimensional data more accurately. In UMATO, optimization is divided into two phases so that the resulting embeddings can depict the global structure reliably while preserving the local structure with sufficient accuracy. In the first phase, hub points are identified and projected to construct a skeletal layout for the global structure. In the second phase, the remaining points are added to the embedding, preserving the regional characteristics of local areas. Through quantitative experiments, we found that UMATO (1) outperformed widely used DR techniques in preserving the global structure while (2) producing competitive accuracy in representing the local structure. We also verified that UMATO is preferable in terms of robustness over diverse initialization methods, numbers of epochs, and subsampling techniques.
Title: Uniform Manifold Approximation with Two-phase Optimization. Venue: 2022 IEEE Visualization and Visual Analytics (VIS). URL: https://doi.org/10.1109/VIS54862.2022.00025
Data-driven decision making has been a common task in today's big data era, from simple choices such as finding a fast way to drive home, to complex decisions on medical treatment. It is often supported by visual analytics. For various reasons (e.g., system failure, interrupted network, intentional information hiding, or bias), visual analytics for sensemaking of data involves missingness (e.g., data loss and incomplete analysis), which impacts human decisions. For example, missing data can cost a business millions of dollars, and failing to recognize key evidence can put an innocent person in jail. Being aware of missingness is critical to avoid such catastrophes. To fulfill this, as an initial step, we consider missingness in visual analytics from two aspects: data-centric and human-centric. The former emphasizes missingness in three data-related categories: data composition, data relationship, and data usage. The latter focuses on the human-perceived missingness at three levels: observed-level, inferred-level, and ignored-level. Based on them, we discuss possible roles of visualizations for handling missingness, and conclude our discussion with future research opportunities.
Title: Toward Systematic Considerations of Missingness in Visual Analytics. Authors: Maoyuan Sun, Yue Ma, Yuanxin Wang, Tianyi Li, Jian Zhao, Yujun Liu, Ping-Shou Zhong. Pub Date: 2021-08-10. Venue: 2022 IEEE Visualization and Visual Analytics (VIS). URL: https://doi.org/10.1109/VIS54862.2022.00031
Pub Date: 2021-08-04 DOI: 10.1109/VIS54862.2022.00009
L. Battle, Danni Feng, Kelli Webber
Visualization languages help to standardize the process of designing effective visualizations, one of the most prominent being D3. However, few researchers have analyzed at scale how users incorporate these languages into existing visualization programming processes, i.e., implementation workflows. In this paper, we present an analysis of the experiences of D3 users as observed through Stack Overflow, summarizing common D3 implementation workflows and challenges discussed online. Our results show how the visualization community may be limiting its understanding of users' visualization implementation challenges by ignoring the larger context in which languages such as D3 are used. Based on our findings, we suggest new research directions to enhance the user experience with visualization languages. All our data and code are available at: https://osf.io/fup48/.
Title: Exploring D3 Implementation Challenges on Stack Overflow. Venue: 2022 IEEE Visualization and Visual Analytics (VIS). URL: https://doi.org/10.1109/VIS54862.2022.00009
Pub Date: 2020-10-16 DOI: 10.1109/VIS54862.2022.00023
S. Monadjemi, Sunwoo Ha, Quan Nguyen, Henry Chai, R. Garnett, Alvitta Ottley
Recent advances in visual analytics have enabled us to learn from user interactions and uncover analytic goals. These innovations set the foundation for actively guiding users during data exploration. Providing such guidance will become more critical as datasets grow in size and complexity, precluding exhaustive investigation. Meanwhile, the machine learning community also struggles with datasets growing in size and complexity, precluding exhaustive labeling. Active learning is a broad family of algorithms developed for actively guiding models during training. We will consider the intersection of these analogous research thrusts. First, we discuss the nuances of matching the choice of an active learning algorithm to the task at hand. This is critical for performance, a fact we demonstrate in a simulation study. We then present results of a user study for the particular task of data discovery guided by an active learning algorithm specifically designed for this task.
Title: Guided Data Discovery in Interactive Visualizations via Active Search. Venue: 2022 IEEE Visualization and Visual Analytics (VIS). URL: https://doi.org/10.1109/VIS54862.2022.00023