As datasets grow to tera- and petabyte sizes, exploratory data visualization becomes very difficult: a screen is limited to a few million pixels, and main memory to a few tens of millions of data points. Yet these very large-scale analyses are of tremendous interest to industry and academia. This paper discusses some of the major challenges involved in data analytics at scale, including issues of computation, communication, and rendering. It identifies techniques for handling large-scale data, grouped into "look at less of it" and "look at it faster." Using these techniques involves a number of difficult design tradeoffs, both in the ways data can be represented and in the ways users can interact with the visualizations.
Danyel Fisher, "Big data exploration requires collaboration between visualization and data infrastructures," HILDA '16, 2016-06-26. DOI: 10.1145/2939502.2939518 (https://doi.org/10.1145/2939502.2939518)
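To make the pixel-budget argument concrete, here is a minimal sketch of one possible "look at less of it" technique, assuming 2-D binned aggregation is an appropriate reduction (the abstract does not list which techniques the paper covers). Points are counted into a grid sized to the screen, so the output never exceeds the pixel budget however many points stream in. The function name `bin_to_pixels` and its parameters are hypothetical.

```python
import random
from collections import Counter


def bin_to_pixels(points, x_range, y_range, width, height):
    """Aggregate (x, y) points into a width x height grid of counts.

    Hypothetical helper: the result has at most width * height entries,
    independent of how many points are fed in.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # drop points outside the viewport
        # Map to integer pixel coordinates; clamp the upper edge into the last bin.
        px = min(int((x - x0) / (x1 - x0) * width), width - 1)
        py = min(int((y - y0) / (y1 - y0) * height), height - 1)
        counts[(px, py)] += 1
    return counts


# Usage: 100,000 synthetic points reduced to a 64 x 48 "screen".
random.seed(0)
pts = ((random.random(), random.random()) for _ in range(100_000))
grid = bin_to_pixels(pts, (0.0, 1.0), (0.0, 1.0), 64, 48)
print(f"{sum(grid.values())} points -> {len(grid)} non-empty cells")
```

Because the input is consumed as a generator, the raw points never need to sit in memory at once; only the grid does, which is the point of trading detail for a bounded result.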
Exploratory data analysis is challenging given the complexity of data. Models find structure in the data, reducing the complexity users must face. These models have parameters that can be adjusted to explore the data from many different angles, providing more ways to learn about it. "Human in the loop" means users can interact with the parameters to explore alternative structures, and this exploration allows for discovery. This paper examines usability issues of Human-Model Interaction (HMI) for data analytics. In particular, we bridge the gaps between a user's intention and the parameters of a weighted multidimensional scaling (WMDS) model during HMI communication.
J. Self, R. K. Vinayagam, J. T. Fry, Chris North, "Bridging the gap between user intention and model parameters for human-in-the-loop data analytics," HILDA '16, 2016-06-26. DOI: 10.1145/2939502.2939505 (https://doi.org/10.1145/2939502.2939505)
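In WMDS, the parameters a user adjusts are per-dimension weights in the high-dimensional distance that the low-dimensional layout tries to preserve. The sketch below shows only that weighted distance and how changing the weights changes which items count as "near"; the projection step (stress minimization) and the paper's mapping from intention to weights are not shown. The data, item names, and helper names are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance: sqrt(sum_k w_k * (a_k - b_k)^2)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def nearest(points, query, weights):
    """Name of the point closest to `query` under the given dimension weights."""
    others = {name: p for name, p in points.items() if name != query}
    return min(others, key=lambda n: weighted_distance(points[query], others[n], weights))


# Hypothetical two-attribute data: (price, rating) for three items.
items = {"A": (1.0, 5.0), "B": (1.2, 1.0), "C": (4.0, 4.8)}

print(nearest(items, "A", [1.0, 0.0]))  # only price matters  -> B
print(nearest(items, "A", [0.0, 1.0]))  # only rating matters -> C
```

A WMDS layout built from the first weighting would place A beside B; the second would place A beside C. That shift is the kind of alternative structure the user explores by moving the weights.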
Declarative languages have a long tradition in both the database systems and data visualization communities, separating specifications from implementations. In databases, declarative languages like SQL shield application programmers from changes to physical and logical properties such as disk layouts, indexes, and schemas. In data visualization, declarative languages like Polaris, ggplot2, and Vega shield visualization programmers from variations in rendering, including screen layout, resolution, and color schemes. Declarative languages have been considered a foundational step forward in both communities.
Yifan Wu, J. Hellerstein, Eugene Wu, "A DeVIL-ish approach to inconsistency in interactive visualizations," HILDA '16, 2016-06-26. DOI: 10.1145/2939502.2939517 (https://doi.org/10.1145/2939502.2939517)
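The separation of specification from implementation can be illustrated with a toy interpreter. The spec below says only which field is the label and which is the bar length; the renderer decides the scale for a given output width. The spec format and the `render` function are hypothetical and deliberately simpler than Polaris, ggplot2, or Vega.

```python
def render(spec, rows, width):
    """Interpret a tiny declarative bar-chart spec at a given character width.

    The spec states *what* to show; scaling to the available width is the
    renderer's job, so the same spec works at any "resolution".
    """
    if spec.get("mark") != "bar":
        raise ValueError("this sketch only renders bar marks")
    label, value = spec["encoding"]["y"], spec["encoding"]["x"]
    max_v = max(r[value] for r in rows)
    pad = max(len(str(r[label])) for r in rows)
    lines = []
    for r in rows:
        n = round(r[value] / max_v * width)  # scale chosen by the renderer
        lines.append(f"{str(r[label]).ljust(pad)} {'#' * n}")
    return "\n".join(lines)


spec = {"mark": "bar", "encoding": {"y": "city", "x": "sales"}}
data = [{"city": "Oslo", "sales": 30}, {"city": "Lima", "sales": 60}]
print(render(spec, data, width=10))
print(render(spec, data, width=20))  # same spec, different output width
```

The first call prints `Oslo #####` and `Lima ##########`; the second doubles both bars without any change to `spec`.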
As digital objects become increasingly important in people's lives, people may need to understand the provenance, or lineage and history, of an important digital object to understand how it was produced. This is particularly important for objects created from large, multi-source collections of personal data. Because the metadata describing provenance (provenance data) is commonly represented as a labelled directed acyclic graph, the challenge is to create effective interfaces onto such graphs so that people can understand the provenance of key digital objects. This unsolved problem is especially challenging for novice and intermittent users and for complex provenance graphs. We tackle it by creating an interface based on clustering, designed to let users view provenance graphs and to simplify complex graphs by combining several nodes into one. Our core contribution is the design of a prototype interface that supports clustering, together with its analytic evaluation against desirable properties of visualisation interfaces.
Linus Karsai, A. Fekete, J. Kay, P. Missier, "Clustering provenance facilitating provenance exploration through data abstraction," HILDA '16, 2016-06-26. DOI: 10.1145/2939502.2939508 (https://doi.org/10.1145/2939502.2939508)
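Combining several nodes of a provenance graph can be sketched as node contraction on a set of labelled edges. This is a minimal illustration, not the prototype's actual clustering method; the `collapse` function, the node names, and the edge labels are hypothetical.

```python
def collapse(edges, cluster, name):
    """Replace every node in `cluster` with a single node `name`.

    `edges` is a set of (src, dst, label) triples of a labelled DAG.
    Edges with both ends inside the cluster are hidden; edges crossing
    its boundary are re-attached to the new node. Parallel edges that
    become identical merge automatically because the result is a set.
    """
    def rename(n):
        return name if n in cluster else n

    out = set()
    for src, dst, label in edges:
        s, d = rename(src), rename(dst)
        if s == d == name:
            continue  # internal edge, hidden inside the cluster
        out.add((s, d, label))
    return out


# Hypothetical pipeline: raw.csv -> clean -> tidy.csv -> plot -> fig.png
edges = {
    ("raw.csv", "clean", "input"),
    ("clean", "tidy.csv", "output"),
    ("tidy.csv", "plot", "input"),
    ("plot", "fig.png", "output"),
}
print(sorted(collapse(edges, {"clean", "tidy.csv", "plot"}, "prepare+plot")))
```

One design caveat: contracting an arbitrary node set can introduce a cycle if some path leaves the cluster and re-enters it, so a clustering that must keep the graph acyclic should only merge sets with no such path. This sketch does not check for that.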
The surfacing of trends from data collections such as user-generated content streams and news articles is a popular and important data analysis activity, used in applications such as business intelligence, quantitative stock trading, and social media exploration. Unlike traditional content analysis, trend analysis includes an additional vital time dimension: a trend can be defined as a temporal pattern over a group of semantically related items. The unsupervised discovery of trends is often not sufficient, either because of inadequacies in the trend analysis algorithm or because the data collection itself does not contain all the information needed to identify the trend. Thus, an expert human-in-the-loop must be involved in the process of trend analysis. To this end, we introduce TrendQuery, a system designed for the iterative and interactive surfacing of trends. Our system provides a set of trends to the expert and enumerates iterative operations to curate the result. This process continues until the expert is satisfied with the surfaced trends. Since the space of possible tweaks to the result can be extremely large, the system continually provides feedback and guidance to help the expert prioritize possible operations. Our system allows interactive curation of trends, providing better insights than a purely unsupervised approach.
N. Kamat, Eugene Wu, Arnab Nandi, "TrendQuery: a system for interactive exploration of trends," HILDA '16, 2016-06-26. DOI: 10.1145/2939502.2939514 (https://doi.org/10.1145/2939502.2939514)
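The definition of a trend as a temporal pattern over a group of semantically related items can be sketched directly: a trend is a set of terms, and its pattern is the number of documents per time bucket that mention any of them. The `merge` function stands in for one possible curation operation; TrendQuery's actual operations, data model, and guidance mechanism are not described here, so all names and data below are hypothetical.

```python
from collections import Counter


def trend_series(docs, terms, buckets):
    """Per time bucket, count documents mentioning any term in the group."""
    counts = Counter(t for t, words in docs if terms & words)
    return [counts.get(b, 0) for b in buckets]


def merge(trend_a, trend_b):
    """Hypothetical curation operation: fuse two trends into one term group."""
    return trend_a | trend_b


# Hypothetical corpus: (time bucket, set of terms in the document).
docs = [
    (1, {"election", "poll"}),
    (1, {"weather"}),
    (2, {"ballot"}),
    (2, {"poll"}),
    (3, {"ballot", "vote"}),
]
buckets = [1, 2, 3]
a = frozenset({"poll"})
b = frozenset({"ballot", "vote"})

print(trend_series(docs, a, buckets))            # [1, 1, 0]
print(trend_series(docs, merge(a, b), buckets))  # [1, 2, 1]
```

Each curation step changes the term group and therefore the temporal pattern, which is why an expert who knows that "poll" and "ballot" belong to the same story can surface a trend that neither term shows alone.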