Oscar: A Semantic-based Data Binning Approach
V. Setlur, M. Correll, S. Battersby
2022 IEEE Visualization and Visual Analytics (VIS), published 2022-07-15
DOI: https://doi.org/10.1109/VIS54862.2022.00029

Binning is used to categorize data values or to reveal the distribution of data. Existing binning algorithms often rely on the statistical properties of the data. However, semantic considerations also matter when selecting an appropriate binning scheme. Surveys, for instance, gather respondent data for demographic questions such as age, salary, and number of employees, and bucket the answers into defined semantic categories. In this paper, we leverage common semantic categories from survey data and Tableau Public visualizations to identify a set of semantic binning categories. We employ these categories in Oscar, a method for automatically selecting bins based on the inferred semantic type of the field. We conducted a crowdsourced study with 120 participants to better understand user preferences for bins generated by Oscar versus the binning provided in Tableau. We find that users prefer maps and histograms with bins generated by Oscar over binning schemes based purely on the statistical properties of the data.
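
To make the contrast between semantic and purely statistical binning concrete, the following is a minimal sketch and not the Oscar method itself. The bin edges in `SEMANTIC_BINS`, the keyword-based `infer_semantic_type` heuristic, and all function names are hypothetical, chosen only for illustration. The abstract does not specify how Oscar infers semantic types or which bin boundaries it uses.

```python
import re
from bisect import bisect_right

# Hypothetical bin edges loosely resembling common survey brackets.
# These are illustrative assumptions, not the categories from the paper.
SEMANTIC_BINS = {
    "age": [18, 25, 35, 45, 55, 65],
    "salary": [25_000, 50_000, 75_000, 100_000, 150_000],
}


def infer_semantic_type(field_name):
    """Guess a semantic type by matching whole tokens of the field name.

    This keyword match is an assumed stand-in for real type inference.
    """
    tokens = re.split(r"[^a-z0-9]+", field_name.lower())
    for semantic_type in SEMANTIC_BINS:
        if semantic_type in tokens:
            return semantic_type
    return None


def equal_width_edges(values, n_bins=5):
    """Statistical fallback: interior edges of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]


def choose_edges(field_name, values):
    """Prefer semantic edges when a type is recognized, else equal width."""
    semantic_type = infer_semantic_type(field_name)
    if semantic_type is not None:
        return SEMANTIC_BINS[semantic_type]
    return equal_width_edges(values)


def bin_label(value, edges):
    """Return a label for the half-open bin [low, high) containing value."""
    i = bisect_right(edges, value)
    if i == 0:
        return f"< {edges[0]}"
    if i == len(edges):
        return f">= {edges[-1]}"
    return f"[{edges[i - 1]}, {edges[i]})"


if __name__ == "__main__":
    ages = [19, 23, 31, 44, 52, 67]
    edges = choose_edges("respondent_age", ages)
    for age in ages:
        print(age, bin_label(age, edges))
```

In this sketch, a field whose name contains the token `age` gets human-recognizable brackets, while an unrecognized field falls back to equal-width bins derived from the data range.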