Vicente Nejar de Almeida, Eduardo Ribeiro, Nassim Bouarour, João Luiz Dihl Comba, Sihem Amer-Yahia
{"title":"舍瓦:用于统计假设探索的可视化分析系统","authors":"Vicente Nejar de Almeida, Eduardo Ribeiro, Nassim Bouarour, João Luiz Dihl Comba, Sihem Amer-Yahia","doi":"10.14778/3611540.3611631","DOIUrl":null,"url":null,"abstract":"We demonstrate SHEVA, a System for Hypothesis Exploration with Visual Analytics. SHEVA adopts an Exploratory Data Analysis (EDA) approach to discovering statistically-sound insights from large datasets. The system addresses three longstanding challenges in Multiple Hypothesis Testing: (i) the likelihood of rejecting the null hypothesis by chance, (ii) the pitfall of not being representative of the input data, and (iii) the ability to navigate among many data regions while preserving the user's train of thought. To address (i) & (ii), SHEVA implements significance adjustment methods that account for data-informed properties such as coverage and novelty. To address (iii), SHEVA proposes to guide users by recommending one-sample and two-sample hypotheses in a stepwise fashion following a data hierarchy. Users may choose from a collection of pre-trained hypothesis exploration policies and let SHEVA guide them through the most significant hypotheses in the data, or intervene to override suggested hypotheses. Furthermore, SHEVA relies on data-to-visual element mappings to convey hypothesis testing results in an interpretable fashion, and allows hypothesis pipelines to be stored and retrieved later to be tested on new datasets.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"42 1","pages":"0"},"PeriodicalIF":2.6000,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SHEVA: A Visual Analytics System for Statistical Hypothesis Exploration\",\"authors\":\"Vicente Nejar de Almeida, Eduardo Ribeiro, Nassim Bouarour, João Luiz Dihl Comba, Sihem Amer-Yahia\",\"doi\":\"10.14778/3611540.3611631\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We demonstrate SHEVA, a System for Hypothesis Exploration with Visual Analytics. SHEVA adopts an Exploratory Data Analysis (EDA) approach to discovering statistically-sound insights from large datasets. The system addresses three longstanding challenges in Multiple Hypothesis Testing: (i) the likelihood of rejecting the null hypothesis by chance, (ii) the pitfall of not being representative of the input data, and (iii) the ability to navigate among many data regions while preserving the user's train of thought. To address (i) & (ii), SHEVA implements significance adjustment methods that account for data-informed properties such as coverage and novelty. To address (iii), SHEVA proposes to guide users by recommending one-sample and two-sample hypotheses in a stepwise fashion following a data hierarchy. Users may choose from a collection of pre-trained hypothesis exploration policies and let SHEVA guide them through the most significant hypotheses in the data, or intervene to override suggested hypotheses. Furthermore, SHEVA relies on data-to-visual element mappings to convey hypothesis testing results in an interpretable fashion, and allows hypothesis pipelines to be stored and retrieved later to be tested on new datasets.\",\"PeriodicalId\":54220,\"journal\":{\"name\":\"Proceedings of the Vldb Endowment\",\"volume\":\"42 1\",\"pages\":\"0\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2023-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Vldb Endowment\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.14778/3611540.3611631\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Vldb Endowment","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14778/3611540.3611631","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
SHEVA: A Visual Analytics System for Statistical Hypothesis Exploration
We demonstrate SHEVA, a System for Hypothesis Exploration with Visual Analytics. SHEVA adopts an Exploratory Data Analysis (EDA) approach to discovering statistically-sound insights from large datasets. The system addresses three longstanding challenges in Multiple Hypothesis Testing: (i) the likelihood of rejecting the null hypothesis by chance, (ii) the pitfall of not being representative of the input data, and (iii) the ability to navigate among many data regions while preserving the user's train of thought. To address (i) & (ii), SHEVA implements significance adjustment methods that account for data-informed properties such as coverage and novelty. To address (iii), SHEVA proposes to guide users by recommending one-sample and two-sample hypotheses in a stepwise fashion following a data hierarchy. Users may choose from a collection of pre-trained hypothesis exploration policies and let SHEVA guide them through the most significant hypotheses in the data, or intervene to override suggested hypotheses. Furthermore, SHEVA relies on data-to-visual element mappings to convey hypothesis testing results in an interpretable fashion, and allows hypothesis pipelines to be stored and retrieved later to be tested on new datasets.
期刊介绍:
The Proceedings of the VLDB (PVLDB) welcomes original research papers on a broad range of research topics related to all aspects of data management, where systems issues play a significant role, such as data management system technology and information management infrastructures, including their very large scale of experimentation, novel architectures, and demanding applications as well as their underpinning theory. The scope of a submission for PVLDB is also described by the subject areas given below. Moreover, the scope of PVLDB is restricted to scientific areas that are covered by the combined expertise on the submission’s topic of the journal’s editorial board. Finally, the submission’s contributions should build on work already published in data management outlets, e.g., PVLDB, VLDBJ, ACM SIGMOD, IEEE ICDE, EDBT, ACM TODS, IEEE TKDE, and go beyond a syntactic citation.