
2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA): Latest Papers

Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns
M. Kirchgessner, V. Leroy, S. Amer-Yahia, Shashwati Mishra
Understanding customer buying patterns is of great interest to the retail industry. Association rule mining is a common technique for extracting correlations such as "people in the South of France buy rosé wine" or "customers who buy pâté also buy salted butter and sour bread". Unfortunately, sifting through a large number of buying patterns is not useful in practice, because popular products dominate the top rules. As a result, more than 30 "interestingness" measures have been proposed to rank rules. However, there is no agreement on which measures are most appropriate for retail data. Moreover, since pattern mining algorithms output thousands of association rules for each product, an analyst's ability to rely on ranking measures to identify the most interesting rules is crucial. In this paper, we develop CAPA (Comparative Analysis of PAtterns), a framework that lets analysts compare different rule rankings. We report on how we used CAPA to compare 34 interestingness measures applied to patterns extracted from customer receipts of more than 1,800 stores over a period of one year.
DOI: https://doi.org/10.1109/DSAA.2016.53
Publication date (as listed): 2016-03-15
Citations: 16
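The abstract above turns on the fact that different interestingness measures rank the same rules differently. As a minimal sketch of that idea (not CAPA itself, and using only three textbook measures rather than the 34 the paper compares), the following computes support, confidence and lift for two rules. The function name `rule_measures` and the toy receipts are hypothetical and are not taken from the paper.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    `transactions` is a non-empty list of sets of item names.
    """
    if not transactions:
        raise ValueError("need at least one transaction")
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # receipts containing the antecedent
    n_c = sum(1 for t in transactions if c <= t)         # receipts containing the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # receipts containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical toy receipts, invented for illustration only.
receipts = [
    {"pate", "butter", "bread"},
    {"pate", "butter"},
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rules = [({"pate"}, {"butter"}), ({"bread"}, {"milk"})]

# Rank the same two rules under each measure.
for measure in ("support", "confidence", "lift"):
    ranked = sorted(
        rules,
        key=lambda r: rule_measures(receipts, r[0], r[1])[measure],
        reverse=True,
    )
    print(measure, [f"{sorted(x)} -> {sorted(y)}" for x, y in ranked])
```

On these toy receipts, support puts the popular-product rule bread -> milk first, while confidence and lift both prefer pate -> butter, which is the kind of disagreement between rankings that the abstract describes.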
Continuous Monitoring of A/B Tests without Pain: Optional Stopping in Bayesian Testing
Alex Deng, Jiannan Lu, Shouyuan Chen
A/B testing is one of the most successful applications of statistical theory in the Internet age. A crucial problem of Null Hypothesis Statistical Testing (NHST), the backbone of A/B testing methodology, is that experimenters are not allowed to continuously monitor the results and make decisions in real time. Many people see this restriction as a setback against the technology trend toward real-time data analytics. Recently, Bayesian Hypothesis Testing, which intuitively is more suitable for real-time decision making, has attracted growing interest as a viable alternative to NHST. While corrections of NHST for the continuous monitoring setting are well established in the existing literature and known in the A/B testing community, whether continuous monitoring is a proper practice in Bayesian testing is still debated among both academic researchers and general practitioners. In this paper, we formally prove the validity of Bayesian testing under proper stopping rules, and illustrate the theoretical results with concrete simulations. We point out common bad practices where stopping rules are not proper, and discuss how priors can be learned objectively. General guidelines for researchers and practitioners are also provided.
DOI: https://doi.org/10.1109/DSAA.2016.33
Publication date (as listed): 2016-02-17
Citations: 58
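To make "continuous monitoring" in the abstract above concrete, here is a minimal sketch of checking a Bayesian posterior after every batch of data and stopping once it crosses a threshold. It is not the authors' procedure: the Beta-Bernoulli model, the uniform Beta(1, 1) priors, the 0.95 threshold, and the names `prob_b_beats_a` and `monitor` are all assumptions made for illustration. Whether such a rule is valid depends on the stopping rule and the prior, which is the question the paper analyzes.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A | data).

    Assumes independent Beta(1, 1) priors on both conversion rates,
    so each posterior is Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


def monitor(batches, threshold=0.95):
    """Re-check the posterior after each batch; stop at the first decisive one.

    Each batch is (conversions_A, visitors_A, conversions_B, visitors_B).
    Returns (step, probability); step is None if no batch was decisive.
    """
    conv_a = n_a = conv_b = n_b = 0
    p = 0.5  # value reported if no batches are supplied
    for step, (ca, na, cb, nb) in enumerate(batches, start=1):
        conv_a, n_a = conv_a + ca, n_a + na
        conv_b, n_b = conv_b + cb, n_b + nb
        p = prob_b_beats_a(conv_a, n_a, conv_b, n_b)
        if p >= threshold or p <= 1 - threshold:
            return step, p
    return None, p


# Hypothetical batches, invented for illustration only.
batches = [(10, 100, 12, 100), (10, 100, 25, 100), (10, 100, 25, 100)]
step, p = monitor(batches)
print(f"stopped after batch {step} with P(B > A) = {p:.3f}")
```

The fixed seed keeps the Monte Carlo estimate reproducible; a closed-form or numerical integration of the two Beta posteriors would avoid sampling noise at the cost of a longer example.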
Learning Multifaceted Latent Activities from Heterogeneous Mobile Data
Thanh-Binh Nguyen, Vu Nguyen, Nguyen Cong Thuong, S. Venkatesh, Mohan J. Kumar, Dinh Q. Phung
Inferring abstract contexts and activities from heterogeneous data is vital to context-aware ubiquitous applications but remains one of the most challenging problems. Recent advances in Bayesian nonparametric machine learning, in particular the theory of topic models based on the Hierarchical Dirichlet Process (HDP), have provided an elegant solution to these challenges. However, few existing methods address the problem of inferring latent multifaceted activities and contexts from heterogeneous data sources such as those collected from mobile devices. In this paper, we extend the original HDP to model heterogeneous data by giving the base measure a richer structure, namely a product space. The proposed model, called product-space HDP (PS-HDP), naturally handles heterogeneous data from multiple sources and identifies the unknown number of latent structures in a principled way. Although this framework is generic, our current work primarily focuses on inferring (latent) threefold activities of who-when-where simultaneously, which corresponds to inducing activities from data collected for identity, location and time. We demonstrate our model on synthetic data as well as on a real-world dataset, the StudentLife dataset. We report results and provide analysis of the discovered activities and patterns to demonstrate the merit of the model. We also quantitatively evaluate the performance of the PS-HDP model using standard metrics including F1-score, NMI, RI and purity, and compare the results with well-known existing baseline methods.
DOI: https://doi.org/10.1109/DSAA.2016.48
Citations: 1
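The abstract above names purity and RI (Rand index) among its evaluation metrics. As a small worked illustration of what those two numbers measure, the sketch below computes them from their standard definitions for a toy clustering. It does not implement PS-HDP, and the labels and the function names `purity` and `rand_index` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, cluster_ids):
    """Fraction of points that carry the majority true label of their cluster."""
    per_cluster = {}
    for label, cid in zip(true_labels, cluster_ids):
        per_cluster.setdefault(cid, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


def rand_index(true_labels, cluster_ids):
    """Fraction of point pairs on which the two labelings agree.

    A pair agrees if it is grouped together in both labelings
    or separated in both labelings.
    """
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (cluster_ids[i] == cluster_ids[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical ground-truth activities and inferred clusters.
truth = ["a", "a", "a", "b", "b", "c"]
found = [0, 0, 1, 1, 1, 2]

print("purity:", purity(truth, found))
print("Rand index:", rand_index(truth, found))
```

For this toy input, five of the six points match their cluster's majority label, and 11 of the 15 point pairs are treated consistently by both labelings.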