{"title":"KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14 - 18, 2022","authors":"","doi":"10.1145/3534678","DOIUrl":"https://doi.org/10.1145/3534678","url":null,"abstract":"","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74993252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14-18, 2021","authors":"","doi":"10.1145/3447548","DOIUrl":"https://doi.org/10.1145/3447548","url":null,"abstract":"","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78896566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predictive parity and error rate balance are both widely accepted and adopted criteria for assessing fairness of classifiers. The realization that these equally reasonable criteria can lead to contradictory results has, nonetheless, generated considerable debate and controversy, and has motivated the development of mathematical results establishing the impossibility of concomitantly satisfying predictive parity and error rate balance. Here, we investigate these fairness criteria from a causality perspective. By taking into consideration the data generation process giving rise to the observed data, as well as the data generation process giving rise to the predictions, and assuming faithfulness, we prove that when the base rates differ across the protected groups and there is no perfect separation, a standard classifier cannot achieve exact predictive parity. (By standard classifier we mean a classifier trained in the usual way, without adopting pre-processing, in-processing, or post-processing fairness techniques.) This result holds in general, irrespective of the data generation process giving rise to the observed data. Furthermore, we show that the amount of disparate mistreatment for the positive predictive value metric is proportional to the difference between the base rates. For error rate balance, as well as the closely related equalized odds and equality of opportunity criteria, we show that there are, nonetheless, data generation processes that can still satisfy these criteria when the base rates differ by protected group, and we characterize the conditions under which these criteria hold. We illustrate our results using synthetic data and a re-analysis of the COMPAS data.
{"title":"A Causal Look at Statistical Definitions of Discrimination","authors":"E. C. Neto","doi":"10.1145/3394486.3403130","DOIUrl":"https://doi.org/10.1145/3394486.3403130","url":null,"abstract":"Predictive parity and error rate balance are both widely accepted and adopted criteria for assessing fairness of classifiers. The realization that these equally reasonable criteria can lead to contradictory results has, nonetheless, generated considerable debate and controversy, and has motivated the development of mathematical results establishing the impossibility of concomitantly satisfying predictive parity and error rate balance. Here, we investigate these fairness criteria from a causality perspective. By taking into consideration the data generation process giving rise to the observed data, as well as the data generation process giving rise to the predictions, and assuming faithfulness, we prove that when the base rates differ across the protected groups and there is no perfect separation, a standard classifier cannot achieve exact predictive parity. (By standard classifier we mean a classifier trained in the usual way, without adopting pre-processing, in-processing, or post-processing fairness techniques.) This result holds in general, irrespective of the data generation process giving rise to the observed data. Furthermore, we show that the amount of disparate mistreatment for the positive predictive value metric is proportional to the difference between the base rates. For error rate balance, as well as the closely related equalized odds and equality of opportunity criteria, we show that there are, nonetheless, data generation processes that can still satisfy these criteria when the base rates differ by protected group, and we characterize the conditions under which these criteria hold. 
We illustrate our results using synthetic data and a re-analysis of the COMPAS data.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"175 1","pages":"873-881"},"PeriodicalIF":0.0,"publicationDate":"2020-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79765269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As data science research continues to expand into a variety of applied fields, the need for talented and diverse individuals has been widely acknowledged. Despite this acknowledgement, data science lags behind other STEM disciplines in achieving a diverse workforce. Through work we have undertaken in the past as part of the Broadening Participation in Data Mining workshop (BPDM) and our work with ACM SIGKDD, we seek to build a better workforce that is positioned to address the data science problems of the next hundred years. A significant barrier to the long-term career success of underrepresented trainees is their limited ability to demonstrate their analytical abilities and sophisticated inferential talents in addressing key data issues in our community. In this talk we will present an overview of the goals of the Diversity and Inclusion track and share our vision for how we bridge this diversity divide that our society and our data science workforce needs right now. We are interested in how diversity is encountered across ethnic, gender, and ability identities. To this end we have prepared exciting new program activities to facilitate broader conversations in the data science field that cover not only technical ideas but also innovative thinking about what the future of data science can look like if we diversify the group of contributors and enlarge those included.
{"title":"Bringing Inclusive Diversity to Data Science: Opportunities and Challenges","authors":"Heriberto Acosta Maestre","doi":"10.1145/3394486.3411076","DOIUrl":"https://doi.org/10.1145/3394486.3411076","url":null,"abstract":"As data science research continues to expand into a variety of applied fields, the need for talented and diverse individuals has been widely acknowledged. Despite this acknowledgement, data science lags behind other STEM disciplines in achieving a diverse workforce. Through work we have undertaken in the past as part of the Broadening Participation in Data Mining workshop (BPDM) and our work with ACM SIGKDD, we seek to build a better workforce that is positioned to address the data science problems of the next hundred years. A significant barrier to the long-term career success of underrepresented trainees is their limited ability to demonstrate their analytical abilities and sophisticated inferential talents in addressing key data issues in our community. In this talk we will present an overview of the goals of the Diversity and Inclusion track and share our vision for how we bridge this diversity divide that our society and our data science workforce needs right now. We are interested in how diversity is encountered across ethnic, gender, and ability identities. 
To this end we have prepared exciting new program activities to facilitate broader conversations in the data science field that cover not only technical ideas but also innovative thinking about what the future of data science can look like if we diversify the group of contributors and enlarge those included.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"8 1","pages":"3596"},"PeriodicalIF":0.0,"publicationDate":"2020-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76480350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Representation of Hispanics, especially Hispanic women, is notoriously low in data science programs in higher education and in the tech industry. The engagement of undergraduate students in research, often and early in their path towards degree completion, has been championed as one of the principal reforms necessary to increase the number of capable professionals in STEM. Undergraduate research experiences have been reported to disproportionately benefit individuals from groups that have been historically underrepresented in STEM. The IDI-BD2K (Increasing Diversity in Interdisciplinary Big Data to Knowledge) Program funded by the NIH at the University of Puerto Rico Río Piedras (UPRRP) was designed to bridge the increasing digital and data divide at the university. The college's population is 98 percent Hispanic and yet there is no formal data science program. There also exists a gender imbalance in computing at the College of Natural Sciences at the UPRRP. Over 60 percent of the undergraduate students in Biology are women. However, the percentage of women in Computer Science hovers around 15 percent. The IDI-BD2K was created to address both of these concerns and increase the participation of Hispanics in interdisciplinary computational and quantitative research. In this talk, I will highlight the need for mutually beneficial university collaborations to reduce the digital and data divide, create greater awareness of the growing disparities and increase the number of future faculty with experience teaching diverse students.
{"title":"Mutually Beneficial Collaborations to Broaden Participation of Hispanics in Data Science","authors":"Patricia Ordóñez Franco","doi":"10.1145/3394486.3411075","DOIUrl":"https://doi.org/10.1145/3394486.3411075","url":null,"abstract":"Representation of Hispanics, especially Hispanic women, is notoriously low in data science programs in higher education and in the tech industry. The engagement of undergraduate students in research, often and early in their path towards degree completion, has been championed as one of the principal reforms necessary to increase the number of capable professionals in STEM. Undergraduate research experiences have been reported to disproportionately benefit individuals from groups that have been historically underrepresented in STEM. The IDI-BD2K (Increasing Diversity in Interdisciplinary Big Data to Knowledge) Program funded by the NIH at the University of Puerto Rico Río Piedras (UPRRP) was designed to bridge the increasing digital and data divide at the university. The college's population is 98 percent Hispanic and yet there is no formal data science program. There also exists a gender imbalance in computing at the College of Natural Sciences at the UPRRP. Over 60 percent of the undergraduate students in Biology are women. However, the percentage of women in Computer Science hovers around 15 percent. The IDI-BD2K was created to address both of these concerns and increase the participation of Hispanics in interdisciplinary computational and quantitative research. 
In this talk, I will highlight the need for mutually beneficial university collaborations to reduce the digital and data divide, create greater awareness of the growing disparities and increase the number of future faculty with experience teaching diverse students.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"33 1","pages":"3594-3595"},"PeriodicalIF":0.0,"publicationDate":"2020-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75171392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020","authors":"","doi":"10.1145/3394486","DOIUrl":"https://doi.org/10.1145/3394486","url":null,"abstract":"","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87663978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diversity and Inclusion, a Perspective from a Four Years MSI Faculty Member","authors":"Eliana Valenzuela Andrade","doi":"10.1145/3394486.3411064","DOIUrl":"https://doi.org/10.1145/3394486.3411064","url":null,"abstract":"","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"22 1","pages":"3582"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78513103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Industry has always leveraged cutting-edge quantitative research techniques. From finance and insurance, to marketing and manufacturing, efficiencies and advantages have been seized through measurement, prediction, and the generation of insights, but never at this scale. Organizations which previously may have employed one or two data scientists are now scaling the work to dozens if not hundreds of practitioners. Where previously only a handful of organizations could boast that they were leveraging machine learning and statistical models, now it's a rarity to find an untouched industry or player. Organizations are now faced with the challenges of empowering, scaling, and measuring this workforce to sustain the transformation to the prediction economy. In this talk, I will discuss how and why we built the Domino Data Lab platform. I will talk about the challenges we faced technologically, organizationally and culturally when bringing a system of record to data science.
{"title":"More than the Sum of its Parts: Building Domino Data Lab","authors":"Eduardo Ariño de la Rubia","doi":"10.1145/3097983.3106682","DOIUrl":"https://doi.org/10.1145/3097983.3106682","url":null,"abstract":"Industry has always leveraged cutting-edge quantitative research techniques. From finance and insurance, to marketing and manufacturing, efficiencies and advantages have been seized through measurement, prediction, and the generation of insights, but never at this scale. Organizations which previously may have employed one or two data scientists are now scaling the work to dozens if not hundreds of practitioners. Where previously only a handful of organizations could boast that they were leveraging machine learning and statistical models, now it's a rarity to find an untouched industry or player. Organizations are now faced with the challenges of empowering, scaling, and measuring this workforce to sustain the transformation to the prediction economy. In this talk, I will discuss how and why we built the Domino Data Lab platform. I will talk about the challenges we faced technologically, organizationally and culturally when bringing a system of record to data science.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"29 1","pages":"9"},"PeriodicalIF":0.0,"publicationDate":"2017-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75333349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep neural network representations play an important role in computer vision, speech, computational linguistics, robotics, reinforcement learning and many other data-rich domains. In this talk I will show that learning-to-learn and compositionality are key ingredients for dealing with knowledge transfer so as to solve a wide range of tasks, for dealing with small-data regimes, and for continual learning. I will demonstrate this with several examples from my research team: learning to learn by gradient descent by gradient descent, neural programmers and interpreters, and learning communication.
{"title":"Learning to Learn and Compositionality with Deep Recurrent Neural Networks: Learning to Learn and Compositionality","authors":"Nando de Freitas","doi":"10.1145/2939672.2945358","DOIUrl":"https://doi.org/10.1145/2939672.2945358","url":null,"abstract":"Deep neural network representations play an important role in computer vision, speech, computational linguistics, robotics, reinforcement learning and many other data-rich domains. In this talk I will show that learning-to-learn and compositionality are key ingredients for dealing with knowledge transfer so as to solve a wide range of tasks, for dealing with small-data regimes, and for continual learning. I will demonstrate this with several examples from my research team: learning to learn by gradient descent by gradient descent, neural programmers and interpreters, and learning communication.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"25 1","pages":"3"},"PeriodicalIF":0.0,"publicationDate":"2016-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78224889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the growing prevalence of smart mobile devices, the number of mobile Apps available has exploded over the past few years. To facilitate the choice of mobile Apps, existing mobile App recommender systems typically recommend popular mobile Apps to mobile users. However, mobile Apps are highly varied and often poorly understood, particularly for their activities and functions related to privacy and security. Therefore, more and more mobile users are reluctant to adopt mobile Apps due to the risk of privacy invasion and other security concerns. To fill this crucial void, in this paper, we propose to develop a mobile App recommender system with privacy and security awareness. The design goal is to equip the recommender system with functionality that automatically detects and evaluates the security risk of mobile Apps. Then, the recommender system can provide App recommendations by considering both the Apps' popularity and the users' security preferences. Specifically, a mobile App can pose a security risk because insecure data access permissions have been implemented in the App. Therefore, we first develop techniques to automatically detect the potential security risk of each mobile App by exploiting the requested permissions. Then, we propose a flexible approach based on modern portfolio theory for recommending Apps by striking a balance between the Apps' popularity and the users' security concerns, and build an App hash tree to efficiently recommend Apps. Finally, we evaluate our approach with extensive experiments on a large-scale data set collected from Google Play. The experimental results clearly validate the effectiveness of our approach.
{"title":"Mobile app recommendations with security and privacy awareness","authors":"Hengshu Zhu, Hui Xiong, Yong Ge, Enhong Chen","doi":"10.1145/2623330.2623705","DOIUrl":"https://doi.org/10.1145/2623330.2623705","url":null,"abstract":"With the growing prevalence of smart mobile devices, the number of mobile Apps available has exploded over the past few years. To facilitate the choice of mobile Apps, existing mobile App recommender systems typically recommend popular mobile Apps to mobile users. However, mobile Apps are highly varied and often poorly understood, particularly for their activities and functions related to privacy and security. Therefore, more and more mobile users are reluctant to adopt mobile Apps due to the risk of privacy invasion and other security concerns. To fill this crucial void, in this paper, we propose to develop a mobile App recommender system with privacy and security awareness. The design goal is to equip the recommender system with functionality that automatically detects and evaluates the security risk of mobile Apps. Then, the recommender system can provide App recommendations by considering both the Apps' popularity and the users' security preferences. Specifically, a mobile App can pose a security risk because insecure data access permissions have been implemented in the App. Therefore, we first develop techniques to automatically detect the potential security risk of each mobile App by exploiting the requested permissions. Then, we propose a flexible approach based on modern portfolio theory for recommending Apps by striking a balance between the Apps' popularity and the users' security concerns, and build an App hash tree to efficiently recommend Apps. Finally, we evaluate our approach with extensive experiments on a large-scale data set collected from Google Play. 
The experimental results clearly validate the effectiveness of our approach.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"49 7","pages":""},"PeriodicalIF":0.0,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72749345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}