{"title":"Session details: Session 9: Recommendation","authors":"M. Zhang","doi":"10.1145/3310349","DOIUrl":"https://doi.org/10.1145/3310349","url":null,"abstract":"","PeriodicalId":143253,"journal":{"name":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115406590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Longqi Yang, Yu Wang, D. Dunne, Michael Sobolev, Mor Naaman, D. Estrin
Recent years have witnessed the flourishing of podcasts, a unique type of audio medium. Prior work on podcast content modeling focused on analyzing Automatic Speech Recognition outputs, which ignored vocal, musical, and conversational properties (e.g., energy, humor, and creativity) that uniquely characterize this medium. In this paper, we present an Adversarial Learning-based Podcast Representation (ALPR) that captures non-textual aspects of podcasts. Through extensive experiments on a large-scale podcast dataset (88,728 episodes from 18,433 channels), we show that (1) ALPR significantly outperforms the state-of-the-art features developed for music and speech in predicting the seriousness and energy of podcasts, and (2) incorporating ALPR significantly improves the performance of topic-based podcast-popularity prediction. Our experiments also reveal factors that correlate with podcast popularity.
{"title":"More Than Just Words: Modeling Non-Textual Characteristics of Podcasts","authors":"Longqi Yang, Yu Wang, D. Dunne, Michael Sobolev, Mor Naaman, D. Estrin","doi":"10.1145/3289600.3290993","DOIUrl":"https://doi.org/10.1145/3289600.3290993","url":null,"abstract":"Recent years have witnessed the flourishing of podcasts, a unique type of audio medium. Prior work on podcast content modeling focused on analyzing Automatic Speech Recognition outputs, which ignored vocal, musical, and conversational properties (e.g., energy, humor, and creativity) that uniquely characterize this medium. In this paper, we present an Adversarial Learning-based Podcast Representation (ALPR) that captures non-textual aspects of podcasts. Through extensive experiments on a large-scale podcast dataset (88,728 episodes from 18,433 channels), we show that (1) ALPR significantly outperforms the state-of-the-art features developed for music and speech in predicting the seriousness and energy of podcasts, and (2) incorporating ALPR significantly improves the performance of topic-based podcast-popularity prediction. Our experiments also reveal factors that correlate with podcast popularity.","PeriodicalId":143253,"journal":{"name":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114394241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sarah Bird, K. Kenthapadi, Emre Kıcıman, Margaret Mitchell
Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine learned models and data-driven systems, and the potential for such systems to discriminate against certain population groups, due to biases in algorithmic decision-making systems. This tutorial aims to present an overview of algorithmic bias / discrimination issues observed over the last few years and the lessons learned, key regulations and laws, and evolution of techniques for achieving fairness in machine learning systems. We will motivate the need for adopting a "fairness-first" approach (as opposed to viewing algorithmic bias / fairness considerations as an afterthought), when developing machine learning based models and systems for different consumer and enterprise applications. Then, we will focus on the application of fairness-aware machine learning techniques in practice, by presenting case studies from different technology companies. Based on our experiences in industry, we will identify open problems and research challenges for the data mining / machine learning community.
{"title":"Fairness-Aware Machine Learning: Practical Challenges and Lessons Learned","authors":"Sarah Bird, K. Kenthapadi, Emre Kıcıman, Margaret Mitchell","doi":"10.1145/3289600.3291383","DOIUrl":"https://doi.org/10.1145/3289600.3291383","url":null,"abstract":"Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine learned models and data-driven systems, and the potential for such systems to discriminate against certain population groups, due to biases in algorithmic decision-making systems. This tutorial aims to present an overview of algorithmic bias / discrimination issues observed over the last few years and the lessons learned, key regulations and laws, and evolution of techniques for achieving fairness in machine learning systems. We will motivate the need for adopting a \"fairness-first\" approach (as opposed to viewing algorithmic bias / fairness considerations as an afterthought), when developing machine learning based models and systems for different consumer and enterprise applications. Then, we will focus on the application of fairness-aware machine learning techniques in practice, by presenting case studies from different technology companies. Based on our experiences in industry, we will identify open problems and research challenges for the data mining / machine learning community.","PeriodicalId":143253,"journal":{"name":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125607010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ahmed Hassan Awadallah, C. Gurrin, M. Sanderson, Ryen W. White
The task intelligence workshop at the 2019 ACM Web Search and Data Mining (WSDM) conference comprised a mixture of research paper presentations, reports from data challenge participants, invited keynote(s) on broad topics related to tasks, and a workshop-wide discussion about task intelligence and its implications for system development.
{"title":"Task Intelligence Workshop @ WSDM 2019","authors":"Ahmed Hassan Awadallah, C. Gurrin, M. Sanderson, Ryen W. White","doi":"10.1145/3289600.3291374","DOIUrl":"https://doi.org/10.1145/3289600.3291374","url":null,"abstract":"The task intelligence workshop at the 2019 ACM Web Search and Data Mining (WSDM) conference comprised a mixture of research paper presentations, reports from data challenge participants, invited keynote(s) on broad topics related to tasks, and a workshop-wide discussion about task intelligence and its implications for system development.","PeriodicalId":143253,"journal":{"name":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122015362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Designing desirable and aesthetical manifestation of web graphic user interfaces (GUI) is a challenging task for web developers. After determining a web page's content, developers usually refer to existing pages, and adapt the styles from desired pages into the target one. However, it is not only difficult to find appropriate pages to exhibit the target page's content, but also tedious to incorporate styles from different pages harmoniously in the target page. To tackle these two issues, we propose FaceOff, a data-driven automation system that assists the manifestation design of web GUI. FaceOff constructs a repository of web GUI templates based on 15,491 web pages from popular websites and professional design examples. Given a web page for designing manifestation, FaceOff first segments it into multiple blocks, and retrieves GUI templates in the repository for each block. Subsequently, FaceOff recommends multiple combinations of templates according to a Convolutional Neural Network (CNN) based style-embedding model, which makes the recommended style combinations diverse and accordant. We demonstrate that FaceOff can retrieve suitable GUI templates with well-designed and harmonious style, and thus alleviate the developer efforts.
{"title":"FaceOff: Assisting the Manifestation Design of Web Graphical User Interface","authors":"Shuyu Zheng, Ziniu Hu, Yun Ma","doi":"10.1145/3289600.3290610","DOIUrl":"https://doi.org/10.1145/3289600.3290610","url":null,"abstract":"Designing desirable and aesthetical manifestation of web graphic user interfaces (GUI) is a challenging task for web developers. After determining a web page's content, developers usually refer to existing pages, and adapt the styles from desired pages into the target one. However, it is not only difficult to find appropriate pages to exhibit the target page's content, but also tedious to incorporate styles from different pages harmoniously in the target page. To tackle these two issues, we propose FaceOff, a data-driven automation system that assists the manifestation design of web GUI. FaceOff constructs a repository of web GUI templates based on 15,491 web pages from popular websites and professional design examples. Given a web page for designing manifestation, FaceOff first segments it into multiple blocks, and retrieves GUI templates in the repository for each block. Subsequently, FaceOff recommends multiple combinations of templates according to a Convolutional Neural Network (CNN) based style-embedding model, which makes the recommended style combinations diverse and accordant. We demonstrate that FaceOff can retrieve suitable GUI templates with well-designed and harmonious style, and thus alleviate the developer efforts.","PeriodicalId":143253,"journal":{"name":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131277420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Session 4: FATE & Privacy","authors":"Fernando Diaz","doi":"10.1145/3310344","DOIUrl":"https://doi.org/10.1145/3310344","url":null,"abstract":"","PeriodicalId":143253,"journal":{"name":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131875490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wen Zhang, B. Paudel, Wei Zhang, A. Bernstein, Huajun Chen
Knowledge graph embedding aims to learn distributed representations for entities and relations, and is proven to be effective in many applications. Crossover interactions -- bi-directional effects between entities and relations -- help select related information when predicting a new triple, but haven't been formally discussed before. In this paper, we propose CrossE, a novel knowledge graph embedding which explicitly simulates crossover interactions. It not only learns one general embedding for each entity and relation as most previous methods do, but also generates multiple triple specific embeddings for both of them, named interaction embeddings. We evaluate embeddings on typical link prediction tasks and find that CrossE achieves state-of-the-art results on complex and more challenging datasets. Furthermore, we evaluate embeddings from a new perspective -- giving explanations for predicted triples, which is important for real applications. In this work, an explanation for a triple is regarded as a reliable closed-path between the head and the tail entity. Compared to other baselines, we show experimentally that CrossE, benefiting from interaction embeddings, is more capable of generating reliable explanations to support its predictions.
{"title":"Interaction Embeddings for Prediction and Explanation in Knowledge Graphs","authors":"Wen Zhang, B. Paudel, Wei Zhang, A. Bernstein, Huajun Chen","doi":"10.1145/3289600.3291014","DOIUrl":"https://doi.org/10.1145/3289600.3291014","url":null,"abstract":"Knowledge graph embedding aims to learn distributed representations for entities and relations, and is proven to be effective in many applications. Crossover interactions -- bi-directional effects between entities and relations -- help select related information when predicting a new triple, but haven't been formally discussed before. In this paper, we propose CrossE, a novel knowledge graph embedding which explicitly simulates crossover interactions. It not only learns one general embedding for each entity and relation as most previous methods do, but also generates multiple triple specific embeddings for both of them, named interaction embeddings. We evaluate embeddings on typical link prediction tasks and find that CrossE achieves state-of-the-art results on complex and more challenging datasets. Furthermore, we evaluate embeddings from a new perspective -- giving explanations for predicted triples, which is important for real applications. In this work, an explanation for a triple is regarded as a reliable closed-path between the head and the tail entity. Compared to other baselines, we show experimentally that CrossE, benefiting from interaction embeddings, is more capable of generating reliable explanations to support its predictions.","PeriodicalId":143253,"journal":{"name":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130679556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The phenomenon of edge clustering in real-world networks is a fundamental property underlying many ideas and techniques in network science. Clustering is typically quantified by the clustering coefficient, which measures the fraction of pairs of neighbors of a given center node that are connected. However, many common explanations of edge clustering attribute the triadic closure to a head node instead of the center node of a length-2 path; for example, a friend of my friend is also my friend. While such explanations are common in network analysis, there is no measurement for edge clustering that can be attributed to the head node. Here we develop local closure coefficients as a metric quantifying head-node-based edge clustering. We define the local closure coefficient as the fraction of length-2 paths emanating from the head node that induce a triangle. This subtle difference in definition leads to remarkably different properties from traditional clustering coefficients. We analyze correlations with node degree, connect the closure coefficient to community detection, and show that closure coefficients as a feature can improve link prediction.
{"title":"The Local Closure Coefficient: A New Perspective On Network Clustering","authors":"Hao Yin, Austin R. Benson, J. Leskovec","doi":"10.1145/3289600.3290991","DOIUrl":"https://doi.org/10.1145/3289600.3290991","url":null,"abstract":"The phenomenon of edge clustering in real-world networks is a fundamental property underlying many ideas and techniques in network science. Clustering is typically quantified by the clustering coefficient, which measures the fraction of pairs of neighbors of a given center node that are connected. However, many common explanations of edge clustering attribute the triadic closure to a head node instead of the center node of a length-2 path; for example, a friend of my friend is also my friend. While such explanations are common in network analysis, there is no measurement for edge clustering that can be attributed to the head node. Here we develop local closure coefficients as a metric quantifying head-node-based edge clustering. We define the local closure coefficient as the fraction of length-2 paths emanating from the head node that induce a triangle. This subtle difference in definition leads to remarkably different properties from traditional clustering coefficients. We analyze correlations with node degree, connect the closure coefficient to community detection, and show that closure coefficients as a feature can improve link prediction.","PeriodicalId":143253,"journal":{"name":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134444017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
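The two definitions quoted in the abstract of the preceding record can be compared on a toy graph. The following is a minimal sketch, not the authors' implementation: the function names, the example edge list, and the convention of returning 0.0 when a coefficient is undefined are all assumptions made for illustration. The graph is treated as undirected and simple.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, u):
    """Fraction of pairs of neighbors of the center node u that are connected."""
    nbrs = adj[u]
    pairs = len(nbrs) * (len(nbrs) - 1) // 2
    if pairs == 0:
        return 0.0  # assumed convention: undefined case reported as 0.0
    linked = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return linked / pairs


def local_closure(adj, u):
    """Fraction of length-2 paths u-v-w starting at head node u that induce a triangle."""
    total = 0
    closed = 0
    for v in adj[u]:
        for w in adj[v]:
            if w == u:
                continue  # u-v-u is not a length-2 path
            total += 1
            if w in adj[u]:
                closed += 1  # the edge u-w closes the path into a triangle
    return closed / total if total else 0.0  # assumed convention when no paths exist


# Toy graph (made up): triangle a-b-c with a pendant node d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
adj = build_adjacency(edges)
for node in sorted(adj):
    print(node, local_clustering(adj, node), local_closure(adj, node))
```

On this toy graph node c has clustering coefficient 1/3 but closure coefficient 1, while node a has clustering coefficient 1 but closure coefficient 2/3, which illustrates how the head-node view can differ from the center-node view.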
Comparative summarization is an effective strategy to discover important similarities and differences in collections of documents biased to users' interests. A natural method for this task is to find important and corresponding content. In this paper, we propose a novel research task of automatic query-based across-time summarization in news archives and introduce an effective method to solve this task. The proposed model first learns an orthogonal transformation between temporally distant news collections. Then, it generates a set of corresponding sentence pairs based on a concise integer linear programming framework. We experimentally demonstrate the effectiveness of our method on the New York Times Annotated Corpus.
{"title":"Across-Time Comparative Summarization of News Articles","authors":"Yijun Duan, A. Jatowt","doi":"10.1145/3289600.3291008","DOIUrl":"https://doi.org/10.1145/3289600.3291008","url":null,"abstract":"Comparative summarization is an effective strategy to discover important similarities and differences in collections of documents biased to users' interests. A natural method for this task is to find important and corresponding content. In this paper, we propose a novel research task of automatic query-based across-time summarization in news archives and introduce an effective method to solve this task. The proposed model first learns an orthogonal transformation between temporally distant news collections. Then, it generates a set of corresponding sentence pairs based on a concise integer linear programming framework. We experimentally demonstrate the effectiveness of our method on the New York Times Annotated Corpus.","PeriodicalId":143253,"journal":{"name":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133574638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chongyang Tao, Wei Wu, Can Xu, Wenpeng Hu, Dongyan Zhao, Rui Yan
We consider context-response matching with multiple types of representations for multi-turn response selection in retrieval-based chatbots. The representations encode semantics of contexts and responses on words, n-grams, and sub-sequences of utterances, and capture both short-term and long-term dependencies among words. With such a number of representations in hand, we study how to fuse them in a deep neural architecture for matching and how each of them contributes to matching. To this end, we propose a multi-representation fusion network where the representations can be fused into matching at an early stage, at an intermediate stage, or at the last stage. We empirically compare different representations and fusing strategies on two benchmark data sets. Evaluation results indicate that late fusion is always better than early fusion, and by fusing the representations at the last stage, our model significantly outperforms the existing methods, and achieves new state-of-the-art performance on both data sets. Through a thorough ablation study, we demonstrate the effect of each representation to matching, which sheds light on how to select them in practical systems.
{"title":"Multi-Representation Fusion Network for Multi-Turn Response Selection in Retrieval-Based Chatbots","authors":"Chongyang Tao, Wei Wu, Can Xu, Wenpeng Hu, Dongyan Zhao, Rui Yan","doi":"10.1145/3289600.3290985","DOIUrl":"https://doi.org/10.1145/3289600.3290985","url":null,"abstract":"We consider context-response matching with multiple types of representations for multi-turn response selection in retrieval-based chatbots. The representations encode semantics of contexts and responses on words, n-grams, and sub-sequences of utterances, and capture both short-term and long-term dependencies among words. With such a number of representations in hand, we study how to fuse them in a deep neural architecture for matching and how each of them contributes to matching. To this end, we propose a multi-representation fusion network where the representations can be fused into matching at an early stage, at an intermediate stage, or at the last stage. We empirically compare different representations and fusing strategies on two benchmark data sets. Evaluation results indicate that late fusion is always better than early fusion, and by fusing the representations at the last stage, our model significantly outperforms the existing methods, and achieves new state-of-the-art performance on both data sets. Through a thorough ablation study, we demonstrate the effect of each representation to matching, which sheds light on how to select them in practical systems.","PeriodicalId":143253,"journal":{"name":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","volume":"198 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133845865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}