Reducing healthcare costs and improving quality of outcomes is a challenge even in developed economies. Much information remains in paper form, lack common standards, sharing is uncommon and frequently hampered by the lack of foolproof de-identification for patient privacy. All of these issues impede opportunities for data mining and analysis that would enable better predictive and preventive medicine. These issues are compounded in emerging economies due to geopolitical constraints, transportation and geographic barriers, a much more limited clinical workforce, and infrastructural challenges to delivery. Thus, simple, high-impact deliverable interventions such as universal childhood immunization and maternal childcare are hampered by poor monitoring and reporting systems. This workshop is focused on identifying challenges to be overcome for effectively delivering efficient healthcare and to the masses. Specifically, we will provide a forum to discuss research directions, share experience and insights from both academia and industry. The anticipated outcome of the workshop is an assessment of the state of the art in the area, as well as identification of critical next steps to pursue in this topic.
{"title":"Data management & analytics for healthcare (DARE 2013)","authors":"Ullas Nambiar, T. Niranjan","doi":"10.1145/2505515.2505820","DOIUrl":"https://doi.org/10.1145/2505515.2505820","url":null,"abstract":"Reducing healthcare costs and improving quality of outcomes is a challenge even in developed economies. Much information remains in paper form, lack common standards, sharing is uncommon and frequently hampered by the lack of foolproof de-identification for patient privacy. All of these issues impede opportunities for data mining and analysis that would enable better predictive and preventive medicine. These issues are compounded in emerging economies due to geopolitical constraints, transportation and geographic barriers, a much more limited clinical workforce, and infrastructural challenges to delivery. Thus, simple, high-impact deliverable interventions such as universal childhood immunization and maternal childcare are hampered by poor monitoring and reporting systems. This workshop is focused on identifying challenges to be overcome for effectively delivering efficient healthcare and to the masses. Specifically, we will provide a forum to discuss research directions, share experience and insights from both academia and industry. The anticipated outcome of the workshop is an assessment of the state of the art in the area, as well as identification of critical next steps to pursue in this topic.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79167363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, a number of semi-stream join algorithms have been published. The typical system setup for these consists of one fast stream input that has to be joined with a disk-based relation R. These semi-stream join approaches typically perform the join with a limited main memory partition assigned to them, which is generally not large enough to hold the whole relation R. We propose a caching approach that can be used as a front-stage for different semi-stream join algorithms, resulting in significant performance gains for common applications. We analyze our approach in the context of a seminal semi-stream join, MESHJOIN (Mesh Join), and provide a cost model for the resulting semi-stream join algorithm, which we call CMESHJOIN (Cached Mesh Join). The algorithm takes advantage of skewed distributions; this article presents results for Zipfian distributions of the type that appears in many applications.
{"title":"A generic front-stage for semi-stream processing","authors":"M. Naeem, Gerald Weber, G. Dobbie, C. Lutteroth","doi":"10.1145/2505515.2505734","DOIUrl":"https://doi.org/10.1145/2505515.2505734","url":null,"abstract":"Recently, a number of semi-stream join algorithms have been published. The typical system setup for these consists of one fast stream input that has to be joined with a disk-based relation R. These semi-stream join approaches typically perform the join with a limited main memory partition assigned to them, which is generally not large enough to hold the whole relation R. We propose a caching approach that can be used as a front-stage for different semi-stream join algorithms, resulting in significant performance gains for common applications. We analyze our approach in the context of a seminal semi-stream join, MESHJOIN (Mesh Join), and provide a cost model for the resulting semi-stream join algorithm, which we call CMESHJOIN (Cached Mesh Join). The algorithm takes advantage of skewed distributions; this article presents results for Zipfian distributions of the type that appears in many applications.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81589192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tamara Martín-Wanton, Julio Gonzalo, Enrique Amigó
Microblogs play an important role for Online Reputation Management. Companies and organizations in general have an increasing interest in obtaining the last minute information about which are the emerging topics that concern their reputation. In this paper, we present a new technique to cluster a collection of tweets emitted within a short time span about a specific entity. Our approach relies on transfer learning by contextualizing a target collection of tweets with a large set of unlabeled "background" tweets that help improving the clustering of the target collection. We include background tweets together with target tweets in a TwitterLDA process, and we set the total number of clusters. In practice, this means that the system can adapt to find the right number of clusters for the target data, overcoming one of the limitations of using LDA-based approaches (the need of establishing a priori the number of clusters). Our experiments using RepLab 2012 data show that using the background collection gives a 20% improvement over a direct application of TwitterLDA using only the target collection. Our data also confirms that the approach can effectively predict the right number of target clusters in a way that is robust with respect to the total number of clusters established a priori.
{"title":"An unsupervised transfer learning approach to discover topics for online reputation management","authors":"Tamara Martín-Wanton, Julio Gonzalo, Enrique Amigó","doi":"10.1145/2505515.2507845","DOIUrl":"https://doi.org/10.1145/2505515.2507845","url":null,"abstract":"Microblogs play an important role for Online Reputation Management. Companies and organizations in general have an increasing interest in obtaining the last minute information about which are the emerging topics that concern their reputation. In this paper, we present a new technique to cluster a collection of tweets emitted within a short time span about a specific entity. Our approach relies on transfer learning by contextualizing a target collection of tweets with a large set of unlabeled \"background\" tweets that help improving the clustering of the target collection. We include background tweets together with target tweets in a TwitterLDA process, and we set the total number of clusters. In practice, this means that the system can adapt to find the right number of clusters for the target data, overcoming one of the limitations of using LDA-based approaches (the need of establishing a priori the number of clusters). Our experiments using RepLab 2012 data show that using the background collection gives a 20% improvement over a direct application of TwitterLDA using only the target collection. Our data also confirms that the approach can effectively predict the right number of target clusters in a way that is robust with respect to the total number of clusters established a priori.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84409689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we approach the problem of real-time filtering in the Twitter Microblogging platform. We adapt an effective traditional news filtering technique, which uses a text classifier inspired by Rocchio's relevance feedback algorithm, to build and dynamically update a profile of the user's interests in real-time. In our adaptation, we tackle two challenges that are particularly prevalent in Twitter: sparsity and drift. In particular, sparsity stems from the brevity of tweets, while drift occurs as events related to the topic develop or the interests of the user change. First, to tackle the acute sparsity problem, we apply query expansion to derive terms or related tweets for a richer initialisation of the user interests within the profile. Second, to deal with drift, we modify the user profile to balance between the importance of the short-term interests, i.e. emerging subtopics, and the long-term interests in the overall topic. Moreover, we investigate an event detection method from Twitter and newswire streams to predict times at which drift may happen. Through experiments using the TREC Microblog track 2012, we show that our approach is effective for a number of common filtering metrics such as the user's utility, and that it compares favourably with state-of-the-art news filtering baselines. Our results also uncover the impact of different factors on handling topic drifting.
{"title":"On sparsity and drift for effective real-time filtering in microblogs","authors":"M. Albakour, C. Macdonald, I. Ounis","doi":"10.1145/2505515.2505709","DOIUrl":"https://doi.org/10.1145/2505515.2505709","url":null,"abstract":"In this paper, we approach the problem of real-time filtering in the Twitter Microblogging platform. We adapt an effective traditional news filtering technique, which uses a text classifier inspired by Rocchio's relevance feedback algorithm, to build and dynamically update a profile of the user's interests in real-time. In our adaptation, we tackle two challenges that are particularly prevalent in Twitter: sparsity and drift. In particular, sparsity stems from the brevity of tweets, while drift occurs as events related to the topic develop or the interests of the user change. First, to tackle the acute sparsity problem, we apply query expansion to derive terms or related tweets for a richer initialisation of the user interests within the profile. Second, to deal with drift, we modify the user profile to balance between the importance of the short-term interests, i.e. emerging subtopics, and the long-term interests in the overall topic. Moreover, we investigate an event detection method from Twitter and newswire streams to predict times at which drift may happen. Through experiments using the TREC Microblog track 2012, we show that our approach is effective for a number of common filtering metrics such as the user's utility, and that it compares favourably with state-of-the-art news filtering baselines. Our results also uncover the impact of different factors on handling topic drifting.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84438900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Much of the conversation on "big data" is centered on data technologies and analytics platforms and how established companies apply them. While those technologies and platforms are certainly very important for industry incumbents, data analytics is also often a key building block for new start-up entrants looking to disrupt industry verticals. In many cases, the best examples of novel applications of data to create new services and competitive advantage require a complete rethinking of organizational design in order to create feedback loops and rethink cost structures. The company I founded, SignalFire is applying data for competitive advantage in my own industry, venture capital, but there are myriad examples of this trend across industries such as transportation, financial services, retail, media and many other markets. In this talk, I will discuss how we analyze these trends as venture capitalists and will look at a few case studies of specific companies leveraging data to innovate in their industries.
{"title":"Leveraging data to change industry paradigms","authors":"C. Farmer","doi":"10.1145/2505515.2514694","DOIUrl":"https://doi.org/10.1145/2505515.2514694","url":null,"abstract":"Much of the conversation on \"big data\" is centered on data technologies and analytics platforms and how established companies apply them. While those technologies and platforms are certainly very important for industry incumbents, data analytics is also often a key building block for new start-up entrants looking to disrupt industry verticals. In many cases, the best examples of novel applications of data to create new services and competitive advantage require a complete rethinking of organizational design in order to create feedback loops and rethink cost structures. The company I founded, SignalFire is applying data for competitive advantage in my own industry, venture capital, but there are myriad examples of this trend across industries such as transportation, financial services, retail, media and many other markets. In this talk, I will discuss how we analyze these trends as venture capitalists and will look at a few case studies of specific companies leveraging data to innovate in their industries.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84558269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Location-based social networks (LBSNs) offer researchers rich data to study people's online activities and mobility patterns. One important application of such studies is to provide personalized point-of-interest (POI) recommendations to enhance user experience in LBSNs. Previous solutions directly predict users' preference on locations but fail to provide insights about users' preference transitions among locations. In this work, we propose a novel category-aware POI recommendation model, which exploits the transition patterns of users' preference over location categories to improve location recommendation accuracy. Our approach consists of two stages: (1) preference transition (over location categories) prediction, and (2) category-aware POI recommendation. Matrix factorization is employed to predict a user's preference transitions over categories and then her preference on locations in the corresponding categories. Real data based experiments demonstrate that our approach outperforms the state-of-the-art POI recommendation models by at least 39.75% in terms of recall.
{"title":"Personalized point-of-interest recommendation by mining users' preference transition","authors":"Xin Liu, Yong Liu, K. Aberer, C. Miao","doi":"10.1145/2505515.2505639","DOIUrl":"https://doi.org/10.1145/2505515.2505639","url":null,"abstract":"Location-based social networks (LBSNs) offer researchers rich data to study people's online activities and mobility patterns. One important application of such studies is to provide personalized point-of-interest (POI) recommendations to enhance user experience in LBSNs. Previous solutions directly predict users' preference on locations but fail to provide insights about users' preference transitions among locations. In this work, we propose a novel category-aware POI recommendation model, which exploits the transition patterns of users' preference over location categories to improve location recommendation accuracy. Our approach consists of two stages: (1) preference transition (over location categories) prediction, and (2) category-aware POI recommendation. Matrix factorization is employed to predict a user's preference transitions over categories and then her preference on locations in the corresponding categories. Real data based experiments demonstrate that our approach outperforms the state-of-the-art POI recommendation models by at least 39.75% in terms of recall.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84957605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Keller, Emmanuel Müller, Andreas Wixler, Klemens Böhm
There exists a variety of traditional outlier models, which measure the deviation of outliers with respect to the full attribute space. However, these techniques fail to detect outliers that deviate only w.r.t. an attribute subset. To address this problem, recent techniques focus on a selection of subspaces that allow: (1) A clear distinction between clustered objects and outliers; (2) a description of outlier reasons by the selected subspaces. However, depending on the outlier model used, different objects in different subspaces have the highest deviation. It is an open research issue to make subspace selection adaptive to the outlier score of each object and flexible w.r.t. the use of different outlier models. In this work we propose such a flexible and adaptive subspace selection scheme. Our generic processing allows instantiations with different outlier models. We utilize the differences of outlier scores in random subspaces to perform a combinatorial refinement of relevant subspaces. Our refinement allows an individual selection of subspaces for each outlier, which is tailored to the underlying outlier model. In the experiments we show the flexibility of our subspace search w.r.t. various outlier models such as distance-based, angle-based, and local-density-based outlier detection.
{"title":"Flexible and adaptive subspace search for outlier analysis","authors":"F. Keller, Emmanuel Müller, Andreas Wixler, Klemens Böhm","doi":"10.1145/2505515.2505560","DOIUrl":"https://doi.org/10.1145/2505515.2505560","url":null,"abstract":"There exists a variety of traditional outlier models, which measure the deviation of outliers with respect to the full attribute space. However, these techniques fail to detect outliers that deviate only w.r.t. an attribute subset. To address this problem, recent techniques focus on a selection of subspaces that allow: (1) A clear distinction between clustered objects and outliers; (2) a description of outlier reasons by the selected subspaces. However, depending on the outlier model used, different objects in different subspaces have the highest deviation. It is an open research issue to make subspace selection adaptive to the outlier score of each object and flexible w.r.t. the use of different outlier models. In this work we propose such a flexible and adaptive subspace selection scheme. Our generic processing allows instantiations with different outlier models. We utilize the differences of outlier scores in random subspaces to perform a combinatorial refinement of relevant subspaces. Our refinement allows an individual selection of subspaces for each outlier, which is tailored to the underlying outlier model. In the experiments we show the flexibility of our subspace search w.r.t. various outlier models such as distance-based, angle-based, and local-density-based outlier detection.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85941255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chenhao Tan, Ed H. Chi, David A. Huffaker, Gueorgi Kossinets, Alex Smola
Consumer review sites and recommender systems typically rely on a large volume of user-contributed ratings, which makes rating acquisition an essential component in the design of such systems. User ratings are then summarized to provide an aggregate score representing a popular evaluation of an item. An inherent problem in such summarization is potential bias due to raters self-selection and heterogeneity in terms of experience, tastes and rating scale interpretation. There are two major approaches to collecting ratings, which have different advantages and disadvantages. One is to allow a large number of volunteers to choose and rate items directly (a method employed by e.g. Yelp and Google Places). Alternatively, a panel of raters may be maintained and invited to rate a predefined set of items at regular intervals (such as in Zagat Survey). The latter approach arguably results in more consistent reviews and reduced selection bias, however, at the expense of much smaller coverage (fewer rated items). In this paper, we examine the two different approaches to collecting user ratings of restaurants and explore the question of whether it is possible to reconcile them. Specifically, we study the problem of inferring the more calibrated Zagat Survey ratings (which we dub 'expert ratings') from the user-generated ratings ('grassroots') in Google Places. To that effect, we employ latent factor models and provide a probabilistic treatment of the ordinal rankings. We can predict Zagat Survey ratings accurately from ad hoc user-generated ratings by joint optimization on two datasets. We analyze the resulting model, and find that users become more discerning as they submit more ratings. We also describe an approach towards cross-city recommendations, answering questions such as 'What is the equivalent of the Per Se restaurant in Chicago'?
{"title":"Instant foodie: predicting expert ratings from grassroots","authors":"Chenhao Tan, Ed H. Chi, David A. Huffaker, Gueorgi Kossinets, Alex Smola","doi":"10.1145/2505515.2505712","DOIUrl":"https://doi.org/10.1145/2505515.2505712","url":null,"abstract":"Consumer review sites and recommender systems typically rely on a large volume of user-contributed ratings, which makes rating acquisition an essential component in the design of such systems. User ratings are then summarized to provide an aggregate score representing a popular evaluation of an item. An inherent problem in such summarization is potential bias due to raters self-selection and heterogeneity in terms of experience, tastes and rating scale interpretation. There are two major approaches to collecting ratings, which have different advantages and disadvantages. One is to allow a large number of volunteers to choose and rate items directly (a method employed by e.g. Yelp and Google Places). Alternatively, a panel of raters may be maintained and invited to rate a predefined set of items at regular intervals (such as in Zagat Survey). The latter approach arguably results in more consistent reviews and reduced selection bias, however, at the expense of much smaller coverage (fewer rated items). In this paper, we examine the two different approaches to collecting user ratings of restaurants and explore the question of whether it is possible to reconcile them. Specifically, we study the problem of inferring the more calibrated Zagat Survey ratings (which we dub 'expert ratings') from the user-generated ratings ('grassroots') in Google Places. To that effect, we employ latent factor models and provide a probabilistic treatment of the ordinal rankings. We can predict Zagat Survey ratings accurately from ad hoc user-generated ratings by joint optimization on two datasets. We analyze the resulting model, and find that users become more discerning as they submit more ratings. We also describe an approach towards cross-city recommendations, answering questions such as 'What is the equivalent of the Per Se restaurant in Chicago'?","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"49 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82986033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we address the text classification problem that a period of time created test data is different from the training data, and present a method for text classification based on temporal adaptation. We first applied lexical chains for the training data to collect terms with semantic relatedness, and created sets (we call these Sem sets). Semantically related terms in the documents are replaced to their representative term. For the results, we identified short terms that are salient for a specific period of time. Finally, we trained SVM classifiers by applying a temporal weighting function to each selected short terms within the training data, and classified test data. Temporal weighting function is weighted each short term in the training data according to the temporal distance between training and test data. The results using MedLine data showed that the method was comparable to the current state-of-the-art biased-SVM method, especially the method is effective when testing on data far from the training data.
{"title":"Timeline adaptation for text classification","authors":"Fumiyo Fukumoto, Yoshimi Suzuki, A. Takasu","doi":"10.1145/2505515.2507833","DOIUrl":"https://doi.org/10.1145/2505515.2507833","url":null,"abstract":"In this paper, we address the text classification problem that a period of time created test data is different from the training data, and present a method for text classification based on temporal adaptation. We first applied lexical chains for the training data to collect terms with semantic relatedness, and created sets (we call these Sem sets). Semantically related terms in the documents are replaced to their representative term. For the results, we identified short terms that are salient for a specific period of time. Finally, we trained SVM classifiers by applying a temporal weighting function to each selected short terms within the training data, and classified test data. Temporal weighting function is weighted each short term in the training data according to the temporal distance between training and test data. The results using MedLine data showed that the method was comparable to the current state-of-the-art biased-SVM method, especially the method is effective when testing on data far from the training data.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"63 12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85397256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aditya Pal, Fei Wang, Michelle X. Zhou, Jeffrey Nichols, Barton A. Smith
An online community consists of a group of users who share a common interest, background, or experience and their collective goal is to contribute towards the welfare of the community members. Question answering is an important feature that enables community members to exchange knowledge within the community boundary. The overwhelming number of communities necessitates the need for a good question routing strategy so that new questions gets routed to the appropriately focused community and thus get resolved. In this paper, we consider the novel problem of routing questions to the right community and propose a framework to select the right set of communities for a question. We begin by using several prior proposed features for users and add some additional features, namely language attributes and inclination to respond, for community modeling. Then we introduce two k nearest neighbor based aggregation algorithms for computing community scores. We show how these scores can be combined to recommend communities and test the effectiveness of the recommendations over a large real world dataset.
{"title":"Question routing to user communities","authors":"Aditya Pal, Fei Wang, Michelle X. Zhou, Jeffrey Nichols, Barton A. Smith","doi":"10.1145/2505515.2505669","DOIUrl":"https://doi.org/10.1145/2505515.2505669","url":null,"abstract":"An online community consists of a group of users who share a common interest, background, or experience and their collective goal is to contribute towards the welfare of the community members. Question answering is an important feature that enables community members to exchange knowledge within the community boundary. The overwhelming number of communities necessitates the need for a good question routing strategy so that new questions gets routed to the appropriately focused community and thus get resolved. In this paper, we consider the novel problem of routing questions to the right community and propose a framework to select the right set of communities for a question. We begin by using several prior proposed features for users and add some additional features, namely language attributes and inclination to respond, for community modeling. Then we introduce two k nearest neighbor based aggregation algorithms for computing community scores. We show how these scores can be combined to recommend communities and test the effectiveness of the recommendations over a large real world dataset.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85791917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}