People search has been an active research topic in recent years. Related work includes expert finding, collaborator recommendation, link prediction and social matching. However, the diverse objectives and exploratory nature of those tasks make it difficult to develop a flexible method for people search that works for every task. In this project, we developed PeopleExplorer, an interactive people search system to support exploratory search tasks when looking for people. In the system, users can specify their task objectives by selecting and adjusting key criteria. Three criteria were considered: content relevance, candidate authoritativeness, and the social similarity between the user and the candidates. This project represents a first attempt to add transparency to exploratory people search, and to give users full control over the search process. The system was evaluated through an experiment with 24 participants undertaking four different tasks. The results show that, with comparable time and effort, users of our system performed significantly better in their people search tasks than those using the baseline system. Users of our system also exhibited many unique behaviors in query reformulation and candidate selection. We found that users' general perceptions of the three criteria varied across tasks, which confirms our assumptions regarding modeling task difference and user variance in people search systems.
{"title":"Supporting exploratory people search: a study of factor transparency and user control","authors":"Shuguang Han, Daqing He, Jiepu Jiang, Zhen Yue","doi":"10.1145/2505515.2505684","DOIUrl":"https://doi.org/10.1145/2505515.2505684","journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management"},"publicationDate":"2013-10-27"}
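The PeopleExplorer abstract describes users adjusting three criteria (content relevance, candidate authoritativeness, social similarity) to steer the ranking. A minimal sketch of that idea follows, assuming each criterion is already available as a score in [0, 1] and that the criteria are combined as a normalised weighted sum. The `Candidate` class, the `rank_candidates` function, the example scores and the linear combination are illustrative assumptions, not the system's actual scoring function.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical per-candidate scores, each assumed to lie in [0, 1].
    name: str
    relevance: float   # content relevance to the query
    authority: float   # candidate authoritativeness
    social: float      # social similarity to the searcher


def rank_candidates(candidates, w_relevance, w_authority, w_social):
    """Order candidates by a user-weighted sum of the three criteria."""
    total = w_relevance + w_authority + w_social
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    def score(c):
        # Dividing by the weight total keeps scores comparable across settings.
        return (w_relevance * c.relevance
                + w_authority * c.authority
                + w_social * c.social) / total

    return sorted(candidates, key=score, reverse=True)


people = [
    Candidate("A", relevance=0.9, authority=0.2, social=0.1),
    Candidate("B", relevance=0.5, authority=0.9, social=0.3),
    Candidate("C", relevance=0.4, authority=0.3, social=0.9),
]

# The same candidate pool yields different orderings as the weights change.
by_topic = [c.name for c in rank_candidates(people, 1.0, 0.0, 0.0)]
by_authority = [c.name for c in rank_candidates(people, 0.0, 1.0, 0.0)]
by_social = [c.name for c in rank_candidates(people, 0.0, 0.0, 1.0)]
print(by_topic, by_authority, by_social)
```

Exposing the weights as plain arguments mirrors the transparency goal stated in the abstract: the searcher can see which criterion drives each ordering.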
G. Kazai, Emine Yilmaz, Nick Craswell, S. Tahaghoghi
Preference-based methods for collecting relevance data for information retrieval (IR) evaluation have been shown to lead to better inter-assessor agreement than the traditional method of judging individual documents. However, little is known as to why preference judging reduces assessor disagreement and whether better agreement among assessors also means better agreement with user satisfaction, as signaled by user clicks. In this paper, we examine the relationship between assessor disagreement and various click-based measures, such as click preference strength and user intent similarity, for judgments collected from editorial judges and crowd workers using single absolute, pairwise absolute and pairwise preference-based judging methods. We find that trained judges are significantly more likely than crowd workers to agree with each other and with users, but inter-assessor agreement does not mean agreement with users. Switching to a pairwise judging mode brings crowdsourcing quality close to that of trained judges. We also find a relationship between intent similarity and assessor-user agreement, where the nature of the relationship changes across judging modes. Overall, our findings suggest that the awareness of different possible intents, enabled by pairwise judging, is a key reason for the improved agreement, and a crucial requirement when crowdsourcing relevance data.
{"title":"User intent and assessor disagreement in web search evaluation","authors":"G. Kazai, Emine Yilmaz, Nick Craswell, S. Tahaghoghi","doi":"10.1145/2505515.2505716","DOIUrl":"https://doi.org/10.1145/2505515.2505716","journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management"},"publicationDate":"2013-10-27"}
We build a system to extract user interests from Twitter messages. Specifically, we extract interest candidates using linguistic patterns and rank them using four different keyphrase ranking techniques: TFIDF, TextRank, LDA-TextRank, and Relevance-Interestingness-Rank (RI-Rank). We also explore the complementary relation between TFIDF and TextRank in ranking interest candidates. Top-ranked interests are evaluated with user feedback gathered from an online survey. The results show that TFIDF and TextRank are both suitable for extracting user interests from tweets. Moreover, the combination of TFIDF and TextRank consistently yields the most positive user feedback.
{"title":"Interest mining from user tweets","authors":"Thuy Vu, V. Perez","doi":"10.1145/2505515.2507883","DOIUrl":"https://doi.org/10.1145/2505515.2507883","journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management"},"publicationDate":"2013-10-27"}
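The interest-mining abstract ranks candidates with TFIDF and TextRank and reports that combining the two works best, without saying how they are combined. The sketch below is one plausible reading under stated assumptions: candidates are single tokens, IDF is computed against a small background collection, TextRank is PageRank over a within-tweet co-occurrence graph, and the combination is an equal-weight average of max-normalised scores. All function names, the toy data and the averaging rule are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def tfidf_scores(user_docs, background_docs):
    """TF over the user's tweets, smoothed IDF over user + background tweets."""
    tf = Counter(term for doc in user_docs for term in doc)
    all_docs = user_docs + background_docs
    df = Counter(term for doc in all_docs for term in set(doc))
    n = len(all_docs)
    return {t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1.0) for t in tf}


def textrank_scores(user_docs, damping=0.85, iterations=50):
    """PageRank by power iteration on an undirected co-occurrence graph."""
    neighbours = defaultdict(set)
    for doc in user_docs:
        for a, b in combinations(set(doc), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    nodes = {t for doc in user_docs for t in doc}
    scores = {t: 1.0 / len(nodes) for t in nodes}
    for _ in range(iterations):
        scores = {
            t: (1 - damping) / len(nodes)
            + damping * sum(scores[u] / len(neighbours[u]) for u in neighbours[t])
            for t in nodes
        }
    return scores


def combine_scores(a, b):
    """Assumed combination: average of the two max-normalised score maps."""
    def norm(s):
        top = max(s.values(), default=0.0) or 1.0
        return {k: v / top for k, v in s.items()}
    a, b = norm(a), norm(b)
    return {k: 0.5 * (a.get(k, 0.0) + b.get(k, 0.0)) for k in set(a) | set(b)}


# Toy tokenised tweets; real candidates would come from linguistic patterns.
user_docs = [["python", "coffee"], ["python", "jazz"], ["python", "coffee", "jazz"]]
background_docs = [["coffee", "news"], ["news", "weather"]]

combined = combine_scores(tfidf_scores(user_docs, background_docs),
                          textrank_scores(user_docs))
print(sorted(combined, key=combined.get, reverse=True))
```

In this toy graph every candidate co-occurs with every other, so TextRank is uninformative and the ordering is driven by TFIDF; on real tweets the two signals would differ, which is where a combination could help.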
Duck-Ho Bae, Jin-Hyung Kim, Yong-Yeon Jo, Sang-Wook Kim, Hyun-Kyo Oh, Chanik Park
This paper introduces the notion of intelligent SSDs. First, we present the design considerations of intelligent SSDs, and then examine their potential benefits under various settings in data mining applications.
{"title":"Intelligent SSD: a turbo for big data mining","authors":"Duck-Ho Bae, Jin-Hyung Kim, Yong-Yeon Jo, Sang-Wook Kim, Hyun-Kyo Oh, Chanik Park","doi":"10.1145/2505515.2507847","DOIUrl":"https://doi.org/10.1145/2505515.2507847","journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management"},"publicationDate":"2013-10-27"}
Entity attribute values, such as "lord of the rings" for movie.title or "infant" for shoe.gender, are atomic components of entity expressions. Discovering alternative surface forms of attribute values is important for improving entity recognition and retrieval. In this work, we propose a novel compact clustering framework to jointly identify synonyms for a set of attribute values. The framework can integrate signals from multiple information sources into a similarity function between attribute values, and the weights of these signals are optimized in an unsupervised manner. Extensive experiments across multiple domains demonstrate the effectiveness of our clustering framework for mining entity attribute synonyms.
{"title":"Mining entity attribute synonyms via compact clustering","authors":"Yanen Li, B. Hsu, ChengXiang Zhai, Kuansan Wang","doi":"10.1145/2505515.2505608","DOIUrl":"https://doi.org/10.1145/2505515.2505608","journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management"},"publicationDate":"2013-10-27"}
It is crucial for the success of a search-driven web application to answer users' queries in the best possible way. A common approach is to use click models for guessing the relevance of search results. However, these models are imprecise and forgo valuable information that can be gained from non-click user interactions. We introduce TellMyRelevance!, a novel automatic end-to-end pipeline for tracking cursor interactions at the client, analyzing them and learning corresponding relevance models. Yet, the models depend on the layout of the search results page involved, which makes them difficult to evaluate and compare. Thus, we use a Random Mouse Cursor as an extension to our pipeline for generating layout-dependent baselines. Based on these, we can perform evaluations of real-world relevance models. A large-scale interaction log analysis showed that we can learn relevance models whose predictions compare favorably to predictions of an existing state-of-the-art click model.
{"title":"TellMyRelevance!: predicting the relevance of web search results from cursor interactions","authors":"Maximilian Speicher, A. Both, M. Gaedke","doi":"10.1145/2505515.2505703","DOIUrl":"https://doi.org/10.1145/2505515.2505703","journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management"},"publicationDate":"2013-10-27"}
Probabilistic graphical model representations of relational data provide a number of desirable features, such as inference of missing values, detection of errors, visualization of data, and probabilistic answers to relational queries. However, adoption has been slow due to the high level of expertise they demand from the user, both in probability and in the domain. Instead of requiring a domain expert to specify the probabilistic dependencies of the data, we present an approach that uses the relational DB schema to automatically construct a Bayesian graphical model for a database. The resulting model contains customized distributions for the attributes, latent variables that cluster the records, and factors that reflect and represent the foreign key links, whilst allowing efficient inference. Experiments demonstrate the accuracy of the model and scalability of inference on synthetic and real-world data.
{"title":"Automated probabilistic modeling for relational data","authors":"Sameer Singh, T. Graepel","doi":"10.1145/2505515.2507828","DOIUrl":"https://doi.org/10.1145/2505515.2507828","journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management"},"publicationDate":"2013-10-27"}
This paper concerns the task of top-N investment opportunity recommendation in the domain of venture finance. By venture finance, we mean specifically the investment activity of venture capital (VC) firms and their investment partners. We have access to a dataset of recorded venture financings (i.e., investments) by VCs and their investment partners in private US companies. This research was undertaken in partnership with Correlation Ventures, a venture capital firm that is pioneering the use of predictive analytics to better inform investment decision making. This paper undertakes a detailed empirical study and data analysis, then demonstrates the efficacy of recommender systems in this novel application domain.
{"title":"An empirical study of top-n recommendation for venture finance","authors":"T. Stone, Weinan Zhang, Xiaoxue Zhao","doi":"10.1145/2505515.2507882","DOIUrl":"https://doi.org/10.1145/2505515.2507882","journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management"},"publicationDate":"2013-10-27"}
Query expansion methods using pseudo-relevance feedback have been shown to be effective for microblog search because they can solve vocabulary mismatch problems often seen in searching short documents such as Twitter messages (tweets), which are limited to 140 characters. Pseudo-relevance feedback assumes that the top-ranked documents in the initial search results are relevant and that they contain topic-related words appropriate for relevance feedback. However, those assumptions do not always hold in reality because the initial search results often contain many irrelevant documents. In such a case, only a few of the suggested expansion words may be useful, with many others being useless or even harmful. To overcome this limitation of pseudo-relevance feedback for microblog search, we propose a novel query expansion method based on two-stage relevance feedback that models search interests by manual tweet selection and integration of lexical and temporal evidence into its relevance model. Our experiments using a corpus of microblog data (the Tweets2011 corpus) demonstrate that the proposed two-stage relevance feedback approaches considerably improve search result relevance over almost all topics.
{"title":"Improving pseudo-relevance feedback via tweet selection","authors":"Taiki Miyanishi, Kazuhiro Seki, K. Uehara","doi":"10.1145/2505515.2505701","DOIUrl":"https://doi.org/10.1145/2505515.2505701","journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management"},"publicationDate":"2013-10-27"}
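The tweet-selection abstract contrasts pseudo-relevance feedback, which trusts the top-ranked tweets, with feedback drawn from manually selected tweets. The sketch below illustrates only that contrast, using plain term counting to pick expansion words. It does not implement the authors' relevance model or its lexical and temporal evidence, and `expand_query`, the toy ranking and the choice of two expansion terms are assumptions made for illustration.

```python
from collections import Counter


def expand_query(query_terms, feedback_docs, num_terms=2):
    """Append the most frequent non-query terms found in the feedback tweets."""
    counts = Counter(
        term for doc in feedback_docs for term in doc if term not in query_terms
    )
    return list(query_terms) + [term for term, _ in counts.most_common(num_terms)]


# Hypothetical initial ranking for the query "earthquake" (tokenised tweets).
ranked = [
    ["earthquake", "japan", "tsunami"],
    ["earthquake", "game", "score"],     # off-topic: a sports team's name
    ["earthquake", "game", "score"],     # off-topic
    ["japan", "tsunami", "warning"],
]
query = ["earthquake"]

# Pseudo-relevance feedback: assume the top three tweets are relevant.
pseudo = expand_query(query, ranked[:3])

# Selection-based feedback: only tweets judged relevant by hand are used.
selected = expand_query(query, [ranked[0], ranked[3]])

print(pseudo)
print(selected)
```

With off-topic tweets near the top of the initial ranking, the pseudo-relevance variant picks up their vocabulary, which is the failure mode the abstract describes. Restricting feedback to the selected tweets keeps the expansion on topic.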
Personalisation is an important area in the field of IR that attempts to adapt ranking algorithms so that the results returned are tuned towards the searcher's interests. In this work we use query logs to build personalised ranking models in which user profiles are constructed based on the representation of clicked documents over a topic space. Instead of employing a human-generated ontology, we use novel latent topic models to determine these topics. Our experiments show that by subtly introducing user profiles as part of the ranking algorithm, rather than by re-ranking an existing list, we can provide personalised ranked lists of documents which improve significantly over a non-personalised baseline. Further examination shows that the performance of the personalised system is particularly good in cases where prior knowledge of the search query is limited.
{"title":"Building user profiles from topic models for personalised search","authors":"Morgan Harvey, F. Crestani, Mark James Carman","doi":"10.1145/2505515.2505642","DOIUrl":"https://doi.org/10.1145/2505515.2505642","journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management"},"publicationDate":"2013-10-27"}
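The personalised-search abstract builds user profiles from the topic representation of clicked documents. As a rough sketch of that idea, the code below assumes each document already has a topic distribution (for example from a latent topic model), takes the profile to be the mean of the clicked documents' distributions, and mixes a profile affinity into a base retrieval score. The linear interpolation, the `weight` parameter and all names and numbers are illustrative assumptions; the paper integrates the profile into the ranking model itself, which this simplification does not reproduce.

```python
def build_profile(clicked_topic_vectors):
    """Average the topic distributions of the documents a user clicked."""
    n = len(clicked_topic_vectors)
    if n == 0:
        raise ValueError("need at least one clicked document")
    k = len(clicked_topic_vectors[0])
    return [sum(vec[i] for vec in clicked_topic_vectors) / n for i in range(k)]


def personalised_score(base_score, doc_topics, profile, weight=0.3):
    """Interpolate a query-document score with the user-document topic affinity."""
    affinity = sum(p * d for p, d in zip(profile, doc_topics))
    return (1 - weight) * base_score + weight * affinity


# Two hypothetical topics; the user has mostly clicked documents on topic 0.
profile = build_profile([[0.8, 0.2], [0.6, 0.4]])

# Candidate documents as (base retrieval score, topic distribution).
docs = {
    "doc_a": (0.50, [0.1, 0.9]),   # slightly better match to the query text
    "doc_b": (0.45, [0.9, 0.1]),   # closer to the user's topical interests
}

scores = {name: personalised_score(base, topics, profile)
          for name, (base, topics) in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Here the profile lifts `doc_b` above `doc_a` even though its base score is lower, which is the kind of effect a personalised ranker aims for when the query alone says little about intent.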