Partially Labeled Supervised Topic Models for Retrieving Similar Questions in CQA Forums
Authors: Debasis Ganguly, G. Jones
Published in: Proceedings of the 2015 International Conference on The Theory of Information Retrieval, 2015-09-27
DOI: 10.1145/2808194.2809460 (https://doi.org/10.1145/2808194.2809460)
Manual annotations of user-generated content, e.g. tags and links, play an important role in making content searchable in community question answering (CQA) forums and social media. While a new question in a CQA forum is still active, a moderator or an answerer often has to spend significant effort manually searching for related question threads (which we refer to as documents) that they may consider linking to the current question. This manual effort can be greatly reduced by an automated search process that suggests a list of candidate documents to be linked to the new document. We describe our investigation of link recommendation for this task. We approach the problem as an ad-hoc information retrieval (IR) task in which a new document (question) acts as the query, and the aim is to retrieve a list of potentially relevant documents (previously asked questions in the forum) which could then be linked manually to the new one. In contrast to standard ad-hoc search, two pieces of additional human-annotated information, namely the tags of the documents and the known links between existing document pairs, can potentially be used to improve the search quality for new questions. To utilize this additional information, we propose a generative model of tagged documents which jointly estimates the distribution of topics corresponding to each tag of a document along with the likelihood of a document being linked to another one. The model predictions are then incorporated into the query likelihood estimate of a standard language model (LM) of IR. Experiments conducted on three months of a crawled StackOverflow dataset show that utilizing the tag-specific topic distributions results in a significant improvement in retrieval of the candidate set of related documents.
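To make the retrieval setup concrete, the following is a minimal sketch of query likelihood ranking with a unigram language model, where the new question is the query and earlier questions are the documents. The abstract does not state the smoothing method or how the topic model predictions enter the estimate, so everything specific here is an assumption: Jelinek-Mercer smoothing with weight `lam`, a hypothetical callback `topic_prob(word, doc_id)` standing in for a model-based term probability, and a linear mixing weight `mu`. The function name `rank_by_query_likelihood` and the toy documents are also illustrative only.

```python
import math
from collections import Counter


def rank_by_query_likelihood(query, docs, lam=0.6, topic_prob=None, mu=0.0):
    """Rank documents by log P(query | document) under a smoothed unigram LM.

    Assumed form (not taken from the paper):
      P(w|d) = (1 - mu) * [lam * tf(w,d)/|d| + (1 - lam) * cf(w)/|C|]
               + mu * topic_prob(w, d)
    """
    doc_tokens = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    # Collection statistics used for smoothing.
    coll_tf = Counter(t for toks in doc_tokens.values() for t in toks)
    coll_len = sum(coll_tf.values())

    scores = {}
    for doc_id, toks in doc_tokens.items():
        tf = Counter(toks)
        score = 0.0
        for w in query.lower().split():
            p_doc = tf[w] / len(toks) if toks else 0.0
            p_coll = coll_tf[w] / coll_len if coll_len else 0.0
            p = lam * p_doc + (1 - lam) * p_coll
            if topic_prob is not None:
                # Hypothetical hook: mix in a topic-model term probability.
                p = (1 - mu) * p + mu * topic_prob(w, doc_id)
            if p == 0.0:
                continue  # term has zero probability everywhere; skip it
            score += math.log(p)
        scores[doc_id] = score
    # Highest log-likelihood first.
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: previously asked questions act as the document collection.
docs = {
    "q1": "how to sort a list in python",
    "q2": "java null pointer exception stack trace",
    "q3": "python list sort descending order",
}
ranking = rank_by_query_likelihood("sort python list", docs)
print(ranking)
```

With `topic_prob=None` this reduces to a plain smoothed query likelihood baseline. Passing a callback and a nonzero `mu` shows one possible place where tag-specific topic estimates could adjust the term probabilities.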