D. Hoogeveen, Li Wang, Timothy Baldwin, Karin M. Verspoor
{"title":"Web Forum Retrieval and Text Analytics: A Survey","authors":"D. Hoogeveen, Li Wang, Timothy Baldwin, Karin M. Verspoor","doi":"10.1561/1500000062","DOIUrl":null,"url":null,"abstract":"This survey presents an overview of information retrieval, natural languageprocessing and machine learning research that makes use of forumdata, including both discussion forums and community questionansweringcQA archives. The focus is on automated analysis, withthe goal of gaining a better understanding of the data and its users.We discuss the different strategies used for both retrieval taskspost retrieval, question retrieval, and answer retrieval and classificationtasks post type classification, question classification, post qualityassessment, subjectivity, and viewpoint classification at the postlevel, as well as at the thread level thread retrieval, solvedness andtask orientation, discourse structure recovery and dialogue act tagging,QA-pair extraction, and thread summarisation. We also review workon forum users, including user satisfaction, expert finding, questionrecommendation and routing, and community analysis.The survey includes a brief history of forums, an overview of thedifferent kinds of forums, a summary of publicly available datasets forforum research, and a short discussion on the evaluation of retrievaltasks using forum data.The aim is to give a broad overview of the different kinds of forumresearch, a summary of the methods that have been applied, some insightsinto successful strategies, and potential areas for future research.","PeriodicalId":48829,"journal":{"name":"Foundations and Trends in Information Retrieval","volume":"54 1","pages":"1-163"},"PeriodicalIF":8.3000,"publicationDate":"2018-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"34","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Foundations and Trends in Information Retrieval","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1561/1500000062","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 34
Abstract
This survey presents an overview of information retrieval, natural languageprocessing and machine learning research that makes use of forumdata, including both discussion forums and community questionansweringcQA archives. The focus is on automated analysis, withthe goal of gaining a better understanding of the data and its users.We discuss the different strategies used for both retrieval taskspost retrieval, question retrieval, and answer retrieval and classificationtasks post type classification, question classification, post qualityassessment, subjectivity, and viewpoint classification at the postlevel, as well as at the thread level thread retrieval, solvedness andtask orientation, discourse structure recovery and dialogue act tagging,QA-pair extraction, and thread summarisation. We also review workon forum users, including user satisfaction, expert finding, questionrecommendation and routing, and community analysis.The survey includes a brief history of forums, an overview of thedifferent kinds of forums, a summary of publicly available datasets forforum research, and a short discussion on the evaluation of retrievaltasks using forum data.The aim is to give a broad overview of the different kinds of forumresearch, a summary of the methods that have been applied, some insightsinto successful strategies, and potential areas for future research.
期刊介绍:
The surge in research across all domains in the past decade has resulted in a plethora of new publications, causing an exponential growth in published research. Navigating through this extensive literature and staying current has become a time-consuming challenge. While electronic publishing provides instant access to more articles than ever, discerning the essential ones for a comprehensive understanding of any topic remains an issue. To tackle this, Foundations and Trends® in Information Retrieval - FnTIR - addresses the problem by publishing high-quality survey and tutorial monographs in the field.
Each issue of Foundations and Trends® in Information Retrieval - FnT IR features a 50-100 page monograph authored by research leaders, covering tutorial subjects, research retrospectives, and survey papers that provide state-of-the-art reviews within the scope of the journal.