{"title":"Topic Detection and Extraction in Chat","authors":"Paige Adams, C. Martell","doi":"10.1109/ICSC.2008.61","DOIUrl":null,"url":null,"abstract":"Internet-based Chat environments such as Internet relay Chat and instant messaging pose a challenge for data mining and information retrieval systems due to the multi-threaded, overlapping nature of the dialog and the nonstandard usage of language. In this paper we present preliminary methods of topic detection and topic thread extraction that augment a typical TF-IDF-based vector space model approach with temporal relationship information between posts of the Chat dialog combined with WordNet hypernym augmentation. We show results that promise better performance than using only a TF-IDF bag-of-words vector space model.","PeriodicalId":102805,"journal":{"name":"2008 IEEE International Conference on Semantic Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"93","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE International Conference on Semantic Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSC.2008.61","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 93
Abstract
Internet-based Chat environments such as Internet relay Chat and instant messaging pose a challenge for data mining and information retrieval systems due to the multi-threaded, overlapping nature of the dialog and the nonstandard usage of language. In this paper we present preliminary methods of topic detection and topic thread extraction that augment a typical TF-IDF-based vector space model approach with temporal relationship information between posts of the Chat dialog combined with WordNet hypernym augmentation. We show results that promise better performance than using only a TF-IDF bag-of-words vector space model.