A Multilevel Clustering Model for Coherent Topic Discovery in Short Texts
Emmanuel Maithya, L. Nderu, D. Njagi
2022 IST-Africa Conference (IST-Africa). DOI: https://doi.org/10.23919/IST-Africa56635.2022.9845648
Deducing meaning from collections of documents has become an increasingly important task for decision makers in both industry and academia. To address this challenge, topic modelling techniques have been developed to identify and isolate the words that most closely summarise the contents of a document collection. However, the topics these techniques extract from collections of short texts achieve low coherence scores, which defeats the purpose for which the techniques were created. In this paper, we propose the n-gram_cluster model, which discovers topics by exploiting the semantic closeness between n-grams and the word clusters formed from collections of those n-grams at different levels. The model is able to discover semantically coherent topics from collections of short texts. We evaluated our model against three conventional models and show that it forms topics that achieve comparatively higher coherence scores.
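To make the notion of a coherence score concrete, the sketch below computes a UMass-style coherence based on document co-occurrence counts. This is an illustration only: the abstract does not state which coherence measure the authors used, and the function name `umass_coherence`, the toy corpus, and the example topics are all hypothetical.

```python
import math
from itertools import combinations


def umass_coherence(topic_words, documents):
    """UMass-style coherence (an assumed measure, not taken from the paper).

    Sums log((D(w_i, w_j) + 1) / D(w_i)) over ordered word pairs with i < j,
    where D counts the documents containing all the given words.
    """
    # Treat each short text as a set of lowercase whitespace tokens.
    doc_sets = [set(doc.lower().split()) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for earlier, later in combinations(topic_words, 2):
        denom = doc_freq(earlier)
        if denom == 0:
            continue  # skip words that never occur in the corpus
        score += math.log((doc_freq(earlier, later) + 1) / denom)
    return score


# Hypothetical toy corpus of short texts.
docs = [
    "cheap flight deals today",
    "flight delayed at airport",
    "airport security lines long",
    "new phone battery life",
    "phone battery drains fast",
]

related_topic = ["phone", "battery", "life"]   # words that co-occur
mixed_topic = ["phone", "airport", "deals"]    # words that never co-occur

related_score = umass_coherence(related_topic, docs)
mixed_score = umass_coherence(mixed_topic, docs)
print(related_score > mixed_score)
```

Under this measure, a topic whose words appear together in the same short texts scores higher than one that mixes unrelated words. Because each short text contains few words, co-occurrence counts are sparse, which is one common explanation for the low coherence scores on short texts that the abstract describes.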