D. Kosmajac, Kirstie Smith, Vlado Keselj, S. Kirkland
2020 International Conference on Data Mining Workshops (ICDMW), published 2020-11-01. DOI: 10.1109/ICDMW51313.2020.00088 (https://doi.org/10.1109/ICDMW51313.2020.00088)
Graph-based Topic Extraction Using Centroid Distance of Phrase Embeddings on Healthy Aging Open-ended Survey Questions
Open-ended questions are an important part of research surveys. However, they are challenging to process, since manual processing is labour-intensive. Automating the task requires NLP methods, since free text has no standardized structure. To tackle this problem, we present a solution for topic discovery and analysis of open-ended survey items. We use a graph-based representation of the text that adds structure and enables easier manipulation and keyphrase retrieval. Additionally, we use pre-trained fastText aligned word vectors to cluster similar phrases even if they are written in different languages. The goal is to produce topic word and phrase representatives that are easy for a domain expert to interpret. We compare the method with traditional LDA and two state-of-the-art algorithms: BTM and WNTM. The resulting keyphrases representing topics are more intuitive to the domain experts than the ones obtained by reference topic models in similar experimental settings.
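The idea of grouping phrases by the distance between their embedding centroids can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the distance measure, the threshold, or the clustering procedure, so cosine distance, the 0.1 threshold, and the greedy single-pass grouping are all assumptions. The vectors in `TOY_VECTORS` are made-up three-dimensional values standing in for pre-trained fastText aligned vectors, and the function names are hypothetical.

```python
import math

# Made-up 3-dimensional vectors standing in for aligned word embeddings.
# Real fastText aligned vectors are much larger; these values are NOT real.
TOY_VECTORS = {
    "health": [0.90, 0.10, 0.00],
    "santé": [0.88, 0.12, 0.02],
    "good": [0.70, 0.30, 0.10],
    "bonne": [0.68, 0.32, 0.10],
    "family": [0.10, 0.90, 0.10],
    "famille": [0.12, 0.88, 0.08],
    "close": [0.20, 0.80, 0.20],
}


def phrase_centroid(phrase, vectors):
    """Return the mean vector of the in-vocabulary words, or None if none are known."""
    known = [vectors[w] for w in phrase.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; treat a zero vector as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


def cluster_phrases(phrases, vectors, threshold=0.1):
    """Greedy grouping (an assumption): a phrase joins the first cluster whose
    seed centroid lies within `threshold`, otherwise it starts a new cluster."""
    clusters = []  # list of (seed_centroid, member_phrases)
    for phrase in phrases:
        centroid = phrase_centroid(phrase, vectors)
        if centroid is None:
            continue  # skip phrases with no known words
        for seed, members in clusters:
            if cosine_distance(seed, centroid) <= threshold:
                members.append(phrase)
                break
        else:
            clusters.append((centroid, [phrase]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    survey_phrases = ["good health", "bonne santé", "close family", "famille"]
    for group in cluster_phrases(survey_phrases, TOY_VECTORS):
        print(group)
```

Because the toy English and French vectors are placed close together, phrases in both languages land in the same group, which mirrors the role the aligned vectors play in the abstract. With real embeddings, the threshold would need tuning against the data.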