Kenji Contreras, Gabriel Verbel, J. Sánchez, J. Sánchez-Galán
Published in: 2022 IEEE 40th Central America and Panama Convention (CONCAPAN), 2022-11-09. DOI: 10.1109/CONCAPAN48024.2022.9997766 (https://doi.org/10.1109/CONCAPAN48024.2022.9997766)
Using Topic Modelling for Analyzing Panamanian Parliamentary Proceedings with Neural and Statistical Methods
This work applied statistical and neural topic modelling algorithms to perform an unsupervised analysis of a corpus of 2,086 Spanish-language texts from Panamanian parliamentary proceedings. Latent Dirichlet Allocation (LDA), a statistical algorithm, was compared with BERTopic, a neural method that combines deep neural document embeddings, non-linear dimensionality reduction, and hierarchical density clustering. Both models achieved topic diversity and coherence, yielding interpretable results at different levels of semantic abstraction. Furthermore, using BERTopic’s Dynamic Topic Modelling feature, the analysis showed both the global word frequency of health-related topics and its evolution over time. The results can support in-depth analysis of political trends, although more extensive hyperparameter tuning might be necessary to achieve higher topic coherence and interpretability.
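The abstract evaluates both models by topic diversity and coherence but does not say how these were computed. As a minimal sketch, assuming two common definitions (diversity as the share of unique words among the top words of all topics, and coherence as the mean normalized pointwise mutual information, NPMI, over document-level co-occurrence), the two scores could be computed as follows. The function names, the toy tokenized documents, and the conventions for edge cases are illustrative assumptions, not the authors' code.

```python
import math
from itertools import combinations


def topic_diversity(topics, top_k=10):
    """Share of unique words among the top_k words of every topic (0 to 1)."""
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


def npmi_coherence(topic, documents, top_k=10):
    """Mean NPMI over all word pairs of one topic.

    Probabilities are estimated from document-level co-occurrence:
    p(w) is the fraction of documents that contain w.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic[:top_k], 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: lower bound of NPMI
        elif p12 == 1:
            scores.append(1.0)   # assumed convention: NPMI is 0/0 here
        else:
            scores.append(math.log(p12 / (p1 * p2)) / -math.log(p12))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: tokenized documents and top words per topic.
documents = [
    ["salud", "hospital", "vacuna"],
    ["salud", "hospital", "presupuesto"],
    ["presupuesto", "impuesto", "ley"],
    ["ley", "impuesto", "salud"],
]
topics = [
    ["salud", "hospital", "vacuna"],
    ["presupuesto", "impuesto", "ley"],
    ["salud", "ley", "impuesto"],
]

print("diversity:", topic_diversity(topics, top_k=3))
for topic in topics:
    print(topic, "NPMI:", npmi_coherence(topic, documents, top_k=3))
```

Because both functions take only lists of top words, the same scoring can be applied to the output of LDA and of BERTopic, which keeps a comparison between the two models on equal footing.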