A short text topic modeling method based on integrating Gaussian and Logistic coding networks with pre-trained word embeddings

IF 5.5 | Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | CAS Tier 2 (Computer Science) | Neurocomputing | Pub Date: 2024-11-22 | DOI: 10.1016/j.neucom.2024.128941
Si Zhang, Jiali Xu, Ning Hui, Peiyun Zhai
{"title":"基于高斯和逻辑编码网络与预训练词嵌入相结合的短文本主题建模方法","authors":"Si Zhang,&nbsp;Jiali Xu,&nbsp;Ning Hui,&nbsp;Peiyun Zhai","doi":"10.1016/j.neucom.2024.128941","DOIUrl":null,"url":null,"abstract":"<div><div>The development of neural networks has provided a flexible learning framework for topic modeling. Currently, topic modeling based on neural networks has garnered wide attention. Despite its widespread application, the implementation of neural topic modeling still needs to be improved due to the complexity of short texts. Short texts usually contains only a few words and a small amount of feature information, lacking sufficient word co-occurrence and context sharing information. This results in challenges such as sparse features and poor interpretability in topic modeling. To alleviate this issue, an innovative model called <strong>T</strong>opic <strong>M</strong>odeling of <strong>E</strong>nhanced <strong>N</strong>eural <strong>N</strong>etwork with word <strong>E</strong>mbedding (ENNETM) was proposed. Firstly, we introduced an enhanced network into the inference network part, which integrated the Gaussian and Logistic coding networks to improve the performance and the interpretability of topic extraction. Secondly, we introduced the pre-trained word embedding into the Gaussian decoding network part of the model to enrich the contextual semantic information. Comprehensive experiments were carried out on three public datasets, 20NewGroups, AG_news and TagMyNews, and the results showed that the proposed method outperformed several state-of-the-art models in topic extraction and text classification.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"616 ","pages":"Article 128941"},"PeriodicalIF":5.5000,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A short text topic modeling method based on integrating Gaussian and Logistic coding networks with pre-trained word embeddings\",\"authors\":\"Si Zhang,&nbsp;Jiali Xu,&nbsp;Ning Hui,&nbsp;Peiyun Zhai\",\"doi\":\"10.1016/j.neucom.2024.128941\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The development of neural networks has provided a flexible learning framework for topic modeling. Currently, topic modeling based on neural networks has garnered wide attention. Despite its widespread application, the implementation of neural topic modeling still needs to be improved due to the complexity of short texts. Short texts usually contains only a few words and a small amount of feature information, lacking sufficient word co-occurrence and context sharing information. This results in challenges such as sparse features and poor interpretability in topic modeling. To alleviate this issue, an innovative model called <strong>T</strong>opic <strong>M</strong>odeling of <strong>E</strong>nhanced <strong>N</strong>eural <strong>N</strong>etwork with word <strong>E</strong>mbedding (ENNETM) was proposed. Firstly, we introduced an enhanced network into the inference network part, which integrated the Gaussian and Logistic coding networks to improve the performance and the interpretability of topic extraction. Secondly, we introduced the pre-trained word embedding into the Gaussian decoding network part of the model to enrich the contextual semantic information. 
Comprehensive experiments were carried out on three public datasets, 20NewGroups, AG_news and TagMyNews, and the results showed that the proposed method outperformed several state-of-the-art models in topic extraction and text classification.</div></div>\",\"PeriodicalId\":19268,\"journal\":{\"name\":\"Neurocomputing\",\"volume\":\"616 \",\"pages\":\"Article 128941\"},\"PeriodicalIF\":5.5000,\"publicationDate\":\"2024-11-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurocomputing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0925231224017120\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231224017120","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The development of neural networks has provided a flexible learning framework for topic modeling, and neural topic models have attracted wide attention. Despite their widespread application, neural topic modeling still needs improvement when applied to short texts. A short text usually contains only a few words and limited feature information, lacking sufficient word co-occurrence and shared-context signals; this leads to sparse features and poor interpretability in topic modeling. To alleviate these issues, an innovative model called Topic Modeling of Enhanced Neural Network with word Embedding (ENNETM) was proposed. First, an enhanced network that integrates Gaussian and Logistic coding networks was introduced into the inference network to improve the performance and interpretability of topic extraction. Second, pre-trained word embeddings were introduced into the Gaussian decoding network of the model to enrich the contextual semantic information. Comprehensive experiments were carried out on three public datasets, 20Newsgroups, AG_news and TagMyNews, and the results showed that the proposed method outperforms several state-of-the-art models in both topic extraction and text classification.
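To make the two-part design described above concrete, below is a minimal, hypothetical PyTorch sketch of a VAE-style neural topic model. The abstract does not specify ENNETM's actual layers, so everything here is an assumption: the encoder combines a Gaussian reparameterization with a logistic-normal (softmax) mapping onto the topic simplex, and the decoder injects frozen pre-trained word embeddings in the style of the Embedded Topic Model (Dieng et al., 2020). All class and parameter names (`SketchTopicModel`, `rho`, `alpha`) are illustrative, not from the paper.

```python
# Hypothetical sketch, NOT the authors' ENNETM implementation: a standard
# VAE topic model where a Gaussian coding step and a logistic(-normal)
# softmax step are combined in the inference network, and the decoder
# uses frozen pre-trained word embeddings (ETM-style).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SketchTopicModel(nn.Module):
    def __init__(self, vocab_size, num_topics, hidden_dim, pretrained_emb):
        super().__init__()
        # Inference (encoding) network: bag-of-words -> Gaussian parameters.
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, hidden_dim), nn.Softplus(),
            nn.Linear(hidden_dim, hidden_dim), nn.Softplus(),
        )
        self.fc_mu = nn.Linear(hidden_dim, num_topics)      # Gaussian mean
        self.fc_logvar = nn.Linear(hidden_dim, num_topics)  # Gaussian log-variance

        # Decoding network with pre-trained word embeddings:
        # rho (vocab_size x emb_dim) is frozen; alpha (num_topics x emb_dim)
        # places trainable topic embeddings in the same semantic space.
        self.rho = nn.Parameter(torch.as_tensor(pretrained_emb, dtype=torch.float),
                                requires_grad=False)
        self.alpha = nn.Parameter(torch.randn(num_topics, self.rho.shape[1]) * 0.02)

    def reparameterize(self, mu, logvar):
        # Gaussian coding: z = mu + sigma * eps, with eps ~ N(0, I).
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, bow):
        h = self.encoder(bow)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        # Logistic(-normal) coding: squash the Gaussian sample onto the simplex.
        theta = F.softmax(z, dim=-1)                    # document-topic mixture
        # Topic-word distributions from topic/word embedding affinity.
        beta = F.softmax(self.alpha @ self.rho.T, dim=-1)
        recon = theta @ beta                            # expected word distribution
        # Negative ELBO: reconstruction term + KL(q(z|x) || N(0, I)).
        rec_loss = -(bow * torch.log(recon + 1e-10)).sum(-1).mean()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return rec_loss + kl


# Usage with random stand-in data (emb_dim=300, e.g. GloVe vectors):
model = SketchTopicModel(vocab_size=2000, num_topics=20,
                         hidden_dim=256, pretrained_emb=torch.randn(2000, 300))
loss = model(torch.rand(8, 2000))  # batch of 8 bag-of-words vectors
loss.backward()
```

Freezing the pre-trained embedding matrix `rho` is one plausible reading of "introducing pre-trained word embeddings into the decoding network": the embeddings carry co-occurrence statistics from a large external corpus, which is exactly the signal that short texts lack.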
Source journal: Neurocomputing (Engineering & Technology - Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles per year: 1382
Review time: 70 days
About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.