针对复杂长文本分类的认知启发多粒度模型(包含标签信息

IF 4.3 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Cognitive Computation Pub Date : 2023-12-26 DOI:10.1007/s12559-023-10237-1
Li Gao, Yi Liu, Jianmin Zhu, Zhen Yu
{"title":"针对复杂长文本分类的认知启发多粒度模型(包含标签信息","authors":"Li Gao, Yi Liu, Jianmin Zhu, Zhen Yu","doi":"10.1007/s12559-023-10237-1","DOIUrl":null,"url":null,"abstract":"<p>Because the abstracts contain complex information and the labels of abstracts do not contain information about categories, it is difficult for cognitive models to extract comprehensive features to match the corresponding labels. In this paper, a cognitively inspired multi-granularity model incorporating label information (LIMG) is proposed to solve these problems. Firstly, we use information of abstracts to give labels the actual semantics. It can improve the semantic representation of word embeddings. Secondly, the model uses the dual channel pooling convolutional neural network (DCP-CNN) and the timescale shrink gated recurrent units (TSGRU) to extract multi-granularity information of abstracts. One of the channels in DCP-CNN highlights the key content and the other is used for TSGRU to extract context-related features of abstracts. Finally, TSGRU adds a timescale to retain the long-term dependence by recuring the past information and a soft thresholding algorithm to realize the noise reduction. Experiments were carried out on four benchmark datasets: Arxiv Academic Paper Dataset (AAPD), Web of Science (WOS), Amazon Review and Yahoo! Answers. As compared to the baseline models, the accuracy is improved by up to 3.36%. On AAPD (54,840 abstracts) and WOS (46,985 abstracts) datasets, the micro-F1 score reached 75.62% and 81.68%, respectively. The results show that acquiring label semantics from abstracts can enhance text representations and multi-granularity feature extraction can inspire the cognitive system’s <i>understanding</i> of the complex information in abstracts.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"28 1","pages":""},"PeriodicalIF":4.3000,"publicationDate":"2023-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Cognitively Inspired Multi-granularity Model Incorporating Label Information for Complex Long Text Classification\",\"authors\":\"Li Gao, Yi Liu, Jianmin Zhu, Zhen Yu\",\"doi\":\"10.1007/s12559-023-10237-1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Because the abstracts contain complex information and the labels of abstracts do not contain information about categories, it is difficult for cognitive models to extract comprehensive features to match the corresponding labels. In this paper, a cognitively inspired multi-granularity model incorporating label information (LIMG) is proposed to solve these problems. Firstly, we use information of abstracts to give labels the actual semantics. It can improve the semantic representation of word embeddings. Secondly, the model uses the dual channel pooling convolutional neural network (DCP-CNN) and the timescale shrink gated recurrent units (TSGRU) to extract multi-granularity information of abstracts. One of the channels in DCP-CNN highlights the key content and the other is used for TSGRU to extract context-related features of abstracts. Finally, TSGRU adds a timescale to retain the long-term dependence by recuring the past information and a soft thresholding algorithm to realize the noise reduction. Experiments were carried out on four benchmark datasets: Arxiv Academic Paper Dataset (AAPD), Web of Science (WOS), Amazon Review and Yahoo! Answers. As compared to the baseline models, the accuracy is improved by up to 3.36%. On AAPD (54,840 abstracts) and WOS (46,985 abstracts) datasets, the micro-F1 score reached 75.62% and 81.68%, respectively. The results show that acquiring label semantics from abstracts can enhance text representations and multi-granularity feature extraction can inspire the cognitive system’s <i>understanding</i> of the complex information in abstracts.</p>\",\"PeriodicalId\":51243,\"journal\":{\"name\":\"Cognitive Computation\",\"volume\":\"28 1\",\"pages\":\"\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2023-12-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Cognitive Computation\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s12559-023-10237-1\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cognitive Computation","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s12559-023-10237-1","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

由于抽象内容包含复杂信息,而抽象内容的标签又不包含类别信息,因此认知模型很难提取全面的特征来匹配相应的标签。本文提出了一种包含标签信息的认知启发多粒度模型(LIMG)来解决这些问题。首先,我们利用抽象信息赋予标签实际语义。这可以改善词嵌入的语义表示。其次,该模型使用双通道池化卷积神经网络(DCP-CNN)和时标收缩门控递归单元(TSGRU)来提取摘要的多粒度信息。DCP-CNN 中的一个通道突出关键内容,另一个通道用于 TSGRU 提取摘要的上下文相关特征。最后,TSGRU 增加了一个时间尺度,通过重现过去的信息来保留长期依赖性,并增加了一个软阈值算法来实现降噪。我们在四个基准数据集上进行了实验:实验在四个基准数据集上进行:Arxiv 学术论文数据集(AAPD)、科学网(WOS)、亚马逊评论和雅虎答案。与基准模型相比,准确率提高了 3.36%。在 AAPD(54,840 篇摘要)和 WOS(46,985 篇摘要)数据集上,micro-F1 分数分别达到了 75.62% 和 81.68%。结果表明,从摘要中获取标签语义可以增强文本表征,而多粒度特征提取则可以启发认知系统对摘要中复杂信息的理解。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

摘要图片

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
A Cognitively Inspired Multi-granularity Model Incorporating Label Information for Complex Long Text Classification

Because the abstracts contain complex information and the labels of abstracts do not contain information about categories, it is difficult for cognitive models to extract comprehensive features to match the corresponding labels. In this paper, a cognitively inspired multi-granularity model incorporating label information (LIMG) is proposed to solve these problems. Firstly, we use information of abstracts to give labels the actual semantics. It can improve the semantic representation of word embeddings. Secondly, the model uses the dual channel pooling convolutional neural network (DCP-CNN) and the timescale shrink gated recurrent units (TSGRU) to extract multi-granularity information of abstracts. One of the channels in DCP-CNN highlights the key content and the other is used for TSGRU to extract context-related features of abstracts. Finally, TSGRU adds a timescale to retain the long-term dependence by recuring the past information and a soft thresholding algorithm to realize the noise reduction. Experiments were carried out on four benchmark datasets: Arxiv Academic Paper Dataset (AAPD), Web of Science (WOS), Amazon Review and Yahoo! Answers. As compared to the baseline models, the accuracy is improved by up to 3.36%. On AAPD (54,840 abstracts) and WOS (46,985 abstracts) datasets, the micro-F1 score reached 75.62% and 81.68%, respectively. The results show that acquiring label semantics from abstracts can enhance text representations and multi-granularity feature extraction can inspire the cognitive system’s understanding of the complex information in abstracts.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Cognitive Computation
Cognitive Computation COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-NEUROSCIENCES
CiteScore
9.30
自引率
3.70%
发文量
116
审稿时长
>12 weeks
期刊介绍: Cognitive Computation is an international, peer-reviewed, interdisciplinary journal that publishes cutting-edge articles describing original basic and applied work involving biologically-inspired computational accounts of all aspects of natural and artificial cognitive systems. It provides a new platform for the dissemination of research, current practices and future trends in the emerging discipline of cognitive computation that bridges the gap between life sciences, social sciences, engineering, physical and mathematical sciences, and humanities.
期刊最新文献
A Joint Network for Low-Light Image Enhancement Based on Retinex Incorporating Template-Based Contrastive Learning into Cognitively Inspired, Low-Resource Relation Extraction A Novel Cognitive Rough Approach for Severity Analysis of Autistic Children Using Spherical Fuzzy Bipolar Soft Sets Cognitively Inspired Three-Way Decision Making and Bi-Level Evolutionary Optimization for Mobile Cybersecurity Threats Detection: A Case Study on Android Malware Probing Fundamental Visual Comprehend Capabilities on Vision Language Models via Visual Phrases from Structural Data
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1