A Cognitively Inspired Multi-granularity Model Incorporating Label Information for Complex Long Text Classification

Cognitive Computation · IF 4.3 · Zone 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence) · Pub Date: 2023-12-26 · DOI: 10.1007/s12559-023-10237-1

Li Gao, Yi Liu, Jianmin Zhu, Zhen Yu


Citations: 0

Abstract



A Cognitively Inspired Multi-granularity Model Incorporating Label Information for Complex Long Text Classification

Because abstracts contain complex information and their labels carry no information about the categories they denote, it is difficult for cognitive models to extract features comprehensive enough to match the corresponding labels. This paper proposes a cognitively inspired multi-granularity model incorporating label information (LIMG) to address these problems. First, information from the abstracts is used to give the labels actual semantics, which improves the semantic representation of the word embeddings. Second, the model uses a dual channel pooling convolutional neural network (DCP-CNN) and timescale shrink gated recurrent units (TSGRU) to extract multi-granularity information from the abstracts. One DCP-CNN channel highlights the key content, and the other feeds TSGRU, which extracts context-related features of the abstracts. Finally, TSGRU adds a timescale that retains long-term dependence by recurring past information, and a soft thresholding algorithm that performs noise reduction. Experiments were carried out on four benchmark datasets: Arxiv Academic Paper Dataset (AAPD), Web of Science (WOS), Amazon Review and Yahoo! Answers. Compared with the baseline models, accuracy improves by up to 3.36%. On the AAPD (54,840 abstracts) and WOS (46,985 abstracts) datasets, the micro-F1 score reached 75.62% and 81.68%, respectively. The results show that acquiring label semantics from abstracts can enhance text representations, and that multi-granularity feature extraction can aid the cognitive system's understanding of the complex information in abstracts.
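The abstract names two standard building blocks without defining them: soft thresholding (used for noise reduction) and the micro-F1 score (used for evaluation). The sketch below shows the textbook form of each and is not the authors' implementation. How TSGRU chooses or learns its threshold is not stated in the abstract, so the fixed parameter tau here is an assumption, and the function names and example labels are hypothetical.

```python
from typing import Iterable, List, Set


def soft_threshold(values: Iterable[float], tau: float) -> List[float]:
    """Textbook soft thresholding: sign(x) * max(|x| - tau, 0).

    Values with magnitude at most tau are set to zero (treated as noise);
    larger values are shrunk toward zero by tau.
    tau is a fixed, hand-picked threshold here (an assumption).
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    result = []
    for x in values:
        shrunk = abs(x) - tau
        if shrunk <= 0:
            result.append(0.0)
        else:
            result.append(shrunk if x > 0 else -shrunk)
    return result


def micro_f1(true_labels: List[Set[str]], pred_labels: List[Set[str]]) -> float:
    """Micro-averaged F1: pool TP, FP and FN over all documents and labels.

    Each document's labels are given as a set, so the same function covers
    single-label and multi-label predictions.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("true_labels and pred_labels must have equal length")
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

For example, soft_threshold([0.25, -1.5, 2.0, -0.5], 0.5) zeroes the two small entries and shrinks the others to -1.0 and 1.5. With the hypothetical labels [{"cs.AI", "cs.CL"}, {"cs.LG"}] as truth and [{"cs.AI"}, {"cs.LG", "cs.CV"}] as predictions, micro_f1 pools 2 true positives, 1 false positive and 1 false negative, giving 4/6, or about 0.667.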


Source journal

Cognitive Computation (Computer Science, Artificial Intelligence; Neurosciences)

CiteScore: 9.30
Self-citation rate: 3.70%
Articles published: 116
Review time: >12 weeks

About the journal: Cognitive Computation is an international, peer-reviewed, interdisciplinary journal that publishes cutting-edge articles describing original basic and applied work involving biologically-inspired computational accounts of all aspects of natural and artificial cognitive systems. It provides a new platform for the dissemination of research, current practices and future trends in the emerging discipline of cognitive computation that bridges the gap between life sciences, social sciences, engineering, physical and mathematical sciences, and humanities.