Chinese nested entity recognition method for the finance domain based on heterogeneous graph network

Information Processing & Management (IF 7.4; Zone 1, Management; JCR Q1, Computer Science, Information Systems) Pub Date: 2024-07-01 DOI: 10.1016/j.ipm.2024.103812
Han Zhang, Yiping Dang, Yazhou Zhang, Siyuan Liang, Junxiu Liu, Lixia Ji
Citations: 0

Abstract

In the finance domain, nested named entity recognition has become an active topic within named entity recognition. Traditional nested entity recognition methods tend to ignore the dependency relationships between entities, and most of them target general-domain English text. We therefore propose a Chinese nested entity recognition method for the finance domain based on a heterogeneous graph network (HGFNER). The method consists of two parts: a boundary division model for candidate entities and an internal relationship graph model for candidate entities. First, the boundary division model, which incorporates expert knowledge, partitions the flat entities contained in the text and segments the text, addressing issues such as long entity boundaries and strong domain features in Chinese financial text. Then, heterogeneous graphs represent the internal structure of entities in terms of both spatial and syntactic dependencies, so that dependency relationships between entities are learned from multiple perspectives. To avoid degrading the model's runtime efficiency, we also propose DAAC_BM, a fast matching algorithm for n-gram sequences in domain dictionaries, which addresses the memory overflow and wasted space that multi-pattern fast matching algorithms face when matching Chinese text. In addition, we present CFNE, a Chinese nested entity dataset for the financial field, which, as far as we know, is the first publicly available annotated dataset in the field. HGFNER achieves a state-of-the-art macro-F1 score of 86.41% on CFNE.
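
To make the dictionary-matching step more concrete, the following is a minimal sketch of multi-pattern matching that reports nested dictionary hits. It is a plain dict-based Aho-Corasick automaton, not the authors' DAAC_BM, whose details the abstract does not give. The function names (`build_automaton`, `find_all`) and the three dictionary entries are hypothetical and chosen only for illustration. Storing each state's transitions in a hash map, as done here, is one common way to avoid allocating a dense transition table over the very large Chinese character set.

```python
from collections import deque


def build_automaton(patterns):
    """Build a dict-based Aho-Corasick automaton (illustrative, not DAAC_BM)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # dictionary entries that end at each state

    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Phase 2: breadth-first pass to fill failure links.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target (shorter suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start, end, entry) for every dictionary hit, nested ones included."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, i + 1, pat))
    return hits


# Hypothetical domain dictionary: one outer entity and two entities nested in it.
automaton = build_automaton(["中国银行", "银行", "中国"])
hits = sorted(find_all("中国银行发布公告", automaton))
for start, end, entry in hits:
    print(start, end, entry)
```

A single left-to-right pass returns the outer span together with the spans nested inside it, which is the kind of candidate set a nested entity recognizer needs to consider.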

Source journal
Information Processing & Management (Engineering & Technology - Computer Science: Information Systems)
CiteScore: 17.00
Self-citation rate: 11.60%
Articles published: 276
Review time: 39 days
Journal description: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Its scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. The journal aims to serve both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. It places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research.