Chinese nested entity recognition method for the finance domain based on heterogeneous graph network

Information Processing & Management (IF 6.9; Zone 1, Management; JCR Q1, Computer Science, Information Systems) Pub Date: 2024-07-01 DOI: 10.1016/j.ipm.2024.103812

Han Zhang, Yiping Dang, Yazhou Zhang, Siyuan Liang, Junxiu Liu, Lixia Ji

Information Processing & Management, Volume 61, Issue 5, Article 103812. https://www.sciencedirect.com/science/article/pii/S0306457324001717

Citations: 0

Abstract

In the finance domain, nested named entity recognition has become a hot topic within named entity recognition. Traditional nested entity recognition methods tend to ignore the dependency relationships between entities, and most are designed for general-domain English text. We therefore propose HGFNER, a Chinese nested entity recognition method for the finance domain based on a heterogeneous graph network. The method consists of two parts: a boundary division model for candidate entities and an internal relationship graph model for candidate entities. First, the boundary division model, which incorporates expert knowledge, partitions the flat entities contained in the text and segments the text, addressing issues such as long entity boundaries and strong domain features in the Chinese finance domain. Then, heterogeneous graphs represent the internal structure of entities in terms of both spatial and syntactic dependencies, so that dependency relationships between entities are learned from multiple perspectives. Meanwhile, to avoid degrading the model's operational efficiency, we also propose DAAC_BM, a fast matching algorithm for n-gram sequences in domain dictionaries, which addresses the memory overflow and wasted space that multi-pattern fast matching algorithms face when matching Chinese text. In addition, we present CFNE, a Chinese nested entity dataset for the financial field, which, as far as we know, is the first publicly available annotated dataset in the field. HGFNER achieves a state-of-the-art macro-F1 of 86.41% on CFNE.
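The abstract does not describe how DAAC_BM works internally, so the sketch below does not attempt to reproduce it. It only illustrates the underlying task, multi-pattern matching of a domain dictionary against Chinese text, using a plain Aho-Corasick automaton with hash-map transitions. The function names (`build_automaton`, `find_all`) and the sample dictionary are hypothetical and chosen for illustration. Note how a single pass yields overlapping and nested spans, which is what makes dictionary hits usable as nested candidate entities.

```python
from collections import deque


def build_automaton(patterns):
    """Build a plain Aho-Corasick automaton (illustrative, not DAAC_BM)."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # failure link per node
    out = [[]]    # dictionary terms that end at each node
    # Phase 1: insert every dictionary term into a trie.
    for pat in patterns:
        node = 0
        for ch in pat:
            nxt = goto[node].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[node][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            node = nxt
        out[node].append(pat)
    # Phase 2: breadth-first pass to set failure links and merge outputs.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit terms that end at the failure target (shorter suffixes).
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(text, automaton):
    """Return (start, end, term) for every dictionary hit, end exclusive."""
    goto, fail, out = automaton
    node = 0
    spans = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            spans.append((i - len(pat) + 1, i + 1, pat))
    return spans


# Hypothetical finance dictionary; the terms nest inside one another.
dictionary = ["中国", "银行", "中国银行", "中国银行股份有限公司"]
automaton = build_automaton(dictionary)
for start, end, term in find_all("中国银行股份有限公司发布年报", automaton):
    print(start, end, term)
```

Storing one hash map per trie node is simple but memory-hungry for large Chinese character sets, which is the kind of overhead the abstract says DAAC_BM is meant to address; how it does so is not stated in the abstract.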

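Source journal: Information Processing & Management (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 17.00
Self-citation rate: 11.60%
Articles published: 276
Review time: 39 days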

About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.