A Joint Entity-Relation Detection and Generalization Method Based on Syntax and semantic for Chinese Intangible Cultural Heritage Texts

IF 2.1 | Zone 3 (Computer Science) | JCR Q3: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | ACM Journal on Computing and Cultural Heritage | Pub Date: 2023-11-02 | DOI: 10.1145/3631124
Yuyao Tan, Hao Wang, Zibo Zhao, Tao Fan
Citations: 0

Abstract

[Purpose/Significance] Annotating a natural-language corpus helps researchers extract knowledge from it and supports deeper mining of the corpus. Annotated corpora in the humanities, however, are scarce, and semantic annotation of humanities texts is difficult because it demands substantial domain knowledge from researchers and may even require the participation of domain experts. This study therefore proposes a method for detecting entities and relations in domains that lack annotated corpora, and offers a reference approach for constructing conceptual models from textual instances. [Method/Process] Based on syntactic and semantic features, the study proposes SPO (subject-predicate-object) triple recognition rules that give priority to predicates, together with generalization rules based on the content of each triple and the meaning of its predicate. The recognition rules extract text-descriptive SPO triples centered on predicates. After the triples are clustered and adjusted, the generalization rules are applied to obtain coarse-grained entities and relations, which then form a conceptual model. [Results/Conclusions] The study recognizes SPO triples from descriptive texts with high precision and a high degree of summarization, generalizes them, and forms a domain conceptual model. The proposed method offers a research approach for entity-relation detection in domains that lack annotated corpora, and the resulting domain conceptual model provides a reference for building a domain Linked Data graph. The feasibility of the method is verified on texts related to the four traditional Chinese festivals.
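The predicate-first idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' rule set: the `Token` structure, the dependency labels (`nsubj`, `obj`), the hand-written parse of the sample sentence, and the type lookup tables are all assumptions made for illustration. The sketch first locates each predicate, then collects its subject and object, and finally maps the triple to coarse-grained entity and relation types.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Token:
    idx: int    # 1-based position in the sentence
    text: str   # surface form
    pos: str    # coarse part-of-speech tag
    head: int   # index of the syntactic head (0 means root)
    dep: str    # dependency label relative to the head


def extract_spo(tokens: List[Token]) -> List[Triple]:
    """Predicate-first extraction: find each verb, then its subject and object."""
    triples: List[Triple] = []
    for pred in (t for t in tokens if t.pos == "VERB"):
        subjects = [t for t in tokens if t.head == pred.idx and t.dep == "nsubj"]
        objects = [t for t in tokens if t.head == pred.idx and t.dep == "obj"]
        for s in subjects:
            for o in objects:
                triples.append((s.text, pred.text, o.text))
    return triples


def generalize(triple: Triple,
               entity_types: Dict[str, str],
               relation_types: Dict[str, str]) -> Triple:
    """Map a text-level triple to coarse-grained types (hypothetical lookups)."""
    s, p, o = triple
    return (entity_types.get(s, "Thing"),
            relation_types.get(p, "relatedTo"),
            entity_types.get(o, "Thing"))


# Hypothetical, hand-annotated parse of "人们 吃 粽子" ("people eat zongzi").
sentence = [
    Token(1, "人们", "NOUN", 2, "nsubj"),
    Token(2, "吃", "VERB", 0, "root"),
    Token(3, "粽子", "NOUN", 2, "obj"),
]

# Illustrative type tables; a real study would derive these from clustering.
entity_types = {"人们": "Person", "粽子": "Food"}
relation_types = {"吃": "consumes"}

triples = extract_spo(sentence)
concepts = [generalize(t, entity_types, relation_types) for t in triples]
print(triples)
print(concepts)
```

In practice the parse would come from a dependency parser rather than being written by hand, and the lookup tables stand in for the clustering and adjustment step that the abstract mentions.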
Source journal: ACM Journal on Computing and Cultural Heritage (Arts and Humanities-Conservation)
CiteScore: 4.60
Self-citation rate: 8.30%
Articles published: 90
About the journal: ACM Journal on Computing and Cultural Heritage (JOCCH) publishes papers of significant and lasting value in all areas relating to the use of information and communication technologies (ICT) in support of Cultural Heritage. The journal encourages the submission of manuscripts that demonstrate innovative use of technology for the discovery, analysis, interpretation and presentation of cultural material, as well as manuscripts that illustrate applications in the Cultural Heritage sector that challenge the computational technologies and suggest new research opportunities in computer science.