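A Joint Entity-Relation Detection and Generalization Method Based on Syntax and Semantic for Chinese Intangible Cultural Heritage Texts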

ACM Journal on Computing and Cultural Heritage | IF 2.2 | Zone 3, Computer Science | Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-11-02 | DOI: 10.1145/3631124

Yuyao Tan, Hao Wang, Zibo Zhao, Tao Fan

Citations: 0

Abstract

[Purpose/Significance] Annotating a natural language corpus not only helps researchers extract knowledge from it but also enables deeper mining of the corpus. However, annotated corpora are scarce in the humanities, and semantic annotation of humanities texts is difficult because it demands substantial domain knowledge from researchers and often the participation of domain experts. This study therefore proposes a method for detecting entities and relations in domains that lack annotated corpora, and offers a reference approach for constructing conceptual models from textual instances.

[Method/Process] Based on syntactic and semantic features, this study proposes SPO (subject-predicate-object) triple recognition rules that give priority to predicates, and generalization rules that consider a triple's content and the meaning of its predicate. The recognition rules extract predicate-centered SPO triples that describe the text. After the triples are clustered and adjusted, the generalization rules are applied to obtain coarse-grained entities and relations, which then form a conceptual model.

[Results/Conclusions] The method recognizes SPO triples from descriptive texts with high precision and a high degree of summarization, generalizes them, and forms a domain conceptual model. It offers a research approach for entity-relation detection in domains that lack annotated corpora, and the resulting domain conceptual model provides a reference for building a domain Linked Data Graph. The feasibility of the method is verified on texts about the four traditional Chinese festivals.
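The predicate-first extraction and generalization steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' rule set: the `Token` structure, the dependency labels `nsubj` and `obj`, the English toy sentences, and the `ENTITY_TYPES` and `RELATION_TYPES` lookup tables are all assumptions made for illustration, and the clustering and adjustment step is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int     # 1-based position in the sentence
    text: str    # surface form
    head: int    # index of the syntactic head, 0 for the root
    deprel: str  # dependency label (assumed labels: "nsubj", "obj", "root")


def extract_spo(tokens):
    """Predicate-first extraction: treat every token as a candidate predicate
    and emit (subject, predicate, object) for each subject/object pair it governs."""
    children = defaultdict(list)
    for tok in tokens:
        children[tok.head].append(tok)
    triples = []
    for pred in tokens:
        subjects = [c.text for c in children[pred.idx] if c.deprel == "nsubj"]
        objects = [c.text for c in children[pred.idx] if c.deprel == "obj"]
        for s in subjects:
            for o in objects:
                triples.append((s, pred.text, o))
    return triples


# Hypothetical generalization tables: instance -> coarse-grained concept.
ENTITY_TYPES = {"people": "Person", "zongzi": "Food", "mooncakes": "Food"}
RELATION_TYPES = {"eat": "consumes"}


def generalize(triples):
    """Map instance-level triples to coarse-grained (concept, relation, concept) triples."""
    schema = set()
    for s, p, o in triples:
        schema.add((
            ENTITY_TYPES.get(s, "Thing"),
            RELATION_TYPES.get(p, p),
            ENTITY_TYPES.get(o, "Thing"),
        ))
    return sorted(schema)


# Two hand-parsed toy sentences: "people eat zongzi", "people eat mooncakes".
sentences = [
    [Token(1, "people", 2, "nsubj"), Token(2, "eat", 0, "root"), Token(3, "zongzi", 2, "obj")],
    [Token(1, "people", 2, "nsubj"), Token(2, "eat", 0, "root"), Token(3, "mooncakes", 2, "obj")],
]

triples = [t for sent in sentences for t in extract_spo(sent)]
conceptual_model = generalize(triples)
print(triples)
print(conceptual_model)
```

In this sketch the two instance-level triples collapse into a single coarse-grained triple, which is the kind of reduction the generalization rules aim for; a real pipeline would take its parses from a Chinese dependency parser and derive the concept mappings from the clustered triples rather than from hand-written tables.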

Source journal

ACM Journal on Computing and Cultural Heritage (Arts and Humanities: Conservation)

CiteScore: 4.60

Self-citation rate: 8.30%

Journal description: ACM Journal on Computing and Cultural Heritage (JOCCH) publishes papers of significant and lasting value in all areas relating to the use of information and communication technologies (ICT) in support of Cultural Heritage. The journal encourages the submission of manuscripts that demonstrate innovative use of technology for the discovery, analysis, interpretation and presentation of cultural material, as well as manuscripts that illustrate applications in the Cultural Heritage sector that challenge the computational technologies and suggest new research opportunities in computer science.