A Joint Entity-Relation Detection and Generalization Method Based on Syntax and semantic for Chinese Intangible Cultural Heritage Texts
Yuyao Tan, Hao Wang, Zibo Zhao, Tao Fan
ACM Journal on Computing and Cultural Heritage, vol. 15, no. 4. Published 2023-11-02. DOI: 10.1145/3631124 (https://doi.org/10.1145/3631124)
[Purpose/Significance] Annotating a natural-language corpus not only helps researchers extract knowledge from it but also enables deeper mining of the corpus. However, annotated corpora in the humanities are scarce, and semantic annotation of humanities texts is difficult: it demands substantial domain background from researchers and often requires the participation of domain experts. To address this, this study proposes a method for detecting entities and relations in domains that lack annotated corpora, and offers a reusable approach for constructing conceptual models from textual instances.
[Method/Process] Drawing on syntactic and semantic features, this study proposes predicate-first recognition rules for subject-predicate-object (SPO) triples, together with generalization rules based on the content of each triple and the meaning of its predicate. The recognition rules extract predicate-centered SPO triples that describe the text. After the triples are clustered and adjusted, the generalization rules are applied to obtain coarse-grained entities and relations, which are then assembled into a conceptual model.
[Results/Conclusions] The method recognizes SPO triples from descriptive texts with high precision and a high degree of summarization, generalizes them, and builds a domain conceptual model from them. It offers a research approach to entity-relation detection in domains without annotated corpora, and the resulting conceptual model provides a reference for building a domain Linked Data Graph. The feasibility of the method is verified on texts about the four traditional Chinese festivals.
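The abstract names the two stages (predicate-first SPO recognition, then clustering and generalization) but does not spell out the rules, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes sentences are already word-segmented and dependency-parsed with an LTP-style label set (HED for the root predicate, SBV for subject, VOB for object, COO for a coordinated verb, ATT for an attributive modifier). The `Token` class, the `phrase`, `extract_spo` and `generalize` functions, and the `PREDICATE_CLASS` and `ENTITY_CLASS` lookup tables are all hypothetical, and the clustering step is replaced by simple grouping on a predicate lexicon.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Token:
    idx: int   # 1-based position in the sentence
    word: str  # segmented word
    head: int  # index of the syntactic head (0 = root)
    rel: str   # dependency label; LTP-style set assumed (HED, SBV, VOB, COO, ATT)


def phrase(tok, tokens):
    """Prepend direct attributive (ATT) modifiers, e.g. 中秋 + 月饼 -> 中秋月饼."""
    mods = [t.word for t in tokens if t.rel == "ATT" and t.head == tok.idx]
    return "".join(mods) + tok.word


def extract_spo(tokens):
    """Predicate-first extraction: locate predicates, then attach subjects and objects."""
    # Predicates: the root verb plus verbs coordinated with it (COO).
    pred_ids = {t.idx for t in tokens if t.rel == "HED"}
    for t in tokens:  # sentence order, so a chain of COO verbs is followed left to right
        if t.rel == "COO" and t.head in pred_ids:
            pred_ids.add(t.idx)

    subj_of = {}
    triples = []
    for p in (t for t in tokens if t.idx in pred_ids):
        subj = next((t for t in tokens if t.rel == "SBV" and t.head == p.idx), None)
        if subj is None and p.rel == "COO":
            subj = subj_of.get(p.head)  # coordinated verb inherits its head verb's subject
        subj_of[p.idx] = subj
        if subj is None:
            continue  # no subject found: skip rather than emit a partial triple
        for obj in (t for t in tokens if t.rel == "VOB" and t.head == p.idx):
            triples.append((phrase(subj, tokens), p.word, phrase(obj, tokens)))
    return triples


# Hypothetical lexicons standing in for the paper's generalization rules.
PREDICATE_CLASS = {"吃": "hasFoodCustom", "品尝": "hasFoodCustom", "赏": "hasActivity"}
ENTITY_CLASS = {"人们": "Participant", "中秋月饼": "Food", "月": "NaturalScene"}


def generalize(triples):
    """Group triples by coarse predicate class, then lift entities to coarse classes."""
    clusters = defaultdict(list)
    for s, p, o in triples:
        clusters[PREDICATE_CLASS.get(p, p)].append((s, p, o))
    model = set()
    for rel, members in clusters.items():
        for s, _, o in members:
            model.add((ENTITY_CLASS.get(s, "Thing"), rel, ENTITY_CLASS.get(o, "Thing")))
    return dict(clusters), model


if __name__ == "__main__":
    # 人们 吃 中秋 月饼 赏 月 ("people eat Mid-Autumn mooncakes and admire the moon"), hand-parsed
    sent = [
        Token(1, "人们", 2, "SBV"), Token(2, "吃", 0, "HED"), Token(3, "中秋", 4, "ATT"),
        Token(4, "月饼", 2, "VOB"), Token(5, "赏", 2, "COO"), Token(6, "月", 5, "VOB"),
    ]
    triples = extract_spo(sent)
    print(triples)  # → [('人们', '吃', '中秋月饼'), ('人们', '赏', '月')]
    clusters, model = generalize(triples)
    print(sorted(model))
    # → [('Participant', 'hasActivity', 'NaturalScene'), ('Participant', 'hasFoodCustom', 'Food')]
```

Anchoring every triple on a verb reflects the predicate-first framing: the second verb 赏 has no subject of its own and inherits 人们 through the COO link. A fuller system would presumably need recursive modifier expansion, handling of passive and pivotal constructions, and data-driven clustering in place of a fixed lexicon.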
About the journal:
ACM Journal on Computing and Cultural Heritage (JOCCH) publishes papers of significant and lasting value in all areas relating to the use of information and communication technologies (ICT) in support of Cultural Heritage. The journal encourages the submission of manuscripts that demonstrate innovative use of technology for the discovery, analysis, interpretation and presentation of cultural material, as well as manuscripts that illustrate applications in the Cultural Heritage sector that challenge the computational technologies and suggest new research opportunities in computer science.