{"title":"汉语篇章的浅句法分析","authors":"Chuqiao Yu, Igor Bessmertny","doi":"10.1109/CIACT.2017.7977287","DOIUrl":null,"url":null,"abstract":"The paper considers a problem of automatic processing of natural language Chinese texts. One of the pressing tasks in this area is automatic fact acquisition from text documents by a query because existing automatic translators are useless at this task. The goal of the work is direct extraction of facts from the text in the original language without its translation. The suggested approach consists of syntactic analysis of sentences with subsequent matching of parts of speech found with a formalized query in the form of subject-object-predicate. A distinctive feature of the proposed algorithm of syntactic analysis is the absence of phase of segmentation into words for the sequence of hieroglyphs that make up the sentences. The bottleneck at this task is a dictionary because the correct interpretation of a phrase can be impossible when a word is absent in the dictionary. To eliminate this problem, we propose to identify a sentence model by function words while limitedness of the dictionary could be compensated by an automatic building of a subject area thesaurus and a dictionary of common words using statistical processing of a document corpus. The suggested approach was approved on a small topic area with a limited dictionary where it demonstrates its robustness. The analysis of temporal characteristics of the developed algorithm was carried out as well. As the proposed algorithm uses a naive inference, the parsing speed at real tasks could be unacceptable low, and this should become a subject for further research.","PeriodicalId":218079,"journal":{"name":"2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Shallow syntactic analysis of Chinese texts\",\"authors\":\"Chuqiao Yu, Igor Bessmertny\",\"doi\":\"10.1109/CIACT.2017.7977287\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The paper considers a problem of automatic processing of natural language Chinese texts. One of the pressing tasks in this area is automatic fact acquisition from text documents by a query because existing automatic translators are useless at this task. The goal of the work is direct extraction of facts from the text in the original language without its translation. The suggested approach consists of syntactic analysis of sentences with subsequent matching of parts of speech found with a formalized query in the form of subject-object-predicate. A distinctive feature of the proposed algorithm of syntactic analysis is the absence of phase of segmentation into words for the sequence of hieroglyphs that make up the sentences. The bottleneck at this task is a dictionary because the correct interpretation of a phrase can be impossible when a word is absent in the dictionary. To eliminate this problem, we propose to identify a sentence model by function words while limitedness of the dictionary could be compensated by an automatic building of a subject area thesaurus and a dictionary of common words using statistical processing of a document corpus. The suggested approach was approved on a small topic area with a limited dictionary where it demonstrates its robustness. 
The analysis of temporal characteristics of the developed algorithm was carried out as well. As the proposed algorithm uses a naive inference, the parsing speed at real tasks could be unacceptable low, and this should become a subject for further research.\",\"PeriodicalId\":218079,\"journal\":{\"name\":\"2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CIACT.2017.7977287\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIACT.2017.7977287","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

The paper addresses automatic processing of natural-language Chinese texts. One of the pressing tasks in this area is automatic fact acquisition from text documents in response to a query, a task for which existing machine translators are of little use. The goal of the work is to extract facts directly from text in the original language, without translating it. The suggested approach combines syntactic analysis of sentences with subsequent matching of the identified parts of speech against a formalized query in subject-object-predicate form. A distinctive feature of the proposed syntactic analysis algorithm is that it has no word-segmentation phase for the sequence of Chinese characters that makes up a sentence. The bottleneck in this task is the dictionary, because a phrase may be impossible to interpret correctly when a word is missing from the dictionary. To address this problem, we propose to identify a sentence model by its function words, while the limited coverage of the dictionary is compensated by automatically building a subject-area thesaurus and a dictionary of common words through statistical processing of a document corpus. The suggested approach was validated on a small topic area with a limited dictionary, where it demonstrated its robustness. The temporal characteristics of the developed algorithm were analyzed as well. Since the proposed algorithm uses naive inference, its parsing speed on real tasks could be unacceptably low; this should become a subject of further research.
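To make the idea concrete, below is a minimal, hypothetical Python sketch of segmentation-free matching of a subject-predicate-object (SPO) query against a Chinese sentence, anchored on function words. It is not the authors' implementation: the function-word list, the splitting rule, and the containment-based matching are illustrative assumptions only.

```python
# Hypothetical sketch: match an SPO query against a raw Chinese sentence
# without word segmentation, splitting the character sequence at a function word.
# NOT the authors' implementation; function words and rules are assumptions.

from typing import Optional, Tuple

# A few common copular / prepositional function words used as split points
# (assumed for illustration; the paper derives sentence models differently).
FUNCTION_WORDS = ["是", "在", "有", "属于"]


def split_by_function_word(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Split a raw character sequence into (left span, function word, right span)
    at the first listed function word that occurs inside the sentence."""
    for fw in FUNCTION_WORDS:
        idx = sentence.find(fw)
        if idx > 0:
            return sentence[:idx], fw, sentence[idx + len(fw):].rstrip("。")
    return None


def matches_query(sentence: str, subject: str, predicate: str, obj: str) -> bool:
    """Check whether the sentence supports the SPO query: the query predicate is
    matched against the function word, subject/object against the spans around it."""
    parts = split_by_function_word(sentence)
    if parts is None:
        return False
    left, fw, right = parts
    return predicate == fw and subject in left and obj in right


if __name__ == "__main__":
    # Example fact check: "北京是中国的首都" against the query (北京, 是, 首都)
    print(matches_query("北京是中国的首都。", "北京", "是", "首都"))  # True
```

In the paper's approach, the sentence models, the subject-area thesaurus, and the common-word dictionary are built statistically from a document corpus; the sketch only illustrates why no word-segmentation phase is needed when matching is anchored on function words.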