
2014 International Conference on Asian Language Processing (IALP): Latest Papers

The formation of modern Chinese imperative sentence combining
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973480
Hao Zhao, Shuqin Shi
A comprehensive understanding of the essence of human imperative behavior and a thorough study of imperative sentences require researchers to shift their focus from imperative sentences to imperative sentence combining. This study examines the linking patterns and manners between or among sentences within imperative sentence combining. The linking patterns comprise the linking between imperative sentences and other kinds of sentences and the linking within imperative sentences. The former is achieved through logical semantic relations and in a simple manner, while the latter involves marked and unmarked linking, the manners of which vary.
Citations: 0
An acoustic analysis of English monophthongs by Tibetan speakers
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973495
Shourong An, Hui Feng, Huixia Wang, J. Dang
This paper, within the framework of Flege's (1995) Speech Learning Model, investigates the acoustic features of English monophthongs produced by college students who speak Tibetan (Lhasa). These features are compared with those of British news broadcasters (Deterding, 1997). The study found that: 1. Under the influence of the Tibetan vowel system, the English vowel space of Tibetan speakers is centralized and smaller than that of RP (Received Pronunciation). 2. Female Tibetan speakers do not distinguish the vowel contrast /e/ and /æ/ very well. In their English vowels, /i:/ is lower than in RP, /i/ and /a:/ are more front than RP /i/ and /a:/, and /u:/ is less front than RP /u:/. Male Tibetan speakers differentiate /e/ and /æ/ successfully. In their English vowels, /i:/ is less front than RP /i:/, /a:/ is much more front than RP /a:/, and /u:/ is less front and lower than RP /u:/.
Citations: 0
A prosodic analysis of insertion repair at transition space in Chinese conversation
Pub Date : 2014-12-03 DOI: 10.1109/IALP.2014.6973515
Wei Zhang
This paper examines the prosodic cues of a particular type of self-repair, namely, insertion repair at transition place in Mandarin Chinese. Studies on the prosodic cues of self-repair have mostly examined repair types such as repetition and replacement initiated before a turn unit's completion. Studies on further talk past a turn unit's completion have observed different prosodic packaging for further talk constructed as a new separate unit and further talk constructed as a syntactic extension of the preceding unit. Our study of insertion repair produced past a turn unit's completion examined prosodic cues such as speech rate, F0 and intensity of the repair-target unit, the repairing unit and the inserted material in the repairing unit. The results show that the repairing unit is generally produced with quickened tempo, reduced pitch and amplitude, and less energy. The results also show that the inserted material in the repairing unit tends to receive prominence in terms of intensity.
Citations: 1
Automatic processing of Chinese special structure ‘Ba-construction’
Pub Date : 2014-10-01 DOI: 10.1109/IALP.2014.6973486
Yiyi Zhao
The Ba-construction is a special syntactic structure in modern Chinese. This paper gives a short summary of these topics and extracts 500 sentences containing the Ba-construction from CCRL. After a detailed analysis of the samples' phrase structure, the author builds CFG-based rules for computer processing. These rules are tested with CTT, a parsing tree tracer. The author also points out the problems that exist in the CFG rules and the direction of further improvement.
Citations: 0
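As a rough illustration of what testing CFG rules on a Ba-construction sentence can look like, the sketch below runs a CYK recognizer over a toy grammar in Chomsky normal form. The grammar, the lexicon, the pre-segmented example sentence and the function name `recognizes` are all hypothetical; they are not the rules built in the paper, and the paper's own tool (CTT) is not reproduced here.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (NOT the paper's rules).
# Each pair of child categories maps to one parent category.
BINARY_RULES = {
    ("NP", "VP"): "S",     # sentence = subject + predicate
    ("BAP", "VC"): "VP",   # predicate = ba-phrase + verb complex
    ("BA", "NP"): "BAP",   # ba-phrase = "把" + fronted object
    ("V", "ASP"): "VC",    # verb complex = verb + aspect marker
}

# Hypothetical lexicon for one pre-segmented example sentence.
LEXICON = {
    "我": {"NP"},
    "书": {"NP"},
    "把": {"BA"},
    "看完": {"V"},
    "了": {"ASP"},
}

def recognizes(tokens, start="S"):
    """Return True if the toy grammar derives the token sequence (CYK)."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][k] holds the categories that span tokens[i:k].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][i + 1] = set(LEXICON.get(tok, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            k = i + span
            for j in range(i + 1, k):
                for left, right in product(table[i][j], table[j][k]):
                    parent = BINARY_RULES.get((left, right))
                    if parent is not None:
                        table[i][k].add(parent)
    return start in table[0][n]

if __name__ == "__main__":
    print(recognizes(["我", "把", "书", "看完", "了"]))
    print(recognizes(["我", "书", "把", "看完", "了"]))
```

A recognizer like this only answers whether the rules accept a sentence; inspecting which cells of the table stay empty is one simple way to see where a rule set fails to cover a sample.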
Use of glottal inverse filtering for asthma and HIE infant cries classification
Pub Date : 2014-10-01 DOI: 10.1109/IALP.2014.6973505
Anshu Chittora, H. Patil
In this paper, features derived from glottal inverse filtering (GIF) of the speech signal are used to classify pathological infant cries. GIF is used to estimate the glottal volume velocity waveform (i.e., the source of voicing for infant cry). Here, GIF is used to separate the glottal source and the vocal tract filter. The source and filter features are used for pathological cry classification. The experimental results are used to investigate the importance of both features in cry classification. A state-of-the-art feature set, viz., Mel Frequency Cepstral Coefficients (MFCC), is also used for comparison with the proposed feature set. Experimental results show a classification accuracy of 76.28% with the proposed features, as opposed to 71.13% with the state-of-the-art MFCC features. Fusing the proposed feature set with MFCC gives a classification accuracy of 78.35%, indicating that the proposed features capture complementary information in the infant cry signal. All experiments were conducted with an SVM classifier with a radial basis function kernel.
Citations: 1
Discovering linguistic knowledge by converting printed dictionaries of minority languages into machine readable dictionaries
Pub Date : 2014-10-01 DOI: 10.1109/IALP.2014.6973522
Bali Ranaivo-Malançon, Suhaila Saee, Jennifer Fiona Wilfred Busu
The goal of the project presented in this paper is to explore the linguistic knowledge hidden in printed dictionaries of minority languages. First, the printed dictionary has to be converted into a machine readable dictionary. The second step is to use existing language processing tools to discover the hidden knowledge. To illustrate the proposed idea, a version of an English-Penan dictionary is used as a case study. It appears that even with a small amount of data, some interesting information can be discovered, such as a first list of functional words, some collocations, and insight into the morphological structure of the Penan language.
Citations: 3
“Wanyi···Ye···”
Pub Date : 2014-10-01 DOI: 10.1109/IALP.2014.6973469
Shuzhen Shi, Pu Li
This paper discusses the concessive compact construction “wanyi···ye···” from the perspective of Chinese information processing. The simple sentence with “wanyi” and “ye” and the concessive compact construction “wanyi···ye···” are similar in syntax, so the two are distinguished first. The semantic feature of the concessive compact construction “wanyi···ye···” is subjectivity, which is revealed in different ways.
Citations: 0
A comparative study on extractive speech summarization of broadcast news and parliamentary meeting speech
Pub Date : 2014-10-01 DOI: 10.1109/IALP.2014.6973497
Jian Zhang, Huaqiang Yuan
We carry out a comprehensive study of acoustic/prosodic, linguistic and structural features for speech summarization, contrasting two genres of speech, namely Mandarin Broadcast News and Cantonese Parliamentary Speech. We find that structural features are superior to acoustic and lexical features when summarizing broadcast news, because within the same Mandarin broadcast program the distribution and flow of summary utterances are relatively consistent. We use different machine learning algorithms to construct binary-class summarizers that select the best features for extractive summarization, and obtain state-of-the-art performance: a ROUGE-L F-measure of 0.682 for Mandarin Broadcast News and 0.737 for Cantonese Parliamentary Meeting Speech. For Parliamentary Meeting Speech summarization, we show that our summarizer performed surprisingly well, with a ROUGE-L F-measure of 0.729 using ASR transcription despite a character error rate of 27%. We also find that the choice of algorithm has almost no effect on the consistency of our findings.
Citations: 3
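The summarization abstract above reports its results as ROUGE-L F-measures. For readers unfamiliar with the metric, here is a minimal sketch of a sequence-level ROUGE-L score based on the longest common subsequence (LCS). It is a simplified single-reference variant; the function names, the example sentences and the choice of `beta=1.0` are assumptions for illustration, and the abstract does not state which ROUGE configuration the authors used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f(candidate, reference, beta=1.0):
    """Sequence-level ROUGE-L F-measure for one candidate and one reference.

    beta weights recall relative to precision; beta=1.0 is an assumption
    made for this sketch, not a setting taken from the paper.
    """
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (recall + b2 * precision)

if __name__ == "__main__":
    cand = "the cat sat on the mat".split()
    ref = "the cat is on the mat".split()
    print(round(rouge_l_f(cand, ref), 3))
```

In the example, five of the six tokens form a common subsequence, so precision and recall are both 5/6 and the score is about 0.833. For Chinese text the token lists would typically be characters or segmented words rather than whitespace-split strings.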
An approach for automatically structuring Vietnamese legal text
Pub Date : 2014-10-01 DOI: 10.1109/IALP.2014.6973500
Thinh D. Bui, Quoc B. Ho
Recognizing the structures of legal texts is important for understanding texts in this domain. In this paper, we describe a task of automatically structuring Vietnamese legal documents, based on our study of several aspects of the domain: their linguistic features and recognition patterns with trigger sets. The task focuses on recognizing the logical structures of the documents stored in the law database of the Vietnam Ministry of Justice. A rule-based approach combined with some patterns is applied and verified on a manually built corpus. Experiments on a corpus of Vietnamese Enterprise Law articles yielded Fβ=1 scores of 64.37% for assumption annotation, 64.15% for provision annotation and 75.76% for sanction annotation.
Citations: 6
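The legal-text abstract above reports Fβ=1 scores per annotation type. As a reminder of how such a score is derived from annotation counts, the sketch below computes the general F-beta measure from true positives, false positives and false negatives. The function name and the counts in the example are made up for illustration and are not taken from the paper.

```python
def f_beta(true_pos, false_pos, false_neg, beta=1.0):
    """F-beta score from raw counts; beta=1.0 gives the usual F1."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

if __name__ == "__main__":
    # Hypothetical counts: 6 correct annotations, 2 spurious, 4 missed.
    print(round(f_beta(6, 2, 4), 4))
```

With beta set to 1, precision and recall are weighted equally, which is the setting the Fβ=1 notation refers to.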
Named entity recognition in Assamese using CRFs and rules
Pub Date : 2014-10-01 DOI: 10.1109/IALP.2014.6973498
Padmaja Sharma, U. Sharma, J. Kalita
Named Entity Recognition (NER) is an important task in all Natural Language Processing (NLP) applications. It is the process of identifying proper nouns and classifying them into classes such as person, location, organization and miscellaneous. Substantial work has been done on English and other European languages, achieving greater accuracy than for Indian languages. Although NER in Indian languages is a difficult and challenging task and suffers from a scarcity of resources, such work has started to appear recently. This paper discusses work on NER in Assamese using both Conditional Random Fields and a rule-based approach, which gives an F-measure of 90-95%.
Citations: 9