2014 International Conference on Asian Language Processing (IALP): Latest Papers

Vocal tract length normalization for vowel recognition in low resource languages
Pub Date : 2014-10-01 DOI: 10.1109/IALP.2014.6973516
Shubham Sharma, Maulik C. Madhavi, H. Patil
Vocal Tract Length Normalization (VTLN) is used to build Automatic Speech Recognition (ASR) systems that are normalized for vocal tract length. It improves ASR performance by accounting for physiological differences among speakers. A number of speech recognition applications are currently being developed for Indian languages. In this paper, we use a state-of-the-art VTLN method based on the maximum-likelihood approach. A vowel recognition system has been developed for two low-resource Indian languages, viz., Gujarati and Marathi. Appropriate warping factors have been obtained for all speakers used in training and testing. Vowel recognition performance improves compared with state-of-the-art Mel Frequency Cepstral Coefficients (MFCC).
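Maximum-likelihood VTLN is commonly run as a grid search: each candidate warping factor is used to warp the frequency axis, and the factor whose features score highest under the acoustic model is kept for that speaker. The following is a minimal sketch of that idea, not the authors' implementation. The piecewise-linear warp, the 0.85 breakpoint, the 0.88 to 1.12 grid, and the toy scoring function are all assumptions made for illustration; the abstract does not specify any of them.

```python
from typing import Callable, Iterable


def warp_frequency(freq_hz: float, alpha: float, nyquist_hz: float,
                   breakpoint_ratio: float = 0.85) -> float:
    """Piecewise-linear frequency warp (an assumed, commonly used form).

    Below the breakpoint the frequency is scaled by alpha; above it a second
    linear segment is used so that the Nyquist frequency maps to itself.
    """
    knee = breakpoint_ratio * nyquist_hz
    if freq_hz <= knee:
        return alpha * freq_hz
    slope = (nyquist_hz - alpha * knee) / (nyquist_hz - knee)
    return alpha * knee + slope * (freq_hz - knee)


def select_warp_factor(log_likelihood: Callable[[float], float],
                       candidates: Iterable[float]) -> float:
    """Return the candidate warping factor with the highest log-likelihood."""
    return max(candidates, key=log_likelihood)


def toy_log_likelihood(alpha: float) -> float:
    """Hypothetical stand-in for scoring warped features with an acoustic model."""
    return -(alpha - 1.04) ** 2


# Assumed search grid: 0.88, 0.90, ..., 1.12
candidates = [round(0.88 + 0.02 * k, 2) for k in range(13)]
best_alpha = select_warp_factor(toy_log_likelihood, candidates)
```

In a real system `toy_log_likelihood` would be replaced by a function that extracts features with the warped filterbank and scores them against the speaker's transcription.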
Citations: 1
Extracting parallel phrases from comparable corpora
Pub Date : 2014-10-01 DOI: 10.1109/IALP.2014.6973501
Jiexin Zhang, Hailong Cao, T. Zhao
State-of-the-art statistical machine translation (SMT) models are trained on parallel corpora. However, traditional SMT loses its power for language pairs with few bilingual resources. This paper proposes a novel method that treats phrase extraction as a classification task. We first automatically generate the training and testing phrase pairs for the classifier. Then, we train an SVM classifier that determines whether a phrase pair is parallel or non-parallel. The proposed approach is evaluated on a Chinese-English translation task. Experimental results show that the precision of the classifier on the test sets is above 70% and the accuracy is above 98%. The quality of the extracted data is also evaluated by measuring its impact on the performance of a state-of-the-art SMT system built with a small parallel corpus. It shows better results than the baseline system.
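Precision and accuracy measure different things, which is why the two figures reported above can sit far apart. One way such a gap can arise is class imbalance, with many more non-parallel than parallel candidate pairs; whether that holds for this paper's test sets is an assumption. The sketch below computes both metrics from confusion counts. The counts are hypothetical, chosen only to illustrate the arithmetic, and are not results from the paper.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of pairs predicted parallel that are truly parallel."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all pairs (parallel and non-parallel) classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical confusion counts for an imbalanced test set (not from the paper):
# 90 truly parallel pairs among 10,000 candidates.
tp, fp, tn, fn = 70, 30, 9880, 20

p = precision(tp, fp)          # 70 / 100
a = accuracy(tp, fp, tn, fn)   # 9950 / 10000
```

With these illustrative counts the large pool of correctly rejected non-parallel pairs dominates accuracy, while precision depends only on the pairs the classifier accepted.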
Citations: 1
A spectral transition measure based MELCEPSTRAL features for obstruent detection
Pub Date : 2014-10-01 DOI: 10.1109/IALP.2014.6973511
Bhavik B. Vachhani, Kewal D. Malde, Maulik C. Madhavi, H. Patil
Obstruents are key landmark events in the speech signal. In this paper, we propose using the spectral transition measure (STM) to locate obstruents in continuous speech. The proposed approach does not take into account any prior information (such as the phonetic sequence, the speech transcription, or the number of obstruents in the speech). Hence, the approach is unsupervised and unconstrained. We propose using state-of-the-art Mel Frequency Cepstral Coefficient (MFCC)-based features to capture spectral transitions for the obstruent detection task. More spectral transition is expected in the vicinity of obstruents. The entire experimental setup is developed on the TIMIT database. The detection efficiency and estimated probability are around 77% and 0.77, respectively (with a 30 ms agreement duration and a 0.4 STM threshold).
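A widely used formulation of the spectral transition measure takes, for each frame, the mean squared regression slope of every cepstral coefficient over a short window of neighbouring frames. The sketch below implements that formulation in plain Python. It is an assumption that the paper uses this exact form; the window half-width of 2 frames and the edge handling (repeating the first and last frame) are illustrative choices, and MFCC extraction itself is left out.

```python
from typing import List, Sequence


def spectral_transition_measure(cepstra: Sequence[Sequence[float]],
                                half_window: int = 2) -> List[float]:
    """Per-frame STM from a (frames x coefficients) cepstral sequence.

    For each frame m and coefficient i, a linear-regression slope is fitted
    over frames m - half_window .. m + half_window; STM(m) is the mean of the
    squared slopes across coefficients.
    """
    if half_window < 1:
        raise ValueError("half_window must be at least 1")
    num_frames = len(cepstra)
    if num_frames == 0:
        return []
    num_coeffs = len(cepstra[0])
    offsets = range(-half_window, half_window + 1)
    denom = sum(n * n for n in offsets)

    stm = []
    for m in range(num_frames):
        total = 0.0
        for i in range(num_coeffs):
            # Clamp indices so edge frames reuse the first/last frame.
            slope = sum(
                n * cepstra[min(max(m + n, 0), num_frames - 1)][i]
                for n in offsets
            ) / denom
            total += slope * slope
        stm.append(total / num_coeffs)
    return stm


# Synthetic check: coefficients that rise by 1 per frame have slope 1,
# so interior frames get an STM of 1; constant coefficients give 0.
ramp = [[float(t)] * 3 for t in range(10)]
ramp_stm = spectral_transition_measure(ramp)
flat_stm = spectral_transition_measure([[1.0, 2.0, 3.0]] * 10)
```

Candidate obstruent locations would then be taken where the (suitably normalized) STM contour exceeds a threshold, such as the 0.4 value mentioned above; the normalization used in the paper is not stated in the abstract.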
Citations: 2
Development of language resources for speech application in Gujarati and Marathi
Pub Date : 2014-10-01 DOI: 10.1109/IALP.2014.6973517
Maulik C. Madhavi, Shubham Sharma, H. Patil
This paper discusses the development of resources, from linguistic and signal processing perspectives, for two low-resource Indian languages, viz., Gujarati and Marathi. The speech resource development covers the details of data collection, transcription at the phone and syllable levels, and the corresponding linguistic units such as phones and syllables. To analyze performance at different fluency levels, three recording modes, viz., read, conversation and lecture, are considered. Manual annotation of speech in terms of International Phonetic Alphabet (IPA) symbols is presented. In the later section, we discuss speech segmentation at the syllable level and prosodic-level marking (pitch marking). The short-term energy contour is smoothed using a group-delay-based algorithm in order to detect syllable units in the speech signal. The detection rate obtained for syllable marking within 20% agreement duration is of the order of 60% for read-mode speech. Prosodic pitch marks are analyzed via the F0 pattern of the speech signal. The key strength of this study is the analysis across different recording modes, viz., read, conversation and lecture. It is found that CV syllables (a consonant followed by a vowel) have the highest occurrence (more than 50%) in both languages. Read speech is observed to perform better than spontaneous speech in terms of automatic prosodic marking.
Citations: 12
Acoustic features of Mandarin monophthongs by Tibetan speakers
Pub Date : 2014-10-01 DOI: 10.1109/IALP.2014.6973503
Lu Zhao, Hui Feng, Huixia Wang, J. Dang
The present study, under the framework of Lado's Contrastive Analysis Hypothesis (1957) and Flege's Speech Learning Model (1995), aims to find out whether and to what extent the Tibetan vowel space of native Tibetan speakers affects the “working” vowel space of Mandarin Chinese. The experiment adopts an experimental phonetic approach to examine the vowel space of 10 Tibetan speakers (5 male and 5 female) reading monosyllabic words in Tibetan and Standard Chinese. Compared with the vowel space of Chinese speakers, the vowel space of Tibetan speakers presents the following features: 1. The overall distribution of the vowel space of Tibetan speakers is higher than that of Chinese speakers because the vowels produced by Tibetan speakers have lower F1. 2. Under the influence of the Tibetan vowel system, Tibetan speakers' vowel space of Mandarin monophthongs is slightly to the right of Chinese speakers' vowel space. 3. Based on the Euclidean distance between Tibetan speakers' monophthongs and Chinese speakers' monophthongs, the production of Mandarin monophthongs by Tibetan male speakers cannot be explained by Flege's Speech Learning Model, while the production of Mandarin monophthongs by Tibetan female speakers provides further evidence in support of the Speech Learning Model.
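The Euclidean distance in point 3 can be sketched as the straight-line distance between two vowels plotted in formant space. The example below assumes a two-dimensional (F1, F2) space measured in Hz; the abstract does not say which dimensions or frequency scale the study uses. The formant values are hypothetical placeholders, not measurements from the study.

```python
import math
from typing import Tuple

Formants = Tuple[float, float]  # (F1, F2), assumed here to be in Hz


def formant_distance(vowel_a: Formants, vowel_b: Formants) -> float:
    """Euclidean distance between two vowels in the (F1, F2) plane."""
    return math.hypot(vowel_a[0] - vowel_b[0], vowel_a[1] - vowel_b[1])


# Hypothetical placeholder values for the same vowel category produced by
# two speaker groups (not data from the paper).
group_one_vowel = (300.0, 2300.0)
group_two_vowel = (330.0, 2260.0)

distance = formant_distance(group_one_vowel, group_two_vowel)
```

A smaller distance means the two groups realize the vowel more similarly; comparing such distances across vowels and speaker groups is what lets the predictions of the Speech Learning Model be checked.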
Citations: 5
Automatic acquisition of morphological resources for Melanau language
Pub Date : 2014-10-01 DOI: 10.1109/IALP.2014.6973523
Suhaila Saee, Lay-Ki Soon, T. Lim, Bali Ranaivo-Malançon, J. Juk, E. Tang
Computational morphological resources are a crucial component for providing the morphological information needed to create a morphological analyser. Acquiring morphological resources manually requires two main components, preprocessing and morphology induction, which raise two issues from the perspective of under-resourced languages: i) the process is time-consuming and ii) there is ambiguity in managing the resources. To overcome these issues, we propose a tool for the automatic acquisition of morphological resources, which extends the manual approach. The proposed tool has three main modules: i) tokenization, to tokenise a raw text and generate a wordlist; ii) conversion, to convert a softcopy of morphological resources into the required formats; and iii) integration of segmentation tools, to integrate two established segmentation tools, namely Linguistica and Morfessor, for obtaining morphological information from the generated wordlist. Two testing methods were used: component testing and integration testing. The results show that the proposed tool works effectively and allows linguists to obtain their wordlist and segmented data easily. We believe the proposed tool will help other researchers construct computational morphological resources for under-resourced languages in an automated way.
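The tokenization module described above turns raw text into a wordlist that segmentation tools can consume. A minimal sketch of that step follows. The function name, the lowercasing, the token pattern, and the (word, frequency) output shape are assumptions for illustration; the abstract does not describe the tool's actual rules or formats. The sample sentence is English placeholder text, not Melanau data.

```python
import re
from collections import Counter
from typing import List, Tuple

# Runs of letters, optionally joined by internal hyphens or apostrophes.
# Digits, underscores, and punctuation are treated as separators.
TOKEN_PATTERN = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def build_wordlist(raw_text: str) -> List[Tuple[str, int]]:
    """Tokenise raw text and return (word, frequency) pairs.

    Pairs are sorted by descending frequency, then alphabetically, so the
    output is deterministic.
    """
    tokens = TOKEN_PATTERN.findall(raw_text.lower())
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Placeholder input (not Melanau text).
wordlist = build_wordlist("The cat saw the cat.")
```

The resulting wordlist could then be written out in whatever format a segmentation tool such as Morfessor or Linguistica expects; that conversion is a separate module in the description above and is not sketched here.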
Citations: 0
Local phrase reordering model for complicated Chinese NPs in patent Chinese-English machine translation
Pub Date : 2014-10-01 DOI: 10.1109/IALP.2014.6973496
Xiaodie Liu, Yun Zhu, Yaohong Jin
We focus on when and how to reorder complicated Chinese NPs with two, three, four or five semantic units, where semantic units are the smallest chunks for reordering in Chinese-English machine translation. By analyzing the clear parallels and striking distinctions between complicated Chinese NPs and their English counterparts, we built 17 formalized rules that identify the boundaries of semantic units using Boundary-Words deduced from semantic features, in order to recognize what to reorder, and we developed a strategy for reordering the internal structure of complicated Chinese NPs when they are translated into English. Finally, we used a rule-based MT system to test our work, and the experimental results showed that our strategy and rule-based method were very efficient.
Citations: 0