
Latest publications from the 2014 International Conference on Asian Language Processing (IALP)

Logical operative processes of semantic grammar for machine interpretation
Pub Date: 2014-12-04 DOI: 10.1109/IALP.2014.6973487
Sivakumar Ramakrishnan, Pradeep Isawasan, V. Mohanan
The purpose of this paper is to identify and reveal the significance of the primary logical operative processes of the semantic grammar of any language for establishing machine interpretation. This neo-generative mechanism for logical semantic representation in machine interpretation is systematically analyzed through logical-linguistic and mathematical postulations. These logical operative processes structurally provide a way in which the grammatical properties of a language can be treated within a framework of speech acts, so as to accommodate and ease machine interpretation for ontological representation and cognitive acts. This treatment also allows sentences to be semantically interpreted and hermeneutically analyzed within the temporal movement of the speech act for machine interpretation. The logical postulation of the operative processes of grammar makes it possible to explain the grammatical intuitions of a native speaker in terms of both a variety of cognitive operations and knowledge of distinct object categories, to be applied in machine interpretation.
Citations: 0
Sentiment classification using Enhanced Contextual Valence Shifters
Pub Date: 2014-12-04 DOI: 10.1109/IALP.2014.6973485
V. Phu, Phan Thi Tuoi
We have explored different methods of improving the accuracy of sentiment classification. The sentiment orientation of a document can be positive (+), negative (-), or neutral (0). We combine five dictionaries from [2, 3, 4, 5, 6] into a new one with 21,137 entries. The new dictionary contains many verbs, adverbs, phrases, and idioms that are not in the five earlier ones. The paper shows that our proposed method, based on the combination of the Term-Counting method and the Enhanced Contextual Valence Shifters method, improves the accuracy of sentiment classification. The combined method achieves 68.984% accuracy on the test dataset and 69.224% on the training dataset. All of these methods are implemented to classify reviews based on our new dictionary and the Internet Movie dataset.
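As a rough illustration of how term counting can interact with contextual valence shifters, here is a minimal sketch in Python; the lexicon, shifter lists, and weights are invented placeholders, not the paper's 21,137-entry dictionary or its exact scoring rules.

```python
# Minimal sketch of term counting with contextual valence shifting.
# The dictionary entries and shifter lists are illustrative placeholders.

SENTIMENT = {"good": 1.0, "excellent": 1.0, "bad": -1.0, "boring": -1.0}
NEGATORS = {"not", "never", "no"}               # flip polarity of the next term
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}  # scale the next term

def classify(tokens):
    """Return +, -, or 0 for a tokenized review."""
    score, flip, weight = 0.0, False, 1.0
    for tok in tokens:
        if tok in NEGATORS:
            flip = True
            continue
        if tok in INTENSIFIERS:
            weight = INTENSIFIERS[tok]
            continue
        if tok in SENTIMENT:
            value = SENTIMENT[tok] * weight
            score += -value if flip else value
        flip, weight = False, 1.0   # shifters only affect the next term
    return "+" if score > 0 else "-" if score < 0 else "0"

print(classify("the movie was not good".split()))    # -
print(classify("an extremely boring plot".split()))  # -
print(classify("a very good film".split()))          # +
```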
Citations: 40
Designing an Indonesian part of speech tagset and manually tagged Indonesian corpus
Pub Date: 2014-12-04 DOI: 10.1109/IALP.2014.6973519
A. Dinakaramani, Rashel Fam, A. Luthfi, R. Manurung
We describe our work on designing a linguistically principled part-of-speech (POS) tagset for the Indonesian language. The process involves a detailed study and analysis of existing tagsets and the manual tagging of an Indonesian corpus. The results of this work are an Indonesian POS tagset consisting of 23 tags and an Indonesian corpus of over 250,000 lexical tokens that has been manually tagged using this tagset.
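For readers who want to work with a corpus of this kind, the sketch below shows one plausible storage format and loader; the token/tag TSV layout and the example tags are assumptions for illustration, not the published 23-tag tagset.

```python
# Minimal sketch of loading a manually tagged corpus stored one token per
# line as "token<TAB>tag", with blank lines separating sentences. The file
# format and example tags are illustrative assumptions.

import os
import tempfile

def load_tagged_corpus(path):
    """Yield sentences as lists of (token, tag) pairs."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                 # blank line = sentence boundary
                if sentence:
                    yield sentence
                sentence = []
            else:
                token, tag = line.split("\t")
                sentence.append((token, tag))
    if sentence:
        yield sentence

# Tiny usage example with a temporary file:
sample = "Saya\tPRP\nmakan\tVB\nnasi\tNN\n\n"
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write(sample)
for sent in load_tagged_corpus(tmp.name):
    print(sent)     # [('Saya', 'PRP'), ('makan', 'VB'), ('nasi', 'NN')]
os.unlink(tmp.name)
```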
Citations: 80
Category-associated collocative concept primitives extraction
Pub Date: 2014-12-04 DOI: 10.1109/IALP.2014.6973475
Zhejie Chi, Quan Zhang
Collocation is studied as an essential linguistic phenomenon in traditional natural language processing. Similarly, collocative concept primitives are introduced in the HNC Concept Primitive Space to represent concept primitive pairs that co-occur frequently. Collocative concept primitives can be studied together with categories, as concept primitives usually contain category information. To explore the collocation phenomenon in the field of HNC and apply collocative information to language processing, this paper presents a two-stage approach to extracting category-associated collocative concept primitives from a classification corpus. By extracting collocative concept primitives from each sub-category corpus and then extracting category-associated collocative concept primitives from the summarized corpus, we generate a list of category-associated collocative concept primitives for each category. Our experiments show that the extracted items are consistent with reality and of practical significance.
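The two-stage idea can be pictured with plain co-occurrence counts standing in for the HNC formalism: in this hypothetical sketch, stage 1 finds frequently co-occurring primitive pairs per sub-category, and stage 2 keeps the pairs specific to one category. The corpora, primitives, and thresholds are all invented for illustration.

```python
# Minimal two-stage sketch using sentence-level co-occurrence counts of
# "primitives" as a stand-in for the HNC formalism.

from collections import Counter
from itertools import combinations

def collocations(sentences, min_count=2):
    """Stage 1: count primitive pairs co-occurring in a sentence."""
    pairs = Counter()
    for prims in sentences:
        for a, b in combinations(sorted(set(prims)), 2):
            pairs[(a, b)] += 1
    return {p for p, n in pairs.items() if n >= min_count}

def category_associated(corpus_by_category):
    """Stage 2: keep pairs that are collocations in exactly one category."""
    per_cat = {c: collocations(s) for c, s in corpus_by_category.items()}
    result = {}
    for cat, pairs in per_cat.items():
        others = set().union(*(p for c, p in per_cat.items() if c != cat))
        result[cat] = pairs - others
    return result

corpus = {
    "finance": [["bank", "loan"], ["bank", "loan"], ["rate", "loan"]],
    "sports":  [["team", "score"], ["team", "score"], ["bank", "loan"]],
}
print(category_associated(corpus))
# {'finance': {('bank', 'loan')}, 'sports': {('score', 'team')}}
```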
Citations: 0
Which performs better for new word detection, character based or Chinese Word Segmentation based?
Pub Date: 2014-12-04 DOI: 10.1109/IALP.2014.6973474
Haijun Zhang, Shumin Shi
This paper proposes a novel method to evaluate the performance of New Word Detection (NWD) based on repeats extraction. For small-scale corpora, we propose employing Conditional Random Fields (CRF) as the statistical framework to estimate the effects of different NWD strategies. For large-scale corpora, since annotated corpora are not available in unlimited quantities, comparative experiments cannot be used for evaluation. Accordingly, this paper proposes a pragmatic quantitative model to analyze and estimate the performance of NWD in all kinds of settings, especially the large-scale corpus setting. Our studies show good mutual corroboration between the experimental results and the conclusions of the quantitative model. On the basis of the analysis of the experimental data and the quantitative model, we reach a reliable conclusion about the effectiveness of Chinese NWD under the two strategies, which can guide follow-up studies in Chinese new word detection.
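A minimal sketch of the repeats-extraction idea: collect repeated character n-grams from raw text and filter them against a known-word lexicon to obtain new-word candidates. The corpus, lexicon, and thresholds below are illustrative assumptions, not the paper's evaluation protocol.

```python
# Minimal sketch of repeats extraction for new-word candidates.

from collections import Counter

def repeated_ngrams(text, n_min=2, n_max=4, min_count=2):
    """Return character n-grams occurring at least min_count times."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return {g: c for g, c in counts.items() if c >= min_count}

def new_word_candidates(text, lexicon, **kw):
    """Filter repeated n-grams against a known-word lexicon."""
    return {g: c for g, c in repeated_ngrams(text, **kw).items()
            if g not in lexicon}

corpus = "给力给力,这个产品很给力"
known = {"这个", "产品"}
print(new_word_candidates(corpus, known))   # {'给力': 3}
```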
Citations: 4
Research on recognition of semantic chunk boundary in Tibetan
Pub Date: 2014-12-04 DOI: 10.1109/IALP.2014.6973476
Tianhang Wang, Shumin Shi, Heyan Huang, Congjun Long, Ruijing Li
Semantic chunks describe the semantic framework of a sentence well and play a very important role in Natural Language Processing applications such as machine translation and QA systems. At present, research on Tibetan chunking is mainly based on rule-based methods. In this paper, according to the distinctive linguistic characteristics of Tibetan, we first put forward a descriptive definition of the Tibetan semantic chunk and its labeling scheme, and then propose a feature selection algorithm that automatically selects suitable features from candidate feature templates. In experiments conducted on two different kinds of Tibetan corpora, sentence-level and discourse-level, the F-measure reaches 95.84% and 94.95% with a Conditional Random Fields (CRF) model, and 91.97% and 88.82% with a Maximum Entropy (ME) model, respectively. These positive results show that the definition of the Tibetan semantic chunk given in this paper is reasonable and operable, and that its boundary recognition via statistical techniques on a small-scale corpus is feasible and effective.
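The feature-template side of such a system can be sketched as follows: each token position is mapped to a dictionary of window features that a CRF toolkit (for example, sklearn-crfsuite) could consume together with per-token B/I boundary labels. The window sizes and feature names here are assumptions, not the templates selected by the paper's algorithm.

```python
# Minimal sketch of window-based feature templates for chunk-boundary
# labeling (B/I tags), of the kind a CRF toolkit would consume.

def token_features(tokens, i):
    """Feature dict for position i, using a +/-1 token window."""
    left = tokens[i - 1] if i > 0 else "<BOS>"
    right = tokens[i + 1] if i < len(tokens) - 1 else "<EOS>"
    return {
        "bias": 1.0,
        "tok": tokens[i],
        "tok[-1]": left,
        "tok[+1]": right,
        "bigram[-1,0]": left + tokens[i],
    }

def sentence_features(tokens):
    return [token_features(tokens, i) for i in range(len(tokens))]

# Feature dicts in this shape can be passed, e.g., to sklearn-crfsuite's
# CRF.fit(X, y) together with per-token B/I boundary labels.
sent = ["ཁོ", "ས", "དེབ", "ཀློག"]      # illustrative Tibetan syllables
for f in sentence_features(sent):
    print(f)
```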
Citations: 1
Information decompression of Xinjiang travel materials
Pub Date: 2014-12-04 DOI: 10.1109/IALP.2014.6973479
Kaihong Yang, Shuzhen Shi
Previous discussions on the translation of travel materials have mainly been confined to functional and semiotic perspectives. The authors of this paper hold that Xinjiang travel materials involve implicit information related to distinctive ethnic, geographical, and historical cultures, which cannot be fully absorbed by English speakers who do not share the same cultural background. They attempt to address this problem by applying information decompression, which amplifies information redundancy to reduce unpredictability during message transmission. They adopt translation plus comment, translation plus supplementation, and translation plus explanation as decompression measures. Specifically, in the Chinese-English translation of Xinjiang travel materials, the authors decompress the original texts and release their cultural connotations by means of these three measures, so as to convey correct and adequate information to receivers, narrow the cultural gap, and achieve effective communication. This paper attempts to propose a new perspective on the translation of Xinjiang travel materials.
Citations: 0
Semantic type disambiguation for Japanese verbs
Pub Date: 2014-12-04 DOI: 10.1109/IALP.2014.6973471
Shohei Okada, Kazuhide Yamamoto
Interest in extracting and analyzing evaluations and opinions of services or products from large bodies of text has been increasing in recent years. It is important to classify predicates according to sense, because whether or not a statement includes the speaker's opinion depends strongly on its predicate. Japanese part-of-speech (POS) tags for predicates are generally assumed to be classified according to sense; however, the POS classification differs from the semantic classification. On this subject, semantic types, which aim to classify predicates, have been proposed. In this paper, we describe semantic types and present our construction of a disambiguator for Japanese verbs. Specifically, we construct this disambiguator using a support vector machine over feature vectors built from the semantic categories of nouns and the results of morphological analysis. We achieve 69.9% disambiguation accuracy on newspaper articles using 10-fold cross-validation.
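A minimal sketch of this kind of pipeline with scikit-learn: one feature dictionary per verb occurrence, one-hot vectorization, a linear SVM, and cross-validation. The toy data, feature names, and semantic-type labels are invented for illustration and are not the paper's feature set.

```python
# Minimal sketch of SVM-based semantic type disambiguation: one feature
# dict per verb occurrence (verb surface form, neighboring-noun semantic
# category, preceding POS), vectorized and cross-validated.

from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

samples = [
    {"verb": "omou", "noun_cat": "human", "pos[-1]": "NOUN"},
    {"verb": "omou", "noun_cat": "abstract", "pos[-1]": "PARTICLE"},
    {"verb": "taberu", "noun_cat": "food", "pos[-1]": "NOUN"},
    {"verb": "taberu", "noun_cat": "food", "pos[-1]": "PARTICLE"},
] * 3                                   # repeat so cross-validation has data
labels = ["opinion", "opinion", "event", "event"] * 3

vec = DictVectorizer()
X = vec.fit_transform(samples)          # sparse one-hot feature matrix
clf = LinearSVC()                       # linear-kernel SVM

# The paper uses 10-fold cross-validation; cv=3 here only because the toy
# dataset is tiny.
print(cross_val_score(clf, X, labels, cv=3).mean())
```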
Citations: 0
NormAPI: An API for normalizing Filipino shortcut texts
Pub Date: 2014-12-04 DOI: 10.1109/IALP.2014.6973494
N. Nocon, G. Cuevas, Darwin Magat, Peter Suministrado, C. Cheng
As the number of Internet and mobile phone users grows, texting and chatting have become popular means of communication. The extensive use of cellphones and the Internet has led to the creation of a new language in which words are transformed and shortened in various styles. Shortcut texting is used in informal venues such as SMS, online chat rooms, forums, and social network posts. Huge amounts of data originating from these informal sources can be utilized for various machine learning and data analytics tasks. Because these data may be written in shortcut forms, text normalization is necessary before NLP tasks such as information extraction, data mining, text summarization, opinion classification, and even bilingual translation can be fully carried out: it acts as a preprocessing stage that transforms informal texts back into their original, more understandable forms. This paper presents NormAPI, an API for normalizing Filipino shortcut texts. NormAPI is primarily intended as a preprocessing system that corrects informalities in shortcut texts before they are handed over for full data processing.
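A minimal sketch of dictionary- plus rule-based normalization of shortcut texts; the lookup table and the letter-collapsing rule are assumptions about common texting conventions, not NormAPI's actual resources or rules.

```python
# Minimal sketch of dictionary- and rule-based shortcut-text normalization.
# The lookup table is an illustrative placeholder.

import re

SHORTCUTS = {
    "kc": "kasi",        # "because"
    "q": "ako",          # "I/me"
    "u": "you",
    "2": "to",
}

def normalize(text):
    """Expand known shortcuts and collapse repeated letters (e.g. soooo)."""
    out = []
    for tok in text.lower().split():
        tok = SHORTCUTS.get(tok, tok)
        tok = re.sub(r"(.)\1{2,}", r"\1", tok)   # "soooo" -> "so"
        out.append(tok)
    return " ".join(out)

print(normalize("musta u kc"))       # "musta you kasi"
print(normalize("salamat pooooo"))   # "salamat po"
```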
Citations: 7
Influence of various asymmetrical contextual factors for TTS in a low resource language
Pub Date: 2014-12-04 DOI: 10.1109/IALP.2014.6973509
Nirmesh J. Shah, Mohammadi Zaki, H. Patil
The generalized statistical framework of the Hidden Markov Model (HMM) has been successfully carried over from speech recognition to speech synthesis. In this paper, we apply HMM-based Speech Synthesis (HTS) to Gujarati (one of the official languages of India), adapting and evaluating HTS for the language. In addition, to understand the influence of asymmetrical contextual factors on the quality of synthesized speech, we conduct a series of experiments. Different HTS systems built for Gujarati with various asymmetrical contextual factors are evaluated in terms of naturalness and speech intelligibility. The experimental results show that when more weight is given to left phonemes in the asymmetrical contextual factors, HTS performance improves over conventional symmetrical contextual factors in both the triphone and the pentaphone case. Furthermore, we achieve the best performance for Gujarati HTS with left-left-left-centre-right (i.e., LLLCR) contextual factors.
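The contrast between asymmetrical and symmetrical contextual factors can be made concrete by generating context labels from a phone sequence, as in this sketch: LLLCR takes three phones of left context and one of right, whereas the symmetrical pentaphone takes two of each. The label format is illustrative and is not the HTS label or question-file syntax.

```python
# Minimal sketch of building asymmetrical vs. symmetrical context labels
# from a phone sequence.

def context_labels(phones, n_left, n_right, pad="x"):
    """One label per phone: left context ^ centre + right context."""
    padded = [pad] * n_left + list(phones) + [pad] * n_right
    labels = []
    for i in range(len(phones)):
        c = i + n_left
        left = padded[c - n_left:c]
        right = padded[c + 1:c + 1 + n_right]
        labels.append("-".join(left) + "^" + padded[c] + "+" + "-".join(right))
    return labels

phones = ["k", "e", "m", "ch", "o"]          # roughly romanized Gujarati
print(context_labels(phones, 3, 1))          # asymmetrical LLLCR
print(context_labels(phones, 2, 2))          # symmetrical pentaphone LLCRR
```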
Citations: 2