Latest publications from the 2014 International Conference on Asian Language Processing (IALP)

Nonlinear analysis of natural vs. HTS-based synthetic speech
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973518
H. Patil, S. Adarsa
Many investigations of speech nonlinearities have been carried out, and these studies provide strong evidence to support nonlinear system modelling of speech production. The nonlinear characteristics that these studies point to are analogous to chaotic systems. This paper aims to provide evidence of the chaotic nature of the speech signal and to use it for feature extraction to distinguish synthetic from natural speech. The feature used to capture chaos is the Lyapunov Exponent (LE). Synthetic speech is found to have higher LE values than natural speech. We propose a new feature based on LE for the detection of synthetic speech. The synthetic speech used is from a Hidden Markov Model (HMM)-based speech synthesis system (HTS) trained on the low-resource Indian language Gujarati. This work may find application in improving the robustness of speaker verification (SV) systems against imposture attacks using synthetic speech.
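LE estimation for real speech involves phase-space reconstruction, which the abstract does not detail. As a hedged, self-contained illustration of the quantity itself (not the authors' pipeline), the sketch below estimates the largest Lyapunov exponent of the logistic map, whose value at r = 4 is known to be ln 2 ≈ 0.693; all names are illustrative:

```python
import math

def logistic_lyapunov(r=4.0, x0=0.2, n_transient=1000, n_iter=100_000):
    """Estimate the largest Lyapunov exponent of x -> r*x*(1-x)
    by averaging log|f'(x)| = log|r*(1 - 2x)| along the orbit."""
    x = x0
    for _ in range(n_transient):          # discard the transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_iter):
        # guard against the (measure-zero) case x == 0.5
        total += math.log(max(abs(r * (1.0 - 2.0 * x)), 1e-12))
        x = r * x * (1.0 - x)
    return total / n_iter
```

A positive exponent indicates chaos; per the paper, the analogous per-utterance LE values would then be compared between natural and HTS speech.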
Citations: 0
Effectiveness of multiscale fractal dimension-based phonetic segmentation in speech synthesis for low resource language
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973508
Mohammadi Zaki, Nirmesh J. Shah, H. Patil
Phonetic segmentation plays a key role in developing various speech applications. In this work, we propose to use various features for the automatic phonetic segmentation task with forced Viterbi alignment and compare their effectiveness. We propose novel multiscale fractal dimension-based features concatenated with Mel-Frequency Cepstral Coefficients (MFCC). The novel features are expected to capture additional nonlinearities in speech production, which should improve the performance of the segmentation task. However, to evaluate the effectiveness of these segmentation algorithms, we require accurate manually labeled phoneme-level data, which is not available for low-resource languages such as Gujarati (a low-resource language and one of the official languages of India). To measure the effectiveness of the various segmentation algorithms, an HMM-based speech synthesis system (HTS) for Gujarati has been built. From the subjective and objective evaluations, it is observed that the FD-based features work moderately better for segmentation than other state-of-the-art features such as MFCC, Perceptual Linear Prediction Cepstral Coefficients (PLP-CC), Cochlear Filter Cepstral Coefficients (CFCC), and RelAtive SpecTrAl (RASTA)-based PLP-CC. The Mean Opinion Score (MOS) and the Degraded-MOS, which are measures of naturalness, indicate an improvement of 9.69% with the proposed features over the MFCC-based features (found to be the best among the other features).
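The exact multiscale fractal-dimension features are not spelled out in the abstract. Higuchi's method is one standard way to estimate a signal's fractal dimension and gives a feel for what an FD feature measures; this is an illustrative stand-in, not necessarily the authors' estimator:

```python
import math

def higuchi_fd(x, kmax=10):
    """Estimate the fractal dimension of a 1-D signal with Higuchi's
    method: normalized curve length L(k) at scale k scales as k**(-FD)."""
    n = len(x)
    log_l, log_k = [], []
    for k in range(1, kmax + 1):
        lk = 0.0
        for m in range(k):                       # one subsampled curve per offset
            steps = (n - 1 - m) // k
            if steps < 1:
                continue
            length = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                         for i in range(1, steps + 1))
            length *= (n - 1) / (steps * k)      # normalize for missing points
            lk += length / k
        log_l.append(math.log(lk / k))
        log_k.append(math.log(1.0 / k))
    # FD is the least-squares slope of log L(k) against log(1/k)
    mk = sum(log_k) / len(log_k)
    ml = sum(log_l) / len(log_l)
    return (sum((a - mk) * (b - ml) for a, b in zip(log_k, log_l))
            / sum((a - mk) ** 2 for a in log_k))
```

A smooth sinusoid yields an FD near 1 while white noise approaches 2; per the paper, such values would be computed at multiple scales and concatenated with MFCCs.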
Citations: 6
The analysis on mistaken segmentation of Tibetan words based on statistical method
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973513
Congjun Long, Yiyong Lan, Xiaobing Zhao
In this paper, using the Tibetan word segmentation system IEA-TWordSeg, the authors segment a total of 1,271 sentences in a closed set and 1,000 sentences in an open set. The segmentation accuracy is 99.54% and 92.41%, respectively. The authors describe the types of wrong segmentation as well as the causes of the mistakes, and report the proportion of each type of segmentation error. The purpose of the article is to provide clues for those who intend to improve the accuracy of Tibetan word segmentation systems.
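Segmentation accuracy of this kind is commonly scored by comparing predicted word spans against gold spans; a minimal sketch (illustrative, not the IEA-TWordSeg evaluation code):

```python
def to_spans(words):
    """Character spans (start, end) of each word in a segmentation."""
    spans, pos = set(), 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans

def seg_prf(gold_words, pred_words):
    """Word-level precision, recall and F1 of a predicted segmentation."""
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    tp = len(gold & pred)                 # words with identical boundaries
    prec, rec = tp / len(pred), tp / len(gold)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

Tallying which gold words fall outside the intersection, grouped by mismatch pattern, yields the kind of error-type proportions the paper reports.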
Citations: 1
The usage of Zongshi
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973470
Shuqin Shi, Kaihong Yang
Zongshi, a conjunction, can be used in different syntactic contexts. It usually introduces a simple sentence that serves as a subordinate clause in a compound sentence. Zongshi combined with other adverbs or conjunctions can express different logical relations in a compound sentence. The original meaning of Zongshi is resuming or stating a fact, depending on the specific context. Sentences with Zongshi show a strong tendency toward subjectivity.
Citations: 0
Hybrid approach for aligning parallel sentences for languages without a written form using standard Malay and Malay dialects
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973524
Y. Khaw, T. Tan
Alignment of parallel text is a step in building a machine translation system. Parallel text alignment is important because linguistic information can be retrieved from the alignment result, including bilingual dictionaries and grammatical correspondences between the languages. In this paper, we propose a hybrid approach for aligning standard Malay with dialectal Malay parallel text. The Malay dialects in Malaysia can be grouped by state, such as the Perak dialect, Kedah dialect and Terengganu dialect. It is important to study Malay dialects as they still flourish and are widely used in many areas, especially for unofficial matters. Kelantanese Malay is used as the example of dialectal Malay in this research. The precision and recall values obtained by the proposed alignment methods are above 90%.
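The hybrid method itself is not reproduced here. As a hedged sketch of the underlying idea, a minimal 1-1 length-based dynamic-programming aligner (sentences of similar length are matched, with a penalty for leaving one unaligned) looks like this:

```python
def align_sentences(src, tgt, gap_penalty=10.0):
    """Minimal 1-1 length-based DP alignment: a stand-in for the
    paper's hybrid method, which is not reproduced here."""
    n, m = len(src), len(tgt)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            if i < n and j < m:           # match: pay the length difference
                c = cost[i][j] + abs(len(src[i]) - len(tgt[j]))
                if c < cost[i + 1][j + 1]:
                    cost[i + 1][j + 1], back[i + 1][j + 1] = c, (i, j, "match")
            if i < n:                     # leave a source sentence unaligned
                c = cost[i][j] + gap_penalty
                if c < cost[i + 1][j]:
                    cost[i + 1][j], back[i + 1][j] = c, (i, j, "skip_src")
            if j < m:                     # leave a target sentence unaligned
                c = cost[i][j] + gap_penalty
                if c < cost[i][j + 1]:
                    cost[i][j + 1], back[i][j + 1] = c, (i, j, "skip_tgt")
    pairs, i, j = [], n, m                # trace back the cheapest path
    while (i, j) != (0, 0):
        pi, pj, move = back[i][j]
        if move == "match":
            pairs.append((pi, pj))
        i, j = pi, pj
    return list(reversed(pairs))
```

Scoring the returned pairs against a gold alignment gives the precision/recall figures the paper reports.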
Citations: 2
Semantic conceptual primitives computing in text classification
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973472
Quan Zhang, Yi Yuan, Xiangfeng Wei, Zhejie Chi, Peimin Cong, Yihua Du
This paper presents a method for enhancing text classification performance with semantic computing. It adopts conceptual primitives with semantic relations as the knowledge representation. Based on this semantic representation, it mines the association relations of primitives across different text classes, and these association rules serve as text classification features. The presented method considers not only which semantic primitives a text contains, but also the association relations among those primitives. We test the method on a public text classification data set. The experimental results show that, compared with commonly used methods, this method improves text classification performance.
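The paper's exact rule-mining procedure is not given here; as a hedged sketch of mining association relations among primitives, one can count primitive pairs that recur within a class and keep the frequent ones as class features:

```python
from collections import Counter
from itertools import combinations

def mine_class_pairs(docs, labels, min_count=2):
    """For each class, keep primitive pairs that co-occur in at least
    min_count documents of that class (a simplified association rule)."""
    pair_counts = {}
    for prims, label in zip(docs, labels):
        counter = pair_counts.setdefault(label, Counter())
        for pair in combinations(sorted(set(prims)), 2):
            counter[pair] += 1
    return {label: {p for p, n in c.items() if n >= min_count}
            for label, c in pair_counts.items()}
```

A document's overlap with each class's pair set can then be added to its feature vector alongside the raw primitives, in the spirit of the method.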
Citations: 1
Concepts identification of an NL query in NLIDB systems
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973483
Saikrishna Srirampur, Ravi Chandibhamar, Ashish Palakurthi, R. Mamidi
This paper proposes a novel approach to capture the concept of an NL query. Given an NL query, the query is mapped to a tagset, which carries the concept information. The tagset was created by mapping every noun chunk to the attribute of a table (tableName.attributeName) and every verb chunk to a relation in the ER schema. The approach is discussed using the Courses Management domain of a university and can be extended to other domains. The tagset here was formed using the ER schema of the Courses Management Portal of our university. We used a statistical approach to identify the concepts, forming a tagged corpus of different types of NL queries ourselves. A Conditional Random Field algorithm was used for the classification. The results are very promising and are compared to the rule-based approach of Gupta et al. (2012) [1].
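The CRF classifier needs a tagged corpus, but the tagset mapping itself can be sketched as a lookup from chunks to schema entries; all schema names below are hypothetical stand-ins for the university's Courses Management schema:

```python
# hypothetical ER-schema entries for a Courses Management domain:
# noun chunks map to table attributes, verb chunks to ER relations
SCHEMA_TAGS = {
    "instructor": "Instructor.name",
    "course": "Course.name",
    "semester": "Course.semester",
    "teaches": "REL:teaches",
    "registered": "REL:registers",
}

def tag_query(chunks):
    """Map each chunk of an NL query to its tagset entry ('O' if unknown)."""
    return [(c, SCHEMA_TAGS.get(c.lower(), "O")) for c in chunks]
```

In the paper this mapping is learned statistically rather than looked up, so the dictionary stands in for what the CRF predicts per chunk.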
Citations: 7
An extracted database content from WordNet for Natural Language Processing and Word Games
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973502
Josephine E. Petralba
WordNet, which is available online and in desktop applications, is an English dictionary in which the synonym sets of groups of words are linked by semantic relations such as hyponymy, meronymy and entailment, among others. The main objective of this paper is to provide Natural Language Processing (NLP) researchers and Word Game developers with a database in which WordNet content is accessed using simple Structured Query Language (SQL) queries. A distribution copy of the WordNet 3.0 database was downloaded and loaded into a MySQL database. It was then migrated to Oracle, where the database processing to accomplish the objectives of this project was performed. Seven tables, 32 materialized views and 4 stored functions were constructed. It is from the WordNet dictionary displays that an NLP researcher will initially investigate what WordNet content he or she needs, so most of the objects were created with reference to those displays. The aim was to come up with simple SQL queries whose output is similar to what is displayed online. Queries extracting content for Word Games such as Hangaroo™ and Batang Henyo™ (Genius Child) exemplify the use of this project for Word Games. For Oracle users, distribution copies were made available as a collection of SQL scripts. Non-Oracle users were provided with Excel spreadsheets, Comma Separated Values (CSV) and eXtensible Markup Language (XML) files that they can import or load.
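Schema details of the extracted database are not given in the abstract, so the sketch below uses hypothetical table and column names; it shows the kind of simple SQL access the paper aims for, here against an in-memory SQLite copy:

```python
import sqlite3

# hypothetical two-table layout: one row per synset, one row per semantic link
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE synsets (synsetid INTEGER PRIMARY KEY,
                          words TEXT, definition TEXT);
    CREATE TABLE semlinks (synset1id INTEGER, synset2id INTEGER,
                           linktype TEXT);
""")
cur.executemany("INSERT INTO synsets VALUES (?, ?, ?)", [
    (1, "dog, domestic dog", "a member of the genus Canis"),
    (2, "canine, canid", "any of various fissiped mammals"),
])
cur.execute("INSERT INTO semlinks VALUES (1, 2, 'hypernym')")

# a simple query in the spirit of the paper: hypernyms of 'dog'
row = cur.execute("""
    SELECT s2.words
    FROM semlinks l
    JOIN synsets s1 ON s1.synsetid = l.synset1id
    JOIN synsets s2 ON s2.synsetid = l.synset2id
    WHERE s1.words LIKE '%dog%' AND l.linktype = 'hypernym'
""").fetchone()
```

A word-game generator would issue queries of this shape to pull a gloss as a clue and the linked words as answers.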
Citations: 4
Classification of phonemes using modulation spectrogram based features for Gujarati language
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973506
Anshu Chittora, H. Patil
In this paper, features extracted from the modulation spectrogram are used to classify the phonemes of the Gujarati language. The modulation spectrogram, a 2-dimensional (2-D) feature representation, is reduced to a smaller feature dimension using the proposed feature extraction method. The Gujarati database was manually segmented into 31 phoneme classes. These phonemes are then classified using a support vector machine (SVM) classifier. The phoneme classification accuracy is 94.5%, as opposed to 92.74% for classification with the state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) feature set. Classification accuracy for broad phoneme classes, viz., vowels, stops, nasals, semivowels, affricates and fricatives, is also determined. Phoneme classification into their respective classes is 95.03% correct with the proposed feature set. Fusion of MFCC with the proposed feature set performs even better, giving a phoneme classification accuracy of 95.7%. With the fusion of features, phoneme classification into sonorant and obstruent classes is found to be 97.01% accurate.
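The paper's SVM over reduced modulation-spectrogram features needs a full signal-processing toolchain; as a dependency-free stand-in for the classifier stage only, a nearest-centroid classifier over toy feature vectors shows the train/classify shape such a system takes:

```python
def centroid(vecs):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def train(feats_by_class):
    """One centroid per phoneme class (stand-in for fitting an SVM)."""
    return {label: centroid(v) for label, v in feats_by_class.items()}

def classify(model, x):
    """Assign x to the class with the nearest centroid (squared L2)."""
    return min(model,
               key=lambda label: sum((p - q) ** 2
                                     for p, q in zip(model[label], x)))
```

In the paper the vectors would be the dimension-reduced modulation-spectrogram features (optionally fused with MFCCs), and the SVM replaces this distance rule with a learned margin.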
Citations: 6
A rule-based method for Chinese punctuations processing in sentences segmentation
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973504
Jing Wang, Yun Zhu, Yaohong Jin
In this paper, a rule-based sentence segmentation system is proposed. We study the usage and function of Chinese punctuation marks and classify them into 4 categories. According to whether a punctuation mark can split a sentence, we tag it with the label SST or un-SST. Experiments were conducted on 4 different corpora containing 12 kinds of Chinese punctuation marks, and our model achieves a high overall F-measure of over 90%. The experimental results show that our approach is effective for sentence segmentation.
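For splitting purposes, the four-way classification reduces to the SST/un-SST distinction; a minimal sketch (the mark sets below are illustrative, not the paper's exact categories):

```python
# SST marks can terminate a sentence; un-SST marks cannot
SST = set("。!?;")
UN_SST = set(",、:“”()《》")   # kept inline, never trigger a split

def split_sentences(text):
    """Split Chinese text after each sentence-terminating (SST) mark."""
    sentences, buf = [], []
    for ch in text:
        buf.append(ch)
        if ch in SST:
            sentences.append("".join(buf))
            buf = []
    if buf:                          # trailing material without an SST mark
        sentences.append("".join(buf))
    return sentences
```

The paper's rules additionally handle context-dependent marks (e.g. quotes and brackets spanning an SST mark), which this sketch ignores.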
Citations: 1