
Latest publications from the 2014 International Conference on Asian Language Processing (IALP)

The retrieval research of non-adjacent keywords in Chinese corpus — A case study of “Yi…Jiu…” construction
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973507
Xiao-ru Tan, Lijiao Yang
Corpus concordancing is a popular research topic, and retrieving data from a corpus by supplying non-adjacent keywords is a widely used function. However, the precision of the retrieval results is not very high because the machine cannot recognize the relationship between the non-adjacent keywords. To deal with this problem, this paper proposes a rule-based method for the “Yi…Jiu…” construction that excludes unrelated data even when those data contain the keywords. Experiments show that precision is close to 82%.
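The abstract does not spell out the paper's actual rules, but the general idea — match two non-adjacent keywords under a distance constraint, then apply exclusion rules to filter spurious hits — can be sketched as follows. The gap limit and the clause-boundary rule here are hypothetical illustrations, not the authors' rule set:

```python
import re

def retrieve(sentences, first="一", second="九", max_gap=6):
    """Return sentences where `second` follows `first` within `max_gap`
    characters, mimicking a non-adjacent keyword query."""
    pattern = re.compile(re.escape(first) + "(.{0,%d}?)" % max_gap + re.escape(second))
    hits = []
    for s in sentences:
        m = pattern.search(s)
        if not m:
            continue
        gap = m.group(1)
        # Toy exclusion rule (hypothetical): reject matches where the two
        # keywords are separated by punctuation, i.e. fall in different clauses.
        if any(p in gap for p in "，。；！？"):
            continue
        hits.append(s)
    return hits

docs = ["一见面就聊了九个小时", "第一章结束。第九章开始", "一九八四年出版"]
print(retrieve(docs))  # the middle sentence is excluded by the clause rule
```

A plain substring search would return all three sentences; the exclusion rule is what raises precision, which is the paper's point.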
Citations: 0
A Cepstral Mean Subtraction based features for Singer Identification
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973510
Purushotam G. Radadia, H. Patil
Singer IDentification (SID) is a very challenging problem in Music Information Retrieval (MIR) systems. Instrumental accompaniment, the quality of the recording apparatus, and other singing voices (in a chorus) make SID a very difficult and challenging research problem. In this paper, we propose an SID system built on a large database of 500 Hindi (Bollywood) songs, using state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) and Cepstral Mean Subtracted (CMS) features. We compare the performance of a 3rd-order polynomial classifier and a Gaussian Mixture Model (GMM). With the 3rd-order polynomial classifier, we achieved SID accuracies of 78% and 89.5% (and Equal Error Rates (EER) of 6.75% and 6.42%) for MFCC and CMS-MFCC, respectively. Furthermore, score-level fusion of MFCC and CMS-MFCC reduced EER by 0.95% compared to MFCC alone. The GMM, on the other hand, gave an SID accuracy of 70.75% for both MFCC and CMS-MFCC. Finally, we found that CMS-based features are effective in alleviating the album effect in the SID problem.
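Cepstral mean subtraction itself is a one-line operation: subtract the per-coefficient mean over all frames, which removes stationary convolutional effects such as channel or recording bias (the "album effect" the abstract mentions). A minimal sketch on a frames-by-coefficients MFCC matrix:

```python
import numpy as np

def cepstral_mean_subtraction(mfcc):
    """mfcc: array of shape (frames, coefficients).
    Subtract the per-coefficient mean over time, so each cepstral
    dimension has zero mean; stationary channel effects cancel out."""
    return mfcc - mfcc.mean(axis=0, keepdims=True)

frames = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3 frames, 2 coeffs
cms = cepstral_mean_subtraction(frames)
print(cms.mean(axis=0))  # each column now has zero mean
```

The paper's CMS-MFCC features are, per the abstract, MFCCs with this normalization applied; the extraction pipeline around it (framing, filterbank, DCT) is not shown here.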
Citations: 6
One-expression classification in Bengali and its role in Bengali-English machine translation
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973489
Apurbalal Senapati, Utpal Garain
This paper analyzes one-expressions in Bengali and shows their effectiveness for machine translation. The characteristics of one-expressions are studied in a 177-million-word corpus, and a classification scheme is proposed for grouping them. The features contributing to the classification are identified, and a CRF-based classifier is trained on an author-generated annotated dataset containing 2,006 instances of one-expressions. The classifier's performance is tested on a test set (containing 300 instances of Bengali one-expressions) that is disjoint from the training data. Evaluation shows that the classifier correctly classifies the one-expressions in 75% of cases. Finally, the utility of this classification task is investigated for Bengali-English machine translation: translation accuracy improves from 39% (Google Translate) to 60% (the proposed approach), and this improvement is statistically significant. All the annotated datasets (there were none before) are made freely available to facilitate further research on this topic.
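The abstract claims the 39% → 60% improvement is statistically significant but does not say which test was used. For paired per-sentence correct/incorrect judgments on the same 300-sentence test set, an exact McNemar test is a standard choice; the discordant counts below are hypothetical, chosen only to be consistent with the reported accuracies:

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact binomial McNemar test on discordant pairs:
    b = sentences only the baseline got right,
    c = sentences only the proposed system got right.
    Returns the two-sided p-value."""
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical discordant counts on a 300-sentence test set:
# baseline-only correct = 15, proposed-only correct = 78.
p = mcnemar_exact(15, 78)
print(p < 0.01)  # True -> the improvement would be statistically significant
```

Only the discordant pairs enter the test; sentences both systems translate correctly (or both miss) carry no evidence about which system is better.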
Citations: 1
Building an Indonesian named entity recognizer using Wikipedia and DBPedia
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973520
A. Luthfi, Bayu Distiawan Trisedya, R. Manurung
This paper describes the development of an Indonesian NER system using online data sources such as Wikipedia and DBPedia. The system is based on the Stanford NER system [8] and utilizes training documents constructed automatically from Wikipedia. Each entity — i.e., each word or phrase that has a hyperlink — in the Wikipedia documents is tagged according to information obtained from DBPedia. In this first version, we are only interested in three entity types: Person, Place, and Organization. The system is evaluated using cross-fold validation as well as a manually annotated gold standard. Under cross-validation, our Indonesian NER obtains precision and recall values above 90%, whereas the gold-standard evaluation shows that it achieves high precision but very low recall.
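The core of the training-data construction — project each hyperlinked span's DBPedia type onto the token — can be sketched as below. The type map, tokenization, and tag names are illustrative assumptions; the paper's actual pipeline queries DBPedia and emits Stanford NER training files:

```python
# Hypothetical DBPedia-derived type map, keyed by Wikipedia link target.
TYPE_MAP = {
    "Sukarno": "PERSON",
    "Jakarta": "PLACE",
    "Universitas_Indonesia": "ORGANIZATION",
}

def tag_sentence(tokens):
    """tokens: list of (word, link_target_or_None) from a Wikipedia article.
    Emit (word, tag) pairs: hyperlinked words get the DBPedia type of
    their link target, everything else (and unknown targets) gets 'O'."""
    out = []
    for word, target in tokens:
        tag = TYPE_MAP.get(target, "O") if target else "O"
        out.append((word, tag))
    return out

sent = [("Sukarno", "Sukarno"), ("lahir", None), ("di", None), ("Jakarta", "Jakarta")]
print(tag_sentence(sent))
```

The low recall reported against the gold standard follows naturally from this scheme: entity mentions that are not hyperlinked in Wikipedia end up labeled `O` in the training data.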
Citations: 10
Automatic Arabic term extraction from special domain corpora
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973468
A. Al-Thubaity, Marwa Khan, Saad Alotaibi, Badriyya Alonazi
The availability of machine-readable Arabic special-domain text in digital libraries, on the websites of Arabic university publications, and in refereed journals fosters numerous interesting studies and applications. Among these applications is automatic term extraction from special-domain corpora. The extracted terms can serve as a foundation for other applications and research, such as building special-domain dictionaries, creating terminology resources, and constructing special-domain ontologies. Our literature survey shows a lack of such studies for Arabic special-domain text; moreover, the few studies that exist use complex and computationally expensive methods. In this study, we use two basic methods to automatically extract terms from Arabic special-domain corpora, based on two simple heuristics: the most frequent words and n-grams in special-domain corpora are typically terms, and terms are typically bounded by function words. We applied our methods to a corpus of applied Arabic linguistics and obtained results comparable to those of other Arabic term extraction studies: 87% accuracy when only terms strictly pertaining to the field of applied Arabic linguistics were considered, and 93.7% when related terms were included.
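The two heuristics combine naturally: split the token stream at function words, then count the n-grams inside each run of content words. A minimal sketch with an English stand-in stopword list (the paper would use an Arabic function-word list):

```python
from collections import Counter

FUNCTION_WORDS = {"the", "of", "in", "and", "a", "is", "to"}  # stand-in list

def candidate_terms(tokens, max_len=3):
    """Split the token stream at function words, then count the n-grams
    inside each content-word run; frequent n-grams are term candidates."""
    runs, run = [], []
    for t in tokens:
        if t.lower() in FUNCTION_WORDS:
            if run:
                runs.append(run)
            run = []
        else:
            run.append(t.lower())
    if run:
        runs.append(run)
    counts = Counter()
    for r in runs:
        for n in range(1, max_len + 1):
            for i in range(len(r) - n + 1):
                counts[" ".join(r[i:i + n])] += 1
    return counts

text = "the applied linguistics corpus is a corpus of applied linguistics".split()
print(candidate_terms(text).most_common(3))
```

Because n-grams never cross a function word, bogus candidates like "corpus of applied" are never generated — that boundary constraint is doing the filtering work here, with frequency used for ranking.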
Citations: 4
The refined MI: A significant improvement to mutual information
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973512
Maha Alrabiah, A. Al-Salman, E. Atwell
Distributional lexical semantics is an empirical approach concerned mainly with modeling word meanings using word-distribution statistics gathered from very large corpora. It builds on the Distributional Hypothesis of Zellig Harris (1970), which states that differences in word meaning are associated with differences in word distribution in text. These differences in meaning originate from two kinds of relations between words: syntagmatic and paradigmatic. Syntagmatic relations are linear combinatorial relations established between words that co-occur in sequential text, while paradigmatic relations are substitutional relations established between words that occur in the same contexts and share neighboring words but do not co-occur in the same text. In this paper, we present a new association measure, the Refined MI, for measuring syntagmatic relations between words, together with an experimental study evaluating its performance. The measure shows outstanding results in identifying significant co-occurrences in Classical Arabic text.
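The abstract does not give the refinement itself, but the baseline it improves on — pointwise mutual information over co-occurrence counts — can be stated in a few lines. PMI compares the observed joint probability of a pair against what independence would predict; its known weakness, which refinements typically target, is overweighting rare pairs:

```python
import math
from collections import Counter

def pmi(pair_counts, word_counts, total):
    """PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), computed from
    pair and unigram counts over a corpus of `total` tokens."""
    scores = {}
    for (x, y), n_xy in pair_counts.items():
        p_xy = n_xy / total
        p_x, p_y = word_counts[x] / total, word_counts[y] / total
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

word_counts = Counter({"strong": 10, "tea": 5, "powerful": 10, "car": 5})
pair_counts = Counter({("strong", "tea"): 4, ("powerful", "tea"): 1})
total = 1000
scores = pmi(pair_counts, word_counts, total)
print(scores[("strong", "tea")] > scores[("powerful", "tea")])  # True
```

The toy numbers are invented; the point is the ranking: "strong tea" co-occurs far more often than chance, so it scores higher as a syntagmatic association than "powerful tea".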
Citations: 0
Automatic detection of subject/object drops in Bengali
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973488
Arjun Das, Utpal Garain, Apurbalal Senapati
This paper presents a pioneering attempt at the automatic detection of drops in Bengali. The dominant drops in Bengali are subject, object, and verb drops. Bengali is a pro-drop language, and pro-drops fall under the subject/object drops on which this research concentrates. The detection algorithm makes use of off-the-shelf Bengali NLP tools such as a POS tagger, a chunker, and a dependency parser. Simple linguistic rules are first applied to quickly annotate a dataset of 8,455 sentences, which is then manually checked. The corrected dataset is used to train two classifiers that classify a sentence as either containing a drop or not. Features previously used by other researchers are also considered, and both classifiers show comparable overall performance. As a by-product, the study generates another useful NLP resource (apart from the drop-annotated dataset): a classification of Bengali verbs (all morphological variants of 881 root verbs) by transitivity, which is in turn used as a feature by the classifiers.
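The "simple linguistic rules" used for the initial annotation pass are not listed in the abstract. One plausible rule of that flavor — flag a sentence as a candidate subject drop if no nominal or pronominal token precedes the first verb — can be sketched over POS-tagged input. The tagset and examples here are hypothetical; the paper relies on real Bengali POS/chunk/dependency tools:

```python
SUBJECT_TAGS = {"PRP", "NNP", "NN"}  # hypothetical nominal/pronominal tags

def has_overt_subject(tagged, verb_tag="VB"):
    """tagged: list of (word, POS) pairs for one sentence.
    Toy rule: the sentence has an overt subject if some nominal or
    pronominal token precedes the first verb; otherwise it is a
    candidate subject drop (to be manually checked, as in the paper)."""
    for word, tag in tagged:
        if tag == verb_tag:
            return False          # reached the verb without seeing a subject
        if tag in SUBJECT_TAGS:
            return True
    return False

# "(ami) bhat khacchi" -- "(I) am eating rice": the subject is dropped
print(has_overt_subject([("bhat", "NN_OBJ"), ("khacchi", "VB")]))   # False -> drop
print(has_overt_subject([("ami", "PRP"), ("khacchi", "VB")]))       # True
```

A rule this crude over-triggers (e.g. on preverbal objects tagged as plain nouns), which is exactly why the paper follows the rule pass with manual correction before training the classifiers.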
Citations: 3
Imagistic and propositional languages in classical Chinese poetry
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973493
J. Lee, Y. Kong
We analyze the use of “imagistic language” and “propositional language” in Classical Chinese poems. It is commonly held that the lines in the middle of a poem tend to be imagistic, while those at the end tend to be propositional. Using features proposed by two literary scholars, Yu-kung Kao and Tsu-lin Mei, we report on the distribution of the imagistic and propositional styles in a treebank of Classical Chinese poems. We conclude that imagistic language is indeed rarely found at the end of poems, but propositional language may be more present in the middle of a poem than previously assumed.
Citations: 0
Research on building Chinese semantic lexicon based on the concept definition of HowNet
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973477
H. Cao, Sen Zhang
Emotional tendency refers to people's attitudes toward people or things. It is a kind of subjective judgment and can be divided along several dimensions, such as praise or criticism, positive or negative, good or bad. Judging the emotional tendency of emotional words, and the problem of how to assign each emotional word a weight, are the basis of text-tendency analysis. The study of semantic weight has been widely applied in text-tendency analysis, public-sentiment monitoring, and text classification. This paper extracts words from the glossary concept library (glossary.dat) of HowNet and polishes the library. To compute the weights of emotional words more accurately, the paper studies synonyms and antonyms, together with manual seed-word selection. Experiments show the method attains the expected results in sentiment judgment, weight calculation, and application to text analysis.
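The abstract's combination of seed words, synonyms, and antonyms suggests a scoring scheme along the following lines: a word's synonyms that match positive seeds (or antonyms that match negative seeds) push its weight up, and vice versa. The formula, seed lists, and normalization below are illustrative assumptions, not the paper's actual weight definition:

```python
POS_SEEDS = {"good", "excellent", "happy"}   # hypothetical English stand-ins
NEG_SEEDS = {"bad", "terrible", "sad"}

def seed_weight(synonyms, antonyms):
    """Score a word in [-1, 1] by how its synonym/antonym sets overlap
    with the seed sets: synonyms of positive seeds and antonyms of
    negative seeds add weight; the mirror cases subtract it."""
    score = 0
    score += sum(1 for w in synonyms if w in POS_SEEDS)
    score -= sum(1 for w in synonyms if w in NEG_SEEDS)
    score += sum(1 for w in antonyms if w in NEG_SEEDS)
    score -= sum(1 for w in antonyms if w in POS_SEEDS)
    total = len(synonyms) + len(antonyms)
    return score / total if total else 0.0

# A word whose synonyms include "good" and whose antonym is "bad":
print(seed_weight({"good", "fine"}, {"bad"}))  # 2/3, i.e. clearly positive
```

In the paper's setting the synonym/antonym sets would come from HowNet's concept definitions rather than hand-written lists.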
Citations: 2
Learning word embeddings from dependency relations
Pub Date : 2014-12-04 DOI: 10.1109/IALP.2014.6973490
Yinggong Zhao, Shujian Huang, Xinyu Dai, Jianbing Zhang, Jiajun Chen
Continuous-space word representation has demonstrated its effectiveness in many natural language processing (NLP) tasks. The basic idea of embedding training is to update the embedding matrix based on each word's context. However, such context has typically been constrained to a fixed window of surrounding words, which we believe is not sufficient to represent the actual relations of a given center word. In this work we extend previous approaches by learning distributed representations from the dependency structure of a sentence, which can capture long-distance relations. Such contexts learn better semantics for words, as demonstrated on the Semantic-Syntactic Word Relationship task. In addition, the dependency embeddings achieve competitive results on the WordSim-353 task.
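The difference between the two context definitions is easiest to see in the (word, context) pairs fed to the embedding trainer. A sketch contrasting window contexts with dependency-arc contexts; the relation-labeled context format is an assumption (it follows the common practice for dependency-based embeddings), not necessarily the paper's exact scheme:

```python
def window_contexts(tokens, k=2):
    """Baseline: contexts are the k words on each side of the center word."""
    pairs = []
    for i, w in enumerate(tokens):
        for j in range(max(0, i - k), min(len(tokens), i + k + 1)):
            if j != i:
                pairs.append((w, tokens[j]))
    return pairs

def dependency_contexts(edges):
    """Contexts come from dependency arcs (head, relation, dependent),
    so a word pairs with syntactically related words at any distance."""
    pairs = []
    for head, rel, dep in edges:
        pairs.append((head, "%s_%s" % (rel, dep)))
        pairs.append((dep, "%sI_%s" % (rel, head)))  # inverse direction
    return pairs

# "scientist ... discovers star": the subject arc survives however many
# words of a relative clause sit between "scientist" and "discovers".
edges = [("discovers", "nsubj", "scientist"), ("discovers", "dobj", "star")]
print(dependency_contexts(edges))
```

With a window of 2, "scientist" never sees "discovers" once a long relative clause intervenes; the dependency pairs above are unaffected by surface distance, which is the long-distance advantage the abstract claims.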
Citations: 13