Automatic detection of subject/object drops in Bengali

2014 International Conference on Asian Language Processing (IALP). Pub Date: 2014-12-04. DOI: 10.1109/IALP.2014.6973488

Arjun Das, Utpal Garain, Apurbalal Senapati

Citations: 3

Abstract

This paper presents a pioneering attempt at automatic detection of drops in Bengali. The dominant drops in Bengali are subject, object and verb drops. Bengali is a pro-drop language, and pro-drops fall under subject/object drops, which this research concentrates on. The detection algorithm makes use of off-the-shelf Bengali NLP tools: a POS tagger, a chunker and a dependency parser. Simple linguistic rules are first applied to quickly annotate a dataset of 8,455 sentences, which is then manually checked. The corrected dataset is used to train two classifiers that label each sentence as either containing a drop or not. Features previously used by other researchers are considered. Both classifiers show comparable overall performance. As a by-product, the study generates another useful NLP resource apart from the drop-annotated dataset: a classification of Bengali verbs (all morphological variants of 881 root verbs) by transitivity, which is in turn used as a feature by the classifiers.
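The abstract does not spell out the linguistic rules, so the following is only a minimal sketch of how a rule-based first pass over dependency-parser output and a verb-transitivity lexicon might flag candidate drops. The `Token` fields, the dependency labels (`nsubj`, `obj`), the `VERB` tag, the `detect_drops` function, and the romanized example words are all assumptions made for illustration; they are not the tagsets, rules, or data used by the authors.

```python
from dataclasses import dataclass

@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    form: str     # surface form (romanized here for illustration)
    pos: str      # POS tag; "VERB" is an assumed label
    head: int     # index of the syntactic head, 0 for the root
    deprel: str   # dependency label; "nsubj"/"obj" are assumed labels

SUBJECT_LABELS = {"nsubj"}  # assumption: label the parser gives to subjects
OBJECT_LABELS = {"obj"}     # assumption: label the parser gives to objects

def detect_drops(tokens, transitivity):
    """Return the set of candidate drop types ("subject", "object")."""
    drops = set()
    for verb in tokens:
        if verb.pos != "VERB":
            continue
        # Collect the dependency labels of everything attached to this verb.
        dep_labels = {t.deprel for t in tokens if t.head == verb.idx}
        if not dep_labels & SUBJECT_LABELS:
            drops.add("subject")
        # Only a transitive verb is expected to carry an object.
        is_transitive = transitivity.get(verb.form) == "transitive"
        if is_transitive and not dep_labels & OBJECT_LABELS:
            drops.add("object")
    return drops

# Hypothetical lexicon entry and parses, intended as "(I) eat rice".
lexicon = {"khai": "transitive"}
full = [Token(1, "ami", "PRON", 3, "nsubj"),
        Token(2, "bhat", "NOUN", 3, "obj"),
        Token(3, "khai", "VERB", 0, "root")]
no_subject = [Token(1, "bhat", "NOUN", 2, "obj"),
              Token(2, "khai", "VERB", 0, "root")]
verb_only = [Token(1, "khai", "VERB", 0, "root")]

print(detect_drops(full, lexicon))        # no drop expected
print(detect_drops(no_subject, lexicon))  # subject drop expected
print(detect_drops(verb_only, lexicon))   # subject and object drop expected
```

In a pipeline like the one described, output of this kind would serve only as a quick pre-annotation to be checked manually before training the classifiers, and the transitivity lookup would play the role that the verb classification plays as a feature.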
