Building a foundation of HPSG-based treebank on Bangla language

2007 10th International Conference on Computer and Information Technology. Pub Date: 2007-12-01. DOI: 10.1109/ICCITECHN.2007.4579375

A. Mahmud, M. Khan

Citations: 5

Abstract

Nowadays, the importance of large annotated corpora for NLP research is widely recognized. In this paper, we describe the initial phase of developing a linguistically annotated corpus for the non-configurational language Bangla. Since the formalism differs from those posited for configurational languages, several features have been added to support constraint-based parsing within an HPSG-based formalism. We outline a semi-automated process that applies both a case-marking approach and some morphological analysis to constrain the parsing of a relatively free word order language, with the aim of creating a linguistically rich, highly lexicalized annotated corpus.
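To make the idea of case marking constraining a free word order parse concrete, here is a minimal sketch. It is not the authors' system: the suffix table, the hyphen-segmented romanized tokens (standing in for the output of a morphological analyzer), and the function names `guess_case` and `assign_roles` are all assumptions made for illustration. It only shows that when roles are read off case markers rather than position, reordering the arguments does not change the analysis.

```python
# Toy illustration only. Tokens are romanized Bangla, pre-segmented with a
# hyphen as if a morphological analyzer had already split stem and suffix.
# The suffix table is a small hypothetical subset, not a full inventory.
CASE_SUFFIXES = {
    "ke": "accusative",  # object marker, as in "chele-ke"
    "er": "genitive",
    "te": "locative",
}


def guess_case(token):
    """Return a case label from the token's suffix; unmarked nouns count as nominative."""
    if "-" in token:
        suffix = token.rsplit("-", 1)[1]
        return CASE_SUFFIXES.get(suffix, "nominative")
    return "nominative"


def assign_roles(tokens):
    """Assign coarse roles from case labels. Assumes the last token is the verb."""
    roles = {"verb": tokens[-1]}
    for token in tokens[:-1]:
        case = guess_case(token)
        if case == "nominative":
            roles.setdefault("subject", token)
        elif case == "accusative":
            roles.setdefault("object", token)
    return roles


# "ma chele-ke dakche" (the mother is calling the boy), in two argument orders.
sov = assign_roles(["ma", "chele-ke", "dakche"])
osv = assign_roles(["chele-ke", "ma", "dakche"])
print(sov == osv)
```

In an HPSG setting the same information would be carried as feature constraints on lexical entries rather than as a lookup table; the dictionary here is just the simplest stand-in for that idea.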


