Implementation Approach of Indian Language Gujarati Grammar's Concept “sandhi” using the Concepts of Rule-based NLP

2021 8th International Conference on Computing for Sustainable Global Development (INDIACom). Pub Date: 2021-03-17. DOI: 10.1109/INDIACom51348.2021.00085

N. Patel, Dhiren R. Patel


Abstract

The term “language” in NLP refers to natural languages such as Gujarati, Hindi, and English, which we use to communicate in daily life. Most NLP research has centered on English and other European languages; NLP research on Indian languages such as Gujarati has begun only in the last few years. This paper presents a road map for implementing the Gujarati grammar concept of “sandhi”. In our words, sandhi is a word segmentation process, and it is present in most South Asian languages, such as Sanskrit and Hindi (both written in Devanagari) and Gujarati, and even in Chinese and Thai. “Sandhi leads to phonetic transformation at word boundaries of a written chunk (small part), and the sounds at the end of a word join together to form a single chunk of the character sequence.” Our main focus is a rule-based implementation of sandhi. Like every Indian language and its script, Gujarati grammar has its own specified rules of composition for combining consonants, vowels, and modifiers. We have identified certain rules with which we accomplish the practical implementation of sandhi. Many sandhi rules are available, each denoting a unique combination of phonetic transformations, documented in the grammatical tradition of Gujarati. Sandhi does not make any syntactic or semantic changes to the words involved. It is an optional operation that depends only on the awareness of the writer.
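
The abstract does not list the rules the authors implemented, so the following is only a minimal sketch of what a rule-based sandhi join could look like, assuming three well-known vowel-sandhi patterns (a + ā → ā, ā + ā → ā, a + u → o). The names `RULES`, `is_consonant`, and `join_sandhi` are hypothetical and are not taken from the paper.

```python
# Minimal sketch of rule-based vowel sandhi for Gujarati (illustrative rules only,
# not the rule set from the paper).

INHERENT_A = "a"      # marker for the inherent vowel of a bare final consonant
AA_SIGN = "\u0abe"    # Gujarati vowel sign AA (ા)
O_SIGN = "\u0acb"     # Gujarati vowel sign O (ો)
AA = "\u0a86"         # Gujarati independent vowel AA (આ)
U = "\u0a89"          # Gujarati independent vowel U (ઉ)

# (final sound of first word, initial vowel of second word) -> joining vowel sign
RULES = {
    (INHERENT_A, AA): AA_SIGN,  # a + ā -> ā
    (AA_SIGN, AA): AA_SIGN,     # ā + ā -> ā
    (INHERENT_A, U): O_SIGN,    # a + u -> o
}


def is_consonant(ch: str) -> bool:
    """True if ch lies in the Gujarati consonant range KA..HA (U+0A95..U+0AB9)."""
    return "\u0a95" <= ch <= "\u0ab9"


def join_sandhi(first: str, second: str) -> str:
    """Join two words at their boundary if a rule matches; otherwise keep them apart."""
    if not first or not second:
        return first + second
    last, head = first[-1], second[0]
    if is_consonant(last):
        # A bare final consonant carries the inherent vowel; keep the consonant.
        key, stem = (INHERENT_A, head), first
    else:
        # The final character is a vowel sign (or something else); the rule replaces it.
        key, stem = (last, head), first[:-1]
    sign = RULES.get(key)
    if sign is None:
        # Sandhi is optional: with no matching rule the words stay separate.
        return first + " " + second
    return stem + sign + second[1:]


if __name__ == "__main__":
    print(join_sandhi("હિમ", "આલય"))
    print(join_sandhi("વિદ્યા", "આલય"))
    print(join_sandhi("સૂર્ય", "ઉદય"))
```

Keeping the rules in a lookup table keyed on the boundary sounds means a new rule can be added as one more dictionary entry, without changing the joining logic. A fuller system would also need consonant sandhi, visarga sandhi, and the reverse (splitting) direction, none of which this sketch attempts.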

