A learning approach towards metre-based classification of similar Hindi poems using proposed two-level data transformation

Digital Scholarship in the Humanities. Published 2023-03-20. DOI: 10.1093/llc/fqad011. Impact factor 1.1; zone 3 (Literature); category: Humanities, Multidisciplinary.

Komal Naaz, Niraj Kumar Singh


Citations: 1

Abstract

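With advances in technology and the digitization of resources, humanities problems have not escaped computational treatment. Automatic poetry classification is now a well-defined problem that can be addressed with various approaches, of which mood-based classification is among the most popular. We propose a learning approach to the metre-based classification of Hindi metrical poetry. The state-of-the-art model for metre-based poetry classification is rule-based, whereas the proposed system performs classification with learning models. Feature extraction and classification are the two main components of text classification in natural language processing: feature extraction transforms text into machine-readable numbers, which are then passed to classification models. In their natural form, poems are unsuitable as input to learning-based algorithms. However, transforming the data into a suitable form and extracting a fixed number of features from it (feature extraction) made classification with machine learning possible; this approach had not previously been applied to the problem and can serve as a benchmark for this area of research. The article deals with six popular and similar types of Hindi poems. The collected data are processed into an initial dataset, which undergoes two levels of data transformation and feature engineering to produce the pre-processed dataset. This pre-processed dataset is then fed to selected machine learning models (Bernoulli Naïve Bayes, k-nearest neighbour, random forest, and support vector machine), yielding a best classification accuracy of 99%; the results then undergo a post-processing step based on the observed misclassifications.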
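The abstract does not spell out what the two transformation levels are. The Python sketch below illustrates one plausible way to turn a line of Hindi verse into the kind of fixed-length, machine-readable feature vector the abstract describes; it is an assumption for illustration, not the authors' method. Level 1 maps Devanagari text to syllable weights using simplified laghu (light, 1 matra) / guru (heavy, 2 matras) rules; level 2 pads or truncates that sequence into a fixed-length binary vector; a small Hamming-distance k-nearest-neighbour vote stands in for the classifiers. The function names (`syllable_weights`, `to_fixed_vector`, `knn_predict`) and the rule set are hypothetical, and the rules ignore many exceptions and poetic licences of Hindi prosody.

```python
from collections import Counter

# Hypothetical, simplified Hindi prosody rules (NOT the authors' method):
# laghu (light) syllable = 1 matra, guru (heavy) syllable = 2 matras.
SHORT_VOWELS = set("अइउऋ")
LONG_VOWELS = set("आईऊएऐओऔ")
SHORT_SIGNS = {"\u093f", "\u0941", "\u0943"}  # dependent vowel signs i, u, r
LONG_SIGNS = {"\u093e", "\u0940", "\u0942", "\u0947", "\u0948", "\u094b", "\u094c"}
HEAVY_MARKS = {"\u0902", "\u0903"}  # anusvara and visarga make a syllable heavy
NUKTA, VIRAMA = "\u093c", "\u094d"


def is_consonant(ch):
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095f"


def syllable_weights(text):
    """Level 1 (hypothetical): map a line of verse to laghu/guru weights."""
    weights = []
    in_word = 0  # syllables seen so far in the current word
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in SHORT_VOWELS or ch in LONG_VOWELS:
            weights.append(2 if ch in LONG_VOWELS else 1)
            in_word += 1
            i += 1
        elif is_consonant(ch):
            j = i + 1
            if j < n and text[j] == NUKTA:
                j += 1
            if j < n and text[j] == VIRAMA:
                # Half consonant (conjunct or word-final halant): the preceding
                # syllable in the same word becomes heavy.
                if in_word:
                    weights[-1] = 2
                i = j + 1
                continue
            if j < n and text[j] in LONG_SIGNS:
                weights.append(2)
                j += 1
            elif j < n and text[j] in SHORT_SIGNS:
                weights.append(1)
                j += 1
            else:
                weights.append(1)  # inherent short 'a'
            in_word += 1
            i = j
        elif ch in HEAVY_MARKS:
            if in_word:
                weights[-1] = 2
            i += 1
        else:
            # Whitespace, punctuation and danda end a word; other marks
            # (e.g. chandrabindu) are skipped without changing any weight.
            if ch.isspace() or ch in ",.\u0964\u0965":
                in_word = 0
            i += 1
    return weights


def to_fixed_vector(weights, length=32):
    """Level 2 (hypothetical): fixed-length binary features.

    The first `length` bits mark guru positions; the next `length` bits mark
    which positions hold a syllable, so padding is not confused with laghu.
    Sequences longer than `length` are truncated.
    """
    w = weights[:length]
    pad = [0] * (length - len(w))
    guru = [1 if x == 2 else 0 for x in w] + pad
    present = [1] * len(w) + pad
    return guru + present


def knn_predict(train, query, k=3):
    """Majority label among the k training vectors nearest in Hamming distance."""
    def hamming(v):
        return sum(a != b for a, b in zip(v, query))

    nearest = sorted(train, key=lambda item: hamming(item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Summing the weights gives a line's matra count, the quantity that metres such as the doha (13 + 11 matras per line) constrain. For real experiments, a library such as scikit-learn provides `BernoulliNB`, `KNeighborsClassifier`, `RandomForestClassifier`, and `SVC`, all of which accept a matrix of such binary vectors.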




Source journal

Digital Scholarship in the Humanities

CiteScore: 1.80

Self-citation rate: 25.00%

Journal description: DSH, or Digital Scholarship in the Humanities, is an international, peer-reviewed journal that publishes original contributions on all aspects of digital scholarship in the humanities, including, but not limited to, the field currently called the Digital Humanities. Long and short papers report on theoretical, methodological, experimental, and applied research, and include results of research projects, descriptions and evaluations of tools, techniques, and methodologies, and reports on work in progress. DSH also publishes reviews of books and resources. Digital Scholarship in the Humanities was previously known as Literary and Linguistic Computing.