Automatic text categorization: Marathi documents

Jaydeep Jalindar Patil, N. Bogiri
2015 International Conference on Energy Systems and Applications. Published 2015-10-01. DOI: 10.1109/ICESA.2015.7503438 (https://doi.org/10.1109/ICESA.2015.7503438)
Citations: 30

Abstract

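Information technology has generated a huge volume of data on the internet. Initially this data was mainly in English, so most data mining research has focused on English text documents. As internet usage grew, the amount of data in other languages such as Marathi, Tamil, Telugu, and Punjabi increased as well. This paper presents a retrieval system for Marathi-language documents based on a user profile, which takes into account the user's interests and browsing history. The system shows Marathi documents to the end user according to that profile. Automatic text categorization supports better management and retrieval of these documents and makes document retrieval a simple task. This paper discusses automatic text categorization of Marathi documents and surveys the related work. Various learning techniques exist for classifying text documents, such as Naïve Bayes, Support Vector Machines, and Decision Trees. Several clustering techniques are also used for text categorization, such as the Label Induction Grouping Algorithm, Suffix Tree Clustering, and K-means. The literature survey indicates that, for non-English documents, the Vector Space Model (VSM) gives better results than other models. The system categorizes Marathi documents using the Label Induction Grouping (LINGO) algorithm, which is based on the VSM. The dataset contains 200 documents in 20 categories. The results indicate that the LINGO clustering algorithm is efficient for Marathi text documents.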
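The abstract names the Vector Space Model as the representation underlying LINGO but does not show it. The following is a minimal sketch of that representation only, not the authors' system and not LINGO itself: each document becomes a TF-IDF weighted term vector, and documents are compared by cosine similarity. The function names `tf_idf_vectors` and `cosine`, the whitespace tokenization, the smoothed IDF formula, and the three toy Marathi documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumption: documents are tokenized by whitespace only; a real system
    for Marathi would likely add stop-word removal and stemming.
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Relative term frequency times a smoothed inverse document
            # frequency (the "+ 1.0" keeps terms shared by all documents
            # from dropping to zero weight).
            term: (count / len(tokens)) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy documents, invented for illustration: two about cricket, one about film.
docs = [
    "क्रिकेट सामना संघ",
    "क्रिकेट संघ विजय",
    "चित्रपट अभिनेता गाणे",
]
vectors = tf_idf_vectors(docs)
print("cricket vs cricket:", cosine(vectors[0], vectors[1]))
print("cricket vs film:   ", cosine(vectors[0], vectors[2]))
```

In this sketch the two cricket documents share terms and so score higher than the cricket and film pair, which share none. A clustering algorithm built on the VSM would group documents using similarities of this kind.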