Rough set based feature selection approach for text mining

N. Sailaja, L. P. Sree, N. Mangathayaru
2016 2nd International Conference on Contemporary Computing and Informatics (IC3I)
Published: 2016-12-01
DOI: 10.1109/IC3I.2016.7917932 (https://doi.org/10.1109/IC3I.2016.7917932)
Citations: 2

Abstract

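Text can be viewed as a sequence of characters. When the volume of unstructured text data is very large, processing it by computer is a challenging task. Extracting meaningful and useful patterns from text requires pre-processing methods and algorithms. Feature selection, or reduct generation, aims to find a smallest subset of attributes that represents the same knowledge as the original feature (attribute) set. Rough set theory (RST) is a mathematical tool that has been applied to this problem with considerable success. In this paper, we propose a rough set based approach to feature selection for text data sets, in support of text mining. As input we take a variety of sample text documents (such as biographical text, sample research articles from various domains, and news articles from several sources); these files may be in .txt, .pdf, or any other format. We also present a complexity analysis of the proposed algorithm and experimental results on sample text data sets.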
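The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of reduct generation in general, not the authors' method. It uses the well-known greedy QuickReduct heuristic driven by the rough-set dependency degree (the fraction of documents whose class is fully determined by the chosen attributes). The binary term-presence encoding, the toy documents, and the names `partition`, `dependency`, and `quick_reduct` are all assumptions made for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes that agree on every attribute in attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows lying in the positive region of attrs."""
    if not rows:
        return 0.0
    consistent = 0
    for block in partition(rows, attrs):
        # A block is in the positive region when all its rows share one label.
        if len({labels[i] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)


def quick_reduct(rows, labels, attrs):
    """Greedily add the attribute that raises the dependency degree the most.

    This is a heuristic: it returns an attribute subset with the same
    dependency degree as the full set, but not necessarily a smallest one.
    """
    target = dependency(rows, labels, attrs)
    reduct = []
    best = dependency(rows, labels, reduct)
    while best < target:
        candidates = [a for a in attrs if a not in reduct]
        scored = [(dependency(rows, labels, reduct + [a]), a) for a in candidates]
        best, chosen = max(scored, key=lambda pair: pair[0])  # first maximum wins ties
        reduct.append(chosen)
    return reduct


# Hypothetical toy data: each document is a row of term-presence flags.
terms = ["rough", "goal", "match"]
rows = [
    {"rough": 1, "goal": 0, "match": 0},
    {"rough": 1, "goal": 0, "match": 1},
    {"rough": 0, "goal": 1, "match": 1},
    {"rough": 0, "goal": 0, "match": 1},
    {"rough": 1, "goal": 1, "match": 0},
]
labels = ["cs", "cs", "sport", "sport", "sport"]

print(quick_reduct(rows, labels, terms))  # ['rough', 'goal']
```

In this toy table the term "match" is dropped because "rough" and "goal" together already separate the two classes. Real text data would first need tokenisation and discretisation of term weights, which this sketch leaves out.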