Extracting knowledge using probabilistic classifier for text mining

2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering. Pub Date: 2013-04-15. DOI: 10.1109/ICPRIME.2013.6496517

S. Subbaiah

Citations: 9

Abstract

Text mining is the process of extracting knowledge from large text documents. This paper proposes a new probabilistic classifier for text mining. It uses the ODP (Open Directory Project) taxonomy, a domain ontology, and datasets to cluster documents and identify the category of a given text document. The proposed work has three steps: preprocessing, rule generation, and probability calculation. In preprocessing, the input document is split into paragraphs and statements. In rule generation, the documents from the training set are read. In probability calculation, positive and negative weight factors are calculated. The proposed algorithm calculates a positive probability value and a negative probability value for each term set or pattern identified in the document. Based on the calculated probability values, the probabilistic classifier indexes the document to the relevant group of the cluster.
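The abstract does not give the weight formulas, so the following is only a minimal Python sketch of how the three steps might fit together, not the paper's actual method. Every function name (`preprocess`, `generate_rules`, `term_probabilities`, `classify`) is hypothetical, the smoothed document-frequency estimates and the "positive minus negative" score are assumptions made for illustration, and the ODP taxonomy and domain ontology are left out entirely.

```python
import re
from collections import Counter, defaultdict


def preprocess(document):
    """Step 1: split a document into paragraphs, then statements, then terms."""
    statements = []
    for paragraph in re.split(r"\n\s*\n", document.strip()):
        for statement in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            terms = re.findall(r"[a-z]+", statement.lower())
            if terms:
                statements.append(terms)
    return statements


def generate_rules(training_set):
    """Step 2: read (category, text) pairs and count, per category,
    how many training documents contain each term."""
    doc_counts = Counter()
    term_counts = defaultdict(Counter)
    for category, document in training_set:
        doc_counts[category] += 1
        terms = {t for statement in preprocess(document) for t in statement}
        term_counts[category].update(terms)
    return doc_counts, term_counts


def term_probabilities(term, category, doc_counts, term_counts):
    """Step 3: positive and negative probability of a term for a category.
    ASSUMPTION: add-one smoothed document frequencies, not the paper's formula."""
    in_docs = doc_counts[category]
    out_docs = sum(doc_counts.values()) - in_docs
    in_hits = term_counts[category][term]
    out_hits = sum(c[term] for k, c in term_counts.items() if k != category)
    positive = (in_hits + 1) / (in_docs + 2)
    negative = (out_hits + 1) / (out_docs + 2)
    return positive, negative


def classify(document, doc_counts, term_counts):
    """Assign the document to the category with the highest summed
    (positive - negative) score over its terms (an assumed decision rule)."""
    terms = {t for statement in preprocess(document) for t in statement}
    scores = {}
    for category in doc_counts:
        score = 0.0
        for term in terms:
            pos, neg = term_probabilities(term, category, doc_counts, term_counts)
            score += pos - neg
        scores[category] = score
    return max(scores, key=scores.get), scores
```

To try it, pass a list of `(category, text)` pairs to `generate_rules` and hand the two returned tables to `classify` along with the new document. Single terms stand in here for the "term sets or patterns" the abstract mentions; multi-term patterns would need a different counting step.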


