Research on Medical Document Categorization

Qirui Zhang, Yonggang Xue, Huaying Zhou, Jinghua Tan
Published in: 2008 International Seminar on Future BioMedical Information Engineering, 2008-12-18. DOI: 10.1109/FBIE.2008.83 (https://doi.org/10.1109/FBIE.2008.83)
Citations: 2

Abstract

Medical document categorization is the process of automatically assigning one or more predefined category labels to medical documents. Document indexing plays an important role in classification. This paper proposes an improved term-weighting method called tfidfie (term frequency, inverse document frequency, and inverse entropy). Compared with the tfidf (term frequency and inverse document frequency) function, the tfidfie function adds an information entropy factor, H, which represents the distribution of the training-set documents in which the term occurs. We then discuss the effect of the training set on medical document categorization: an imbalanced training set decreases the performance of the classifier. Considering the characteristics of medical documents, we construct medical classifiers using the Naive Bayes and Rocchio methods, respectively. The experimental results show that tfidfie improves classification performance and that Naive Bayes outperforms Rocchio.
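
The abstract does not give the tfidfie formula, so the following is only a minimal sketch of the general idea, not the authors' definition. It assumes that H is the Shannon entropy of a term's document distribution over categories, that the entropy factor is `1 - H / H_max`, and that idf is smoothed as `log(1 + N / df)`. The function name `entropy_weighted_tfidf` and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy_weighted_tfidf(docs, labels):
    """Return (tfidf, entropy_weighted), each a list of {term: weight} dicts.

    docs   : list of token lists, one per training document
    labels : category label of each document
    """
    n_docs = len(docs)
    categories = sorted(set(labels))

    # df[t] counts documents containing t; df_cat[t][c] does so per category.
    df = Counter()
    df_cat = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_cat[term][label] += 1

    def entropy(term):
        # Shannon entropy (bits) of the term's documents across categories.
        h = 0.0
        for c in categories:
            p = df_cat[term][c] / df[term]
            if p > 0:
                h -= p * math.log2(p)
        return h

    # Largest possible entropy: the term is spread evenly over all categories.
    h_max = math.log2(len(categories)) if len(categories) > 1 else 1.0

    tfidf, weighted = [], []
    for tokens in docs:
        tf = Counter(tokens)
        plain, adjusted = {}, {}
        for term, freq in tf.items():
            # ASSUMPTION: smoothed idf, so ubiquitous terms keep a nonzero tfidf.
            base = freq * math.log(1 + n_docs / df[term])
            # ASSUMPTION: factor is 1 for a term confined to one category
            # and 0 for a term spread evenly over all categories.
            factor = 1.0 - entropy(term) / h_max
            plain[term] = base
            adjusted[term] = base * factor
        tfidf.append(plain)
        weighted.append(adjusted)
    return tfidf, weighted


# Hypothetical toy corpus: "patient" occurs in every category,
# while "fracture" occurs only in orthopedics documents.
docs = [
    ["fracture", "patient"],
    ["fracture", "bone", "patient"],
    ["insulin", "patient"],
    ["insulin", "glucose", "patient"],
]
labels = ["orthopedics", "orthopedics", "endocrinology", "endocrinology"]
tfidf, weighted = entropy_weighted_tfidf(docs, labels)
```

Under these assumptions, a term such as "patient" that is spread evenly across categories is suppressed entirely, while a category-specific term such as "fracture" keeps its full tfidf weight. Other choices of entropy factor would weight terms differently.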