Unsupervised Data Driven Taxonomy Learning

Mahmoud M. Hosny, S. El-Beltagy, M.E. Allam
Published in: 2015 First International Conference on Arabic Computational Linguistics (ACLing), 2015-04-17
DOI: 10.1109/ACLING.2015.8 (https://doi.org/10.1109/ACLING.2015.8)
Citations: 2

Abstract

Organizing textual information effectively is a major challenge in intelligent text processing, and the task grows more essential as the volume of generated text increases. This paper presents an unsupervised, computer-aided tool that automatically builds classification schemes and taxonomies to improve automated text classification. The tool draws on the Wikipedia knowledge base and its categorization system. It was validated on a subset of a large language dataset obtained from the Google Moderator series (Egypt 2.0) idea bank. The tool's output was evaluated by comparing the automatically obtained results with those manually annotated by three human evaluators. The tool achieved a precision of 88.6% and a recall of 81.2%.