A Self-Pruning Classification Model for News

Leonidas Akritidis, Athanasios Fevgas, Panayiotis Bozanis, M. Alamaniotis
Published in: 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA)
Publication date: 2019-07-01
DOI: 10.1109/IISA.2019.8900751 (https://doi.org/10.1109/IISA.2019.8900751)
Citations: 2

Abstract

News aggregators are online services that collect articles from numerous reputable media outlets and news providers and reorganize them to help users find the information they seek. One of the most important tools they offer is the classification of articles into a fixed set of categories. In this article, we introduce a supervised classification method for news articles that analyzes their titles and constructs multiple types of tokens, including single words and n-grams of variable sizes. It then employs several statistics, such as frequencies and token-class correlations, to assign two importance scores to each token. These scores reflect the ambiguity of a token, namely how significant it is for assigning an article to a category. The tokens and their scores are stored in a support structure that is subsequently used to classify unlabeled articles. In addition, we propose a dimensionality reduction approach that shrinks the model without significantly degrading its classification performance. The algorithm is evaluated experimentally on a popular dataset of news articles and is found to outperform standard classification methods.
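The pipeline outlined in the abstract (title tokens, per-token statistics, a lookup structure, pruning, classification) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the paper's scoring formulas, so the two scores used here (a per-class relative frequency and an entropy-based ambiguity weight), the pruning threshold `min_weight`, and every function name are hypothetical stand-ins rather than the authors' method.

```python
from collections import Counter, defaultdict
import math


def title_tokens(title, max_n=3):
    """Return single words and word n-grams (n = 1..max_n) from a title."""
    words = title.lower().split()
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    ]


def train(labeled_titles, max_n=3):
    """Build the lookup structure: token -> {class: number of titles}."""
    table = defaultdict(Counter)
    for title, label in labeled_titles:
        for tok in set(title_tokens(title, max_n)):
            table[tok][label] += 1
    return table


def token_scores(counts, num_classes):
    """Assumed scores, NOT the paper's formulas.

    Returns (per-class relative frequency, ambiguity weight). The weight is
    1 - normalized entropy: close to 1 for a token seen in a single class,
    close to 0 for a token spread evenly over all classes.
    """
    total = sum(counts.values())
    probs = {c: k / total for c, k in counts.items()}
    entropy = -sum(p * math.log(p) for p in probs.values())
    weight = 1.0 - entropy / math.log(num_classes) if num_classes > 1 else 1.0
    return probs, weight


def prune(table, num_classes, min_weight=0.2):
    """Shrink the model by dropping tokens whose weight is below min_weight."""
    return {
        tok: counts
        for tok, counts in table.items()
        if token_scores(counts, num_classes)[1] >= min_weight
    }


def classify(title, table, classes, max_n=3):
    """Sum weight * relative frequency over the title's known tokens."""
    votes = Counter()
    for tok in set(title_tokens(title, max_n)):
        if tok in table:
            probs, weight = token_scores(table[tok], len(classes))
            for c, p in probs.items():
                votes[c] += weight * p
    return votes.most_common(1)[0][0] if votes else None


# Toy data invented for this sketch; it is not the dataset used in the paper.
data = [
    ("stocks rally as markets rebound", "business"),
    ("central bank launches new fund", "business"),
    ("team wins championship final", "sports"),
    ("star striker signs new contract", "sports"),
]
classes = ["business", "sports"]
model = prune(train(data), len(classes))
print(classify("markets rebound after new bank decision", model, classes))
```

In this toy run the token "new" occurs equally often in both classes, so its weight is near zero and pruning removes it, while class-specific tokens such as "markets rebound" survive and drive the prediction. A real implementation would need the paper's actual score definitions and a principled way to choose the pruning threshold.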