Automatically Detecting Personal Topics by Clustering Emails

Huijie Yang, Junyong Luo, Meijuan Yin, Yan Liu
{"title":"Automatically Detecting Personal Topics by Clustering Emails","authors":"Huijie Yang, Junyong Luo, Meijuan Yin, Yan Liu","doi":"10.1109/ETCS.2010.238","DOIUrl":null,"url":null,"abstract":"Emails play an important role in our daily life. It has been recognized that clustering emails into meaningful groups can greatly save cognitive load to process emails. Mailbox user becomes more and more concerned about how to organize and manage the emails as well as how to mine the meaningful data conveniently and effectively. This paper proposes a novel personal topics detection approach using clustering algorithm. First preprocess the emails and construct the improved email VSM(vector space model) to label the email combining the body and subject in a new method, then adopt the advanced k-means algorithm to cluster the emails and design a kernel-selected algorithm based on the lowest similarity, afterwards we get the appropriate keywords to label the topic of each cluster. Finally, experiments on 20Newsgruops email dataset show the validity of our approach and the experimental results also well match the labeled human clustering result.","PeriodicalId":193276,"journal":{"name":"2010 Second International Workshop on Education Technology and Computer Science","volume":"82 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 Second International Workshop on Education Technology and Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ETCS.2010.238","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Emails play an important role in our daily life. It has been recognized that clustering emails into meaningful groups can greatly save cognitive load to process emails. Mailbox user becomes more and more concerned about how to organize and manage the emails as well as how to mine the meaningful data conveniently and effectively. This paper proposes a novel personal topics detection approach using clustering algorithm. First preprocess the emails and construct the improved email VSM(vector space model) to label the email combining the body and subject in a new method, then adopt the advanced k-means algorithm to cluster the emails and design a kernel-selected algorithm based on the lowest similarity, afterwards we get the appropriate keywords to label the topic of each cluster. Finally, experiments on 20Newsgruops email dataset show the validity of our approach and the experimental results also well match the labeled human clustering result.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
通过聚类邮件自动检测个人主题
电子邮件在我们的日常生活中扮演着重要的角色。人们已经认识到,将电子邮件聚类成有意义的组可以大大节省处理电子邮件的认知负荷。如何对邮件进行组织和管理,如何方便有效地挖掘有意义的数据,成为邮箱用户越来越关心的问题。本文提出了一种基于聚类算法的个人话题检测方法。首先对邮件进行预处理,构建改进的邮件VSM(向量空间模型),以一种新的方法将邮件的正文和主题结合起来进行标记,然后采用先进的k-means算法对邮件进行聚类,并基于最低相似度设计核选择算法,得到合适的关键词对每个聚类的主题进行标记。最后,在20个新闻组邮件数据集上进行的实验表明了该方法的有效性,实验结果与标记的人聚类结果也很好地匹配。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A New Classifier for Multi-Class Problems Based on Negative Selection Algorithm Establishment of School Database Management System Based on VB and MapX: A Case Study of Shandong University of Technology Application of Microscopic Image Segmentation Technology in Locust-Control Pesticide Research Questions and Strategies of Learning Design under the Informational Circumstance Intelligent Test System on the Power Battery of Hybrid Electric Vehicle
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1