Automatically Detecting Personal Topics by Clustering Emails
Huijie Yang, Junyong Luo, Meijuan Yin, Yan Liu
2010 Second International Workshop on Education Technology and Computer Science, 2010-03-06
DOI: 10.1109/ETCS.2010.238
Email plays an important role in daily life, and clustering emails into meaningful groups is recognized as a way to greatly reduce the cognitive load of processing them. Mailbox users are increasingly concerned with how to organize and manage their emails and how to mine meaningful data from them conveniently and effectively. This paper proposes a novel approach to detecting personal topics using a clustering algorithm. First, we preprocess the emails and construct an improved email vector space model (VSM) that represents each email by combining its body and subject in a new way. We then cluster the emails with an advanced k-means algorithm, for which we design a kernel-selection algorithm based on the lowest similarity. Afterwards, we extract appropriate keywords to label the topic of each cluster. Finally, experiments on the 20 Newsgroups email dataset show the validity of our approach, and the experimental results match the human-labeled clustering well.
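The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, because the abstract does not give the exact weighting scheme or selection rule. The sketch assumes that subject terms are simply counted `subject_weight` times as heavily as body terms, that "kernel" means an initial cluster center, and that each new center is the email whose highest cosine similarity to the centers already chosen is lowest. All function names, the `subject_weight` parameter, and the toy emails are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs; a stand-in for real preprocessing."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(vec):
    """Scale a sparse term->weight mapping to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors (equals their cosine)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def build_vectors(emails, subject_weight=2.0):
    """TF-IDF vectors where subject terms count subject_weight times (assumed)."""
    counts = []
    for subject, body in emails:
        tf = Counter(tokenize(body))
        for term in tokenize(subject):
            tf[term] += subject_weight
        counts.append(tf)
    df = Counter(term for tf in counts for term in tf)
    n = len(emails)
    return [
        normalize({t: w * (math.log((1 + n) / (1 + df[t])) + 1.0)
                   for t, w in tf.items()})
        for tf in counts
    ]


def pick_seeds(vectors, k):
    """Assumed rule: start from email 0, then repeatedly add the email whose
    highest similarity to the already chosen seeds is lowest."""
    seeds = [0]
    while len(seeds) < k:
        candidates = (i for i in range(len(vectors)) if i not in seeds)
        seeds.append(min(
            candidates,
            key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds),
        ))
    return seeds


def kmeans(vectors, k, max_iter=20):
    """Cosine-based k-means seeded by pick_seeds; returns (labels, centroids)."""
    centroids = [vectors[i] for i in pick_seeds(vectors, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            total = Counter()
            for v, lab in zip(vectors, labels):
                if lab == c:
                    for t, w in v.items():
                        total[t] += w
            if total:  # keep the old centroid if a cluster became empty
                centroids[c] = normalize(total)
    return labels, centroids


def top_keywords(centroid, n=3):
    """Label a cluster with its n highest-weighted centroid terms."""
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]


# Toy (subject, body) pairs invented for illustration only.
emails = [
    ("hockey playoffs", "goalie saves win hockey game"),
    ("space shuttle launch", "nasa orbit shuttle mission"),
    ("hockey trade", "team trades goalie before game"),
    ("orbit update", "nasa mission reaches orbit"),
]
labels, centroids = kmeans(build_vectors(emails), k=2)
for c, centroid in enumerate(centroids):
    print(c, top_keywords(centroid))
```

Seeding from mutually dissimilar emails is one way to avoid starting two centers inside the same topic, which is a common failure of random initialization in k-means. The number of clusters `k` is supplied by the caller here; the abstract does not say how the paper chooses it.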