On Term Similarity Measures for Short Text Classification
H. Seki, Shuhei Toriyama
2019 IEEE 11th International Workshop on Computational Intelligence and Applications (IWCIA), November 2019
DOI: 10.1109/IWCIA47330.2019.8955045
We study term expansion (also called document expansion), a technique used for classifying documents, especially short documents such as Twitter posts and blog entries on the Web. Term expansion augments the sparse information contained in such short documents. Carpineto et al. have proposed a term expansion method based on FCA (Formal Concept Analysis), while Rogers et al. have proposed another based on LDA (Latent Dirichlet Allocation). In this paper, we adopt the notion of weighted term similarity measures in FCA and examine their effectiveness for term expansion. We also study the effectiveness of several correlation measures from the field of association rule mining. We experimentally evaluate the proposed term similarity measures for term expansion on two short-text corpora. The results show that the weighted term similarity measures, with an appropriately chosen weight value, outperform the prior methods.
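The abstract does not spell out the mechanics of similarity-based term expansion, so what follows is only a minimal Python sketch under stated assumptions:

- Documents are already tokenised into term lists.
- The document-term incidence table is treated as an FCA formal context.
- Each term is represented by its extent, i.e. the set of documents that contain it.
- A short document gains every term whose similarity to at least one of its own terms reaches a threshold.

Jaccard overlap of extents, and the cosine and lift correlation measures familiar from association rule mining, stand in for the measures studied. The paper's weighted FCA similarity and its weight parameter are not defined in the abstract and are not reproduced here. All function names, the toy corpus and the threshold values are hypothetical.

```python
import math

# Hypothetical sketch: the measures below are stand-ins, NOT the paper's weighted FCA measure.

def term_extents(docs):
    """Map each term to the set of document indices containing it.
    Viewing docs x terms as a formal context, this set is the extent
    of the attribute concept generated by the term."""
    extents = {}
    for i, doc in enumerate(docs):
        for term in set(doc):
            extents.setdefault(term, set()).add(i)
    return extents

def jaccard(a, b, n_docs):
    """Overlap of two extents relative to their union."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def cosine(a, b, n_docs):
    """Association-rule cosine: |A & B| / sqrt(|A| * |B|)."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

def lift(a, b, n_docs):
    """Association-rule lift: P(A,B) / (P(A) P(B)); values > 1 mean positive correlation."""
    return len(a & b) * n_docs / (len(a) * len(b)) if a and b else 0.0

def expand(doc, extents, measure, threshold, n_docs):
    """Return doc plus every known term whose best similarity to a term
    already in doc is >= threshold (hypothetical expansion rule)."""
    present = set(doc)
    expanded = list(doc)
    for cand in sorted(extents):  # sorted for deterministic output
        if cand in present:
            continue
        score = max(
            (measure(extents[t], extents[cand], n_docs) for t in present if t in extents),
            default=0.0,
        )
        if score >= threshold:
            expanded.append(cand)
    return expanded

# Toy corpus (hypothetical), one token list per short document.
corpus = [
    ["goal", "match", "team"],
    ["goal", "score", "team"],
    ["match", "score", "league"],
    ["stock", "market", "price"],
    ["market", "price", "trade"],
]
extents = term_extents(corpus)
print(expand(["goal"], extents, jaccard, 0.5, len(corpus)))  # ['goal', 'team']
print(expand(["goal"], extents, jaccard, 0.3, len(corpus)))  # ['goal', 'match', 'score', 'team']
```

Lowering the threshold pulls in more loosely related terms. Here "match" and "score" each share one document with "goal" (Jaccard 1/3), so they join only at 0.3. The measure is passed in as a function so that a weighted variant could be substituted without changing the expansion loop.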