J. Trejo, G. Sidorov, Marco Moreno, Sabino Miranda-Jiménez, Rodrigo Cadena Martínez
2014 13th Mexican International Conference on Artificial Intelligence. Published 2014-11-16. DOI: 10.1109/MICAI.2014.7 (https://doi.org/10.1109/MICAI.2014.7)
Using Soft Similarity in Multi-label Classification for Reuters-21578 Corpus
In classification tasks, one of the main problems is choosing which features provide the best results, i.e., how to construct the vector space model. In this paper, we show how to complement the traditional vector space model with the concept of soft similarity. We combine the traditional tf-idf model with latent Dirichlet allocation and apply the combination to multi-label classification. We use the multi-label documents of the Reuters-21578 corpus as a case study. The methodology is evaluated using the multi-label algorithm RAkEL, with the traditional tf-idf model as the baseline. We report F1 measures for both models across various feature sets, preprocessing techniques, and vector sizes. The new model obtains better results than the baseline model.
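The abstract does not give a formula for soft similarity, so the following is only a minimal sketch, assuming the commonly used soft cosine form, in which a feature-to-feature similarity matrix weights every pair of vector components. The function names, the toy vectors, and the similarity values are hypothetical and are not taken from the paper. In the setting described above, such a matrix might be derived from latent Dirichlet allocation output, but that derivation is not stated here.

```python
import math


def soft_dot(a, b, sim):
    """Return sum over i, j of sim[i][j] * a[i] * b[j]."""
    return sum(
        sim[i][j] * a[i] * b[j]
        for i in range(len(a))
        for j in range(len(b))
    )


def soft_cosine(a, b, sim):
    """Soft cosine of vectors a and b under feature similarity matrix sim."""
    denom = math.sqrt(soft_dot(a, a, sim)) * math.sqrt(soft_dot(b, b, sim))
    if denom == 0.0:
        return 0.0  # at least one vector has zero (soft) norm
    return soft_dot(a, b, sim) / denom


# Toy tf-idf-like vectors over three features (hypothetical values).
doc_a = [1.0, 0.0, 0.0]
doc_b = [0.0, 1.0, 0.0]

# Identity matrix: features are unrelated, so this is the ordinary cosine.
identity = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Assumed similarity: features 0 and 1 are treated as related (0.8).
related = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

hard = soft_cosine(doc_a, doc_b, identity)
soft = soft_cosine(doc_a, doc_b, related)
print(hard, soft)
```

With the identity matrix the two documents share no features and their similarity is zero. Once features 0 and 1 are declared related, the same documents receive a non-zero score, which is the effect a soft similarity is meant to add on top of a plain tf-idf vector space model.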