{"title":"基于良好编辑相似函数的学习实验研究","authors":"A. Bellet, M. Sebban, Amaury Habrard","doi":"10.1109/ICTAI.2011.27","DOIUrl":null,"url":null,"abstract":"Similarity functions are essential to many learning algorithms. To allow their use in support vector machines (SVM), i.e., for the convergence of the learning algorithm to be guaranteed, they must be valid kernels. In the case of structured data, the similarities based on the popular edit distance often do not satisfy this requirement, which explains why they are typically used with k-nearest neighbor (k-NN). A common approach to use such edit similarities in SVM is to transform them into potentially (but not provably) valid kernels. Recently, a different theory of learning with (e,g,t) -good similarity functions was proposed, allowing the use of non-kernel similarity functions. Moreover, the resulting models are supposedly sparse, as opposed to standard SVM models that can be unnecessarily dense. In this paper, we study the relevance and applicability of this theory in the context of string edit similarities. We show that they are naturally good for a given string classification task and provide experimental evidence that the obtained models not only clearly outperform the k-NN approach, but are also competitive with standard SVM models learned with state-of-the-art edit kernels, while being much sparser.","PeriodicalId":332661,"journal":{"name":"2011 IEEE 23rd International Conference on Tools with Artificial Intelligence","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"An Experimental Study on Learning with Good Edit Similarity Functions\",\"authors\":\"A. Bellet, M. Sebban, Amaury Habrard\",\"doi\":\"10.1109/ICTAI.2011.27\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Similarity functions are essential to many learning algorithms. To allow their use in support vector machines (SVM), i.e., for the convergence of the learning algorithm to be guaranteed, they must be valid kernels. In the case of structured data, the similarities based on the popular edit distance often do not satisfy this requirement, which explains why they are typically used with k-nearest neighbor (k-NN). A common approach to use such edit similarities in SVM is to transform them into potentially (but not provably) valid kernels. Recently, a different theory of learning with (e,g,t) -good similarity functions was proposed, allowing the use of non-kernel similarity functions. Moreover, the resulting models are supposedly sparse, as opposed to standard SVM models that can be unnecessarily dense. In this paper, we study the relevance and applicability of this theory in the context of string edit similarities. We show that they are naturally good for a given string classification task and provide experimental evidence that the obtained models not only clearly outperform the k-NN approach, but are also competitive with standard SVM models learned with state-of-the-art edit kernels, while being much sparser.\",\"PeriodicalId\":332661,\"journal\":{\"name\":\"2011 IEEE 23rd International Conference on Tools with Artificial Intelligence\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-11-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE 23rd International Conference on Tools with Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICTAI.2011.27\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 23rd International Conference on Tools with Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICTAI.2011.27","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An Experimental Study on Learning with Good Edit Similarity Functions
Similarity functions are essential to many learning algorithms. To allow their use in support vector machines (SVM), i.e., for the convergence of the learning algorithm to be guaranteed, they must be valid kernels. In the case of structured data, the similarities based on the popular edit distance often do not satisfy this requirement, which explains why they are typically used with k-nearest neighbor (k-NN). A common approach to use such edit similarities in SVM is to transform them into potentially (but not provably) valid kernels. Recently, a different theory of learning with (e,g,t) -good similarity functions was proposed, allowing the use of non-kernel similarity functions. Moreover, the resulting models are supposedly sparse, as opposed to standard SVM models that can be unnecessarily dense. In this paper, we study the relevance and applicability of this theory in the context of string edit similarities. We show that they are naturally good for a given string classification task and provide experimental evidence that the obtained models not only clearly outperform the k-NN approach, but are also competitive with standard SVM models learned with state-of-the-art edit kernels, while being much sparser.