{"title":"多类响应的混合专家注释:近似率和一致贝叶斯推断","authors":"Yang Ge, Wenxin Jiang","doi":"10.1145/1143844.1143886","DOIUrl":null,"url":null,"abstract":"We report that mixtures of m multinomial logistic regression can be used to approximate a class of 'smooth' probability models for multiclass responses. With bounded second derivatives of log-odds, the approximation rate is O(m-2/s) in Hellinger distance or O(m-4/s) in Kullback-Leibler divergence. Here s = dim(x) is the dimension of the input space (or the number of predictors). With the availability of training data of size n, we also show that 'consistency' in multiclass regression and classification can be achieved, simultaneously for all classes, when posterior based inference is performed in a Bayesian framework. Loosely speaking, such 'consistency' refers to performance being often close to the best possible for large n. Consistency can be achieved either by taking m = mn, or by taking m to be uniformly distributed among {1, ...,mn} according to the prior, where 1 ≺ mn ≺ na in order as n grows, for some a ∈ (0, 1).","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"A note on mixtures of experts for multiclass responses: approximation rate and Consistent Bayesian Inference\",\"authors\":\"Yang Ge, Wenxin Jiang\",\"doi\":\"10.1145/1143844.1143886\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We report that mixtures of m multinomial logistic regression can be used to approximate a class of 'smooth' probability models for multiclass responses. With bounded second derivatives of log-odds, the approximation rate is O(m-2/s) in Hellinger distance or O(m-4/s) in Kullback-Leibler divergence. Here s = dim(x) is the dimension of the input space (or the number of predictors). With the availability of training data of size n, we also show that 'consistency' in multiclass regression and classification can be achieved, simultaneously for all classes, when posterior based inference is performed in a Bayesian framework. Loosely speaking, such 'consistency' refers to performance being often close to the best possible for large n. 
Consistency can be achieved either by taking m = mn, or by taking m to be uniformly distributed among {1, ...,mn} according to the prior, where 1 ≺ mn ≺ na in order as n grows, for some a ∈ (0, 1).\",\"PeriodicalId\":124011,\"journal\":{\"name\":\"Proceedings of the 23rd international conference on Machine learning\",\"volume\":\"47 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2006-06-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 23rd international conference on Machine learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1143844.1143886\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 23rd international conference on Machine learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1143844.1143886","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract. We report that mixtures of m multinomial logistic regressions can be used to approximate a class of 'smooth' probability models for multiclass responses. When the log-odds have bounded second derivatives, the approximation rate is O(m^{-2/s}) in Hellinger distance, or O(m^{-4/s}) in Kullback-Leibler divergence, where s = dim(x) is the dimension of the input space (i.e., the number of predictors). Given training data of size n, we also show that 'consistency' in multiclass regression and classification can be achieved, simultaneously for all classes, when posterior-based inference is performed in a Bayesian framework. Loosely speaking, such 'consistency' means that performance is typically close to the best possible when n is large. Consistency can be achieved either by taking m = m_n, or by taking m to be uniformly distributed on {1, ..., m_n} under the prior, where 1 ≺ m_n ≺ n^a in order as n grows, for some a ∈ (0, 1).
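For concreteness, the model class analyzed here is a mixture of m multinomial logistic (softmax) experts combined through a softmax gating function. The sketch below is not taken from the paper; it is a minimal NumPy illustration with hypothetical parameter names (gate_w, expert_w, etc.) showing how such a mixture produces class probabilities P(y = c | x) = sum_j g_j(x) * softmax_c(W_j x + b_j).

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax along the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moe_class_probs(x, gate_w, gate_b, expert_w, expert_b):
    """Class probabilities of a mixture of m multinomial logistic experts.

    x        : (s,)      input vector (s predictors)
    gate_w   : (m, s)    gating weights;   gate_b   : (m,)    gating intercepts
    expert_w : (m, C, s) expert weights;   expert_b : (m, C)  expert intercepts
    """
    g = softmax(gate_w @ x + gate_b)       # (m,)   softmax gating weights g_j(x)
    logits = expert_w @ x + expert_b       # (m, C) per-expert class logits
    p_experts = softmax(logits, axis=1)    # (m, C) per-expert multinomial logit probs
    return g @ p_experts                   # (C,)   mixture class probabilities

# toy usage: m = 3 experts, s = 2 predictors, C = 4 classes
rng = np.random.default_rng(0)
m, s, C = 3, 2, 4
probs = moe_class_probs(
    rng.normal(size=s),
    rng.normal(size=(m, s)), rng.normal(size=m),
    rng.normal(size=(m, C, s)), rng.normal(size=(m, C)),
)
assert np.isclose(probs.sum(), 1.0)
```

The sketch fixes m for clarity; in the Bayesian setup of the paper, m would either be set to m_n or given a uniform prior on {1, ..., m_n}, with 1 ≺ m_n ≺ n^a as n grows.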