Multi-site heterogeneous system fusions for the Albayzin 2010 Language Recognition Evaluation
Luis Javier Rodriguez-Fuentes, M. Peñagarikano, A. Varona, M. Díez, Germán Bordel, D. M. González, Jesús Antonio Villalba López, A. Miguel, A. Ortega, Eduardo Lleida Solano, A. Abad, Oscar Koller, I. Trancoso, Paula Lopez-Otero, Laura Docío Fernández, C. García-Mateo, R. Saeidi, Mehdi Soufifar, T. Kinnunen, T. Svendsen, P. Fränti
2011 IEEE Workshop on Automatic Speech Recognition & Understanding, published 2011-12-01
DOI: 10.1109/ASRU.2011.6163961 (https://doi.org/10.1109/ASRU.2011.6163961)
Citations: 15
Abstract
The best language recognition performance is commonly obtained by fusing the scores of several heterogeneous systems. Regardless of the fusion approach, it is assumed that different systems may contribute complementary information, either because they are developed on different datasets or because they use different features or different modeling approaches. Most authors apply fusion as a final resource for improving performance, starting from an existing set of systems. Though relative performance gains decrease as larger sets of systems are considered, the best performance is usually attained by fusing all the available systems, which may lead to high computational costs. In this paper, we aim to discover which technologies combine best through fusion and to analyse the factors (data, features, modeling methodologies, etc.) that may explain such good performance. Results are presented and discussed for a number of systems provided by the participating sites and the organizing team of the Albayzin 2010 Language Recognition Evaluation. We hope the conclusions of this work help research groups make better decisions in developing language recognition technology.