Usman Malik, Mukesh Barange, Julien Saunier, A. Pauchet
DOI: 10.1109/ICTAI.2018.00156
Published in: 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI), November 2018
Performance Comparison of Machine Learning Models Trained on Manual vs ASR Transcriptions for Dialogue Act Annotation
Automatic dialogue act annotation of speech utterances is an important task in human-agent interaction, as it allows user utterances to be interpreted correctly. Speech utterances can be transcribed manually or with an Automatic Speech Recognizer (ASR). In this article, several machine learning models are trained on manual and ASR transcriptions of user utterances, using bag-of-words and n-gram feature generation approaches, and evaluated on an ASR-transcribed test set. Results show that models trained on ASR transcriptions outperform models trained on manual transcriptions. The impact of the irregular distribution of dialogue acts on the accuracy of statistical models is also investigated, and a partial solution using multimodal information as input is presented.
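The bag-of-words and n-gram feature generation approaches the abstract mentions can be sketched as follows. This is a minimal stdlib-only illustration of the general technique, not the paper's actual pipeline; the example utterance and the `ngram_features` helper are assumptions for demonstration.

```python
from collections import Counter

def ngram_features(utterance, n=2):
    """Count word n-grams in an utterance; n=1 yields bag-of-words.

    This is a generic illustration of n-gram feature generation,
    not the feature extractor used in the paper.
    """
    tokens = utterance.lower().split()
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)

# Hypothetical ASR-style transcription of a user utterance
utt = "can you repeat that please"

bow = ngram_features(utt, n=1)      # bag-of-words (unigram counts)
bigrams = ngram_features(utt, n=2)  # bigram counts
```

In a full dialogue act classifier, such count dictionaries would typically be vectorized over a shared vocabulary and fed to a statistical model; the paper compares several such models trained on manual versus ASR transcriptions.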