VITA Search - An Intelligent Multimodal Search and Archive System for Online Media Resources

2019 IEEE 13th International Conference on Application of Information and Communication Technologies (AICT)
Pub Date: 2019-10-01
DOI: 10.1109/AICT47866.2019.8981781

Zhanibek Kozhirbayev, Zhandos Yessenbayev, Bagdat Myrzakhmetov


Citations: 0

Abstract

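In this paper, we present an intelligent multimodal search and archive system that applies our findings in Kazakh and Russian speech recognition, language identification, and spoken term detection. The paper describes the goals and objectives, the architecture, and the subsystem modules of the developed system. VITA Search can accurately locate the exact time at which the requested spoken information occurs in Kazakh- and Russian-language recordings from various broadcast channels. The speech recognition unit uses the Kaldi toolkit to generate lattices from raw audio. An acoustic model trained with deep neural networks shows significant results: the word error rate (WER) on the training set was 3.86% for Kazakh speech and 9.85% for Russian speech. In addition, we integrated a language identification model trained with Long Short-Term Memory (LSTM) recurrent neural networks to select the correct recognition model for the input audio. For spoken term detection, we applied word-based and proxy-based approaches to search for keywords in the lattices.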
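The abstract reports WER figures (3.86% for Kazakh and 9.85% for Russian, on the training set) without defining the metric. The standard definition is assumed here, because this listing does not give the paper's scoring details. It counts substitutions $S$, deletions $D$, and insertions $I$ in the recognizer output against a reference transcript of $N$ words:

```latex
\mathrm{WER} = \frac{S + D + I}{N} \times 100\%
```

Under this definition, a WER of 3.86% corresponds to roughly 3.86 word errors per 100 reference words.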
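The abstract names three processing steps. Language identification selects a recognizer, Kaldi produces lattices, and keywords are searched with word-based and proxy-based approaches. The minimal Python sketch below illustrates the routing and search steps and rests on clearly stated assumptions. The lattice is flattened into a hypothetical list of time-stamped word hits with posterior scores, and the LID output is a hypothetical score dictionary. Proxies for out-of-vocabulary keywords are chosen by phone-level edit distance, which is one common way to realize the proxy idea and not necessarily the method used in the paper. All names (`Hit`, `select_language`, `find_proxies`, `search`, the toy lexicon and hits) are illustrative.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for lattice output: each entry is one word
# hypothesis with its time span (seconds) and posterior score.
@dataclass(frozen=True)
class Hit:
    word: str
    start: float
    end: float
    score: float


def select_language(lid_scores):
    """Route audio to the recognizer of the highest-scoring language."""
    return max(lid_scores, key=lid_scores.get)


def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: phone lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def find_proxies(keyword_phones, lexicon, max_dist=1):
    """In-vocabulary words whose pronunciation is within max_dist phone edits."""
    return {w for w, phones in lexicon.items()
            if edit_distance(keyword_phones, phones) <= max_dist}


def search(keyword, keyword_phones, hits, lexicon, threshold=0.5, max_dist=1):
    """Return (start, end, word, score) detections of keyword, sorted by time."""
    if keyword in lexicon:
        targets = {keyword}  # word-based: the term itself is in the vocabulary
    else:
        # proxy-based: search for acoustically similar in-vocabulary words
        targets = find_proxies(keyword_phones, lexicon, max_dist)
    return sorted((h.start, h.end, h.word, h.score)
                  for h in hits if h.word in targets and h.score >= threshold)


# Toy data (hypothetical): letters stand in for phones.
LEXICON = {
    "astana": list("astana"),
    "almaty": list("almaty"),
    "almata": list("almata"),
}
HITS = [
    Hit("astana", 1.2, 1.8, 0.9),
    Hit("almaty", 5.0, 5.6, 0.8),
    Hit("almata", 9.1, 9.7, 0.3),
]

if __name__ == "__main__":
    print(select_language({"kk": 0.7, "ru": 0.3}))
    print(search("astana", list("astana"), HITS, LEXICON))  # in-vocabulary query
    print(search("almati", list("almati"), HITS, LEXICON))  # OOV query via proxies
```

In the toy run, the out-of-vocabulary query "almati" is matched through its proxy "almaty". The hit for the second proxy, "almata", is discarded by the score threshold. Returning start and end times is what lets a search system report when a term was spoken. A real lattice-based search would index all competing paths instead of scanning a flat list. Kaldi, for example, provides tools for building a keyword-search index from lattices.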


