VITA Search - An Intelligent Multimodal Search and Archive System for Online Media Resources

Zhanibek Kozhirbayev, Zhandos Yessenbayev, Bagdat Myrzakhmetov
DOI: 10.1109/AICT47866.2019.8981781
URL: https://doi.org/10.1109/AICT47866.2019.8981781
Published in: 2019 IEEE 13th International Conference on Application of Information and Communication Technologies (AICT)
Publication date: 2019-10-01
Citations: 0

Abstract

This paper presents an intelligent multimodal search and archive system that applies our research findings on Kazakh and Russian speech recognition, language identification, and spoken term detection. The paper describes the goals and objectives, the architecture, and the subsystem modules of the developed system. The VITA Search system locates the exact time at which requested spoken content occurs in Kazakh and Russian audio from various broadcast channels. The speech recognition unit uses the Kaldi toolkit to generate lattices from the raw audio data. The acoustic model is trained using deep neural networks: the word error rate on the training set was 3.86 for Kazakh speech and 9.85 for Russian speech. In addition, we integrated a language identification model, trained using Long Short-Term Memory recurrent neural networks, to select the correct recognition model for the input audio. For spoken term detection, we applied word-based and proxy-based approaches to search for keyword terms in the lattices.
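The abstract reports word error rates without stating how they are computed. As a minimal sketch, assuming the standard definition (substitutions, deletions, and insertions from a word-level edit distance, divided by the number of reference words), the metric can be computed as below. The function name `word_error_rate` and the example sentences are hypothetical and not taken from the paper; the result is returned as a percentage, which is the usual convention, although the abstract does not say whether its figures are percentages.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate as a percentage: 100 * (S + D + I) / N.

    Hypothetical helper for illustration; N is the number of words in the
    reference transcript, and S, D, I come from the word-level edit distance.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 100 under this definition.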