Kyudan Jung, Seungmin Bae, Nam Joon Kim, Hyun Gon Ryu, Hyuk-Jae Lee
2024 International Conference on Electronics, Information, and Communication (ICEIC), pp. 1-4. Published 2024-01-28. DOI: https://doi.org/10.1109/ICEIC61013.2024.10457220
Improving ASR Performance with OCR Through Using Word Frequency Difference
Interest in conversational artificial intelligence (AI) has grown recently, and automatic speech recognition (ASR) is being actively researched as a way to support interaction between humans and machines. This paper proposes a system that enhances ASR performance. The proposed method captures images from lecture videos in real time and accumulates them every 30 seconds. Frequency ratios between the text data from the captured images and text data calculated offline from over 333K are then used to improve ASR performance. In experiments, the word error rate (WER) decreased by up to 0.68% compared with using only the traditional ASR. In particular, the recognition rate for specialized terms frequently used in lectures improved by 64%.
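The abstract does not say how the frequency ratios are applied to the ASR output, so the following is only a minimal sketch of one plausible reading: words that are far more frequent in the on-screen (OCR) text than in a background corpus receive a bonus when competing ASR hypotheses are rescored. The function names, the add-one smoothing, the `weight` parameter, the toy background counts, and the n-best rescoring step are all assumptions made for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def frequency_ratios(ocr_text, background_counts, alpha=1.0):
    """Ratio of smoothed relative frequency in OCR text vs. a background corpus.

    A ratio well above 1 marks a word as unusually common on the slides.
    The add-alpha smoothing is an assumption; it keeps unseen words finite.
    """
    ocr_counts = Counter(tokenize(ocr_text))
    vocab_size = len(set(ocr_counts) | set(background_counts))
    ocr_total = sum(ocr_counts.values())
    bg_total = sum(background_counts.values())
    ratios = {}
    for word, count in ocr_counts.items():
        p_ocr = (count + alpha) / (ocr_total + alpha * vocab_size)
        p_bg = (background_counts.get(word, 0) + alpha) / (bg_total + alpha * vocab_size)
        ratios[word] = p_ocr / p_bg
    return ratios


def rescore(hypotheses, ratios, weight=0.5):
    """Pick the best (text, asr_log_score) pair after adding a ratio bonus.

    Only words whose ratio exceeds 1 contribute, so common words such as
    "the" neither help nor hurt a hypothesis. This scoring rule is hypothetical.
    """
    def score(item):
        text, base_score = item
        bonus = sum(
            math.log(ratios[w])
            for w in tokenize(text)
            if ratios.get(w, 0.0) > 1.0
        )
        return base_score + weight * bonus

    return max(hypotheses, key=score)[0]


# Toy data: background counts stand in for the offline corpus statistics.
background = Counter(
    {"the": 1000, "a": 800, "of": 600, "is": 500, "colonel": 50, "kernel": 1}
)
slide_text = "Kernel methods: the kernel trick and kernel matrix"
ratios = frequency_ratios(slide_text, background)

# Two competing ASR hypotheses with made-up log scores.
n_best = [
    ("the colonel trick is useful", -10.0),
    ("the kernel trick is useful", -10.5),
]
best = rescore(n_best, ratios)
print(best)
```

In this toy case the acoustically preferred hypothesis contains "colonel", but "kernel" appears repeatedly on the slide and rarely in the background counts, so the rescored winner is the hypothesis with the specialized term. In a real-time setting the `ratios` table would be rebuilt each time new captured images are accumulated.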