Al-Aboosi, Yasin, Univ. of Mustansiriyah
A, Revathi, SASTRA Deemed Univ.
A, UMAMAGESWARI, SRM Univ. - Ramapuram Campus
Ab. Rahman, Azamuddin, Universiti Malaysia Pahang Al-Sultan Abdullah
Abbasi, Muhammad Inam, Universiti Teknikal Malaysia Melaka
Abd El-Hafeez, Tarek, Minia Univ.
Abd Rahman, Mohd Amiruddin, Universiti Putra Malaysia
Abdullah-Al-Shafi, Md., Univ. of Dhaka
ABOLADE, Jeremiah, Pan African Univ.
Abraham, Bejoy, College of Engineering Muttathara
Afify, Heba M., Higher Inst. of Engineering in Shorouk Academy
Afzal, Muhammad Khalil, COMSATS Univ. Islamabad
Ahire, Harshawardhan, Veermata Jijabai Technological Institute
Ahmad, Mushtaq, Nanjing Univ. of Aeronautics and Astronautics
Ahmadi, Mahmood, Univ. of Razi
Ahmed, Anas, Al Iraqia Univ.
Ahmed, Areeb, Mohammad Ali Jinnah Univ.
Ahmed, Irfan, NED Univ. of Engineering & Technology
Ahmed, Nisar, Univ. of Engineering and Technology Lahore, Pakistan
Ahmed, Suhaib, Baba Ghulam Shah Badshah Univ.
Ahn, Jin-Hyun, Myongji Univ. - Yongin Campus
Ahn, Seokki, ETRI
Ahn, Sungjun, Electronics and Telecommunications Research Institute
Ajayan, J., SNS College of Technology
Ajib, Wessam, Univ. Quebec
Akbar, Son, Universitas Ahmad Dahlan
Akhriza, Tubagus, Kampus STIMATA
Akioka, Sayaka, Meiji Univ.
Al-Ali, Ahmed Kamil Hasan, Queensland Univ. of Technology
Alfaro, Emigdio, Universidad César Vallejo
alghanimi, abdulhameed, Middle Technical Univ.
Al-Hadi, Azremi Abdullah, Universiti Malaysia Perlis
Ali, Dia M, Ninevah Univ.
ali, Tariq, PMAS Arid Agriculture Univ.
Al-kaltakchi, Musab, Mustansiriyah Univ.
Al-Kaltakchi, Musab T. S., Mustansiriyah Univ.
Almasoud, Abdullah, Prince Sattam Bin Abdulaziz Univ.
almufti, saman, Nawroz Univ.
Al-qaness, Mohammed A. A., Wuhan Univ.
Al-Waeli, Ali H. A., American Univ. of Iraq
amin, Farhan, Yeungnam Univ.
Aminzadeh, Hamed, Payame Noor Univ.
Anwar, Aqeel, Georgia Tech
Arafat, Muhammad Yeasir, Chosun Univ.
Arif, Mehmood, Khwaja Fareed Univ. of Engineering & Information Technology
Asgher, Umer, National Univ. of Sciences and Technology
Ashraf, Umer, NIT Srinagar
Atrey, Pradeep, State Univ. of New York
Awais, Qasim, Fatima Jinnah Women Univ.
B, Srinivas, Maharaj Vijayaram Gajapathi Ram College of Engineering
Bahar, Ali Newaz, Univ. of Saskatchewan
Bahng, Seungjae, ETRI
Bakkiam David, Deebak, VIT Univ.
Becerra-Sánchez, Aldonso, Universidad Autónoma de Zacatecas
Bhaskar, D. R., Delhi Technological Univ.
Bhowmick, Anirban, VIT Univ.
Bilim, Mehmet, Nuh Naci Yazgan Univ.
Biswal, Sandeep, OPJU
bose, avishek, Oak Ridge National Laboratory
Bouwmans, Thierry, Universite de La Rochelle
Brahmbhatt, Viraj, Union College
bruzzese, roberto
Nam, Wonhong, Konkuk Univ.
Ngebani, Ibo, Univ. of Botswana
Nguyen Nhu, Chien, Deltax
Nguyen, Ba Cao, Telecommunications Univ.
Nguyen, Long H. B.
Oh, SoonSoo, Chosun Univ.
Onan, Aytug, Izmir Katip Celebi Univ.
Oria Oria, Ana Cinta, University of Seville
Ottimo, Alberto, University of Pisa
Pae, Dongsung, Sangmyung Univ.
Panda, Sanjaya Kumar, National Institute of Technology, Warangal
Parihar, Manoj Singh, PDPM Indian Institute of Information Technology Design and Manufacturing Jabalpur
Park, Chanjun, Korea Univ.
Park, Hosung, Chonnam National Univ.
Park, Hyunhee, Myongji Univ. - Natural Science Campus
Park, J.H, Pukyong National Univ.
Park, Jaehyoung, Sejong Univ.
Park, Jeman, Univ.
Park, Yongbae, Ajou Univ.
Patanavijit, Vorapoj, Assumption Univ.
Paul, Anand, Kyungpook National Univ.
Pei, Xinyue, Northeastern Univ.
Peppas, Kostas, Univ.
Prommee, Pipat, King Mongkut's Institute of Technology Ladkrabang
Rabuske, Taimur, Universidade de Lisboa
Rai, Hari Mohan, Indian Inst.
Rajput, Amitesh, Birla Institute of Technology and Science - Pilani Campus
Ramírez-Chavarría, Roberto, Universidad Nacional Autónoma de México
Ranjan Senapati, Biswa, SOA
Rauniyar, Ashish, SINTEF Digital
Ravankar, Ankit A
Raza ur Rehman, Hafiz Muhammad, Yeungnam Univ.
Raza, Naeem, Natl Textile Univ.
Rezai, Abdalhossein, ACECR
Rojas, Elisa, Universidad de Alcala
Roshanghias, Reza, Yazd Univ.
Roy, Soumitra, Roy Engineering College
Saeidi, Tale, Universiti Teknologi PETRONAS
Sahota, Jasjot, Punjab Engineering College
Salem, Fatty M., Helwan Univ.
San-Segundo, Ruben, Univ. Politecn. Madrid
Sawan, Mohamad, Westlake Univ.
Sboev, Alexander, Kurchatov Institute
Schwarz, Sebastian, Nokia
Seo, Dong-Wook, Korea Maritime and Ocean Univ.
Seo, Hwajeong, Hansung Univ.
Seo, Seungwoo, ETRI
Shahbaz, Ajmal, Univ. of Ulsan
Sharan, Tripurari, North Eastern Regional Inst. of Sci.
Amity School of Engineering and Technology
Shen, Hang, Nanjing Tech Univ.
Shin, Seokjoo, Chosun Univ.
Shin, Younghwan, ETRI
Shu, Xiangbo, Sch Comp Sci
Shuja, Junaid, COMSATS Univ.
Singgih, Ivan Kristianto, Univ. of Surabaya
Singh, Rupender, Indian Institute of Technology
Singh, Shweta, Manipal Institute of Technology Bengaluru
Song, Giltae, Pusan Natl Univ.
Srinivas, Kankanala, VIT-AP Campus
Srivastava, Prashant, NIIT Univ.
Stonier, Albert Alexander, VIT Univ.
Subasi, Abdulhamit, Univ.
NHK
SWARNAKAR, SOUMEN, Netaji Subhash Engineering College
Tahara, Tatsuki, National Institute of Information and Communications Technology
Taher, Montadar Abas, Univ. of Diyala
Tahir, Muhammad Atif, National Univ.
Touil, Lamjed, Univ. of Monastir
Tsai, Jeng-Han, Norwegian Univ.
Vahabi, Mohsen, Islamic Azad Univ. of Science and Research Tehran
Valinataj, Mojtaba, Babol Noshirvani Univ.
Vinnikov, Margarita, New Jersey Institute of Technology
Vu, Thai-Hoc, Ulsan Univ.
Wan, Dehuan, Guangdong University of Finance
Wang, Ding, National Digital Switching System Engineering and Technology Research Center
Wang, Gai-Ge, Ocean University of China
Wang, Guangchen, University of Sydney
Wang, Jihong, Northeast Electric Power Univ.
Wang, Jin, Changsha Univ. of Science and Technology
Wang, Jingjing, Beihang Univ.
Wang, Jun, China Univ. of Mining and Technology
Wang, Lin, Queen Mary Univ. of London
Wang, Muguang, Beijing Jiaotong Univ.
Wang, Wenxu, Guangdong Univ. of Foreign Studies
"2023 Reviewer List," ETRI Journal, vol. 46, no. 1, pp. 154-158, published 28 February 2024. DOI: 10.4218/etr2.12667. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etr2.12667
HyunJung Choi, Muyeol Choi, Seonhui Kim, Yohan Lim, Minkyu Lee, Seung Yun, Donghyun Kim, Sang Hun Kim
The Korean language has written (formal) and spoken (phonetic) forms that differ in their application, which can lead to confusion, especially when dealing with numbers and embedded Western words and phrases. This fact makes it difficult to automate Korean speech recognition models due to the need for a complete transcription training dataset. Because such datasets are frequently constructed using broadcast audio and their accompanying transcriptions, they do not follow a discrete rule-based matching pattern. Furthermore, these mismatches are exacerbated over time due to changing tacit policies. To mitigate this problem, we introduce a data-driven Korean spoken-to-written transcription conversion technique that enhances the automatic conversion of numbers and Western phrases to improve automatic translation model performance.
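The spoken/written mismatch the authors target can be made concrete with a toy rule-based converter for sino-Korean spoken numbers. This is only an illustrative sketch of the phenomenon, not the data-driven conversion model proposed in the paper.

```python
# Toy illustration of Korean spoken-to-written number conversion.
# NOT the paper's data-driven model; a minimal rule-based sketch showing
# the kind of mapping the authors learn from transcription data.

DIGITS = {"영": 0, "일": 1, "이": 2, "삼": 3, "사": 4,
          "오": 5, "육": 6, "칠": 7, "팔": 8, "구": 9}
UNITS = {"십": 10, "백": 100, "천": 1000}

def spoken_to_number(tokens):
    """Convert sino-Korean number words to an integer,
    e.g. ["백", "이", "십", "삼"] ("baek i sip sam") -> 123."""
    total, current = 0, 0
    for tok in tokens:
        if tok in DIGITS:
            current = DIGITS[tok]
        elif tok in UNITS:
            # A bare unit word (e.g. "백" alone) means 1 x unit.
            total += (current if current else 1) * UNITS[tok]
            current = 0
        else:
            raise ValueError(f"unknown token: {tok}")
    return total + current
```

A real system must also handle native-Korean numerals, counters, and embedded English words, which is why the authors train on corpus data instead of rules.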
"Spoken-to-written text conversion for enhancement of Korean–English readability and machine translation," ETRI Journal, vol. 46, no. 1, pp. 127-136, published 28 February 2024. DOI: 10.4218/etrij.2023-0354. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2023-0354
We focus on open-domain question-answering tasks that involve a chain-of-reasoning, which are primarily implemented using large language models. With an emphasis on cost-effectiveness, we designed EffiChainQA, an architecture centered on the use of small language models. We employed a retrieval-based language model to address the limitations of large language models, such as the hallucination issue and the lack of updated knowledge. To enhance reasoning capabilities, we introduced a question decomposer that leverages a generative language model and serves as a key component in the chain-of-reasoning process. To generate training data for our question decomposer, we leveraged ChatGPT, which is known for its data augmentation ability. Comprehensive experiments were conducted using the HotpotQA dataset. Our method outperformed several established approaches, including the Chain-of-Thoughts approach, which is based on large language models. Moreover, our results are on par with those of state-of-the-art Retrieve-then-Read methods that utilize large language models.
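The decompose-retrieve-read loop described above can be sketched as follows. Every component here is a stand-in stub (the paper trains small language models for these roles, with ChatGPT-augmented data for the decomposer); only the control flow, with each hop's answer fed forward to the next, is illustrated.

```python
# Sketch of a chain-of-reasoning QA pipeline in the spirit of EffiChainQA.
# All components are stubs; only the multi-hop control flow is shown.
import re

def _words(text):
    return set(re.findall(r"\w+", text.lower()))

def decompose(question):
    # Stub question decomposer: a generative LM would produce these.
    return ["Who directed Film X?", "When was that director born?"]

def retrieve(sub_question, passages):
    # Stub retriever: rank passages by word overlap with the sub-question.
    q = _words(sub_question)
    ranked = sorted(passages, key=lambda p: -len(q & _words(p)))
    return [p for p in ranked if q & _words(p)]

def read(sub_question, passages):
    # Stub reader: a trained model would extract an answer span.
    return passages[0] if passages else "unknown"

def answer(question, corpus):
    """Chain sub-question answers: each hop retrieves from the corpus
    plus the answers produced by earlier hops."""
    context = []
    for sub_q in decompose(question):
        hits = retrieve(sub_q, corpus + context)
        context.append(read(sub_q, hits))
    return context[-1]
```

The design point is that no single large model performs the whole chain; small specialized components pass intermediate answers along instead.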
"Towards a small language model powered chain-of-reasoning for open-domain question answering," by Jihyeon Roh, Minho Kim, and Kyoungman Bae. ETRI Journal, vol. 46, no. 1, pp. 11-21, published 28 February 2024. DOI: 10.4218/etrij.2023-0355. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2023-0355
Recent advances in deep learning for speech and visual recognition have accelerated the development of multimodal speech recognition, yielding many innovative results. We introduce a Korean audiovisual speech recognition corpus. This dataset comprises approximately 150 h of manually transcribed and annotated audiovisual data supplemented with an additional 2000 h of untranscribed videos collected from YouTube under the Creative Commons License. The dataset is intended to be freely accessible for unrestricted research purposes. Along with the corpus, we propose an open-source framework for automatic speech recognition (ASR) and audiovisual speech recognition (AVSR). We validate the effectiveness of the corpus with evaluations using state-of-the-art ASR and AVSR techniques, capitalizing on both pretrained models and fine-tuning processes. After fine-tuning, ASR and AVSR achieve character error rates of 11.1% and 18.9%, respectively. This error difference highlights the need for improvement in AVSR techniques. We expect that our corpus will be an instrumental resource to support improvements in AVSR.
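The character error rate (CER) reported above is the Levenshtein edit distance between hypothesis and reference character sequences, normalized by the reference length. A minimal stdlib-only sketch:

```python
# Character error rate: edit distance over characters / reference length.
# Standard dynamic-programming Levenshtein with a rolling row.

def cer(reference, hypothesis):
    ref, hyp = list(reference), list(hypothesis)
    prev = list(range(len(hyp) + 1))  # distances for the empty ref prefix
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1] / len(ref)
```

For example, one substituted character in a four-character reference yields a CER of 0.25; the paper's 11.1% ASR figure means roughly one character error per nine reference characters.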
"KMSAV: Korean multi-speaker spontaneous audiovisual dataset," by Kiyoung Park, Changhan Oh, and Sunghee Dong. ETRI Journal, vol. 46, no. 1, pp. 71-81, published 28 February 2024. DOI: 10.4218/etrij.2023-0352. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2023-0352
Dong-Jin Kim, Hyung-Min Park, Harksoo Kim, Seung-Hoon Na, Gerard Jounghyun Kim
Recent advancements in artificial intelligence (AI) have substantially improved applications that depend on human speech and language comprehension. Human speech, characterized by the articulation of thoughts and emotions through sounds, relies on language, a complex system that uses words and symbols for interpersonal communication. The rapid evolution of AI has amplified the demand for related solutions to swiftly and efficiently process extensive amounts of speech and language data. Speech and language technologies have emerged as major topics in AI research, improving the capacity of computers to comprehend text and spoken language by resembling human cognition. These technological breakthroughs have enabled computers to interpret human language, whether expressed in textual or spoken forms, unveiling the comprehensive meaning of the intentions, nuances, and emotional cues expressed by writers or speakers.

Electronics and Telecommunications Research Institute (ETRI) Journal is a peer-reviewed open-access journal launched in 1993 and published bimonthly by ETRI, Republic of Korea. It is intended to promote worldwide academic exchange of research on information, telecommunications, and electronics.

This special issue is devoted to all aspects and future research directions in the rapidly progressing subject of speech and language technologies. In particular, it highlights recent outstanding results on the application of AI techniques to understanding speech and natural language. We selected 12 outstanding papers on three topics of speech and language technologies. Below, we provide a summary of the contributions to this special issue.

The first paper [1], "Towards a small language model powered chain-of-reasoning for open-domain question answering" by Roh and others, focuses on open-domain question-answering tasks that involve a chain of reasoning primarily implemented using large language models. Emphasizing cost effectiveness, the authors introduce EffiChainQA, an architecture centered on the use of small language models. They employ a retrieval-based language model that is known to address the hallucination issue and incorporates up-to-date knowledge, thereby addressing common limitations of larger language models. In addition, they introduce a question decomposer that leverages a generative language model and is essential for an enhanced chain of reasoning.

In the second paper in this special issue [2], "CR-M-SpanBERT: Multiple-embedding-based DNN coreference resolution using self-attention SpanBERT" by Jung, a model is proposed that incorporates multiple embeddings for coreference resolution based on the SpanBERT architecture. The experimental results show that multiple embeddings can improve coreference resolution performance regardless of the employed baseline model, such as LSTM, BERT, or SpanBERT.

As automated essay scoring has evolved from handcrafted techniques to deep learning
In particular, they use a BERT-based reranking system that substantially improves the accuracy of traditional dictionary-based morphological analysis.

The sixth paper in this special issue [6], "A framework for evaluating the code generation ability of large language models" by Yeo and others, introduces a systematic framework for assessing the code generation capabilities of large language models and proposes a new metric called pass-rate@n, which captures fine-grained accuracy levels by accounting for test pass rates. Experimental results demonstrate the effectiveness of the evaluation framework, which can be integrated with real-world coding platforms.

The paper by Park and others [7], "KMSAV: Korean multi-speaker spontaneous audiovisual dataset," is another notable contribution to the field. It presents a rich and extensive database comprising approximately 150 h of rigorously transcribed and annotated audiovisual data along with 2000 h of untranscribed YouTube videos. This open-access corpus, accompanied by a tailored open-source framework, is validated through evaluations using state-of-the-art automatic and audiovisual speech recognition techniques.

The eighth paper [8], "Alzheimer's disease recognition from spontaneous speech using large language models" by Bang and others, proposes an innovative approach to predicting Alzheimer's disease with large language models by making extensive use of evaluation feedback generated by ChatGPT from picture descriptions provided by potential patients. The feedback is used as an additional feature for a speech multimodal transformer block. Experimental results show that leveraging ChatGPT's evaluation feedback considerably improves diagnosis, advancing the application of large language models to the diagnosis of certain diseases.

The ninth paper [9], "A joint streaming model for backchannel prediction and automatic speech recognition" by Choi and others, explores an important aspect of human conversation: the timely use of backchannels such as "uh-huh" or "yeah." It introduces a novel method that combines backchannel prediction with real-time speech recognition using a streaming transformer and multitask learning. The results show a substantial improvement in speech recognition rates over existing methods, especially in streaming scenarios, marking real progress toward more natural and engaging human-computer interaction.

The tenth paper in this special issue [10], "Spoken-to-written text conversion for enhancement of Korean–English readability and machine translation" by Choi and others, addresses the problem that Korean text produced by automatic speech recognition is typically rendered in spoken rather than written form, particularly when it contains numeric expressions and English words. Similar types of ambiguity errors therefore frequently arise in automatic speech translation. To reduce these common errors, the authors propose a Korean spoken-to-written transcription conversion method trained on a large-scale dataset of 8.6 million sentences in a transcription style that unifies the written and spoken forms of text fragments. With transcription conversion, Korean-to-English automatic speech translation improves substantially, demonstrating the importance of high-quality task-aware data for properly training AI models.

The paper by Jeon and others [11], "Multimodal audiovisual speech recognition architecture using a three-feature multi-fusion method for noise-robust systems," tackles the challenge of speech recognition in diverse noisy environments. It proposes an audiovisual speech recognition model that mimics human conversational recognition and shows remarkable robustness in synthetic environments at nine different noise levels. By integrating audio and visual elements through a dense spatiotemporal convolutional neural network, the model achieves error rates far below those of conventional methods. This research can pave the way for enhanced speech recognition services that remain stable and accurate in noisy environments.

Language tutoring systems for non-native speakers have taken a major leap forward with the development of advanced end-to-end automatic speech recognition and proficiency evaluation methods, as presented by Kang and others in [12], "AI-based language tutoring systems with end-to-end automatic speech recognition and proficiency evaluation." The paper details how semi-supervised and transfer learning techniques are combined to exploit diverse speech data and create systems capable of skillfully assessing and providing feedback on pronunciation and fluency. Highlighting practical application, the study showcases two deployed systems, EBS AI PengTalk and KSI Korean AI Tutor, which enhance language learning for Korean elementary school students and for foreign learners of Korean, respectively.

We are delighted to have taken part in the timely publication of these high-quality technical papers. The research on speech and language models presented here will surely contribute to the design and implementation of future AI systems.
"Special issue on speech and language AI technologies," ETRI Journal, vol. 46, no. 1, pp. 7-10, published 28 February 2024. DOI: 10.4218/etr2.12666. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etr2.12666
As automated essay scoring (AES) has progressed from handcrafted techniques to deep learning, holistic scoring capabilities have emerged. However, specific trait assessment remains a challenge because of the limited depth of earlier methods in modeling dual assessments for holistic and multi-trait tasks. To overcome this challenge, we explore providing comprehensive feedback while modeling the interconnections between holistic and trait representations. We introduce the DualBERT-Trans-CNN model, which combines transformer-based representations with a novel dual-scale bidirectional encoder representations from transformers (BERT) encoding approach at the document level. By explicitly leveraging multi-trait representations in a multi-task learning (MTL) framework, our DualBERT-Trans-CNN emphasizes the interrelation between holistic and trait-based score predictions, aiming for improved accuracy. For validation, we conducted extensive tests on the ASAP++ and TOEFL11 datasets. Against models of the same MTL setting, ours showed a 2.0% increase in its holistic score. Additionally, compared with single-task learning (STL) models, ours demonstrated a 3.6% enhancement in average multi-trait performance on the ASAP++ dataset.
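The multi-task idea behind this kind of model can be sketched as a training objective: one shared representation feeds a holistic-score head and several trait heads, and the loss is a weighted combination of the per-task losses. The weighting scheme and MSE choice below are illustrative assumptions, not the paper's exact design.

```python
# Illustrative multi-task loss for joint holistic + trait essay scoring.
# The 0.5 trait weight and MSE losses are assumptions for the sketch.

def mse(pred, gold):
    """Mean squared error over a batch of scalar scores."""
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold)

def multi_task_loss(holistic_pred, holistic_gold,
                    trait_preds, trait_golds, trait_weight=0.5):
    """Combine the holistic loss with the mean of the per-trait losses,
    so gradients couple the holistic and trait predictions."""
    holistic_loss = mse(holistic_pred, holistic_gold)
    trait_loss = sum(mse(p, g) for p, g in zip(trait_preds, trait_golds))
    trait_loss /= len(trait_golds)
    return holistic_loss + trait_weight * trait_loss
```

Training both heads against this single objective is what lets trait supervision improve the holistic score, the effect the 2.0% MTL gain above reflects.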
"Dual-scale BERT using multi-trait representations for holistic and trait-specific essay grading," by Minsoo Cho, Jin-Xia Huang, and Oh-Woog Kwon. ETRI Journal, vol. 46, no. 1, pp. 82-95, published 28 February 2024. DOI: 10.4218/etrij.2023-0324. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2023-0324
Jihee Ryu, Soojong Lim, Oh-Woog Kwon, Seung-Hoon Na
This study introduces a new approach in Korean morphological analysis combining dictionary-based techniques with Transformer-based deep learning models. The key innovation is the use of a BERT-based reranking system, significantly enhancing the accuracy of traditional morphological analysis. The method generates multiple suboptimal paths, then employs BERT models for reranking, leveraging their advanced language comprehension. Results show remarkable performance improvements, with the first-stage reranking achieving over 20% improvement in error reduction rate compared with existing models. The second stage, using another BERT variant, further increases this improvement to over 30%. This indicates a significant leap in accuracy, validating the effectiveness of merging dictionary-based analysis with contemporary deep learning. The study suggests future exploration in refined integrations of dictionary and deep learning methods as well as using probabilistic models for enhanced morphological analysis. This hybrid approach sets a new benchmark in the field and offers insights for similar challenges in language processing applications.
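The two-stage flow described above, a dictionary-based analyzer emitting n-best candidate analyses that a neural model then rescores, can be sketched generically. The BERT rescorer is replaced here by a stub scoring function, and the interpolation weight is an assumption; only the reranking control flow is illustrated.

```python
# Generic n-best reranking sketch: combine a first-stage lattice score
# with a second-stage rescorer (a stub standing in for BERT).

def rerank(candidates, rescore, alpha=0.7):
    """candidates: list of (analysis, lattice_score) pairs.
    rescore: callable mapping an analysis string to a score.
    Returns analyses sorted best-first by the interpolated score."""
    combined = [(alpha * rescore(a) + (1 - alpha) * s, a)
                for a, s in candidates]
    return [a for _, a in sorted(combined, reverse=True)]
```

The study's gains come from the rescorer overturning first-stage rankings exactly as in this toy: a candidate with a lower lattice score can win once the context-aware score is mixed in.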
{"title":"Transformer-based reranking for improving Korean morphological analysis systems","authors":"Jihee Ryu, Soojong Lim, Oh-Woog Kwon, Seung-Hoon Na","doi":"10.4218/etrij.2023-0364","DOIUrl":"https://doi.org/10.4218/etrij.2023-0364","url":null,"abstract":"<p>This study introduces a new approach in Korean morphological analysis combining dictionary-based techniques with Transformer-based deep learning models. The key innovation is the use of a BERT-based reranking system, significantly enhancing the accuracy of traditional morphological analysis. The method generates multiple suboptimal paths, then employs BERT models for reranking, leveraging their advanced language comprehension. Results show remarkable performance improvements, with the first-stage reranking achieving over 20% improvement in error reduction rate compared with existing models. The second stage, using another BERT variant, further increases this improvement to over 30%. This indicates a significant leap in accuracy, validating the effectiveness of merging dictionary-based analysis with contemporary deep learning. The study suggests future exploration in refined integrations of dictionary and deep learning methods as well as using probabilistic models for enhanced morphological analysis. 
This hybrid approach sets a new benchmark in the field and offers insights for similar challenges in language processing applications.</p>","PeriodicalId":11901,"journal":{"name":"ETRI Journal","volume":"46 1","pages":"137-153"},"PeriodicalIF":1.4,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2023-0364","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139987414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
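The generate-then-rerank pipeline described in the abstract can be sketched as follows. This is a minimal illustration only: the paper's BERT scorer is replaced by a stub, and every name here (`rerank`, `toy_scorer`, the mixing weight `alpha`) is an assumption for exposition, not the authors' code.

```python
# Sketch of reranking candidate morphological-analysis paths.
# A dictionary-based analyzer supplies candidates with base path scores;
# a language-model scorer (stubbed below) supplies a plausibility score,
# and the two are mixed to pick the final analysis.

def rerank(candidates, scorer, alpha=0.5):
    """Sort candidates by a mix of the base path score and a model score."""
    return sorted(
        candidates,
        key=lambda c: alpha * c["path_score"] + (1 - alpha) * scorer(c["path"]),
        reverse=True,
    )

def toy_scorer(path):
    # Stand-in for a BERT plausibility score; here it simply prefers
    # analyses with fewer morphemes.
    return 1.0 / len(path)

# Two hypothetical analyses of the same word (morpheme/POS-tag pairs).
candidates = [
    {"path": ["na/NP", "neun/JX", "hakgyo/NNG"], "path_score": 0.6},
    {"path": ["na/NP", "neun/ETM"], "path_score": 0.5},
]
best = rerank(candidates, toy_scorer)[0]
```

In the real system the second stage would repeat this step with a different BERT variant scoring the survivors of the first stage.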
Multistage interference cancellation for cyclic interleaved frequency division multiplexing
G. Anuthirsha, S. Lenty Stuwart
ETRI Journal 46(5): 904-914, 26 February 2024. DOI: 10.4218/etrij.2023-0274

Cyclic interleaved frequency division multiplexing (CIFDM), a variant of interleaved frequency division multiplexing (IFDM), has recently been proposed. While CIFDM employs cyclic interleaving at the transmitter to make multipath components resolvable at the receiver, the current approach of matched filtering followed by multipath combining does not fully exploit the available diversity, primarily because correlation residues among the codes significantly degrade multipath resolution. As a solution, we introduce a novel multipath successive interference cancellation (SIC) technique for CIFDM that replaces the conventional matched-filtering approach. We examine the performance of the proposed CIFDM-SIC technique and compare it with the conventional CIFDM matched filter bank and IFDM schemes. Our simulation results clearly demonstrate the superiority of the proposed scheme over the existing ones.
Air quality index prediction using seasonal autoregressive integrated moving average transductive long short-term memory
Subramanian Deepan, Murugan Saravanan
ETRI Journal 46(5): 915-927, 25 February 2024. DOI: 10.4218/etrij.2023-0283

The air quality index (AQI) is a descriptive measure used to communicate pollution risks to the population. The AQI is calculated from major air pollutants, including O3, CO, SO2, NO, NO2, benzene, and particulate matter (PM2.5), that should be continuously balanced in clean air. Air pollution is a major limitation on urbanization and population growth in developing countries; hence, automated AQI prediction by a deep learning method applied to time series may be advantageous. We use a seasonal autoregressive integrated moving average (SARIMA) model to predict values reflecting past trends treated as seasonal patterns. In addition, a transductive long short-term memory (TLSTM) model learns dependencies through recurring memory blocks, capturing the long-term dependencies needed for AQI prediction; the TLSTM further increases accuracy close to the test points, which constitute a validation group. AQI prediction results confirm that the proposed SARIMA-TLSTM model achieves a higher accuracy (93%) than an existing convolutional neural network (87.98%), a least absolute shrinkage and selection operator model (78%), and a generative adversarial network (89.4%).
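The seasonal-pattern idea that SARIMA builds on can be shown with a deliberately simple baseline: a seasonal-naive forecaster that repeats the value observed one season earlier. This is an illustrative sketch only, not the paper's SARIMA-TLSTM model, and the toy AQI values and season length are invented for the example.

```python
# Seasonal-naive baseline: forecast each step as the value one full
# season earlier. SARIMA generalizes this by modeling autoregressive,
# differencing, and moving-average terms at the seasonal lag.

def seasonal_naive(series, season, steps):
    """Forecast `steps` future values from a seasonal series."""
    hist = list(series)
    out = []
    for _ in range(steps):
        out.append(hist[-season])  # repeat the value one season back
        hist.append(out[-1])       # roll the forecast into the history
    return out

# Toy AQI readings with a repeating pattern of length 3.
aqi = [50, 80, 60, 52, 83, 61, 49, 79, 62]
forecast = seasonal_naive(aqi, season=3, steps=3)
```

Any real seasonal model must at least beat this baseline; the paper's contribution is the TLSTM stage layered on top of SARIMA to capture the dependencies the seasonal repeat misses.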
Finite impulse response design based on two-level transpose Vedic multiplier for medical image noise reduction
Joghee Prasad, Arun Sekar Rajasekaran, J. Ajayan, Kambatty Bojan Gurumoorthy
ETRI Journal 46(4): 619-632, 25 February 2024. DOI: 10.4218/etrij.2023-0335

Medical signal processing requires noise- and interference-free inputs for precise segregation and classification operations. However, wireless sensing and transmission media/devices generate noise that results in signal tampering during feature extraction. To address this issue, this article introduces a finite impulse response (FIR) design based on a two-level transpose Vedic multiplier. The proposed architecture identifies the zero-noise impulse across varying sensing intervals. At the first level, transpose array operations with equalization are implemented to achieve zero noise at any sensed interval. This transpose occurs between successive array representations of the input with continuity; if continuity is unavailable, the noise interruption is considerable and results in signal tampering. At the second level, the Vedic multiplier optimizes the transpose speed for zero-noise segregation, performed independently for the zero- and nonzero-noise intervals. Finally, the finite impulse response is estimated as the sum of the zero- and nonzero-noise inputs at any finite classification.
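The vertical-and-crosswise (Urdhva Tiryagbhyam) partial-product scheme that Vedic multipliers realize in hardware, and the FIR sum it feeds, can be sketched in software. This is only an illustration of those two well-known building blocks under invented names; it is not the paper's two-level transpose architecture or its noise-segregation logic.

```python
# Urdhva Tiryagbhyam multiplication: all crosswise digit products for a
# given output position are generated together, then carries propagate
# once. Hardware Vedic multipliers exploit this parallel partial-product
# generation; here it is mimicked digit by digit in software.

def vedic_mul(a, b):
    """Multiply two non-negative integers via crosswise partial products."""
    da = [int(d) for d in str(a)][::-1]  # least-significant digit first
    db = [int(d) for d in str(b)][::-1]
    partial = [0] * (len(da) + len(db))
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            partial[i + j] += x * y      # crosswise product lands at i+j
    carry, digits = 0, []
    for p in partial:                    # single carry-propagation pass
        carry, d = divmod(p + carry, 10)
        digits.append(d)
    return int("".join(str(d) for d in reversed(digits)))

def fir(x, h):
    """Plain FIR filter y[n] = sum_k h[k] * x[n-k] over integer samples,
    using the multiplier above for each tap product."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += vedic_mul(hk, x[n - k])
        y.append(acc)
    return y
```

In the paper, the multiplier sits inside each FIR tap so that the transpose-array stage can run at the speed needed to separate zero- and nonzero-noise intervals.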