
ETRI Journal: Latest Publications

Inceptionv3-LSTM-COV: A multi-label framework for identifying adverse reactions to COVID medicine from chemical conformers based on Inceptionv3 and long short-term memory
IF 1.4 | CAS Q4 (Computer Science) | JCR Q3 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2024-02-25 | DOI: 10.4218/etrij.2023-0288
Pranab Das, Dilwar Hussain Mazumder
Due to the global COVID-19 pandemic, distinct medicines have been developed for treating the coronavirus disease (COVID). However, predicting and identifying potential adverse reactions to these medicines poses significant challenges for producing effective COVID medication. Accurate prediction of adverse reactions to COVID medications is crucial for ensuring patient safety and medicine success. Recent advancements in the computational models used in pharmaceutical production have opened up new possibilities for detecting such adverse reactions. Given the urgent need for effective COVID medication development, this research presents a multi-label Inceptionv3 and long short-term memory methodology for COVID (Inceptionv3-LSTM-COV) medicine development. The experimental evaluations were conducted using chemical conformer images of COVID medicines. The features of each chemical conformer are represented using the RGB color channels and extracted using Inceptionv3, GlobalAveragePooling2D, and long short-term memory (LSTM) layers. The results demonstrate that the Inceptionv3-LSTM-COV model outperformed previously reported models, achieving better results than the MLCNN-COV, Inceptionv3, ResNet50, MobileNetv2, VGG19, and DenseNet201 models. The proposed model reported the highest accuracy, 99.19%, in predicting adverse reactions to COVID medicine.
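To make the described layer stack concrete, here is a minimal Keras sketch of an Inceptionv3 → GlobalAveragePooling2D → LSTM → sigmoid pipeline for multi-label prediction from RGB conformer images. The input resolution, LSTM width, and label count are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of the Inceptionv3 -> GlobalAveragePooling2D -> LSTM -> sigmoid
# stack named in the abstract. Input resolution, LSTM width, and NUM_LABELS are
# illustrative assumptions; the paper's exact wiring may differ.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_LABELS = 27  # assumed number of adverse-reaction classes

backbone = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3))

inputs = layers.Input(shape=(299, 299, 3))           # RGB conformer image
x = backbone(inputs)                                 # CNN feature maps
x = layers.GlobalAveragePooling2D()(x)               # (batch, 2048)
x = layers.Reshape((1, 2048))(x)                     # length-1 sequence for the LSTM (assumed wiring)
x = layers.LSTM(256)(x)                              # recodes pooled CNN features
outputs = layers.Dense(NUM_LABELS, activation="sigmoid")(x)  # multi-label head

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```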
Citations: 0
Suboptimal video coding for machines method based on selective activation of in-loop filter
IF 1.4 | CAS Q4 (Computer Science) | JCR Q3 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2024-02-25 | DOI: 10.4218/etrij.2023-0085
Ayoung Kim, Eun-Vin An, Soon-heung Jung, Hyon-Gon Choo, Jeongil Seo, Kwang-deok Seo

A conventional codec aims to increase the compression efficiency for transmission and storage while maintaining video quality. However, as the number of platforms using machine vision rapidly increases, a codec that increases the compression efficiency and maintains the accuracy of machine vision tasks must be devised. Hence, the Moving Picture Experts Group created a standardization process for video coding for machines (VCM) to reduce bitrates while maintaining the accuracy of machine vision tasks. In particular, in-loop filters have been developed for improving the subjective quality and machine vision task accuracy. However, the high computational complexity of in-loop filters limits the development of a high-performance VCM architecture. We analyze the effect of an in-loop filter on the VCM performance and propose a suboptimal VCM method based on the selective activation of in-loop filters. The proposed method reduces the computation time for video coding by approximately 5% when using the enhanced compression model and 2% when employing a Versatile Video Coding test model while maintaining the machine vision accuracy and compression efficiency of the VCM architecture.
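As a toy illustration of selective in-loop filter activation (not the authors' actual decision rule, which the abstract does not give), a per-frame gate might look like the sketch below; the pre-analysis statistics and the threshold are assumptions made for this example.

```python
# Illustrative sketch only: enable the in-loop filters (deblocking/SAO/ALF) for a
# frame only when a cheap pre-analysis proxy suggests they will help the
# machine-vision task. FrameStats, the proxy, and the threshold are assumptions.
from dataclasses import dataclass

@dataclass
class FrameStats:
    distortion: float      # reconstruction distortion from pre-analysis
    saliency_ratio: float  # fraction of the frame covered by task-salient regions

def use_inloop_filters(stats: FrameStats, threshold: float = 0.15) -> bool:
    """Activate the filters only when saliency-weighted distortion is high."""
    return stats.distortion * stats.saliency_ratio > threshold

# Example: moderate distortion over mostly salient content -> filters on.
print(use_inloop_filters(FrameStats(distortion=0.4, saliency_ratio=0.6)))  # True
```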

Citations: 0
Framework for evaluating code generation ability of large language models
IF 1.4 | CAS Q4 (Computer Science) | JCR Q3 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2024-02-14 | DOI: 10.4218/etrij.2023-0357
Sangyeop Yeo, Yu-Seung Ma, Sang Cheol Kim, Hyungkook Jun, Taeho Kim

Large language models (LLMs) have revolutionized various applications in natural language processing and exhibited proficiency in generating programming code. We propose a framework for evaluating the code generation ability of LLMs and introduce a new metric, pass-ratio@n, which captures the granularity of accuracy according to the pass rate of test cases. The framework is intended to be fully automatic to handle the repetitive work involved in generating prompts, conducting inferences, and executing the generated codes. A preliminary evaluation focusing on the prompt detail, problem publication date, and difficulty level demonstrates the successful integration of our framework with the LeetCode coding platform and highlights the applicability of the pass-ratio@n metric.
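Based only on the description above, the metric can be read as the per-problem average, over n generated solutions, of each solution's test-case pass rate. A small sketch under that reading (the paper's exact aggregation may differ):

```python
# Sketch of pass-ratio@n as described in the abstract: unlike the binary pass@k
# criterion, each of the n generated solutions is scored by the fraction of test
# cases it passes, then the fractions are averaged. This follows the abstract's
# wording only; the paper's precise definition may differ.
from typing import List

def pass_ratio_at_n(passed_cases: List[int], total_cases: int) -> float:
    """passed_cases[i] = number of test cases passed by the i-th of n samples."""
    n = len(passed_cases)
    return sum(p / total_cases for p in passed_cases) / n

# Example: 3 samples against 10 test cases, passing 10, 7, and 4 of them.
print(pass_ratio_at_n([10, 7, 4], 10))  # 0.7
```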

Citations: 0
Joint streaming model for backchannel prediction and automatic speech recognition
IF 1.4 | CAS Q4 (Computer Science) | JCR Q3 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2024-02-14 | DOI: 10.4218/etrij.2023-0358
Yong-Seok Choi, Jeong-Uk Bang, Seung Hi Kim

In human conversations, listeners often utilize brief backchannels such as “uh-huh” or “yeah.” Timely backchannels are crucial to understanding and increasing trust among conversational partners. In human–machine conversation systems, users can engage in natural conversations when a conversational agent generates backchannels like a human listener. We propose a method that simultaneously predicts backchannels and recognizes speech in real time. We use a streaming transformer and adopt multitask learning for concurrent backchannel prediction and speech recognition. The experimental results demonstrate the superior performance of our method compared with previous works while maintaining a similar single-task speech recognition performance. Owing to the extremely imbalanced training data distribution, the single-task backchannel prediction model fails to predict any of the backchannel categories, and the proposed multitask approach substantially enhances the backchannel prediction performance. Notably, in the streaming prediction scenario, the performance of backchannel prediction improves by up to 18.7% compared with existing methods.
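A minimal sketch of such a joint model, assuming a shared encoder feeding separate ASR and backchannel heads trained with a weighted loss sum; the architecture sizes and the weight lambda are assumptions, and the causal masking needed for true streaming is omitted for brevity.

```python
# Minimal PyTorch sketch of the multitask setup described: a shared (streaming)
# transformer encoder with an ASR head and a backchannel head, optimized with a
# weighted sum of the two task losses. Sizes and lambda are assumptions; causal
# masking for streaming operation is omitted here.
import torch
import torch.nn as nn

class JointStreamingModel(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, vocab=5000, bc_classes=3):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden)
        enc_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.asr_head = nn.Linear(hidden, vocab)      # token logits per frame
        self.bc_head = nn.Linear(hidden, bc_classes)  # backchannel categories

    def forward(self, feats):                         # feats: (batch, time, feat_dim)
        h = self.encoder(self.proj(feats))
        return self.asr_head(h), self.bc_head(h)

def multitask_loss(asr_loss, bc_loss, lam=0.3):
    # lam balances backchannel prediction against ASR (assumed value)
    return (1 - lam) * asr_loss + lam * bc_loss
```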

Citations: 0
Named entity recognition using transfer learning and small human- and meta-pseudo-labeled datasets
IF 1.4 | CAS Q4 (Computer Science) | JCR Q3 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2024-02-14 | DOI: 10.4218/etrij.2023-0321
Kyoungman Bae, Joon-Ho Lim

We introduce a high-performance named entity recognition (NER) model for written and spoken language. To overcome challenges related to labeled data scarcity and domain shifts, we use transfer learning to leverage our previously developed KorBERT as the base model. We also adopt a meta-pseudo-label method using a teacher/student framework with labeled and unlabeled data. Our model presents two modifications. First, the student model is updated with an average loss from both human- and pseudo-labeled data. Second, the influence of noisy pseudo-labeled data is mitigated by considering feedback scores and updating the teacher model only when the feedback score is below a threshold (0.0005). We achieve the target NER performance in the spoken language domain and improve that in the written language domain by proposing a straightforward rollback method that reverts to the best model based on scarce human-labeled data. Further improvement is achieved by adjusting the label vector weights in the named entity dictionary.
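The two modifications can be sketched as the following training step; all object and method names here are hypothetical placeholders, and only the averaged student loss and the 0.0005 feedback gate come from the abstract.

```python
# Sketch of the two stated modifications: (1) the student is updated with the
# average of its losses on human-labeled and pseudo-labeled batches; (2) the
# teacher is updated only when the feedback score falls below 0.0005.
# `student`, `teacher`, and their methods are hypothetical placeholders.

FEEDBACK_THRESHOLD = 0.0005  # threshold stated in the abstract

def train_step(student, teacher, human_batch, unlabeled_batch, optimizer):
    pseudo_batch = teacher.pseudo_label(unlabeled_batch)   # teacher labels raw text
    loss_human = student.loss(human_batch)
    loss_pseudo = student.loss(pseudo_batch)
    loss = 0.5 * (loss_human + loss_pseudo)                # average of the two losses
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    feedback = student.feedback_score(human_batch)         # e.g., loss change on human labels
    if feedback < FEEDBACK_THRESHOLD:                      # gate against noisy pseudo-labels
        teacher.update_from(student, feedback)
```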

Citations: 0
Synthesis of electronically tunable multifunction biquad filter using voltage differencing differential input buffered amplifiers
IF 1.4 | CAS Q4 (Computer Science) | JCR Q3 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2024-02-14 | DOI: 10.4218/etrij.2023-0391
Sirigul Bunrueangsak, Winai Jaikla, Amornchai Chaichana, Piya Supavarasuwat, Surapong Siripongdee, Peerawut Suwanjan
Biquad filters are widely used in analog signal processing and communication circuits. We synthesize an analog active biquad filter with five types of voltage-mode filtering functions. The filter is synthesized using a parallel passive resistor-inductor-capacitor (RLC) network and a unity-gain voltage differencing amplifier. A voltage differencing differential input buffered amplifier (VD-DIBA) is the main active component, and the biquad filter has a three-input single-output (TISO) topology. By replacing the passive inductor and resistor with VD-DIBA-based inductance and resistance simulators with a subtractor, the TISO voltage-mode versatile filter is obtained from two VD-DIBAs, one resistor, and two grounded capacitors. The proposed filter can provide five types of voltage-mode filtering functions: inverting bandpass and lowpass responses as well as noninverting band-stop, high-pass, and all-pass responses. The all-pass filter requires no additional active components. The three input voltage nodes have high impedance, and a low-impedance output voltage node facilitates cascade connections without additional voltage buffers. In addition, the natural frequency and quality factor can be electronically tuned, and the quality factor is controlled without disturbing the passband gain or natural frequency. The proposed filter is simulated in the Personal Simulation Program with Integrated Circuit Emphasis (PSPICE) and verified experimentally through laboratory tests employing VD-DIBAs implemented using commercially available components.
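For reference, the five responses named above correspond to the standard second-order forms below, where the natural frequency ω0 and quality factor Q are the electronically tunable parameters; the component-level expressions for ω0 and Q are derived in the paper and are not reproduced here.

```latex
% Generic biquad responses matching the stated polarities (inverting LP/BP,
% noninverting BS/HP/AP); \omega_0 and Q are the tunable parameters.
\[
  D(s) = s^{2} + \frac{\omega_0}{Q}\,s + \omega_0^{2}
\]
\[
  H_{\mathrm{LP}}(s) = \frac{-\omega_0^{2}}{D(s)}, \quad
  H_{\mathrm{BP}}(s) = \frac{-(\omega_0/Q)\,s}{D(s)}, \quad
  H_{\mathrm{BS}}(s) = \frac{s^{2} + \omega_0^{2}}{D(s)}, \quad
  H_{\mathrm{HP}}(s) = \frac{s^{2}}{D(s)}, \quad
  H_{\mathrm{AP}}(s) = \frac{s^{2} - (\omega_0/Q)\,s + \omega_0^{2}}{D(s)}
\]
```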
Citations: 0
Multimodal audiovisual speech recognition architecture using a three-feature multi-fusion method for noise-robust systems
IF 1.4 | CAS Q4 (Computer Science) | JCR Q3 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2024-02-14 | DOI: 10.4218/etrij.2023-0266
Sanghun Jeon, Jieun Lee, Dohyeon Yeo, Yong-Ju Lee, SeungJun Kim

Exposure to varied noisy environments impairs the recognition performance of artificial intelligence-based speech recognition technologies. Services with degraded performance can still be deployed as limited systems that assure good performance in certain environments, but they impair the general quality of speech recognition services. This study introduces an audiovisual speech recognition (AVSR) model that is robust to various noise settings and mimics the elements of human dialogue recognition. The model converts word embeddings and log-Mel spectrograms into feature vectors for audio recognition. A dense spatial–temporal convolutional neural network extracts features from the log-Mel spectrograms transformed for visual-based recognition. This approach exhibits improved aural and visual recognition capabilities. We assess nine synthesized noise environments across signal-to-noise ratios, with the proposed model exhibiting lower average error rates. The error rate of the AVSR model using the three-feature multi-fusion method is 1.711%, compared with the general rate of 3.939%. Owing to its enhanced stability and recognition rate, this model is applicable in noise-affected environments.
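A compact sketch of the three-feature fusion step as described, combining word-embedding, log-Mel, and visual CNN features before classification; the feature dimensions and the use of simple concatenation are assumptions for illustration.

```python
# Sketch of a three-feature multi-fusion head: word-embedding, log-Mel, and
# visual CNN feature vectors are concatenated and jointly classified.
# Dimensions and the concatenation-based fusion operator are assumptions.
import torch
import torch.nn as nn

class ThreeFeatureFusion(nn.Module):
    def __init__(self, d_word=300, d_mel=128, d_visual=512, n_classes=500):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(d_word + d_mel + d_visual, 512),  # joint projection
            nn.ReLU(),
            nn.Linear(512, n_classes),                  # recognition logits
        )

    def forward(self, word_vec, mel_vec, visual_vec):
        z = torch.cat([word_vec, mel_vec, visual_vec], dim=-1)  # multi-fusion by concat
        return self.fuse(z)
```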

Citations: 0
Performance analysis of multiview video compression based on MIV and VVC multilayer
IF 1.4 | CAS Q4 (Computer Science) | JCR Q3 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2024-02-01 | DOI: 10.4218/etrij.2023-0309
Jinho Lee, Gun Bang, Jungwon Kang, Mehrdad Teratani, Gauthier Lafruit, Haechul Choi
To represent immersive media providing a six-degree-of-freedom experience, Moving Picture Experts Group (MPEG) immersive video (MIV) was developed to compress multiview videos. Meanwhile, the state-of-the-art versatile video coding (VVC) standard also supports multilayer (ML) functionality, enabling the coding of multiview videos. In this study, we designed experimental conditions to assess the performance of these two state-of-the-art standards in terms of objective and subjective quality. We observe that their performance depends strongly on the conditions of the input source, such as the camera arrangement and the ratio of input views to all views. VVC-ML is efficient when the input source is captured by a planar camera arrangement and many input views are used. Conversely, MIV outperforms VVC-ML when the camera arrangement is non-planar and the ratio of input views to all views is low. In terms of the subjective quality of the synthesized view, VVC-ML causes severe rendering artifacts such as holes when occluded regions exist among the input views, whereas MIV reconstructs the occluded regions correctly but induces rectangular rendering artifacts at low bitrates.
Citations: 0
AI-based language tutoring systems with end-to-end automatic speech recognition and proficiency evaluation
IF 1.4 | CAS Q4 (Computer Science) | JCR Q3 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2024-01-31 | DOI: 10.4218/etrij.2023-0322
Byung Ok Kang, Hyung-Bae Jeon, Yun Kyung Lee

This paper presents the development of language tutoring systems for non-native speakers by leveraging advanced end-to-end automatic speech recognition (ASR) and proficiency evaluation. Given the frequent errors in non-native speech, high-performance spontaneous speech recognition must be applied. Our systems accurately evaluate pronunciation and speaking fluency and provide feedback on errors by relying on precise transcriptions. End-to-end ASR is implemented and enhanced by using diverse non-native speaker speech data for model training. For performance enhancement, we combine semisupervised and transfer learning techniques using labeled and unlabeled speech data. Automatic proficiency evaluation is performed by a model trained to maximize the statistical correlation between the fluency score manually determined by a human expert and a calculated fluency score. We developed an English tutoring system for Korean elementary students called EBS AI PengTalk and a Korean tutoring system for foreigners called KSI Korean AI Tutor. Both systems were deployed by South Korean government agencies.
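The correlation-maximizing training of the proficiency scorer could be sketched with a differentiable Pearson-correlation loss, as below; treating 1 − r as the loss term is an assumption about the setup, not a detail given in the abstract.

```python
# Differentiable Pearson-correlation loss: training the scorer so its fluency
# scores correlate maximally with human expert scores, as the abstract states.
# Using 1 - r as the minimized quantity is an assumed formulation.
import torch

def pearson_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    pred_c = pred - pred.mean()        # center both score vectors
    target_c = target - target.mean()
    r = (pred_c * target_c).sum() / (pred_c.norm() * target_c.norm() + 1e-8)
    return 1.0 - r                     # minimizing this maximizes correlation
```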

Citations: 0
Alzheimer's disease recognition from spontaneous speech using large language models
IF 1.4 | CAS Q4 (Computer Science) | JCR Q3 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2024-01-29 | DOI: 10.4218/etrij.2023-0356
Jeong-Uk Bang, Seung-Hoon Han, Byung-Ok Kang

We propose a method to automatically predict Alzheimer's disease from speech data using the ChatGPT large language model. Alzheimer's disease patients often exhibit distinctive characteristics when describing images, such as difficulties in recalling words, grammar errors, repetitive language, and incoherent narratives. For prediction, we initially employ a speech recognition system to transcribe participants' speech into text. We then gather opinions by inputting the transcribed text into ChatGPT together with a prompt designed to solicit fluency evaluations. Subsequently, we extract embeddings from the speech, text, and opinions using pretrained models. Finally, we use a classifier consisting of transformer blocks and linear layers to identify participants with this type of dementia. Experiments are conducted using the extensively used ADReSSo dataset. The results yield a maximum accuracy of 87.3% when speech, text, and opinions are used in conjunction. This finding suggests the potential of leveraging evaluation feedback from language models to address challenges in Alzheimer's disease recognition.
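The four-stage pipeline reads as: transcribe, prompt for an opinion, embed all three signals, classify. A schematic sketch, with every component (`transcribe`, `ask_llm_opinion`, the embedding functions) as a hypothetical placeholder for the pretrained models the abstract mentions:

```python
# Pipeline sketch following the abstract: ASR transcript -> ChatGPT fluency
# opinion -> embed speech, text, and opinion -> transformer-based classifier.
# All injected callables are hypothetical placeholders, and the prompt wording
# is invented for illustration.
import torch
import torch.nn as nn

def predict_ad(audio, transcribe, ask_llm_opinion, embed_speech, embed_text,
               classifier: nn.Module) -> torch.Tensor:
    text = transcribe(audio)                          # speech recognition
    opinion = ask_llm_opinion(                        # fluency-evaluation prompt
        f"Rate the fluency and coherence of this picture description:\n{text}")
    feats = torch.cat([embed_speech(audio),           # three embedding streams
                       embed_text(text),
                       embed_text(opinion)], dim=-1)
    return classifier(feats)                          # dementia vs. control logits
```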

Citations: 0