APSIPA Transactions on Signal and Information Processing: Latest Articles

End-to-end Japanese Multi-dialect Speech Recognition and Dialect Identification with Multi-task Learning

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2022-01-01 DOI: 10.1561/116.00000045

Ryo Imaizumi, Ryo Masumura, Sayaka Shiota, H. Kiya

End-to-end systems have demonstrated state-of-the-art performance on many tasks related to automatic speech recognition (ASR) and dialect identification (DID). In this paper, we propose multi-task learning of Japanese DID and multi-dialect ASR (MD-ASR) systems with end-to-end models. Because Japanese dialects vary in both their linguistic and acoustic aspects, Japanese DID requires considering linguistic and acoustic features simultaneously. One way to realize Japanese DID using both kinds of features is to use ASR transcriptions when performing DID. However, transcribing Japanese multi-dialect speech into text is regarded as a challenging ASR task because there are large gaps in linguistic and acoustic features between each dialect and standard Japanese. One solution is dialect-aware ASR modeling, in which DID is performed together with ASR. Therefore, we propose a multi-task learning framework for Japanese DID and ASR that represents the dependency between them. We explore three systems within the proposed framework, which differ in the order in which DID and ASR are performed. In the experiments, Japanese multi-dialect ASR and DID tests were conducted on our in-house Japanese multi-dialect database and a standard Japanese database. The proposed transformer-based systems outperformed the conventional single-task systems on both the DID and ASR tests.
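
The abstract says the three systems differ in the order in which DID and ASR are performed. One common way to express such an ordering in a single attention-based decoder is to place a dialect tag before or after the transcript in the target sequence. The sketch below assumes that design and shows two possible orderings; the special tokens (`<sos>`, `<eos>`, `<kansai>`), the `order` values, and the function names are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical special tokens; the paper's actual vocabulary is not given in the abstract.
SOS, EOS = "<sos>", "<eos>"
ORDERS = ("did_first", "asr_first")


def dialect_tag(dialect: str) -> str:
    """Wrap a dialect name as one special token, e.g. 'kansai' -> '<kansai>'."""
    return f"<{dialect}>"


def build_target(transcript: List[str], dialect: str, order: str) -> List[str]:
    """Build one decoder target sequence for joint DID + ASR.

    did_first: the tag precedes the transcript, so every transcript token is
        predicted conditioned on the dialect decision (dialect-aware ASR).
    asr_first: the tag follows the transcript, so the dialect is predicted
        conditioned on the recognized text (transcription-based DID).
    """
    tag = dialect_tag(dialect)
    if order == "did_first":
        return [SOS, tag, *transcript, EOS]
    if order == "asr_first":
        return [SOS, *transcript, tag, EOS]
    raise ValueError(f"unknown order: {order!r}")


def split_output(tokens: List[str], order: str) -> Tuple[str, List[str]]:
    """Recover (dialect, transcript) from a sequence laid out by build_target."""
    if order not in ORDERS:
        raise ValueError(f"unknown order: {order!r}")
    body = [t for t in tokens if t not in (SOS, EOS)]
    if order == "did_first":
        tag, text = body[0], body[1:]
    else:
        tag, text = body[-1], body[:-1]
    return tag[1:-1], text  # strip the angle brackets from the tag


if __name__ == "__main__":
    # "おおきに" is used purely as an illustrative Kansai-dialect utterance.
    target = build_target(list("おおきに"), "kansai", "did_first")
    print(target)
    print(split_output(target, "did_first"))
```

Putting the tag first lets the transcript depend on the dialect decision, while putting it last lets the dialect decision depend on the decoded words; which arrangement the paper's systems actually use should be checked against the full text.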

Citations: 8

Identifying Code Reading Strategies in Debugging using STA with a Tolerance Algorithm

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2022-01-01 DOI: 10.1561/116.00000040

Christine Lourrine S. Tablatin, M. M. Rodrigo

Citations: 1

DeepFake and its Enabling Techniques: A Review

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2022-01-01 DOI: 10.1561/116.00000024

R. Brooks, Yefeng Yuan, Yuhong Liu, Haiquan Chen

Citations: 2

American Sign Language Fingerspelling Recognition in the Wild with Iterative Language Model Construction

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2022-01-01 DOI: 10.1561/116.00000003

W. Kumwilaisak, Peerawat Pannattee, C. Hansakunbuntheung, N. Thatphithakkul

This paper proposes a novel method to improve the accuracy of American Sign Language fingerspelling recognition. Video sequences from the training set of the “ChicagoFSWild” dataset are first used to train a weakly supervised deep neural network that automatically generates frame labels from a sequence label. The weakly supervised network consists of AlexNet and an LSTM. From the training video sequences whose predicted sequence has a Levenshtein distance of zero to the sequence label, this trained network generates a collection of frame-labeled images. Positive and negative pairs of all fingerspelling gestures are randomly formed from the collected image set. These pairs are used to train a Siamese network of ResNet-50 and a projection function to produce efficient feature representations. The trained ResNet-50 and projection function are concatenated with a bidirectional LSTM, a fully connected layer, and a softmax layer to form a deep neural network for American Sign Language fingerspelling recognition. Using the training video sequences, the frames of sequences whose predicted sequence has a Levenshtein distance of zero to the sequence label are added to the collected image set. The updated image set is used to train the Siamese network. The training process, from training the Siamese network to updating the collected image set, is iterated until recognition performance no longer improves. Experimental results on the “ChicagoFSWild” dataset show that the proposed method surpasses existing works in terms of character error rate.
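
The data-collection loop described above relies on two simple operations: keeping frame labels only from training videos whose predicted letter sequence is at Levenshtein distance zero from the ground-truth label, and forming positive and negative pairs from the collected frames for Siamese training. The sketch below illustrates both under stated assumptions: frames are stand-in strings rather than images, the per-frame letters are assumed to come from the weakly supervised AlexNet+LSTM network, and the alternating positive/negative split and all function names are hypothetical. Character error rate is computed with the standard definition (edit distance divided by reference length).

```python
import random
from typing import Dict, List, Sequence, Tuple


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution (0 if equal)
        prev = cur
    return prev[-1]


def character_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """CER = edit distance / reference length (guarded against empty references)."""
    return levenshtein(ref, hyp) / max(len(ref), 1)


def collect_frames(videos) -> Dict[str, List[str]]:
    """Hypothetical input: (frames, per_frame_letters, predicted_seq, label_seq) tuples.

    Only videos whose predicted sequence matches the label exactly
    (Levenshtein distance 0) contribute frame-labeled examples.
    """
    bank: Dict[str, List[str]] = {}
    for frames, frame_letters, predicted, label in videos:
        if levenshtein(predicted, label) == 0:
            for frame, letter in zip(frames, frame_letters):
                bank.setdefault(letter, []).append(frame)
    return bank


def make_pairs(bank: Dict[str, List[str]], n_pairs: int,
               seed: int = 0) -> List[Tuple[str, str, int]]:
    """Alternate positive (same letter, label 1) and negative (different letters, label 0) pairs."""
    rng = random.Random(seed)
    same = [k for k, v in bank.items() if len(v) >= 2]
    letters = [k for k, v in bank.items() if v]
    if not same or len(letters) < 2:
        raise ValueError("need a letter with >= 2 frames and >= 2 distinct letters")
    pairs = []
    for i in range(n_pairs):
        if i % 2 == 0:
            a, b = rng.sample(bank[rng.choice(same)], 2)
            pairs.append((a, b, 1))
        else:
            la, lb = rng.sample(letters, 2)
            pairs.append((rng.choice(bank[la]), rng.choice(bank[lb]), 0))
    return pairs
```

In the iterative scheme, `collect_frames` would be re-run after each round of training so that newly exact-matching videos enlarge the bank before pairs are formed again.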

Citations: 1

Self-Supervised Motion-Corrected Image Reconstruction Network for 4D Magnetic Resonance Imaging of the Body Trunk

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2022-01-01 DOI: 10.1561/116.00000039

T. Küstner, Jiazhen Pan, Christopher Gilliam, H. Qi, G. Cruz, K. Hammernik, T. Blu, D. Rueckert, René M. Botnar, C. Prieto, S. Gatidis

Respiratory motion can cause artifacts in magnetic resonance imaging of the body trunk if patients cannot hold their breath or triggered acquisitions are not practical. Retrospective correction strategies usually cope with motion by using fast imaging sequences under free-movement conditions, followed by motion binning based on motion traces. These acquisitions yield sub-Nyquist sampled, motion-resolved k-space data. Motion states are linked to each other by non-rigid deformation fields. Motion registration is usually formulated in image space, where it can be impaired by aliasing artifacts or by estimation from low-resolution images. Consequently, any motion-corrected reconstruction can be biased by errors in the deformation fields. In this work, we propose a deep-learning-based motion-corrected 4D (3D spatial + time) image reconstruction that combines a non-rigid registration network and a 4D reconstruction network. Non-rigid motion is estimated in k-space and incorporated into the reconstruction network. The proposed method is evaluated on in-vivo 4D motion-resolved magnetic resonance images of healthy subjects and of patients with suspected liver or lung metastases. The proposed approach provides 4D motion-corrected images and deformation fields. It enables ∼14× accelerated acquisition with 25-fold faster reconstruction than comparable approaches, while consistently preserving image quality across varying patients and motion patterns.
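
Retrospective correction, as described above, sorts free-running readouts into motion states using a motion trace, which leaves each state sub-Nyquist sampled. The sketch below assumes amplitude-based binning into equally populated bins (one common choice; the abstract does not specify the binning rule) and shows how the per-state acceleration factor follows from how many readouts each bin receives. The function names, the example trace, and the line counts are hypothetical.

```python
from typing import List, Sequence


def amplitude_bins(trace: Sequence[float], n_bins: int) -> List[int]:
    """Assign each readout to a motion state by ranking the respiratory signal
    amplitude and splitting the ranks into n_bins (nearly) equal-size groups."""
    order = sorted(range(len(trace)), key=lambda i: trace[i])
    bins = [0] * len(trace)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(trace)  # lowest amplitudes -> bin 0
    return bins


def acceleration_factor(n_full: int, n_acquired: int) -> float:
    """R = k-space lines needed for Nyquist sampling / lines actually acquired."""
    return n_full / n_acquired


def per_bin_acceleration(bins: Sequence[int], n_bins: int,
                         n_full_per_state: int) -> List[float]:
    """Acceleration factor of each motion state (infinite if a bin received no readouts)."""
    counts = [list(bins).count(b) for b in range(n_bins)]
    return [acceleration_factor(n_full_per_state, c) if c else float("inf")
            for c in counts]


if __name__ == "__main__":
    trace = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2]  # toy respiratory amplitudes, one per readout
    bins = amplitude_bins(trace, n_bins=2)
    print(bins)
    print(per_bin_acceleration(bins, 2, n_full_per_state=30))
    # Hypothetical numbers: a state needing 280 lines that receives 20 is 14x undersampled.
    print(acceleration_factor(280, 20))
```

Equal-count bins keep the undersampling roughly uniform across motion states; amplitude thresholds or phase-based binning are alternatives that trade that uniformity for more physiologically consistent states.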

Citations: 4

The Future of Video Coding

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2022-01-01 DOI: 10.1561/116.00000044

N. Ling, C.-C. Jay Kuo, G. Sullivan, Dong Xu, Shan Liu, H. Hang, Wen-Hsiao Peng, Jiaying Liu

Citations: 5

Combating Misinformation/ Disinformation in Online Social Media: A Multidisciplinary View

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2022-01-01 DOI: 10.1561/116.0000127

M. Barni, Y. Fang, Yuhong Liu, Laura Robinson, K. Sasahara, Subramaniam Vincent, Xinchao Wang, Zhizheng Wu

Recently, the viral propagation of mis/disinformation has raised significant concerns in both academia and industry. The problem is particularly difficult because, on the one hand, rapidly evolving technology makes it much cheaper and easier to manipulate and propagate social media information. On the other hand, the complexity of human psychology and sociology makes understanding, predicting, and preventing users' involvement in mis/disinformation propagation very difficult. This themed series on "Multi-Disciplinary Dis/Misinformation Analysis and Countermeasures" aims to bring together the attention and efforts of researchers in relevant disciplines to tackle this challenging problem. In addition, on October 20th, 2021, and March 7th, 2022, some members of the guest editorial team organized two panel discussions: "Social Media Disinformation and its Impact on Public Health During the COVID-19 Pandemic" and "Dis/Misinformation Analysis and Countermeasures - A Computational Viewpoint." This article summarizes the key discussion items from these two panels and aims to shed light on future directions.

Citations: 1

UHP-SOT++: An Unsupervised Lightweight Single Object Tracker

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2022-01-01 DOI: 10.1561/116.00000008

Zhiruo Zhou, Hongyu Fu, Suya You, C. J. Kuo

Citations: 4

The Future of Computer Vision

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2022-01-01 DOI: 10.1561/116.00000009

Jingjing Meng, Xilin Chen, Jurgen Gall, Chang-Su Kim, Zicheng Liu, A. Piva, Junsong Yuan

Citations: 1

Machine Learning for Wireless Communication: An Overview

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2022-01-01 DOI: 10.1561/116.00000029

Zijian Cao, Huan Zhang, Le Liang, Geoffrey Ye Li

Citations: 1