APSIPA Transactions on Signal and Information Processing: Latest Articles

Demystifying data and AI for manufacturing: case studies from a major computer maker

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2021-03-08 DOI: 10.1017/ATSIP.2021.3

Yi-Chun Chen, Bo-Huei He, Shih-Sung Lin, Jonathan Hans Soeseno, Daniel Stanley Tan, Trista Pei-chun Chen, Wei-Chao Chen

In this article, we discuss the background and technical details of several smart manufacturing projects in a tier-one electronics manufacturing facility. We devise a process to manage logistics forecasting and inventory preparation for electronic parts using historical data and a recurrent neural network, achieving significant improvement over current methods. We present a system for automatically qualifying laptop software for mass production through computer vision and automation technology. The result is a reliable system that can save hundreds of man-years in the qualification process. Finally, we create a deep learning-based algorithm for visual inspection of product appearance, which requires significantly less defect training data than traditional approaches. For production needs, we design an automatic optical inspection machine suited to our algorithm and process. We also discuss the issues of data collection and of enabling smart manufacturing projects in a factory setting, where the projects operate on a delicate balance between process innovation and cost-saving measures.

Citations: 1

Toward community answer selection by jointly static and dynamic user expertise modeling

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2021-03-01 DOI: 10.1017/ATSIP.2020.28

Yuchao Liu, Meng Liu, Jianhua Yin

Answer selection, that is, ranking high-quality answers first, is a significant problem for community question answering sites. Existing approaches usually treat it as a text matching task and calculate the quality of answers via their semantic relevance to the given question. However, they entirely ignore the influence of other factors in the community, such as user expertise. In this paper, we propose an answer selection model based on user expertise modeling, which simultaneously considers the social influence and the personal interest that affect user expertise from different views. Specifically, we propose an inductive strategy to aggregate the social influence of neighbors. In addition, we introduce the explicit topic interest of users and capture the context-based personal interest by weighing the activation of each topic. Moreover, we construct two real-world datasets containing rich user information. Extensive experiments on the two datasets demonstrate that our model outperforms several state-of-the-art models.

Citations: 1

Subspace learning for facial expression recognition: an overview and a new perspective

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2021-01-14 DOI: 10.1017/ATSIP.2020.27

Cigdem Turan, Rui Zhao, K. Lam, Xiangjian He

For image recognition, a large number of subspace-learning methods have been proposed to overcome the high dimensionality of the features being used. In this paper, we first give an overview of the most popular and state-of-the-art subspace-learning methods, and then present a novel manifold-learning method named soft locality preserving map (SLPM). SLPM aims to control the level of spread of the different classes, which is closely connected to the generalizability of the learned subspace. We also give an overview of the extension of manifold-learning methods to deep learning by formulating the loss functions for training, and further reformulate SLPM into a soft locality preserving (SLP) loss. These loss functions are applied as an additional regularization in the training of deep neural networks. We evaluate these subspace-learning methods, as well as their deep-learning extensions, on facial expression recognition. Experiments on four commonly used databases show that SLPM effectively reduces the dimensionality of the feature vectors and enhances the discriminative power of the extracted features. Moreover, experimental results demonstrate that the learned deep features regularized by SLP acquire better discriminability and generalizability for facial expression recognition.

Citations: 5

Fairness-Oriented User Scheduling for Bursty Downlink Transmission Using Multi-Agent Reinforcement Learning

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2020-12-30 DOI: 10.1561/116.00000028

Mingqi Yuan, Qi Cao, Man-On Pun, Yi Chen

In this work, we develop practical user scheduling algorithms for bursty downlink traffic with emphasis on user fairness. In contrast to conventional scheduling algorithms that either divide the transmission time slots equally among users or maximize ratios without physical meaning, we propose to use the 5%-tile user data rate (5TUDR) as the metric to evaluate user fairness. Since it is difficult to optimize 5TUDR directly, we first cast the problem into the stochastic game framework and subsequently propose a Multi-Agent Reinforcement Learning (MARL)-based algorithm to perform distributed optimization of the resource block group (RBG) allocation. Furthermore, each MARL agent is designed to take information measured by network counters from multiple network layers (e.g., Channel Quality Indicator, buffer size) as the input state and the RBG allocation as the action, with a proposed reward function designed to maximize 5TUDR. Extensive simulations show that the proposed MARL-based scheduler achieves fair scheduling while maintaining good average network throughput compared with conventional schedulers.
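
As a rough illustration of the fairness metric named in this abstract, the 5%-tile user data rate can be read as the 5th percentile of the per-user data rates. The sketch below uses the nearest-rank percentile definition, which is an assumption: the paper may define or interpolate the percentile differently. The function name `five_percentile_rate` and the sample rates are hypothetical.

```python
import math

def five_percentile_rate(user_rates, pct=5.0):
    """Nearest-rank percentile of per-user data rates (assumed definition)."""
    if not user_rates:
        raise ValueError("need at least one user rate")
    ordered = sorted(user_rates)
    # Nearest rank: smallest value with at least pct% of the samples at or below it.
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical per-user throughputs in Mbit/s for 20 users.
rates = [0.4, 5.1, 3.3, 7.8, 2.2, 9.0, 1.1, 6.4, 4.7, 8.5,
         3.9, 2.8, 5.6, 7.1, 0.9, 6.0, 4.2, 8.1, 1.7, 3.0]
worst_case = five_percentile_rate(rates)   # rate of the worst-served users
mean_rate = sum(rates) / len(rates)        # average throughput, for comparison
```

Comparing `worst_case` with `mean_rate` shows why the metric is a fairness measure: a scheduler can raise the mean while leaving the lowest-rate users starved, whereas the percentile only improves when those users are served better.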

Citations: 2

A multi-branch ResNet with discriminative features for detection of replay speech signals

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2020-12-29 DOI: 10.1017/ATSIP.2020.26

Xingliang Cheng, Mingxing Xu, T. Zheng

The security of automatic speaker verification (ASV) systems is gaining increasing attention. As one of the common spoofing methods, replay attacks are easy to implement but difficult to detect. Many researchers focus on designing features to detect the distortion introduced by replay attacks. Constant-Q cepstral coefficients (CQCC), based on the magnitude of the constant-Q transform (CQT), are among the notable features in the field of replay detection. However, CQCC ignores phase information, which may also be distorted in the replay process. In this work, we propose a CQT-based modified group delay feature (CQTMGD) which can capture the phase information of the CQT. Furthermore, a multi-branch residual convolution network, ResNeWt, is proposed to distinguish replay attacks from bona fide attempts. We evaluated our proposal on the ASVspoof 2019 physical access dataset. Results show that CQTMGD outperformed the traditional MGD feature, and fusion with other magnitude-based and phase-based features achieved a further improvement. Our best fusion system achieved 0.0096 min-tDCF and 0.39% EER on the evaluation set, outperforming all the other state-of-the-art methods in the ASVspoof 2019 physical access challenge.
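
For readers unfamiliar with the EER figure quoted in this abstract, the equal error rate is the operating point at which the false-acceptance rate (spoofed trials accepted) equals the false-rejection rate (bona fide trials rejected). The following is a minimal sketch that sweeps every observed score as a threshold, assuming higher scores mean "more likely bona fide". It is not the challenge's scoring code, which may interpolate between thresholds; the function name and the scores are hypothetical.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER by sweeping every observed score as a threshold."""
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of the two rates where they are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Hypothetical detector scores: one bona fide trial and one replay trial are misranked.
bonafide = [0.9, 0.8, 0.7, 0.35]
spoof = [0.1, 0.2, 0.3, 0.75]
eer = equal_error_rate(bonafide, spoof)
```

With only a handful of trials the estimate is coarse; on a large evaluation set the sweep approaches the value an interpolating scorer would report.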

Citations: 3

An evaluation of voice conversion with neural network spectral mapping models and WaveNet vocoder

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2020-11-25 DOI: 10.1017/ATSIP.2020.24

Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, T. Toda

This paper presents an evaluation of parallel voice conversion (VC) with neural network (NN)-based statistical models for spectral mapping and waveform generation. The NN-based architectures for spectral mapping include deep NN (DNN), deep mixture density network (DMDN), and recurrent NN (RNN) models. The WaveNet (WN) vocoder is employed for high-quality NN-based waveform generation. In VC, though, quality degradation still occurs owing to the oversmoothed characteristics of the estimated speech parameters. To address this problem, we apply post-conversion to the converted features based on direct waveform modification with spectrum differential and a global variance postfilter. To preserve consistency with the post-conversion, we further propose a spectrum differential loss for the spectral modeling. The experimental results demonstrate that: (1) the RNN-based spectral modeling achieves higher accuracy with a faster convergence rate and better generalization than the DNN-/DMDN-based models; (2) the RNN-based spectral modeling is also capable of producing a less oversmoothed spectral trajectory; (3) the proposed spectrum differential loss improves performance in same-gender conversions; and (4) the proposed post-conversion on converted features for the WN vocoder in VC yields the best performance in both naturalness and speaker similarity compared with the conventional use of the WN vocoder.

Citations: 1

End-to-end recognition of streaming Japanese speech using CTC and local attention

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2020-11-23 DOI: 10.1017/ATSIP.2020.23

Jiahao Chen, Ryota Nishimura, N. Kitaoka

Many end-to-end, large-vocabulary continuous speech recognition systems now achieve better recognition performance than conventional systems. However, most of these approaches are based on bidirectional networks and sequence-to-sequence modeling, so automatic speech recognition (ASR) systems using such techniques must wait for an entire segment of voice input before they can begin processing the data. The resulting time lag can be a serious drawback in some applications. An obvious solution is to develop a speech recognition algorithm capable of processing streaming data. Therefore, in this paper we explore the possibility of a streaming, online ASR system for Japanese using a model based on unidirectional LSTMs trained with the connectionist temporal classification (CTC) criterion, with local attention. Such an approach has not been well investigated for Japanese, as most Japanese-language ASR systems employ bidirectional networks. The best result for our proposed system in experimental evaluation was a character error rate of 9.87%.
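
Two ideas in this abstract lend themselves to a short sketch: how a CTC output path is turned into text, and how a character error rate is computed. The code below shows the standard greedy CTC collapse (merge repeated labels, then drop blanks) and a Levenshtein-based character error rate. It is an illustration under stated assumptions, not the authors' system: the blank symbol `"_"`, the function names, and the example frames are hypothetical, and the local attention component is not modeled.

```python
def ctc_greedy_collapse(frame_labels, blank="_"):
    """Collapse a per-frame best-label path: merge repeats, then drop blanks."""
    out = []
    prev = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)

def character_error_rate(reference, hypothesis):
    """Levenshtein distance between the strings divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    prev_row = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        row = [i]
        for j, h in enumerate(hypothesis, start=1):
            cost = 0 if r == h else 1
            row.append(min(prev_row[j] + 1,          # deletion
                           row[j - 1] + 1,           # insertion
                           prev_row[j - 1] + cost))  # substitution
        prev_row = row
    return prev_row[-1] / len(reference)

# Hypothetical per-frame outputs of a unidirectional model, one label per frame.
frames = ["_", "こ", "こ", "_", "ん", "に", "_", "ち", "ち", "は"]
decoded = ctc_greedy_collapse(frames)
cer = character_error_rate("こんにちは", "こんにちわ")
```

Because the collapse only ever looks at the current and previous frame, it can run frame by frame as audio arrives, which is what makes CTC a natural fit for streaming recognition.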

Citations: 2

Ground-distance segmentation of 3D LiDAR point cloud toward autonomous driving

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2020-11-23 DOI: 10.1017/ATSIP.2020.21

Jian Wu, Qingxiong Yang

In this paper, we study the semantic segmentation of 3D LiDAR point cloud data in urban environments for autonomous driving and propose a method that utilizes the surface information of the ground plane. In practice, the resolution of a LiDAR sensor installed in a self-driving vehicle is relatively low, so the acquired point cloud is quite sparse. While recent work on dense point cloud segmentation has achieved promising results, its performance is relatively low when applied directly to sparse point clouds. This paper focuses on semantic segmentation, with deep neural networks, of the sparse point clouds obtained from a 32-channel LiDAR sensor. The main contribution is the integration of ground information, which is used to group ground points that are far away from each other. Qualitative and quantitative experiments on two large-scale point cloud datasets show that the proposed method outperforms the current state of the art.

Citations: 2

An SMLB-based OFDM receiver over impulsive noise environment

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2020-11-20 DOI: 10.1017/ATSIP.2020.22

Chengbo Liu, Na Chen, M. Okada, Yafei Hou

Impulsive noise (IN) degrades the performance of wireless communication in modern 5G scenarios such as manufacturing and automated factories. The proposed receiver uses a constant false alarm rate criterion to obtain the threshold and combines it with blanking, improving on the conventional blanking scheme at acceptable complexity. Simulation results show that the proposed receiver achieves a lower bit error rate even when the probability of IN occurrence is very high and the power of the IN is much larger than that of the background noise.
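
The abstract does not give the receiver's threshold rule, so the following is only a generic sketch of the two ingredients it names. It assumes the impulse-free samples behave like zero-mean circular complex Gaussian noise with power `noise_power`, in which case the magnitude exceeds a threshold T with probability exp(-T^2 / noise_power); solving for a target false alarm probability gives the threshold. Blanking then zeroes every sample above it. The function names and sample values are hypothetical.

```python
import math

def cfar_threshold(noise_power, p_false_alarm):
    """Threshold T with P(|n| > T) = p_false_alarm for complex Gaussian n (assumed model)."""
    if not 0.0 < p_false_alarm < 1.0:
        raise ValueError("p_false_alarm must lie in (0, 1)")
    return math.sqrt(-noise_power * math.log(p_false_alarm))

def blank(samples, threshold):
    """Blanking nonlinearity: zero every sample whose magnitude exceeds the threshold."""
    return [0j if abs(s) > threshold else s for s in samples]

# Hypothetical received time-domain samples; the third one carries an impulse.
received = [0.3 + 0.2j, -0.5 + 0.1j, 12.0 - 9.0j, 0.1 - 0.4j]
threshold = cfar_threshold(noise_power=1.0, p_false_alarm=1e-3)
cleaned = blank(received, threshold)
```

Fixing the false alarm probability rather than the threshold itself means the threshold scales with the measured power of the impulse-free samples, so the same setting can be reused across different noise levels.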

Citations: 0

Discreteness and group sparsity aware detection for uplink overloaded MU-MIMO systems

IF 3.2 Q1 Computer Science

APSIPA Transactions on Signal and Information Processing

Pub Date : 2020-10-06 DOI: 10.1017/ATSIP.2020.19

Ryo Hayakawa, Ayano Nakai-Kasai, K. Hayashi

This paper proposes signal detection methods for frequency domain equalization (FDE)-based overloaded multiuser multiple-input multiple-output (MU-MIMO) systems for uplink Internet of Things (IoT) environments, where many IoT terminals are served by a base station with fewer antennas than there are IoT terminals. Using the fact that the transmitted signal vector is discrete and group sparse, we propose a convex discreteness and group sparsity aware (DGS) optimization problem for the signal detection. We provide an algorithm for the DGS optimization based on the alternating direction method of multipliers (ADMM). Moreover, we extend the DGS optimization into weighted DGS (W-DGS) optimization and propose an iterative approach named iterative weighted DGS (IW-DGS), in which we iteratively solve the W-DGS optimization problem while updating the parameters in the objective function. We also discuss the computational complexity of the proposed IW-DGS and show that the order of the complexity can be reduced by using the structure of the channel matrix. Simulation results show that the symbol error rate (SER) performance of the proposed method is close to that of the oracle zero forcing (ZF) method, which perfectly knows the activity of each IoT terminal.
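
To give a feel for how group sparsity enters an ADMM-style solver, the sketch below implements block soft-thresholding, the proximal operator of a sum of group-wise Euclidean norms. This is a standard building block for group-sparse regularizers in general, not the paper's DGS formulation: the abstract does not state the objective, the real-valued signal model here is a simplification, and the function name, grouping, and values are hypothetical.

```python
import math

def group_soft_threshold(x, groups, lam):
    """Proximal operator of lam * sum_g ||x_g||_2 over disjoint index groups."""
    out = list(x)
    for idx in groups:
        norm = math.sqrt(sum(x[i] ** 2 for i in idx))
        # Shrink the whole group toward zero; groups with norm <= lam vanish entirely.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        for i in idx:
            out[i] = scale * x[i]
    return out

# Hypothetical stacked signal: each group holds the symbols of one terminal.
x = [3.0, 4.0, 0.1, -0.2]
groups = [[0, 1], [2, 3]]
shrunk = group_soft_threshold(x, groups, lam=1.0)
```

In this toy example the first group (an active terminal) is only scaled down, while the second group (small values, as from an inactive terminal) is set to zero as a block, which is the behavior a group-sparsity term is meant to encourage.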

Citations: 2
