Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9979937
Kotaro Onishi, Toru Nakashika
Non-parallel voice conversion with deep neural networks often disentangles speaker individuality from speech content. However, these methods rely on external models, text data, or implicit constraints to achieve the disentanglement: they may require training additional models or annotating text, or it may be unclear how the latent representations are acquired. Therefore, we propose voice conversion with momentum contrastive representation learning (MoCoVC), a method that explicitly constrains intermediate features using contrastive representation learning, a self-supervised learning method. Applying contrastive representation learning with transformations that preserve utterance content allows us to explicitly constrain the intermediate features to preserve utterance content. We present transformations for contrastive representation learning that are suitable for voice conversion and verify the effectiveness of each in an experiment. Moreover, in subjective evaluation experiments, MoCoVC demonstrates performance that is higher than or comparable to a vector-quantization-constrained method in terms of both naturalness and speaker individuality.
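Momentum contrastive learning optimizes the InfoNCE objective: each encoded utterance (the query) should be close to the momentum encoder's output for a content-preserving transformation of the same utterance (the positive) and far from a queue of keys from other content (the negatives). A minimal NumPy sketch of that objective — the array shapes and temperature value are illustrative, not taken from the paper:

```python
import numpy as np

def info_nce_loss(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss used in MoCo-style contrastive learning.

    query:         (N, D) encoder outputs for the anchor utterances
    positive_key:  (N, D) momentum-encoder outputs for content-preserving
                   transformations of the same utterances
    negative_keys: (K, D) queue of momentum-encoder outputs for other content
    """
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    k_pos = positive_key / np.linalg.norm(positive_key, axis=1, keepdims=True)
    k_neg = negative_keys / np.linalg.norm(negative_keys, axis=1, keepdims=True)

    l_pos = np.sum(q * k_pos, axis=1, keepdims=True)   # (N, 1) positive logits
    l_neg = q @ k_neg.T                                # (N, K) negative logits
    logits = np.concatenate([l_pos, l_neg], axis=1) / temperature

    # Cross-entropy with the positive always at class index 0
    m = logits.max(axis=1, keepdims=True)
    log_softmax = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return -log_softmax[:, 0].mean()
```

Minimizing this loss pulls the two encodings of the same content together, which is exactly the explicit content-preservation constraint on intermediate features described above.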
Title: MoCoVC: Non-parallel Voice Conversion with Momentum Contrastive Representation Learning
Venue: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9980044
Ahmed Khan, Koksheik Wong, Vishnu Monn Baskaran
An ideal image watermarking (IW) scheme aims to manage the trade-off among quality, capacity, and robustness. However, our literature survey reveals flaws in the form of poor robustness and quality or low embedding capacity. In this paper, a multiple-frequency-domain image watermarking scheme using salient (eye-catching) object detection is proposed. Specifically, the host and watermark images are partitioned into background and foreground regions by the proposed multi-dimension decomposition, which accumulates image patches and combines them to form the saliency map. Next, the watermark image is encrypted by multiple applications of the 3D Arnold and logistic maps, then embedded into both the identified foreground and background regions of the host image using different embedding strengths. The proposed method can embed one color pixel of the watermark image into one color pixel of the host image while maintaining high image quality. In the best case, we could embed a 24-bit image as the watermark into a 24-bit image of the same dimensions while maintaining an average RGB-SSIM of 0.9999. Experiments are carried out (with 10K MSRA dataset images) to verify the performance of the proposed method and to compare it against state-of-the-art (SOTA) watermarking methods.
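The Arnold-map encryption step can be illustrated with the classical 2-D cat map (the paper uses a 3-D Arnold map combined with a logistic map; this 2-D version, with an illustrative iteration count, only shows the scrambling idea):

```python
import numpy as np

def arnold_cat_map(img, iterations=1):
    """Scramble a square image with the classical 2-D Arnold cat map:
    (x, y) -> (x + y, x + 2y) mod N.  The map is a bijection on the pixel
    grid and is periodic, so repeated application eventually restores the
    image; the iteration count acts as part of the key."""
    n = img.shape[0]
    assert img.shape[0] == img.shape[1], "Arnold map needs a square image"
    out = img.copy()
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    for _ in range(iterations):
        out = out[(x + y) % n, (x + 2 * y) % n]
    return out
```

Because the transform is area-preserving and invertible, decryption is just applying the remaining iterations of the map's period (or the inverse map).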
Title: Image Watermarking based on Saliency Detection and Multiple Transformations
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9979986
Shou-Hong Liu, Chun-Tai Liu, Wei-Hung Chou, JenYi Pan
In recent years, the 3GPP (3rd Generation Partnership Project) has studied and developed standards for non-terrestrial networks (NTN). One of the newest work items for NTN is coverage enhancement. In this paper, we construct the NTN channel described by 3GPP. Moreover, we summarize the NTN channel model and current coverage enhancements. Since the NTN scenario is very different from traditional terrestrial network systems, we also summarize the challenges and phenomena of NTN. To achieve high communication quality for voice-over-Internet-protocol (VoIP) service in NTN, we evaluate the performance and discuss the benefit of the PUSCH repetition technique in the NTN low-Earth-orbit (LEO) scenario.
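As background for why repetition helps coverage: under ideal combining, K identical PUSCH transmissions add coherently, so the post-combining SNR grows linearly with K. A back-of-the-envelope helper (idealized; a real LEO link gains less because the channel changes across the repetition window):

```python
import math

def repetition_gain_db(k):
    """Ideal SNR gain in dB from combining k PUSCH repetitions.

    Assumes perfect chase combining of identical transmissions, so the
    combined SNR is k times the single-shot SNR: gain = 10*log10(k) dB.
    """
    if k < 1:
        raise ValueError("repetition factor must be >= 1")
    return 10.0 * math.log10(k)
```

For example, doubling the repetitions buys roughly 3 dB of link budget, which is the first-order reason repetition is attractive for the long, lossy LEO uplink.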
Title: Evaluation of Voice Service in LEO Communication with 3GPP PUSCH Repetition Enhancement
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9980147
Kenta Tsunomori, Yuma Yamasaki, M. Kuribayashi, N. Funabiki, I. Echizen
An effective way to defend against adversarial examples (AEs), which are used, for example, to attack applications such as face recognition, is to detect in advance whether an input image is an AE. Some AE defense methods focus on the response characteristics of image classifiers when denoising filters are applied to the input image. However, several filters are required, which results in a large amount of computation. Because JPEG compression of AEs effectively removes adversarial perturbations, the difference between an image before and after JPEG compression should be highly correlated with the perturbations, although not completely consistent with them. We have developed a filtering operation that modulates this difference, varying its magnitude and sign, and adds it back to the image so that adversarial perturbations are effectively removed. We consider that perturbations that cannot be removed by JPEG compression alone can be removed by modulating this difference. Furthermore, resizing the image after adding these distortions removes perturbations that could not be removed otherwise. The filtering operation removes the adversarial noise and reconstructs corrected samples from AEs. We also present a simple but effective reconstruction method based on these filtering operations. Experiments in which the adversarial attack was unknown to the detector demonstrated that the proposed method achieves better detection accuracy with reasonable computational complexity. In addition, the percentage of correct classification results after applying the proposed filter to non-targeted attacks was higher than that of JPEG compression and scaling. These results suggest that the proposed method effectively removes adversarial perturbations and is an effective filter for detecting AEs.
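The core operation can be sketched generically: treat the difference between an image and a lossy-compressed copy of it as an estimate of the perturbation, then scale that difference by a signed factor before adding it back. In this sketch the `compress` callable stands in for JPEG compression, and the function name and `alpha` parameter are illustrative, not the paper's:

```python
import numpy as np

def diff_modulation_filter(img, compress, alpha=-1.0):
    """Modulate the compression-derived difference and add it back.

    img:      uint8 image array
    compress: any lossy operation (JPEG in the paper); its output minus the
              input approximates the adversarial perturbation
    alpha:    signed scale; alpha = -1 subtracts the full difference
              (yielding the compressed image), alpha = 0 is the identity
    """
    diff = img.astype(np.float64) - compress(img).astype(np.float64)
    return np.clip(img + alpha * diff, 0, 255).astype(np.uint8)
```

A resizing step, as described above, would then be applied to the returned array to remove residual perturbations.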
Title: Detection and Correction of Adversarial Examples Based on JPEG-Compression-Derived Distortion
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9980216
Yingyi Ma, Xueliang Zhang
There are three main interferences in the FM signal transmission process: the multipath effect, the Doppler effect, and white noise. These interferences significantly degrade speech. We propose a method that uses a masking or mapping approach for single-channel speech enhancement in wireless communication. Since the method improves speech quality by addressing the three interferences simultaneously, it is simpler than conventional methods. Experiments are conducted on a dataset that we simulated ourselves. Because PESQ and STOI require reference targets, it is hard to evaluate performance on real-world data, so we only give a spectral comparison of the real-data enhancement results. Simulation results show excellent speech enhancement performance on the unprocessed mixture and a significant improvement in speech quality on the actually collected data, verifying the feasibility of deep learning for this kind of task. Future studies will improve real-time performance and compress the number of network parameters.
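The masking branch of such a system can be summarized in a few lines: the network predicts a ratio mask over time-frequency bins, which scales the noisy magnitude while the noisy phase is reused. This is a generic sketch of mask-based enhancement, not the paper's exact architecture:

```python
import numpy as np

def apply_mask_enhancement(noisy_stft, mask):
    """Apply a predicted ratio mask to a complex noisy spectrogram.

    noisy_stft: complex array of STFT coefficients (freq x time)
    mask:       real-valued ratio mask of the same shape, clipped to [0, 1]
    Returns the enhanced complex spectrogram, reusing the noisy phase.
    """
    magnitude = np.abs(noisy_stft)
    phase = np.angle(noisy_stft)
    enhanced_mag = np.clip(mask, 0.0, 1.0) * magnitude
    return enhanced_mag * np.exp(1j * phase)
```

The mapping alternative mentioned above would instead have the network regress `enhanced_mag` directly from the noisy magnitude.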
Title: Application of Deep Learning-based Single-channel Speech Enhancement for Frequency-modulation Transmitted Speech
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9980311
Chandler Timm C. Doloriel, R. Cajote
Object detection is a computer vision technique used to identify objects that are usually present in natural scenes. However, methods designed for natural scenes do not transfer easily to aerial images, where objects are mostly arbitrarily oriented, small, and set against complex backgrounds rather than upright and well focused. To effectively detect objects in aerial images, we propose a new regression loss function based on the attention mechanism through attention weights. Using the relative position of the attention weights to the bounding box, the foreground is given more attention, which highlights the target object and effectively suppresses noise and background. Preliminary experiments are conducted on an attention-based object detector using the DOTA dataset to test the capability of the attention mechanism in extracting the contextual information of objects, especially in complex environments.
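The weighting idea can be sketched as follows: per-box regression errors are scaled by attention weights so that high-attention (foreground) boxes dominate the loss and background noise is down-weighted. This is a hypothetical illustration; how the weights are derived from the detector's attention maps follows the paper and is not reproduced here:

```python
import numpy as np

def attention_weighted_l1(pred_boxes, target_boxes, attention_weights):
    """Attention-weighted L1 regression loss over N boxes.

    pred_boxes, target_boxes: (N, 4) arrays of box coordinates
    attention_weights:        (N,) non-negative weights, not all zero;
                              larger weight = more foreground attention
    """
    err = np.abs(pred_boxes - target_boxes).sum(axis=1)  # (N,) per-box L1
    w = attention_weights / attention_weights.sum()       # normalize weights
    return float((w * err).sum())
```

With uniform weights this reduces to a plain averaged L1 loss; skewing the weights toward foreground boxes is what suppresses the background's contribution.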
Title: Object Detection in Aerial Images with Attention-based Regression Loss
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9980002
Rui Lin, Kazunori Hayashi
Compressed sensing is a technique to recover a sparse vector from its underdetermined linear measurements. Since a naive $\ell_0$ optimization approach is hard to tackle due to the discreteness and non-convexity of the $\ell_0$ norm, a relaxed $\ell_1$-$\ell_2$ optimization problem is often employed for reconstructing the sparse vector, especially when the measurement noise is not negligible. FISTA (fast iterative shrinkage-thresholding algorithm) is one of the most popular algorithms for the $\ell_1$-$\ell_2$ optimization and is known to achieve the optimal convergence rate among first-order methods. Recently, the use of optical circuits for various signal processing tasks, including deep neural networks, has been considered intensively, but it is difficult to implement FISTA with an optical circuit because the algorithm requires division by a dynamic value. In this paper, assuming implementation with an optical circuit, we propose an ADMM (alternating direction method of multipliers) based algorithm for the $\ell_1$-$\ell_2$ optimization. An ADMM-based algorithm for the $\ell_1$-$\ell_2$ optimization has already been proposed in the literature, but the proposed algorithm is derived from a different formulation and, unlike the existing ADMM-based algorithm, does not include the calculation of the inverse of a matrix.
Computer simulation results demonstrate that the proposed algorithm achieves performance comparable to FISTA and the existing ADMM-based algorithm while requiring no division operations and no matrix inversions.
Title: An Approximated ADMM based Algorithm for $\ell_1$-$\ell_2$ Optimization Problem
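Both FISTA and ADMM attack the $\ell_1$-$\ell_2$ problem $\min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1$ through the soft-thresholding (shrinkage) operator, which needs only additions, multiplications, and a sign test. The sketch below pairs it with plain ISTA as the simplest division-free iteration; the paper's approximated ADMM differs in its update structure:

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding (shrinkage), the proximal operator of t*||x||_1:
    S_t(x) = sign(x) * max(|x| - t, 0), applied elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, step, iters=200):
    """Plain ISTA for min_x 0.5*||y - Ax||^2 + lam*||x||_1:
    x <- S_{lam*step}(x - step * A^T (A x - y)).
    `step` should satisfy step <= 1 / ||A||_2^2 for convergence; note there
    is no per-iteration division and no matrix inversion."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - step * A.T @ (A @ x - y), lam * step)
    return x
```

FISTA adds a momentum term with a dynamically updated scalar divisor on top of this iteration, which is exactly the operation that is awkward in an optical circuit.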
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9979991
Koi Yee Ng, Simying Ong
In this paper, an improved scrambling-embedding technique, namely a row-rotational data hiding method, is proposed to hide data in partially encrypted images. The partially encrypted images are generated with a bit-wise XOR cipher to investigate the feasibility of applying the proposed method at various encryption levels. The proposed method divides each row into multiple non-overlapping contiguous partitions. These partitions are arranged in a rotational manner to create different states, and each state represents specific data in binary form. During decoding, an α notation is introduced to reduce the number of failure rows, which would otherwise cause further image degradation and incorrect data extraction. The BSDS300 dataset, encrypted with different encryption strengths, is used for the experiments. The results show that when the least significant bits are encrypted, the proposed scrambling-embedding data hiding method still performs as well as in the plain-image domain.
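The rotational-state idea can be sketched as follows: splitting a row into n partitions yields n distinct cyclic arrangements, and the chosen arrangement carries one of n symbols. The function names are illustrative, and the non-blind extraction here (which compares against the original row) is a simplification — the paper's decoder works without the original and handles ambiguous "failure" rows separately:

```python
import numpy as np

def embed_in_row(row, n_parts, symbol):
    """Split a row into n_parts equal partitions and rotate the partition
    sequence by `symbol` positions; the rotation state (0 .. n_parts-1)
    carries log2(n_parts) bits of hidden data."""
    parts = np.split(row, n_parts)
    rotated = parts[symbol:] + parts[:symbol]
    return np.concatenate(rotated)

def extract_from_row(stego_row, original_row, n_parts):
    """Recover the symbol by finding which rotation of the original matches."""
    for symbol in range(n_parts):
        if np.array_equal(embed_in_row(original_row, n_parts, symbol), stego_row):
            return symbol
    return None  # decoding failure (cf. the paper's failure-row handling)
```

Rows whose partitions are identical to each other decode ambiguously, which is precisely the failure-row situation the α notation above is designed to mitigate.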
Title: Scrambling-Embedding in Partially-Encrypted Images
Pub Date: 2022-11-07 | DOI: 10.23919/APSIPAASC55919.2022.9980322
Aastha Kachhi, Anand Therattil, Ankur T. Patil, Hardik B. Sailor, H. Patil
Dysarthria is a neuro-motor speech impairment that renders speech unintelligible; its severity levels are generally imperceptible to humans. Dysarthric speech classification acts as a diagnostic tool for evaluating the progression of a patient's condition and also aids automatic dysarthric speech recognition systems (an important assistive speech technology). This study investigates the significance of Teager Energy Cepstral Coefficients (TECC) in dysarthric speech classification using three deep learning architectures, namely, Convolutional Neural Network (CNN), Light-CNN (LCNN), and Residual Networks (ResNet). The performance of TECC is compared with state-of-the-art features, such as the Short-Time Fourier Transform (STFT), Mel Frequency Cepstral Coefficients (MFCC), and Linear Frequency Cepstral Coefficients (LFCC). In addition, this study investigates the effectiveness of cepstral features over spectral features for this problem. The highest classification accuracies achieved on the UA-Speech corpus are 97.18%, 94.63%, and 98.02% (absolute improvements of 1.98%, 1.41%, and 1.69% over MFCC) with CNN, LCNN, and ResNet, respectively. Further, we evaluate feature discriminative capability using the $F1$-score, Matthews Correlation Coefficient (MCC), Jaccard index, and Hamming loss. Finally, an analysis of the latency period w.r.t. state-of-the-art feature sets indicates the potential of TECC for practical deployment of the severity-level classification system.
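The operator at the heart of TECC is the discrete Teager energy operator, a three-sample nonlinear operator; for a pure tone $A\sin(\Omega n)$ it returns the constant $A^2\sin^2\Omega$, so it jointly reflects amplitude and frequency. A NumPy sketch of the operator alone (the full TECC pipeline adds filterbank and cepstral stages on top of it):

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator:
    psi[x(n)] = x(n)^2 - x(n-1) * x(n+1).
    Returns len(x) - 2 values (the two boundary samples are dropped)."""
    x = np.asarray(x, dtype=np.float64)
    return x[1:-1] ** 2 - x[:-2] * x[2:]
```

Replacing the usual squared-magnitude energy with this output is what gives TECC its sensitivity to the airflow irregularities characteristic of dysarthric speech.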
Title: Teager Energy Cepstral Coefficients For Classification of Dysarthric Speech Severity-Level
Voice can represent a person's identity. Thus, it can be used in automatic speaker verification (ASV) systems for authenticating secure applications. Unfortunately, existing ASV systems are vulnerable to spoofing attacks. A replay attack is a widely used spoofing technique because it is simple to mount but difficult to detect. Hence, many methods have been proposed as countermeasures against replay attacks. Most prior work treats voice and non-voice sections inseparably when evaluating detection performance. In this work, we investigate spoof-detection performance when voice sections, non-voice sections, and combinations with different percentages of voice are used, in order to determine the optimal section. We also propose a method for detecting replay attacks using the optimal section of a signal. Mel-frequency cepstral coefficients are calculated from the optimal section as features, and a ResNet-34 model is used for classification. We evaluated the proposed method on a dataset from the ASVspoof 2019 challenge. The results show that the optimal section for replay attack detection is obtained when 10% and 20% of voice are included in the non-voice sections. They also show that the proposed method outperforms the baselines with a 7.52% relative improvement, corresponding to an equal error rate of 1.72%.
{"title":"Replay Attack Detection Based on Voice and Non-voice Sections for Speaker Verification","authors":"Ananda Garin Mills, Patthranit Kaewcharuay, Pannathorn Sathirasattayanon, Suradej Duangpummet, Kasorn Galajit, Jessada Karnjana, P. Aimmanee","doi":"10.23919/APSIPAASC55919.2022.9980225","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980225","url":null,"abstract":"Voice can represent a person's identity. Thus, it can be used in automatic speaker verification (ASV) systems for authenticating secure applications. Unfortunately, existing ASV systems are vulnerable to spoofing attacks. A replay attack is a widely used spoofing technique because it is simple to mount but difficult to detect. Hence, many methods have been proposed as countermeasures against replay attacks. Most prior work treats voice and non-voice sections inseparably when evaluating detection performance. In this work, we investigate spoof-detection performance when voice sections, non-voice sections, and combinations with different percentages of voice are used, in order to determine the optimal section. We also propose a method for detecting replay attacks using the optimal section of a signal. Mel-frequency cepstral coefficients are calculated from the optimal section as features, and a ResNet-34 model is used for classification. We evaluated the proposed method on a dataset from the ASVspoof 2019 challenge. The results show that the optimal section for replay attack detection is obtained when 10% and 20% of voice are included in the non-voice sections. They also show that the proposed method outperforms the baselines with a 7.52% relative improvement, corresponding to an equal error rate of 1.72%.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130867969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
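The 1.72% figure above is an equal error rate (EER), the operating point at which the false-acceptance rate (spoofed utterances accepted) equals the false-rejection rate (genuine utterances rejected). The paper does not include evaluation code; the sketch below is our own illustrative computation on toy score lists, with the function name `equal_error_rate` and the decision rule "accept if score ≥ threshold" assumed:

```python
import numpy as np

def equal_error_rate(genuine, spoof):
    """Sweep every candidate threshold and return the EER: the average of
    false-acceptance and false-rejection rates at the threshold where the
    two are closest."""
    genuine, spoof = np.asarray(genuine, float), np.asarray(spoof, float)
    best_gap, eer = np.inf, None
    for t in np.sort(np.unique(np.concatenate([genuine, spoof]))):
        far = np.mean(spoof >= t)   # spoof trials accepted
        frr = np.mean(genuine < t)  # genuine trials rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# One spoof score (0.5) overlaps the genuine range, so EER is 1/3:
print(equal_error_rate([0.8, 0.6, 0.4], [0.5, 0.3, 0.1]))  # ≈ 0.333
```

In practice EER is usually computed from a full ROC sweep with interpolation (e.g., via `sklearn.metrics.roc_curve`), but the discrete sweep above captures the definition the reported 1.72% refers to.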