2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Papers

Investigating Text-independent Speaker Verification from Practically Realizable System Perspective
Rohan Kumar Das, S. Prasanna
This work explores the prospects of text-independent speaker verification (SV) for practically realizable systems. Although advances in SV have drawn attention to deployable systems, performance tends to degrade under uncontrolled conditions. A data collection protocol is designed for text-independent SV, with student attendance as the application, to create a database in a real-world scenario. Performance is evaluated with i-vector based speaker modeling, and the results deviate substantially from those obtained on a standard database. This highlights the importance of real-world databases for robust SV studies. Further studies on speaker categorization, speaker confidence, and model update show their significance for systems in practice. The database created in this work is available as part of a multi-style speaker recognition database.
DOI: https://doi.org/10.23919/APSIPA.2018.8659567 | Published: 2018-11-01
Citations: 11
Feature Pyramid Deep Matching and Localization Network for Image Forensics
Kui Ye, Jing Dong, Wei Wang, Bo Peng, T. Tan
To advance the state of the art in image forensics, a new formulation of splicing localization is proposed. Given a pair consisting of a query (probe) image and a potential donor image, the aim is to obtain masks for both images if a region of the donor image was spliced into the probe. The earlier Deep Matching and Validation Network (DMVN) addresses the problem with a novel end-to-end learning-based solution. Inheriting its deep dense matching layer, we propose the Feature Pyramid Deep Matching and Localization Network (FPLN), whose contributions are threefold. First, instead of using just one feature map as in DMVN, FPLN uses a pyramid of feature maps at different resolutions with respect to the input image to achieve better localization performance, especially for small objects. Second, we add a fusion layer that fuses all the features after the deep dense matching layer; this not only takes full advantage of the correlation information between those features, but also merges the two pathways of DMVN into a single simple pathway, simplifying the subsequent architecture. Last, we employ focal loss to address the class imbalance problem, as the foreground area is usually much smaller than the background area. The experiments demonstrate the superior performance of the proposed method in detection accuracy and in localizing small tampered regions.
DOI: https://doi.org/10.23919/APSIPA.2018.8659464 | Published: 2018-11-01
Citations: 6
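The focal loss mentioned in the FPLN abstract above down-weights easy, well-classified pixels so that a small foreground (spliced) region is not swamped by the much larger background. A minimal per-pixel sketch of the binary form is given below. The function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions for illustration (values commonly used in the focal loss literature); the abstract does not report the settings used for FPLN.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one pixel (illustrative sketch, not the FPLN code).

    p     -- predicted probability that the pixel is foreground (spliced)
    y     -- ground-truth label, 1 for foreground and 0 for background
    gamma -- focusing parameter; gamma = 0 gives weighted cross-entropy
    alpha -- weight of the foreground class (background gets 1 - alpha)
    """
    # Probability assigned to the true class, and the matching class weight.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0)
    # The factor (1 - p_t) ** gamma shrinks the loss of confident, correct pixels.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical foreground pixels: one easy (p = 0.9), one hard (p = 0.1).
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(easy < hard)
```

With `gamma=0.0` the modulating factor disappears and the expression reduces to class-weighted cross-entropy, which is a convenient sanity check on any implementation.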
Bayesian Multichannel Speech Enhancement with a Deep Speech Prior
Kouhei Sekiguchi, Yoshiaki Bando, Kazuyoshi Yoshii, Tatsuya Kawahara
This paper describes statistical multichannel speech enhancement based on a deep generative model of speech spectra. Recently, deep neural networks (DNNs) have been widely used for converting noisy speech spectra to clean speech spectra or for estimating time-frequency masks. Such a supervised approach, however, requires a sufficient amount of training data (pairs of noisy and clean speech data) and often fails in an unseen noisy environment. This calls for a blind source separation method, such as multichannel nonnegative matrix factorization (MNMF), that can jointly estimate low-rank source spectra and spatial covariances on the fly. However, the low-rank assumption does not hold for speech spectra. To solve these problems, we propose a semi-supervised method based on an extension of MNMF that consists of a deep generative model for speech spectra and a standard low-rank model for noise spectra. The speech model can be trained in advance with auto-encoding variational Bayes (AEVB) using only clean speech data, and it is used as a prior on clean speech spectra for speech enhancement. Given a noisy speech spectrogram, we estimate the posterior of the clean speech spectra while estimating the noise model on the fly. This adaptive estimation is achieved with Gibbs sampling in a unified Bayesian framework. The experimental results show the potential of the proposed method.
DOI: https://doi.org/10.23919/APSIPA.2018.8659591 | Published: 2018-11-01
Citations: 32
An Investigation of Speaker Clustering Algorithms in Adverse Acoustic Environments
Meng-Zhen Li, Xiao-Lei Zhang
Speaker clustering is an important problem in speech processing, for example in speaker diarization; however, its behavior in adverse acoustic environments has not been studied comprehensively. To address this, we investigate each of its components in turn. A speaker clustering system contains three components: a feature extraction front end, a dimensionality reduction algorithm, and a clustering back end. In this paper, we use the standard Gaussian mixture model based universal background model (GMM-UBM) as the front end to extract high-dimensional supervectors, and compare three dimensionality reduction algorithms as well as two clustering algorithms. The three dimensionality reduction algorithms are principal component analysis (PCA), spectral clustering (SC), and the multilayer bootstrap network (MBN). The two clustering algorithms are k-means and agglomerative hierarchical clustering (AHC). We have conducted extensive experiments with both in-domain and out-of-domain settings on noisy versions of the NIST 2006 speaker recognition evaluation (SRE) and NIST 2008 SRE corpora. Experimental results in various noisy environments show that (i) the MBN-based systems perform best in most cases, while the SC-based systems outperform both the PCA-based systems and the original supervector-based systems; and (ii) AHC is more robust than k-means.
DOI: https://doi.org/10.23919/APSIPA.2018.8659665 | Published: 2018-11-01
Citations: 1
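The clustering back end named in the speaker clustering abstract above, agglomerative hierarchical clustering (AHC), can be sketched in a few lines. The version below starts with one cluster per utterance vector and repeatedly merges the closest pair until `k` clusters remain. Average linkage, Euclidean distance, the function name `ahc`, and the toy 2-D vectors are all assumptions for illustration; the abstract does not state which linkage or distance the authors used, and real inputs would be dimensionality-reduced supervectors.

```python
import math


def ahc(vectors, k):
    """Average-linkage agglomerative clustering down to k clusters (sketch).

    vectors -- list of equal-length numeric tuples, one per utterance
    k       -- number of clusters (speakers) to stop at
    Returns a list of clusters, each a sorted list of indices into vectors.
    """
    # Start with every vector in its own cluster.
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(math.dist(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Find the closest pair of clusters.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        # Merge cluster y into cluster x.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical low-dimensional utterance embeddings from two speakers.
toy = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
print(ahc(toy, 2))
```

This brute-force search is cubic in the number of utterances and is meant only to make the merge rule concrete; a library implementation would be the practical choice for corpora of realistic size.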
Uplink Resource Allocation for Narrowband Internet of Things (NB-IoT) Cellular Networks
Ya-Ju Yu, Jhih-Kai Wang
Narrowband Internet of Things (NB-IoT) is a new narrowband radio technology in fifth-generation (5G) networks. In NB-IoT cellular networks, to provide low-power wide-area coverage, a base station can allocate several different resource units to NB-IoT devices for uplink transmissions. Traditional resource allocation algorithms that do not consider multiple resource units are not appropriate for NB-IoT networks, and we observe that adopting the same resource unit for every device wastes radio resources. Therefore, this paper investigates the uplink resource allocation problem, taking into account the new radio frame structure and the multiple resource units of NB-IoT networks. The objective is to minimize the radio resources used while ensuring that each device can transmit its data. We propose an algorithm that determines a suitable resource unit and allocates radio resources for each device to solve the target problem. Compared with a baseline, the simulation results show the efficacy of the proposed algorithm and provide useful insights into resource allocation design for NB-IoT systems.
DOI: https://doi.org/10.23919/APSIPA.2018.8659711 | Published: 2018-11-01
Citations: 9
Generative adversarial networks for generating RGB-D videos
Yuki Nakahira, K. Kawamoto
Generative adversarial networks (GANs) have been successfully applied to generating high-quality natural images and have been extended to the generation of RGB videos and 3D volume data. In this paper, we consider the task of generating RGB-D videos, which is less extensively studied and still challenging. We explore deep GAN architectures suitable for the task and develop four GAN architectures based on existing video GANs. Using a facial expression database, we find experimentally that an extended version of the motion and content decomposed GAN, known as MoCoGAN, provides the highest-quality RGB-D videos. We discuss several applications of our GAN to content creation and data augmentation, as well as its potential applications in behavioral experiments.
DOI: https://doi.org/10.23919/APSIPA.2018.8659648 | Published: 2018-11-01
Citations: 0
Detectability of the Image Operation Order: Upsampling and Mean Filtering
Jiana Li, Xin Liao, Rongbing Hu, Xuchong Liu
As image modification and tampering, especially with multiple editing operations, become prevalent, verifying the authenticity and credibility of digital images is increasingly important. Recently, two editing operations, upsampling and mean filtering, have attracted growing attention. While many existing image forensics techniques identify the existence and order of specific operations in a processing chain, few detection methods address the order of upsampling and mean filtering. Following strongly indicative analysis in different domains of the DFTs of images' p-maps, this paper presents a newly designed method that uses features to determine the order of upsampling and mean filtering operations. Specifically, two features, the symmetry-based PSNR and the fourth-order energy fitting curve, characterize operation chains in the DFTs of images' p-maps. We calculate the variance of the fitting curve and examine how the fingerprints change under different operating intensities to ensure that the two features apply broadly to operation detection. The features are fed to an SVM, which effectively discriminates among five combinations of upsampling and mean filtering. Representative experiments verify the effectiveness of the proposed method.
DOI: https://doi.org/10.23919/APSIPA.2018.8659597 | Published: 2018-11-01
Citations: 3
Generative approach using the noise generation models for DNN-based speech synthesis trained from noisy speech
Masakazu Une, Yuki Saito, Shinnosuke Takamichi, Daichi Kitamura, Ryoichi Miyazaki, H. Saruwatari
This paper proposes a generative approach to constructing high-quality speech synthesis from noisy speech. Studio-quality recorded speech is required to construct high-quality speech synthesis, but most existing speech has been recorded in noisy environments. A common way to use noisy speech for training speech synthesis models is to reduce the noise before vocoder-based parameterization. However, such multi-step processing causes spectral distortion to accumulate. Meanwhile, statistical parametric speech synthesis (SPSS) without vocoders, which directly generates spectral parameters or waveforms, has been proposed recently. Vocoder-free SPSS makes it possible to train speech synthesis models that take into account the noise addition process generally used in signal processing research. In the proposed approach, newly introduced noise generation models, trained with a generative adversarial training algorithm, randomly generate noise spectra. The speech synthesis models are trained so that the sum of their output and the randomly generated noise is close to the spectra of the noisy speech. Because the noise generation model parameters fit the spectrum of the observed noise, the proposed method can alleviate the spectral distortion found in the conventional method. Experimental results demonstrate that the proposed method outperforms the conventional method in terms of synthetic speech quality.
DOI: https://doi.org/10.23919/APSIPA.2018.8659691 | Published: 2018-11-01
Citations: 0
Detecting Technology of Network Storage Covert Channel Based on OPTICS Algorithm
Linkai Huang, Linna Zhou, Yunbiao Guo
As technology advances, network security is valued by more and more people. Network covert channels, an important area of network security, have been studied in recent years. On the one hand, covert channels provide a new and safe environment for network communication. On the other hand, they are exploited by malicious actors to spread viruses, trojans, and so on. Research on covert channels is therefore particularly important. According to their resource attributes, network covert channels can be divided into two types: storage-based channels and timestamp-based channels. This paper focuses on storage-based covert channels and proposes a new detection method based on a clustering algorithm. The values of various fields of the packets in a data flow are analyzed by clustering, and the clustering results are displayed graphically to determine whether a covert channel is present in the packets. The experimental results show that the detection technique offers high accuracy, a simple algorithm, and intuitive result images, and that it achieves the desired results.
在科技进步的今天,网络安全逐渐被越来越多的人所重视。网络隐蔽信道作为网络安全的一个重要领域,是近年来被提出的。隐蔽信道一方面为网络通信提供了新的、安全的通信环境。然而,另一方面,一些不法分子被利用来传播病毒、木马等。因此,对隐蔽信道的研究就显得尤为重要。根据资源属性,网络隐蔽通道可以分为基于存储的通道和基于时间戳的通道两部分。本文将在基于存储的隐蔽信道的基础上,提出一种新的基于聚类算法的检测方法。根据对数据流数据包各部分值的聚类分析,将聚类结果图形化显示。确定数据包中是否存在隐蔽通道。实验结果表明,该检测技术具有精度高、算法简单、结果图像直观等优点,达到了预期的理想效果。
DOI: https://doi.org/10.23919/APSIPA.2018.8659780 (published 2018-11-01)
Citations: 1
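The abstract above describes clustering packet field values and reading covert-channel candidates off the result. A minimal sketch of that idea follows, using a compact pure-Python OPTICS (reachability ordering plus a simple threshold-based cluster extraction). It is not the authors' implementation: the two-value packet features, the `eps` and `min_pts` settings, and the rule that small groups count as noise are all assumptions made for illustration.

```python
import heapq
import math


def optics(points, eps, min_pts):
    """Return (order, reach): OPTICS visiting order and reachability distances.

    min_pts counts the point itself and must be at least 2.
    """
    n = len(points)
    reach = [math.inf] * n
    processed = [False] * n
    order = []

    def dist(i, j):
        return math.dist(points[i], points[j])

    def neighbors(i):
        return [j for j in range(n) if j != i and dist(i, j) <= eps]

    def core_distance(i, nbrs):
        # A core point needs min_pts - 1 other points within eps.
        if len(nbrs) < min_pts - 1:
            return None
        return sorted(dist(i, j) for j in nbrs)[min_pts - 2]

    def update(i, nbrs, core, seeds):
        # Lower the reachability of unprocessed neighbours where possible.
        for j in nbrs:
            if processed[j]:
                continue
            new_reach = max(core, dist(i, j))
            if new_reach < reach[j]:
                reach[j] = new_reach
                heapq.heappush(seeds, (new_reach, j))

    for start in range(n):
        if processed[start]:
            continue
        processed[start] = True
        order.append(start)
        nbrs = neighbors(start)
        core = core_distance(start, nbrs)
        if core is None:
            continue
        seeds = []
        update(start, nbrs, core, seeds)
        while seeds:
            _, j = heapq.heappop(seeds)
            if processed[j]:
                continue  # stale heap entry
            processed[j] = True
            order.append(j)
            nbrs_j = neighbors(j)
            core_j = core_distance(j, nbrs_j)
            if core_j is not None:
                update(j, nbrs_j, core_j, seeds)
    return order, reach


def extract_clusters(order, reach, threshold, min_pts):
    """Split the ordering at reachability jumps; small groups become noise (-1)."""
    groups = []
    for idx in order:
        if reach[idx] > threshold:
            groups.append([])
        groups[-1].append(idx)
    labels = [-1] * len(order)
    cluster_id = 0
    for group in groups:
        if len(group) >= min_pts:
            for idx in group:
                labels[idx] = cluster_id
            cluster_id += 1
    return labels


# Hypothetical features: two header-field values per packet (synthetic data).
packets = [
    (64, 0), (64, 1), (63, 0), (64, 2), (63, 1),   # ordinary traffic
    (100, 50), (101, 50), (100, 51), (101, 51),    # a second, separate group
    (10, 200),                                     # isolated packet
]
order, reach = optics(packets, eps=5.0, min_pts=3)
labels = extract_clusters(order, reach, threshold=5.0, min_pts=3)
```

In this sketch a second dense group far from the ordinary traffic, or isolated points, would be the cases worth inspecting; whether such a group really indicates a covert channel depends on the fields chosen and is not something the code decides.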
Colorization Algorithm based on Image Segmentation and Graph Signal Processing
Mamoru Sugawara, Kazunori Uruma, S. Hangai
This paper proposes a new colorization algorithm that converts a grayscale image to a color image using a few colored pixels. To obtain a properly colorized image, graph signal processing and image segmentation techniques are introduced. After a graph is constructed on the given grayscale image, a graph signal recovery technique based on the graph Fourier transform recovers several important pieces of color information. The whole image is then colorized by Levin's algorithm. Experimental results on 5 images show a PSNR improvement of 1.82 dB to 10.2 dB. The proposed algorithm also reduces color fading compared with Levin's algorithm.
DOI: https://doi.org/10.23919/APSIPA.2018.8659720 (published 2018-11-01)
Citations: 1
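The abstract above reports its gains in PSNR. As a reminder of what such a figure measures, here is a minimal sketch of the standard definition, PSNR = 10 · log10(peak² / MSE), applied to flat lists of pixel values. The 8-bit peak of 255 and the toy pixel values are assumptions; the abstract does not say how the authors computed PSNR over color channels.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy example: two estimates of the same reference, one with half the error.
reference = [100, 150, 200, 250]
baseline = [p + 10 for p in reference]   # error of 10 per pixel
improved = [p + 5 for p in reference]    # error of 5 per pixel
gain_db = psnr(reference, improved) - psnr(reference, baseline)
```

Halving the per-pixel error quarters the MSE, so `gain_db` here is 10 · log10(4), roughly 6 dB, which gives a sense of scale for the 1.82 dB to 10.2 dB range quoted in the abstract.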
Journal
2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)