2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Articles


Energy-constrained real-time H.264/AVC video coding

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2013-05-26 DOI: 10.1109/ICASSP.2013.6637950

T. Fonseca, R. Queiroz

Energy consumption has become a leading design constraint for computing devices, in part to contain electricity bills for individuals and businesses. In recent years, digital video communication technologies have demanded more computing power and, therefore, more energy. To meet the challenge of providing software-based video encoding solutions, we adopted x264, an open source software implementation of an H.264 video encoder, and optimized its prediction stage in the energy sense (E). Thus, besides looking for the coding options that lead to the best coded representation in terms of rate and distortion (RD), we constrain the process to fit within a certain energy budget, i.e., an RDE optimization. We consider energy to be the time integral of the electric power actually drawn by a given system. We present an RDE-optimized framework that allows for software-based real-time video compression while meeting desired targets for electrical consumption and hence controlling carbon emissions. We show results of energy-constrained compression in which one can save as much as 35% of the energy with a small impact on RD performance.
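The energy-constrained mode decision described in this abstract can be illustrated with a minimal sketch. It assumes a Lagrangian cost D + lam * R and a per-mode energy estimate; the `Mode` class, the mode names, and all numbers are hypothetical and are not taken from the paper or from the x264 API.

```python
from dataclasses import dataclass

@dataclass
class Mode:
    name: str          # hypothetical prediction mode label
    distortion: float  # e.g. sum of squared errors
    rate: float        # bits needed to code the block
    energy: float      # joules, assumed to be measured or modelled

def pick_mode(modes, lam, energy_budget):
    """Pick the lowest D + lam * R among modes that fit the energy budget.

    If no mode fits, fall back to the one that costs the least energy.
    """
    feasible = [m for m in modes if m.energy <= energy_budget]
    if not feasible:
        return min(modes, key=lambda m: m.energy)
    return min(feasible, key=lambda m: m.distortion + lam * m.rate)

# Toy candidates (illustrative values only).
modes = [
    Mode("skip", 120.0, 2.0, 0.1),
    Mode("inter16x16", 60.0, 30.0, 0.6),
    Mode("inter8x8", 30.0, 55.0, 1.5),
]
```

With a generous budget the sketch returns the best RD choice; tightening the budget pushes the decision toward cheaper modes, which is the trade-off the abstract reports.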


Citations: 6

A diagonalized Newton algorithm for non-negative sparse coding

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2013-05-26 DOI: 10.1109/ICASSP.2013.6639080

H. Van hamme

Signal models where non-negative vector data are represented by a sparse linear combination of non-negative basis vectors have attracted much attention in problems including image classification, document topic modeling, sound source segregation and robust speech recognition. In this paper, an iterative algorithm based on Newton updates is proposed to minimize the Kullback-Leibler divergence between data and model. It finds the sparse activation weights of the basis vectors more efficiently than the expectation-maximization (EM) algorithm. To avoid the computational burden of a matrix inversion, a diagonal approximation is made; the algorithm is therefore called the diagonalized Newton algorithm (DNA). It is several times faster than EM, especially for undercomplete problems, but it also performs surprisingly well on overcomplete problems.
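The idea of replacing the Hessian by its diagonal can be sketched as follows. This is a generic projected diagonal-Newton iteration with step halving for the KL objective, written under the assumption of a fixed dictionary `W` with strictly positive entries and an optional L1 weight `lam`; it is not the paper's exact update rule, and all function names are hypothetical.

```python
import numpy as np

def kl_objective(v, W, h, lam=0.0):
    """Generalized KL divergence D(v || W h) plus an L1 penalty on h."""
    vh = W @ h
    pos = v > 0
    kl = np.sum(v[pos] * np.log(v[pos] / vh[pos])) - v.sum() + vh.sum()
    return float(kl + lam * h.sum())

def diagonal_newton_nnsc(v, W, lam=0.0, n_iter=100, eps=1e-12, h0=None):
    """Minimize the KL objective over h >= 0 using only the Hessian diagonal."""
    h = np.ones(W.shape[1]) if h0 is None else np.asarray(h0, dtype=float).copy()
    f = kl_objective(v, W, h, lam)
    for _ in range(n_iter):
        vh = W @ h
        ratio = v / vh
        grad = W.T @ (1.0 - ratio) + lam
        hess_diag = (W ** 2).T @ (ratio / vh) + eps  # diagonal of W^T diag(v/vh^2) W
        step = grad / hess_diag
        t = 1.0
        for _ in range(30):  # halve the step until the objective does not increase
            h_try = np.maximum(h - t * step, eps)  # project onto h >= eps
            f_try = kl_objective(v, W, h_try, lam)
            if f_try <= f:
                h, f = h_try, f_try
                break
            t *= 0.5
    return h
```

No matrix is inverted: each iteration costs a few matrix-vector products, which is the computational saving the abstract attributes to the diagonal approximation.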


Citations: 1

Lossy compression of sparse histogram image

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-08-31 DOI: 10.1109/ICASSP.2012.6288143

M. Iwahashi, H. Kobayashi, H. Kiya

In this paper, a lossy data compression method for image signals with sparse histograms is proposed. It extends an existing lossless coding scheme that is based on lossless histogram packing followed by lossless coding. We introduce a lossy mapping, which has a lower computational load than the rate-distortion optimized Lloyd-Max quantization, and combine it with lossless coding. It was confirmed that the proposed method attains higher performance in the rate-distortion plane than existing methods. This is because it can exploit the histogram sparseness of images, and because its inverse mapping does not magnify quantization noise.
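The lossless histogram packing step mentioned in this abstract can be sketched in a few lines. The sketch covers only the packing and its inverse, not the paper's lossy mapping; the function names are hypothetical.

```python
import numpy as np

def pack_histogram(image):
    """Map the pixel values that actually occur onto consecutive indices 0..K-1."""
    used_values, packed = np.unique(image, return_inverse=True)
    return packed.reshape(image.shape), used_values

def unpack_histogram(packed, used_values):
    """Invert the packing with a table lookup."""
    return used_values[packed]

# A toy image that uses only 4 of the 256 possible 8-bit levels.
image = np.array([[0, 64, 64], [255, 0, 128]], dtype=np.uint8)
packed, table = pack_histogram(image)
restored = unpack_histogram(packed, table)
```

After packing, the toy image occupies indices 0 to 3 instead of the range 0 to 255, so a subsequent lossless coder sees a dense, much narrower value range.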


Citations: 33

Effect of anti-aliasing filtering on the quality of speech from an HMM-based synthesizer

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-08-31 DOI: 10.1109/ICASSP.2012.6288924

Y. Shiga

This paper investigates how the quality of speech produced through statistical parametric synthesis is affected by anti-aliasing filtering, i.e., low-pass filtering that is applied prior to (down-)sampling prerecorded speech at a desired rate. It is known empirically that the frequency response of such anti-aliasing filters influences the quality of synthesized speech to a considerable degree. To understand this influence more clearly, we examine the spectral aspects of speech involved in the processes of HMM training and synthesis. We then propose a feature extraction technique that avoids producing the roll-off of the frequency response near the Nyquist frequency, which is found to be the major cause of the speech quality degradation resulting from anti-aliasing filtering. In the technique, the spectrum is first computed from speech at a sampling rate higher than the desired rate; it is then truncated so that the frequency range above the target Nyquist frequency is discarded; finally, the truncated spectrum is converted directly into the cepstrum. Listening test results show that the proposed technique enables training HMMs efficiently with a limited number of model parameters and effectively with fewer artifacts in the speech synthesized at a desired sampling rate.
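The three steps of the technique (spectrum at the higher rate, truncation at the target Nyquist frequency, direct conversion to the cepstrum) can be sketched as below. The Hann window, the log-magnitude floor, and the plain FFT cepstrum are assumptions made for illustration; the paper's actual spectral analysis may differ.

```python
import numpy as np

def truncated_cepstrum(frame, fs_high, fs_target, floor=1e-10):
    """Cepstrum of one frame, computed from a spectrum cut at the target Nyquist frequency.

    frame     : samples recorded at the higher rate fs_high (Hz, integer)
    fs_target : desired sampling rate (Hz, integer), lower than fs_high
    """
    n = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n)))
    # Index of the last bin at or below fs_target / 2 (integer arithmetic).
    k_max = (n * fs_target) // (2 * fs_high)
    log_mag = np.log(np.maximum(spectrum[: k_max + 1], floor))
    # No low-pass filter is applied, so no filter roll-off appears near the new Nyquist.
    return np.fft.irfft(log_mag)

rng = np.random.default_rng(0)
frame = rng.standard_normal(960)               # 20 ms at 48 kHz
cep = truncated_cepstrum(frame, 48000, 16000)  # keeps 0 to 8 kHz
```

The resulting cepstrum has the length it would have for a 20 ms frame sampled at 16 kHz, even though no resampling filter was involved.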


Citations: 0

Improved DOA estimation with acoustic vector sensor arrays using spatial sparsity and subarray manifold

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-08-31 DOI: 10.1109/ICASSP.2012.6288438

Bo Li, Y. Zou

The performance of DOA estimation with scalar sensor arrays using the spatial sparse signal reconstruction (SSR) technique is affected by the grid spacing. In this paper, we formulate DOA estimation with acoustic vector sensor (AVS) arrays under the SSR framework and develop a coarse-to-fine DOA estimation algorithm. The source spatial sparsity and the inter-relations among the manifold matrices of the AVS subarrays are jointly utilized to eliminate the grid effect in the SSR technique, and the overall DOA estimation performance is improved at low complexity. Simulation results show that the proposed method effectively mitigates the DOA estimation bias caused by off-grid sources. Interestingly, our method gives good DOA estimation accuracy when sources are closely located.
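The grid effect and the coarse-to-fine idea can be illustrated with a deliberately simplified stand-in: a single noiseless source, a scalar uniform linear array with half-wavelength spacing, and a matched-filter scan in place of sparse reconstruction. None of this is the paper's AVS or SSR formulation; it only shows why a coarse grid biases the estimate and how a local refinement removes the bias.

```python
import numpy as np

def steering(theta_deg, n_sensors):
    """Steering vector of a half-wavelength-spaced uniform linear array (assumed geometry)."""
    m = np.arange(n_sensors)
    return np.exp(1j * np.pi * m * np.sin(np.deg2rad(theta_deg)))

def beam_power(x, grid_deg):
    """Matched-filter output power of snapshot x for every candidate angle."""
    n = len(x)
    return np.array([abs(np.vdot(steering(t, n), x)) ** 2 for t in grid_deg])

def coarse_to_fine_doa(x, coarse_step=5.0, fine_step=0.01):
    """Scan a coarse grid, then rescan finely around the coarse peak."""
    coarse = np.arange(-90.0, 90.0 + coarse_step, coarse_step)
    peak = coarse[np.argmax(beam_power(x, coarse))]
    fine = np.arange(peak - coarse_step, peak + coarse_step + fine_step, fine_step)
    fine = fine[(fine >= -90.0) & (fine <= 90.0)]
    return fine[np.argmax(beam_power(x, fine))]

x = steering(23.37, 8)            # off-grid source at 23.37 degrees
estimate = coarse_to_fine_doa(x)
```

On the 5 degree grid alone the best answer would be 25 degrees; the refinement recovers the off-grid angle at a fraction of the cost of a uniformly fine scan.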


Citations: 17

Discriminative common vectors based on the Gram-Schmidt reorthogonalization for the small sample size problem

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-08-31 DOI: 10.1109/ICASSP.2012.6288134

Y. Wen, Lianghua He, Yue Lu

The discriminative common vectors (DCV) algorithm, which uses subspace methods and the Gram-Schmidt orthogonalization (GSO) procedure to obtain the DCV, achieves better face recognition results than some commonly used linear discriminant algorithms. However, the Gram-Schmidt technique may produce a set of vectors that is far from orthogonal, and sometimes orthogonality is lost completely, which reduces the effectiveness of the DCV. In this paper, we propose an improved DCV method based on the GSO. To obtain an accurate projection onto the corresponding space, the orthogonal basis problem is usually solved with the Gram-Schmidt process with reorthogonalization. This improves the effectiveness of the DCV, and experimental results show that the proposed method performs better than the original DCV on the small sample size problem.
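The loss of orthogonality and its repair by reorthogonalization can be reproduced with a short sketch. It uses classical Gram-Schmidt with an optional second projection pass and a Hilbert matrix as an ill-conditioned test case; both choices are assumptions for illustration and are not taken from the paper's face recognition pipeline.

```python
import numpy as np

def gram_schmidt(A, reorthogonalize=False):
    """Orthonormalize the columns of A with classical Gram-Schmidt.

    With reorthogonalize=True the projection step is applied twice per column.
    """
    A = np.asarray(A, dtype=float)
    Q = np.zeros_like(A)
    passes = 2 if reorthogonalize else 1
    for j in range(A.shape[1]):
        q = A[:, j].copy()
        for _ in range(passes):
            q = q - Q[:, :j] @ (Q[:, :j].T @ q)  # remove components along earlier vectors
        Q[:, j] = q / np.linalg.norm(q)
    return Q

def orthogonality_error(Q):
    """Frobenius norm of Q^T Q - I; zero for a perfectly orthonormal basis."""
    return float(np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1])))

n = 7
hilbert = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)
err_single = orthogonality_error(gram_schmidt(hilbert))
err_twice = orthogonality_error(gram_schmidt(hilbert, reorthogonalize=True))
```

Comparing `err_single` with `err_twice` shows the effect the abstract describes: the second pass restores orthogonality that a single pass loses on nearly dependent columns.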


Citations: 0

A predictive model of music preference using pairwise comparisons

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-08-31 DOI: 10.1109/ICASSP.2012.6288294

B. S. Jensen, J. S. Gallego, Jan Larsen

Music recommendation is an important aspect of many streaming services and multi-media systems; however, it is typically based on so-called collaborative filtering methods. In this paper we consider the recommendation task from a personal viewpoint and examine to what degree music preference can be elicited and predicted using simple and robust queries such as pairwise comparisons. We propose to model, and in turn predict, the pairwise music preference using a very flexible model based on Gaussian Process priors, for which we describe the required inference. We further propose a specific covariance function and evaluate the predictive performance on a novel dataset. In a recommendation-style setting we obtain a leave-one-out accuracy of 74%, compared to 50% with random predictions, showing potential for further refinement and evaluation.
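How a latent preference function turns into a prediction for a pairwise query can be sketched with a probit likelihood. The abstract does not state which likelihood the authors use, so the probit form, the noise parameter, and the function name below are assumptions; the latent values `f_u` and `f_v` would come from the Gaussian Process in a full model.

```python
import math

def preference_probability(f_u, f_v, noise_std=1.0):
    """P(track u is preferred over track v) under an assumed probit likelihood.

    f_u, f_v are latent preference values; noise_std is the assumed standard
    deviation of the independent noise added to each of them.
    """
    z = (f_u - f_v) / (math.sqrt(2.0) * noise_std)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z

p_tie = preference_probability(0.3, 0.3)
p_forward = preference_probability(1.0, 0.0)
p_reverse = preference_probability(0.0, 1.0)
```

Equal latent values give a probability of one half, which corresponds to the 50% random baseline quoted in the abstract.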


Citations: 16

Dynamic Bayesian socio-situational setting classification

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-08-31 DOI: 10.1109/ICASSP.2012.6289063

Yangyang Shi, P. Wiggers, C. Jonker

We propose a dynamic Bayesian classifier for the socio-situational setting of a conversation. Knowledge of the socio-situational setting can be used to search for content recorded in a particular setting or to select context-dependent models in speech recognition. Compared to static classifiers such as naive Bayes and support vector machines, the dynamic Bayesian classifier has the advantage that it can continuously update the classification during a conversation. We experimented with several models that use lexical and part-of-speech information. Our results show that the prediction accuracy of the dynamic Bayesian classifier using the first 25% of a conversation is almost 98% of the final prediction accuracy, which is calculated on the entire conversation. The best final prediction accuracy, 88.85%, is obtained by bigram dynamic Bayesian classification using words and part-of-speech tags.
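The continuous updating described in this abstract can be sketched as a recursive Bayes update over incoming words. The sketch uses a unigram word model rather than the bigram and part-of-speech models evaluated in the paper, and the class names, vocabulary, and probabilities are invented for illustration.

```python
def update_posterior(posterior, word, likelihood, unseen=1e-6):
    """One recursive Bayes step: multiply by p(word | class), then renormalise."""
    scores = {c: p * likelihood[c].get(word, unseen) for c, p in posterior.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def classify_stream(words, prior, likelihood):
    """Return the final posterior and the best class after each word."""
    posterior = dict(prior)
    history = []
    for w in words:
        posterior = update_posterior(posterior, w, likelihood)
        history.append(max(posterior, key=posterior.get))
    return posterior, history

# Hypothetical settings and word probabilities.
prior = {"lecture": 0.5, "chat": 0.5}
likelihood = {
    "lecture": {"theorem": 0.2, "hello": 0.05},
    "chat": {"theorem": 0.01, "hello": 0.3},
}
posterior, history = classify_stream(["hello", "theorem", "theorem"], prior, likelihood)
```

The decision is available after every word and can change as evidence accumulates, which is what allows a classification to be read off after only the first part of a conversation.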


Citations: 3

Stable signal recovery in compressed sensing with a structured matrix perturbation

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-08-31 DOI: 10.1109/ICASSP.2012.6288483

Zai Yang, Cisheng Zhang, Lihua Xie

Sparse signal recovery in standard compressed sensing (CS) requires that the sensing matrix be known exactly. The CS problem subject to perturbation in the sensing matrix is often encountered in practice and has attracted the interest of researchers. Unlike existing robust signal recovery results, in which the recovery error grows linearly with the perturbation level, this paper analyzes the CS problem subject to a structured perturbation to provide conditions for stable signal recovery under measurement noise. Under mild conditions on the perturbed sensing matrix, similar to those for standard CS, it is shown that a sparse signal can be stably recovered by ℓ1 minimization. A remarkable result is that the recovery is exact and independent of the perturbation if there is no measurement noise and the signal is sufficiently sparse. In the presence of noise, the largest entries (in magnitude) of a compressible signal can be stably recovered. The result is demonstrated by a simulation example.
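As background for the ℓ1 minimization mentioned above, the following sketch recovers a sparse vector with a known, unperturbed sensing matrix by solving an ℓ1-regularized least-squares problem with FISTA. The solver choice, the regularization weight `lam`, and the problem sizes are assumptions; the structured perturbation analyzed in the paper is not modelled here.

```python
import numpy as np

def fista_l1(A, y, lam, n_iter=3000):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 with FISTA."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    z = x.copy()
    t = 1.0
    for _ in range(n_iter):
        g = z - A.T @ (A @ z - y) / L      # gradient step at the extrapolated point
        x_new = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 50)) / np.sqrt(30)   # 30 measurements of a length-50 signal
x_true = np.zeros(50)
x_true[[3, 17, 40]] = [1.5, -2.0, 1.0]
y = A @ x_true                                     # noiseless measurements
x_hat = fista_l1(A, y, lam=0.01)
```

The small `lam` introduces a slight shrinkage of the recovered amplitudes; letting it tend to zero approaches the equality-constrained ℓ1 problem.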


Citations: 6

Asymptotic analysis of a partial feedback OFDMA system employing spatial, spectral, and multiuser diversity

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2012-08-31 DOI: 10.1109/ICASSP.2012.6288601

Yichao Huang, B. Rao

Spatial and multiuser diversity are two types of diversity techniques for delivering reliable high-data-rate services. Spectral diversity comes from opportunistic scheduling in the frequency domain enabled by the OFDMA technique, and is influenced by partial feedback design. By employing the best-M partial feedback strategy, we provide a unified view of spatial, spectral, and multiuser diversity through asymptotic (in users) analysis. We examine the tail behavior of the distribution of the received channel quality information (CQI) at the scheduler to prove the type of convergence as well as to derive the asymptotic approximations for the average spectral efficiency under partial feedback. We investigate the application of our analysis to different spatial diversity schemes. Our derived results can be used to quickly determine the minimum required partial feedback in a general multiuser MIMO-OFDMA system.
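The best-M strategy named in this abstract can be sketched as follows, assuming the common formulation in which each user reports CQI only for its M strongest resource blocks and the scheduler gives each block to the user with the highest reported value. The function names and the toy CQI matrix are hypothetical.

```python
import numpy as np

def best_m_feedback(cqi, m):
    """cqi has shape (n_users, n_blocks). Each user reports its m best blocks; the rest are NaN."""
    reported = np.full(cqi.shape, np.nan)
    top = np.argsort(cqi, axis=1)[:, -m:]          # indices of each user's m largest CQIs
    rows = np.arange(cqi.shape[0])[:, None]
    reported[rows, top] = cqi[rows, top]
    return reported

def schedule(reported):
    """Per block: index of the user with the highest reported CQI, or -1 if nobody reported."""
    filled = np.where(np.isnan(reported), -np.inf, reported)
    winners = np.argmax(filled, axis=0)
    winners[np.all(np.isnan(reported), axis=0)] = -1
    return winners

cqi = np.array([[5.0, 1.0, 3.0],
                [2.0, 4.0, 6.0]])
partial = schedule(best_m_feedback(cqi, 1))   # each user reports one block
full = schedule(best_m_feedback(cqi, 3))      # full feedback for comparison
```

With M = 1 the middle block receives no report and stays unscheduled, which illustrates the trade-off between feedback load and spectral efficiency that the asymptotic analysis quantifies.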


Citations: 1
