2008 IEEE 10th Workshop on Multimedia Signal Processing: Latest Papers

Comparison of different feature extraction techniques in content-based image retrieval for CT brain images
Pub Date : 2008-11-05 DOI: 10.1109/MMSP.2008.4665130
Wan Siti Halimatul Munirah Wan Ahmad, M. F. A. Fauzi
A content-based image retrieval (CBIR) system helps users retrieve relevant images based on their contents. A reliable content-based feature extraction technique is therefore required to effectively extract most of the information from the images. The important elements include the texture, colour, intensity or shape of the object inside an image. In medical applications, CBIR can help medical experts in their diagnosis, for example by retrieving cases of a similar disease and by monitoring a patient's progress. In this paper, several feature extraction techniques are explored to see their effectiveness in retrieving medical images. The techniques are the Gabor transform, discrete wavelet frame, Hu moment invariants, Fourier descriptor, gray level histogram and gray level coherence vector. Experiments are conducted on 3,032 CT images of the human brain and promising results are reported.
Citations: 40
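Of the six features named in the abstract above, the gray level histogram is the simplest to illustrate. The following is a minimal sketch, not the authors' implementation: the function names, the 16-bin quantization, and the L1 distance used for ranking are all assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence

Image = Sequence[Sequence[int]]  # rows of 8-bit gray levels (0..255)


def gray_histogram(image: Image, bins: int = 16) -> List[float]:
    """Return a normalized gray-level histogram with `bins` equal-width bins."""
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            # Map a gray level in 0..255 to a bin index in 0..bins-1.
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("image has no pixels")
    return [count / total for count in counts]


def l1_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of absolute bin differences; 0 means identical histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def rank_by_similarity(query: Image, database: Dict[str, Image], bins: int = 16) -> List[str]:
    """Return database keys ordered from most to least similar to the query."""
    query_hist = gray_histogram(query, bins)
    scored = [
        (l1_distance(query_hist, gray_histogram(image, bins)), name)
        for name, image in database.items()
    ]
    return [name for _, name in sorted(scored)]
```

A histogram ignores where pixels are located, which is one reason a study like this would also compare it against texture and shape features.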
Efficient stereo bitrate allocation for fully scalable audio codec
Pub Date : 2008-11-05 DOI: 10.1109/MMSP.2008.4665206
Te Li, S. Rahardja, S. Koh
The bit allocation algorithm for stereo channels in MPEG-4 scalable lossless coding (SLS) is not optimized. This paper presents a perceptually enhanced stereo bit allocation algorithm for fully scalable audio coding. The bitrate is allocated according to the energy distribution across the channels, which is much more efficient. Experimental results show that the proposed method significantly improves the perceptual quality of the fully scalable audio at various bitrates without introducing any new side information.
Citations: 0
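The abstract above says only that the bitrate follows the energy distribution across channels. As a rough illustration of that idea, the sketch below splits a bit budget in proportion to per-channel signal energy. This is an assumption-laden simplification, not the paper's perceptually enhanced algorithm and not part of MPEG-4 SLS; `allocate_stereo_bits` and its `min_bits` parameter are hypothetical.

```python
from typing import Sequence, Tuple


def allocate_stereo_bits(
    left: Sequence[float],
    right: Sequence[float],
    total_bits: int,
    min_bits: int = 0,
) -> Tuple[int, int]:
    """Split `total_bits` between two channels in proportion to their energy.

    `min_bits` (hypothetical) guarantees each channel a floor allocation.
    """
    if total_bits < 2 * min_bits:
        raise ValueError("total_bits cannot cover the per-channel minimum")
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    total_energy = energy_left + energy_right
    # With two silent channels there is no energy to compare, so split evenly.
    share_left = 0.5 if total_energy == 0 else energy_left / total_energy
    spare = total_bits - 2 * min_bits
    left_bits = min_bits + round(spare * share_left)
    # Give the remainder to the right channel so the budget is met exactly.
    return left_bits, total_bits - left_bits
```

Computing the right-channel share as a remainder keeps the two allocations summing to the budget regardless of rounding.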
Motion modeling with separate quad-tree structures for geometry and motion
Pub Date : 2008-11-05 DOI: 10.1109/MMSP.2008.4665106
R. Mathew, D. Taubman
Quad-tree structures are often used to model motion between frames of a video sequence. However, a fundamental limitation of the quad-tree structure is that it can only capture horizontal and vertical edge discontinuities at dyadically related locations. To address this limitation, recent work has focused on introducing geometry information to the nodes of tree-structured motion representations. In this paper we explore modeling boundary geometry and motion with separate quad-tree structures. Recent work on quad-tree representations has also highlighted the benefits of leaf merging. We extend the leaf merging paradigm to incorporate separate tree structures for boundary geometry and motion. To achieve an efficient joint representation, we introduce polynomial motion models and piecewise linear boundary geometry to our quad-tree structures. Experimental results show that the approach taken in this paper provides significant improvement over previous quad-tree based motion representation schemes.
Citations: 7
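The dyadic limitation described in the abstract above can be seen in a plain quad-tree decomposition of a motion field. The sketch below is a baseline illustration only, with no geometry tree, leaf merging, or polynomial motion model; `build_quadtree` is a hypothetical name, and the field is assumed to be square with a power-of-two side.

```python
from typing import List, Sequence, Tuple

MotionVector = Tuple[int, int]
Leaf = Tuple[int, int, int, MotionVector]  # (x, y, size, motion vector)


def build_quadtree(
    field: Sequence[Sequence[MotionVector]], x: int, y: int, size: int
) -> List[Leaf]:
    """Split a square block until every leaf holds a single motion vector.

    Assumes `size` is a power of two and the block lies inside `field`.
    """
    first = field[y][x]
    uniform = all(
        field[y + dy][x + dx] == first
        for dy in range(size)
        for dx in range(size)
    )
    if uniform or size == 1:
        return [(x, y, size, first)]
    half = size // 2
    leaves: List[Leaf] = []
    # Visit the four quadrants: top-left, top-right, bottom-left, bottom-right.
    for offset_y in (0, half):
        for offset_x in (0, half):
            leaves.extend(build_quadtree(field, x + offset_x, y + offset_y, half))
    return leaves
```

On a 4x4 field, a vertical motion edge at x = 2 needs 4 leaves, while the same edge moved to x = 1 needs 10, because the tree can only place boundaries at dyadic positions and must keep splitting to reach the edge.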
Standard-compliant multiple description image coding by spatial multiplexing and constrained least-squares restoration
Pub Date : 2008-11-05 DOI: 10.1109/MMSP.2008.4665102
Xiangjun Zhang, Xiaolin Wu
We propose a practical standard-compliant multiple description (MD) image coding technique. Multiple descriptions of an image are generated in the spatial domain by an adaptive prefiltering and uniform down-sampling process. The resulting side descriptions are conventional square sample grids that are interleaved with one another. As such, each side description can be coded by any of the existing image compression standards. A side decoder reconstructs the input image by first decompressing the down-sampled image and then solving a least-squares inverse problem, guided by a two-dimensional windowed piecewise autoregressive model. The central decoder is algorithmically similar to the side decoder, but it improves the reconstruction quality by using received side descriptions as additional constraints when solving the underlying inverse problem. Compared with its predecessors, the proposed image MD technique offers the lowest encoder complexity, complete standard compliance, competitive rate-distortion performance, and superior subjective quality.
Citations: 5
Image registration by means of 3D octree correlation
Pub Date : 2008-11-05 DOI: 10.1109/MMSP.2008.4665132
C. Ruwwe, B. Keck, Oliver Rusch, U. Zölzer, Xavier Loison
With no calibrated camera setup at hand, careful inspection of the imagery is needed to guarantee a feasible 3D reconstruction result based upon the images. We propose a new approach for image registration based on reconstructed 3D octrees by voxel carving. Correlation of these models gives rise to a translation offset for a maximum intersection between different models from different images. Projecting the resulting three-dimensional translation offsets back into the image plane results in two two-dimensional image offsets that are used for the image registration.
Citations: 4
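The core step in the abstract above, finding the translation that maximizes the intersection of two reconstructed models, can be sketched with a brute-force search. This is a simplified stand-in under stated assumptions: it uses flat sets of integer voxel coordinates instead of octrees, it omits voxel carving and the back-projection into the image plane, and `best_translation` and `max_shift` are hypothetical names.

```python
from itertools import product
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def best_translation(
    model_a: Set[Voxel], model_b: Set[Voxel], max_shift: int = 2
) -> Tuple[Voxel, int]:
    """Find the integer shift of `model_b` that overlaps `model_a` the most.

    Returns the shift (dx, dy, dz) and the number of shared voxels.
    Every shift within +/- `max_shift` on each axis is tried.
    """
    best_shift: Voxel = (0, 0, 0)
    best_overlap = -1
    for shift in product(range(-max_shift, max_shift + 1), repeat=3):
        dx, dy, dz = shift
        moved = {(x + dx, y + dy, z + dz) for (x, y, z) in model_b}
        overlap = len(model_a & moved)
        if overlap > best_overlap:
            best_shift, best_overlap = shift, overlap
    return best_shift, best_overlap
```

An exhaustive search costs (2 * max_shift + 1)^3 set intersections, so it is only practical for small search windows; a hierarchical structure such as an octree is one way to reduce that cost.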
Graphical modeling and decoding of human actions
Pub Date : 2008-11-05 DOI: 10.1109/MMSP.2008.4665070
W. Li, Zhengyou Zhang, Zicheng Liu
This paper presents a graphical model for learning and recognizing human actions. Specifically, we propose to encode actions in a weighted directed graph, referred to as an action graph, where the nodes represent salient postures that are used to characterize the actions and are shared by all actions. The weight between two nodes measures the transitional probability between the two postures. An action is encoded as one or multiple paths in the action graph. The salient postures are modeled using Gaussian mixture models (GMM). Both the salient postures and the action graph are automatically learned from training samples through unsupervised clustering and the expectation-maximization (EM) algorithm. Experimental results have verified the performance of the proposed model, its tolerance to noise and viewpoints, and its robustness across different subjects and datasets.
Citations: 11
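To make the idea of an action as a path through shared posture nodes concrete, the sketch below scores a posture sequence against per-action transition probabilities and picks the best-scoring action. It is a simplified illustration, not the paper's decoder: postures are assumed to be already assigned to integer node ids (the GMM posture model is skipped), and `path_log_likelihood`, `decode_action`, and the `floor` probability are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Transitions = Dict[Tuple[int, int], float]  # (from_posture, to_posture) -> probability


def path_log_likelihood(
    postures: Sequence[int], transitions: Transitions, floor: float = 1e-6
) -> float:
    """Sum the log transition probabilities along a posture sequence.

    Transitions missing from the graph get the small `floor` probability
    so that one unseen step does not produce log(0).
    """
    total = 0.0
    for current, following in zip(postures, postures[1:]):
        total += math.log(max(transitions.get((current, following), 0.0), floor))
    return total


def decode_action(postures: Sequence[int], action_graphs: Dict[str, Transitions]) -> str:
    """Return the action whose transition weights best explain the sequence."""
    return max(
        action_graphs,
        key=lambda name: path_log_likelihood(postures, action_graphs[name]),
    )
```

For example, with a "wave" action that alternates between postures 0 and 1 and a "walk" action that alternates between postures 0 and 2, the sequence `[0, 1, 0, 1]` decodes to "wave" because every step is a high-probability edge for that action only.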
CCL-SVC: Optimizing user experience of broadcasting video on computation capability limited handheld devices
Pub Date : 2008-11-05 DOI: 10.1109/MMSP.2008.4665117
Jingyuan Wang, Lifeng Sun, Bin Li, Meng Zhang, Shiqiang Yang
In this paper, we propose a novel scheme using computing complexity layered scalable video coding (CCL-SVC) to optimize the user experience of broadcasting video on handheld terminals with limited computing capability. To address the heterogeneity of computing capability among different handheld devices, we employ the hierarchical B reference structure of SVC to divide the frames into multiple computing complexity layers (CC Layers) on the server side. Each handheld client simply decodes the frames in the layers that match its computation capability, so as to maximize the video PSNR. We prove that the optimal CC Layers division problem is a precedence constrained scheduling problem, which is NP-complete. We further propose a fast greedy method that approximately optimizes the playback PSNR of the broadcast video. Simulation shows that our method is superior to temporal SVC and random frame discarding methods.
Citations: 1
Sparse approximations for joint source-channel coding
Pub Date : 2008-11-05 DOI: 10.1109/MMSP.2008.4665126
G. Rath, C. Guillemot, J. Fuchs
This paper considers the application of sparse approximations in a joint source-channel (JSC) coding framework. The considered JSC coded system applies a real-number BCH code to the input signal before the signal is quantized and further processed. Under an impulse channel noise model, error decoding is posed as a sparse approximation problem. The orthogonal matching pursuit (OMP) and basis pursuit (BP) algorithms are compared with the syndrome decoding algorithm in terms of mean square reconstruction error. With a Gauss-Markov source and Bernoulli-Gaussian channel noise, BP outperforms syndrome decoding and OMP at higher noise levels. In the case of image transmission with channel bit errors, BP consistently outperforms the other two decoding algorithms.
Citations: 15
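Orthogonal matching pursuit, one of the decoders compared in the abstract above, can be sketched generically in a few lines. The code below is a textbook-style OMP, not the paper's decoder. One plausible mapping to the coding problem, stated here as an assumption, is that `signal` is a syndrome and `atoms` are the columns of a parity-check matrix, so the recovered sparse coefficients locate the impulse errors. All function names are hypothetical, and the atoms are assumed to be nonzero and linearly independent on the selected support.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def solve_linear(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def omp(atoms: Sequence[Vector], signal: Vector, sparsity: int) -> Dict[int, float]:
    """Greedily pick `sparsity` atoms and return {atom index: coefficient}."""
    residual = list(signal)
    support: List[int] = []
    coeffs: List[float] = []
    for _ in range(sparsity):
        candidates = [j for j in range(len(atoms)) if j not in support]
        # Select the unused atom most correlated with the current residual.
        best = max(
            candidates,
            key=lambda j: abs(dot(atoms[j], residual)) / math.sqrt(dot(atoms[j], atoms[j])),
        )
        support.append(best)
        # Re-fit all selected atoms jointly by least squares (normal equations).
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        target = [dot(atoms[i], signal) for i in support]
        coeffs = solve_linear(gram, target)
        residual = [
            value - sum(c * atoms[j][k] for c, j in zip(coeffs, support))
            for k, value in enumerate(signal)
        ]
    return dict(zip(support, coeffs))
```

The joint least-squares re-fit at each step is what distinguishes OMP from plain matching pursuit. Basis pursuit, by contrast, solves an L1-minimization problem and is not shown here.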
Singular block Toeplitz matrix approximation and application to multi-microphone speech dereverberation
Pub Date : 2008-11-05 DOI: 10.1109/MMSP.2008.4665048
Samir-Mohamad Omar, D. Slock
We consider the blind multichannel dereverberation problem for a single source. We have shown before [5] that the single-input multi-output (SIMO) reverberation filter can be equalized blindly by applying MIMO Linear Prediction (LP) to its output (after SISO input pre-whitening). In this paper, we investigate LP-based dereverberation in a noisy environment and/or under acoustic channel length underestimation. Considering ambient noise and late reverberation as additive noises, we propose to introduce a postfilter that transforms the MIMO prediction filter into a somewhat longer equalizer. The postfilter allows equalization to a non-zero delay. Both MMSE-ZF and MMSE design criteria are considered here for the postfilter. We also focus here on computationally efficient (FFT-based) block Toeplitz covariance matrix enhancement that enforces the SIMO filtered source plus white noise structure before applying MIMO LP. A second suggested refinement is an iterative refinement between SISO and MIMO LP. Simulations show that the proposed scheme is robust in noisy environments and performs better than the classic Delay-&-Predict equalizer and the Delay-&-Sum beamformer.
Citations: 7
Adaptive Multiple Experts System for personal identification using facial behaviour biometrics
Pub Date : 2008-11-05 DOI: 10.1109/MMSP.2008.4665158
Pohsiang Tsai, Tich Phuoc Tran, T. Hintz, T. Jan
Physiological and/or behavioural characteristics of humans such as face, gait and/or voice have been used in biometric recognition technology. Apart from these characteristics (which have been reported in the literature), this research investigated the hypothesis that facial behaviour could be used for human identification. We analysed and proposed a multiple experts system, called Adaptive Multiple Experts System (AMES), for validating our hypothesis and analysis. We used the Japanese Female Facial Expression (JAFFE) database as it provides the facial behaviour traits for data collection. The experimental results indicate that facial behaviours may provide information about individual differences and thus may be used as another behavioural biometric.
Citations: 1