
Proceedings of the International Conference on Signal Processing and Multimedia Applications: Latest Publications

Color face recognition: A multilinear-PCA approach combined with Hidden Markov Models
D. Alexiadis, Dimitrios P. Glaroudis
Hidden Markov Models (HMMs) have been successfully applied to the face recognition problem. However, existing HMM-based techniques use feature (observation) vectors extracted only from the images' luminance component, although color is known to provide significant information. In contrast to the classical PCA approach, Multilinear PCA (MPCA) seems to be an appropriate scheme for dimensionality reduction and feature extraction from color images, handling the color channels in a natural, “holistic” manner. In this paper, we propose an MPCA-based approach to color face recognition that exploits the strengths of HMMs as classifiers. The proposed methodology was tested on three publicly available color databases and produced high recognition rates compared with existing HMM-based methodologies.
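The multilinear projection at the heart of MPCA can be sketched in a few lines. The following is an illustrative single-pass simplification (no mean-centering or iterative refinement, and random data standing in for face images), not the authors' implementation: each color image is treated as a third-order tensor and one truncated projection matrix is learned per mode (height, width, color).

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mpca_projections(images, ranks):
    """Estimate one truncated projection matrix per tensor mode.

    `images` is a list of H x W x 3 arrays; `ranks` gives the reduced
    dimension kept in each of the three modes (height, width, color).
    """
    projections = []
    for mode in range(3):
        # Accumulate the mode-n scatter matrix over all training images.
        scatter = sum(unfold(img, mode) @ unfold(img, mode).T for img in images)
        # The leading eigenvectors of the scatter matrix span the subspace.
        _, vecs = np.linalg.eigh(scatter)          # eigenvalues ascending
        projections.append(vecs[:, -ranks[mode]:])  # keep top-`ranks[mode]`
    return projections

def project(img, projections):
    """Multilinear projection: contract each mode with its matrix."""
    core = img
    for mode, u in enumerate(projections):
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core

# Toy example: ten random 16x16 color "faces" reduced to a 4x4x2 core tensor.
rng = np.random.default_rng(0)
faces = [rng.standard_normal((16, 16, 3)) for _ in range(10)]
us = mpca_projections(faces, ranks=(4, 4, 2))
feature = project(faces[0], us)
print(feature.shape)  # (4, 4, 2)
```

The flattened core tensor would then serve as the observation vector handed to the HMM classifier.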
DOI: 10.5220/0003445501130119 (published 2011-07-18)
Citations: 3
Image denoising based on Laplace distribution with local parameters in Lapped Transform domain
V. K. Nath, A. Mahanta
In this paper, we present a new image denoising method based on statistical modeling of Lapped Transform (LT) coefficients. The LT coefficients are first rearranged into a wavelet-like structure, and the statistics of the rearranged coefficient subbands are then modeled in the same way as wavelet coefficients. We propose to model the rearranged LT coefficients in a subband using a Laplace probability density function (pdf) with local variance. This simple distribution models well both the locality and the heavy-tailed property of lapped transform coefficients. A maximum a posteriori (MAP) estimator based on this pdf is used to estimate the noise-free LT coefficients. Experimental results show that the proposed low-complexity image denoising method outperforms several wavelet-based image denoising techniques as well as two existing LT-based denoising schemes. Our main contribution is the use of a local Laplace prior for statistical modeling of LT coefficients, together with a MAP estimation procedure under this prior to restore the noisy LT coefficients of the image.
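For a Laplace prior with Gaussian observation noise, the MAP estimate has a well-known closed form: soft thresholding with threshold sqrt(2) * noise_variance / signal_std. The sketch below illustrates that shrinkage rule; the sliding-window variance estimator is an assumption standing in for the paper's local-parameter estimation, not its exact procedure.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def laplace_map_shrink(y, noise_var, signal_std):
    """MAP estimate of a Laplace-distributed coefficient observed in
    additive Gaussian noise: soft thresholding with threshold
    T = sqrt(2) * noise_var / signal_std."""
    t = np.sqrt(2.0) * noise_var / np.maximum(signal_std, 1e-12)
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def denoise_local(coeffs, noise_var, win=7):
    """Apply the shrinkage with a local signal std estimated in a
    sliding window (an assumed estimator mirroring the locally
    adaptive prior)."""
    local_var = uniform_filter(coeffs**2, size=win) - noise_var
    local_std = np.sqrt(np.maximum(local_var, 0.0))
    return laplace_map_shrink(coeffs, noise_var, local_std)
```

With noise_var = 1 and signal_std = sqrt(2), the threshold is exactly 1, so an observed coefficient of 5 shrinks to 4 and one of 0.5 is set to zero.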
DOI: 10.5220/0003516900670072 (published 2011-07-18)
Citations: 9
Context based watermarking of secure JPEG-LS images
A. Subramanyam, S. Emmanuel
JPEG-LS is generally used to compress biomedical or high-dynamic-range images. These compressed images sometimes need to be encrypted for confidentiality. In addition, the encrypted JPEG-LS images may need to be watermarked to detect copyright violations, track the different users handling the image, prove ownership, or support authentication. In the proposed technique, the watermark is embedded in the context of the compressed image while the Golomb-coded bit stream is encrypted. The watermark can be extracted during JPEG-LS decoding. The advantage of this watermarking scheme is that the media need not be decompressed or decrypted for watermark embedding, which saves computation while preserving the confidentiality of the media.
DOI: 10.5220/0003446201610166 (published 2011-07-18)
Citations: 0
Stereo vision matching over single-channel color-based segmentation
Pablo Revuelta, B. Ruíz-Mezcua, J. M. S. Peña, J. Thiran
Stereo vision is one of the most important passive methods for extracting depth maps. Several approaches exist, each with its advantages and disadvantages; computational load is especially important in both the block-matching and graphical-cues approaches. In previous work, we proposed a region-growing segmentation solution to the matching process, in which matching was carried out over statistical descriptors of the image regions, commonly referred to as characteristic vectors, whose number is, by definition, lower than the number of possible block matches. That first version was defined for grayscale images. Although efficient, the grayscale algorithm had some important disadvantages, mostly related to the segmentation process. In this article, we present a pre-processing tool that computes grayscale images retaining the relevant color information, preserving both the advantages of grayscale segmentation and those of color image processing. The results of this improved algorithm are shown and compared with those obtained by the grayscale segmentation and matching algorithm, demonstrating a significant improvement in the computed depth maps.
DOI: 10.5220/0003473201260130 (published 2011-07-18)
Citations: 2
AER spike-processing filter simulator: Implementation of an AER simulator based on cellular automata
Manuel Rivas Pérez, A. Linares-Barranco, A. Jiménez-Fernandez, A. C. Balcells, G. Jiménez-Moreno
Spike-based systems are neuro-inspired circuit implementations traditionally used for sensory systems or sensor signal processing. Address-Event-Representation (AER) is a neuromorphic communication protocol for transferring asynchronous events between VLSI spike-based chips. These neuro-inspired implementations allow complex, multilayer, multichip neuromorphic systems to be developed, and have been used to design sensor chips (such as retinas and cochleas), processing chips (e.g. filters), and learning chips. Furthermore, Cellular Automata (CA) is a bio-inspired processing model for problem solving, in which the processing is divided among synchronous cells that change their states at the same time in order to reach the solution. This paper presents a software simulator able to gather several spike-based elements into the same workspace in order to test a CA architecture based on AER before a hardware implementation. Furthermore, the simulator produces VHDL for testing the AER-CA on the FPGA of the USB-AER AER-tool.
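The synchronous-update principle of a CA (every cell reads the current grid, and all new states are committed at once) can be illustrated with a generic rule set; Conway's Life is used here purely as an example and is not the AER-CA rule from the paper.

```python
import numpy as np

def step(grid):
    """One synchronous CA update: every cell reads the *current* grid,
    and all new states are committed simultaneously (Conway's Life
    rule as an illustrative rule set)."""
    # Count the eight neighbours with wrap-around boundaries.
    neighbours = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if (dy, dx) != (0, 0))
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return ((neighbours == 3) | (grid & (neighbours == 2))).astype(np.uint8)

# A horizontal "blinker" oscillates with period 2 under synchronous updates.
g = np.zeros((5, 5), dtype=np.uint8)
g[2, 1:4] = 1
assert np.array_equal(step(step(g)), g)
```

The key point the AER-CA simulator must preserve is exactly this double-buffered semantics: next states depend only on the previous generation, never on partially updated cells.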
DOI: 10.5220/0003525900910096 (published 2011-07-18)
Citations: 4
Optimal combination of low-level features for surveillance object retrieval
Virginia Fernandez Arguedas, K. Chandramouli, Qianni Zhang, E. Izquierdo
In this paper, a classifier based on low-level multi-feature fusion is presented for studying the performance of an object retrieval method on surveillance videos. The proposed retrieval framework exploits recent developments in evolutionary computation algorithms based on biologically inspired optimisation techniques. The multi-descriptor space is formed from a combination of four MPEG-7 visual features. The proposed approach has been evaluated against kernel machines on objects extracted from the AVSS 2007 dataset.
DOI: 10.5220/0003527101870192 (published 2011-07-18)
Citations: 1
Segmentation of touching Lanna characters
Sakkayaphop Pravesjit, A. Thammano
Character segmentation is an important preprocessing step for character recognition: incorrectly segmented characters are unlikely to be correctly recognized. Touching characters are among the most difficult cases that arise when handwritten characters are segmented; this paper therefore focuses on the segmentation of touching and overlapping characters. In the proposed character segmentation process, bounding-box analysis is first employed to segment the document image into images of isolated characters and images of touching characters. A thinning algorithm is applied to extract the skeleton of the touching characters. Next, the skeleton is separated into several pieces. Finally, the separated pieces are reassembled to reconstruct two isolated characters. The proposed algorithm achieves an accuracy of 75.3%.
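The initial bounding-box analysis can be sketched with standard connected-component labeling. The code below is an illustrative stand-in, not the paper's implementation; the area and aspect-ratio thresholds are assumptions, and unusually wide boxes are flagged as candidates for touching characters that would go on to the skeleton-splitting stage.

```python
import numpy as np
from scipy import ndimage

def char_boxes(binary_img, min_area=4):
    """Label connected ink components and return their bounding boxes
    together with each box's width/height aspect ratio."""
    labels, _ = ndimage.label(binary_img)
    boxes = []
    for sl in ndimage.find_objects(labels):
        h = sl[0].stop - sl[0].start
        w = sl[1].stop - sl[1].start
        if h * w >= min_area:            # drop speckle noise
            boxes.append((sl, w / max(h, 1)))
    return boxes

# Two blobs: a square 3x3 "character" and a wide 3x5 blob that a real
# system would flag as possibly-touching characters.
img = np.zeros((8, 12), dtype=np.uint8)
img[2:5, 1:4] = 1
img[2:5, 7:12] = 1
boxes = char_boxes(img)
print(len(boxes))                         # 2
wide = [b for b, ar in boxes if ar > 1.2]
print(len(wide))                          # 1 (aspect ratio 5/3)
```

In the paper's pipeline, boxes in the normal range are treated as isolated characters, while flagged boxes proceed to thinning and skeleton-piece recombination.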
DOI: 10.5220/0003511300470051 (published 2011-07-18)
Citations: 2
What are good CGS/MGS configurations for H.264 quality scalable coding?
Shih-Hsuan Yang, Wei-Lune Tang
Scalable video coding (SVC) encodes image sequences into a single bit stream that can be adapted to various network and terminal capabilities. The H.264/AVC standard includes three kinds of video scalability: spatial, temporal, and quality scalability. Among them, quality scalability refers to image sequences of the same spatio-temporal resolution but with different fidelity levels. H.264/AVC adopts two options for quality scalability, namely CGS (coarse-grain quality scalable coding) and MGS (medium-grain quality scalability), which may be used in combination. A refinement layer in CGS is obtained by re-quantizing the (residual) texture signal with a smaller quantization step size (QP). Using CGS alone, however, may incur a notable PSNR penalty and high encoding complexity if numerous rate points are required. MGS partitions the transform coefficients of a CGS layer into several MGS sub-layers and distributes them in different NAL units; its use may increase adaptation flexibility, improve coding efficiency, and reduce coding complexity. In this paper, we investigate which CGS/MGS configurations lead to good performance. From extensive experiments using the JSVM (Joint Scalable Video Model), however, we find that MGS should be employed carefully. Although MGS always reduces encoding complexity compared with using CGS alone, its rate-distortion behavior is unstable. While MGS typically provides better or comparable rate-distortion performance in cases with eight or more rate points, some configurations may cause an unexpected PSNR drop with an increased bit rate. This anomaly is currently under investigation.
DOI: 10.5220/0003608201040109 (published 2011-07-18)
Citations: 1
Real-time face recognition with GPUs: A DCT-based face recognition system using graphics processing unit
D. Alexiadis, A. Papastergiou, A. Hatzigaidas
In this paper, we present an implementation of a 2-D DCT-based face recognition system that uses a high-performance parallel computing architecture based on Graphics Processing Units (GPUs). Comparisons of execution time between the GPU-based implementation and the reference (“gold”) CPU-based one show that the GPU implementation (NVIDIA GeForce GTS 250) is about 50 times faster than the CPU-based one (Intel Dual Core, 1.83 GHz), allowing real-time operation of the developed face recognition system. Additionally, comparison of the DCT-based approach with the PCA-based face recognition methodology shows that the DCT-based approach achieves comparable recognition hit rates.
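A minimal CPU sketch of a 2-D DCT face descriptor, of the general kind such systems offload to the GPU, keeps the low-frequency corner of the transform and classifies by nearest neighbour. The 8x8 coefficient block and the random stand-in images are assumptions for illustration, not details from the paper.

```python
import numpy as np
from scipy.fft import dctn

def dct_features(img, k=8):
    """Keep the k x k low-frequency corner of the 2-D DCT as a
    compact face descriptor."""
    coeffs = dctn(img.astype(float), norm='ortho')
    return coeffs[:k, :k].ravel()

def nearest_neighbour(query, gallery):
    """Classify by Euclidean distance in DCT-feature space."""
    feats = np.array([dct_features(g) for g in gallery])
    d = np.linalg.norm(feats - dct_features(query), axis=1)
    return int(np.argmin(d))

rng = np.random.default_rng(1)
gallery = [rng.standard_normal((32, 32)) for _ in range(5)]
# A lightly corrupted copy of gallery image 3 should match image 3.
query = gallery[3] + 0.05 * rng.standard_normal((32, 32))
print(nearest_neighbour(query, gallery))  # 3
```

On a GPU, both the batched 2-D DCTs and the distance computations parallelize naturally, which is the source of the speedup reported above.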
DOI: 10.5220/0003445601200125 (published 2011-07-18)
Citations: 1
Four-phase re-speaker training system
A. Pražák, Zdenek Loose, J. Psutka, V. Radová, L. Müller
Since the re-speaker approach to automatic captioning of TV broadcasts using large-vocabulary continuous speech recognition (LVCSR) is on the increase, there is a growing demand for training systems that allow new speakers to learn the procedure. This paper describes a specially designed re-speaker training system that provides a gradual four-phase tutoring process with quantitative indicators of trainee progress, enabling faster (and thus cheaper) training of re-speakers. A performance evaluation of three re-speakers trained on the proposed system is also reported.
DOI: 10.5220/0003604502170220 (published 2011-07-18)
Citations: 6