Proceedings of the International Conference on Signal Processing and Multimedia Applications: Latest Articles

Estimation-decoding on LDPC-based 2D-barcodes

Proceedings of the International Conference on Signal Processing and Multimedia Applications

Pub Date : 2011-07-18 DOI: 10.5220/0003457400340039

W. Proß, M. Otesteanu, F. Quint

In this paper we propose an extension of the Estimation-Decoding algorithm for decoding our Data Matrix Code (DMC), which is based on Low-Density Parity-Check (LDPC) codes and is designed for use in industrial environments. To include possible damage in the channel model, a Markov-modulated Gaussian channel (MMGC) was chosen to represent everything between the embossing of an LDPC-based DMC and its camera-based acquisition. The MMGC is based on a Hidden Markov Model (HMM) that becomes a two-dimensional model when used in the context of DMCs. The proposed ED2D algorithm (Estimation-Decoding in two dimensions) operates on a 2D-LDPC-Markov factor graph that comprises an LDPC code's Tanner graph and a 2D-HMM. For a subsequent comparison between different barcodes in industrial environments, a simulation of typical damage has been implemented. Tests showed superior decoding behavior of our LDPC-based DMC decoded with the ED2D decoder compared with the standard Reed-Solomon-based DMC.
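
As a rough illustration of the parity-check half of such a factor graph (not the ED2D algorithm itself), the sketch below computes a syndrome over GF(2) and lists the Tanner-graph edges implied by a parity-check matrix. The matrix `H` is a hypothetical toy example (the (7,4) Hamming code), far smaller and denser than a real LDPC code, and the function names are assumptions made for this example.

```python
# Toy parity-check matrix, chosen only for illustration (assumption):
# this is the (7,4) Hamming code, not an actual LDPC code from the paper.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def syndrome(H, word):
    """Return H * word over GF(2); all zeros means every check is satisfied."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]


def tanner_edges(H):
    """List (check_node, variable_node) pairs, one per 1-entry in H."""
    return [(i, j) for i, row in enumerate(H) for j, h in enumerate(row) if h]


codeword = [1, 0, 1, 1, 0, 1, 0]    # satisfies all three checks
corrupted = [1, 0, 0, 1, 0, 1, 0]   # same word with bit 2 flipped

print(syndrome(H, codeword))
print(syndrome(H, corrupted))
print(len(tanner_edges(H)))
```

A non-zero syndrome only signals that some check is violated; a message-passing decoder would iterate over the listed edges to estimate which bits to correct.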

Citations: 0

Wireless in-vehicle complaint driver environment recorder

Proceedings of the International Conference on Signal Processing and Multimedia Applications

Pub Date : 2011-07-18 DOI: 10.5220/0003567300520058

O. Siordia, Isaac Martín de Diego, C. Conde, E. Cabello

In this paper, an in-vehicle complaint recording device is presented. The device is divided into independent systems for image and audio data acquisition and storage. The systems, designed to work under in-vehicle complaint devices, use existing in-vehicle wireless architectures for their communication. Several tests of the recording device in a highly realistic truck simulator show that the developed system reliably acquires and stores driver-related data. The acquired data will be used to develop a valid methodology for the reconstruction and study of traffic accidents.

Citations: 5

A non-uniform real-time speech time-scale stretching method

Proceedings of the International Conference on Signal Processing and Multimedia Applications

Pub Date : 2011-07-18 DOI: 10.5220/0003456300270033

A. Kupryjanow, A. Czyżewski

An algorithm for non-uniform real-time speech stretching is presented. It combines the typical SOLA (Synchronous Overlap and Add) algorithm with vowel, consonant, and silence detectors. Based on information about the content and the estimated rate of speech (ROS), the algorithm adapts the scaling factor. The real-time stretching capability and the resulting voice quality were analysed. Subjective tests were performed to compare the quality of the proposed method with the output of the standard SOLA algorithm. The accuracy of the ROS estimation was assessed to demonstrate its robustness.
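
For readers unfamiliar with SOLA, the following is a minimal sketch of the uniform baseline only, assuming a fixed scaling factor `alpha`; the content detectors and ROS-driven adaptation described in the abstract are not reproduced. The frame length, hop size, and search range are illustrative values, not parameters from the paper.

```python
import math


def sola_stretch(x, alpha, frame=400, hop_a=100, search=50):
    """Stretch the sample list x by roughly the factor alpha using SOLA."""
    hop_s = int(round(alpha * hop_a))  # synthesis hop (analysis hop is hop_a)
    if not search < hop_s < frame - 2 * search:
        raise ValueError("hop/search combination leaves no usable overlap")
    out = list(x[:frame])
    m = 1
    while m * hop_a + frame <= len(x):
        seg = x[m * hop_a:m * hop_a + frame]
        nominal = m * hop_s
        best_k, best_c = 0, -math.inf
        # Search for the offset whose overlap best matches the existing output
        # (normalised cross-correlation).
        for k in range(-search, search + 1):
            start = nominal + k
            ov = min(len(out) - start, frame)
            a, b = out[start:start + ov], seg[:ov]
            num = sum(p * q for p, q in zip(a, b))
            den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
            c = num / den if den else 0.0
            if c > best_c:
                best_k, best_c = k, c
        start = nominal + best_k
        ov = min(len(out) - start, frame)
        # Linear cross-fade over the overlap, then append the remainder.
        for i in range(ov):
            w = i / ov
            out[start + i] = (1 - w) * out[start + i] + w * seg[i]
        out.extend(seg[ov:])
        m += 1
    return out


# Usage: stretch a 100 Hz tone sampled at 8 kHz by a factor of about 1.5.
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(2000)]
stretched = sola_stretch(tone, 1.5)
print(len(tone), len(stretched))
```

A non-uniform variant would presumably vary `alpha` from frame to frame according to the detected content, which is the part the paper contributes.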

Citations: 4

Managing multiple media streams in HTML5: The IEEE 1599-2008 case study

Proceedings of the International Conference on Signal Processing and Multimedia Applications

Pub Date : 2011-07-18 DOI: 10.5220/0003651401930199

Stefano Baldan, L. A. Ludovico, D. Mauro

This paper deals with the problem of managing multiple multimedia streams in a Web environment. The multimedia types to support are pure audio, video with no sound, and audio/video. The data streams refer to the same event or performance; consequently, they are mutually synchronized and should remain so. In addition, a Web player should be able to play different multimedia streams simultaneously, as well as switch from one to another in real time. A music piece encoded in the IEEE 1599 format is presented as a clarifying case study.

Citations: 4

Image matching algorithms in stereo vision using address-event-representation: A theoretical study and evaluation of the different algorithms

Proceedings of the International Conference on Signal Processing and Multimedia Applications

Pub Date : 2011-07-18 DOI: 10.5220/0003518500790084

M. Domínguez-Morales, Elena Cerezuela-Escudero, A. Jiménez-Fernandez, R. Paz-Vicente, Juan Luis Font-Calvo, P. Iñigo-Blasco, A. Linares-Barranco, G. Jiménez-Moreno

Image processing in digital computer systems usually treats visual information as a sequence of frames. These frames come from cameras that capture reality for a short period of time. They are renewed and transmitted at a rate of 25–30 fps (a typical real-time scenario). Digital video processing has to process each frame in order to obtain a filter result or detect a feature in the input. In stereo vision, existing algorithms use frames from two digital cameras and process them pixel by pixel until a pattern match is found in a section of both stereo frames. Spike-based processing is a relatively new approach that manipulates spikes one by one as they are transmitted, like a human brain. The mammalian nervous system is able to solve much more complex problems, such as visual recognition, by manipulating neuron spikes. The spike-based philosophy for visual information processing, built on the neuro-inspired Address-Event-Representation (AER), is now achieving very high performance. In this work we study the existing digital stereo matching algorithms and how they work. We then propose an AER stereo matching algorithm that uses some of the principles found in digital stereo methods.
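
The frame-based, pixel-by-pixel matching mentioned above can be sketched with a simple sum-of-absolute-differences (SAD) window search along one scanline. This is a generic textbook technique used here as an assumption; the abstract does not say which digital stereo algorithms the paper studies, and the AER variant is not shown. The window size and disparity range are illustrative.

```python
def best_disparity(left_row, right_row, x, half=2, max_disp=8):
    """Return the disparity d that minimises the SAD between the window
    around x in the left row and the window around x - d in the right row."""
    best_d, best_cost = 0, float("inf")
    lo, hi = x - half, x + half + 1
    for d in range(max_disp + 1):
        if lo - d < 0 or hi > len(left_row):
            continue  # window would fall outside the image
        cost = sum(abs(left_row[i] - right_row[i - d]) for i in range(lo, hi))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Usage: the left row is the right row shifted by three pixels.
right_row = [0, 0, 0, 10, 50, 90, 50, 10, 0, 0, 0, 0, 0, 0, 0, 0]
left_row = [0, 0, 0] + right_row[:-3]
print(best_disparity(left_row, right_row, x=8))
```

Every pixel of every frame needs such a search, which is the per-frame cost that event-driven processing aims to avoid.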

Citations: 8

Latent topic visual language model for object categorization

Proceedings of the International Conference on Signal Processing and Multimedia Applications

Pub Date : 2011-07-18 DOI: 10.5220/0003491601490158

Lei Wu, Nenghai Yu, J. Liu, Mingjing Li

This paper presents a latent topic visual language model to handle the variation problem in object categorization. Variations in view, style, pose, and so on greatly affect the spatial arrangement and distribution of visual features, on which previous categorization models largely depend. Treating object variations as hidden topics within each category, the proposed model explores the relationship between object variations and visual feature arrangement in the traditional visual language modeling process. With this improvement, the accuracy of object categorization is further increased. Experiments on the Caltech 101 dataset show that the model is sound and effective.

Citations: 2

Two-dimensional codes on mobile devices and the development of the platform

Proceedings of the International Conference on Signal Processing and Multimedia Applications

Pub Date : 2011-07-18 DOI: 10.5220/0003481200190022

José Manuel Fornés Rumbao, F. R. Rubio

In recent years, mobile terminals have undergone accelerated technological development. This evolution has brought numerous advances in presentation and interactivity in general, and has given rise to numerous applications. Along these lines, this article shows how to give mobile terminals a simple way of interacting with the environment through the technological successor of the bar code: the two-dimensional code. We use three basic elements (camera quality, growth in data traffic, and increased bandwidth in mobile phones) to create a platform that gives the user an easy and useful way of obtaining multimedia information that improves their relationship with the environment. We aim for a complete, end-to-end development of the system, that is, the generation of the two-dimensional code, its interaction with the platform, and the final retrieval of the information on the terminal.

Citations: 0

3D visualization of single images using patch level depth

Proceedings of the International Conference on Signal Processing and Multimedia Applications

Pub Date : 2011-07-18 DOI: 10.5220/0003511800610066

Shahrouz Yousefi, Farid Abedan Kondori, Haibo Li

In this paper we consider the task of 3D photo visualization using a single monocular image. The main idea is to take single photos captured by devices such as ordinary cameras, mobile phones, and tablet PCs, and visualize them in 3D on normal displays. A supervised learning approach is used to retrieve depth information from single images. The algorithm is based on a hierarchical multi-scale Markov Random Field (MRF), which models depth from the multi-scale global and local features of a monocular image and the relations between them. The estimated depth image is then used to assign the specified depth parameters to each pixel in the 3D map. Multi-level depth adjustment and coding for color anaglyphs are performed accordingly. Our system receives a single 2D image as input and provides an anaglyph-coded 3D image as output. Depending on the coding technology, viewers use special low-cost anaglyph glasses.
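
To make the final anaglyph step concrete, the sketch below shows one common red-cyan coding together with a deliberately naive depth-driven pixel shift for synthesising a second view. Both functions are assumptions for illustration: the abstract does not specify the coding scheme or how the views are generated, and the MRF depth estimation is not shown. Images are nested lists of `(r, g, b)` tuples, and `nearness` holds values in [0, 1] with 1 meaning closest.

```python
def shift_view(image, nearness, max_shift=2):
    """Synthesise a second view by shifting each pixel to the right by an
    amount proportional to its nearness; unfilled holes keep the original."""
    out = [list(row) for row in image]
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            nx = x + int(max_shift * nearness[y][x])
            if 0 <= nx < len(row):
                out[y][nx] = pixel
    return out


def red_cyan_anaglyph(left, right):
    """Take the red channel from the left view and green/blue from the right."""
    return [
        [(l[0], r[1], r[2]) for l, r in zip(lrow, rrow)]
        for lrow, rrow in zip(left, right)
    ]


# Usage with a tiny 1x3 image whose pixels are all "near".
image = [[(255, 0, 0), (10, 20, 30), (40, 50, 60)]]
nearness = [[1.0, 1.0, 1.0]]
second = shift_view(image, nearness, max_shift=1)
print(red_cyan_anaglyph(image, second))
```

A practical system would also need hole filling and occlusion handling, which this sketch omits.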

Citations: 3

A genetic approach for improving the side information in Wyner-Ziv video coding with long duration GOP

Proceedings of the International Conference on Signal Processing and Multimedia Applications

Pub Date : 2011-07-18 DOI: 10.5220/0003526300970103

C. Yaacoub, J. Farah, Chadi Jabroun

This work tackles the problem of side information generation for the case of large-duration GOPs in distributed video coding. Based on a previously developed technique for side-information enhancement, we develop a genetic algorithm particularly designed for large GOPs, taking into account the GOP size, the additional bitrate incurred by encoding hash information, as well as the decoding complexity. The proposed algorithm makes use of different interpolation methods available in the literature in a fusion-based approach. A significant gain in the average PSNR that can reach 2 dB is observed with respect to the best performing interpolation technique, while the algorithm is run for no more than 18% of the total number of blocks in a given video sequence. On the other hand, while the encoding complexity is a main concern in distributed video coding, the proposed solution incurs no additional complexity at the encoder side in the case of hash-based Wyner-Ziv video coding.
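
For context on the reported gain, PSNR is conventionally defined as 10·log10(peak² / MSE), so a gain of 2 dB corresponds to reducing the mean squared error by a factor of about 1.58. The sketch below uses that standard definition with an assumed 8-bit peak of 255; the function names are chosen for this example and the sample values are invented.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE must shrink to gain gain_db of PSNR."""
    return 10 ** (gain_db / 10)


# Usage with made-up sample values (not data from the paper).
value = psnr([100, 100, 100, 100], [110, 90, 100, 100])
ratio = mse_ratio_for_gain(2)
print(round(value, 2), round(ratio, 3))
```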

Citations: 0

Video surveillance at an industrial environment using an address event vision sensor: Comparative between two different video sensor based on a bioinspired retina

Proceedings of the International Conference on Signal Processing and Multimedia Applications

Pub Date : 2011-07-18 DOI: 10.5220/0003521701310134

F. Perez-Peña, Arturo Morgado Estévez, R. Montero-Gonzalez, A. Linares-Barranco, G. Jiménez-Moreno

Nowadays we live in a highly industrialized world that is concerned about surveillance and has many occupational hazards. The aim of this paper is to provide a video surveillance system for use in ultra-fast industrial environments. We present an exhaustive timing analysis and comparison of two different Address Event Representation (AER) retinas, one with 64×64 pixels and the other with 128×128 pixels, in order to determine their limits. Both are spike-based image sensors that mimic the human retina, designed and manufactured by Delbruck's lab. Two different scenarios are presented in order to reach the maximum frequency of light changes for a pixel sensor and the maximum frequency of requested pixel addresses on the AER output. The results obtained are 100 Hz and 1.88 MHz, respectively, for the 64×64 retina, and peaks of 1.3 kHz and 8.33 MHz for the 128×128 retina. We tested the upper spin limit of an ultra-fast industrial machine and found it to be approximately 6000 rpm for the first retina, while no limit was reached at top rpm for the second retina. Tests also showed that no AER data is lost in cases with high light contrast.
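
The two figures reported for the 64×64 retina appear consistent with each other if one assumes a single light change per revolution, which the abstract does not state: 6000 rpm is 100 revolutions per second, matching the 100 Hz per-pixel limit. A small sketch of that conversion, with `events_per_rev` as a hypothetical parameter:

```python
def rpm_to_hz(rpm, events_per_rev=1):
    """Convert a rotation speed to the rate of light changes seen by a pixel.

    events_per_rev is an assumption: how many light changes one revolution
    of the machine produces at a given pixel.
    """
    return rpm / 60 * events_per_rev


print(rpm_to_hz(6000))      # light-change rate at the reported spin limit
print(rpm_to_hz(6000, 2))   # if each revolution produced two changes
```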

Citations: 2
