
Latest publications — 2015 IEEE International Symposium on Multimedia (ISM)

Contour-Based Depth Coding: A Subjective Quality Assessment Study
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.34
Marco Calemme, Marco Cagnazzo, B. Pesquet-Popescu
Multi-view video plus depth is emerging as the most flexible format for 3D video representation, as witnessed by the current standardization efforts by ISO and ITU. The depth information allows synthesizing virtual viewpoints, and various techniques have been proposed for its compression. It is generally recognized that high-quality view rendering at the receiver side is possible only by preserving the contour information, since distortions on edges introduced during encoding would cause a noticeable degradation of the synthesized view and of the 3D perception. As a consequence, recent approaches include contour-based coding of depth maps. However, the impact of contour-preserving depth coding on the perceived quality of synthesized images has not been adequately studied. Therefore, in this paper we carry out a subjective study to better understand the limits and the potential of the different techniques. Our results show that the contour information is indeed relevant in the synthesis step: preserving the contours and coding the rest coarsely typically leads to images that users cannot tell apart from the reference ones, even at low bit rates. Moreover, our results show that objective metrics commonly used to evaluate synthesized images may have a low correlation coefficient with MOS ratings and are in general not consistent across techniques and contents.
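The abstract's final claim is about the correlation between objective quality metrics and subjective MOS ratings. As a hedged illustration of the statistic involved (the per-sequence scores below are invented for the example, not taken from the paper), Pearson's correlation coefficient between objective scores and MOS can be computed as:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-sequence scores: an objective metric (e.g. PSNR, in dB)
# against the MOS collected in a subjective test.
psnr = [28.1, 30.4, 31.2, 33.0, 34.5, 35.1]
mos = [2.1, 3.8, 2.9, 4.2, 3.5, 4.6]
r = pearson_r(psnr, mos)
```

A low |r| across contents is exactly the kind of inconsistency the authors report for synthesized views.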
Citations: 4
Multi-modality Mobile Image Recognition Based on Thermal and Visual Cameras
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.120
Jui-Hsin Lai, Chung-Ching Lin, Chun-Fu Chen, Ching-Yung Lin
The advances of mobile computing and sensor technology have turned mobile devices into powerful instruments. Integrating thermal and visual cameras extends the capability of computer vision, since the two modalities reveal different image characteristics; however, aligning the two images is a challenge. This paper proposes an effective approach to align image pairs for event detection on mobile devices through image recognition. We leverage thermal and visual cameras as multi-modality sources for image recognition. By analyzing the heat pattern, the proposed app can identify heating sources and help users inspect their home heating system; furthermore, by applying image recognition, the app can help field workers assess asset condition and provide guidance for resolving issues.
Citations: 1
Fine-Grained Scalable Video Caching
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.81
Qiushi Gong, J. Woods, K. Kar, Jacob Chakareski
Caching has been shown to enhance network performance. In this paper, we study fine-grained scalable video caching. We start from a single-cache scenario, providing a solution to the cache allocation problem that optimizes the average expected video quality for the most popular video clips. Actual trace data is used to verify the performance of our algorithm and to compare its backhaul link bandwidth consumption with that of non-scalable video caching. In addition, we extend our analysis to collaborative caching and integrate network coding for further transmission efficiency. Our experimental results demonstrate considerable performance enhancement.
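The abstract does not spell out the allocation algorithm, so as an illustrative sketch only: a common heuristic for caching layered (fine-grained scalable) video is to greedily cache the next enhancement layer with the best popularity-weighted quality gain per byte. The clip names, sizes, and gains below are hypothetical.

```python
def greedy_cache_allocation(clips, capacity):
    """Greedy allocation for a scalable-video cache.

    clips: list of (clip_id, popularity, layers), where layers is an ordered
    list of (size_bytes, quality_gain) that must be cached in order (base
    layer first). Returns the set of (clip_id, layer_index) pairs cached.
    """
    next_layer = {cid: 0 for cid, _, _ in clips}
    cached, used = set(), 0
    while True:
        best = None
        for cid, pop, layers in clips:
            i = next_layer[cid]
            if i >= len(layers):
                continue  # all layers of this clip already cached
            size, gain = layers[i]
            if used + size > capacity:
                continue  # layer does not fit in the remaining space
            score = pop * gain / size  # popularity-weighted gain per byte
            if best is None or score > best[0]:
                best = (score, cid, i, size)
        if best is None:
            return cached
        _, cid, i, size = best
        cached.add((cid, i))
        next_layer[cid] += 1
        used += size

# Hypothetical clips: (id, request popularity, [(layer size, quality gain)]).
clips = [("news", 0.6, [(4, 10), (3, 4)]),
         ("sport", 0.4, [(5, 12), (4, 6)])]
allocation = greedy_cache_allocation(clips, capacity=10)
```

With capacity 10, only the two base layers fit; a larger cache admits the enhancement layers as well.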
Citations: 4
Multiple Human Monitoring with Wireless Fiber-Optic Multimedia Sensor Networks
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.123
Qingquan Sun
This paper presents a binary compressive sensing based fiber-optic sensor system for human monitoring. Fiber-optic sensors are flexible and convenient for measuring pressure information of human subjects, which enables them to achieve localization and tracking directly. In order to capture more information about human subjects and scenes, a Bernoulli mixture model is proposed to model scenes. Meanwhile, compressive sensing based space encoding and decoding techniques are developed to implement scene recognition. Experimental results demonstrate that the proposed fiber-optic sensing system and compressive sensing based encoding/decoding techniques are effective for human monitoring in terms of tracking and scene recognition.
Citations: 0
TRACE: Linguistic-Based Approach for Automatic Lecture Video Segmentation Leveraging Wikipedia Texts
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.18
R. Shah, Yi Yu, A. Shaikh, Roger Zimmermann
In multimedia-based e-learning systems, the accessibility and searchability of most lecture video content is still insufficient due to the unscripted and spontaneous speech of the speakers. Moreover, this problem becomes even more challenging when the quality of such lecture videos is not sufficiently high. To extract the structural knowledge of a multi-topic lecture video and thus make it easily accessible, it is desirable to divide each video into shorter clips by performing automatic topic-wise video segmentation. To this end, this paper presents the TRACE system, which performs such segmentation automatically using a linguistic approach based on Wikipedia texts. TRACE makes two main contributions: (i) the extraction of a novel linguistic Wikipedia-based feature to segment lecture videos efficiently, and (ii) an investigation of the late fusion of video segmentation results derived from state-of-the-art algorithms. Specifically, for the late fusion we combine confidence scores produced by models built from visual, transcript, and Wikipedia features. In our experiments on lecture videos from VideoLectures.NET and NPTEL, the proposed algorithm segments knowledge structures more accurately than existing state-of-the-art algorithms. The evaluation results are very encouraging and confirm the effectiveness of TRACE.
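The abstract mentions late fusion of confidence scores from visual, transcript, and Wikipedia models but does not give the fusion rule. A common baseline is a weighted average per candidate segment boundary; the sketch below uses that rule with invented scores and weights, purely to illustrate the idea.

```python
def late_fusion(scores, weights):
    """Weighted late fusion of per-candidate confidence scores.

    scores: dict mapping modality name -> list of confidences in [0, 1],
    one per candidate segment boundary; weights: modality -> weight.
    Returns one fused confidence per candidate.
    """
    total = sum(weights.values())
    n = len(next(iter(scores.values())))
    return [sum(weights[m] * scores[m][i] for m in scores) / total
            for i in range(n)]

# Hypothetical confidences for 4 candidate topic boundaries.
scores = {"visual": [0.2, 0.8, 0.4, 0.9],
          "transcript": [0.1, 0.7, 0.6, 0.8],
          "wikipedia": [0.3, 0.9, 0.2, 0.7]}
fused = late_fusion(scores, {"visual": 1.0, "transcript": 1.0, "wikipedia": 2.0})
# Keep candidates whose fused confidence clears a threshold.
boundaries = [i for i, c in enumerate(fused) if c >= 0.5]
```

The weights would normally be tuned on held-out data rather than fixed by hand as here.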
Citations: 35
WAMINet: An Open Source Library for Dynamic Geospace Analysis Using WAMI
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.66
M. Maurice, Matt Piekenbrock, Derek Doran
Modern military and commercial aerial platforms have the ability to capture imagery data over very large (kilometer wide) areas at moderate rates of 1-3 frames per second. This wide-area motion imagery (WAMI) captures the conditions and activity over a geospace, hence offering an opportunity to understand its wide-scale dynamics. This paper presents WAMINet, a library capable of ingesting large numbers of WAMI frames to build a network representation of the dynamics of the geospace being studied. It discusses the approach WAMINet uses to build the network representation, the component based design of the architecture, and illustrates its WAMI processing capabilities. Prototype versions of WAMINet and its code are available for download.
Citations: 0
Precise Skin-Tone and Under-Tone Estimation by Large Photo Set Information Fusion
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.61
P. Aarabi, Benzakhar Manashirov, Edmund Phung, Kyung Moon Lee
This paper proposes a novel method for the estimation of a person's skin-tone and under-tone by analyzing a large collection of photos of that person. By excluding badly lit images, and analyzing well-lit skin pixels, it becomes possible to compute an overall skin-tone estimate which is in-line with the person's true skin shade, and based on this, to determine a person's under-tone. Based on a study involving 15,590 user sessions and 104,366 photos, it was found that the proposed methodology can detect the normalized RGB of the person's skin-tone with 2.3% RMSE, or based on the CIE76 color difference measure, obtain an average Delta E color difference of 3.15 in L*a*b* color space.
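The CIE76 measure reported in the abstract is simply the Euclidean distance between two colors in L*a*b* space. A minimal sketch (the two skin-tone triples below are hypothetical, not the paper's data):

```python
import math

def delta_e_cie76(lab1, lab2):
    """CIE76 color difference: Euclidean distance in L*a*b* space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))

# Hypothetical estimated vs. reference skin tones in (L*, a*, b*).
estimated = (65.2, 14.8, 17.1)
reference = (63.9, 13.5, 18.9)
diff = delta_e_cie76(estimated, reference)
```

A ΔE of about 2.3 is often cited as a just-noticeable difference, which puts the paper's average of 3.15 in context.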
Citations: 2
Quantitative Evaluation of Hair Texture
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.43
W. Guo, P. Aarabi
In this paper, we quantitatively evaluate the role of texture in hair patches, with a primary motivation of understanding what can be learned and applied by machine learning systems for texture-based hair detection. We evaluate the distribution of gradient directions in hair patches, and explore the relation between proximity to the face and the angle of the gradients for 2,870,000 hair patches selected from 100 manually silhouetted hairstyles.
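The "distribution of gradient directions" the abstract evaluates can be sketched as a magnitude-weighted orientation histogram over a patch. This is a generic illustration, not the paper's exact feature; the synthetic striped patch stands in for a hair patch.

```python
import numpy as np

def gradient_direction_histogram(patch, bins=18):
    """Normalized histogram of gradient orientations (0..pi) in a grayscale
    patch, weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # orientation, ignoring sign
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Synthetic "hair-like" patch: horizontal stripes of width 2, so all
# gradient energy points vertically (orientation pi/2).
row = np.tile(np.repeat(np.array([0.0, 1.0]), 2), 4)   # 0,0,1,1,... (len 16)
patch = np.tile(row, (16, 1)).T
hist = gradient_direction_histogram(patch)
```

For real hair patches the histogram would be peaked along the strand direction rather than concentrated in a single bin.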
Citations: 0
Development of a Web-Based Haptic Authoring Tool for Multimedia Applications
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.71
Haiwei Dong, Yu Gao, Hussein Al Osman, Abdulmotaleb El Saddik
In this paper, we introduce an MPEG-V based haptic authoring tool intended for simplifying the development process of haptics-enabled multimedia applications. The developed tool provides a web-based interface for users to create haptic environments by importing 3D models and adding haptic properties to them. The user can then export the resulting environment to a standard MPEG-V format. The latter can be imported to a haptic player that renders the described haptics-enabled 3D scene. The proposed tool can support many haptic devices, including Geomagic Devices, Force Dimension Devices, Novint Falcon Devices, and Moog FCS HapticMaster Devices. We conduct a proof of concept HTML5 haptic game project and user studies on haptic effects, development process and user interface, which shows our tool's effectiveness in simplifying the development process of haptics-enabled multimedia applications.
Citations: 7
Normalized Gaussian Distance Graph Cuts for Image Segmentation
Pub Date : 2015-12-01 DOI: 10.1109/ISM.2015.36
Chengcai Leng, W. Xu, I. Cheng, Z. Xiong, A. Basu
This paper presents a novel, fast image segmentation method based on normalized Gaussian distance on nodes in conjunction with normalized graph cuts. We review the equivalence between kernel k-means and normalized cuts. Then we extend the framework of efficient spectral clustering and avoid choosing weights in the weighted graph cuts approach. Experiments on synthetic data sets and real-world images demonstrate that the proposed method is effective and accurate.
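For context on the framework the abstract builds on: the classical normalized-cut relaxation (Shi and Malik) partitions a Gaussian-affinity graph by thresholding the second-smallest generalized eigenvector of the graph Laplacian. The sketch below shows that standard baseline, not the paper's specific normalized-Gaussian-distance variant, on toy 2D points.

```python
import numpy as np

def normalized_cut_labels(points, sigma=1.0):
    """Two-way normalized cut on a Gaussian (RBF) affinity graph.

    Spectral relaxation: threshold the second-smallest generalized
    eigenvector of (D - W) x = lambda * D * x at its median.
    """
    pts = np.asarray(points, dtype=float)
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))          # Gaussian affinity matrix
    d = W.sum(1)                                # node degrees
    Dinv = np.diag(1.0 / np.sqrt(d))
    # Symmetric normalized Laplacian: D^-1/2 (D - W) D^-1/2.
    L = Dinv @ (np.diag(d) - W) @ Dinv
    _, vecs = np.linalg.eigh(L)                 # eigenvalues ascending
    fiedler = Dinv @ vecs[:, 1]                 # back to generalized form
    return (fiedler > np.median(fiedler)).astype(int)

# Two well-separated point clusters in the plane.
pts = [(0, 0), (0.2, 0.1), (0.1, 0.3), (5, 5), (5.2, 5.1), (5.1, 4.8)]
labels = normalized_cut_labels(pts, sigma=1.0)
```

For image segmentation the nodes would be pixels and `d2` a combined color/spatial distance; the paper's contribution is avoiding hand-chosen weights in that affinity.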
Citations: 1