Latest Publications in IEEE MultiMedia
Optimizing Multidimensional Perceptual Quality in Online Interactive Multimedia
IF 3.2 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-07-01 | DOI: 10.1109/MMUL.2023.3277851
Benjamin W. Wah, Jingxi X. Xu
Network latencies and losses in online interactive multimedia applications may lead to a degraded perception of quality, such as lower interactivity or sluggish responses. We can measure these degradations in perceptual quality by the just-noticeable difference, awareness, or probability of noticeability ($p_{\text{note}}$); the latter measures the likelihood that subjects can notice a change from a reference to a modified reference. In our previous work, we developed an efficient method for finding the perceptual quality of one metric under simplex control. However, integrating the perceptual qualities of several metrics remained a heuristic. In this article, we present a formal approach for optimally combining the perceptual quality of multiple metrics into a joint measure that shows their tradeoffs. Our result shows that the optimal balance occurs when the $p_{\text{note}}$ values of all the component metrics are equal. Furthermore, our approach leads to an algorithm whose complexity is linear (instead of combinatorial) in the number of metrics. Finally, we present the application of our method in two case studies: one on VoIP for finding the optimal operating points, and one on fast-action games for hiding network delays while maintaining the consistency of action orders.
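The equal-$p_{\text{note}}$ condition suggests a simple computational recipe: bisect on the common noticeability level and invert each metric's curve independently, so each step is linear in the number of metrics. Below is a minimal sketch; the exponential curve shapes and the budget-split formulation are illustrative assumptions, not the paper's exact model.

```python
import math

def equalize_pnote(curves, budget):
    """Split a total degradation budget across metrics so that every
    metric's probability of noticeability (p_note) is equal.

    curves: increasing functions p_i(d) -> [0, 1) mapping a per-metric
    degradation d (e.g., added delay) to its p_note. We bisect on the
    common p level and numerically invert each curve; one evaluation
    touches each metric once, hence linear cost per step.
    """
    def invert(p_fn, p, lo=0.0, hi=1e6):
        # find d with p_fn(d) ~= p by bisection (p_fn is increasing)
        for _ in range(200):
            mid = (lo + hi) / 2.0
            if p_fn(mid) < p:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    lo_p, hi_p = 0.0, 1.0
    for _ in range(100):
        p = (lo_p + hi_p) / 2.0
        if sum(invert(c, p) for c in curves) < budget:
            lo_p = p   # budget not exhausted: can tolerate a higher p_note
        else:
            hi_p = p
    p = (lo_p + hi_p) / 2.0
    return p, [invert(c, p) for c in curves]

# hypothetical noticeability curves: p = 1 - exp(-k * delay)
curves = [lambda d, k=k: 1.0 - math.exp(-k * d) for k in (0.01, 0.02)]
p_star, alloc = equalize_pnote(curves, budget=100.0)
```

At the returned split, both metrics sit at the same $p_{\text{note}}$ level and the degradation budget is fully used; the less noticeable metric (smaller k) absorbs the larger share of the budget.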
Citations: 0
An Improved Interaction Estimation and Optimization Method for Surveillance Video Synopsis
IF 3.2 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-07-01 | DOI: 10.1109/MMUL.2022.3224874
K. Namitha, M. Geetha, N. Revathi
Video synopsis is an efficient technique for condensing long-duration videos into short ones. The interactions between moving objects in the original video need to be preserved during condensation. However, identifying objects with strong spatio-temporal proximity from a monocular video frame is a challenge. Further, the tube rearrangement optimization process is also vital for reducing collision rates among moving objects. Taking these aspects into consideration, we present a comprehensive video synopsis framework. First, we propose an interaction detection method that estimates distortionless spatio-temporal interactions between moving objects by generating the top view of a scene using a perspective transformation. Second, we propose an optimization method that reduces collisions and preserves object interactions by shrinking the search space. The experimental results demonstrate that the proposed framework provides a better estimate of object interactions from surveillance videos and generates synopsis videos with fewer collisions while preserving the original interactions.
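The top-view idea in the first contribution can be illustrated with a plain homography: project image points onto a bird's-eye ground plane and measure proximity there, where perspective no longer distorts distances. The sketch below uses NumPy only; the homography matrix is a toy assumption (in practice it would be estimated from ground-plane reference points, e.g., with OpenCV's `cv2.getPerspectiveTransform`).

```python
import numpy as np

def top_view_distance(H, p1, p2):
    """Project two image points through homography H into top-view
    (bird's-eye) coordinates and return their Euclidean distance there."""
    def project(pt):
        # homogeneous projection: [x, y, w] = H @ [u, v, 1], then divide by w
        x, y, w = H @ np.array([pt[0], pt[1], 1.0])
        return np.array([x / w, y / w])
    return float(np.linalg.norm(project(p1) - project(p2)))

# toy homography: uniform 2x scaling of the ground plane,
# so a 3-4-5 pixel triangle becomes distance 10 in the top view
H = np.diag([2.0, 2.0, 1.0])
d = top_view_distance(H, (0.0, 0.0), (3.0, 4.0))
```

With a real calibration matrix, thresholding this top-view distance is one straightforward way to flag candidate interactions between object tubes.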
Citations: 1
IEEE Computer Society CG&A
CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-07-01 | DOI: 10.1109/mmul.2023.3309016
Citations: 0
Taking a “Deep” Look at Multimedia Streaming
CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-07-01 | DOI: 10.1109/mmul.2023.3308401
Balakrishnan Prabhakaran
Streaming multimedia content has become an integral part of our lives, influencing the way we consume daily news, communicate with friends, family, and colleagues, and entertain ourselves. The quality of multimedia content has been improving by leaps and bounds with advances in cameras and other sensing technologies. In parallel, advances in multimedia display technologies have been equally remarkable, providing a vast choice of affordable high-definition devices in a wide range of sizes. The quality of service (QoS) offered by Internet service providers has experienced impressive growth as well. All of these factors have led to a huge surge in the multimedia streaming sessions that need to be supported on the Internet. Advances in deep machine learning (ML) techniques have been successfully leveraged to manage this unprecedented usage of multimedia streaming. However, as the various factors influencing multimedia streaming continue to evolve, continuous research is needed to adopt new deep learning techniques for efficient multimedia streaming.
Citations: 0
IEEE Annals of the History of Computing
CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-07-01 | DOI: 10.1109/mmul.2023.3309015
Citations: 0
Reversible Modal Conversion Model for Thermal Infrared Tracking
IF 3.2 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-07-01 | DOI: 10.1109/MMUL.2023.3239136
Yufei Zha, Fan Li, Huanyu Li, Peng Zhang, Wei Huang
Learning a powerful CNN representation of the target is a key issue for thermal infrared (TIR) tracking. The lack of massive TIR training data is one of the obstacles to training the network end to end from scratch. Instead of the time-consuming and labor-intensive approach of heavily relabeling data, we obtain trainable TIR images in this article by leveraging massive annotated RGB images. Unlike traditional image generation models, a modal reversible module is designed in this work to maximize information propagation between the RGB and TIR modalities. The advantage is that this module can preserve as much modal information as possible when the network is trained on a large number of aligned RGB-T image pairs. Additionally, the fake-TIR features generated by the proposed module are integrated to enhance the target representation ability when TIR tracking is on the fly. To verify the proposed method, we conduct extensive experiments on both single-modal TIR and multimodal RGB-T tracking datasets. In single-modal TIR tracking, the success rate of our method improves over the SOTA by 2.8% and 0.94% on the LSOTB-TIR and PTB-TIR datasets, respectively. In multimodal RGB-T fusion tracking, the proposed method is tested on the RGBT234 and VOT-RGBT2020 datasets, and the results also reach SOTA performance.
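The "reversible" property the abstract emphasizes, that a conversion between modalities can propagate information without loss, is the defining trait of coupling-layer designs. A minimal additive-coupling sketch (NICE-style; not the paper's actual architecture, and the transform `f` below is an arbitrary illustration) shows why inversion is exact by construction:

```python
import numpy as np

def make_coupling(f):
    """Additive coupling layer: split features into two halves; one half
    passes through unchanged and conditions an additive update of the
    other. Inversion needs no inverse of f, only the same forward f."""
    def forward(x1, x2):
        return x1, x2 + f(x1)
    def inverse(y1, y2):
        return y1, y2 - f(y1)
    return forward, inverse

# f can be any (even non-invertible) function: reversibility is structural
f = lambda x: 3.0 * np.tanh(x)
fwd, inv = make_coupling(f)

rng = np.random.default_rng(1)
x1, x2 = rng.random(4), rng.random(4)
y1, y2 = fwd(x1, x2)
r1, r2 = inv(y1, y2)   # recovers x1, x2 up to float rounding
```

Stacking such layers (alternating which half is updated) yields a bijective mapping between feature spaces, which is the sense in which a modal conversion can avoid discarding modality information.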
Citations: 0
IEEE Computer Society Information
CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-07-01 | DOI: 10.1109/mmul.2023.3308996
Citations: 0
IEEE Transactions on Sustainable Computing
CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-07-01 | DOI: 10.1109/mmul.2023.3313239
Citations: 0
Computing in Science & Engineering
CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-07-01 | DOI: 10.1109/mmul.2023.3311108
Citations: 0
PP8K: A New Dataset for 8K UHD Video Compression and Processing
IF 3.2 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-07-01 | DOI: 10.1109/MMUL.2023.3269459
Wei Gao, Hang Yuan, Guibiao Liao, Zixuan Guo, Jianing Chen
In the new era of ultra-high-definition (UHD) video, 8K is becoming more popular in diversified applications to boost the human visual experience and the performance of related vision tasks. However, researchers still suffer from a lack of 8K video sources for developing better processing algorithms for compression, saliency detection, quality assessment, and vision analysis tasks. To ameliorate this situation, we construct a new comprehensive 8K UHD video dataset with two sub-datasets: the common raw format videos (CRFV) dataset and the video salient object detection (VSOD) dataset. To fully validate its diversity and practicality, the spatial and temporal information characteristics of the CRFV dataset are evaluated with widely used metrics and a video encoder. Through extensive experiments and comparative analyses against counterpart datasets, the proposed 8K dataset shows apparent advantages in diversity and practicality, which can benefit the development of UHD video technologies. The dataset has been released online: https://git.openi.org.cn/OpenDatasets/PP8K.
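The "spatial and temporal information characteristics" used to profile video datasets are conventionally the SI/TI measures of ITU-T P.910: SI is the maximum per-frame standard deviation of the Sobel gradient magnitude, and TI the maximum standard deviation of consecutive-frame differences. Below is a NumPy-only sketch of that convention, assuming grayscale float frames; whether the authors used exactly this variant is not stated in the abstract.

```python
import numpy as np

def si_ti(frames):
    """SI/TI in the spirit of ITU-T P.910 for a (T, H, W) grayscale clip:
    SI = max over frames of the stddev of the Sobel gradient magnitude,
    TI = max over frame pairs of the stddev of their difference."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T

    def filt(img, k):
        # 'valid' 3x3 cross-correlation via stacked shifts (no SciPy needed)
        h, w = img.shape
        out = np.zeros((h - 2, w - 2))
        for i in range(3):
            for j in range(3):
                out += k[i, j] * img[i:h - 2 + i, j:w - 2 + j]
        return out

    si = max(float(np.std(np.hypot(filt(f, kx), filt(f, ky))))
             for f in frames)
    ti = max(float(np.std(frames[t] - frames[t - 1]))
             for t in range(1, len(frames)))
    return si, ti

# toy clip: random noise has plenty of spatial and temporal activity
rng = np.random.default_rng(0)
si, ti = si_ti(rng.random((5, 32, 32)))
```

High SI/TI spread across clips is what such a dataset paper typically reports to argue content diversity; a static, flat clip scores zero on both.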
Citations: 2