Could Head Motions Affect Quality When Viewing 360° Videos?
Pub Date: 2023-04-01 | DOI: 10.1109/MMUL.2022.3215089
Burak Kara, Mehmet N. Akcay, A. Begen, Saba Ahsan, I. Curcio, Emre B. Aksu
Measuring quality accurately and quickly (preferably in real time) when streaming 360° videos is essential to enhance the user experience. Most quality-of-experience metrics have primarily used viewport quality as a simple surrogate for that experience at a given time. While this baseline approach has later been augmented by some researchers with pupil and gaze tracking, head tracking has not been considered in enough detail. This article investigates whether head motions can influence the perception of 360° videos. Inspired by the latest research, it conceptualizes a head-motion-aware metric for measuring viewport quality. A comparative study against existing head-motion-unaware metrics reveals sizeable differences. Motivated by this, we invite the community to research this topic further and substantiate the new metric’s validity.
{"title":"Could Head Motions Affect Quality When Viewing 360° Videos?","authors":"Burak Kara, Mehmet N. Akcay, A. Begen, Saba Ahsan, I. Curcio, Emre B. Aksu","doi":"10.1109/MMUL.2022.3215089","DOIUrl":"https://doi.org/10.1109/MMUL.2022.3215089","url":null,"abstract":"Measuring quality accurately and quickly (preferably in real time) when streaming 360$^circ$∘ videos is essential to enhance the user experience. Most quality-of-experience metrics have primarily used viewport quality as a simple surrogate for such experiences at a given time. While this baseline approach has been later augmented by some researchers using pupil and gaze tracking, head tracking has not been considered in enough detail. This article tackles whether head motions can influence the perception of 360$^circ$∘ videos. Inspired by the latest research, this article conceptualizes a head-motion-aware metric for measuring viewport quality. A comparative study against existing head-motion-unaware metrics reveals sizeable differences. Motivated by this, we invite the community to research this topic further and substantiate the new metric’s validity.","PeriodicalId":13240,"journal":{"name":"IEEE MultiMedia","volume":"30 1","pages":"28-37"},"PeriodicalIF":3.2,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47273419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Passthrough Mixed Reality With Oculus Quest 2: A Case Study on Learning Piano
Pub Date: 2023-04-01 | DOI: 10.1109/MMUL.2022.3232892
Mariano M. Banquiero, Gracia Valdeolivas, Sergio Trincado, Natasha Garcia, M. Juan
Mixed reality (MR) on standalone headsets has many advantages over other types of devices. With the recent introduction of Passthrough on the Oculus Quest 2, new possibilities open up. This work details the features of the current Passthrough and how its potential was harnessed and its drawbacks minimized to develop a satisfying MR experience, using learning to play the piano as a use case. A total of 33 piano students participated in a study comparing participants’ interpretation outcomes and subjective experience when using an MR application for learning piano with two visualization modes: border lines on all the keys (Wireframe) versus a solid color hiding the real keys (Solid). Both visualization modes provided a satisfying experience. Even though there were no significant differences in the analyzed variables, the students preferred the Solid mode, indicating that short-distance Passthrough limitations should be minimized in application development.
{"title":"Passthrough Mixed Reality With Oculus Quest 2: A Case Study on Learning Piano","authors":"Mariano M. Banquiero, Gracia Valdeolivas, Sergio Trincado, Natasha Garcia, M. Juan","doi":"10.1109/MMUL.2022.3232892","DOIUrl":"https://doi.org/10.1109/MMUL.2022.3232892","url":null,"abstract":"Mixed reality (MR) in standalone headsets has many advantages over other types of devices. With the recent appearance of the Passthrough of Oculus Quest 2, new possibilities open up. This work details the features of the current Passthrough and how its potential was harnessed and its drawbacks minimized for developing a satisfying MR experience. It has been applied to learning to play the piano as a use case. A total of 33 piano students participated in a study to compare participants’ interpretation outcomes and subjective experience when using a MR application for learning piano with two visualization modes (border lines on all the keys (Wireframe) versus solid color hiding the real keys (Solid)). The two visualization modes provided a satisfying experience. Even though there were no significant differences in the analyzed variables, the students preferred the Solid mode, indicating that short-distance Passthrough limitations should be minimized in application development.","PeriodicalId":13240,"journal":{"name":"IEEE MultiMedia","volume":"30 1","pages":"60-69"},"PeriodicalIF":3.2,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45393533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Edge-Assisted Virtual Viewpoint Generation for Immersive Light Field
Pub Date: 2023-04-01 | DOI: 10.1109/MMUL.2022.3232771
Xinjue Hu, Chen-chao Wang, Lin Zhang, Guo Chen, S. Shirmohammadi
A light field (LF), which describes the light rays emanating from each point in a scene, can be used as a six-degrees-of-freedom (6DOF) immersive medium. Similar to traditional multiview video, an LF is captured by an array of cameras, leading to a large data volume that must be streamed from a server to users. When a user wishes to watch the scene from a viewpoint that no camera has captured directly, a virtual viewpoint must be rendered in real time from the directly captured viewpoints. This places high demands on both the computing and caching capabilities of the infrastructure. Edge computing (EC), which brings computation resources closer to users, is a promising enabler for real-time LF viewpoint rendering. In this article, we present a novel EC-assisted mobile LF delivery framework that caches parts of the LF viewpoints in advance and renders the requested virtual viewpoints on demand at the edge node or on the user’s device. Numerical results demonstrate that the proposed framework can reduce the average service response latency by 45% and the energy consumption of user equipment by 60%, at the cost of 55% additional caching consumption on edge nodes.
{"title":"Edge-Assisted Virtual Viewpoint Generation for Immersive Light Field","authors":"Xinjue Hu, Chen-chao Wang, Lin Zhang, Guo Chen, S. Shirmohammadi","doi":"10.1109/MMUL.2022.3232771","DOIUrl":"https://doi.org/10.1109/MMUL.2022.3232771","url":null,"abstract":"Light field (LF), which describes the light rays that emanate at each point in a scene, can be used as a six-degrees-of-freedom (6DOF) immersive media. Similar to the traditional multiview video, LF is also captured by an array of cameras, leading to a large data volume that needs to be streamed from a server to users. When a user wishes to watch the scene from a viewpoint that no camera has captured directly, a virtual viewpoint must be rendered in real time from the directly captured viewpoints. This places high requirements on both the computing and caching capabilities of the infrastructure. Edge computing (EC), which brings computation resources closer to users, can be a promising enabler for real-time LF viewpoint rendering. In this article, we present a novel EC-assisted mobile LF delivery framework that is able to cache parts of LF viewpoints in advance and render the requested virtual viewpoints on demand at the edge node or user’s device. Numerical results demonstrate that the proposed framework can reduce the average service response latency by 45% and the energy consumption of user equipment by 60% at the cost of 55% additional caching consumption of edge nodes.","PeriodicalId":13240,"journal":{"name":"IEEE MultiMedia","volume":"30 1","pages":"18-27"},"PeriodicalIF":3.2,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47837102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Specular Detection and Rendering for Immersive Multimedia
Pub Date: 2023-04-01 | DOI: 10.1109/MMUL.2023.3262195
The Van Le, Yong-hoon Choi, Jin Young Lee
Immersive multimedia has received considerable attention because of its strong impact on user experience. To realize high immersion in virtual environments, many virtual views should be generated at arbitrary viewpoints on advanced display devices. However, specular regions, which affect user experience, have not been fully investigated in the immersive multimedia field. In this article, we propose specular highlight detection and rendering methods to improve immersion. For specular detection, a high-performance variational attention U-network (VAUnet), which combines a variational autoencoder and a spatial attention mechanism, is proposed and trained with a hybrid loss function. The specular regions detected by VAUnet are compressed with an immersive video coding standard (MPEG-I), and rendering is then performed by considering the decompressed specular regions. Extensive experiments demonstrate that the proposed method improves specular detection performance and subjective rendering quality.
{"title":"Specular Detection and Rendering for Immersive Multimedia","authors":"The Van Le, Yong-hoon Choi, Jin Young Lee","doi":"10.1109/MMUL.2023.3262195","DOIUrl":"https://doi.org/10.1109/MMUL.2023.3262195","url":null,"abstract":"Immersive multimedia has received a lot of attention because of its huge impact on user experience. To realize high immersion in virtual environments, many virtual views should be generated at arbitrary viewpoints with advanced display devices. However, specular regions, which affect user experience, have not been fully investigated in an immersive multimedia field. In this article, we propose specular highlight detection and rendering methods to improve immersion. For specular detection, a high-performance variational attention U-network (VAUnet), which combines a variational autoencoder and a spatial attention mechanism, is proposed with a hybrid loss function. The specular regions detected from VAUnet are compressed with an immersive video coding standard (MPEG-I), and then the rendering is performed by considering the decompressed specular regions. Extensive experiments demonstrate that the proposed method improves specular detection performance and subjective rendering quality.","PeriodicalId":13240,"journal":{"name":"IEEE MultiMedia","volume":"30 1","pages":"38-47"},"PeriodicalIF":3.2,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47676572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Blockchain-Empowered Privacy-Preserving Digital Object Trading in the Metaverse
Pub Date: 2023-04-01 | DOI: 10.1109/MMUL.2023.3246528
Yao Xiao, Lei Xu, Can Zhang, Liehuang Zhu, Yan Zhang
The metaverse is an advanced digital world in which users can have interactive and immersive experiences. Users enter the metaverse through digital objects created with extended reality and digital twin technologies. The ownership of these digital objects can be established with blockchain-based nonfungible tokens (NFTs), which are of vital importance for the economics of the metaverse. Users can utilize NFTs to engage in various social and economic activities. However, current NFT protocols expose the owner’s information to the public, which may conflict with privacy requirements. In this article, we propose NFTPrivate, a protocol that realizes anonymous and confidential trading of digital objects. The key idea is to use cryptographic commitments to hide users’ addresses. By constructing proper zero-knowledge proofs, the owner can initiate privacy-preserving yet publicly verifiable transactions. Illustrative results show that the proposed protocol has higher computation and storage overhead than traditional NFT protocols, which we consider an acceptable compromise for privacy protection.
{"title":"Blockchain-Empowered Privacy-Preserving Digital Object Trading in the Metaverse","authors":"Yao Xiao, Lei Xu, Can Zhang, Liehuang Zhu, Yan Zhang","doi":"10.1109/MMUL.2023.3246528","DOIUrl":"https://doi.org/10.1109/MMUL.2023.3246528","url":null,"abstract":"The metaverse is an advanced digital world where users can have interactive and immersive experiences. Users enter the metaverse through digital objects created by extended reality and digital twin technologies. The ownership issue regarding these digital objects can be solved by the blockchain-based nonfungible token (NFT), which is of vital importance for the economics of the metaverse. Users can utilize NFTs to engage in various social and economic activities. However, current NFT protocols expose the owner’s information to the public, which may contradict with the privacy requirement. In this article, we propose a protocol, NFTPrivate, that can realize anonymous and confidential trading of digital objects. The key idea is to utilize cryptographic commitments to hide users’ addresses. By constructing proper zero-knowledge proofs, the owner can initiate privacy-preserving yet publicly verifiable transactions. Illustrative results show that the proposed protocol has higher computation and storage overhead than traditional NFT protocols. We think this is an acceptable compromise for privacy protection.","PeriodicalId":13240,"journal":{"name":"IEEE MultiMedia","volume":"30 1","pages":"81-90"},"PeriodicalIF":3.2,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47695734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Edge Intelligence-Empowered Immersive Media
Pub Date: 2023-04-01 | DOI: 10.1109/MMUL.2023.3247574
Zhi Wang, Jiangchuan Liu, Wenwu Zhu
Recent years have witnessed many immersive media services and applications, ranging from 360° video streaming to augmented and virtual reality (VR) and, more recently, metaverse experiences. These new applications share common features, including high fidelity, immersive interaction, and open data exchange between people and the environment. As an emerging paradigm, edge computing has become increasingly ready to support these features. We first show that a key to unleashing the power of edge computing for immersive multimedia is the handling of artificial intelligence models and data. Then, we present a framework that enables joint accuracy- and latency-aware edge intelligence, with adaptive deep learning model deployment and data streaming. We show that not only conventional mechanisms such as content placement and rate adaptation but also emerging 360° and VR streaming can benefit from such edge intelligence.
{"title":"Edge Intelligence-Empowered Immersive Media","authors":"Zhi Wang, Jiangchuan Liu, Wenwu Zhu","doi":"10.1109/MMUL.2023.3247574","DOIUrl":"https://doi.org/10.1109/MMUL.2023.3247574","url":null,"abstract":"Recent years have witnessed many immersive media services and applications, ranging from 360° video streaming to augmented and virtual reality (VR) and the recent metaverse experiences. These new applications usually have common features, including high fidelity, immersive interaction, and open data exchange between people and the environment. As an emerging paradigm, edge computing has become increasingly ready to support these features. We first show that a key to unleashing the power of edge computing for immersive multimedia is handling artificial intelligence models and data. Then, we present a framework that enables joint accuracy- and latency-aware edge intelligence, with adaptive deep learning model deployment and data streaming. We show that not only conventional mechanisms such as content placement and rate adaptation but also the emerging 360° and VR streaming can benefit from such edge intelligence.","PeriodicalId":13240,"journal":{"name":"IEEE MultiMedia","volume":"30 1","pages":"8-17"},"PeriodicalIF":3.2,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41365093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}