Paul Haimes, Tetsuaki Baba, Hiroya Suda, Kumiko Kushiyama
Fuji-chan is a simple ambient display device that uses a wireless internet connection to monitor two important characteristics of Mount Fuji, Japan's highest mountain. The system utilises two internet-based data feeds to inform people of the weather conditions at the mountain's peak and of the current level of volcanic eruption risk; we consider the latter information to be of particular importance. These two data feeds are communicated via LEDs placed at the top and base of the device, along with aural output to indicate volcanic eruption warning levels. We also created a simple web interface for this information. By creating this device and application, we aim to reimagine how geospatial information can be presented, while also creating something that is visually appealing. Through the demonstration of this multimodal system, we also aim to promote the idea of an "Internet of Beautiful Things", where IoT technology is applied to interactive artworks.
Paul Haimes, Tetsuaki Baba, Hiroya Suda, and Kumiko Kushiyama. "Fuji-chan: A unique IoT ambient display for monitoring Mount Fuji's conditions." Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys '17), June 2017. DOI: 10.1145/3083187.3083223
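As a rough illustration of the kind of polling loop such an ambient display might run, the sketch below fetches the two feeds and maps them to LED colours and an audible alert. The feed functions, the LED and beep drivers, and the level-to-colour mapping are placeholders and assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an ambient-display polling loop in the spirit of Fuji-chan.
import time

# Assumed mapping from JMA-style volcanic alert levels (1-5) to colours.
ALERT_COLOURS = {1: "green", 2: "yellow", 3: "orange", 4: "red", 5: "purple"}

def fetch_summit_weather():
    """Placeholder for the weather feed; a real device would parse an online feed."""
    return {"condition": "clear", "temperature_c": -15}

def fetch_alert_level():
    """Placeholder for the volcanic-warning feed (assumed levels 1-5)."""
    return 1

def set_led(position, colour):
    """Placeholder for driving the LEDs at the top and base of the model."""
    print(f"LED {position}: {colour}")

def beep(times):
    """Placeholder for the aural warning output."""
    print("beep " * times)

def update():
    weather = fetch_summit_weather()
    level = fetch_alert_level()
    set_led("peak", "white" if weather["condition"] == "clear" else "blue")
    set_led("base", ALERT_COLOURS.get(level, "red"))
    if level >= 3:          # audible alert only for elevated warning levels
        beep(level)

if __name__ == "__main__":
    while True:
        update()
        time.sleep(600)     # poll the feeds every ten minutes
```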
Combining advanced sensors and powerful processing capabilities, smartphone-based augmented reality (AR) is becoming increasingly prevalent. The growing prominence of these resource-hungry AR applications poses significant challenges in energy-constrained environments such as mobile phones. To that end, we present a platform for offloading AR applications to powerful cloud servers. We implement this system using a thin-client design and explore its performance using the real-world application Pokemon Go as a case study. We show that, with careful design, a thin client is capable of offloading much of the AR processing to a cloud server, with the results streamed back. Our initial experiments show substantial energy savings, low latency, and excellent image quality, even at relatively low bitrates.
R. Shea, Andy Sun, Silvery Fu, and Jiangchuan Liu. "Towards Fully Offloaded Cloud-based AR: Design, Implementation and Experience." Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys '17), June 2017. DOI: 10.1145/3083187.3084012
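The sketch below illustrates the general thin-client pattern the abstract describes: encoded camera frames are pushed to a remote server and the rendered result is streamed back. The server address, the length-prefixed framing, and the capture/display callbacks are assumptions for illustration, not the paper's actual protocol.

```python
# Minimal thin-client loop: offload encoded frames, display the rendered result.
import socket
import struct

SERVER = ("ar-offload.example.com", 9000)   # hypothetical offload server

def _recv_exact(sock, n):
    """Read exactly n bytes from the socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("server closed the connection")
        buf += chunk
    return buf

def offload_frame(sock, jpeg_bytes):
    """Send one encoded camera frame; receive the rendered AR frame back."""
    sock.sendall(struct.pack("!I", len(jpeg_bytes)) + jpeg_bytes)
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return _recv_exact(sock, length)

def run(capture_frame, display_frame):
    """capture_frame/display_frame are placeholders for camera and screen I/O."""
    with socket.create_connection(SERVER) as sock:
        while True:
            rendered = offload_frame(sock, capture_frame())
            display_frame(rendered)
```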
In this paper, we present DroneFace, an open dataset for testing how well face recognition works on drones. Owing to their high mobility, drones, i.e. unmanned aerial vehicles (UAVs), are well suited to surveillance, daily patrols, and searching for lost people on the streets, and thus need the capability to track human targets' faces from the air. In this context, a drone's distance and height from the target influence the accuracy of face recognition. To test whether a face recognition technique is suitable for drones, we built DroneFace, which is composed of facial images taken from various combinations of distances and heights, for evaluating how well a face recognition technique recognizes designated faces from the air. Face recognition is one of the most successful applications of image analysis and understanding, and many face recognition databases exist for various purposes. To the best of our knowledge, however, DroneFace is the only dataset containing facial images taken from controlled distances and heights in an unconstrained environment, and it can be valuable for future studies that integrate face recognition techniques into drones.
Hwai-Jung Hsu and Kuan-Ta Chen. "DroneFace: An Open Dataset for Drone Research." Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys '17), June 2017. DOI: 10.1145/3083187.3083214
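A hedged sketch of how a dataset like DroneFace could be used to score a face recogniser per distance/height condition follows. The directory layout and the recognise() hook are assumed for illustration; they are not the dataset's documented structure or API.

```python
# Per-condition accuracy of a face recogniser over a distance/height-organised dataset.
from collections import defaultdict
from pathlib import Path

def recognise(image_path):
    """Placeholder: return the predicted subject ID for an image."""
    raise NotImplementedError("plug in the face recogniser under test")

def evaluate(root="DroneFace"):
    correct = defaultdict(int)
    total = defaultdict(int)
    # Assumed layout: DroneFace/<distance>_<height>/<subject_id>_*.jpg
    for image in Path(root).glob("*/*.jpg"):
        condition = image.parent.name            # e.g. "4m_3m"
        subject = image.stem.split("_")[0]       # ground-truth subject ID
        total[condition] += 1
        if recognise(image) == subject:
            correct[condition] += 1
    for condition in sorted(total):
        print(f"{condition}: {correct[condition] / total[condition]:.2%}")
```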
Users' Quality of Experience (QoE) in Multi-sensorial, Immersive, Collaborative Environments (MICE) applications is mostly measured through psychometric studies, which provide a subjective insight into the performance of such applications. In this paper, we hypothesize that the spatial coherence of the embedded virtual objects across users, or the lack of it, correlates with QoE in MICE. We use Position Discrepancy (PD) to model this lack of spatial coherence in MICE. Based on that, we propose a Hierarchical Position Discrepancy Model (HPDM) that computes PD at multiple levels to derive the application/system-level PD as a measure of performance. Experimental results on an example task in MICE show that HPDM can objectively quantify application performance and correlates with the psychometric-study-based QoE measurements. We envisage that HPDM can provide more insight into MICE applications without the need for extensive user studies.
Shanthi Vellingiri and Prabhakaran Balakrishnan. "Modeling User Quality of Experience (QoE) through Position Discrepancy in Multi-Sensorial, Immersive, Collaborative Environments." Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys '17), June 2017. DOI: 10.1145/3083187.3084018
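One plausible reading of the position-discrepancy idea is sketched below: the same virtual object may appear at slightly different positions in each user's view, and per-object discrepancies can be aggregated up to a session-level score. The mean-pairwise-distance aggregation is an illustrative assumption, not the paper's HPDM definition.

```python
# Toy position-discrepancy computation across users' views of shared virtual objects.
from itertools import combinations
from math import dist   # Euclidean distance, Python 3.8+

def object_discrepancy(positions):
    """Mean pairwise distance of one object's position across users' views."""
    pairs = list(combinations(positions, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

def session_discrepancy(scene):
    """scene maps object ID -> list of (x, y, z) positions, one per user."""
    return sum(object_discrepancy(p) for p in scene.values()) / len(scene)

example = {
    "ball":  [(0.0, 1.0, 2.0), (0.1, 1.0, 2.0), (0.0, 1.1, 2.1)],
    "table": [(3.0, 0.0, 1.0), (3.0, 0.2, 1.0), (2.9, 0.0, 1.0)],
}
print(f"session-level PD = {session_discrepancy(example):.3f} m")
```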
J. Chu, Chris Bryan, Min Shih, Leonardo Ferrer, K. Ma
Immersive, stereoscopic visualization enables scientists to analyze structural and physical phenomena better than traditional display mediums do. Unfortunately, current head-mounted displays (HMDs) with the high rendering quality necessary for these complex datasets are prohibitively expensive, especially in educational settings where buying several devices is impractical. To address this problem, we develop two tools: (1) an authoring tool that allows domain scientists to generate a set of connected, 360° video paths for traversing between dimensional keyframes in the dataset, and (2) a corresponding navigational interface, a video selection and playback tool that can be paired with a low-cost HMD to enable an interactive, non-linear, storytelling experience. We demonstrate the authoring tool's utility through several case studies and assess the navigational interface with a usability study. Results show the potential of our approach in effectively expanding the accessibility of high-quality, immersive visualization to a wider audience using affordable HMDs.
J. Chu, Chris Bryan, Min Shih, Leonardo Ferrer, and K. Ma. "Navigable Videos for Presenting Scientific Data on Affordable Head-Mounted Displays." Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys '17), June 2017. DOI: 10.1145/3083187.3084015
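A minimal sketch of the data structure implied by the navigational interface follows: a graph whose nodes are keyframes and whose edges are pre-rendered 360° video segments. The keyframe labels and file names are invented for illustration.

```python
# Toy navigation graph: nodes are keyframes, edges are pre-rendered 360° segments.
NAV_GRAPH = {
    "overview": {"zoom_in": "overview_to_core.mp4"},
    "zoom_in":  {"overview": "core_to_overview.mp4", "cutaway": "core_to_cutaway.mp4"},
    "cutaway":  {"zoom_in": "cutaway_to_core.mp4"},
}

def play(video_file):
    """Placeholder for handing a segment to the HMD's video player."""
    print(f"playing {video_file}")

def navigate(current, choice):
    """Follow the user's menu selection along one edge of the graph."""
    play(NAV_GRAPH[current][choice])
    return choice

state = "overview"
state = navigate(state, "zoom_in")   # user chooses to move to the next keyframe
```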
Dimitris Chatzopoulos, Carlos Bermejo, Zhanpeng Huang, Arailym Butabayeva, Rui Zheng, Morteza Golkarifard, P. Hui
We develop Hyperion, a Wearable Augmented Reality (WAR) system based on Google Glass for accessing text information in the ambient environment. Hyperion can retrieve text content from the user's current view and deliver it in different ways according to the user's context. We design four work modalities for the different situations mobile users encounter in their daily activities. In addition, user interaction interfaces are provided to adapt to different application scenarios. Although Google Glass is constrained by its limited computational capabilities and battery capacity, we utilize code-level offloading to companion mobile devices to improve the runtime performance and sustainability of WAR applications. System experiments show that Hyperion improves users' ability to be aware of text information around them. Our prototype indicates the promising potential of combining WAR technology with wearable devices such as Google Glass to support people's daily activities.
Dimitris Chatzopoulos, Carlos Bermejo, Zhanpeng Huang, Arailym Butabayeva, Rui Zheng, Morteza Golkarifard, and P. Hui. "Hyperion: A Wearable Augmented Reality System for Text Extraction and Manipulation in the Air." Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys '17), June 2017. DOI: 10.1145/3083187.3084017
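The snippet below sketches, in simplified form, the kind of offloading decision a WAR client might make: run text extraction locally when resources allow, and ship the frame to a companion device otherwise. The battery threshold and both extraction hooks are hypothetical, not Hyperion's actual logic.

```python
# Toy context-aware offloading decision for wearable text extraction.
def extract_text_locally(frame):
    """Placeholder: run OCR on the wearable device itself."""
    raise NotImplementedError

def extract_text_remote(frame):
    """Placeholder: send the frame to a companion phone and await the result."""
    raise NotImplementedError

def extract_text(frame, battery_pct, companion_available):
    # Offload when the battery is low and a companion device is reachable.
    if battery_pct < 30 and companion_available:
        return extract_text_remote(frame)
    return extract_text_locally(frame)
```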
Chinese herbal medicine (CHM) plays an important role in treatment within traditional Chinese medicine (TCM). Traditionally, CHM is used to restore the body's balance in sick people and to maintain health in the general population. However, a lack of knowledge about the herbs may lead to their misuse. In this demo, we present a real-time smartphone application that can not only recognize easily confused herbs using a Convolutional Neural Network (CNN), but also provide relevant information about the detected herbs. Our Chinese herb recognition system is implemented on a cloud server and can be used by client users via smartphone. The recognition system is evaluated using 5-fold cross-validation, and its accuracy is around 96%, which is adequate for real-world use.
Juei-Chun Weng, Min-Chun Hu, and Kun-Chan Lan. "Recognition of Easily-confused TCM Herbs Using Deep Learning." Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys '17), June 2017. DOI: 10.1145/3083187.3083226
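The 5-fold cross-validation protocol mentioned above can be sketched as follows; train_cnn() and accuracy() are placeholders for the herb-classification model, and the (image, label) sample format is assumed.

```python
# 5-fold cross-validation skeleton for the herb classifier.
import random

def train_cnn(train_samples):
    """Placeholder: fit the herb-classification CNN on (image, label) pairs."""
    raise NotImplementedError

def accuracy(model, test_samples):
    """Placeholder: fraction of test images whose predicted herb is correct."""
    raise NotImplementedError

def five_fold_accuracy(samples, k=5, seed=0):
    """samples: list of (image_path, herb_label) pairs."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    folds = [samples[i::k] for i in range(k)]       # k roughly equal folds
    scores = []
    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        scores.append(accuracy(train_cnn(train), test))
    return sum(scores) / k
```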
3D Tele-Immersion (3DTI) systems allow geographically distributed users to interact in a virtual world using their "live" 3D models. The capture, reconstruction, transfer, and rendering of these models introduce significant latency into the system. Implicit Latency (ℒ') can be estimated using system clocks by measuring the time from when the data is received from the RGB-D camera until the request to render the result. The Observed Latency (ℒ) between a real-world event and that event being rendered on the display cannot be accurately represented by ℒ', since ℒ' ignores the time taken to capture the scene, update the display, and so on. In this paper, a Visual Pattern based Latency Estimation (VPLE) approach is introduced to calculate the real-world visual latency of a system without the need for any custom hardware. VPLE generates a constantly changing pattern that is captured and rendered by the 3DTI system. An external observer records both the pattern and the rendered result at high frame rates, and ℒ is estimated by calculating the difference between the generated and rendered patterns. VPLE is extended to allow ℒ estimation between geographically distributed sites. Evaluations show that the accuracy of VPLE depends on the refresh rate of the pattern and is within 4 ms. ℒ of a distributed 3DTI system implemented on the GPU is significantly lower than that of the CPU implementation and is comparable to video streaming. It is also shown that the ℒ' estimates for GPU-based 3DTI implementations are off by almost 100% compared to ℒ.
S. Raghuraman, K. Bahirat, and B. Prabhakaran. "A Visual Latency Estimator for 3D Tele-Immersion." Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys '17), June 2017. DOI: 10.1145/3083187.3084019
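The latency arithmetic behind a pattern-based estimate can be illustrated with a toy example: if the pattern is a counter advancing at a known rate, and an external camera photographs both the live pattern and its 3DTI-rendered copy in the same frame, the counter gap gives the observed latency. The rate and counter values below are invented, not measurements from the paper.

```python
# Toy observed-latency calculation from a counter-based visual pattern.
PATTERN_RATE_HZ = 240          # how often the on-screen pattern advances (assumed)

def observed_latency_ms(live_count, rendered_count, rate_hz=PATTERN_RATE_HZ):
    """Latency implied by the counter difference seen in one camera frame."""
    return (live_count - rendered_count) * 1000.0 / rate_hz

# e.g. the camera frame shows the live pattern at tick 4812 while the rendered
# view still shows tick 4783 -> roughly 121 ms of observed latency.
print(f"{observed_latency_ms(4812, 4783):.1f} ms")
```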
Real-time entertainment services such as streaming audiovisual content over the open, unmanaged Internet now account for more than 70% of Internet traffic during peak periods. More and more such bandwidth-hungry applications and services are being proposed, including immersive media services such as virtual reality and, specifically, omnidirectional/360-degree video. The adaptive streaming of omnidirectional video over HTTP imposes an important challenge on today's video delivery infrastructures, which calls for dedicated, thoroughly designed techniques for content generation, delivery, and consumption. This paper describes the usage of tiles, as specified within modern video codecs such as HEVC/H.265 and VP9, to enable bandwidth-efficient adaptive streaming of omnidirectional video over HTTP, and we define various streaming strategies. We further propose the parameters and characteristics of a dataset for omnidirectional video and instantiate it as an example to evaluate various aspects of such an ecosystem, namely bitrate overhead, bandwidth requirements, and quality in terms of viewport PSNR. The results indicate bitrate savings from 40% (in a realistic scenario with recorded head movements from real users) up to 65% (in an ideal scenario with a centered/fixed viewport) and serve as a baseline and guideline for advanced techniques, including the outline of a research roadmap for the near future.
M. Graf, C. Timmerer, and Christopher Müller. "Towards Bandwidth Efficient Adaptive Streaming of Omnidirectional Video over HTTP: Design, Implementation, and Evaluation." Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys '17), June 2017. DOI: 10.1145/3083187.3084016
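A simplified sketch of viewport-adaptive tile selection, the core idea behind the bitrate savings reported above, is given below: tiles covering the current viewport are fetched at a high bitrate and the remainder at a low one. The 6x4 tiling, the bitrates, and the overlap test are illustrative assumptions, not the paper's configuration.

```python
# Viewport-dependent quality assignment for a tiled equirectangular frame.
TILE_COLS, TILE_ROWS = 6, 4          # assumed tiling of the 360° frame
HIGH_KBPS, LOW_KBPS = 4000, 500      # assumed per-tile bitrates

def tiles_in_viewport(yaw_deg, pitch_deg, hfov=90, vfov=90):
    """Return (col, row) indices of tiles that may overlap the viewport."""
    half_w = hfov / 2 + 180 / TILE_COLS     # half viewport + half tile width
    half_h = vfov / 2 + 90 / TILE_ROWS      # half viewport + half tile height
    selected = []
    for col in range(TILE_COLS):
        for row in range(TILE_ROWS):
            tile_yaw = (col + 0.5) * 360 / TILE_COLS - 180
            tile_pitch = 90 - (row + 0.5) * 180 / TILE_ROWS
            dyaw = (tile_yaw - yaw_deg + 180) % 360 - 180   # wrap at ±180°
            if abs(dyaw) <= half_w and abs(tile_pitch - pitch_deg) <= half_h:
                selected.append((col, row))
    return selected

def bitrate_plan(yaw_deg, pitch_deg):
    visible = set(tiles_in_viewport(yaw_deg, pitch_deg))
    return {(c, r): (HIGH_KBPS if (c, r) in visible else LOW_KBPS)
            for c in range(TILE_COLS) for r in range(TILE_ROWS)}

plan = bitrate_plan(yaw_deg=0, pitch_deg=0)   # user looking straight ahead
print(sum(plan.values()), "kbps total for this segment")
```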
Understanding how observers watch visual stimuli such as images and videos has helped the multimedia encoding, transmission, quality assessment, and rendering communities immensely to learn which regions are important to an observer and to provide an optimal quality of experience. The problem is even more pressing in the case of 360-degree stimuli, considering that much or part of the content might not be seen by observers at all, while other regions may be extraordinarily important. Attention studies in this area have, however, been missing, mainly due to the lack of a dataset and guidelines to evaluate and compare visual attention/saliency in such scenarios. In this work, we present a dataset of sixty different 360-degree images, each watched by at least 40 observers. Additionally, we provide guidelines and tools to the community regarding the procedure for evaluating and comparing saliency in omnidirectional images. Some basic image- and observer-agnostic viewing characteristics, such as the variation of exploration strategies with time and expertise, and the effect of eye movement within the viewport, are also explored. The dataset and tools are made freely available to the community and are expected to promote reproducible research for all future work on computational modeling of attention in 360-degree scenarios.
Yashas Rai, Jesús Gutiérrez, and P. Le Callet. "A Dataset of Head and Eye Movements for 360 Degree Images." Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys '17), June 2017. DOI: 10.1145/3083187.3083218
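The sketch below shows roughly how recorded gaze points can be turned into a saliency map for one equirectangular image, the kind of processing the accompanying tools support. The output resolution, the Gaussian sigma, and the gaze format (longitude/latitude in degrees) are assumptions for illustration, not the dataset's actual specification.

```python
# Fixation-density (saliency) map from gaze samples on an equirectangular image.
import numpy as np
from scipy.ndimage import gaussian_filter

WIDTH, HEIGHT = 2048, 1024           # assumed equirectangular output resolution

def saliency_map(gaze_points, sigma_px=35):
    """gaze_points: iterable of (longitude_deg, latitude_deg) fixations."""
    counts = np.zeros((HEIGHT, WIDTH))
    for lon, lat in gaze_points:
        x = int((lon + 180) / 360 * (WIDTH - 1))     # longitude -> column
        y = int((90 - lat) / 180 * (HEIGHT - 1))     # latitude  -> row
        counts[y, x] += 1
    blurred = gaussian_filter(counts, sigma=sigma_px)
    return blurred / blurred.max() if blurred.max() > 0 else blurred

heatmap = saliency_map([(0, 0), (10, 5), (-20, -3)])
```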