Non-Line-of-Sight Time-Difference-of-Arrival Localization with Explicit Inclusion of Geometry Information in a Simple Diffraction Scenario
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287166
Sönke Südbeck, Thomas C. Krause, J. Ostermann
Time-difference-of-arrival (TDOA) localization is a technique for finding the position of a wave-emitting object, e.g., a car horn. Many algorithms have been proposed for TDOA localization under line-of-sight (LOS) conditions. In the non-line-of-sight (NLOS) case, the performance of these algorithms usually deteriorates. Techniques exist to reduce the error introduced by the NLOS condition, but they do not directly take the geometry of the surroundings into account. In this paper, an NLOS TDOA localization approach for a simple diffraction scenario is described, which incorporates information about the surroundings into the equation system. An experiment with three different loudspeaker positions was conducted to validate the proposed method. The localization error was less than 6.2% of the distance from the source to the closest microphone position. Simulations show that the proposed method attains the Cramér-Rao lower bound for sufficiently low TDOA noise levels.
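To make the idea concrete, the following minimal sketch solves a TDOA system in which one microphone is shadowed and its propagation path is modeled as diffraction around a known edge, so the scene geometry enters the equation system directly. The geometry, noise level, and solver are illustrative assumptions, not the paper's setup:

import numpy as np
from scipy.optimize import least_squares

C = 343.0                                       # speed of sound, m/s
mics = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
edge = np.array([2.0, 2.5])                     # hypothetical diffracting edge (corner)
nlos = [False, False, False, True]              # last mic is shadowed by the obstacle

def path(src, i):
    # LOS: direct distance; NLOS: source -> edge -> microphone (diffraction path)
    if nlos[i]:
        return np.linalg.norm(src - edge) + np.linalg.norm(edge - mics[i])
    return np.linalg.norm(src - mics[i])

def residuals(src, tdoas):
    d0 = path(src, 0)                           # mic 0 is the reference
    return [(path(src, i) - d0) / C - tdoas[i - 1] for i in range(1, len(mics))]

true_src = np.array([1.0, 3.0])
tdoas = np.array(residuals(true_src, np.zeros(3)))   # noise-free TDOAs
tdoas += np.random.normal(0.0, 1e-5, tdoas.shape)    # small timing noise
est = least_squares(residuals, x0=np.array([2.0, 2.0]), args=(tdoas,)).x
print("estimated source:", est)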
{"title":"Non-Line-of-Sight Time-Difference-of-Arrival Localization with Explicit Inclusion of Geometry Information in a Simple Diffraction Scenario","authors":"Sönke Südbeck, Thomas C. Krause, J. Ostermann","doi":"10.1109/MMSP48831.2020.9287166","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287166","url":null,"abstract":"Time-difference-of-arrival (TDOA) localization is a technique for finding the position of a wave emitting object, e.g., a car horn. Many algorithms have been proposed for TDOA localization under line-of-sight (LOS) conditions. In the non-line-of-sight (NLOS) case the performance of these algorithms usually deteriorates. There are techniques to reduce the error introduced by the NLOS condition, which, however, do not directly take into account information on the geometry of the surroundings. In this paper a NLOS TDOA localization approach for a simple diffraction scenario is described, which includes information on the surroundings into the equation system. An experiment with three different loudspeaker positions was conducted to validate the proposed method. The localization error was less than 6.2 % of the distance from the source to the closest microphone position. Simulations show that the proposed method attains the Cramer-Rao-Lower-Bound for low enough TDOA noise levels.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129147058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Hybrid Layered Image Compressor with Deep-Learning Technique
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287130
Wei‐Cheng Lee, Chih-Peng Chang, Wen-Hsiao Peng, H. Hang
This paper presents a detailed description of NCTU’s proposal for learning-based image compression, in response to the JPEG AI Call for Evidence Challenge. The proposed compression system features a VVC intra codec as the base layer and a learning-based residual codec as the enhancement layer. The latter aims to refine the quality of the base layer by sending a latent residual signal. In particular, a base-layer-guided attention module is employed to focus the residual extraction on critical high-frequency areas. To reconstruct the image, this latent residual signal is combined with the base-layer output in a non-linear fashion by a neural-network-based synthesizer. The proposed method shows rate-distortion performance comparable to single-layer VVC intra in terms of common objective metrics, while in some cases presenting better subjective quality, particularly at high compression ratios. It consistently outperforms HEVC intra, JPEG 2000, and JPEG. The proposed system comprises 18M network parameters in 16-bit floating-point format. On average, encoding an image on an Intel Xeon Gold 6154 takes about 13.5 minutes, with the VVC base layer dominating the encoding runtime. Decoding, in contrast, is dominated by the residual decoder and the synthesizer, requiring 31 seconds per image.
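The layered reconstruction can be pictured with a small sketch; the network below is a toy stand-in for the non-linear synthesizer that merges the decoded base layer with the latent residual (layer sizes and structure are assumptions, not NCTU's architecture):

import torch
import torch.nn as nn

class Synthesizer(nn.Module):
    """Toy non-linear synthesizer: refines the base layer using the residual."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1),
        )
    def forward(self, base, residual):
        # Non-linear combination instead of a plain base + residual sum
        return base + self.net(torch.cat([base, residual], dim=1))

base = torch.rand(1, 3, 256, 256)       # stands in for the VVC-intra base layer
residual = torch.rand(1, 3, 256, 256)   # stands in for the decoded latent residual
out = Synthesizer()(base, residual)     # enhanced reconstruction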
{"title":"A Hybrid Layered Image Compressor with Deep-Learning Technique","authors":"Wei‐Cheng Lee, Chih-Peng Chang, Wen-Hsiao Peng, H. Hang","doi":"10.1109/MMSP48831.2020.9287130","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287130","url":null,"abstract":"This paper presents a detailed description of NCTU’s proposal for learning-based image compression, in response to the JPEG AI Call for Evidence Challenge. The proposed compression system features a VVC intra codec as the base layer and a learning-based residual codec as the enhancement layer. The latter aims to refine the quality of the base layer via sending a latent residual signal. In particular, a base-layer-guided attention module is employed to focus the residual extraction on critical high-frequency areas. To reconstruct the image, this latent residual signal is combined with the base-layer output in a non-linear fashion by a neural-network-based synthesizer. The proposed method shows comparable rate-distortion performance to single-layer VVC intra in terms of common objective metrics, but presents better subjective quality particularly at high compression ratios in some cases. It consistently outperforms HEVC intra, JPEG 2000, and JPEG. The proposed system incurs 18M network parameters in 16-bit floating-point format. On average, the encoding of an image on Intel Xeon Gold 6154 takes about 13.5 minutes, with the VVC base layer dominating the encoding runtime. On the contrary, the decoding is dominated by the residual decoder and the synthesizer, requiring 31 seconds per image.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132874692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Controlled Feature Adjustment for Image Processing and Synthesis
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287164
Eduardo Martínez-Enríquez, J. Portilla
Feature adjustment, understood as the process of modifying global features of given signals at will, is of cardinal importance for several signal processing applications, such as enhancement, restoration, style transfer, and synthesis. Despite this, it has not yet been approached from a general, theory-grounded perspective. This work proposes a new conceptual and practical methodology that we term Controlled Feature Adjustment (CFA). Given a set of parametric global features (scalar functions of discrete signals), CFA provides methods for (1) constructing a related set of deterministically decoupled features, and (2) adjusting these new features in a controlled way, i.e., each one independently of the others. We illustrate the application of CFA by devising a spectrally-based, hierarchically decoupled feature set and applying it to obtain types of image synthesis that are not achievable using traditional (coupled) feature sets.
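As a toy illustration of deterministic decoupling (not the paper's spectral feature set): the sample mean and standard deviation of a signal are coupled under plain scaling, but rescaling the centered signal decouples them, so each feature can be adjusted independently:

import numpy as np

x = np.random.rand(10_000)
target_std = 0.5                          # adjust this feature only

# Decoupled adjustment: rescale the centered signal, then restore the mean
y = x.mean() + (target_std / x.std()) * (x - x.mean())

assert np.isclose(y.mean(), x.mean())     # first feature untouched
assert np.isclose(y.std(), target_std)    # second feature hits its target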
{"title":"Controlled Feature Adjustment for Image Processing and Synthesis","authors":"Eduardo Martínez-Enríquez, J. Portilla","doi":"10.1109/MMSP48831.2020.9287164","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287164","url":null,"abstract":"Feature adjustment, understood as the process aimed at modifying at will global features of given signals, has cardinal importance for several signal processing applications, such as enhancement, restoration, style transfer, and synthesis. Despite of this, it has not yet been approached from a general, theory-grounded, perspective. This work proposes a new conceptual and practical methodology that we term Controlled Feature Adjustment (CFA). CFA provides methods for, given a set of parametric global features (scalar functions of discrete signals), (1) constructing a related set of deterministically decoupled features, and (2) adjusting these new features in a controlled way, i.e., each one independently of the others. We illustrate the application of CFA by devising a spectrally-based hierarchically decoupled feature set and applying it to obtain different types of image synthesis that are not achievable using traditional (coupled) feature sets.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131420127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable Mesh Representation for Depth from Breakpoint-Adaptive Wavelet Coding
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287145
Yue Li, R. Mathew, D. Taubman
A highly scalable and compact representation of depth data is required in many applications, and it is especially critical for plenoptic multiview image compression frameworks that use depth information for novel view synthesis and inter-view prediction. Efficiently coding depth data can be difficult, as it contains sharp discontinuities. Breakpoint-adaptive discrete wavelet transforms (BPA-DWT), currently being standardized as part of the JPEG 2000 Part 17 extensions, have been found suitable for coding spatial media with hard discontinuities. In this paper, we explore a modification to the original BPA-DWT, replacing the traditional constant extrapolation strategy with a newly proposed affine extrapolation for reconstructing depth data in the vicinity of discontinuities. We also present a depth reconstruction scheme that can directly decode the BPA-DWT coefficients and breakpoints onto a compact and scalable mesh-based representation, which has many potential benefits over a sample-based description. For depth-compensated view prediction, our proposed triangular mesh representation of the depth data is a natural fit for modern graphics architectures.
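The gap between the two extrapolation strategies is easiest to see on a sloped (ramp-like) depth region next to a breakpoint; a minimal 1D sketch, illustrative rather than the actual Part 17 filter bank:

import numpy as np

depth = np.array([1.0, 2.0, 3.0, 4.0, 9.0, 9.0])  # sloped surface, then a depth edge

# Predict depth[3] using only samples on its own side of the breakpoint
constant = depth[2]                        # repeat the nearest sample: 3.0, error 1.0
affine = depth[2] + (depth[2] - depth[1])  # continue the local slope: 4.0, exact

# On non-fronto-parallel surfaces the affine prediction tracks the ramp,
# while constant extrapolation leaves a residual that must be coded.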
{"title":"Scalable Mesh Representation for Depth from Breakpoint-Adaptive Wavelet Coding","authors":"Yue Li, R. Mathew, D. Taubman","doi":"10.1109/MMSP48831.2020.9287145","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287145","url":null,"abstract":"A highly scalable and compact representation of depth data is required in many applications, and it is especially critical for plenoptic multiview image compression frameworks that use depth information for novel view synthesis and interview prediction. Efficiently coding depth data can be difficult as it contains sharp discontinuities. Breakpoint-adaptive discrete wavelet transforms (BPA-DWT) currently being standardized as part of JPEG 2000 Part-17 extensions have been found suitable for coding spatial media with hard discontinuities. In this paper, we explore a modification to the original BPA-DWT by replacing the traditional constant extrapolation strategy with the newly proposed affine extrapolation for reconstructing depth data in the vicinity of discontinuities. We also present a depth reconstruction scheme that can directly decode the BPA-DWT coefficients and breakpoints onto a compact and scalable mesh-based representation which has many potential benefits over the sample-based description. For performing depth compensated view prediction, our proposed triangular mesh representation of the depth data is a natural fit for modern graphics architectures.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124784874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ABR prediction using supervised learning algorithms
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287123
Hiba Yousef, J. L. Feuvre, Alexandre Storelli
With the massive increase of video traffic over the Internet, HTTP adaptive streaming has become the main technique for infotainment content delivery. In this context, many bandwidth adaptation (ABR) algorithms have emerged, each aiming to improve the user QoE using different session information, e.g., TCP throughput, buffer occupancy, or download time. Notwithstanding the differences in their implementations, they mostly use the same inputs to adapt to the varying conditions of the media session. In this paper, we show that it is possible to predict the bitrate decision of any ABR algorithm using machine learning techniques, and supervised classification in particular. This approach has the benefit of being generic: it does not require any knowledge of the player's ABR algorithm itself, but assumes that, whatever the logic behind it, the algorithm uses a common set of input features. Machine learning feature selection then makes it possible to identify the relevant features and train the model on real observations. We test our approach using simulations of well-known ABR algorithms, then verify the results on commercial closed-source players, using different realistic VoD and live data sets. The results show that both Random Forest and Gradient Boosting achieve very high prediction accuracy, outperforming the other ML classifiers tested.
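A minimal sketch of the idea using scikit-learn; the feature set and the synthetic "player logic" used to generate labels are illustrative assumptions, not the paper's datasets:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
ladder = np.array([0.5, 1.0, 2.5, 5.0, 8.0])        # bitrate ladder, Mbps

# Session features: estimated throughput (Mbps), buffer level (s), last quality index
X = np.column_stack([rng.uniform(0.3, 10, 5000),
                     rng.uniform(0, 30, 5000),
                     rng.integers(0, 5, 5000)])

# Stand-in player logic: highest bitrate below 80% of throughput,
# dropping one level when the buffer is nearly empty
y = np.maximum(np.searchsorted(ladder, 0.8 * X[:, 0]) - 1, 0)
y = np.where(X[:, 1] < 2, np.maximum(y - 1, 0), y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))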
{"title":"ABR prediction using supervised learning algorithms","authors":"Hiba Yousef, J. L. Feuvre, Alexandre Storelli","doi":"10.1109/MMSP48831.2020.9287123","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287123","url":null,"abstract":"With the massive increase of video traffic over the internet, HTTP adaptive streaming has now become the main technique for infotainment content delivery. In this context, many bandwidth adaptation algorithms have emerged, each aiming to improve the user QoE using different session information e.g. TCP throughput, buffer occupancy, download time... Notwithstanding the difference in their implementation, they mostly use the same inputs to adapt to the varying conditions of the media session. In this paper, we show that it is possible to predict the bitrate decision of any ABR algorithm, thanks to machine learning techniques, and supervised classification in particular. This approach has the benefit of being generic, hence it does not require any knowledge about the player ABR algorithm itself, but assumes that whatever the logic behind, it will use a common set of input features. Then, using machine learning feature selection, it is possible to predict the relevant features and then train the model over real observation. We test our approach using simulations on well-known ABR algorithms, then we verify the results on commercial closed-source players, using different VoD and Live realistic data sets. The results show that both Random Forest and Gradient Boosting achieve a very high prediction accuracy among other ML-classifier.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122764206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating the Performance of Apple’s Low-Latency HLS
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287117
Kerem Durak, Mehmet N. Akcay, Yigit K. Erinc, Boran Pekel, A. Begen
At its annual developers conference in June 2019, Apple announced a backwards-compatible extension to its popular HTTP Live Streaming (HLS) protocol to enable low-latency live streaming. This extension offers new features such as the ability to generate partial segments, use playlist delta updates, block playlist reloads and provide rendition reports. Compared to traditional HLS, these features require new capabilities on the origin servers and the caches inside a content delivery network. While HLS has been known to perform well at scale, its low-latency extension is likely to consume considerable server and network resources, which may raise concerns about its scalability. In this paper, we make the first attempt to understand how this new extension works and performs. We also provide a 1:1 comparison against the low-latency DASH approach, the competing low-latency solution developed as an open standard.
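For orientation, these features map onto media-playlist tags from the HLS specification roughly as in the illustrative excerpt below (URIs, durations and numbers are made up): partial segments (EXT-X-PART), delta updates (CAN-SKIP-UNTIL), blocking reload (CAN-BLOCK-RELOAD) and rendition reports (EXT-X-RENDITION-REPORT).

#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.0,CAN-SKIP-UNTIL=12.0
#EXT-X-PART-INF:PART-TARGET=0.333
#EXTINF:2.000,
segment100.m4s
#EXT-X-PART:DURATION=0.333,URI="segment101.part0.m4s",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.333,URI="segment101.part1.m4s"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="segment101.part2.m4s"
#EXT-X-RENDITION-REPORT:URI="720p.m3u8",LAST-MSN=101,LAST-PART=1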
{"title":"Evaluating the Performance of Apple’s Low-Latency HLS","authors":"Kerem Durak, Mehmet N. Akcay, Yigit K. Erinc, Boran Pekel, A. Begen","doi":"10.1109/MMSP48831.2020.9287117","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287117","url":null,"abstract":"In its annual developers conference in June 2019, Apple has announced a backwards-compatible extension to its popular HTTP Live Streaming (HLS) protocol to enable low-latency live streaming. This extension offers new features such as the ability to generate partial segments, use playlist delta updates, block playlist reload and provide rendition reports. Compared to the traditional HLS, these features require new capabilities on the origin servers and the caches inside a content delivery network. While HLS has been known to perform great at scale, its low-latency extension is likely to consume considerable server and network resources, and this may raise concerns about its scalability. In this paper, we make the first attempt to understand how this new extension works and performs. We also provide a 1:1 comparison against the low-latency DASH approach, which is the competing low-latency solution developed as an open standard.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"237 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121500281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Few-Shot Object Detection in Real Life: Case Study on Auto-Harvest
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287053
Kévin Riou, Jingwen Zhu, Suiyi Ling, Mathis Piquet, V. Truffault, P. Callet
Confinement during COVID-19 has had serious effects on agriculture all over the world. As one of the more efficient solutions, mechanical/auto-harvesting based on object detection and robotic harvesters has become an urgent need. Within an auto-harvest system, a robust few-shot object detection model is one of the bottlenecks, since the system must deal with new vegetable/fruit categories and collecting large-scale annotated datasets for every novel category is expensive. Many few-shot object detection models have been developed by the community. Yet whether they can be employed directly for real-life agricultural applications is still questionable, as there is a context gap between the commonly used training datasets and the images collected in real-life agricultural scenarios. To this end, we present a novel cucumber dataset and propose two data augmentation strategies that help to bridge the context gap. Experimental results show that 1) the state-of-the-art few-shot object detection model performs poorly on the novel ‘cucumber’ category; and 2) the proposed augmentation strategies outperform the commonly used ones.
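One common way to bridge such a context gap, shown here as a purely illustrative sketch and not necessarily the paper's two strategies, is to composite object crops onto target-domain backgrounds with photometric jitter, obtaining box annotations for free:

import numpy as np

rng = np.random.default_rng(0)

def paste_augment(background, crop):
    """Paste an object crop at a random location on a target-domain background."""
    img = background.copy()
    h, w = crop.shape[:2]
    y = rng.integers(0, img.shape[0] - h)
    x = rng.integers(0, img.shape[1] - w)
    img[y:y + h, x:x + w] = np.clip(crop * rng.uniform(0.7, 1.3), 0, 255)  # brightness jitter
    box = (x, y, x + w, y + h)            # ground-truth box comes for free
    return img, box

greenhouse = rng.integers(0, 255, (480, 640, 3)).astype(np.float32)  # stand-in background
cucumber = rng.integers(0, 255, (60, 30, 3)).astype(np.float32)      # stand-in object crop
sample, bbox = paste_augment(greenhouse, cucumber)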
{"title":"Few-Shot Object Detection in Real Life: Case Study on Auto-Harvest","authors":"Kévin Riou, Jingwen Zhu, Suiyi Ling, Mathis Piquet, V. Truffault, P. Callet","doi":"10.1109/MMSP48831.2020.9287053","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287053","url":null,"abstract":"Confinement during COVID-19 has caused serious effects on agriculture all over the world. As one of the efficient solutions, mechanical harvest/auto-harvest that is based on object detection and robotic harvester becomes an urgent need. Within the auto-harvest system, robust few-shot object detection model is one of the bottlenecks, since the system is required to deal with new vegetable/fruit categories and the collection of large-scale annotated datasets for all the novel categories is expensive. There are many few-shot object detection models that were developed by the community. Yet whether they could be employed directly for real life agricultural applications is still questionable, as there is a context-gap between the commonly used training datasets and the images collected in real life agricultural scenarios. To this end, in this study, we present a novel cucumber dataset and propose two data augmentation strategies that help to bridge the context-gap. Experimental results show that 1) the state-of-the-art few-shot object detection model performs poorly on the novel ‘cucumber’ category; and 2) the proposed augmentation strategies outperform the commonly used ones.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"413 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115953895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Profiling Actions for Sport Video Summarization: An attention signal analysis
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287062
Melissa Sanabria, F. Precioso, Thomas Menguy
Currently, many human operators in broadcast companies select which actions should belong to a summary based on multiple rules they have built from their own experience using different sources of information. These rules define the different profiles of actions of interest that help the operator generate better customized summaries. Most of these profiles do not rely directly on the broadcast video content but rather exploit metadata describing the course of the match. In this paper, we show how the signals produced by the attention layer of a recurrent neural network can be seen as a learned representation of these action profiles and provide a new tool to support operators’ work. The results on soccer matches show the capacity of our approach to transfer knowledge between datasets from different broadcasting companies and different leagues, and the ability of the attention layer to learn meaningful action profiles.
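A minimal sketch of reading attention weights as action profiles; the dimensions, GRU backbone and prediction head are assumptions, not the paper's model:

import torch
import torch.nn as nn

class AttentionSummarizer(nn.Module):
    def __init__(self, feat_dim=32, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)
        self.head = nn.Linear(hidden, 1)       # e.g., a "belongs in summary" logit

    def forward(self, actions):                # actions: (batch, n_actions, feat_dim)
        h, _ = self.rnn(actions)
        w = torch.softmax(self.score(h).squeeze(-1), dim=1)  # per-action attention
        ctx = torch.bmm(w.unsqueeze(1), h).squeeze(1)        # attention-weighted context
        return self.head(ctx), w               # w can be read as a learned action profile

logit, profile = AttentionSummarizer()(torch.rand(4, 50, 32))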
{"title":"Profiling Actions for Sport Video Summarization: An attention signal analysis","authors":"Melissa Sanabria, F. Precioso, Thomas Menguy","doi":"10.1109/MMSP48831.2020.9287062","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287062","url":null,"abstract":"Currently, in broadcast companies many human operators select which actions should belong to the summary based on multiple rules they have built upon their own experience using different sources of information. These rules define the different profiles of actions of interest that help the operator to generate better customized summaries. Most of these profiles do not directly rely on broadcast video content but rather exploit metadata describing the course of the match. In this paper, we show how the signals produced by the attention layer of a recurrent neural network can be seen as a learned representation of these action profiles and provide a new tool to support operators’ work. The results in soccer matches show the capacity of our approach to transfer knowledge between datasets from different broadcasting companies, from different leagues, and the ability of the attention layer to learn meaningful action profiles.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125857816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Suitability of Texture Vibrations Based on Visually Perceived Virtual Textures in Bimodal and Trimodal Conditions
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287066
U. A. Alma, M. Altinsoy
In this study, the suitability of recorded and simplified texture vibrations is evaluated with respect to visual textures displayed on a screen. The tested vibrations are 1) recorded vibrations, 2) single sinusoids, and 3) band-limited white noise, as used in previous work. In that former study, the suitability of texture vibrations was evaluated against real textures explored by touch. Nevertheless, texture vibrations should also be tested against texture images, considering that users interact only with virtual (visual) objects on touch devices. Thus, the aim of this study is to assess the congruence between vibrotactile feedback and texture images in the absence and presence of auditory feedback. Two types of auditory feedback were used for the trimodal test, presented at different loudness levels, so that the most plausible combination of vibrotactile and audio stimuli for exploring visual textures can be determined. Based on the psychophysical tests, the similarity ratings of the texture vibrations were not significantly different from each other in the bimodal condition, in contrast to the former study. In the trimodal judgments, synthesized sound influenced the similarity ratings significantly, while touch sound did not affect the perceived similarity.
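The two simplified stimulus types are straightforward to synthesize; a sketch with assumed parameter values (250 Hz is near the peak of vibrotactile sensitivity, but the study's exact settings are not reproduced here):

import numpy as np

fs, dur = 8000, 1.0
t = np.arange(int(fs * dur)) / fs

sinusoid = np.sin(2 * np.pi * 250 * t)               # single-sinusoid stimulus

noise = np.random.randn(t.size)                      # white noise
spec = np.fft.rfft(noise)
freqs = np.fft.rfftfreq(t.size, 1 / fs)
spec[(freqs < 100) | (freqs > 400)] = 0              # keep an assumed 100-400 Hz band
bandlimited = np.fft.irfft(spec, n=t.size)           # band-limited white noise stimulus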
{"title":"The Suitability of Texture Vibrations Based on Visually Perceived Virtual Textures in Bimodal and Trimodal Conditions","authors":"U. A. Alma, M. Altinsoy","doi":"10.1109/MMSP48831.2020.9287066","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287066","url":null,"abstract":"In this study, suitability of recorded and simplified texture vibrations are evaluated according to visual textures displayed on a screen. The tested vibrations are 1) recorded vibration, 2) single sinusoids, and 3) band-limited white noise which were used in the previous work. In the former study, suitability of texture vibrations were evaluated according to real textures by touching. Nevertheless, texture vibrations should be also tested based on texture images considering the fact that users interact with only virtual (visual) objects on touch devices. Thus, the aim of this study is to assess the congruence between the vibrotactile feedback and the texture images with the absence and the presence of auditory feedback. Two types of auditory feedback were used for the trimodal test, and they were tested in different loudness levels. Therefore, the most plausible combination of vibrotactile and audio stimuli when exploring the visual textures can be determined. Based on the psychophysical tests, the similarity ratings of the texture vibrations were not concluded significantly different from each other in bimodal condition as opposed to the former study. In the trimodal judgments, synthesized sound influenced the similarity ratings significantly while touch sound did not affect the perceived similarity.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114308865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing Rate-Distortion Performance of Motion Compensated Wavelet Lifting with Denoised Prediction and Update
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287070
Daniela Lanz, A. Kaup
Efficient lossless coding of medical volume data with a temporal axis can be achieved by motion compensated wavelet lifting. As a side benefit, a scalable bit stream is generated, which allows displaying the data at different resolution layers, a feature in high demand for telemedicine applications. Additionally, the similarity of the temporal base layer to the input sequence is preserved by the use of motion compensated temporal filtering. However, for medical sequences the overall rate is increased due to the specific noise characteristics of the data. The use of denoising filters inside the lifting structure can improve the compression efficiency significantly without endangering the perfect-reconstruction property, but the design of an optimal filter is a crucial task. In this paper, we present a new method for selecting the optimal strength of a given denoising filter in a rate-distortion sense. This allows the required rate to be minimized based on a single encoder input parameter that controls the requested distortion of the temporal base layer.
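The key property, that a filter placed inside the lifting steps cannot break perfect reconstruction because the decoder undoes each step exactly, shows up in a minimal 1D sketch (the moving-average stand-in is illustrative, not the paper's denoising filter; true integer lossless coding would additionally round the lifting steps):

import numpy as np

def denoise(x, strength=1.0):
    # Stand-in denoising filter; any deterministic filter works here
    k = np.array([strength, 2.0, strength]); k /= k.sum()
    return np.convolve(x, k, mode="same")

def forward(x):
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    d = odd - denoise(even)            # prediction step on a denoised reference
    s = even + 0.5 * d                 # update step
    return s, d

def inverse(s, d):
    even = s - 0.5 * d                 # undo the update step
    odd = d + denoise(even)            # undo the prediction step exactly
    x = np.empty(s.size + d.size); x[0::2], x[1::2] = even, odd
    return x

x = np.random.rand(64)
s, d = forward(x)
assert np.allclose(inverse(s, d), x)   # perfect reconstruction is preserved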
{"title":"Optimizing Rate-Distortion Performance of Motion Compensated Wavelet Lifting with Denoised Prediction and Update","authors":"Daniela Lanz, A. Kaup","doi":"10.1109/MMSP48831.2020.9287070","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287070","url":null,"abstract":"Efficient lossless coding of medical volume data with temporal axis can be achieved by motion compensated wavelet lifting. As side benefit, a scalable bit stream is generated, which allows for displaying the data at different resolution layers, highly demanded for telemedicine applications. Additionally, the similarity of the temporal base layer to the input sequence is preserved by the use of motion compensated temporal filtering. However, for medical sequences the overall rate is increased due to the specific noise characteristics of the data. The use of denoising filters inside the lifting structure can improve the compression efficiency significantly without endangering the property of perfect reconstruction. However, the design of an optimum filter is a crucial task. In this paper, we present a new method for selecting the optimal filter strength for a certain denoising filter in a rate-distortion sense. This allows to minimize the required rate based on a single input parameter for the encoder to control the requested distortion of the temporal base layer.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133152222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}