2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP): Latest Papers

Non-Line-of-Sight Time-Difference-of-Arrival Localization with Explicit Inclusion of Geometry Information in a Simple Diffraction Scenario
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287166
Sönke Südbeck, Thomas C. Krause, J. Ostermann
Time-difference-of-arrival (TDOA) localization is a technique for finding the position of a wave-emitting object, e.g., a car horn. Many algorithms have been proposed for TDOA localization under line-of-sight (LOS) conditions. In the non-line-of-sight (NLOS) case, the performance of these algorithms usually deteriorates. There are techniques to reduce the error introduced by the NLOS condition; however, they do not directly take into account information on the geometry of the surroundings. In this paper, an NLOS TDOA localization approach for a simple diffraction scenario is described, which incorporates information on the surroundings into the equation system. An experiment with three different loudspeaker positions was conducted to validate the proposed method. The localization error was less than 6.2% of the distance from the source to the closest microphone position. Simulations show that the proposed method attains the Cramér-Rao lower bound (CRLB) for sufficiently low TDOA noise levels.
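As a rough illustration of how geometry can enter a TDOA equation system, the Python sketch below assumes a 2-D scene in which sound reaches one microphone only by diffraction around a single known edge point, so its modeled path length is |s − e| + |e − m| instead of |s − m|. The source is then found by brute-force grid search over squared TDOA residuals. The microphone layout, edge point, occlusion flags, speed of sound and grid are all hypothetical, and this is not the authors' estimator.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)

def path_length(src, mic, edge=None):
    """Direct distance, or the detour source -> diffraction edge -> mic."""
    if edge is None:
        return math.dist(src, mic)
    return math.dist(src, edge) + math.dist(edge, mic)

def model_tdoas(src, mics, edges):
    """TDOAs relative to microphone 0; edges[i] is None for a line-of-sight mic."""
    times = [path_length(src, m, e) / SPEED_OF_SOUND for m, e in zip(mics, edges)]
    return [t - times[0] for t in times[1:]]

def locate(measured, mics, edges, x_range, y_range, step):
    """Brute-force grid search minimizing the squared TDOA residual."""
    nx = int(round((x_range[1] - x_range[0]) / step)) + 1
    ny = int(round((y_range[1] - y_range[0]) / step)) + 1
    best, best_cost = None, float("inf")
    for ix in range(nx):
        for iy in range(ny):
            cand = (x_range[0] + ix * step, y_range[0] + iy * step)
            pred = model_tdoas(cand, mics, edges)
            cost = sum((p - m) ** 2 for p, m in zip(pred, measured))
            if cost < best_cost:
                best, best_cost = cand, cost
    return best

# Hypothetical layout: four microphones on a square; mic 3 is assumed to be
# hidden behind an obstacle whose diffracting corner is at (2, 2).
mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
edges = [None, None, None, (2.0, 2.0)]
true_src = (1.0, 3.0)
measured = model_tdoas(true_src, mics, edges)  # noise-free synthetic TDOAs
est = locate(measured, mics, edges, x_range=(0.0, 5.0), y_range=(0.0, 5.0), step=0.1)
```

A grid search keeps the sketch dependency-free and avoids initialization issues; a practical system would more likely refine the estimate with an iterative least-squares solver.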
Citations: 0
A Hybrid Layered Image Compressor with Deep-Learning Technique
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287130
Wei‐Cheng Lee, Chih-Peng Chang, Wen-Hsiao Peng, H. Hang
This paper presents a detailed description of NCTU’s proposal for learning-based image compression, submitted in response to the JPEG AI Call for Evidence Challenge. The proposed compression system uses a VVC intra codec as the base layer and a learning-based residual codec as the enhancement layer. The latter refines the quality of the base layer by sending a latent residual signal. In particular, a base-layer-guided attention module focuses the residual extraction on critical high-frequency areas. To reconstruct the image, a neural-network-based synthesizer combines this latent residual signal with the base-layer output in a non-linear fashion. The proposed method shows rate-distortion performance comparable to single-layer VVC intra in terms of common objective metrics, but in some cases offers better subjective quality, particularly at high compression ratios. It consistently outperforms HEVC intra, JPEG 2000, and JPEG. The proposed system has 18M network parameters in 16-bit floating-point format. On average, encoding an image on an Intel Xeon Gold 6154 takes about 13.5 minutes, with the VVC base layer dominating the encoding runtime. By contrast, decoding is dominated by the residual decoder and the synthesizer, requiring 31 seconds per image.
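The base-plus-enhancement structure can be illustrated with a toy two-layer scalar quantizer in Python. It assumes that a coarse uniform quantizer stands in for the VVC intra base layer and a finer quantizer of the residual stands in for the learned residual codec; reconstruction simply adds the two layers, whereas the paper uses a neural synthesizer and a base-layer-guided attention module. Step sizes and sample values are arbitrary.

```python
def quantize(values, step):
    """Uniform scalar quantization: returns integer indices."""
    return [round(v / step) for v in values]

def dequantize(indices, step):
    return [i * step for i in indices]

def encode_two_layer(signal, base_step, residual_step):
    base_idx = quantize(signal, base_step)                  # base layer
    base_rec = dequantize(base_idx, base_step)
    residual = [s - b for s, b in zip(signal, base_rec)]    # what the base layer missed
    res_idx = quantize(residual, residual_step)             # enhancement layer
    return base_idx, res_idx

def decode_two_layer(base_idx, res_idx, base_step, residual_step):
    base_rec = dequantize(base_idx, base_step)
    res_rec = dequantize(res_idx, residual_step)
    # Linear combination here; the paper merges the layers non-linearly.
    return [b + r for b, r in zip(base_rec, res_rec)]

def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

signal = [12.3, 40.7, 41.2, 200.9, 95.5]
base_idx, res_idx = encode_two_layer(signal, base_step=16, residual_step=2)
base_only = dequantize(base_idx, 16)              # what a base-layer-only decoder sees
enhanced = decode_two_layer(base_idx, res_idx, 16, 2)
```

The base layer alone bounds the error by half its step (8 here); adding the enhancement layer tightens the bound to half the residual step (1 here), at the cost of extra bits.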
Citations: 9
Controlled Feature Adjustment for Image Processing and Synthesis
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287164
Eduardo Martínez-Enríquez, J. Portilla
Feature adjustment, understood as the process of modifying the global features of given signals at will, is of cardinal importance for several signal processing applications, such as enhancement, restoration, style transfer, and synthesis. Despite this, it has not yet been approached from a general, theory-grounded perspective. This work proposes a new conceptual and practical methodology that we term Controlled Feature Adjustment (CFA). Given a set of parametric global features (scalar functions of discrete signals), CFA provides methods for (1) constructing a related set of deterministically decoupled features and (2) adjusting these new features in a controlled way, i.e., each one independently of the others. We illustrate the application of CFA by devising a spectrally based, hierarchically decoupled feature set and using it to obtain different types of image synthesis that are not achievable with traditional (coupled) feature sets.
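A minimal Python sketch of what "deterministically decoupled" features can mean, using a toy example rather than the spectral feature set of the paper: the raw features mean and mean-of-squares are coupled (shifting the mean changes both), whereas mean and variance can each be set without disturbing the other. The helper names are hypothetical and this is not the CFA method itself.

```python
import statistics

def mean_of_squares(x):
    return sum(v * v for v in x) / len(x)

def set_mean(x, target):
    """Shift the signal; the (population) variance is unchanged."""
    m = statistics.fmean(x)
    return [v - m + target for v in x]

def set_variance(x, target):
    """Scale deviations around the mean; the mean is unchanged.

    Assumes a non-constant signal (non-zero variance).
    """
    m = statistics.fmean(x)
    gain = (target / statistics.pvariance(x)) ** 0.5
    return [m + (s - m) * gain for s in x]

x = [1.0, 2.0, 4.0, 7.0]
y = set_mean(x, 10.0)       # adjust the first decoupled feature
z = set_variance(y, 9.0)    # adjust the second without touching the first
```

Adjusting the mean of `x` leaves its variance intact but changes its mean of squares, which is exactly the kind of coupling a decoupled feature set is meant to remove.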
Citations: 3
Scalable Mesh Representation for Depth from Breakpoint-Adaptive Wavelet Coding
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287145
Yue Li, R. Mathew, D. Taubman
A highly scalable and compact representation of depth data is required in many applications, and it is especially critical for plenoptic multiview image compression frameworks that use depth information for novel view synthesis and inter-view prediction. Coding depth data efficiently can be difficult because it contains sharp discontinuities. Breakpoint-adaptive discrete wavelet transforms (BPA-DWT), currently being standardized as part of the JPEG 2000 Part 17 extensions, have been found suitable for coding spatial media with hard discontinuities. In this paper, we explore a modification of the original BPA-DWT that replaces the traditional constant extrapolation strategy with the newly proposed affine extrapolation for reconstructing depth data in the vicinity of discontinuities. We also present a depth reconstruction scheme that can decode the BPA-DWT coefficients and breakpoints directly onto a compact and scalable mesh-based representation, which has many potential benefits over the sample-based description. For depth-compensated view prediction, our proposed triangular mesh representation of the depth data is a natural fit for modern graphics architectures.
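The difference between constant and affine extrapolation can be seen in a single 5/3 (LeGall) predict step, sketched in Python below. It assumes a 1-D piecewise-linear "depth" signal, at most one breakpoint next to each odd sample, and that a neighbour lying across a breakpoint is replaced by a value extrapolated from the even samples on the same side. This is a simplification for illustration, not the JPEG 2000 Part 17 procedure.

```python
def extrapolate(x, anchor, step, mode):
    """Estimate the unusable neighbour from even samples on the usable side.

    anchor: nearest usable even sample; step: offset (+/-2) to the next even
    sample on the same side, further away from the breakpoint.
    """
    if mode == "affine":
        far = anchor + step
        if 0 <= far < len(x):
            # Continue the straight line through x[far] and x[anchor].
            return 2 * x[anchor] - x[far]
    # Constant extrapolation (also the fallback when no second sample exists).
    return x[anchor]

def predict_details(x, breaks, mode="affine"):
    """Detail coefficients of a 5/3 predict step.

    breaks: set of indices b meaning a discontinuity lies between x[b-1] and x[b].
    """
    details = []
    for i in range(1, len(x) - 1, 2):  # odd samples with two even neighbours
        left, right = x[i - 1], x[i + 1]
        if (i + 1) in breaks:          # right neighbour lies across the edge
            right = extrapolate(x, i - 1, -2, mode)
        elif i in breaks:              # left neighbour lies across the edge
            left = extrapolate(x, i + 1, +2, mode)
        details.append(x[i] - (left + right) / 2)
    return details

# Two planar "depth" segments with a jump between x[5] and x[6].
depth = [0, 1, 2, 3, 4, 5, 20, 22, 24, 26, 28, 30]
plain = predict_details(depth, breaks=set())            # edge ignored
const = predict_details(depth, breaks={6}, mode="constant")
affine = predict_details(depth, breaks={6}, mode="affine")
```

Ignoring the edge produces a large detail coefficient at the discontinuity; constant extrapolation shrinks it, and affine extrapolation predicts a planar segment exactly, which is why it suits depth maps made of sloped surfaces.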
Citations: 6
ABR prediction using supervised learning algorithms
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287123
Hiba Yousef, J. L. Feuvre, Alexandre Storelli
With the massive increase of video traffic over the internet, HTTP adaptive streaming has become the main technique for infotainment content delivery. In this context, many bandwidth adaptation algorithms have emerged, each aiming to improve the user QoE using different session information, e.g., TCP throughput, buffer occupancy, and download time. Notwithstanding differences in their implementation, they mostly use the same inputs to adapt to the varying conditions of the media session. In this paper, we show that it is possible to predict the bitrate decisions of any ABR algorithm using machine learning techniques, and supervised classification in particular. This approach has the benefit of being generic: it does not require any knowledge of the player's ABR algorithm itself, but assumes that, whatever the underlying logic, it uses a common set of input features. Machine learning feature selection is then used to identify the relevant features, and the model is trained on real observations. We test our approach in simulations of well-known ABR algorithms and then verify the results on commercial closed-source players, using different realistic VoD and live data sets. The results show that Random Forest and Gradient Boosting both achieve very high prediction accuracy among the ML classifiers evaluated.
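To make the idea concrete, here is a small Python sketch that treats a player's bitrate choice as a classification target. It assumes a hypothetical ABR rule (highest ladder rung at or below 80% of measured throughput, dropping to the lowest rung when the buffer holds less than 5 s), generates synthetic sessions, and trains a k-nearest-neighbour classifier on (throughput, buffer) features. k-NN is swapped in only to keep the example dependency-free; the paper reports results with Random Forest and Gradient Boosting, and the ladder, thresholds and feature ranges here are invented.

```python
import random

LADDER_KBPS = [300, 750, 1500, 3000, 6000]  # hypothetical bitrate ladder

def hidden_abr(throughput_kbps, buffer_s):
    """Stand-in for a closed-source player's ABR logic (unknown to the learner)."""
    if buffer_s < 5.0:
        return LADDER_KBPS[0]
    usable = [b for b in LADDER_KBPS if b <= 0.8 * throughput_kbps]
    return usable[-1] if usable else LADDER_KBPS[0]

def features(throughput_kbps, buffer_s):
    # Normalize so both features span roughly [0, 1].
    return (throughput_kbps / 10000.0, buffer_s / 30.0)

def make_dataset(n, rng):
    data = []
    for _ in range(n):
        tput, buf = rng.uniform(0, 10000), rng.uniform(0, 30)
        data.append((features(tput, buf), hidden_abr(tput, buf)))
    return data

def knn_predict(train, x, k=5):
    """Majority vote among the k training sessions closest to x."""
    nearest = sorted(
        train, key=lambda item: (item[0][0] - x[0]) ** 2 + (item[0][1] - x[1]) ** 2
    )[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

rng = random.Random(42)  # fixed seed for reproducibility
train = make_dataset(2000, rng)
held_out = make_dataset(200, rng)
accuracy = sum(knn_predict(train, x) == y for x, y in held_out) / len(held_out)
```

Only observed inputs and the chosen bitrate are used for training, which mirrors the black-box premise: the classifier never inspects `hidden_abr` itself.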
Citations: 5
Evaluating the Performance of Apple’s Low-Latency HLS
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287117
Kerem Durak, Mehmet N. Akcay, Yigit K. Erinc, Boran Pekel, A. Begen
At its annual developers conference in June 2019, Apple announced a backwards-compatible extension to its popular HTTP Live Streaming (HLS) protocol to enable low-latency live streaming. This extension offers new features such as the ability to generate partial segments, use playlist delta updates, block playlist reloads, and provide rendition reports. Compared to traditional HLS, these features require new capabilities on the origin servers and on the caches inside a content delivery network. While HLS is known to perform well at scale, its low-latency extension is likely to consume considerable server and network resources, which may raise concerns about its scalability. In this paper, we make the first attempt to understand how this new extension works and performs. We also provide a 1:1 comparison against the low-latency DASH approach, the competing low-latency solution developed as an open standard.
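Blocking playlist reloads and playlist delta updates are requested through query parameters defined in the HLS specification (`_HLS_msn`, `_HLS_part`, `_HLS_skip`). The Python helper below is a hypothetical sketch of how a client might build such a request for the next partial segment; the playlist URL and sequence numbers are invented, the URL is assumed to carry no existing query string, and a real client must also honour the server's `EXT-X-SERVER-CONTROL` attributes (e.g. `CAN-BLOCK-RELOAD`, `CAN-SKIP-UNTIL`).

```python
from urllib.parse import urlencode

def next_part_request(playlist_url, last_msn, last_part, segment_complete, skip=False):
    """Build a blocking-reload URL asking for the part after the newest one seen.

    last_msn / last_part: Media Sequence Number and part index of the newest
    partial segment the client already has. segment_complete: True if that
    part closed its parent segment. Assumes playlist_url has no query string.
    """
    if segment_complete:
        msn, part = last_msn + 1, 0
    else:
        msn, part = last_msn, last_part + 1
    params = {"_HLS_msn": msn, "_HLS_part": part}
    if skip:
        params["_HLS_skip"] = "YES"  # ask for a playlist delta update
    return f"{playlist_url}?{urlencode(params)}"

url = next_part_request(
    "https://example.com/live/720p.m3u8", 273, 2, segment_complete=False, skip=True
)
```

The server holds such a request until the playlist contains the requested part, which removes polling latency but keeps a connection open per waiting client; this is one source of the server-side cost the paper examines.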
Citations: 17
Few-Shot Object Detection in Real Life: Case Study on Auto-Harvest
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287053
Kévin Riou, Jingwen Zhu, Suiyi Ling, Mathis Piquet, V. Truffault, P. Callet
Confinement during COVID-19 has seriously affected agriculture all over the world. As one efficient solution, mechanical harvesting/auto-harvesting based on object detection and robotic harvesters has become an urgent need. Within an auto-harvest system, a robust few-shot object detection model is one of the bottlenecks, since the system must deal with new vegetable/fruit categories and collecting large-scale annotated datasets for every novel category is expensive. The community has developed many few-shot object detection models. Yet whether they can be employed directly in real-life agricultural applications remains questionable, as there is a context gap between the commonly used training datasets and images collected in real-life agricultural scenarios. To this end, in this study we present a novel cucumber dataset and propose two data augmentation strategies that help bridge the context gap. Experimental results show that 1) the state-of-the-art few-shot object detection model performs poorly on the novel ‘cucumber’ category; and 2) the proposed augmentation strategies outperform the commonly used ones.
Citations: 3
Profiling Actions for Sport Video Summarization: An attention signal analysis
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287062
Melissa Sanabria, F. Precioso, Thomas Menguy
Currently, in broadcast companies, many human operators select which actions should belong to a summary based on multiple rules they have built from their own experience using different sources of information. These rules define different profiles of actions of interest that help the operator generate better customized summaries. Most of these profiles do not rely directly on broadcast video content but instead exploit metadata describing the course of the match. In this paper, we show how the signals produced by the attention layer of a recurrent neural network can be seen as a learned representation of these action profiles and can provide a new tool to support operators’ work. Results on soccer matches show the capacity of our approach to transfer knowledge between datasets from different broadcasting companies and different leagues, and the ability of the attention layer to learn meaningful action profiles.
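The "attention signal" can be pictured as a per-time-step weight vector obtained by a softmax over scores of the recurrent network's hidden states. The Python sketch below assumes a simple dot-product score against a learned context vector (a common choice, not necessarily the one used in the paper) and uses made-up hidden states and context values; the weights sum to one, and peaks mark the time steps the model relies on most.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(hidden_states, context):
    """Dot-product score of each time step's hidden state with a context vector."""
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in hidden_states]
    return softmax(scores)

def attention_pool(hidden_states, weights):
    """Weighted sum of hidden states: the summary vector passed to the classifier."""
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]

# Hypothetical hidden states for 5 time steps of an action sequence.
states = [[0.1, 0.0], [0.2, 0.1], [2.0, 1.5], [0.3, 0.2], [0.1, 0.1]]
context = [1.0, 1.0]  # hypothetical learned context vector
weights = attention_weights(states, context)
summary = attention_pool(states, weights)
```

Plotting `weights` over time for many actions of one type is one way to read the attention layer as a learned action profile.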
Citations: 1
The Suitability of Texture Vibrations Based on Visually Perceived Virtual Textures in Bimodal and Trimodal Conditions
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287066
U. A. Alma, M. Altinsoy
In this study, the suitability of recorded and simplified texture vibrations is evaluated with respect to visual textures displayed on a screen. The tested vibrations are 1) recorded vibrations, 2) single sinusoids, and 3) band-limited white noise, all of which were used in previous work. In the former study, the suitability of texture vibrations was evaluated by touching real textures. Nevertheless, texture vibrations should also be tested with texture images, given that users interact only with virtual (visual) objects on touch devices. Thus, the aim of this study is to assess the congruence between the vibrotactile feedback and the texture images in the absence and presence of auditory feedback. Two types of auditory feedback were used in the trimodal test, and they were tested at different loudness levels. In this way, the most plausible combination of vibrotactile and audio stimuli for exploring visual textures can be determined. In the psychophysical tests, the similarity ratings of the texture vibrations did not differ significantly from each other in the bimodal condition, in contrast to the former study. In the trimodal judgments, synthesized sound influenced the similarity ratings significantly, while touch sound did not affect the perceived similarity.
Citations: 2
Optimizing Rate-Distortion Performance of Motion Compensated Wavelet Lifting with Denoised Prediction and Update
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287070
Daniela Lanz, A. Kaup
Efficient lossless coding of medical volume data with a temporal axis can be achieved by motion compensated wavelet lifting. As a side benefit, a scalable bit stream is generated, which allows the data to be displayed at different resolution layers, a feature in high demand for telemedicine applications. Additionally, motion compensated temporal filtering preserves the similarity of the temporal base layer to the input sequence. However, for medical sequences the overall rate increases because of the specific noise characteristics of the data. Using denoising filters inside the lifting structure can improve compression efficiency significantly without compromising perfect reconstruction. However, designing an optimal filter is a crucial task. In this paper, we present a new method for selecting the optimal strength of a given denoising filter in a rate-distortion sense. This allows the required rate to be minimized based on a single encoder input parameter that controls the requested distortion of the temporal base layer.
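Why a filter placed inside the predict and update steps cannot break perfect reconstruction can be seen in a small sketch. It is a minimal illustration under several assumptions that are not taken from the paper: a two-frame integer Haar lifting step with identity motion (motion compensation omitted), a hypothetical 3-tap blending filter `denoise` standing in for the actual denoising filter, first-order empirical entropy as a crude rate proxy, and the mean squared error between the lowpass frame and frame A as the base-layer distortion. The selection rule in `select_strength` (lowest rate among strengths meeting a distortion cap) is one plausible reading of the abstract, not the authors' method.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence


def denoise(x: Sequence[int], strength: float) -> List[float]:
    """Hypothetical denoising filter: blend each sample with a 3-tap moving
    average (edges replicated). strength=0 is the identity."""
    n = len(x)
    out = []
    for i in range(n):
        avg = (x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3
        out.append((1 - strength) * x[i] + strength * avg)
    return out


def forward(a, b, strength):
    """Integer Haar lifting of two frames (motion compensation omitted)."""
    pred = denoise(a, strength)                            # denoised prediction
    h = [bi - round(p) for bi, p in zip(b, pred)]          # highpass frame
    upd = denoise(h, strength)                             # denoised update
    l = [ai + math.floor(u / 2) for ai, u in zip(a, upd)]  # lowpass (base layer)
    return l, h


def inverse(l, h, strength):
    """Undo the lifting steps in reverse order; exact for any strength,
    because the decoder recomputes the same filtered values."""
    upd = denoise(h, strength)
    a = [li - math.floor(u / 2) for li, u in zip(l, upd)]
    pred = denoise(a, strength)
    b = [hi + round(p) for hi, p in zip(h, pred)]
    return a, b


def entropy_bits(values) -> float:
    """First-order empirical entropy in bits per sample (rate proxy)."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def select_strength(a, b, strengths, max_mse) -> Optional[float]:
    """Pick the strength with the lowest rate proxy whose lowpass frame
    stays within max_mse of frame A; None if no candidate qualifies."""
    best, best_rate = None, math.inf
    for s in strengths:
        l, h = forward(a, b, s)
        mse = sum((li - ai) ** 2 for li, ai in zip(l, a)) / len(a)
        rate = entropy_bits(l) + entropy_bits(h)
        if mse <= max_mse and rate < best_rate:
            best, best_rate = s, rate
    return best
```

Because every filtered value used by the encoder is computed only from data the decoder already holds when it undoes that step, `inverse(*forward(a, b, s), s)` returns the original integer frames for any strength `s`; the strength only changes how compressible `l` and `h` are and how closely `l` resembles frame A.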
Citations: 0
Journal: 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)