Multi-Plane Image Video Compression
Pub Date : 2020-09-21  DOI: 10.1109/MMSP48831.2020.9287083
Scott Janus, J. Boyce, S. Bhatia, J. Tanner, Atul Divekar, Penne Lee
Multiplane Images (MPI) are a new approach for storing volumetric content. An MPI represents a 3D scene within a view frustum, typically with 32 planes of texture and transparency information per camera. MPI literature to date has focused on still images, but applying MPI to video will require substantial compression to be viable for real-world productions. In this paper, we describe several techniques for compressing MPI video sequences by reducing pixel rate while maintaining acceptable visual quality. We focus on using traditional video compression codecs such as HEVC. While a new codec specifically tailored to MPI would likely achieve very good results, no devices exist today that support such a hypothetical codec; by comparison, hundreds of millions of real-time HEVC decoders are present in laptops and TVs today.
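To make the pixel-rate problem concrete, the following back-of-the-envelope sketch estimates the raw sample rate of an MPI video under assumed parameters; the resolution, frame rate, and camera count are illustrative placeholders, not figures from the paper.

```python
# Rough pixel-rate estimate for an MPI video sequence.
# All parameter values below are illustrative assumptions, not numbers from the paper.

def mpi_sample_rate(width, height, planes=32, cameras=1, fps=30, channels=4):
    """Samples per second for an MPI stream (RGB + transparency per plane)."""
    return width * height * planes * cameras * fps * channels

if __name__ == "__main__":
    # A single 1080p camera with 32 RGBA planes at 30 fps.
    rate = mpi_sample_rate(1920, 1080)
    print(f"{rate / 1e9:.1f} Gsamples/s before compression")
```

Even this single-camera configuration produces roughly 8 Gsamples/s of raw data, which is why the paper targets pixel-rate reduction before handing the planes to a conventional HEVC encoder.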
{"title":"Multi-Plane Image Video Compression","authors":"Scott Janus, J. Boyce, S. Bhatia, J. Tanner, Atul Divekar, Penne Lee","doi":"10.1109/MMSP48831.2020.9287083","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287083","url":null,"abstract":"Multiplane Images (MPI) is a new approach for storing volumetric content. MPI represents a 3D scene within a view frustum with typically 32 planes of texture and transparency information per camera. MPI literature to date has been focused on still images but applying MPI to video will require substantial compression in order to be viable for real world productions. In this paper, we describe several techniques for compressing MPI video sequences by reducing pixel rate while maintaining acceptable visual quality. We focus on using traditional video compression codecs such as HEVC. While certainly a new codec algorithm specifically tailored to MPI would likely achieve very good results, no such devices exist today that support this hypothetical MPI codec. By comparison, hundreds of millions of real-time HEVC decoders are present in laptops and TVs today.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115664149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joint Cross-Component Linear Model For Chroma Intra Prediction
Pub Date : 2020-09-21  DOI: 10.1109/MMSP48831.2020.9287167
R. G. Youvalari, J. Lainema
The Cross-Component Linear Model (CCLM) is an intra prediction technique adopted in the upcoming Versatile Video Coding (VVC) standard. CCLM reduces inter-channel correlation by using a linear model whose parameters are calculated from the reconstructed samples of the luma channel and the neighboring samples of the chroma coding block. In this paper, we propose a new method, called Joint Cross-Component Linear Model (J-CCLM), to improve the prediction efficiency of the tool. The proposed J-CCLM technique predicts the samples of the coding block with a multi-hypothesis approach that combines two intra prediction modes: the final prediction of the block is obtained by combining the conventional CCLM mode with an angular mode derived from the co-located luma block. Experiments conducted in the VTM-8.0 test model of VVC show that the proposed method provides, on average, more than 1.0% BD-rate gain in the chroma channels. Furthermore, weighted YCbCr bit-rate savings of 0.24% and 0.54% are achieved in the 4:2:0 and 4:4:4 color formats, respectively.
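As a rough illustration of the prediction step, the sketch below applies the CCLM linear mapping pred = a·rec_L + b to downsampled reconstructed luma and blends it with an angular prediction. The equal blending weights and the toy inputs are assumptions for illustration; the paper's actual parameter derivation and weighting follow the VVC/VTM design.

```python
import numpy as np

def cclm_predict(rec_luma_ds, a, b):
    """CCLM: linear mapping from downsampled reconstructed luma to chroma."""
    return a * rec_luma_ds + b

def jcclm_predict(rec_luma_ds, a, b, angular_pred, w_cclm=0.5, w_ang=0.5):
    """J-CCLM sketch: blend the CCLM output with an angular intra prediction
    derived from the co-located luma block. Equal weights are an assumption."""
    return w_cclm * cclm_predict(rec_luma_ds, a, b) + w_ang * angular_pred

# Toy usage with random 10-bit samples standing in for reconstructed data.
luma = np.random.randint(0, 1024, (8, 8)).astype(np.float64)
angular = np.random.randint(0, 1024, (8, 8)).astype(np.float64)
chroma_pred = jcclm_predict(luma, a=0.5, b=64.0, angular_pred=angular)
```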
{"title":"Joint Cross-Component Linear Model For Chroma Intra Prediction","authors":"R. G. Youvalari, J. Lainema","doi":"10.1109/MMSP48831.2020.9287167","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287167","url":null,"abstract":"The Cross-Component Linear Model (CCLM) is an intra prediction technique that is adopted into the upcoming Versatile Video Coding (VVC) standard. CCLM attempts to reduce the inter-channel correlation by using a linear model. For that, the parameters of the model are calculated based on the reconstructed samples in luma channel as well as neighboring samples of the chroma coding block. In this paper, we propose a new method, called as Joint Cross-Component Linear Model (J-CCLM), in order to improve the prediction efficiency of the tool. The proposed J-CCLM technique predicts the samples of the coding block with a multi-hypothesis approach which consists of combining two intra prediction modes. To that end, the final prediction of the block is achieved by combining the conventional CCLM mode with an angular mode that is derived from the co-located luma block. The conducted experiments in VTM-8.0 test model of VVC illustrated that the proposed method provides on average more than 1.0% BD-Rate gain in chroma channels. Furthermore, the weighted YCbCr bitrate savings of 0.24% and 0.54% are achieved in 4:2:0 and 4:4:4 color formats, respectively.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116603788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mesh Coding Extensions to MPEG-I V-PCC
Pub Date : 2020-09-21  DOI: 10.1109/MMSP48831.2020.9287057
Esmaeil Faramarzi, R. Joshi, M. Budagavi
Dynamic point clouds and meshes are used in a wide variety of applications such as gaming, visualization, medicine, and more recently AR/VR/MR. This paper presents two extensions of the MPEG-I Video-based Point Cloud Compression (V-PCC) standard to support mesh coding. The extensions are based on the Edgebreaker and TFAN mesh connectivity coding algorithms, implemented in the Google Draco software and the MPEG SC3DMC software, respectively. Lossless results for the proposed frameworks on top of version 8.0 of the MPEG-I V-PCC test model (TMC2) are presented and compared with Draco for dense meshes.
{"title":"Mesh Coding Extensions to MPEG-I V-PCC","authors":"Esmaeil Faramarzi, R. Joshi, M. Budagavi","doi":"10.1109/MMSP48831.2020.9287057","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287057","url":null,"abstract":"Dynamic point clouds and meshes are used in a wide variety of applications such as gaming, visualization, medicine, and more recently AR/VR/MR. This paper presents two extensions of MPEG-I Video-based Point Cloud Compression (V-PCC) standard to support mesh coding. The extensions are based on Edgebreaker and TFAN mesh connectivity coding algorithms implemented in the Google Draco software and the MPEG SC3DMC software for mesh coding, respectively. Lossless results for the proposed frameworks on top of version 8.0 of the MPEG-I V-PCC test model (TMC2) are presented and compared with Draco for dense meshes.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"202 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130184671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving Automatic Speech Recognition Utilizing Audio-codecs for Data Augmentation
Pub Date : 2020-09-21  DOI: 10.1109/MMSP48831.2020.9287127
N. Hailu, Ingo Siegert, A. Nürnberger
Training end-to-end automatic speech recognition models requires a large amount of labeled speech data, which is challenging to obtain for low-resource languages. In contrast to the commonly used feature-level data augmentation, we propose to expand the training set at the data level by re-encoding the audio with different codecs. The augmentation method uses different audio codecs with varied bit rates, sampling rates, and bit depths. These changes introduce variation in the input data without drastically affecting audio quality, so the audio remains perceptible to humans and any feature extraction remains possible afterwards. To demonstrate the general applicability of the proposed augmentation technique, we evaluated it in an end-to-end automatic speech recognition architecture in four languages. Applying the method to the Amharic, Dutch, Slovenian, and Turkish datasets, we achieved an average improvement of 1.57 in character error rate (CER) without integrating language models; compared to the baseline, the CER improves by 2.78, 1.25, 1.21, and 1.05 for the respective languages. On the Amharic dataset, we also reached a syllable error rate reduction of 6.12 compared to the baseline.
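The augmentation itself amounts to encode-decode cycles with different codecs and signal parameters. The paper does not prescribe a specific tool; the sketch below assumes ffmpeg is available on the PATH and uses a few common codec settings as placeholders for the actual configurations.

```python
import subprocess
from pathlib import Path

# Illustrative codec/bit-rate/sample-rate/bit-depth combinations; the paper's
# exact settings may differ.
VARIANTS = [
    ("libmp3lame", "mp3",  ["-b:a", "32k"]),    # low-bit-rate MP3
    ("libopus",    "opus", ["-b:a", "24k"]),    # Opus at 24 kb/s
    ("pcm_u8",     "wav",  ["-ar", "8000"]),    # 8-bit PCM resampled to 8 kHz
]

def augment(wav_path, out_dir):
    """Encode-decode one utterance with several codecs, returning WAV copies
    that carry the corresponding coding artifacts for training."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(wav_path).stem
    decoded_files = []
    for i, (codec, ext, opts) in enumerate(VARIANTS):
        coded = out_dir / f"{stem}_aug{i}.{ext}"
        decoded = out_dir / f"{stem}_aug{i}_dec.wav"
        subprocess.run(["ffmpeg", "-y", "-i", str(wav_path),
                        "-c:a", codec, *opts, str(coded)], check=True)
        subprocess.run(["ffmpeg", "-y", "-i", str(coded), str(decoded)], check=True)
        decoded_files.append(decoded)
    return decoded_files
```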
{"title":"Improving Automatic Speech Recognition Utilizing Audio-codecs for Data Augmentation","authors":"N. Hailu, Ingo Siegert, A. Nürnberger","doi":"10.1109/MMSP48831.2020.9287127","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287127","url":null,"abstract":"To train end-to-end automatic speech recognition models, it requires a large amount of labeled speech data. This goal is challenging for languages with fewer resources. In contrast to the commonly used feature level data augmentation, we propose to expand the training set by using different audio codecs at the data level. The augmentation method consists of using different audio codecs with changed bit rate, sampling rate, and bit depth. The change reassures variation in the input data without drastically affecting the audio quality. Besides, we can ensure that humans still perceive the audio, and any feature extraction is possible later. To demonstrate the general applicability of the proposed augmentation technique, we evaluated it in an end-to-end automatic speech recognition architecture in four languages. After applying the method, on the Amharic, Dutch, Slovenian, and Turkish datasets, we achieved a 1.57 average improvement in the character error rates (CER) without integrating language models. The result is comparable to the baseline result, showing CER improvement of 2.78, 1.25, 1.21, and 1.05 for each language. On the Amharic dataset, we reached a syllable error rate reduction of 6.12 compared to the baseline result.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130966940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Learning-based Point Cloud Geometry Coding with Resolution Scalability
Pub Date : 2020-09-21  DOI: 10.1109/MMSP48831.2020.9287060
André F. R. Guarda, Nuno M. M. Rodrigues, F. Pereira
Point clouds are a 3D visual representation format that has recently become fundamentally important for immersive and interactive multimedia applications. Considering the high number of points in practically relevant point clouds, and their increasing market demand, efficient point cloud coding has become a vital research topic. In addition, scalability is an important feature for point cloud coding, especially for real-time applications, where fast and rate-efficient access to a decoded point cloud is important; however, this issue is still rather unexplored in the literature. In this context, this paper proposes a novel deep learning-based point cloud geometry coding solution with resolution scalability via interlaced sub-sampling. As additional layers are decoded, the number of points in the reconstructed point cloud increases, as does the overall quality. Experimental results show that the proposed scalable point cloud geometry coding solution outperforms the recent MPEG Geometry-based Point Cloud Compression standard, which is much less scalable.
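The abstract does not spell out the sub-sampling rule; as a minimal sketch of the idea, the code below splits a voxelized point cloud into interleaved sub-clouds by coordinate parity, so that decoding more layers progressively increases the point density. The parity rule is an illustrative assumption, not the paper's exact scheme.

```python
import numpy as np

def interlaced_layers(points):
    """Split a voxelized point cloud (N x 3 integer coordinates) into 8
    interleaved sub-clouds keyed by the parity of (x, y, z)."""
    parity = (points[:, 0] % 2) * 4 + (points[:, 1] % 2) * 2 + (points[:, 2] % 2)
    return [points[parity == k] for k in range(8)]

def reconstruct(layers, num_layers):
    """Progressive reconstruction from the first num_layers sub-clouds."""
    return np.concatenate(layers[:num_layers], axis=0)

# Toy usage: 100k random points on a 10-bit voxel grid.
pts = np.random.randint(0, 1024, size=(100_000, 3))
layers = interlaced_layers(pts)
coarse = reconstruct(layers, 2)   # low-resolution, low-rate preview
full = reconstruct(layers, 8)     # all decoded layers
```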
{"title":"Deep Learning-based Point Cloud Geometry Coding with Resolution Scalability","authors":"André F. R. Guarda, Nuno M. M. Rodrigues, F. Pereira","doi":"10.1109/MMSP48831.2020.9287060","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287060","url":null,"abstract":"Point clouds are a 3D visual representation format that has recently become fundamentally important for immersive and interactive multimedia applications. Considering the high number of points of practically relevant point clouds, and their increasing market demand, efficient point cloud coding has become a vital research topic. In addition, scalability is an important feature for point cloud coding, especially for real-time applications, where the fast and rate efficient access to a decoded point cloud is important; however, this issue is still rather unexplored in the literature. In this context, this paper proposes a novel deep learning-based point cloud geometry coding solution with resolution scalability via interlaced sub-sampling. As additional layers are decoded, the number of points in the reconstructed point cloud increases as well as the overall quality. Experimental results show that the proposed scalable point cloud geometry coding solution outperforms the recent MPEG Geometry-based Point Cloud Compression standard which is much less scalable.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125395526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Skeleton-based motion estimation for Point Cloud Compression
Pub Date : 2020-09-21  DOI: 10.1109/MMSP48831.2020.9287165
Chao Cao, C. Tulvan, M. Preda, T. Zaharia
With the rapid development of point cloud acquisition technologies, high-quality human-shaped point clouds are increasingly used in VR/AR applications and in 3D graphics in general. To achieve near-realistic quality, such content usually contains an extremely high number of points (over 0.5 million points per 3D object per frame) and associated attributes (such as color). For this reason, efficient, dedicated 3D Point Cloud Compression (3DPCC) methods become mandatory. This requirement is even stronger for dynamic content, where the coordinates and attributes of the 3D points evolve over time. In this paper, we propose a novel skeleton-based 3DPCC approach dedicated to the specific case of dynamic point clouds representing humanoid avatars. The method relies on multi-view 2D human pose estimation of 3D dynamic point clouds. Using the DensePose neural network, we first extract the body parts from projected 2D images. The obtained 2D segmentation information is back-projected and aggregated into 3D space, which makes it possible to partition the 3D point cloud into a set of 3D body parts. For each part, a 3D affine transform is estimated between every two consecutive frames and used for 3D motion compensation. The proposed approach has been integrated into the Video-based Point Cloud Compression (V-PCC) test model of MPEG. Experimental results show that the proposed method, in the particular case of body motion with small amplitudes, outperforms the V-PCC test model in the lossy inter-coding condition by up to 83% in terms of bit-rate reduction at low bit rates. The proposed framework also holds the potential to support features such as regions of interest and levels of detail.
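The per-part motion model is a 3D affine transform fitted between consecutive frames. A minimal least-squares sketch of that step is given below; it assumes point correspondences within a body part are already available, whereas the paper derives the part segmentation from DensePose projections.

```python
import numpy as np

def fit_affine_3d(src, dst):
    """Least-squares 4x4 affine transform mapping src (N x 3) onto dst (N x 3),
    assuming the rows are corresponding points of one body part in two
    consecutive frames."""
    src_h = np.hstack([src, np.ones((src.shape[0], 1))])   # N x 4 homogeneous
    M, *_ = np.linalg.lstsq(src_h, dst, rcond=None)         # 4 x 3 solution
    affine = np.eye(4)
    affine[:3, :] = M.T
    return affine

def motion_compensate(points, affine):
    """Predict the part in the next frame by applying the estimated transform."""
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    return (pts_h @ affine.T)[:, :3]

# Toy check: recover a known rotation + translation of a body part.
rng = np.random.default_rng(0)
part = rng.random((500, 3))
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
moved = part @ R.T + np.array([0.05, -0.02, 0.01])
A = fit_affine_3d(part, moved)
assert np.allclose(motion_compensate(part, A), moved, atol=1e-6)
```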
{"title":"Skeleton-based motion estimation for Point Cloud Compression","authors":"Chao Cao, C. Tulvan, M. Preda, T. Zaharia","doi":"10.1109/MMSP48831.2020.9287165","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287165","url":null,"abstract":"With the rapid development of point cloud acquisition technologies, high-quality human-shape point clouds are more and more used in VR/AR applications and in general in 3D Graphics. To achieve near-realistic quality, such content usually contains an extremely high number of points (over 0.5 million points per 3D object per frame) and associated attributes (such as color). For this reason, disposing of efficient, dedicated 3D Point Cloud Compression (3DPCC) methods becomes mandatory. This requirement is even stronger in the case of dynamic content, where the coordinates and attributes of the 3D points are evolving over time. In this paper, we propose a novel skeleton-based 3DPCC approach, dedicated to the specific case of dynamic point clouds representing humanoid avatars. The method relies on a multi-view 2D human pose estimation of 3D dynamic point clouds. By using the DensePose neural network, we first extract the body parts from projected 2D images. The obtained 2D segmentation information is back-projected and aggregated into the 3D space. This procedure makes it possible to partition the 3D point cloud into a set of 3D body parts. For each part, a 3D affine transform is estimated between every two consecutive frames and used for 3D motion compensation. The proposed approach has been integrated into the Video-based Point Cloud Compression (V-PCC) test model of MPEG. Experimental results show that the proposed method, in the particular case of body motion with small amplitudes, outperforms the V-PCC test mode in the lossy inter-coding condition by up to 83% in terms of bitrate reduction in low bit rate conditions. Meanwhile, the proposed framework holds the potential of supporting various features such as regions of interests and level of details.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124308155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DEMI: Deep Video Quality Estimation Model using Perceptual Video Quality Dimensions
Pub Date : 2020-09-21  DOI: 10.1109/MMSP48831.2020.9287080
Saman Zadtootaghaj, Nabajeet Barman, Rakesh Rao Ramachandra Rao, Steve Göring, M. Martini, A. Raake, S. Möller
Existing works in the field of quality assessment focus separately on gaming and non-gaming content. Along with traditional modeling approaches, deep learning-based approaches have been used to develop quality models due to their high prediction accuracy. In this paper, we present a deep learning-based quality estimation model considering both gaming and non-gaming videos. The model is developed in three phases. First, a convolutional neural network (CNN) is trained on an objective metric, which allows it to learn video artifacts such as blurriness and blockiness. Next, the model is fine-tuned on a small image quality dataset using blockiness and blurriness ratings. Finally, a Random Forest is used to pool frame-level predictions and temporal information of videos in order to predict the overall video quality. The lightweight, low-complexity nature of the model makes it suitable for real-time applications covering both gaming and non-gaming content, while achieving performance similar to the existing state-of-the-art model NDNetGaming. The model implementation for testing is available on GitHub.
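As a minimal illustration of the final pooling stage, the sketch below collapses per-frame quality scores into a fixed-size feature vector and fits a Random Forest on video-level labels. The pooled statistics, toy data, and hyperparameters are placeholders, not the paper's exact feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def pool_features(frame_scores):
    """Pool per-frame predictions of one video into a fixed-size vector
    (mean, std, min, max, mean absolute temporal difference)."""
    diffs = np.abs(np.diff(frame_scores)) if len(frame_scores) > 1 else np.zeros(1)
    return np.array([frame_scores.mean(), frame_scores.std(),
                     frame_scores.min(), frame_scores.max(), diffs.mean()])

# Toy training set: 50 videos, 120 frame-level scores each, MOS-like labels.
rng = np.random.default_rng(0)
X = np.stack([pool_features(rng.random(120)) for _ in range(50)])
y = rng.uniform(1, 5, size=50)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
predicted_quality = forest.predict(pool_features(rng.random(120))[None, :])
```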
{"title":"DEMI: Deep Video Quality Estimation Model using Perceptual Video Quality Dimensions","authors":"Saman Zadtootaghaj, Nabajeet Barman, Rakesh Rao Ramachandra Rao, Steve Göring, M. Martini, A. Raake, S. Möller","doi":"10.1109/MMSP48831.2020.9287080","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287080","url":null,"abstract":"Existing works in the field of quality assessment focus separately on gaming and non-gaming content. Along with the traditional modeling approaches, deep learning based approaches have been used to develop quality models, due to their high prediction accuracy. In this paper, we present a deep learning based quality estimation model considering both gaming and non-gaming videos. The model is developed in three phases. First, a convolutional neural network (CNN) is trained based on an objective metric which allows the CNN to learn video artifacts such as blurriness and blockiness. Next, the model is fine-tuned based on a small image quality dataset using blockiness and blurriness ratings. Finally, a Random Forest is used to pool frame-level predictions and temporal information of videos in order to predict the overall video quality. The light-weight, low complexity nature of the model makes it suitable for real-time applications considering both gaming and non-gaming content while achieving similar performance to existing state-of-the-art model NDNetGaming. The model implementation for testing is available on GitHub1.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114807380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards Fast and Efficient VVC Encoding
Pub Date : 2020-09-21  DOI: 10.1109/MMSP48831.2020.9287093
J. Brandenburg, A. Wieckowski, Tobias Hinz, Anastasia Henkel, Valeri George, Ivan Zupancic, C. Stoffers, B. Bross, H. Schwarz, D. Marpe
Versatile Video Coding (VVC) is a new international video coding standard to be finalized in July 2020. It is designed to provide around 50% bit-rate savings at the same subjective visual quality over its predecessor, High Efficiency Video Coding (H.265/HEVC). During the standard development, objective bit-rate savings of around 40% have been reported for the VVC reference software (VTM) compared to the HEVC reference software (HM). The unoptimized VTM encoder is around 9x, and the decoder around 2x, slower than HM. This paper discusses VVC encoder complexity in terms of software runtime. The modular design of the standard allows a VVC encoder to trade off bit-rate savings against encoder runtime. Based on a detailed tradeoff analysis, results for different operating points are reported. Additionally, initial work on software and algorithm optimization is presented. With the optimized software and algorithms, an operating point can be achieved whose single-threaded encoder runtime is over 22x faster than VTM, i.e. around 2.5x faster than HM, while still providing more than 30% bit-rate savings over HM. Finally, our experiments demonstrate the flexibility of VVC and its potential for optimized software encoder implementations.
{"title":"Towards Fast and Efficient VVC Encoding","authors":"J. Brandenburg, A. Wieckowski, Tobias Hinz, Anastasia Henkel, Valeri George, Ivan Zupancic, C. Stoffers, B. Bross, H. Schwarz, D. Marpe","doi":"10.1109/MMSP48831.2020.9287093","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287093","url":null,"abstract":"Versatile Video Coding (VVC) is a new international video coding standard to be finalized in July 2020. It is designed to provide around 50% bit-rate saving at the same subjective visual quality over its predecessor, High Efficiency Video Coding (H.265/HEVC). During the standard development, objective bit-rate savings of around 40% have been reported for the VVC reference software (VTM) compared to the HEVC reference software (HM). The unoptimized VTM encoder is around 9x, and the decoder around 2x, slower than HM. This paper discusses the VVC encoder complexity in terms of soft-ware runtime. The modular design of the standard allows a VVC encoder to trade off bit-rate savings and encoder runtime. Based on a detailed tradeoff analysis, results for different operating points are reported. Additionally, initial work on software and algorithm optimization is presented. With the optimized software algorithms, an operating point with an over 22x faster single-threaded encoder runtime than VTM can be achieved, i.e. around 2.5x faster than HM, while still providing more than 30% bit-rate savings over HM. Finally, our experiments demonstrate the flexibility of VVC and its potential for optimized soft-ware encoder implementations.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124484794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Open-Source RTP Library for High-Speed 4K HEVC Video Streaming
Pub Date : 2020-09-21  DOI: 10.1109/MMSP48831.2020.9287162
Aaro Altonen, Joni Räsänen, Jaakko Laitinen, Marko Viitanen, Jarno Vanne
Efficient transport technologies for High Efficiency Video Coding (HEVC) are key enablers for economical 4K video transmission in current telecommunication networks. This paper introduces a novel open-source Real-time Transport Protocol (RTP) library called uvgRTP for high-speed 4K HEVC video streaming. Our library supports the latest RFC 3550 specification for RTP and the associated RFC 7798 RTP payload format for HEVC. It is written in C++ under a permissive 2-clause BSD license and can be run on both Linux and Windows with a user-friendly interface. Our experiments on an Intel Core i7-4770 CPU show that uvgRTP is able to stream HEVC video at 5.0 Gb/s over a local 10 Gb/s network. It attains 4.4x higher peak goodput and 92.1% lower latency than the state-of-the-art FFmpeg multimedia framework. It also outperforms LIVE555, with over double the goodput and 82.3% lower latency. These results indicate that uvgRTP is currently the fastest open-source RTP library for 4K HEVC video streaming.
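The RFC 7798 payload format referenced above fragments HEVC NAL units that exceed the network MTU into fragmentation units (FUs). The sketch below builds such FU payloads from a single NAL unit purely to illustrate the packet layout; it is a standalone illustration of the RFC, not uvgRTP's own API.

```python
FU_TYPE = 49  # NAL unit type reserved for fragmentation units in RFC 7798

def fragment_nal(nal, max_payload=1400):
    """Split one HEVC NAL unit (starting with its 2-byte NAL header) into
    RFC 7798 fragmentation-unit payloads. RTP headers are not built here."""
    nal_type = (nal[0] >> 1) & 0x3F
    # Payload header: keep the F, LayerId and TID bits, overwrite Type with 49.
    payload_hdr = bytes([(nal[0] & 0x81) | (FU_TYPE << 1), nal[1]])
    data = nal[2:]                       # the original NAL header is not repeated
    chunk = max_payload - 3              # 2-byte payload header + 1-byte FU header
    fragments = []
    for i in range(0, len(data), chunk):
        start, end = i == 0, i + chunk >= len(data)
        fu_hdr = (0x80 if start else 0) | (0x40 if end else 0) | nal_type
        fragments.append(payload_hdr + bytes([fu_hdr]) + data[i:i + chunk])
    return fragments

# Toy usage: a fake 5 kB NAL unit with type 19 (IDR_W_RADL), nuh_layer_id 0,
# nuh_temporal_id_plus1 1.
nal_unit = bytes([19 << 1, 0x01]) + bytes(5000)
packets = fragment_nal(nal_unit)
```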
{"title":"Open-Source RTP Library for High-Speed 4K HEVC Video Streaming","authors":"Aaro Altonen, Joni Räsänen, Jaakko Laitinen, Marko Viitanen, Jarno Vanne","doi":"10.1109/MMSP48831.2020.9287162","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287162","url":null,"abstract":"Efficient transport technologies for High Efficiency Video Coding (HEVC) are key enablers for economic 4K video transmission in current telecommunication networks. This paper introduces a novel open-source Real-time Transport Protocol (RTP) library called uvgRTP for high-speed 4K HEVC video streaming. Our library supports the latest RFC 3550 specification for RTP and an associated RFC 7798 RTP payload format for HEVC. It is written in C++ under a permissive 2-clause BSD license and it can be run on both Linux and Windows operating systems with a user-friendly interface. Our experiments on an Intel Core i7-4770 CPU show that uvgRTP is able to stream HEVC video at 5.0 Gb/s over a local 10 Gb/s network. It attains 4.4 times as high peak goodput and 92.1% lower latency than the state-of-the-art FFmpeg multimedia framework. It also outperforms LIVE555 with over double the goodput and 82.3% lower latency. These results indicate that uvgRTP is currently the fastest open-source RTP library for 4K HEVC video streaming.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124222832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MultiANet: a Multi-Attention Network for Defocus Blur Detection
Pub Date : 2020-09-21  DOI: 10.1109/MMSP48831.2020.9287072
Zeyu Jiang, Xun Xu, Chao Zhang, Ce Zhu
Defocus blur detection is a challenging task because of obscure homogeneous regions and interference from background clutter. Most existing deep learning-based methods focus on building wider or deeper networks to capture multi-level features, neglecting the feature relationships of intermediate layers and thus limiting the discriminative ability of the network. Moreover, fusing features at different levels has been demonstrated to be effective. However, integrating them directly without distinction is not optimal, because low-level features focus on fine details only and can be distracted by background clutter. To address these issues, we propose the Multi-Attention Network for stronger discriminative learning and spatially guided low-level feature learning. Specifically, a channel-wise attention module is applied to both high-level and low-level feature maps to capture channel-wise global dependencies. In addition, a spatial attention module is applied to low-level feature maps to emphasize effective detailed information. Experimental results show that the performance of our network is superior to state-of-the-art algorithms.
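A minimal PyTorch sketch of the two attention blocks described above is given below: a squeeze-and-excitation style channel attention and a convolutional spatial attention. These are illustrative stand-ins consistent with the description, not the paper's exact MultiANet modules.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel-wise attention: global average pooling followed by a bottleneck
    MLP whose sigmoid output rescales each channel (captures channel-wise
    global dependencies)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                        # x: (B, C, H, W)
        weights = self.fc(x.mean(dim=(2, 3)))    # (B, C) channel weights
        return x * weights[:, :, None, None]

class SpatialAttention(nn.Module):
    """Spatial attention: a 7x7 convolution over channel-pooled maps yields a
    per-pixel weight that emphasizes detailed regions in low-level features."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

# Toy usage on a low-level feature map.
feat = torch.randn(2, 64, 56, 56)
out = SpatialAttention()(ChannelAttention(64)(feat))
```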
{"title":"MultiANet: a Multi-Attention Network for Defocus Blur Detection","authors":"Zeyu Jiang, Xun Xu, Chao Zhang, Ce Zhu","doi":"10.1109/MMSP48831.2020.9287072","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287072","url":null,"abstract":"Defocus blur detection is a challenging task because of obscure homogenous regions and interferences of background clutter. Most existing deep learning-based methods mainly focus on building wider or deeper network to capture multi-level features, neglecting to extract the feature relationships of intermediate layers, thus hindering the discriminative ability of network. Moreover, fusing features at different levels have been demonstrated to be effective. However, direct integrating without distinction is not optimal because low-level features focus on fine details only and could be distracted by background clutters. To address these issues, we propose the Multi-Attention Network for stronger discriminative learning and spatial guided low-level feature learning. Specifically, a channel-wise attention module is applied to both high-level and low-level feature maps to capture channel-wise global dependencies. In addition, a spatial attention module is employed to low-level features maps to emphasize effective detailed information. Experimental results show the performance of our network is superior to the state-of-the-art algorithms.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126465160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}