Pub Date: 2014-12-01 | DOI: 10.1109/VCIP.2014.7051624
Catarina Brites, Vitor Gomes, J. Ascenso, F. Pereira
Substantial rate-distortion (RD) gains have been achieved in video coding standards by increasing encoder complexity while keeping decoder complexity as low as possible. The alternative distributed video coding (DVC) approach instead exploits the video redundancy mostly at the decoder side, keeping the encoder as simple as possible. One of the most characteristic DVC tools is the statistical reconstruction of the DCT coefficients, which plays a role similar to inverse scalar quantization (ISQ) in predictive codecs. The main objective of this paper is to propose a statistical reconstruction approach for predictive coding (notably the H.264/AVC standard) as a substitute for ISQ, thus creating a coding architecture that mixes predictive and distributed coding tools. Experimental results show that the proposed statistical reconstruction solution achieves Bjontegaard bitrate savings of up to 2.4% relative to the ISQ-based H.264/AVC High profile codec.
Title: Statistical reconstruction for predictive video coding (2014 IEEE Visual Communications and Image Processing Conference)
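To illustrate the distinction the abstract draws between ISQ and statistical reconstruction, the sketch below contrasts midpoint inverse quantization with a centroid (MMSE) reconstruction inside the quantization bin. The symmetric bin layout, the Laplacian residual model centered on a prediction, and both function names are illustrative assumptions, not the paper's actual model.

```python
import math

def isq_reconstruct(q, step):
    """Inverse scalar quantization: reconstruct at the bin midpoint q*step
    (assuming a symmetric uniform quantizer; illustrative only)."""
    return q * step

def mmse_reconstruct(q, step, y, lam, n=2000):
    """DVC-style statistical reconstruction: the centroid E[x | x in bin]
    of the bin [q*step - step/2, q*step + step/2], assuming a Laplacian
    residual model exp(-lam*|x - y|) centered on a prediction y.
    Integrated numerically over n midpoints."""
    lo, hi = q * step - step / 2, q * step + step / 2
    num = den = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * (hi - lo) / n
        p = math.exp(-lam * abs(x - y))
        num += x * p
        den += p
    return num / den
```

When the prediction sits inside the bin, the centroid is pulled toward it, which is where the RD gain over midpoint reconstruction comes from; when the prediction coincides with the bin center, both reconstructions agree.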
Pub Date: 2014-12-01 | DOI: 10.1109/VCIP.2014.7051489
Y. Sanchez, E. Grinshpun, David W. Faucher, T. Schierl, Sameerkumar Sharma
Dynamic Adaptive Streaming over HTTP (DASH) is becoming the de facto technique for video delivery, especially for VoD services. Although 3GPP has specified the carriage of DASH over eMBMS for live streaming, eMBMS is not available everywhere (operators are only starting service rollout) and it is only worthwhile for a reasonably large number of users, due to the static SFN resource allocation for those services. Thus, live streaming using DASH over unicast connections is still necessary, but it may suffer from playback interruptions when the throughput varies, since the low end-to-end latency required for live streaming implies small buffers. To cope with network throughput variations in mobile networks, we propose using scalable video coding, combined with parallel TCP connections and prioritization of the most important data of the scalable video. We show that, by using LTE non-GBR bearers for prioritization, playback interruptions can be avoided.
Title: Low latency DASH based streaming over LTE
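The abstract's core idea is that scalable coding lets the client always deliver the base layer and shed enhancement layers when throughput drops. A minimal sketch of that adaptation rule, assuming per-layer bitrates are known (the rate numbers and function name are illustrative, not from the paper):

```python
def select_layers(layer_rates, throughput):
    """Keep the base layer unconditionally, then add enhancement layers
    (in order) while their cumulative bitrate fits the estimated
    aggregate throughput. layer_rates[0] is the base layer."""
    total, kept = 0.0, []
    for i, r in enumerate(layer_rates):
        if i == 0 or total + r <= throughput:
            total += r
            kept.append(i)
        else:
            break
    return kept
```

Prioritizing the base layer this way (e.g. on a protected non-GBR bearer) is what keeps the small live-streaming buffer from underflowing when throughput varies.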
Pub Date: 2014-12-01 | DOI: 10.1109/VCIP.2014.7051577
Xianlong Lu, Chongyang Zhang, Xiaokang Yang
In this paper, we propose an online video object classification algorithm using fast Similarity Network Fusion (SNF). By constructing a sample-similarity network for each data type and then efficiently fusing these networks into a single similarity network that represents the full spectrum of the underlying data, SNF can identify subtypes among existing samples by clustering and predict labels for new samples based on the constructed network, which makes it well suited to data integration and classification. The main obstacle to online classification with SNF is its computational complexity. The proposed fast SNF (FSNF) consists of two main steps: dividing the similarity matrix into two parts, and replacing the main part of the testing matrix with the corresponding part of the training matrix. Since most of the computation in SNF goes into building this main part of the matrix, the replacement removes most of the computational load. Experiments on online surveillance video object classification show that, compared with SNF, the proposed FSNF achieves a 16× speedup with only a 0.5%-0.6% loss in accuracy, while still significantly outperforming existing traditional algorithms in classification accuracy.
Title: Online video object classification using fast similarity network fusion
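The matrix-reuse step described in the abstract can be sketched as follows: the train-vs-train similarity block is computed once offline, and at test time only the (much smaller) test-vs-train and test-vs-test blocks are filled in. The Gaussian kernel and function names are illustrative assumptions; the paper's actual similarity construction is not specified here.

```python
import numpy as np

def similarity(a, b, sigma=1.0):
    """Gaussian similarity between the row vectors of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fast_similarity_matrix(train, test, w_train):
    """FSNF-style shortcut: reuse the precomputed train-train block
    w_train and compute only the rows involving new test samples."""
    w_tt = similarity(test, train)   # test-vs-train rows (recomputed)
    w_te = similarity(test, test)    # small test-vs-test corner
    top = np.hstack([w_train, w_tt.T])
    bot = np.hstack([w_tt, w_te])
    return np.vstack([top, bot])
```

Because the reused block dominates the matrix when the training set is large, skipping its recomputation accounts for most of the reported speedup.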
Pub Date: 2014-12-01 | DOI: 10.1109/VCIP.2014.7051534
M. Hasan, J. Arnold, M. Frater
Broadcasting of high definition stereoscopic 3D videos is growing rapidly because of increased demand in the mass consumer market. In spite of this interest, poor quality, crosstalk and other side effects, and the lack of defined broadcast standards have hampered the advancement of 3D displays. Real-time transmission of 3DTV sequences over packet-based networks may result in visual quality degradation due to packet loss and delay. For conventional 2D video, various extrapolation and directional interpolation strategies have been used to conceal missing blocks, but for 3D this is still an emerging field of research. Moreover, subjective testing is the most direct way to evaluate 3D quality and human perception of the concealed videos. This paper reviews state-of-the-art error concealment strategies and proposes a low-complexity frame-loss concealment method for a video decoder. Subjective testing on common 3D video sequences, and its statistical comparison with existing concealment methods, shows that the proposed method effectively conceals lost frames of stereoscopic videos in terms of visual comfort and 3D quality.
Title: Subjective evaluation and statistical analysis for improved frame-loss error concealment of 3D videos
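The abstract does not detail the proposed method, but the simplest low-complexity baseline it competes with is temporal frame copy: a lost frame is replaced by the last correctly received frame of the same view. A minimal sketch of that baseline (the function name and frame representation are illustrative):

```python
def conceal_frame(frames, lost_index):
    """Temporal frame-copy concealment: if the frame at lost_index was
    lost (None), substitute the nearest earlier received frame.
    This is a common baseline, not the paper's actual method."""
    if frames[lost_index] is not None:
        return frames[lost_index]
    for j in range(lost_index - 1, -1, -1):
        if frames[j] is not None:
            return frames[j]
    raise ValueError("no earlier frame available for concealment")
```

For stereoscopic content, the same substitution can be applied per view; the paper's contribution is evaluating such concealment subjectively in terms of visual comfort and 3D quality.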
Pub Date: 2014-12-01 | DOI: 10.1109/VCIP.2014.7051538
Heming Sun, Dajiang Zhou, Jiayi Zhu, S. Kimura, S. Goto
This paper presents a new VLSI architecture for the HEVC inverse discrete cosine transform (IDCT). Compared to prior art, this work reduces hardware cost by: reducing the computational logic of the 1-D IDCTs with a reordered parallel-in serial-out (RPISO) scheme that shares the inputs of the butterfly structure; and reducing the area of the transpose buffer with a cyclic memory organization that achieves 100% I/O utilization of the SRAMs. For a unified 4/8/16/32-point IDCT implementation, the proposed schemes demonstrate 35% and 62% reductions in logic and memory costs, respectively. The implementation supports real-time decoding of 4K×2K 60 fps video with a total hardware cost of 357,250 μm² for the 2-D IDCT and 80,988 μm² for the transpose memory in a 90 nm process.
Title: An area-efficient 4/8/16/32-point inverse DCT architecture for UHDTV HEVC decoder
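The butterfly structure whose inputs the RPISO scheme shares can be seen in the 4-point case: the even/odd decomposition turns a full 4×4 matrix multiply (16 multiplications) into 4 multiplications plus additions. The sketch below is a behavioral model of the standard HEVC 4-point inverse core transform butterfly, with the rounding/shift stages and the paper's hardware reordering omitted.

```python
def idct4_butterfly(src):
    """One 1-D 4-point HEVC inverse core transform via the even/odd
    butterfly: even part from src[0], src[2]; odd part from src[1],
    src[3]; outputs formed by shared sums and differences."""
    e0 = 64 * src[0] + 64 * src[2]
    e1 = 64 * src[0] - 64 * src[2]
    o0 = 83 * src[1] + 36 * src[3]
    o1 = 36 * src[1] - 83 * src[3]
    return [e0 + o0, e1 + o1, e1 - o1, e0 - o0]
```

A 2-D IDCT applies this 1-D transform to columns, transposes (hence the transpose buffer the paper optimizes), and applies it again to rows.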
Pub Date: 2014-12-01 | DOI: 10.1109/VCIP.2014.7051509
Edward Rosales, L. Guan
In this paper, a stereo matching algorithm using a window-based frequency comparison method is formulated. The algorithm works with a local matching stereo model in which a normalized cost function between frequency components and intensity values is used. It determines matching points in a stereo pair and uses a weighted cost function to determine the true disparity. Unlike classical stereo correspondence algorithms, which build initial disparity maps through window-based color intensity comparisons, the proposed algorithm uses window-based frequency comparisons, exploiting the ability of frequency components to accurately locate highly detailed segments of the image. The algorithm is evaluated on the Middlebury data sets and shows noise and distortion resistance similar to the work in [1], allowing for higher reliability during comparisons. This also provides an advantage over typical color intensity comparisons, as noise present in an image may cause mismatches when color intensity comparisons are used.
Title: Stereo correspondence using an assisted discrete cosine transform method
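The weighted cost combining frequency components and intensities can be sketched as below: each candidate window is transformed with a DCT and compared in both domains. The 1-D DCT over a flattened window, the L1 distances, and the mixing weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import math

def dct(block):
    """Naive O(n^2) DCT-II of a 1-D sequence (a flattened window)."""
    n = len(block)
    return [sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(block)) for k in range(n)]

def window_cost(wl, wr, alpha=0.5):
    """Normalized matching cost between a left and right window:
    a weighted mix of frequency-domain and intensity differences."""
    fl, fr = dct(wl), dct(wr)
    fcost = sum(abs(a - b) for a, b in zip(fl, fr)) / len(fl)
    icost = sum(abs(a - b) for a, b in zip(wl, wr)) / len(wl)
    return alpha * fcost + (1 - alpha) * icost
```

In a full matcher this cost would be evaluated over the disparity range and the minimum taken per pixel; the frequency term penalizes texture mismatches that pure intensity comparison can miss under noise.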
Pub Date: 2014-12-01 | DOI: 10.1109/VCIP.2014.7051511
Jun Liu, Xiaojun Jing, Songlin Sun, Zifeng Lian
Gabor filters are among the most successful methods for face recognition; however, they dramatically increase the data volume of the face representation. To extract compact and distinctive information, we propose the Variable Length Dominant Gabor Local Binary Pattern (VLD-GLBP) for face recognition. It significantly reduces the face representation data volume while achieving performance comparable to that of complex state-of-the-art techniques. Specifically, local binary pattern (LBP) features are first computed from the Gabor images. Then, the most frequently occurring patterns are extracted to form the VLD-GLBP. Finally, the distance between VLD-GLBPs is computed to classify the face images. Experimental results on the FERET database verify the efficiency of the proposed VLD-GLBP method.
Title: Variable length dominant Gabor local binary pattern (VLD-GLBP) for face recognition
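The two building blocks named in the abstract can be sketched directly: an 8-bit LBP code per pixel, and a "dominant" pattern set kept only until a coverage threshold is reached, which is what makes the descriptor variable-length. The 0.8 coverage value and both function names are illustrative assumptions.

```python
from collections import Counter

def lbp_code(center, neighbors):
    """8-bit LBP: set bit i when the i-th neighbor >= the center pixel."""
    return sum(1 << i for i, n in enumerate(neighbors) if n >= center)

def dominant_patterns(lbp_codes, coverage=0.8):
    """Keep the smallest set of most frequent LBP codes whose cumulative
    frequency reaches `coverage` -- hence a variable-length descriptor."""
    counts = Counter(lbp_codes)
    total = len(lbp_codes)
    kept, acc = [], 0
    for code, c in counts.most_common():
        kept.append(code)
        acc += c
        if acc / total >= coverage:
            break
    return kept
```

In the full method these codes would be computed on each Gabor-filtered image rather than the raw pixels, and matching would compare the retained dominant-pattern sets.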
Pub Date: 2014-12-01 | DOI: 10.1109/VCIP.2014.7051575
L. Toni, Thomas Maugey, P. Frossard
In multiview video services, multiple cameras acquire the same scene from different perspectives, which results in correlated video streams. This generates large amounts of highly redundant data, which need to be properly handled during encoding and transmission of the multiview data. In this work, we study coding and transmission strategies in multicamera systems, where correlated sources need to be sent to a central server through a bottleneck channel and eventually delivered to interactive clients. We propose a dynamic correlation-aware packet scheduling optimization under delay, bandwidth, and interactivity constraints. A novel trellis-based solution allows the multivariate optimization problem to be formally decomposed, thereby significantly reducing the computational complexity. Simulation results show the gain of the proposed algorithm compared to baseline scheduling policies.
Title: Packet scheduling in multicamera capture systems
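The abstract does not describe its trellis construction, but the general decomposition it relies on is the standard one: instead of searching all scheduling decisions jointly, a dynamic program keeps only the best score per state at each stage (e.g. per time slot). A generic sketch of that idea, with stage/state semantics, gain function, and transition constraint all left as hypothetical parameters:

```python
def viterbi(states, n_stages, gain, allowed):
    """Generic trellis decomposition: best[s] is the best additive gain
    achievable ending in state s after the current stage. gain(t, p, c)
    is the gain of moving from state p to state c at stage t; allowed
    restricts the transitions."""
    best = {s: 0.0 for s in states}
    for t in range(n_stages):
        nxt = {}
        for cur in states:
            cands = [best[p] + gain(t, p, cur) for p in states if allowed(p, cur)]
            if cands:
                nxt[cur] = max(cands)
        best = nxt
    return max(best.values())
```

This turns a search over exponentially many decision sequences into work linear in the number of stages and quadratic in the number of states, which is the complexity reduction the paper attributes to its trellis formulation.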
Pub Date: 2014-12-01 | DOI: 10.1109/VCIP.2014.7051517
Xin Zhao, Ying Chen, Li Zhang
In the 3D video extension of H.264/AVC, namely 3D-AVC, Neighboring Based Disparity Vector (NBDV) derivation has been proposed to support multiview/stereo compatibility, so that texture views can be decoded independently of depth views. NBDV generates a disparity vector for the current macroblock (MB) from the motion information of neighboring blocks, in particular those coded with motion vectors pointing to inter-view reference pictures. In 3D-AVC, NBDV is designed to access a minimum number of spatial and temporal neighboring blocks, so there is a high probability that it fails to derive an efficient disparity vector. This paper introduces a derived disparity vector scheme, wherein a single disparity vector derived from NBDV is maintained for the whole slice and used as the disparity vector of the current MB whenever NBDV cannot derive one from the neighboring blocks. Simulation results show that the proposed method provides a 3.6% bit rate reduction for multiview coding.
Title: Derived disparity vector based NBDV for 3D-AVC
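The fallback mechanism described in the abstract can be sketched as follows: scan the neighbors in order; if one carries an inter-view motion vector, use it and remember it at the slice level; otherwise fall back to the stored derived disparity vector. The data representation (tuples and a state dict) is an illustrative simplification of the actual 3D-AVC derivation order.

```python
def nbdv_with_fallback(neighbor_mvs, state):
    """NBDV with a slice-level derived disparity vector.
    neighbor_mvs: (is_interview, dv) tuples in checking order.
    state: holds 'derived_dv', the last DV derived in this slice."""
    for is_interview, dv in neighbor_mvs:
        if is_interview:
            state["derived_dv"] = dv  # remember for later MBs
            return dv
    return state["derived_dv"]        # fallback when NBDV finds nothing
```

Because the stored vector is refreshed whenever NBDV succeeds, later macroblocks in the slice inherit a recent, usually still-valid disparity instead of a zero vector.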
Pub Date: 2014-12-01 | DOI: 10.1109/VCIP.2014.7051503
Ruixiao Yao, Yanwei Liu, Jinxia Liu, Pinghua Zhao, S. Ci
Scalable video has natural advantages in adapting to multi-channel wireless networks. Some existing works have tried to further optimize scalable video transmission by combining a crude layer-importance mapping with extrinsic techniques such as Forward Error Correction (FEC) and Adaptive Modulation and Coding (AMC). However, the intrinsic flexibility of scalable video streaming over multi-channel wireless networks has been neglected. In this paper, we exploit this intrinsic flexibility by first analyzing the priorities of H.264/SVC video data at the network abstraction layer unit (NALU) level, and then designing a priority-validity delivery scheme for scalable video streaming. With this strategy, the sub-stream extraction is intelligently adjusted according to the delivery history, and the more important data in a group of pictures (GOP) is delivered through the more reliable channels. Experimental results validate the strategy's effectiveness in improving the objective quality and perceptual experience of the received video.
Title: Intrinsic flexibility exploiting for scalable video streaming over multi-channel wireless networks
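The importance-to-reliability mapping in the abstract can be sketched as a simple sorted pairing: layers ranked by importance are assigned to channels ranked by estimated reliability. The dict layout, field names, and round-robin overflow rule are illustrative assumptions, not the paper's scheme.

```python
def map_layers_to_channels(layers, channels):
    """Deliver more important GOP layers over more reliable channels:
    sort both lists and pair them off, wrapping round-robin if there
    are more layers than channels."""
    ls = sorted(layers, key=lambda l: l["importance"], reverse=True)
    cs = sorted(channels, key=lambda c: c["reliability"], reverse=True)
    return {l["id"]: cs[i % len(cs)]["id"] for i, l in enumerate(ls)}
```

In the full scheme this assignment would be re-run per GOP, with channel reliability estimates updated from the delivery history so that sub-stream extraction adapts over time.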