Pub Date : 2024-06-07, DOI: 10.1109/TBC.2024.3391025
Simin Keykhosravi;Ebrahim Bedeer
This paper investigates doubly-selective (i.e., time- and frequency-selective) channel estimation in faster-than-Nyquist (FTN) signaling HF communications. In particular, we propose a novel index modulation (IM)-based channel estimation algorithm for FTN signaling HF communications, comprising a pilot sequence placement (PSP) algorithm and a pilot sequence location identification (PSLI) algorithm. At the transmitter, the PSP algorithm utilizes the locations of pilot sequences to carry additional information bits, thereby improving the spectral efficiency (SE) of HF communications. HF channels have two non-zero independent fading paths with specific fixed delay spread and frequency spread characteristics, as outlined in International Telecommunication Union Radiocommunication Sector (ITU-R) Recommendations F.1487 and F.520. Based on these properties of HF channels and the favorable auto-correlation characteristics of the optimal pilot sequence, we propose a novel PSLI algorithm that effectively identifies the pilot sequence location within a given frame at the receiver. This is achieved by showing that the squared magnitude of the cross-correlation between the received symbols and the pilot sequence is a scaled version of the squared magnitude of the auto-correlation of the pilot sequence, weighted by the gain of the corresponding HF channel path. Simulation results show very low pilot sequence location identification errors for HF channels. Our simulation results show a 6 dB improvement in the mean squared error (MSE) of the channel estimation, as well as about a 3.5 dB bit error rate (BER) improvement of FTN signaling, along with an enhancement in SE compared to the method in Ishihara and Sugiura (2017). We also achieve an enhancement in SE compared to the work in Keykhosravi and Bedeer (2023) while maintaining comparable channel estimation MSE and BER performance.
{"title":"IM-Based Pilot-Assisted Channel Estimation for FTN Signaling HF Communications","authors":"Simin Keykhosravi;Ebrahim Bedeer","doi":"10.1109/TBC.2024.3391025","DOIUrl":"10.1109/TBC.2024.3391025","url":null,"abstract":"This paper investigates doubly-selective (i.e., time- and frequency-selective) channel estimation in faster-than-Nyquist (FTN) signaling HF communications. In particular, we propose a novel IM-based channel estimation algorithm for FTN signaling HF communications including pilot sequence placement (PSP) and pilot sequence location identification (PSLI) algorithms. At the transmitter, we propose the PSP algorithm that utilizes the locations of pilot sequences to carry additional information bits, thereby improving the SE of HF communications. HF channels have two non-zero independent fading paths with specific fixed delay spread and frequency spread characteristics as outlined in the Union Radio communication Sector (ITU-R) F.1487 and F.520. Having said that, based on the aforementioned properties of the HF channels and the favorable auto-correlation characteristics of the optimal pilot sequence, we propose a novel PSLI algorithm that effectively identifies the pilot sequence location within a given frame at the receiver. This is achieved by showing that the square of the absolute value of the cross-correlation between the received symbols and the pilot sequence consists of a scaled version of the square of the absolute value of the auto-correlation of the pilot sequence weighted by the gain of the corresponding HF channel path. Simulation results show very low pilot sequence location identification errors for HF channels. Our simulation results show a 6 dB improvement in the MSE of the channel estimation as well as about 3.5 dB BER improvement of FTN signaling along with an enhancement in SE compared to the method in Ishihara and Sugiura (2017). We also achieved an enhancement in SE compared to the work in Keykhosravi and Bedeer (2023) while maintaining comparable MSE of the channel estimation and BER performance.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 3","pages":"774-791"},"PeriodicalIF":3.2,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141940845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-03, DOI: 10.1109/TBC.2024.3402380
George Henrique Maranhão Garcia de Oliveira;Gustavo de Melo Valeira;Cristiano Akamine
In Brazil, the Television (TV) 3.0 project has been underway since 2020 and is currently in its third phase. The aim of this project is to study, test, and validate state-of-the-art technologies in order to define the techniques that will make up the next generation of the Brazilian Digital Terrestrial Television Broadcasting (DTTB) system. All the technologies involved in this system must be compatible with the transport method defined in Phase 2 of the project: the Real-Time Object Delivery over Unidirectional Transport (ROUTE)/Dynamic Adaptive Streaming over HTTP (DASH) method from the Advanced Television Systems Committee (ATSC) 3.0 standard. Therefore, this paper proposes the use of the ROUTE/DASH transport method in the Advanced Integrated Services Digital Broadcasting Terrestrial (ISDB-T) system, presenting the underlying theory and the results obtained in the first transmission carried out combining the two aforementioned technologies.
{"title":"A Proposal to Use ROUTE/DASH in the Advanced ISDB-T","authors":"George Henrique Maranhão Garcia de Oliveira;Gustavo de Melo Valeira;Cristiano Akamine","doi":"10.1109/TBC.2024.3402380","DOIUrl":"10.1109/TBC.2024.3402380","url":null,"abstract":"In Brazil, the Television (TV) 3.0 project has been underway since 2020 and is currently in its third phase. The aim of this project is to study, test and validate state-of-the-art technologies in order to define the techniques that will make up the next-generation of Brazilian Digital Terrestrial Television Broadcasting (DTTB) System. All the technologies involved in this system must be compatible with the transportation method defined in Phase 02 of the project: the Real-Time Object Delivery over Unidirectional Transport (ROUTE)/Dynamic Adaptive Streaming over HTTP (DASH) method from the Advanced Television Systems Committee (ATSC) 3.0 standard. Therefore, this paper proposes the use of the ROUTE/DASH transportation method in the Advanced Integrated Services Digital Broadcasting Terrestrial (ISDB-T) system, presenting the theory involved and the results obtained in the first transmission carried out involving the two aforementioned technologies.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 3","pages":"935-944"},"PeriodicalIF":3.2,"publicationDate":"2024-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141940843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-04-30, DOI: 10.1109/TBC.2024.3382959
Nadezhda Chukhno;Olga Chukhno;Sara Pizzi;Antonella Molinaro;Antonio Iera;Giuseppe Araniti
The latest technological developments have fueled revolutionary changes and improvements in wireless communication systems. Among them, mmWave spectrum exploitation stands out for its ability to deliver ultra-high data rates. However, its full adoption in beyond-fifth-generation (5G+/6G) multicast systems remains hampered, mainly due to mobility robustness issues. In this work, we propose a solution to the problem of efficient sidelink-assisted multicasting in mobile multimode systems, specifically by considering the possibility of jointly utilizing sidelink/device-to-device (D2D), unicast, and multicast transmissions to improve service delivery. To overcome the complexity of finding the optimal solution for user-mode binding, we introduce a pre-optimization step called multicast group formation (MGF). Through a clustering technique based on unsupervised machine learning, MGF reduces the complexity of solving the sidelink-assisted multiple modes mmWave (SA3M) problem. A detailed analysis of the impact of various system parameters on performance is conducted, and numerical evidence of the complexity/performance trade-off and its dependence on mobility patterns and user distribution is provided. In particular, our proposed solution achieves a network throughput improvement of up to 32% over state-of-the-art schemes while ensuring the lowest computational time. Finally, the results demonstrate that an effective balance between power consumption and latency can be achieved through appropriate adjustments of transmit power and bandwidth.
{"title":"Beyond Complexity Limits: Machine Learning for Sidelink-Assisted mmWave Multicasting in 6G","authors":"Nadezhda Chukhno;Olga Chukhno;Sara Pizzi;Antonella Molinaro;Antonio Iera;Giuseppe Araniti","doi":"10.1109/TBC.2024.3382959","DOIUrl":"10.1109/TBC.2024.3382959","url":null,"abstract":"The latest technological developments have fueled revolutionary changes and improvements in wireless communication systems. Among them, mmWave spectrum exploitation stands out for its ability to deliver ultra-high data rates. However, its full adoption beyond fifth generation multicast systems (5G+/6G) remains hampered, mainly due to mobility robustness issues. In this work, we propose a solution to address the problem of efficient sidelink-assisted multicasting in mobile multimode systems, specifically by considering the possibility of jointly utilizing sidelink/device-to-device (D2D), unicast, and multicast transmissions to improve service delivery. To overcome the complexity problem in finding the optimal solution for user-mode binding, we introduce a pre-optimization step called multicast group formation (MGF). Through a clustering technique based on unsupervised machine learning, MGF allows to reduce the complexity of solving the sidelink-assisted multiple modes mmWave (SA3M) problem. A detailed analysis of the impact of various system parameters on performance is conducted, and numerical evidence of the complexity/performance trade-off and its dependence on mobility patterns and user distribution is provided. Particularly, our proposed solution achieves a network throughput improvement of up to 32% over state-of-the-art schemes while ensuring the lowest computational time. Finally, the results demonstrate that an effective balance between power consumption and latency can be achieved through appropriate adjustments of transmit power and bandwidth.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 3","pages":"1076-1090"},"PeriodicalIF":3.2,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10513425","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140839613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-04-22, DOI: 10.1109/TBC.2024.3380474
Sungjun Ahn;Hyun-Jeong Yim;Youngwan Lee;Sung-Ik Park
This paper introduces a media service model that exploits artificial intelligence (AI) video generators at the receiving end. This proposal deviates from the traditional multimedia ecosystem, which relies entirely on in-house production, by shifting part of the content creation onto the receiver. We bring a semantic process into the framework, allowing the distribution network to provide service elements that prompt the content generator rather than distributing encoded data of fully finished programs. The service elements include fine-tailored text descriptions, lightweight image data of some objects, or application programming interfaces, comprehensively referred to as semantic sources, and the user terminal translates the received semantic data into video frames. Empowered by the random nature of generative AI, users can experience super-personalized services accordingly. The proposed idea incorporates situations in which the user receives different service providers' element packages, either in a sequence over time or multiple packages at the same time. Provided that in-context coherence and content integrity are preserved, the combinatory dynamics will amplify service diversity, allowing users to always chance upon new experiences. This work particularly targets short-form videos and advertisements, where users would quickly feel fatigued by seeing the same frame sequence every time. In those use cases, the content provider's role will be recast from end-to-end producer to scripter of semantic sources. Overall, this work explores a new form of media ecosystem facilitated by receiver-embedded generative models, featuring both random content dynamics and enhanced delivery efficiency.
{"title":"Dynamic and Super-Personalized Media Ecosystem Driven by Generative AI: Unpredictable Plays Never Repeating the Same","authors":"Sungjun Ahn;Hyun-Jeong Yim;Youngwan Lee;Sung-Ik Park","doi":"10.1109/TBC.2024.3380474","DOIUrl":"10.1109/TBC.2024.3380474","url":null,"abstract":"This paper introduces a media service model that exploits artificial intelligence (AI) video generators at the receive end. This proposal deviates from the traditional multimedia ecosystem, completely relying on in-house production, by shifting part of the content creation onto the receiver. We bring a semantic process into the framework, allowing the distribution network to provide service elements that prompt the content generator rather than distributing encoded data of fully finished programs. The service elements include fine-tailored text descriptions, lightweight image data of some objects, or application programming interfaces, comprehensively referred to as semantic sources, and the user terminal translates the received semantic data into video frames. Empowered by the random nature of generative AI, users can experience super-personalized services accordingly. The proposed idea incorporates situations in which the user receives different service providers’ element packages, either in a sequence over time or multiple packages at the same time. Given promised in-context coherence and content integrity, the combinatory dynamics will amplify the service diversity, allowing the users to always chance upon new experiences. This work particularly aims at short-form videos and advertisements, which the users would easily feel fatigued by seeing the same frame sequence every time. In those use cases, the content provider’s role will be recast as scripting semantic sources, transformed from a thorough producer. Overall, this work explores a new form of media ecosystem facilitated by receiver-embedded generative models, featuring both random content dynamics and enhanced delivery efficiency simultaneously.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 3","pages":"980-994"},"PeriodicalIF":3.2,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140637223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-04-22, DOI: 10.1109/TBC.2024.3382960
Zhe Zhang;Jianjun Lei;Bo Peng;Jie Zhu;Qingming Huang
Most existing learning-based methods for stereoscopic image super-resolution rely on a large number of high-resolution stereoscopic images as labels. To alleviate this data dependency, this paper proposes a self-supervised pretraining-based method for stereoscopic image super-resolution (SelfSSR). Specifically, to develop a self-supervised pretext task for stereoscopic images, a parallax-aware masking strategy (PAMS) is designed to adaptively mask matching areas of the left and right views. With PAMS, the network is encouraged to effectively predict the missing information of the input images. Besides, a cross-view Transformer module (CVTM) is presented to aggregate intra-view and inter-view information simultaneously for stereoscopic image reconstruction. Meanwhile, the cross-attention map learned by CVTM is utilized to guide the masking process in PAMS. Comparative results on four datasets show that the proposed SelfSSR achieves state-of-the-art performance using only 10% of the labeled training data.
{"title":"Self-Supervised Pretraining for Stereoscopic Image Super-Resolution With Parallax-Aware Masking","authors":"Zhe Zhang;Jianjun Lei;Bo Peng;Jie Zhu;Qingming Huang","doi":"10.1109/TBC.2024.3382960","DOIUrl":"10.1109/TBC.2024.3382960","url":null,"abstract":"Most existing learning-based methods for stereoscopic image super-resolution rely on a great number of high-resolution stereoscopic images as labels. To alleviate the problem of data dependency, this paper proposes a self-supervised pretraining-based method for stereoscopic image super-resolution (SelfSSR). Specifically, to develop a self-supervised pretext task for stereoscopic images, a parallax-aware masking strategy (PAMS) is designed to adaptively mask matching areas of the left and right views. With PAMS, the network is encouraged to effectively predict missing information of input images. Besides, a cross-view Transformer module (CVTM) is presented to aggregate the intra-view and inter-view information simultaneously for stereoscopic image reconstruction. Meanwhile, the cross-attention map learned by CVTM is utilized to guide the masking process in PAMS. Comparative results on four datasets show that the proposed SelfSSR achieves state-of-the-art performance by using only 10% of labeled training data.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 2","pages":"482-491"},"PeriodicalIF":4.5,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140637362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-04-11, DOI: 10.1109/TBC.2024.3380455
Nianzhen Gao;Jiaxi Zhou;Guoan Wan;Xinhai Hua;Ting Bi;Tao Jiang
The widespread use of live streaming imposes low-latency requirements on the processing and transmission of virtual reality (VR) videos. This paper introduces a prototype system for low-latency VR video processing and transmission that exploits edge computing to harness the computational power of edge servers. This approach enables efficient video preprocessing and facilitates closer-to-user multicast video distribution. Despite edge computing's potential, managing large-scale access, addressing differentiated channel conditions, and accommodating diverse user viewports pose significant challenges for VR video transcoding and scheduling. To tackle these challenges, our system utilizes dual edge servers for video transcoding and slicing, thereby markedly improving the viewing experience compared to traditional cloud-based systems. Additionally, we devise a low-complexity greedy algorithm for multi-edge, multi-user VR video offloading distribution, employing the resulting bitrate decisions, in turn, to guide video transcoding. Simulation results reveal that our strategy improves system utility by 44.77% over existing state-of-the-art schemes that do not utilize edge servers, while reducing processing time by 58.54%.
{"title":"Low-Latency VR Video Processing-Transmitting System Based on Edge Computing","authors":"Nianzhen Gao;Jiaxi Zhou;Guoan Wan;Xinhai Hua;Ting Bi;Tao Jiang","doi":"10.1109/TBC.2024.3380455","DOIUrl":"10.1109/TBC.2024.3380455","url":null,"abstract":"The widespread use of live streaming necessitates low-latency requirements for the processing and transmission of virtual reality (VR) videos. This paper introduces a prototype system for low-latency VR video processing and transmission that exploits edge computing to harness the computational power of edge servers. This approach enables efficient video preprocessing and facilitates closer-to-user multicast video distribution. Despite edge computing’s potential, managing large-scale access, addressing differentiated channel conditions, and accommodating diverse user viewports pose significant challenges for VR video transcoding and scheduling. To tackle these challenges, our system utilizes dual-edge servers for video transcoding and slicing, thereby markedly improving the viewing experience compared to traditional cloud-based systems. Additionally, we devise a low-complexity greedy algorithm for multi-edge and multi-user VR video offloading distribution, employing the results of bitrate decisions to guide video transcoding inversely. Simulation results reveal that our strategy significantly enhances system utility by 44.77% over existing state-of-the-art schemes that do not utilize edge servers while reducing processing time by 58.54%.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 3","pages":"862-871"},"PeriodicalIF":3.2,"publicationDate":"2024-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140578571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-04-11, DOI: 10.1109/TBC.2024.3382949
Fei Zhou;Wei Sheng;Zitao Lu;Guoping Qiu
Video super-resolution (SR) has important real-world applications such as enhancing the viewing experience of legacy low-resolution videos on high-resolution display devices. However, there are no visual quality assessment (VQA) models specifically designed for evaluating SR videos, even though such models are crucial both for advancing video SR algorithms and for viewing quality assurance. This paper addresses this gap. We start by contributing the first video super-resolution quality assessment database (VSR-QAD), which contains 2,260 SR videos annotated with mean opinion score (MOS) labels collected through a psychovisual experiment of approximately 400 man-hours involving a total of 190 subjects. We then build on the new VSR-QAD and develop the first VQA model specifically designed for evaluating SR videos. The model features a two-stream convolutional neural network architecture and a two-stage training algorithm designed for extracting spatial and temporal features that characterize the quality of SR videos. We present experimental results and data analysis to demonstrate the high data quality of VSR-QAD and the effectiveness of the new VQA model for measuring the visual quality of SR videos. The new database and the code of the proposed model will be available online at https://github.com/key1cdc/VSRQAD.
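A minimal skeleton of a two-stream design in the spirit of this model (layer sizes, the frame/frame-difference split, and the pooling choices are our own assumptions, not the authors' architecture): one stream sees a representative frame for spatial quality, the other sees frame differences for temporal quality, and their features are fused to regress a MOS-like score.

```python
import torch
import torch.nn as nn

class TwoStreamVQA(nn.Module):
    """Toy two-stream VQA skeleton (hypothetical layer sizes)."""

    def __init__(self):
        super().__init__()
        def stream():
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.spatial, self.temporal = stream(), stream()
        self.head = nn.Linear(64, 1)  # regress a MOS-like quality score

    def forward(self, frames):                     # frames: (B, T, 3, H, W)
        mid = frames[:, frames.shape[1] // 2]      # a representative frame
        diff = frames[:, 1:] - frames[:, :-1]      # temporal residuals
        s = self.spatial(mid)
        t = self.temporal(diff.mean(dim=1))        # averaged motion frame
        return self.head(torch.cat([s, t], dim=1)).squeeze(-1)

scores = TwoStreamVQA()(torch.randn(2, 8, 3, 64, 64))
print(scores.shape)  # torch.Size([2])
```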