Constant-Beamwidth Kronecker Product Beamforming With Nonuniform Planar Arrays
Pub Date: 2022-05-11 | DOI: 10.3389/frsip.2022.829463
Ariel Frank, I. Cohen
In this paper, we address the problem of constant-beamwidth beamforming using nonuniform planar arrays. We propose two techniques for designing planar beamformers that can maintain different beamwidths in the XZ and YZ planes, based on constant-beamwidth linear arrays. The first technique utilizes Kronecker product beamforming to find the weights, thus eliminating matrix inversion. The second technique provides a closed-form solution that allows a tradeoff between white noise gain and directivity factor, and it remains applicable even when only a subset of the sensors is used. Since our techniques are based on linear arrays, we also consider symmetric linear arrays and present a method that determines where sensors should be placed to maximize the directivity and extend the frequency range over which the beamwidth remains constant, with a minimal number of sensors. Simulations demonstrate the advantages of the proposed design methods compared to the state of the art. Specifically, our method yields a 1000-fold faster runtime than the competing method, while improving the wideband directivity factor by over 8 dB without compromising the wideband white noise gain in the simulated scenario.
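
To illustrate the Kronecker idea behind the first technique: if w_x and w_y are weight vectors designed for two constant-beamwidth linear arrays, the planar weights can be formed as their Kronecker product, so the pattern in each principal plane is governed by the corresponding linear design and no matrix inversion is needed. The sketch below (plain NumPy, with uniform weights and half-wavelength spacing as placeholder values, not the paper's actual designs) shows the weight construction and the XZ-plane beampattern, which factorizes into the x-axis linear-array pattern times a constant.

```python
import numpy as np

def planar_kronecker_weights(w_x, w_y):
    """Planar beamformer weights as the Kronecker product of two linear-array designs.
    Each plane's beamwidth is then set by its own linear array (generic illustration)."""
    return np.kron(w_x, w_y)

def xz_plane_pattern(w_x, w_y, x_pos, theta, k):
    """Beampattern in the XZ plane (phi = 0): the y-axis term contributes only the
    constant factor sum(w_y), so the pattern reduces to the linear x-array pattern."""
    steering = np.exp(1j * k * np.outer(np.sin(theta), x_pos))  # (n_angles, Mx)
    return np.sum(w_y) * (steering @ w_x)

# Toy usage with uniform weights and half-wavelength spacing (assumed values).
wavelength = 1.0
k = 2 * np.pi / wavelength
x_pos = np.arange(8) * wavelength / 2
w_x = np.ones(8) / 8
w_y = np.ones(6) / 6
theta = np.linspace(-np.pi / 2, np.pi / 2, 181)
pattern = np.abs(xz_plane_pattern(w_x, w_y, x_pos, theta, k))
```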
{"title":"Constant-Beamwidth Kronecker Product Beamforming With Nonuniform Planar Arrays","authors":"Ariel Frank, I. Cohen","doi":"10.3389/frsip.2022.829463","DOIUrl":"https://doi.org/10.3389/frsip.2022.829463","url":null,"abstract":"In this paper, we address the problem of constant-beamwidth beamforming using nonuniform planar arrays. We propose two techniques for designing planar beamformers that can maintain different beamwidths in the XZ and YZ planes based on constant-beamwidth linear arrays. In the first technique, we utilize Kronecker product beamforming to find the weights, thus eliminating matrix inversion. The second technique provides a closed-form solution that allows for a tradeoff between white noise gain and directivity factor. The second technique is applicable even when only a subset of the sensors is used. Since our techniques are based on linear arrays, we also consider symmetric linear arrays. We present a method that determines where sensors should be placed to maximize the directivity and increase the frequency range over which the beamwidth remains constant, with a minimal number of sensors. Simulations demonstrate the advantages of the proposed design methods compared to the state-of-the-art. Specifically, our method yields a 1000-fold faster runtime than the competing method, while improving the wideband directivity factor by over 8 dB without compromising the wideband white noise gain in the simulated scenario.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84460345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Joint image compression and denoising via latent-space scalability
Pub Date: 2022-05-04 | DOI: 10.3389/frsip.2022.932873
Saeed Ranjbar Alvar, Mateen Ulhaq, Hyomin Choi, Ivan V. Baji'c
When it comes to image compression in digital cameras, denoising is traditionally performed prior to compression. However, there are applications where image noise may be necessary to demonstrate the trustworthiness of the image, such as court evidence and image forensics. This means that the noise itself needs to be coded, in addition to the clean image. In this paper, we present a learning-based image compression framework in which image denoising and compression are performed jointly. The latent space of the image codec is organized in a scalable manner such that the clean image can be decoded from a subset of the latent space (the base layer), while the noisy image is decoded from the full latent space at a higher rate. Using a subset of the latent space for the denoised image allows denoising to be carried out at a lower rate. Besides providing a scalable representation of the noisy input image, performing denoising jointly with compression makes intuitive sense because noise is hard to compress; hence, compressibility is one of the criteria that may help distinguish noise from signal. The proposed codec is compared against established compression and denoising benchmarks, and the experiments reveal considerable bitrate savings compared to a cascade combination of a state-of-the-art codec and a state-of-the-art denoiser.
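
A minimal sketch of the latent-space scalability idea (PyTorch, with hypothetical layer sizes and channel splits; the entropy/rate model and the authors' actual architecture are omitted): the encoder produces a latent tensor, the first channels form the base layer decoded to a clean image, and the full latent is decoded back to the noisy input.

```python
import torch.nn as nn

class ScalableAutoencoder(nn.Module):
    """Toy latent-space-scalable codec: the first `base_ch` latent channels (base
    layer) decode to the clean image; all channels decode to the noisy input."""
    def __init__(self, base_ch=64, enh_ch=64):
        super().__init__()
        self.base_ch = base_ch
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(128, base_ch + enh_ch, 5, stride=2, padding=2),
        )
        self.dec_clean = nn.Sequential(  # base layer only -> denoised image
            nn.ConvTranspose2d(base_ch, 128, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 3, 5, stride=2, padding=2, output_padding=1),
        )
        self.dec_noisy = nn.Sequential(  # full latent -> reconstructed noisy image
            nn.ConvTranspose2d(base_ch + enh_ch, 128, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 3, 5, stride=2, padding=2, output_padding=1),
        )

    def forward(self, noisy):
        z = self.encoder(noisy)
        z_base = z[:, :self.base_ch]  # subset of the latent space (base layer)
        return self.dec_clean(z_base), self.dec_noisy(z)
```

In training, each decoder output would get its own reconstruction loss, plus a rate term on the (quantized) latent; all of that is omitted in this sketch.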
{"title":"Joint image compression and denoising via latent-space scalability","authors":"Saeed Ranjbar Alvar, Mateen Ulhaq, Hyomin Choi, Ivan V. Baji'c","doi":"10.3389/frsip.2022.932873","DOIUrl":"https://doi.org/10.3389/frsip.2022.932873","url":null,"abstract":"When it comes to image compression in digital cameras, denoising is traditionally performed prior to compression. However, there are applications where image noise may be necessary to demonstrate the trustworthiness of the image, such as court evidence and image forensics. This means that noise itself needs to be coded, in addition to the clean image itself. In this paper, we present a learning-based image compression framework where image denoising and compression are performed jointly. The latent space of the image codec is organized in a scalable manner such that the clean image can be decoded from a subset of the latent space (the base layer), while the noisy image is decoded from the full latent space at a higher rate. Using a subset of the latent space for the denoised image allows denoising to be carried out at a lower rate. Besides providing a scalable representation of the noisy input image, performing denoising jointly with compression makes intuitive sense because noise is hard to compress; hence, compressibility is one of the criteria that may help distinguish noise from the signal. The proposed codec is compared against established compression and denoising benchmarks, and the experiments reveal considerable bitrate savings compared to a cascade combination of a state-of-the-art codec and a state-of-the-art denoiser.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"46 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82774555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

AL-Net: Asymmetric Lightweight Network for Medical Image Segmentation
Pub Date: 2022-05-02 | DOI: 10.3389/frsip.2022.842925
Xiaogang Du, Yinyin Nie, Fuhai Wang, Tao Lei, Song Wang, Xuejun Zhang
Medical image segmentation plays an important role in clinical applications, such as disease diagnosis and treatment planning. On the premise of ensuring segmentation accuracy, segmentation speed is also an important factor in improving diagnostic efficiency. Many deep learning-based medical image segmentation models improve segmentation accuracy but ignore model complexity and inference speed, and therefore fail to meet the stringent real-time requirements of clinical applications. To address this problem, this paper proposes an asymmetric lightweight medical image segmentation network, AL-Net for short. Firstly, AL-Net employs a pretrained RepVGG-A1 to extract rich semantic features and reduces the number of channels to keep model complexity low. Secondly, AL-Net introduces a lightweight atrous spatial pyramid pooling module as the context extractor and combines it with an attention mechanism to capture context information. Thirdly, a novel asymmetric decoder is proposed and introduced into AL-Net; it not only effectively eliminates redundant features but also makes use of low-level image features to improve performance. Finally, reparameterization is applied in the inference stage, which effectively reduces the number of parameters and improves inference speed without reducing segmentation accuracy. Experimental results on retinal vessel, cell contour, and skin lesion segmentation datasets show that AL-Net is superior to state-of-the-art models in terms of accuracy, parameter count, and inference speed.
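
The reparameterization step can be shown in isolation: because convolution is linear, parallel 3x3, 1x1, and identity branches that are summed can be fused into a single 3x3 convolution at inference time. The sketch below (NumPy, ignoring the batch-normalization folding that RepVGG also performs) illustrates the kernel arithmetic; it is a generic illustration of the technique, not AL-Net's code.

```python
import numpy as np

def pad_1x1_to_3x3(k1):
    """Embed a (C_out, C_in, 1, 1) kernel in the center of a 3x3 kernel."""
    k3 = np.zeros((k1.shape[0], k1.shape[1], 3, 3), dtype=k1.dtype)
    k3[:, :, 1, 1] = k1[:, :, 0, 0]
    return k3

def identity_as_3x3(channels, dtype=np.float32):
    """Identity branch expressed as a 3x3 kernel (requires C_in == C_out)."""
    k = np.zeros((channels, channels, 3, 3), dtype=dtype)
    for c in range(channels):
        k[c, c, 1, 1] = 1.0
    return k

def reparameterize(k3x3, b3x3, k1x1, b1x1, channels):
    """Fuse parallel 3x3, 1x1 and identity branches into one equivalent 3x3 conv."""
    fused_k = k3x3 + pad_1x1_to_3x3(k1x1) + identity_as_3x3(channels)
    fused_b = b3x3 + b1x1  # the identity branch contributes no bias
    return fused_k, fused_b
```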
{"title":"AL-Net: Asymmetric Lightweight Network for Medical Image Segmentation","authors":"Xiaogang Du, Yinyin Nie, Fuhai Wang, Tao Lei, Song Wang, Xuejun Zhang","doi":"10.3389/frsip.2022.842925","DOIUrl":"https://doi.org/10.3389/frsip.2022.842925","url":null,"abstract":"Medical image segmentation plays an important role in clinical applications, such as disease diagnosis and treatment planning. On the premise of ensuring segmentation accuracy, segmentation speed is also an important factor to improve diagnosis efficiency. Many medical image segmentation models based on deep learning can improve the segmentation accuracy, but ignore the model complexity and inference speed resulting in the failure of meeting the high real-time requirements of clinical applications. To address this problem, an asymmetric lightweight medical image segmentation network, namely AL-Net for short, is proposed in this paper. Firstly, AL-Net employs the pre-training RepVGG-A1 to extract rich semantic features, and reduces the channel processing to ensure the lower model complexity. Secondly, AL-Net introduces the lightweight atrous spatial pyramid pooling module as the context extractor, and combines the attention mechanism to capture the context information. Thirdly, a novel asymmetric decoder is proposed and introduced into AL-Net, which not only effectively eliminates redundant features, but also makes use of low-level features of images to improve the performance of AL-Net. Finally, the reparameterization technology is utilized in the inference stage, which effectively reduces the parameters of AL-Net and improves the inference speed of AL-Net without reducing the segmentation accuracy. The experimental results on retinal vessel, cell contour, and skin lesions segmentation datasets show that AL-Net is superior to the state-of-the-art models in terms of accuracy, parameters and inference speed.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72526587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A Tutorial on Bandit Learning and Its Applications in 5G Mobile Edge Computing (Invited Paper)
Pub Date: 2022-05-02 | DOI: 10.3389/frsip.2022.864392
Sige Liu, Peng Cheng, Zhuo Chen, B. Vucetic, Yonghui Li
Due to the rapid development of 5G and the Internet of Things (IoT), various emerging applications have been catalyzed, ranging from face recognition and virtual reality to autonomous driving, demanding ubiquitous computation services beyond the capacity of mobile users (MUs). Mobile cloud computing (MCC) enables MUs to offload their tasks to a remote central cloud with substantial computation and storage, at the expense of long propagation latency. To solve the latency issue, mobile edge computing (MEC) pushes servers to the edge of the network, much closer to the MUs. It jointly considers communication and computation to optimize network performance while satisfying quality-of-service (QoS) and quality-of-experience (QoE) requirements. However, MEC usually faces a complex combinatorial optimization problem of exponential complexity. Moreover, many important parameters may be unknown a priori due to the dynamic nature of the offloading environment and network topology. In this paper, to deal with these issues, we introduce bandit learning (BL), which enables each agent (MU/server) to make a sequential selection from a set of arms (servers/MUs) and then receive a numerical reward. BL brings extra benefits to the joint consideration of offloading decisions and resource allocation in MEC, including a matched mechanism, situation awareness through learning, and adaptability. We present a brief tutorial on different variations of BL, covering their mathematical formulations and corresponding solutions. Furthermore, we provide several applications of BL in MEC, including system models, problem formulations, proposed algorithms, and simulation results. Finally, we discuss several challenges and directions for future research on BL in 5G MEC.
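
As a concrete example of the bandit viewpoint, the sketch below implements UCB1 for a mobile user choosing among MEC servers: each server is an arm, the observed reward is one minus a normalized latency, and the selection index balances the empirical mean with an exploration bonus. The server count, latency model, and reward definition are illustrative assumptions, not the paper's specific formulation.

```python
import math
import random

def ucb1_offload(n_servers, rounds, latency_fn):
    """UCB1 arm selection for MEC server choice. `latency_fn(s)` returns an
    observed latency in [0, 1]; the reward is 1 - latency."""
    counts = [0] * n_servers
    means = [0.0] * n_servers
    for t in range(1, rounds + 1):
        if t <= n_servers:  # play each arm once before using the UCB index
            arm = t - 1
        else:
            arm = max(range(n_servers),
                      key=lambda s: means[s] + math.sqrt(2 * math.log(t) / counts[s]))
        reward = 1.0 - latency_fn(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return means, counts

# Example with synthetic latencies (assumed values, for illustration only).
true_latency = [0.2, 0.5, 0.35]
means, counts = ucb1_offload(
    3, 2000, lambda s: min(1.0, max(0.0, random.gauss(true_latency[s], 0.05))))
```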
{"title":"A Tutorial on Bandit Learning and Its Applications in 5G Mobile Edge Computing (Invited Paper)","authors":"Sige Liu, Peng Cheng, Zhuo Chen, B. Vucetic, Yonghui Li","doi":"10.3389/frsip.2022.864392","DOIUrl":"https://doi.org/10.3389/frsip.2022.864392","url":null,"abstract":"Due to the rapid development of 5G and Internet-of-Things (IoT), various emerging applications have been catalyzed, ranging from face recognition, virtual reality to autonomous driving, demanding ubiquitous computation services beyond the capacity of mobile users (MUs). Mobile cloud computing (MCC) enables MUs to offload their tasks to the remote central cloud with substantial computation and storage, at the expense of long propagation latency. To solve the latency issue, mobile edge computing (MEC) pushes its servers to the edge of the network much closer to the MUs. It jointly considers the communication and computation to optimize network performance by satisfying quality-of-service (QoS) and quality-of-experience (QoE) requirements. However, MEC usually faces a complex combinatorial optimization problem with the complexity of exponential scale. Moreover, many important parameters might be unknown a-priori due to the dynamic nature of the offloading environment and network topology. In this paper, to deal with the above issues, we introduce bandit learning (BL), which enables each agent (MU/server) to make a sequential selection from a set of arms (servers/MUs) and then receive some numerical rewards. BL brings extra benefits to the joint consideration of offloading decision and resource allocation in MEC, including the matched mechanism, situation awareness through learning, and adaptability. We present a brief tutorial on BL of different variations, covering the mathematical formulations and corresponding solutions. Furthermore, we provide several applications of BL in MEC, including system models, problem formulations, proposed algorithms and simulation results. At last, we introduce several challenges and directions in the future research of BL in 5G MEC.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"126 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73929530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

WTL-I: Mutual Information-Based Wavelet Transform Learning for Hyperspectral Imaging
Pub Date: 2022-05-02 | DOI: 10.3389/frsip.2022.854207
S. Gehlot, Naushad Ansari, Anubha Gupta
Hyperspectral imaging (HSI) is useful in many applications, including healthcare, geosciences, and remote surveillance. In general, HSI data sets are large. The use of compressive sensing can reduce these data considerably, provided there is a robust methodology to reconstruct the full image data with high quality. This article proposes WTL-I, a mutual information-based wavelet transform learning method for the reconstruction of compressively sensed three-dimensional (3D) hyperspectral image data. Here, the wavelet transform is learned from the compressively sensed HSI data in 3D by exploiting mutual information across spectral bands and spatial information within the spectral bands. This learned wavelet basis is subsequently used as the sparsifying basis for the recovery of the full HSI data. Extensive experiments have been conducted on three benchmark HSI data sets. In addition to evaluating quantitative and qualitative results on the reconstructed HSI data, the performance of the proposed method has also been validated in the application of HSI data classification using a deep learning classifier.
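
To make the role of the learned sparsifying basis concrete, the sketch below recovers a signal from compressive measurements with ISTA, using an orthonormal matrix Psi as a stand-in for the learned wavelet transform. The mutual-information-based learning itself and the 3D structure of HSI data are not modeled; all dimensions and parameter values are illustrative.

```python
import numpy as np

def ista(y, A, Psi, lam=0.05, n_iter=200):
    """Recover x = Psi @ s from measurements y = A @ x by iterative
    soft-thresholding on the sparse coefficients s."""
    M = A @ Psi
    step = 1.0 / np.linalg.norm(M, 2) ** 2  # 1 / Lipschitz constant of the gradient
    s = np.zeros(Psi.shape[1])
    for _ in range(n_iter):
        z = s - step * (M.T @ (M @ s - y))              # gradient step
        s = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return Psi @ s

# Toy usage with a random orthonormal basis as a placeholder for the learned transform.
rng = np.random.default_rng(0)
n, m, k = 256, 80, 10
Psi, _ = np.linalg.qr(rng.standard_normal((n, n)))
s_true = np.zeros(n)
s_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x_true = Psi @ s_true
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_hat = ista(A @ x_true, A, Psi)
```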
{"title":"WTL-I: Mutual Information-Based Wavelet Transform Learning for Hyperspectral Imaging","authors":"S. Gehlot, Naushad Ansari, Anubha Gupta","doi":"10.3389/frsip.2022.854207","DOIUrl":"https://doi.org/10.3389/frsip.2022.854207","url":null,"abstract":"Hyperspectral imaging (HSI) is useful in many applications, including healthcare, geosciences, and remote surveillance. In general, the HSI data set is large. The use of compressive sensing can reduce these data considerably, provided there is a robust methodology to reconstruct the full image data with quality. This article proposes a method, namely, WTL-I, that is mutual information-based wavelet transform learning for the reconstruction of compressively sensed three-dimensional (3D) hyperspectral image data. Here, wavelet transform is learned from the compressively sensed HSI data in 3D by exploiting mutual information across spectral bands and spatial information within the spectral bands. This learned wavelet basis is subsequently used as the sparsifying basis for the recovery of full HSI data. Elaborate experiments have been conducted on three benchmark HSI data sets. In addition to evaluating the quantitative and qualitative results on the reconstructed HSI data, performance of the proposed method has also been validated in the application of HSI data classification using a deep learning classifier.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"423 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75697028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Small Object Detection and Tracking in Satellite Videos With Motion Informed-CNN and GM-PHD Filter
Pub Date: 2022-04-29 | DOI: 10.3389/frsip.2022.827160
Camilo Aguilar, M. Ortner, J. Zerubia
Small object tracking in low-resolution remote sensing images presents numerous challenges. Targets are small relative to the field of view, lack distinct features, and are often lost in cluttered environments. In this paper, we propose a track-by-detection approach to detect and track small moving targets using a convolutional neural network and a Bayesian tracker. Our object detection consists of a two-step process based on motion and a patch-based convolutional neural network (CNN). The first stage applies a lightweight motion detection operator to obtain rough target locations. The second stage uses this information, combined with a CNN, to refine the detection results. In addition, we adopt an online track-by-detection approach that uses the Probability Hypothesis Density (PHD) filter to convert detections into tracks. The PHD filter offers a robust multi-object Bayesian data-association framework that performs well in cluttered environments, keeps track of missed detections, and presents remarkable computational advantages over other Bayesian filters. We test our method on various cases of a challenging dataset: a low-resolution satellite video containing numerous small moving objects. We demonstrate that the proposed method outperforms competing approaches across different scenarios in terms of both object detection and object tracking metrics.
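
A toy version of the two-stage detector: stage one does frame differencing and connected-component labeling to obtain rough candidate locations, and stage two crops a patch around each candidate for a CNN to accept or reject. The threshold, patch size, and the `cnn_score` placeholder are assumptions for illustration; the GM-PHD tracking stage is not shown.

```python
import numpy as np
from scipy import ndimage

def motion_candidates(prev_frame, frame, thresh=25):
    """Stage 1: lightweight motion detector - frame differencing, thresholding,
    and connected-component centroids as rough target locations."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16)) > thresh
    labels, n = ndimage.label(diff)
    return ndimage.center_of_mass(diff, labels, range(1, n + 1))

def refine(frame, centers, patch=16, cnn_score=lambda p: 1.0):
    """Stage 2: crop a patch around each candidate and keep those that the
    (hypothetical) patch CNN `cnn_score` accepts."""
    h, w = frame.shape
    keep = []
    for cy, cx in centers:
        y0 = max(0, min(int(cy) - patch // 2, h - patch))
        x0 = max(0, min(int(cx) - patch // 2, w - patch))
        if cnn_score(frame[y0:y0 + patch, x0:x0 + patch]) > 0.5:
            keep.append((cy, cx))
    return keep
```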
{"title":"Small Object Detection and Tracking in Satellite Videos With Motion Informed-CNN and GM-PHD Filter","authors":"Camilo Aguilar, M. Ortner, J. Zerubia","doi":"10.3389/frsip.2022.827160","DOIUrl":"https://doi.org/10.3389/frsip.2022.827160","url":null,"abstract":"Small object tracking in low-resolution remote sensing images presents numerous challenges. Targets are relatively small compared to the field of view, do not present distinct features, and are often lost in cluttered environments. In this paper, we propose a track-by-detection approach to detect and track small moving targets by using a convolutional neural network and a Bayesian tracker. Our object detection consists of a two-step process based on motion and a patch-based convolutional neural network (CNN). The first stage performs a lightweight motion detection operator to obtain rough target locations. The second stage uses this information combined with a CNN to refine the detection results. In addition, we adopt an online track-by-detection approach by using the Probability Hypothesis Density (PHD) filter to convert detections into tracks. The PHD filter offers a robust multi-object Bayesian data-association framework that performs well in cluttered environments, keeps track of missed detections, and presents remarkable computational advantages over different Bayesian filters. We test our method across various cases of a challenging dataset: a low-resolution satellite video comprising numerous small moving objects. We demonstrate the proposed method outperforms competing approaches across different scenarios with both object detection and object tracking metrics.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"517 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77139643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

An Overview of the MPEG Standard for Storage and Transport of Visual Volumetric Video-Based Coding
Pub Date: 2022-04-29 | DOI: 10.3389/frsip.2022.883943
Lauri Ilola, L. Kondrad, S. Schwarz, Ahmed Hamza
The increasing popularity of virtual, augmented, and mixed reality (VR/AR/MR) applications is driving the media industry to explore the creation and delivery of new immersive experiences. One of these trends is volumetric video, which allows users to explore content unconstrained by the traditional two-dimensional window of the director's view. ISO/IEC joint technical committee 1, subcommittee 29, better known as the Moving Picture Experts Group (MPEG), has recently finalized a group of standards under the umbrella of Visual Volumetric Video-based Coding (V3C). These standards aim to efficiently code, store, and transport immersive content with six degrees of freedom. The V3C family of standards currently consists of three documents: 1) ISO/IEC 23090-5 defines the generic concepts of volumetric video-based coding and its application to dynamic point cloud data; 2) ISO/IEC 23090-12 specifies another application that enables compression of volumetric video content captured by multiple cameras; and 3) ISO/IEC 23090-10 describes how to store and deliver V3C-compressed volumetric video content. Each standard leverages the capabilities of traditional 2D video coding and delivery solutions, allowing re-use of existing infrastructure and thereby facilitating fast deployment of volumetric video. This article provides an overview of the generic concepts of V3C, as defined in ISO/IEC 23090-5. Furthermore, it describes the V3C carriage-related functionalities specified in ISO/IEC 23090-10 and offers best practices for the community with respect to storage and delivery of volumetric video.
{"title":"An Overview of the MPEG Standard for Storage and Transport of Visual Volumetric Video-Based Coding","authors":"Lauri Ilola, L. Kondrad, S. Schwarz, Ahmed Hamza","doi":"10.3389/frsip.2022.883943","DOIUrl":"https://doi.org/10.3389/frsip.2022.883943","url":null,"abstract":"The increasing popularity of virtual, augmented, and mixed reality (VR/AR/MR) applications is driving the media industry to explore the creation and delivery of new immersive experiences. One of the trends is volumetric video, which allows users to explore content unconstrained by the traditional two-dimensional window of director’s view. The ISO/IEC joint technical committee 1 subcommittee 29, better known as the Moving Pictures Experts Group (MPEG), has recently finalized a group of standards, under the umbrella of Visual Volumetric Video-based Coding (V3C). These standards aim to efficiently code, store, and transport immersive content with 6 degrees of freedom. The V3C family of standards currently consists of three documents: 1) ISO/IEC 23090-5 defines the generic concepts of volumetric video-based coding and its application to dynamic point cloud data; 2) ISO/IEC 23090-12 specifies another application that enables compression of volumetric video content captured by multiple cameras; and 3) ISO/IEC 23090-10 describes how to store and deliver V3C compressed volumetric video content. Each standard leverages the capabilities of traditional 2D video coding and delivery solutions, allowing for re-use of existing infrastructures which facilitates fast deployment of volumetric video. This article provides an overview of the generic concepts of V3C, as defined in ISO/IEC 23090-5. Furthermore, it describes V3C carriage related functionalities specified in ISO/IEC 23090-10 and offers best practices for the community with respect to storage and delivery of volumetric video.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76010608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A Robust Security Task Offloading in Industrial IoT-Enabled Distributed Multi-Access Edge Computing
Pub Date: 2022-04-28 | DOI: 10.3389/frsip.2022.788943
Eric Gyamfi, A. Jurcut
The rapid increase in Industrial Internet of Things (IIoT) use cases plays a significant role in the development of Industry 4.0. However, IIoT systems face resource constraints and are vulnerable to cyberattacks because they cannot run existing sophisticated security systems. One way of alleviating these resource constraints is to use multi-access edge computing (MEC) to provide computational resources at the network edge for executing security applications. To provide resilient security for IIoT using MEC, the offloading latency, synchronization time, and turnaround time must be optimized to enable real-time attack detection. Hence, this paper presents a novel adaptive machine learning-based security (MLS) task offloading (ASTO) mechanism to ensure that the connectivity between the MEC server and the IIoT devices is secured and guaranteed. We explore the trade-off between limited computing capacity and high cloud computing latency to propose ASTO, in which MEC and IIoT collaborate to provide optimized MLS to protect the network. In the proposed system, we convert the MLS task offloading and synchronization problem into an equivalent mathematical model, which is solved by applying Markov transition probabilities and maximum-likelihood clock offset estimation. Extensive simulations show that the proposed algorithm provides robust security for the IIoT network with low latency, accurate synchronization, and high energy efficiency.
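
One ingredient that can be shown compactly is maximum-likelihood clock offset estimation from two-way timestamp exchanges: under the common model of symmetric link delays with i.i.d. Gaussian noise, the per-exchange offset estimates are simply averaged. The timestamp convention, noise model, and numbers below are textbook assumptions and may differ from the paper's exact formulation.

```python
import numpy as np

def ml_clock_offset(t1, t2, t3, t4):
    """ML clock offset from N two-way exchanges. Device sends at t1, server
    receives at t2, server replies at t3, device receives at t4. With symmetric
    delay d and offset theta: t2 = t1 + d + theta and t4 = t3 + d - theta, so
    theta = ((t2 - t1) - (t4 - t3)) / 2; the ML estimate is the sample mean."""
    t1, t2, t3, t4 = map(np.asarray, (t1, t2, t3, t4))
    return np.mean(((t2 - t1) - (t4 - t3)) / 2.0)

# Synthetic check (assumed values): true offset 1.5 ms, delay 10 ms, jitter 0.2 ms.
rng = np.random.default_rng(1)
n = 100
theta, d = 1.5e-3, 10e-3
t1 = np.sort(rng.uniform(0, 1, n))
t2 = t1 + d + theta + rng.normal(0, 2e-4, n)
t3 = t2 + 1e-3
t4 = t3 + d - theta + rng.normal(0, 2e-4, n)
print(ml_clock_offset(t1, t2, t3, t4))  # close to 1.5e-3
```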
{"title":"A Robust Security Task Offloading in Industrial IoT-Enabled Distributed Multi-Access Edge Computing","authors":"Eric Gyamfi, A. Jurcut","doi":"10.3389/frsip.2022.788943","DOIUrl":"https://doi.org/10.3389/frsip.2022.788943","url":null,"abstract":"The rapid increase in the Industrial Internet of Things (IIoT) use cases plays a significant role in Industry 4.0 development. However, IIoT systems face resource constraints problems and are vulnerable to cyberattacks due to their inability to implement existing sophisticated security systems. One way of alleviating these resource constraints is to utilize multi-access edge computing (MEC) to provide computational resources at the network edge to execute the security applications. To provide resilient security for IIoT using MEC, the offloading latency, synchronization time, and turnaround time must be optimized to provide real-time attack detection. Hence, this paper provides a novel adaptive machine learning–based security (MLS) task offloading (ASTO) mechanism to ensure that the connectivity between the MEC server and IIoT is secured and guaranteed. We explored the trade-off between the limited computing capacity and high cloud computing latency to propose an ASTO, where MEC and IIoT can collaborate to provide optimized MLS to protect the network. In the proposed system, we converted the MLS task offloading and synchronization problem into an equivalent mathematical model, which can be solved by applying Markov transition probability and clock offset estimation using maximum likelihood. Our extensive simulations show that the proposed algorithm provides robust security for the IIoT network with low latency, synchronization accuracy, and energy efficiency.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"154 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86274642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Facial Expression Manipulation for Personalized Facial Action Estimation
Pub Date: 2022-04-27 | DOI: 10.3389/frsip.2022.861641
Koichiro Niinuma, Itir Onal Ertugrul, J. Cohn, László A. Jeni
Limited sizes of annotated video databases of spontaneous facial expressions, imbalanced action unit (AU) labels, and domain shift are three main obstacles in training models to detect facial actions and estimate their intensity. To address these problems, we propose an approach that incorporates facial expression generation for facial action unit intensity estimation. Our approach reconstructs the 3D shape of the face from each video frame, aligns the 3D mesh to a canonical view, and trains a GAN-based network to synthesize novel images with facial action units of interest. We leverage the synthetic images to achieve two goals: 1) generating AU-balanced databases, and 2) tackling domain shift with personalized networks. To generate a balanced database, we synthesize expressions with varying AU intensities and perform semantic resampling. Our experimental results on FERA17 show that networks trained on synthesized facial expressions outperform those trained on actual facial expressions and surpass current state-of-the-art approaches. To tackle domain shift, we propose personalizing pretrained networks. We generate synthetic expressions of each target subject with varying AU intensity labels and use the person-specific synthetic images to fine-tune the pretrained networks. To evaluate the performance of the personalized networks, we use the DISFA and PAIN databases. Personalized networks, which require only a single image from each target subject to generate synthetic images, achieve significant improvements in generalizing to unseen domains.
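
A simple way to read the balancing step is to oversample under-represented AU intensity levels until every level matches the most frequent one; in the paper the extra examples come from GAN-synthesized expressions rather than duplication. The sketch below is a generic balancing routine with assumed 0-5 intensity labels, not the authors' exact semantic-resampling procedure.

```python
import random
from collections import defaultdict

def balance_by_intensity(samples, labels, seed=0):
    """Oversample so every AU intensity level has as many examples as the most
    frequent level. `samples` are frames (or synthetic frames), `labels` their
    intensity levels; returns a shuffled balanced list of (sample, label) pairs."""
    rng = random.Random(seed)
    bins = defaultdict(list)
    for s, y in zip(samples, labels):
        bins[y].append(s)
    target = max(len(group) for group in bins.values())
    balanced = []
    for y, group in bins.items():
        balanced += [(s, y) for s in group]
        balanced += [(rng.choice(group), y) for _ in range(target - len(group))]
    rng.shuffle(balanced)
    return balanced
```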
{"title":"Facial Expression Manipulation for Personalized Facial Action Estimation","authors":"Koichiro Niinuma, Itir Onal Ertugrul, J. Cohn, László A. Jeni","doi":"10.3389/frsip.2022.861641","DOIUrl":"https://doi.org/10.3389/frsip.2022.861641","url":null,"abstract":"Limited sizes of annotated video databases of spontaneous facial expression, imbalanced action unit labels, and domain shift are three main obstacles in training models to detect facial actions and estimate their intensity. To address these problems, we propose an approach that incorporates facial expression generation for facial action unit intensity estimation. Our approach reconstructs the 3D shape of the face from each video frame, aligns the 3D mesh to a canonical view, and trains a GAN-based network to synthesize novel images with facial action units of interest. We leverage the synthetic images to achieve two goals: 1) generating AU-balanced databases, and 2) tackling domain shift with personalized networks. To generate a balanced database, we synthesize expressions with varying AU intensities and perform semantic resampling. Our experimental results on FERA17 show that networks trained on synthesized facial expressions outperform those trained on actual facial expressions and surpass current state-of-the-art approaches. To tackle domain shift, we propose personalizing pretrained networks. We generate synthetic expressions of each target subject with varying AU intensity labels and use the person-specific synthetic images to fine-tune pretrained networks. To evaluate performance of the personalized networks, we use DISFA and PAIN databases. Personalized networks, which require only a single image from each target subject to generate synthetic images, achieved significant improvement in generalizing to unseen domains.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81742750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

High Throughput JPEG 2000 for Video Content Production and Delivery Over IP Networks
Pub Date: 2022-04-27 | DOI: 10.3389/frsip.2022.885644
D. Taubman, A. Naman, Michael Smith, P. Lemieux, Hassaan Saadat, Osamu Watanabe, R. Mathew
ITU-T Rec T.814 | ISO/IEC 15444-15, known as High Throughput JPEG 2000, or simply HTJ2K, is Part 15 in the JPEG 2000 series of standards, published in 2019 by the ITU and ISO/IEC. JPEG 2000 Part 1 has long been used as a key component in the production, archival, and distribution of video content, as the distribution format for Digital Cinema, and as an Interoperable Master Format from which streaming video services are commonly derived. JPEG 2000 has one of the richest feature sets of any coding standard, including scalability, region-of-interest accessibility, and non-iterative optimal rate control. HTJ2K addresses a long-standing limitation of the original JPEG 2000 family of standards: relatively low throughput on CPU and GPU platforms. HTJ2K introduces an alternative block coding algorithm that allows extremely high processing throughput, while preserving all other aspects of the JPEG 2000 framework and offering truly reversible transcoding to and from the original block-coded representation. This paper demonstrates the benefits that HTJ2K brings to video content production and delivery, including cloud-based processing workflows and low-latency video content streaming over IP networks, considering CPU-, GPU-, and FPGA-based platforms. For non-iterative optimal rate control, HTJ2K encoders with the highest throughputs and lowest hardware encoding footprints need a strategy for constraining the number of so-called HT-Sets that are generated ahead of the classic Post-Compression Rate-Distortion optimization (PCRD-opt) process. This paper describes such a strategy, known as CPLEX, which involves a second (virtual) rate-control process. The novel combination of this virtual (CPLEX) and actual (PCRD-opt) processes has many benefits, especially for hardware encoders, where memory size and memory bandwidth are key indicators of complexity.
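
The PCRD-opt stage that such a strategy feeds can be sketched as plain Lagrangian rate allocation: each code-block offers candidate truncation points with associated (rate, distortion) values, the encoder keeps, for a given lambda, the point minimizing D + lambda*R, and lambda is adjusted until the total rate meets the byte budget. The data layout and bisection bounds below are illustrative; this is a toy model of the optimization, not an HTJ2K encoder.

```python
def pick_truncation(blocks, lmbda):
    """For each code-block, pick the truncation point minimizing D + lambda * R.
    `blocks` is a list of lists of (rate_bytes, distortion) pairs per code-block."""
    return [min(pts, key=lambda rd: rd[1] + lmbda * rd[0]) for pts in blocks]

def rate_allocate(blocks, byte_budget, iters=50):
    """Bisect lambda until the total selected rate fits within the byte budget."""
    lo, hi = 0.0, 1e9
    for _ in range(iters):
        mid = (lo + hi) / 2
        total = sum(r for r, _ in pick_truncation(blocks, mid))
        if total > byte_budget:
            lo = mid  # over budget -> penalize rate more strongly
        else:
            hi = mid  # within budget -> try a smaller penalty
    return pick_truncation(blocks, hi)
```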
{"title":"High Throughput JPEG 2000 for Video Content Production and Delivery Over IP Networks","authors":"D. Taubman, A. Naman, Michael Smith, P. Lemieux, Hassaan Saadat, Osamu Watanabe, R. Mathew","doi":"10.3389/frsip.2022.885644","DOIUrl":"https://doi.org/10.3389/frsip.2022.885644","url":null,"abstract":"ITU-T Rec T.814 | IS 15444-15, known as High Throughput JPEG 2000, or simply HTJ2K, is Part-15 in the JPEG 2000 series of standards, published in 2019 by the ITU and ISO/IEC. JPEG 2000 Part-1 has long been used as a key component in the production, archival and distribution of video content, as the distribution format for Digital Cinema, and an Interoperable Master Format from which streaming video services are commonly derived. JPEG 2000 has one of the richest feature sets of any coding standard, including scalability, region-of-interest accessibility and non-iterative optimal rate control. HTJ2K addresses a long-standing limitation of the original JPEG 2000 family of standards: relatively low throughput on CPU and GPU platforms. HTJ2K introduces an alternative block coding algorithm that allows extremely high processing throughputs, while preserving all other aspects of the JPEG 2000 framework and offering truly reversible transcoding with the original block coded representation. This paper demonstrates the benefits that HTJ2K brings to video content production and delivery, including cloud-based processing workflows and low latency video content streaming over IP networks, considering CPU, GPU and FPGA-based platforms. For non-iterative optimal rate control, HTJ2K encoders with the highest throughputs and lowest hardware encoding footprints need a strategy for constraining the number of so-called HT-Sets that are generated ahead of the classic Post-Compression Rate-Distortion optimization (PCRD-opt) process. This paper describes such a strategy, known as CPLEX, that involves a second (virtual) rate-control process. The novel combination of this virtual (CPLEX) and actual (PCRD-opt) processes has many benefits, especially for hardware encoders, where memory size and memory bandwidth are key indicators of complexity.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"2017 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73333103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}