
Latest publications in Signal Processing-Image Communication

SynFlowMap: A synchronized optical flow remapping for video motion magnification
IF 3.4 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-18 | DOI: 10.1016/j.image.2024.117203
Jonathan A.S. Lima, Cristiano J. Miosso, Mylène C.Q. Farias
Motion magnification refers to the process of spatially amplifying small movements in a video to reveal important information about a scene. Several motion magnification methods have been proposed, but most of them introduce perceptible and annoying visual artifacts. In this paper, we propose a method that first computes the optical flow between each original frame and the corresponding frame that has been motion-magnified by another method. The method then uses the generated optical flow map and the original video to synthesize a combined motion-magnified video. The method is able to amplify the motion by larger factors, invert the direction of the motion, and combine filtered motion from multiple frequencies and Eulerian methods. Amongst other advantages, the proposed approach eliminates artifacts caused by Eulerian motion-magnification methods. We present an extensive qualitative and quantitative analysis of the results compared to the main Eulerian approaches. A final contribution of this work is a new video database that allows quantitative evaluation of motion magnification.
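As a rough illustration of the remapping idea (not the authors' code), the sketch below estimates dense flow between an original frame and a frame pre-magnified by any Eulerian method, rescales that flow, and warps the original accordingly. The Farneback parameters and the backward-mapping warp are illustrative assumptions.

```python
import cv2
import numpy as np

def synflow_remap(original, magnified, alpha=2.0):
    """Warp `original` along a rescaled flow toward `magnified`."""
    g0 = cv2.cvtColor(original, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(magnified, cv2.COLOR_BGR2GRAY)
    # Dense flow from the original frame to its motion-magnified version.
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g0.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    # Rescaling the flow magnifies further (alpha > 1) or inverts the
    # motion (alpha < 0); backward mapping keeps cv2.remap simple.
    map_x = xs - alpha * flow[..., 0]
    map_y = ys - alpha * flow[..., 1]
    return cv2.remap(original, map_x, map_y, cv2.INTER_LINEAR)
```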
Citations: 0
Distributed virtual selective-forwarding units and SDN-assisted edge computing for optimization of multi-party WebRTC videoconferencing
IF 3.4 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-12 | DOI: 10.1016/j.image.2024.117173
R. Arda Kırmızıoğlu, A. Murat Tekalp, Burak Görkemli
Network service providers (NSPs) have a growing interest in placing network intelligence and services at network edges by deploying software-defined networking (SDN) and network function virtualization infrastructure. In multi-party WebRTC videoconferencing using scalable video coding, a selective forwarding unit (SFU) provides connectivity between peers with heterogeneous bandwidth and terminals. An important question is where in the network to place the SFU service in order to minimize the end-to-end delay between all pairs of peers. Clearly, there is no single optimal place for a cloud SFU for all possible peer locations. We propose placing virtual SFUs at network edges, leveraging NSP edge datacenters, to optimize end-to-end delay and the usage of overall network resources. The main advantage of the distributed edge-SFU framework is that each peer video stream travels the shortest path to reach other peers, similar to the mesh connection model, whereas each peer uploads a single stream to its edge-SFU, avoiding the upload bottleneck. While the proposed distributed edge-SFU framework applies to both best-effort and managed service models, this paper proposes a premium managed, edge-integrated multi-party WebRTC service architecture with bandwidth and delay guarantees within access networks, achieved by SDN-assisted slicing of edge networks. The performance of the proposed distributed edge-SFU service architecture is demonstrated by means of experimental results.
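As a toy illustration of the placement question posed above (not the paper's algorithm), the sketch below picks, from a set of candidate edge sites, the SFU location that minimizes the worst-case peer-to-peer delay given measured one-way delays; the delay table and names are hypothetical.

```python
import itertools

def best_sfu_site(delay, peers, sites):
    """delay[(u, v)]: measured one-way delay in ms between nodes u and v."""
    def worst_pair_delay(site):
        # End-to-end path of a forwarded stream: sender -> SFU -> receiver.
        return max(delay[(a, site)] + delay[(site, b)]
                   for a, b in itertools.permutations(peers, 2))
    # Choose the candidate edge site minimizing the worst pairwise delay.
    return min(sites, key=worst_pair_delay)
```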
Citations: 0
Modulated deformable convolution based on graph convolution network for rail surface crack detection
IF 3.4 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-10 | DOI: 10.1016/j.image.2024.117202
Shuzhen Tong, Qing Wang, Xuan Wei, Cheng Lu, Xiaobo Lu

Accurate detection of rail surface cracks is essential but also tricky because of noise, low contrast, and density inhomogeneity. In this paper, to deal with the complex situations in rail surface crack detection, we propose a modulated deformable convolution based on a graph convolution network, named MDCGCN. The MDCGCN is a novel convolution that calculates the offsets and modulation scalars of the modulated deformable convolution by applying a graph convolution network to a feature map. The MDCGCN improves the performance of different networks in rail surface crack detection while only slightly reducing inference speed. Finally, we demonstrate our method's numerical accuracy, computational efficiency, and effectiveness on the public segmentation dataset RSDD and our self-built detection dataset SEU-RSCD, and we explore an appropriate structure for combining the baseline network UNet with the MDCGCN.
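A minimal sketch of a modulated deformable convolution is shown below, with the paper's graph-convolution predictor replaced by a plain convolution stand-in (the abstract does not give its details); it builds on torchvision's deform_conv2d.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class ModulatedDeformBlock(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.k = k
        self.weight = nn.Parameter(torch.randn(c_out, c_in, k, k) * 0.01)
        # Stand-in predictor: the paper derives offsets and modulation
        # scalars from a graph convolution network instead.
        self.pred = nn.Conv2d(c_in, 3 * k * k, k, padding=k // 2)

    def forward(self, x):
        o1, o2, m = torch.chunk(self.pred(x), 3, dim=1)
        offset = torch.cat([o1, o2], dim=1)   # (N, 2*k*k, H, W) sample shifts
        mask = torch.sigmoid(m)               # (N, k*k, H, W) modulation in (0,1)
        return deform_conv2d(x, offset, self.weight,
                             padding=self.k // 2, mask=mask)
```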

Citations: 0
A global reweighting approach for cross-domain semantic segmentation
IF 3.4 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-07 | DOI: 10.1016/j.image.2024.117197
Yuhang Zhang, Shishun Tian, Muxin Liao, Guoguang Hua, Wenbin Zou, Chen Xu
Unsupervised domain adaptation for semantic segmentation attracts much research attention due to the expensive cost of pixel-level annotation. Since samples differ in how difficult they are to adapt, each sample's weight should be set independently, a strategy called reweighting. However, existing reweighting methods only calculate local reweighting information from predicted results or context information in batch images of the two domains, which may lead to over-alignment or under-alignment problems. To handle this issue, we propose a global reweighting approach. Specifically, we first define the target centroid distance, which describes the distance between the source batch data and the target centroid. Then, we employ a Fréchet Inception Distance metric to evaluate the domain divergence and embed it into the target centroid distance. Finally, a global reweighting strategy is proposed to enhance knowledge transferability in the source-domain supervision. Extensive experiments demonstrate that our approach achieves competitive performance and helps to improve the performance of other methods.
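The sketch below illustrates one plausible form of the idea: weight source samples by their feature distance to the target-domain centroid, softened by a precomputed FID score. The exponential weighting function and its scale are assumptions, not the paper's exact formula.

```python
import torch

def global_reweight(src_feats, tgt_centroid, fid, tau=1.0):
    """src_feats: (B, D) source features; tgt_centroid: (D,) running mean
    of target features; fid: precomputed Fréchet Inception Distance."""
    d = torch.norm(src_feats - tgt_centroid, dim=1)   # target centroid distance
    # Larger domain divergence (FID) softens the weighting; smaller sharpens it.
    w = torch.exp(-d / (tau * (1.0 + fid)))
    return w / w.mean()                               # normalized sample weights
```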
Citations: 0
Memory positional encoding for image captioning
IF 3.4 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-07 | DOI: 10.1016/j.image.2024.117201
Xiaobao Yang, Shuai He, Jie Zhang, Sugang Ma, Zhiqiang Hou, Wei Sun

Transformer-based architectures represent the state of the art in image captioning. Because of its naturally parallel internal structure, the Transformer cannot be aware of the order of input tokens, so positional encoding becomes an indispensable component of Transformer-based models. However, most existing absolute positional encodings (APE) have certain limitations for image captioning. Their spatial positional features are predefined and cannot be well generalized to other forms of data, such as visual data. Meanwhile, the positional features are decoupled from each other and lack internal correlation, which affects the accuracy of the spatial position context representation of visual or textual semantics to a certain extent. Therefore, we propose a memory positional encoding (MPE) that generalizes to both the visual encoder and the sequence decoder of image captioning models. In MPE, each positional feature is recursively generated by a learnable network with a memory function, making the currently generated positional feature effectively inherit the information of the previous n positions. In addition, existing positional encodings provide positional features with a fixed value and scale; that is, they provide the same positional encoding for different inputs, which is unreasonable. Thus, to address these issues of scale and value in current positional encoding methods, we further explore dynamic memory positional encoding (DMPE) based on MPE. DMPE dynamically adjusts and generates positional features based on different inputs to provide each with a unique positional representation. Extensive experiments on MSCOCO validate the effectiveness of MPE and DMPE.
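The recursion can be sketched as follows, with a GRU cell standing in for the paper's unspecified learnable memory network; the seed token and the seeding scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MemoryPositionalEncoding(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.cell = nn.GRUCell(d_model, d_model)   # learnable network with memory
        self.seed = nn.Parameter(torch.zeros(1, d_model))

    def forward(self, seq_len):
        h, feats = self.seed, []
        for _ in range(seq_len):
            # Each positional feature is generated from the memory state,
            # so it inherits information from the preceding positions.
            h = self.cell(self.seed, h)
            feats.append(h)
        return torch.stack(feats, dim=1)   # (1, seq_len, d_model)

# Usage: tokens + MemoryPositionalEncoding(512)(tokens.size(1))
```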

Citations: 0
Style Optimization Networks for real-time semantic segmentation of rainy and foggy weather
IF 3.4 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-07 | DOI: 10.1016/j.image.2024.117199
Yifang Huang, Haitao He, Hongdou He, Guyu Zhao, Peng Shi, Pengpeng Fu
Semantic segmentation is an essential task in the field of computer vision. Existing semantic segmentation models can achieve good results under good weather and lighting conditions. However, when the external environment changes, the effectiveness of these models is seriously affected. Therefore, we focus on the task of semantic segmentation in rainy and foggy weather. Fog is a common phenomenon in rainy weather conditions and has a negative impact on image visibility. Besides, to make the algorithm satisfy the application requirements of mobile devices, the computational cost and real-time performance of the model are major points of our research. In this paper, we propose a novel Style Optimization Network (SONet) architecture, containing a Style Optimization Module (SOM) that can dynamically learn style information, and a Key information Extraction Module (KEM) that extracts important spatial and contextual information. This improves the learning ability and robustness of the model under rainy and foggy conditions. Meanwhile, we achieve real-time performance by using lightweight modules and a backbone network with low computational complexity. To validate the effectiveness of our SONet, we synthesized a rainy and foggy version of the CityScapes dataset and evaluated the accuracy and complexity of our model. Our model achieves a segmentation accuracy of 75.29% MIoU and 83.62% MPA on an NVIDIA TITAN Xp GPU. Several comparative experiments have shown that our SONet achieves good performance in semantic segmentation tasks under rainy and foggy weather, and thanks to its lightweight design it has a good advantage in both accuracy and model complexity.
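Purely as a speculative sketch of what a style-optimization step might look like (the abstract does not specify SOM's design), the block below re-normalizes features with affine parameters predicted dynamically from global channel statistics.

```python
import torch
import torch.nn as nn

class StyleOptimization(nn.Module):
    """Hypothetical SOM-like block: learn per-channel scale/shift from
    feature statistics so the network can adapt to rain/fog style shifts."""
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Linear(2 * channels, 2 * channels)

    def forward(self, x):
        mu = x.mean(dim=(2, 3))                  # per-channel mean (style cue)
        sigma = x.std(dim=(2, 3)) + 1e-5         # per-channel std (style cue)
        gamma, beta = self.fc(torch.cat([mu, sigma], 1)).chunk(2, dim=1)
        x_norm = (x - mu[..., None, None]) / sigma[..., None, None]
        return x_norm * (1 + gamma[..., None, None]) + beta[..., None, None]
```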
Citations: 0
A novel theoretical analysis on optimal pipeline of multi-frame image super-resolution using sparse coding
IF 3.4 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-07 | DOI: 10.1016/j.image.2024.117198
Mohammad Mahdi Afrasiabi, Reshad Hosseini, Aliazam Abbasfar

Super-resolution is the process of obtaining a high-resolution (HR) image from one or more low-resolution (LR) images. Single-image super-resolution (SISR) deals with one LR image, while multi-frame super-resolution (MFSR) employs several LR images to reach the HR output. The MFSR pipeline consists of alignment, fusion, and reconstruction. We conduct a theoretical analysis using sparse coding (SC) and the iterative shrinkage-thresholding algorithm to fill the gap in mathematical justification for the execution order of the optimal MFSR pipeline. Our analysis recommends executing alignment and fusion before the reconstruction stage (whether through deconvolution or SISR techniques). The suggested order ensures enhanced performance in terms of peak signal-to-noise ratio and structural similarity index. The optimal pipeline also reduces computational complexity compared to intuitive approaches that apply SISR to each input LR image. We also demonstrate the usefulness of SC in the analysis of computer vision tasks such as MFSR, leveraging the sparsity assumption in natural images. Simulation results support the findings of our theoretical analysis, both quantitatively and qualitatively.
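The recommended ordering can be written schematically as below; the align, fuse, and reconstruct components are user-supplied placeholders rather than the paper's specific operators.

```python
def mfsr(frames, align, fuse, reconstruct):
    """frames: list of LR images; align/fuse/reconstruct: user-supplied
    registration, fusion, and deconvolution/SISR components."""
    ref = frames[0]
    aligned = [align(f, ref) for f in frames]   # 1) register to a reference
    fused = fuse(aligned)                       # 2) fuse the aligned frames
    return reconstruct(fused)                   # 3) reconstruct last (HR output)

# e.g. mfsr(lr_frames, align=my_registration, fuse=my_fusion,
#           reconstruct=my_sisr_model)   # all three names are hypothetical
```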

Citations: 0
Underwater image enhancement via brightness mask-guided multi-attention embedding
IF 3.4 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-06 | DOI: 10.1016/j.image.2024.117200
Yuanyuan Li, Zetian Mi, Peng Lin, Xianping Fu

Numerous underwater image enhancement methods have been proposed to correct color and enhance contrast. Although these methods have achieved satisfactory enhancement results in some respects, few have taken into account the effect of the raw image's illumination distribution on the enhancement results, often leading to oversaturation or undersaturation. To solve these problems, an underwater image enhancement network guided by a brightness mask with multi-attention embedding, called BMGMANet, is designed. Specifically, considering that different regions in underwater images have different degrees of degradation, which can be implicitly reflected by a brightness mask characterizing the image's illumination distribution, a decoder network guided by a reverse brightness mask is designed to enhance the dark regions while suppressing excessive enhancement of the bright regions. In addition, a triple-attention module is designed to further enhance the contrast of the underwater image and recover more details. Extensive comparative experiments demonstrate that the enhancement results of our network outperform those of existing state-of-the-art methods. Furthermore, additional experiments also prove that our BMGMANet can effectively enhance non-uniformly illuminated underwater images and improve the performance of salient object detection in underwater images.
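A minimal sketch of the reverse-brightness-mask idea follows: estimate an illumination map, invert it, and use it to gate how strongly an enhancement residual is applied per pixel. In the paper this gating lives inside a learned decoder; the Gaussian-blur illumination estimate here is an assumption.

```python
import cv2
import numpy as np

def mask_guided_enhance(img, enhanced):
    """img, enhanced: float32 BGR images in [0, 1]."""
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    brightness = cv2.GaussianBlur(hsv[..., 2], (31, 31), 0)  # illumination map
    reverse_mask = 1.0 - brightness                          # dark regions -> large
    # Dark regions receive more of the enhancement; bright regions are
    # protected from excessive enhancement.
    return img + reverse_mask[..., None] * (enhanced - img)
```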

Citations: 0
DJUHNet: A deep representation learning-based scheme for the task of joint image upsampling and hashing
IF 3.4 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-09-06 | DOI: 10.1016/j.image.2024.117187
Alireza Esmaeilzehi, Morteza Mirzaei, Hossein Zaredar, Dimitrios Hatzinakos, M. Omair Ahmad

In recent years, numerous efficient schemes that employ deep neural networks have been developed for the task of image hashing. However, not much attention has been paid to enhancing the performance and robustness of these deep hashing networks when the input images do not possess high spatial resolution and visual quality. This is a critical problem, since access to high-quality, high-resolution images is often not guaranteed in real-life applications. In this paper, we propose a novel method for the task of joint image upsampling and hashing that uses a three-stage design. Specifically, in the first two stages of the proposed scheme, we obtain two deep neural networks, trained individually for the tasks of image super-resolution and image hashing, respectively. We then fine-tune the two networks thus obtained by using the ideas of representation learning and an alternating optimization process, in order to produce a set of optimal parameters for the task of joint image upsampling and hashing. The effectiveness of the various ideas used in designing the proposed method is demonstrated through different experiments. It is shown that the proposed scheme is able to outperform state-of-the-art image super-resolution and hashing methods, even when they are trained simultaneously in a joint end-to-end manner.
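The alternating fine-tuning stage can be sketched as below, freezing one network while the other is updated on a joint objective and swapping roles each epoch; the optimizers, loss, and swap schedule are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def alternate_finetune(sr_net, hash_net, loader, joint_loss, epochs=10):
    opt_sr = torch.optim.Adam(sr_net.parameters(), lr=1e-4)
    opt_hash = torch.optim.Adam(hash_net.parameters(), lr=1e-4)
    for epoch in range(epochs):
        train_sr = (epoch % 2 == 0)          # alternate which net is trained
        opt = opt_sr if train_sr else opt_hash
        sr_net.requires_grad_(train_sr)      # freeze one network ...
        hash_net.requires_grad_(not train_sr)  # ... update the other
        for lr_img, target in loader:        # target: hashing supervision
            codes = hash_net(sr_net(lr_img))  # upsample first, then hash
            loss = joint_loss(codes, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
```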

Citations: 0
Globally and locally optimized Pannini projection for high FoV rendering of 360° images
IF 3.4 | CAS Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-30 | DOI: 10.1016/j.image.2024.117190
Falah Jabar, João Ascenso, Maria Paula Queluz

To render a spherical (360° or omnidirectional) image on planar displays, a 2D image, called a viewport, must be obtained by projecting a sphere region onto a plane, according to the user's viewing direction and a predefined field of view (FoV). However, any sphere-to-plane projection introduces geometric distortions, such as object stretching and/or bending of straight lines, whose intensity increases with the considered FoV. In this paper, a fully automatic content-aware projection is proposed, aiming to reduce the geometric distortions when high FoVs are used. This new projection is based on the Pannini projection, whose parameters are first globally optimized according to the image content, followed by a local conformality improvement of relevant viewport objects. A crowdsourcing subjective test showed that the proposed projection is the most preferred solution among the considered state-of-the-art sphere-to-plane projections, producing viewports with a more pleasant visual quality.
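For reference, a minimal form of the underlying Pannini projection is sketched below: a view direction (longitude, latitude) maps to plane coordinates through a compression parameter d, with d = 0 reducing to the rectilinear projection. The per-image optimization of the parameters, which is the paper's contribution, is not reproduced here.

```python
import numpy as np

def pannini(phi, theta, d=1.0):
    """phi: longitude, theta: latitude (radians); returns plane (x, y).
    d = 0 gives the rectilinear projection; d = 1 the classic Pannini."""
    s = (d + 1.0) / (d + np.cos(phi))   # radial compression factor
    x = s * np.sin(phi)
    y = s * np.tan(theta)
    return x, y
```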

Citations: 0