
Latest Publications in ACM Multimedia Asia

Head-Motion-Aware Viewport Margins for Improving User Experience in Immersive Video
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490573
Mehmet N. Akcay, Burak Kara, Saba Ahsan, A. Begen, I. Curcio, Emre B. Aksu
Viewport-dependent delivery (VDD) is a technique for saving network resources during the transmission of immersive videos. However, it results in a non-zero motion-to-high-quality delay (MTHQD), which is the time from the moment the current viewport first contains at least one low-quality tile to the moment all the tiles in the new viewport are rendered in high quality. MTHQD is an important metric in the evaluation of VDD systems. This paper improves an earlier concept called viewport margins by introducing head-motion awareness. The primary benefit of this improvement is a reduction (up to 64%) in the average MTHQD.
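To make the metric concrete, below is a minimal sketch of how the average MTHQD could be computed from playback logs; the event format (pairs of degradation and restoration timestamps) is an assumption for illustration, not a format defined by the paper.

```python
# Minimal sketch: computing average MTHQD from logged playback events.
# Each event is an assumed (t_degraded, t_restored) pair, where t_degraded is
# the moment the current viewport first contains a low-quality tile and
# t_restored is the moment all tiles of the new viewport render in high quality.

def average_mthqd(events):
    delays = [t_restored - t_degraded for t_degraded, t_restored in events]
    return sum(delays) / len(delays) if delays else 0.0

# Example: three head motions with 120 ms, 250 ms and 90 ms of delay.
events = [(0.00, 0.12), (3.10, 3.35), (7.40, 7.49)]
print(f"average MTHQD: {average_mthqd(events) * 1000:.0f} ms")
```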
Citations: 4
Learning to Decompose and Restore Low-light Images with Wavelet Transform
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490622
Pengju Zhang, Chaofan Zhang, Zheng Rong, Yihong Wu
Low-light images often suffer from low visibility and various kinds of noise. Most existing low-light image enhancement methods amplify noise while enhancing the image, because they neglect to separate valuable image information from noise. In this paper, we propose a novel wavelet-based attention network, in which wavelet transform is integrated into attention learning for joint low-light enhancement and noise suppression. The proposed network comprises a Decomposition-Net, an Enhancement-Net, and a Restoration-Net. In the Decomposition-Net, wavelet transform layers are designed to separate noise and global content information into different frequency features, which benefits denoising. Furthermore, an attention-based strategy is introduced to progressively select suitable frequency features for accurately restoring illumination and reflectance according to Retinex theory. The Enhancement-Net further removes degradations in reflectance and adjusts illumination, while the Restoration-Net employs conditional adversarial learning to improve the visual quality of the final restored results based on the enhanced illumination and reflectance. Extensive experiments on several public datasets demonstrate that the proposed method achieves more pleasing results than state-of-the-art methods.
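For context on the wavelet layers, here is a minimal sketch of a single-level 2D Haar decomposition of a feature map; the plain-slicing implementation and band labels are illustrative assumptions, since the paper's Decomposition-Net uses learned layers inside a larger network.

```python
import torch

def haar_dwt2d(x: torch.Tensor):
    """Single-level 2D Haar transform of a (B, C, H, W) feature map into a
    low-frequency band (LL) and three high-frequency bands (LH, HL, HH)."""
    a = x[:, :, 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[:, :, 0::2, 1::2]  # top-right
    c = x[:, :, 1::2, 0::2]  # bottom-left
    d = x[:, :, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # approximation: global content/illumination
    lh = (c + d - a - b) / 2  # horizontal detail
    hl = (b + d - a - c) / 2  # vertical detail
    hh = (a + d - b - c) / 2  # diagonal detail, where noise concentrates
    return ll, lh, hl, hh

bands = haar_dwt2d(torch.randn(1, 16, 64, 64))
print([t.shape for t in bands])  # each band is (1, 16, 32, 32)
```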
Citations: 1
Hard-Boundary Attention Network for Nuclei Instance Segmentation
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490602
Yalu Cheng, Pengchong Qiao, Hong-Ju He, Guoli Song, Jie Chen
Image segmentation plays an important role in medical image analysis, and accurate segmentation of nuclei is especially crucial to clinical diagnosis. However, existing methods fail to segment densely packed nuclei because the hard boundaries between them have texture similar to the nuclear interior. To this end, we propose a Hard-Boundary Attention Network (HBANet) for nuclei instance segmentation. Specifically, we propose a Background Weaken Module (BWM) that weakens the model's attention to the nucleus background by integrating low-level features into high-level features. To improve the model's robustness to hard nuclei boundaries, we further design a Gradient-based boundary adaptive Strategy (GS) that generates boundary-weakened data for model training in an adversarial manner. We conduct extensive experiments on the MoNuSeg and CPM-17 datasets, and the results show that our HBANet outperforms state-of-the-art methods.
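The abstract describes the BWM only at a high level; the sketch below shows one plausible way low-level features could gate high-level features to suppress background responses. The 1×1-convolution gate and sigmoid attention are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BackgroundWeaken(nn.Module):
    """Illustrative fusion in the spirit of the BWM: low-level features
    produce a spatial gate that weakens background activations in the
    high-level features. This is one plausible realization, not the
    paper's exact architecture."""

    def __init__(self, low_channels: int, high_channels: int):
        super().__init__()
        self.gate = nn.Conv2d(low_channels, high_channels, kernel_size=1)

    def forward(self, low_feat, high_feat):
        # Low-level maps are usually larger; resize to the high-level size.
        low = F.interpolate(low_feat, size=high_feat.shape[2:],
                            mode="bilinear", align_corners=False)
        attn = torch.sigmoid(self.gate(low))  # per-pixel weights in (0, 1)
        return high_feat * attn               # background responses weakened

module = BackgroundWeaken(low_channels=64, high_channels=256)
out = module(torch.randn(1, 64, 128, 128), torch.randn(1, 256, 32, 32))
```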
Citations: 0
An Embarrassingly Simple Approach to Discrete Supervised Hashing
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3493595
Shuguang Zhao, Bingzhi Chen, Zheng Zhang, Guangming Lu
Prior hashing works typically learn a projection function from a high-dimensional visual feature space to a low-dimensional latent space. However, such a projection function suffers from several crucial bottlenecks: 1) information loss and coding redundancy are inevitable; 2) the information available in semantic labels is not well explored; 3) the learned latent embedding lacks explicit semantic meaning. To overcome these limitations, we propose a novel supervised Discrete Auto-Encoder Hashing (DAEH) framework, in which a linear auto-encoder effectively projects the semantic labels of images into a latent representation space. Instead of using a visual feature projection, the proposed DAEH framework exploits the semantic information of supervised labels to refine the latent feature embedding and further optimize the hashing function. Meanwhile, we reformulate the objective and relax the discrete constraints of the binary optimization problem. Extensive experiments on the Caltech-256, CIFAR-10, and MNIST datasets demonstrate that our method outperforms state-of-the-art hashing baselines.
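As a toy illustration of the core idea, hashing semantic labels rather than visual features, the sketch below alternates between binarizing a linear encoding of one-hot labels and refitting the encoder. The alternating least-squares scheme and all sizes are assumptions; the actual DAEH objective with its relaxation is richer than this.

```python
import numpy as np

# Toy sketch: learn binary codes from one-hot labels with a linear
# auto-encoder and sign() binarization (an assumed simplification of DAEH).
rng = np.random.default_rng(0)
n, c, k = 1000, 10, 32                         # samples, classes, hash bits
Y = np.eye(c)[rng.integers(0, c, n)]           # one-hot semantic labels

W = 0.1 * rng.standard_normal((c, k))          # linear encoder weights
for _ in range(50):                            # alternate codes and encoder
    B = np.sign(Y @ W)                         # discrete codes in {-1, +1}
    W, *_ = np.linalg.lstsq(Y, B, rcond=None)  # refit encoder to the codes

codes = np.sign(Y @ W)                         # final binary hash codes
print(codes.shape)                             # (1000, 32)
```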
Citations: 1
Language Based Image Quality Assessment
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490605
L. Galteri, Lorenzo Seidenari, P. Bongini, M. Bertini, A. Bimbo
Evaluation of generative models in the visual domain is often performed by providing anecdotal results to the reader. In the case of image enhancement, reference images are usually available. Nonetheless, using signal-based metrics often leads to counterintuitive results: highly natural, crisp images may obtain worse scores than blurry ones. On the other hand, blind (no-reference) image assessment may rank images reconstructed with GANs higher than the original undistorted images. To avoid time-consuming human-based image assessment, semantic computer vision tasks may be exploited instead [9, 25, 33]. In this paper we advocate the use of language generation tasks to evaluate the quality of restored images. We show experimentally that image captioning, used as a downstream task, may serve as a method to score image quality. Captioning scores are better aligned with human rankings than signal-based metrics or no-reference image quality metrics. We also show how the corruption of local image structure by artifacts may steer image captions in the wrong direction.
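A minimal sketch of the protocol this suggests: caption both the reference and the restored image, then score the latter's caption against the former's. The `caption` function is a placeholder for any pretrained captioner, and BLEU is a stand-in similarity; the abstract does not fix the exact captioner or scoring metric.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def caption(image):
    # Placeholder: plug in any pretrained image captioning model here.
    raise NotImplementedError

def captioning_quality(reference_img, restored_img) -> float:
    """Score a restored image by comparing its caption to the caption of
    the undistorted reference (higher = captions agree more)."""
    ref = caption(reference_img).lower().split()
    hyp = caption(restored_img).lower().split()
    smooth = SmoothingFunction().method1  # avoid zero scores on short text
    return sentence_bleu([ref], hyp, smoothing_function=smooth)
```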
Citations: 4
Chinese White Dolphin Detection in the Wild
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490574
Hao Zhang, Qi Zhang, P. Nguyen, Victor C. S. Lee, Antoni B. Chan
For ecological protection of the ocean, biologists usually conduct line-transect vessel surveys to measure the population density of sea species (such as dolphins) within their habitat. However, observing sea species via vessel surveys is labor-intensive and more challenging than observing common objects, owing to the scarcity of the animals in the wild, their tiny size, and similar-sized distracter objects (e.g., floating trash). To reduce the workload of human experts and improve observation accuracy, in this paper we develop a practical system that automatically detects Chinese White Dolphins in the wild. First, we construct a dataset named Dolphin-14k with more than 2.6k dolphin instances. To improve annotation efficiency, which suffers from the rarity of dolphins, we design an interactive dolphin box annotation strategy that efficiently annotates sparse dolphin instances in long videos. Second, we compare the performance and efficiency of three off-the-shelf object detection algorithms, Faster-RCNN, FCOS, and YoloV5, on the Dolphin-14k dataset and pick YoloV5 as the detector, adding a new category (Distracter) to the model training to reject false positives. Finally, we incorporate the dolphin detector into a system prototype, which detects dolphins in video frames at 100.99 FPS per GPU with high accuracy (i.e., 90.95 mAP@0.5).
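The Distracter class works as a learned negative: at inference time its detections are simply discarded. A minimal sketch of that filtering step follows; the flat tuple format is an assumption for illustration rather than YoloV5's native output object.

```python
# Sketch of Distracter rejection: keep only confident dolphin detections.
DOLPHIN, DISTRACTER = 0, 1  # assumed class ids

def keep_dolphins(detections, conf_thresh=0.5):
    """detections: iterable of (x1, y1, x2, y2, confidence, class_id)."""
    return [d for d in detections
            if d[5] == DOLPHIN and d[4] >= conf_thresh]

dets = [(10, 20, 50, 60, 0.91, DOLPHIN),    # dolphin, kept
        (5, 5, 30, 30, 0.88, DISTRACTER)]   # floating trash, rejected
print(keep_dolphins(dets))
```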
Citations: 0
Deep Reinforcement Learning and Docking Simulations for autonomous molecule generation in de novo Drug Design
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3497694
Hao Liu, Qian Wang, Xiaotong Hu
In medicinal chemistry programs, it is key to design and make compounds that are efficacious and safe. In this study, we developed a new deep reinforcement learning-based method for generating compound molecules. Chemical space is impractically large, and many existing generative models produce molecules that lack effectiveness and novelty or have unsatisfactory molecular properties. Our proposed method, DeepRLDS, integrates a transformer network, balanced binary tree search, and docking simulations based on super-large-scale supercomputing, and addresses these problems well. Experiments show that more than 96% of the generated molecules are chemically valid and 99% are chemically novel, and that the generated molecules have satisfactory molecular properties and cover a broader chemical space distribution.
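The chemical-validity figure is typically measured by attempting to parse each generated SMILES string; a minimal sketch using RDKit is shown below. The sample strings are illustrative, and the abstract does not state which toolkit the authors used.

```python
from rdkit import Chem

# Sketch: a generated SMILES string counts as chemically valid if RDKit
# can parse (and sanitize) it into a molecule object.
generated = ["CCO", "c1ccccc1", "C1CC1C(=O)O", "not_a_molecule"]

valid = [s for s in generated if Chem.MolFromSmiles(s) is not None]
print(f"validity: {100 * len(valid) / len(generated):.1f}%")  # 75.0%
```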
Citations: 1
Intra- and Inter-frame Iterative Temporal Convolutional Networks for Video Stabilization
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490608
Haopeng Xie, Liang Xiao, Huicong Wu
Video jitter is an uncomfortable artifact produced by irregular camera motion over time. How to extract motion state information from a period of continuous video frames is a major issue for video stabilization. In this paper, we propose a novel sequence model, Intra- and Inter-frame Iterative Temporal Convolutional Networks (I3TC-Net), which alternately transfers the spatial-temporal correlation of motion within and between frames. We hypothesize that the motion state information can be represented by transmission states. Specifically, we employ a combination of Convolutional Long Short-Term Memory (ConvLSTM) and an embedded encoder-decoder to generate latent stable frames, which are used to update the transmission states iteratively and to effectively learn a global homography transformation for each unstable frame, generating the corresponding stabilized result along the time axis. Furthermore, we create a video dataset to address the lack of stable data and improve training. Experimental results show that our method outperforms state-of-the-art results on publicly available videos, e.g., a 5.4-point improvement in stability score. The project page is available at https://github.com/root2022IIITC/IIITC.
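Once a global homography is predicted for an unstable frame, the stabilized frame is produced by a perspective warp; a minimal sketch of that last step is below. The matrix values are made up for illustration; in the paper the homography comes from the learned network rather than being hand-set.

```python
import cv2
import numpy as np

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder video frame

theta = np.deg2rad(1.5)  # assumed small corrective rotation + translation
H = np.array([[np.cos(theta), -np.sin(theta),  4.0],
              [np.sin(theta),  np.cos(theta), -2.0],
              [0.0,            0.0,            1.0]])

# Warp the unstable frame with the predicted global homography.
stabilized = cv2.warpPerspective(frame, H, (frame.shape[1], frame.shape[0]))
```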
Citations: 0
Differentially Private Learning with Grouped Gradient Clipping
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3490594
Haolin Liu, Chenyu Li, Bochao Liu, Pengju Wang, Shiming Ge, Weiping Wang
While deep learning has proved successful in many critical tasks by training models on large-scale data, private information can be recovered from the released models, leading to privacy leakage. To address this problem, this paper presents a differentially private deep learning paradigm for training private models. In this approach, we propose a simple operation termed grouped gradient clipping to modulate the gradient weights. We also incorporate the smooth-sensitivity mechanism into the differentially private deep learning paradigm, which bounds the added Gaussian noise. In this way, the resulting model can simultaneously provide strong privacy protection and avoid accuracy degradation, offering a good trade-off between privacy and performance. The theoretical advantages of grouped gradient clipping are analyzed in detail. Extensive evaluations on popular benchmarks and comparisons with 11 state-of-the-art methods clearly demonstrate the effectiveness and generalizability of our approach.
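A minimal sketch of grouped gradient clipping is given below: parameters are partitioned into groups, and each group's gradient is clipped to its own norm bound before Gaussian noise is added. The grouping, per-group bounds, and noise scale are illustrative assumptions rather than the paper's exact settings.

```python
import torch

def grouped_clip_and_noise(param_groups, bounds, sigma):
    """param_groups: list of parameter lists (one list per group).
    bounds: per-group clipping norms; sigma: noise multiplier."""
    for params, bound in zip(param_groups, bounds):
        grads = [p.grad for p in params if p.grad is not None]
        if not grads:
            continue
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (bound / (norm + 1e-12)).clamp(max=1.0)
        for g in grads:
            g.mul_(scale)                                # clip the group
            g.add_(sigma * bound * torch.randn_like(g))  # Gaussian noise
```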
Citations: 10
Multi-Scale Graph Convolutional Network and Dynamic Iterative Class Loss for Ship Segmentation in Remote Sensing Images
Pub Date: 2021-12-01 | DOI: 10.1145/3469877.3497699
Yanru Jiang, Chengyu Zheng, Zhao-Hui Wang, Rui Wang, Min Ye, Chenglong Wang, Ning Song, Jie Nie
The accuracy of semantic segmentation of ships is of great significance to coastline navigation, resource management, and territorial protection. Although deep-learning-based ship segmentation has made great progress, existing methods still fail to explore the correlation between targets. To address this problem, this paper designs a multi-scale graph convolutional network and a dynamic iterative class loss for ship segmentation in remote sensing images, generating more accurate segmentation results. Built on DeepLabv3+, our network uses deep convolutional networks and atrous convolutions for multi-scale feature extraction. In particular, for multi-scale semantic features, we construct a Multi-Scale Graph Convolution Network (MSGCN) that introduces semantic correlation information into pixel feature learning via GCN, which enhances the segmentation of ship objects. In addition, we propose a Dynamic Iterative Class Loss (DICL) based on iterative batch-wise class rectification instead of pre-computing fixed weights over the whole dataset, which alleviates the imbalance between positive and negative samples. We compare the proposed algorithm with state-of-the-art deep-learning object detection and ship detection methods, demonstrating its superiority. On a High-Resolution SAR Images Dataset [1], both ship detection and instance segmentation are handled well.
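The sketch below illustrates the batch-wise flavor of DICL: class weights are recomputed from the label frequencies of the current batch instead of being fixed over the whole dataset. Inverse-frequency weighting is an assumption used for illustration; the paper's rectification rule may differ.

```python
import torch
import torch.nn.functional as F

def dynamic_class_loss(logits, targets, num_classes=2, eps=1.0):
    """Cross-entropy with weights recomputed from this batch's labels,
    so the rarer (ship) class is up-weighted on the fly."""
    counts = torch.bincount(targets.flatten(), minlength=num_classes).float()
    weights = (counts.sum() + eps) / (counts + eps)  # rarer class -> larger
    weights = weights / weights.sum() * num_classes  # keep mean weight ~1
    return F.cross_entropy(logits, targets, weight=weights)

logits = torch.randn(4, 2, 8, 8)                 # (batch, classes, H, W)
targets = torch.zeros(4, 8, 8, dtype=torch.long)
targets[:, 2:4, 2:6] = 1                         # sparse "ship" pixels
loss = dynamic_class_loss(logits, targets)
```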
Citations: 0