Head-Motion-Aware Viewport Margins for Improving User Experience in Immersive Video
Mehmet N. Akcay, Burak Kara, Saba Ahsan, A. Begen, I. Curcio, Emre B. Aksu
Viewport-dependent delivery (VDD) is a technique to save network resources during the transmission of immersive videos. However, it results in a non-zero motion-to-high-quality delay (MTHQD), which is the time from the moment when the current viewport first contains at least one low-quality tile to the moment when all tiles in the new viewport are rendered in high quality. MTHQD is an important metric for evaluating VDD systems. This paper improves an earlier concept called viewport margins by introducing head-motion awareness. The primary benefit of this improvement is a reduction of up to 64% in the average MTHQD.
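To make the metric concrete, here is a minimal sketch of how MTHQD could be computed from a client-side event trace; the event-log format and field names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: average motion-to-high-quality delay (MTHQD) from a
# client-side event trace. Each event is (timestamp_s, kind), where kind is
# "degraded" (viewport first shows a low-quality tile) or "restored" (all
# viewport tiles are high quality again). The log format is hypothetical.

def average_mthqd(events):
    delays, degraded_at = [], None
    for t, kind in events:
        if kind == "degraded" and degraded_at is None:
            degraded_at = t                     # viewport quality just dropped
        elif kind == "restored" and degraded_at is not None:
            delays.append(t - degraded_at)      # one MTHQD sample
            degraded_at = None
    return sum(delays) / len(delays) if delays else 0.0

# Example: two head motions with delays of 0.8 s and 0.3 s -> average 0.55 s.
trace = [(1.0, "degraded"), (1.8, "restored"), (4.0, "degraded"), (4.3, "restored")]
print(average_mthqd(trace))
```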
{"title":"Head-Motion-Aware Viewport Margins for Improving User Experience in Immersive Video","authors":"Mehmet N. Akcay, Burak Kara, Saba Ahsan, A. Begen, I. Curcio, Emre B. Aksu","doi":"10.1145/3469877.3490573","DOIUrl":"https://doi.org/10.1145/3469877.3490573","url":null,"abstract":"Viewport-dependent delivery (VDD) is a technique to save network resources during the transmission of immersive videos. However, it results in a non-zero motion-to-high-quality delay (MTHQD), which is the delta time from the moment where the current viewport has at least one low-quality tile to when all the tiles in the new viewport are rendered in high quality. MTHQD is an important metric in the evaluation of the VDD systems. This paper improves an earlier concept called viewport margins by introducing head-motion awareness. The primary benefit of this improvement is the reduction (up to 64%) in the average MTHQD.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121361126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning to Decompose and Restore Low-light Images with Wavelet Transform
Pengju Zhang, Chaofan Zhang, Zheng Rong, Yihong Wu
Low-light images often suffer from low visibility and various types of noise. Most existing low-light image enhancement methods amplify noise while enhancing the image, because they neglect to separate valuable image information from noise. In this paper, we propose a novel wavelet-based attention network, in which the wavelet transform is integrated into attention learning for joint low-light enhancement and noise suppression. The proposed network comprises a Decomposition-Net, an Enhancement-Net and a Restoration-Net. In Decomposition-Net, wavelet transform layers separate noise and global content information into different frequency features, which benefits denoising. An attention-based strategy then progressively selects suitable frequency features for accurately restoring illumination and reflectance according to Retinex theory. Enhancement-Net further removes degradations in reflectance and adjusts illumination, while Restoration-Net employs conditional adversarial learning to improve the visual quality of the final restored results based on the enhanced illumination and reflectance. Extensive experiments on several public datasets demonstrate that the proposed method achieves more pleasing results than state-of-the-art methods.
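A minimal PyTorch sketch of the two building blocks the abstract names: a Haar wavelet layer that splits features into frequency sub-bands, and an attention gate that re-weights those sub-bands. This is a generic illustration of the idea, not the authors' architecture; all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class HaarDWT(nn.Module):
    """One level of a 2-D Haar wavelet transform: splits a feature map into
    low-frequency (LL) and high-frequency (LH, HL, HH) sub-bands."""
    def forward(self, x):
        a = x[:, :, 0::2, 0::2]   # even rows, even cols
        b = x[:, :, 0::2, 1::2]
        c = x[:, :, 1::2, 0::2]
        d = x[:, :, 1::2, 1::2]
        ll = (a + b + c + d) / 4
        lh = (a - b + c - d) / 4
        hl = (a + b - c - d) / 4
        hh = (a - b - c + d) / 4
        return torch.cat([ll, lh, hl, hh], dim=1)

class FrequencyAttention(nn.Module):
    """Squeeze-and-excitation-style gate over sub-band channels, letting the
    network emphasize content frequencies and suppress noisy ones."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())
    def forward(self, x):
        return x * self.gate(x)

x = torch.randn(1, 16, 64, 64)
y = FrequencyAttention(64)(HaarDWT()(x))   # -> (1, 64, 32, 32)
```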
{"title":"Learning to Decompose and Restore Low-light Images with Wavelet Transform","authors":"Pengju Zhang, Chaofan Zhang, Zheng Rong, Yihong Wu","doi":"10.1145/3469877.3490622","DOIUrl":"https://doi.org/10.1145/3469877.3490622","url":null,"abstract":"Low-light images often suffer from low visibility and various noise. Most existing low-light image enhancement methods often amplify noise when enhancing low-light images, due to the neglect of separating valuable image information and noise. In this paper, we propose a novel wavelet-based attention network, where wavelet transform is integrated into attention learning for joint low-light enhancement and noise suppression. Particularly, the proposed wavelet-based attention network includes a Decomposition-Net, an Enhancement-Net and a Restoration-Net. In Decomposition-Net, to benefit denoising, wavelet transform layers are designed for separating noise and global content information into different frequency features. Furthermore, an attention-based strategy is introduced to progressively select suitable frequency features for accurately restoring illumination and reflectance according to Retinex theory. In addition, Enhancement-Net is introduced for further removing degradations in reflectance and adjusting illumination, while Restoration-Net employs conditional adversarial learning to adversarially improve the visual quality of final restored results based on enhanced illumination and reflectance. Extensive experiments on several public datasets demonstrate that the proposed method achieves more pleasing results than state-of-the-art methods.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"55 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132090962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hard-Boundary Attention Network for Nuclei Instance Segmentation
Yalu Cheng, Pengchong Qiao, Hong-Ju He, Guoli Song, Jie Chen
Image segmentation plays an important role in medical image analysis, and accurate segmentation of nuclei is especially crucial to clinical diagnosis. However, existing methods fail to segment densely packed nuclei because the hard boundary between nuclei has a texture similar to the nuclear interior. To this end, we propose a Hard-Boundary Attention Network (HBANet) for nuclei instance segmentation. Specifically, we propose a Background Weaken Module (BWM) that weakens the model's attention to the nucleus background by integrating low-level features into high-level features. To improve the model's robustness to hard boundaries, we further design a Gradient-based boundary adaptive Strategy (GS) that generates boundary-weakened data for model training in an adversarial manner. We conduct extensive experiments on the MoNuSeg and CPM-17 datasets, and the results show that HBANet outperforms state-of-the-art methods.
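One plausible reading of the BWM idea, sketched in PyTorch: project a low-level feature map onto the high-level one and use it as a spatial gate that damps background responses. The gating form and layer sizes are assumptions for illustration, not the paper's module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BackgroundWeaken(nn.Module):
    """Fuses low-level features into high-level ones and uses the result as a
    spatial gate that suppresses background activations (a BWM-style sketch)."""
    def __init__(self, low_ch, high_ch):
        super().__init__()
        self.align = nn.Conv2d(low_ch, high_ch, kernel_size=1)
    def forward(self, low, high):
        low = F.interpolate(self.align(low), size=high.shape[2:],
                            mode="bilinear", align_corners=False)
        gate = torch.sigmoid(low)       # high where low-level cues support foreground
        return high + high * gate       # residual path plus gated (background-weakened) path

low = torch.randn(1, 64, 128, 128)      # early-stage features
high = torch.randn(1, 256, 32, 32)      # deep-stage features
out = BackgroundWeaken(64, 256)(low, high)   # -> (1, 256, 32, 32)
```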
{"title":"Hard-Boundary Attention Network for Nuclei Instance Segmentation","authors":"Yalu Cheng, Pengchong Qiao, Hong-Ju He, Guoli Song, Jie Chen","doi":"10.1145/3469877.3490602","DOIUrl":"https://doi.org/10.1145/3469877.3490602","url":null,"abstract":"Image segmentation plays an important role in medical image analysis, and accurate segmentation of nuclei is especially crucial to clinical diagnosis. However, existing methods fail to segment dense nuclei due to the hard-boundary which has similar texture to nuclear inside. To this end, we propose a Hard-Boundary Attention Network (HBANet) for nuclei instance segmentation. Specifically, we propose a Background Weaken Module (BWM) to weaken the attention of our model to the nucleus background by integrating low-level features into high-level features. To improve the robustness of the model to the hard-boundary of nuclei, we further design a Gradient-based boundary adaptive Strategy (GS) which generates boundary-weakened data for model training in an adversarial manner. We conduct extensive experiments on MoNuSeg and CPM-17 datasets, and experimental results show that our HBANet outperforms the state-of-the-art methods.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114408864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Embarrassingly Simple Approach to Discrete Supervised Hashing
Shuguang Zhao, Bingzhi Chen, Zheng Zhang, Guangming Lu
Prior hashing works typically learn a projection function from a high-dimensional visual feature space to a low-dimensional latent space. However, such a projection function suffers from several crucial bottlenecks: 1) information loss and coding redundancy are inevitable; 2) the information available in semantic labels is not well explored; 3) the learned latent embedding lacks explicit semantic meaning. To overcome these limitations, we propose a novel supervised Discrete Auto-Encoder Hashing (DAEH) framework, in which a linear auto-encoder projects the semantic labels of images into a latent representation space. Instead of projecting visual features, the proposed DAEH framework exploits the semantic information of supervised labels to refine the latent feature embedding and further optimize the hashing function. Meanwhile, we reformulate the objective and relax the discrete constraints of the binary optimization problem. Extensive experiments on the Caltech-256, CIFAR-10, and MNIST datasets demonstrate that our method outperforms state-of-the-art hashing baselines.
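A toy NumPy sketch of the label auto-encoder idea: learn a linear map from one-hot labels to k-bit codes, binarize with sign, and alternately refit encoder and decoder by least squares. The relax-and-round scheme below is an illustrative stand-in for the paper's actual optimization, not DAEH itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, k = 1000, 10, 32                      # samples, classes, code bits
Y = np.eye(c)[rng.integers(0, c, n)]        # one-hot label matrix (n x c)
W = rng.standard_normal((c, k)) * 0.1       # linear label encoder

for _ in range(5):                          # alternating optimization
    B = np.sign(Y @ W)                      # discrete codes (n x k)
    B[B == 0] = 1
    D = np.linalg.lstsq(B, Y, rcond=None)[0]   # decoder: codes -> labels
    W = np.linalg.lstsq(Y, B, rcond=None)[0]   # encoder refit to the codes

# Sanity check: labels should be recoverable from the binary codes.
recon = np.sign(Y @ W) @ D
print(np.mean(np.argmax(recon, 1) == np.argmax(Y, 1)))
```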
{"title":"An Embarrassingly Simple Approach to Discrete Supervised Hashing","authors":"Shuguang Zhao, Bingzhi Chen, Zheng Zhang, Guangming Lu","doi":"10.1145/3469877.3493595","DOIUrl":"https://doi.org/10.1145/3469877.3493595","url":null,"abstract":"Prior hashing works typically learn a projection function from high-dimensional visual feature space to low-dimensional latent space. However, such a projection function remains several crucial bottlenecks: 1) information loss and coding redundancy are inevitable; 2) the available information of semantic labels is not well-explored; 3) the learned latent embedding lacks explicit semantic meaning. To overcome these limitations, we propose a novel supervised Discrete Auto-Encoder Hashing (DAEH) framework, in which a linear auto-encoder can effectively project the semantic labels of images into a latent representation space. Instead of using the visual feature projection, the proposed DAEH framework skillfully explores the semantic information of supervised labels to refine the latent feature embedding and further optimizes hashing function. Meanwhile, we reformulate the objective and relax the discrete constraints for the binary optimization problem. Extensive experiments on Caltech-256, CIFAR-10, and MNIST datasets demonstrate that our method can outperform the state-of-the-art hashing baselines.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122494134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Language Based Image Quality Assessment
L. Galteri, Lorenzo Seidenari, P. Bongini, M. Bertini, A. Bimbo
Evaluation of generative models in the visual domain is often performed by providing anecdotal results to the reader. In the case of image enhancement, reference images are usually available. Nonetheless, signal-based metrics often lead to counterintuitive results: highly natural, crisp images may obtain worse scores than blurry ones. On the other hand, no-reference image quality assessment may rank images reconstructed with GANs higher than the original undistorted images. To avoid time-consuming human image assessment, semantic computer vision tasks may be exploited instead [9, 25, 33]. In this paper we advocate the use of language generation tasks to evaluate the quality of restored images. We show experimentally that image captioning, used as a downstream task, can serve as a method to score image quality. Captioning scores align better with human rankings than signal-based metrics or no-reference image quality metrics. We offer insights into how the corruption of local image structure by artifacts may steer image captions in the wrong direction.
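A minimal sketch of caption-based quality scoring under this idea: caption both the restored image and its reference with any off-the-shelf captioner, then compare the two captions. The `caption()` stub is a placeholder, and BLEU is used here only as one convenient text-similarity choice; neither is claimed to be the paper's exact pipeline.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def caption(image):
    # Placeholder: plug in any pretrained image captioning model here.
    raise NotImplementedError

def caption_quality(restored_img, reference_img):
    """Score a restored image by how well its caption matches the
    reference image's caption (higher = semantics better preserved)."""
    hyp = caption(restored_img).lower().split()
    ref = caption(reference_img).lower().split()
    return sentence_bleu([ref], hyp,
                         smoothing_function=SmoothingFunction().method1)

# With captions already in hand, the comparison itself is runnable:
ref = "a brown dog runs across the wet sand".split()
hyp = "a dog runs on the beach".split()
print(sentence_bleu([ref], hyp, smoothing_function=SmoothingFunction().method1))
```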
{"title":"Language Based Image Quality Assessment","authors":"L. Galteri, Lorenzo Seidenari, P. Bongini, M. Bertini, A. Bimbo","doi":"10.1145/3469877.3490605","DOIUrl":"https://doi.org/10.1145/3469877.3490605","url":null,"abstract":"Evaluation of generative models, in the visual domain, is often performed providing anecdotal results to the reader. In the case of image enhancement, reference images are usually available. Nonetheless, using signal based metrics often leads to counterintuitive results: highly natural crisp images may obtain worse scores than blurry ones. On the other hand, blind reference image assessment may rank images reconstructed with GANs higher than the original undistorted images. To avoid time consuming human based image assessment, semantic computer vision tasks may be exploited instead [9, 25, 33]. In this paper we advocate the use of language generation tasks to evaluate the quality of restored images. We show experimentally that image captioning, used as a downstream task, may serve as a method to score image quality. Captioning scores are better aligned with human rankings with respect to signal based metrics or no-reference image quality metrics. We show insights on how the corruption, by artifacts, of local image structure may steer image captions in the wrong direction.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116426881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chinese White Dolphin Detection in the Wild
Hao Zhang, Qi Zhang, P. Nguyen, Victor C. S. Lee, Antoni B. Chan
For ecological protection of the ocean, biologists conduct line-transect vessel surveys to measure the population density of sea species (such as dolphins) within their habitat. However, observing sea species from vessel surveys is labor-intensive and more challenging than observing common objects, due to the scarcity of the animals in the wild, their tiny size, and similar-sized distracter objects (e.g., floating trash). To reduce the workload of human experts and improve observation accuracy, we develop a practical system that automatically detects Chinese White Dolphins in the wild. First, we construct a dataset named Dolphin-14k with more than 2.6k dolphin instances. Because dolphins appear only sparsely in long videos, we design an interactive box annotation strategy to annotate them efficiently. Second, we compare the accuracy and efficiency of three off-the-shelf object detectors (Faster-RCNN, FCOS, and YoloV5) on Dolphin-14k and pick YoloV5 as the detector, adding a new category (Distracter) to model training to reject false positives. Finally, we incorporate the dolphin detector into a system prototype that processes video frames at 100.99 FPS per GPU with high accuracy (90.95 mAP@0.5).
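The distracter trick reduces to a post-filter at inference: boxes predicted as the extra class are discarded rather than surfacing as dolphins. A minimal sketch, assuming a YoloV5-style `(N, 6)` detection tensor and illustrative class indices:

```python
import torch

DOLPHIN, DISTRACTER = 0, 1   # assumed class indices for illustration

def keep_dolphins(detections, score_thr=0.5):
    """detections: (N, 6) tensor of [x1, y1, x2, y2, score, cls].
    Keeps confident dolphin boxes; distracter boxes absorb floating trash
    during training and are simply dropped here."""
    score, cls = detections[:, 4], detections[:, 5]
    mask = (cls == DOLPHIN) & (score >= score_thr)
    return detections[mask]

dets = torch.tensor([[10., 10., 50., 40., 0.9, 0.],   # confident dolphin -> kept
                     [60., 15., 90., 35., 0.8, 1.],   # distracter -> dropped
                     [ 5., 70., 30., 95., 0.3, 0.]])  # low confidence -> dropped
print(keep_dolphins(dets))
```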
{"title":"Chinese White Dolphin Detection in the Wild","authors":"Hao Zhang, Qi Zhang, P. Nguyen, Victor C. S. Lee, Antoni B. Chan","doi":"10.1145/3469877.3490574","DOIUrl":"https://doi.org/10.1145/3469877.3490574","url":null,"abstract":"For ecological protection of the ocean, biologists usually conduct line-transect vessel surveys to measure sea species’ population density within their habitat (such as dolphins). However, sea species observation via vessel surveys consumes a lot of manpower resources and is more challenging compared to observing common objects, due to the scarcity of the object in the wild, tiny-size of the objects, and similar-sized distracter objects (e.g., floating trash). To reduce the human experts’ workload and improve the observation accuracy, in this paper, we develop a practical system to detect Chinese White Dolphins in the wild automatically. First, we construct a dataset named Dolphin-14k with more than 2.6k dolphin instances. To improve the dataset annotation efficiency caused by the rarity of dolphins, we design an interactive dolphin box annotation strategy to annotate sparse dolphin instances in long videos efficiently. Second, we compare the performance and efficiency of three off-the-shelf object detection algorithms, including Faster-RCNN, FCOS, and YoloV5, on the Dolphin-14k dataset and pick YoloV5 as the detector, where a new category (Distracter) is added to the model training to reject the false positives. Finally, we incorporate the dolphin detector into a system prototype, which detects dolphins in video frames at 100.99 FPS per GPU with high accuracy (i.e., 90.95 mAP@0.5).","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126273047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Reinforcement Learning and Docking Simulations for autonomous molecule generation in de novo Drug Design
Hao Liu, Qian Wang, Xiaotong Hu
In medicinal chemistry programs, it is key to design and make compounds that are efficacious and safe. In this study, we developed a new deep reinforcement-learning-based molecular generation method. Chemical space is impractically large, and many existing generative models produce molecules that lack validity and novelty or have unsatisfactory molecular properties. Our proposed method, DeepRLDS, which integrates a transformer network, balanced binary tree search, and docking simulation based on super-large-scale supercomputing, addresses these problems well. Experiments show that more than 96% of the generated molecules are chemically valid and 99% are chemically novel, and that the generated molecules have satisfactory molecular properties and cover a broader chemical space.
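A minimal sketch of a reward function in this spirit: a generated SMILES string earns reward only if RDKit can parse it, and the reward combines novelty against known molecules with a docking term. The weights and the docking stub are assumptions, not the paper's setup.

```python
from rdkit import Chem

def docking_score(mol):
    # Placeholder: a real docking engine would return a binding energy here
    # (more negative = better binding).
    return 0.0

def reward(smiles, known_smiles, w_dock=1.0, w_novel=0.5):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return -1.0                          # chemically invalid: penalize
    canonical = Chem.MolToSmiles(mol)        # canonicalize before lookup
    novelty = 0.0 if canonical in known_smiles else 1.0
    return w_dock * (-docking_score(mol)) + w_novel * novelty

print(reward("CCO", known_smiles={"CCO"}))   # valid but not novel -> 0.0
print(reward("C(C(", known_smiles=set()))    # unparsable -> -1.0
```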
{"title":"Deep Reinforcement Learning and Docking Simulations for autonomous molecule generation in de novo Drug Design","authors":"Hao Liu, Qian Wang, Xiaotong Hu","doi":"10.1145/3469877.3497694","DOIUrl":"https://doi.org/10.1145/3469877.3497694","url":null,"abstract":"In medicinal chemistry programs, it is key to design and make compounds that are efficacious and safe. In this study, we developed a new deep Reinforcement learning-based compounds molecular generation method. Because chemical space is impractically large, and many existing generation models generate molecules that lack effectiveness, novelty and unsatisfactory molecular properties. Our proposed method-DeepRLDS, which integrates transformer network, balanced binary tree search and docking simulation based on super large-scale supercomputing, can solve these problems well. Experiments show that more than 96 of the generated molecules are chemically valid, 99 of the generated molecules are chemically novelty, the generated molecules have satisfactory molecular properties and possess a broader chemical space distribution.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127348166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intra- and Inter-frame Iterative Temporal Convolutional Networks for Video Stabilization
Haopeng Xie, Liang Xiao, Huicong Wu
Video jitter is an uncomfortable artifact of irregular lens motion over time. Extracting motion-state information from a sequence of continuous video frames is a major issue for video stabilization. In this paper, we propose a novel sequence model, Intra- and Inter-frame Iterative Temporal Convolutional Networks (I3TC-Net), which alternately transfers the spatial-temporal correlation of motion within and between frames. We hypothesize that motion-state information can be represented by transmission states. Specifically, we employ a combination of Convolutional Long Short-Term Memory (ConvLSTM) and an embedded encoder-decoder to generate a latent stable frame, which is used to update the transmission states iteratively and to learn an effective global homography transformation that maps each unstable frame to its stabilized counterpart along the time axis. Furthermore, we create a video dataset to remedy the lack of stable training data and improve training. Experimental results show that our method outperforms the state of the art on publicly available videos, for example by 5.4 points in stability score. The project page is available at https://github.com/root2022IIITC/IIITC.
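Once a global 3x3 homography is predicted per unstable frame, stabilization reduces to warping each frame. A minimal sketch of that output stage, with `predict_homography` as an identity-returning stand-in for the network:

```python
import cv2
import numpy as np

def predict_homography(frame, state):
    # Stand-in for I3TC-Net: a real model would consume the frame and its
    # recurrent transmission state and emit a stabilizing homography.
    return np.eye(3, dtype=np.float32), state

def stabilize(frames):
    state, out = None, []
    for f in frames:
        H, state = predict_homography(f, state)        # per-frame transform
        h, w = f.shape[:2]
        out.append(cv2.warpPerspective(f, H, (w, h)))  # apply the warp
    return out

frames = [np.zeros((240, 320, 3), np.uint8) for _ in range(5)]
stable = stabilize(frames)
print(len(stable), stable[0].shape)   # 5 (240, 320, 3)
```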
{"title":"Intra- and Inter-frame Iterative Temporal Convolutional Networks for Video Stabilization","authors":"Haopeng Xie, Liang Xiao, Huicong Wu","doi":"10.1145/3469877.3490608","DOIUrl":"https://doi.org/10.1145/3469877.3490608","url":null,"abstract":"Video jitter is an uncomfortable product of irregular lens motion in time sequence. How to extract motion state information in a period of continuous video frames is a major issue for video stabilization. In this paper, we propose a novel sequence model, Intra- and Inter-frame Iterative Temporal Convolutional Networks (I3TC-Net), which alternatively transfer the spatial-temporal correlation of motion within and between frames. We hypothesize that the motion state information can be represented by transmission states. Specifically, we employ combination of Convolutional Long Short-Term Memory (ConvLSTM) and embedded encoder-decoder to generate the latent stable frame, which are used to update transmission states iteratively and learn a global homography transformation effectively for each unstable frame to generate the corresponding stabilized result along the time axis. Furthermore, we create a video dataset to solve the lack of stable data and improve the training effect. Experimental results show that our method outperforms state-of-the-art results on publicly available videos, such as 5.4 points improvements in stability score. The project page is available at https://github.com/root2022IIITC/IIITC.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"42 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130679449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Differentially Private Learning with Grouped Gradient Clipping
Haolin Liu, Chenyu Li, Bochao Liu, Pengju Wang, Shiming Ge, Weiping Wang
While deep learning has proven successful in many critical tasks by training models on large-scale data, private information in the training data can be recovered from the released models, leading to privacy leakage. To address this problem, this paper presents a differentially private deep learning paradigm for training private models. In this approach, we propose a simple operation termed grouped gradient clipping to modulate the gradient weights. We also incorporate the smooth sensitivity mechanism into the paradigm, which bounds the added Gaussian noise. In this way, the resulting model provides strong privacy protection while avoiding accuracy degradation, offering a good trade-off between privacy and performance. The theoretical advantages of grouped gradient clipping are analyzed. Extensive evaluations on popular benchmarks and comparisons with 11 state-of-the-art methods clearly demonstrate the effectiveness and generalizability of our approach.
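A DP-SGD-style sketch of per-group clipping: parameters are split into groups, each group's gradient is clipped to norm C, and Gaussian noise is added before the update. This illustrates the general pattern only; the paper's mechanism calibrates noise via smooth sensitivity, which the fixed `sigma` below does not model.

```python
import torch

def dp_step(groups, lr=0.1, clip=1.0, sigma=0.5):
    """groups: list of parameter lists; one clipping bound per group."""
    with torch.no_grad():
        for group in groups:
            norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in group))
            scale = torch.clamp(clip / (norm + 1e-12), max=1.0)  # clip to norm <= C
            for p in group:
                noisy = p.grad * scale + sigma * clip * torch.randn_like(p.grad)
                p -= lr * noisy                                   # noisy SGD update

w1 = torch.randn(4, 4, requires_grad=True)
w2 = torch.randn(4, requires_grad=True)
loss = (w1.sum() + w2.sum()) ** 2
loss.backward()
dp_step([[w1], [w2]])   # two groups, clipped and noised independently
```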
{"title":"Differentially Private Learning with Grouped Gradient Clipping","authors":"Haolin Liu, Chenyu Li, Bochao Liu, Pengju Wang, Shiming Ge, Weiping Wang","doi":"10.1145/3469877.3490594","DOIUrl":"https://doi.org/10.1145/3469877.3490594","url":null,"abstract":"While deep learning has proved success in many critical tasks by training models from large-scale data, some private information within can be recovered from the released models, leading to the leakage of privacy. To address this problem, this paper presents a differentially private deep learning paradigm to train private models. In the approach, we propose and incorporate a simple operation termed grouped gradient clipping to modulate the gradient weights. We also incorporated the smooth sensitivity mechanism into differentially private deep learning paradigm, which bounds the adding Gaussian noise. In this way, the resulting model can simultaneously provide with strong privacy protection and avoid accuracy degradation, providing a good trade-off between privacy and performance. The theoretic advantages of grouped gradient clipping are well analyzed. Extensive evaluations on popular benchmarks and comparisons with 11 state-of-the-arts clearly demonstrate the effectiveness and genearalizability of our approach.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131369552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-Scale Graph Convolutional Network and Dynamic Iterative Class Loss for Ship Segmentation in Remote Sensing Images
Yanru Jiang, Chengyu Zheng, Zhao-Hui Wang, Rui Wang, Min Ye, Chenglong Wang, Ning Song, Jie Nie
The accuracy of ship semantic segmentation results is of great significance to coastline navigation, resource management, and territorial protection. Although deep-learning-based ship semantic segmentation has made great progress, existing methods still fail to explore the correlation between targets. To address this problem, we design a multi-scale graph convolutional network and a dynamic iterative class loss for ship segmentation in remote sensing images, producing more accurate segmentation results. Building on DeepLabv3+, our network uses deep convolutional networks and atrous convolutions for multi-scale feature extraction. In particular, we construct a Multi-Scale Graph Convolution Network (MSGCN) over the multi-scale semantic features to introduce semantic correlation information into pixel feature learning, which improves the segmentation of ship objects. In addition, we propose a Dynamic Iterative Class Loss (DICL) based on iterative batch-wise class rectification, instead of pre-computing fixed weights over the whole dataset, which addresses the imbalance between positive and negative samples. We compare the proposed algorithm with state-of-the-art deep learning object detection and ship detection methods, demonstrating its superiority. On a High-Resolution SAR Images Dataset [1], both ship detection and instance segmentation are implemented well.
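A minimal sketch of the batch-wise idea behind DICL: re-estimate class weights from each batch's class frequencies instead of fixing them over the whole dataset, so rare ship pixels are up-weighted as the imbalance varies. The inverse-frequency rule below is an illustrative assumption, not the paper's exact rectification.

```python
import torch
import torch.nn.functional as F

def dynamic_class_loss(logits, target, n_classes=2, eps=1.0):
    """logits: (B, C, H, W); target: (B, H, W) with class indices.
    Weights are recomputed per batch from observed pixel counts."""
    counts = torch.bincount(target.flatten(), minlength=n_classes).float()
    weights = counts.sum() / (n_classes * (counts + eps))   # inverse frequency
    return F.cross_entropy(logits, target, weight=weights)

logits = torch.randn(2, 2, 64, 64)
target = (torch.rand(2, 64, 64) > 0.95).long()   # ~5% ship pixels
print(dynamic_class_loss(logits, target))        # ship errors weigh more
```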
{"title":"Multi-Scale Graph Convolutional Network and Dynamic Iterative Class Loss for Ship Segmentation in Remote Sensing Images","authors":"Yanru Jiang, Chengyu Zheng, Zhao-Hui Wang, Rui Wang, Min Ye, Chenglong Wang, Ning Song, Jie Nie","doi":"10.1145/3469877.3497699","DOIUrl":"https://doi.org/10.1145/3469877.3497699","url":null,"abstract":"The accuracy of the semantic segmentation results of ships is of great significance to coastline navigation, resource management, and territorial protection. Although the ship semantic segmentation method based on deep learning has made great progress, there is still the problem of not exploring the correlation between the targets. In order to avoid the above problems, this paper designed a multi-scale graph convolutional network and dynamic iterative class loss for ship segmentation in remote sensing images to generate more accurate segmentation results. Based on DeepLabv3+, our network uses deep convolutional networks and atrous convolutions for multi-scale feature extraction. In particular, for multi-scale semantic features, we propose to construct a Multi-Scale Graph Convolution Network (MSGCN) to introduce semantic correlation information for pixel feature learning by GCN, which enhances the segmentation result of ship objects. In addition, we propose a Dynamic Iterative Class Loss (DICL) based on iterative batch-wise class rectification instead of pre-computing the fixed weights over the whole dataset, which solves the problem of imbalance between positive and negative samples. We compared the proposed algorithm with the most advanced deep learning target detection methods and ship detection methods and proved the superiority of our method. On a High-Resolution SAR Images Dataset [1], ship detection and instance segmentation can be implemented well.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"98 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113983351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}