Title: A micro-hyperspectral image classification method of gallbladder cancer based on multi-scale fusion attention mechanism
Authors: Hongmin Gao, Zhu Min, Xueying Cao, Chenming Li, Liu Qin, Peipei Xu
DOI: 10.11834/jig.211201
Journal: 中国图象图形学报 (Journal of Image and Graphics), 2023

Abstract: Objective Gallbladder carcinoma is recognized as one of the most malignant tumors of the biliary system. Its prognosis is extremely poor, with an average overall survival of only 6 months. Missed diagnoses are common because gallbladder cancer lacks typical clinical manifestations in its early stage. To characterize gallbladder lesions and detect gallbladder carcinoma early and accurately, current computer-aided diagnosis (CAD) of gallbladder cancer focuses mainly on the interpretation of medical images such as B-ultrasound, computed tomography (CT), and magnetic resonance imaging (MRI). However, its accuracy is limited because molecular-level information about the diseased organ cannot be obtained. Micro-hyperspectral technology combines the strengths of spectral analysis and optical imaging, acquiring both the chemical composition and the physical features of biological tissue samples at the same time. Changes in the physical attributes of cancerous tissue may not be evident in the early stage, but changes in chemical factors such as composition, structure, and content are reflected in the spectral information. Micro-hyperspectral imaging therefore has the potential to diagnose cancer earlier and more accurately. As a special optical diagnosis technology, it can provide an effective auxiliary diagnosis method for clinical research; however, the richer spectral information it delivers also brings a large volume of data and considerable information redundancy. To improve detection accuracy and exploit the rich spatial and spectral information effectively, we design a network model with a multi-scale fusion attention mechanism to optimize the classification accuracy for gallbladder cancer. Method The multiscale squeeze-and-excitation residual (MSE-Res) module fuses multiscale features across the channel dimension. First, an improved multi-scale feature extraction module extracts features of different scales in the channel dimension. To extract the salient features of the image, a max pooling layer and an upsampling layer are used alongside a 1 × 1 convolution layer. To compensate for the
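The MSE-Res module above builds on squeeze-and-excitation (SE) channel attention. Below is a minimal numpy sketch of the SE gating step only, not the paper's MSE-Res implementation; the weights `w1`/`w2` and the reduction ratio are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_attention(feat, w1, w2):
    """Squeeze-and-excitation channel attention on a (C, H, W) feature map.

    Squeeze: global average pooling per channel -> vector of length C.
    Excitation: a two-layer bottleneck followed by a sigmoid gate that
    rescales each channel of the input feature map.
    """
    c = feat.shape[0]
    squeezed = feat.reshape(c, -1).mean(axis=1)   # (C,) channel descriptors
    hidden = np.maximum(0.0, w1 @ squeezed)       # ReLU bottleneck, (C // r,)
    gate = sigmoid(w2 @ hidden)                   # per-channel weights in (0, 1)
    return feat * gate[:, None, None]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))   # reduction ratio r = 4 (illustrative)
w2 = rng.standard_normal((8, 2))
out = se_attention(feat, w1, w2)
print(out.shape)  # (8, 4, 4)
```

Because the gate lies in (0, 1), the block can only attenuate channels, never amplify them; a multi-scale variant would compute such gates over features extracted at several kernel sizes before fusing them.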
Title: RIC-NVNet: night-time vehicle enhancement network for vehicle model recognition
Authors: Yunsheng Ye, Chen Weixiao, Chen Fengxin
DOI: 10.11834/jig.220122
Journal: 中国图象图形学报 (Journal of Image and Graphics), 2023
Title: Efficient tone mapping via macro and micro information enhancement and color correction
Authors: Zhu Zhongjie, Cui Weifeng, Bai Yongqiang, Jing Weiyi, Jin Minhong
DOI: 10.11834/jig.220460
Journal: 中国图象图形学报 (Journal of Image and Graphics), 2023

Abstract (translated from Chinese): Objective Tone mapping displays high-dynamic-range (HDR) images on conventional low-dynamic-range devices while keeping the visual effect essentially unchanged. To address the blurred details, edge halos, and color distortion of existing methods, a new tone mapping method based on macro/micro information enhancement and color correction is proposed. Method The HDR image is mapped into HSV (hue, saturation, value) color space to separate luminance from color information. Based on the human visual perception mechanism, a luminance-perception compression model combining macro consistency and micro saliency is built in the luminance channel, and edge halos are further eliminated by adjusting the model's scaling factor. Based on the principle of color constancy, an adaptive saturation-offset model is built in the chroma channel; it fuses the luminance compression information to adjust the image's saturation and resolve the subjective color distortion caused by tone mapping. Result Experiments show that the proposed algorithm outperforms the comparison tone mapping algorithms on objective measures such as structural fidelity, naturalness, and the tone-mapped image quality index, and it also achieves the highest subjective mean opinion score of 4.3 (good to very good). Conclusion The macro/micro-enhanced luminance-perception compression model effectively preserves the integrity and fidelity of texture details while keeping scene luminance information unchanged, and the saturation-offset model fused with luminance compression effectively resolves the color distortion caused by luminance compression. The algorithm is efficient and general and can be widely applied to image compression, biomedicine, and video coding.

Extended abstract: Objective Traditional 8-bit images cannot accurately store and represent real natural scenes because real-world brightness varies widely, from faint starlight to direct sunlight, spanning more than nine orders of magnitude. High dynamic range (HDR) imaging addresses this deficiency by using floating-point values and can faithfully represent a real scene with abundant brightness and chroma information. However, HDR images cannot be rendered directly on conventional display devices. Tone mapping (TM) converts HDR images into traditional images while preserving the natural scene with as little information loss as possible. Many excellent TM operators have emerged and are widely used commercially, yet scene information is inevitably lost to varying degrees because of the large-scale transformation and compression of the brightness range. In complex scenes, even state-of-the-art TM operators still suffer from blurred details, edge halation, brightness imbalance, and color distortion, which seriously affect the subjective perception of human eyes. Hence, a novel TM algorithm based on macro and micro information enhancement and color correction is proposed. Method Targeted algorithm structures with different strategies for the brightness and chroma domains are constructed based on the human visual perception mechanism. First, the HDR image is converted from RGB color space to HSV color space so that luminance and chrominance information can be separated effectively and subsequent processing can proceed without mutual interference. Second, different processing and optimization strategies are adopted for the brightness and chroma channels. For the former, the brightness range is greatly compressed to meet the demands of low-dynamic-range display while enhancing the detail perceived by human eyes from both macro and micro points of view. In particular, the brightness channel is divided into base and detail layers by a weighted guided filter. The base layer is compressed and combined with macro statistical information to reduce the brightness contrast of the image and preserve the authenticity and integrity of the background information and overall structure. Subsequently, the salient region of the real scene is extracted by the gray-level co-occurrence matrix.
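The pipeline above separates luminance from chroma and then compresses the luminance range. As a sketch of that compression step only, here is a generic log-style operator on the HDR value channel; the operator and the parameter `mu` are illustrative assumptions standing in for the paper's macro/micro perceptual model, which is not reproduced here.

```python
import numpy as np

def compress_luminance(v, mu=500.0):
    """Global log-style range compression of an HDR value channel.

    Maps nonnegative HDR values into [0, 1]; a larger mu compresses
    the highlights more aggressively. This is a stand-in for the
    paper's luminance-perception compression model.
    """
    v = np.asarray(v, dtype=np.float64)
    return np.log1p(mu * v) / np.log1p(mu * v.max())

# Five orders of magnitude of scene luminance, squeezed into [0, 1].
hdr_v = np.array([0.001, 0.01, 0.1, 1.0, 10.0, 100.0])
ldr_v = compress_luminance(hdr_v)
print(ldr_v.round(3))
```

The mapping is monotone, so relative ordering of brightness is preserved; the real method additionally splits the channel into base and detail layers so that compression does not flatten local texture.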
Title: Vessel segmentation of OCTA images based on latent vector alignment and Swin Transformer
Authors: Xu Cong, Hao Huaying, Wang Yang, Ma Yuhui, Yan Qifeng, Chen Bang, Ma Shaodong, Wang Xiaogui, Zhao Yitian
DOI: 10.11834/jig.220482
Journal: 中国图象图形学报 (Journal of Image and Graphics), 2023

Abstract (translated from Chinese): Objective Optical coherence tomography angiography (OCTA) is an emerging noninvasive technique increasingly used for retinal vascular imaging. Compared with traditional color fundus photography, OCTA can reveal the microvascular information around the macula and offers significant advantages for retinal vascular imaging. In clinical practice, doctors observe the vascular structures of different retinal layers in OCTA images and judge whether disease is present by analyzing changes in those structures; many studies have shown that abnormal changes in vascular structure usually indicate an ophthalmic disease. Automatic segmentation and extraction of retinal vascular structures in OCTA images is therefore of great significance for the quantitative analysis of many eye diseases and for clinical decision-making. However, the complex vascular structures and low overall contrast of OCTA images make automatic segmentation highly challenging. Method A novel segmentation method that fuses latent vector alignment with the Swin Transformer is proposed to segment vascular structures accurately. ResU-Net serves as the backbone network, and a Swin Transformer encoder captures rich vascular feature information. In addition, a latent-vector-based feature alignment loss is designed to optimize the network in the latent space and improve segmentation performance. Result On three OCTA datasets, the method achieves AUC (area under curve) values of 94.15%, 94.87%, and 97.63% and accuracy (ACC) values of 91.57%, 90.03%, and 91.06%, leading the comparison methods with the best overall segmentation performance. Conclusion The proposed retinal vessel segmentation network achieves the best segmentation performance on all three OCTA datasets, outperforming the comparison methods.

Extended abstract: Objective Optical coherence tomography angiography (OCTA) is an emerging noninvasive technique increasingly used to image the retinal vasculature at capillary-level resolution. OCTA can demonstrate the microvascular information around the macula and has remarkable advantages in retinal vascular imaging. Fundus fluorescence angiography can also visualize the retinal vascular system, including capillaries, but it requires an intravenous injection of contrast agent, a relatively time-consuming process that may have serious side effects. In clinical practice, doctors examine vascular structures in different layers through OCTA images and analyze changes in those structures to determine the presence of related diseases. In particular, any abnormality in the microvasculature of the macula often indicates a disease such as early-stage glaucomatous optic neuropathy, diabetic retinopathy, or age-related macular degeneration. The automatic segmentation and extraction of retinal vascular structure in OCTA is therefore vital for the quantitative analysis and clinical decision-making of many ocular diseases. However, the OCTA imaging process usually produces images with a low signal-to-noise ratio, posing a great challenge for automatic segmentation of vascular structures. Moreover, variations in vessel appearance, motion and shadowing artifacts across depth layers, and underlying pathological structures significantly increase the difficulty of accurately segmenting retinal vessels. This study therefore proposes a novel segmentation method for retinal vascular structures that fuses latent vector alignment and the Swin Transformer to achieve accurate segmentation. Method The ResU-Net network is used as the base network (its encoder and decoder layers consist of residual blocks and pooling layers), and the Swin Transformer is introduced into ResU-Net to form a new encoder structure. The encoding path consists of four stages, each comprising two parts: a Transformer layer built from several stacked Swin Transformer blocks, and a residual structure. The Swin Transformer encoder can acquire rich vascular feature information.
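The latent-vector alignment term above pushes the latent representations of the prediction and the reference together, in addition to the usual pixel-wise segmentation loss. A minimal sketch of one plausible form of such a term, a mean-squared distance between latent vectors, is shown below; the paper's exact loss is not reproduced here, so treat this as an illustrative assumption.

```python
import numpy as np

def latent_alignment_loss(z_pred, z_true):
    """Mean squared distance between two latent vectors.

    A stand-in for a latent-space feature alignment loss: encoders map
    the predicted and reference segmentations to latent vectors, and
    this term penalizes their disagreement.
    """
    z_pred = np.asarray(z_pred, dtype=np.float64)
    z_true = np.asarray(z_true, dtype=np.float64)
    return float(np.mean((z_pred - z_true) ** 2))

# Identical latents incur no penalty; distant latents are penalized.
print(latent_alignment_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(latent_alignment_loss([0.0, 0.0], [3.0, 4.0]))            # 12.5
```

In training, this term would be weighted and added to the pixel-wise loss, so the network is optimized both in image space and in the latent space.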
Title: Deep-learning-based image captioning: analysis and prospects
Authors: Zhao Yongqiang, Jin Zhi, Zhang Feng, Zhao Haiyan, Tao Zhengwei, Dou Chengfeng, Xu Xinhai, Liu Donghong
DOI: 10.11834/jig.220660
Journal: 中国图象图形学报 (Journal of Image and Graphics), 2023

Abstract (translated from Chinese): Image captioning uses a computer to automatically generate a complete, fluent description appropriate to the scene of a given image, realizing cross-modal conversion from image to text. With the wide application of deep learning, the accuracy and inference speed of image captioning algorithms have improved greatly. On the basis of an extensive literature survey, this paper divides deep-learning-based image captioning research into two levels: building the basic capability of image captioning, and studying the effectiveness of its applications. These levels subdivide into the core technical challenges of delivering richer feature information, solving the exposure bias problem, generating diverse captions, making captions controllable, and speeding up caption inference. For these challenges, the paper analyzes methods for delivering richer feature information from the perspectives of attention mechanisms, pretrained models, and multimodal models; methods for solving exposure bias from the perspectives of reinforcement learning, non-autoregressive models, and curriculum learning with scheduled sampling; methods for generating diverse captions from the perspectives of graph convolutional networks, generative adversarial networks, and data augmentation; methods for caption controllability from the perspectives of content control and style control; and methods for faster inference from the perspectives of non-autoregressive models, grid-based visual features, and convolutional decoders. The paper also introduces the common datasets, evaluation metrics, and performance of existing algorithms in detail, and offers predictions and prospects for open problems and future research trends.

Extended abstract: The task of image captioning is to have a computer automatically generate a complete, fluent caption suited to the scene of a known image, realizing the multimodal conversion from image to text. Describing the visual content of an image accurately and quickly is a fundamental goal of artificial intelligence, with a wide range of applications in research and production. Image captioning can be applied to many aspects of social development, such as text captions for images and videos, visual question answering, storytelling from images, network image analysis, and keyword-based image search. Image captions can also assist individuals with visual impairments, making the computer another pair of eyes for them. The accuracy and inference speed of image captioning algorithms have been greatly improved with the wide application of deep learning technology. On the basis of extensive literature research, we find that deep-learning-based image captioning algorithms still face key technical challenges: delivering rich feature information, solving the exposure bias problem, generating diverse captions, realizing the controllability of captions, and improving inference speed. The main framework of an image captioning model is the encoder-decoder architecture: an encoder converts the input image into a fixed-length feature vector, and a decoder converts that vector into a caption. The richer the feature information contained in the model, the higher its accuracy and the better the generated captions. Following the research ideas of existing algorithms, this study reviews captioning algorithms that deliver rich feature information from three aspects: attention mechanisms, pretrained models, and multimodal models. Many image captioning algorithms cannot synchronize the training and prediction processes of a model, so the model suffers from exposure bias: errors accumulate during word generation, the subsequent words become biased, and the accuracy of the captioning model is seriously affected. According to the different problem-solving approaches, this study reviews research on the exposure bias problem in image captioning from three perspectives: reinforcement learning, non-autoregressive models, and curriculum learning with scheduled sampling.
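One exposure-bias remedy the survey covers is scheduled sampling: during training, the decoder is sometimes fed its own previous prediction instead of the ground-truth token, so training conditions gradually resemble inference. A minimal sketch of the token-mixing rule follows; the token lists and function names are illustrative, not from the surveyed papers.

```python
import random

def scheduled_sampling_inputs(gold_tokens, model_tokens, p_gold, rng):
    """Mix gold and model-predicted tokens as decoder inputs.

    With probability p_gold the decoder receives the ground-truth
    previous token (teacher forcing); otherwise it receives its own
    prediction. Decaying p_gold over training narrows the
    train/inference gap that causes exposure bias.
    """
    return [g if rng.random() < p_gold else m
            for g, m in zip(gold_tokens, model_tokens)]

rng = random.Random(0)
gold = ["a", "cat", "on", "a", "mat"]
pred = ["a", "dog", "in", "the", "bed"]
print(scheduled_sampling_inputs(gold, pred, p_gold=1.0, rng=rng))  # all gold tokens
print(scheduled_sampling_inputs(gold, pred, p_gold=0.0, rng=rng))  # all predicted tokens
```

In practice `p_gold` starts near 1 (pure teacher forcing) and is annealed toward 0 on a fixed schedule as training progresses.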
{"title":"Overview of human-facial-related age syntheis based generative adversarial network methods","authors":"Wang Yibo, Zhang Ke, Kong Yinghui, Yu Tingting, Zhao Shiwei","doi":"10.11834/jig.220842","DOIUrl":"https://doi.org/10.11834/jig.220842","url":null,"abstract":"年龄信息作为人类生物特征识别的重要组成部分,在社会保障和数字娱乐等领域具有广泛的应用前景。人脸年龄合成技术由于其广泛的应用价值,受到了越来越多学者的重视,已经成为计算机视觉领域的重要研究方向之一。随着深度学习的快速发展,基于生成对抗网络的人脸年龄合成技术已成为研究热点。尽管基于生成对抗网络的人脸年龄合成方法取得了不错的成果,但生成的人脸年龄图像仍存在图像质量较差、真实感较低、年龄转换效果和多样性不足等问题。主要因为当前人脸年龄合成研究仍存在以下困难: 1)现有人脸年龄合成数据集的限制; 2)引入人脸年龄合成的先验知识不足; 3)人脸年龄图像的细粒度性被忽视; 4)高分辨率下的人脸年龄合成问题;5)目前人脸年龄合成方法的评价标准不规范。本文对目前人脸年龄合成技术进行全面综述,以人脸年龄合成方法为研究对象,阐述其研究现状。通过调研文献,对人脸年龄合成方法进行分类,重点介绍了基于生成对抗网络的人脸年龄合成方法。此外,本文还讨论了常用的人脸年龄合成数据集及评价指标,分析了各种人脸年龄合成方法的基本思想、特点及其局限性,对比了部分代表方法的性能,指出了该领域目前存在的挑战并提供了一些具有潜力的研究方向,为研究者们解决存在的问题提供便利。;Human-biometric age information has been widely used for such domains like public security and digital entertainment. Such of human-facial-related age synthesis methods are mainly divided into traditional image processing methods and machine learning-based methods. Traditional image processing methods are divided into physics-based methods and prototype-based methods. Machine learning based method is focused on the model-based method,which can be divided into parametric linear model method,deep generative model method based on the time frame and generative adversarial network(GAN)-based method. The physics-based methods are focused on intuitive facial features only,for which some subtle changes are inevitably ignored,resulting in the irrationality of synthetic images. In addition,it requires a large number of facial samples for the same person at several of ages,which is costly and labor-intensive to be collected. The aging patterns generated by the prototype-based method are obtained by faces-related averaging value,and some important personalized features may be averaged,resulting in the loss of personal identity. 
Severe ghosting artifacts will be appeared in their synthetic images while some dictionary-based learning methods are used to preserve personalized features to some extent. Its related parametric linear model method and the deep generative model method based on the time frame are still challenged to find a general model suitable for a specific age group,and its following model established is still linear,so the quality of its synthetic image is deficient as well. The emerging GAN-based method can be used to train models using deep convolution network. Aging patterns-related age groups is learnt in terms of the generative adversarial learning mechanism,different types of loss functions are introduced for various problems appearing in the image,and the minimum value of the perceptual loss of the original image is sorted out. Aging mode can be realized in the input face image,and identity information can be preserved simultaneously. Recent GAN framework is derived of a series of variant models and has been optimizing consistently. GAN-based age synthesis methods can be segmented into four sorts of categories:GAN-classical,GANsequential,GAN-translational and GAN-conditional. For classical GAN method,it can be used to simulate face aging. However,the input information is not fully considered,which affects the identity retention","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135103447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
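The adversarial objective that these GAN-based face-aging methods build on can be written down concretely. This is the standard non-saturating GAN loss in plain Python, shown only to make the "different types of loss functions" remark tangible; the discriminator outputs `d_real`/`d_fake` are hypothetical probabilities, not values from any cited model.

```python
import math

def discriminator_loss(d_real, d_fake):
    # Binary cross-entropy form: D is rewarded for scoring a real face
    # near 1 and a synthesized (aged) face near 0.
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    # Non-saturating generator loss: G is rewarded when D scores its
    # synthesized face near 1 (i.e., D is fooled).
    return -math.log(d_fake)
```

Conditional variants add the target age group as an input to both networks, and identity-preserving variants add a perceptual loss term on top of this adversarial pair.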
{"title":"Recent advances in drone-view object detection","authors":"Leng Jiaxu, Mo Mengjingcheng, Zhou Yinghua, Ye Yongming, Gao Chenqiang, Gao Xinbo","doi":"10.11834/jig.220836","DOIUrl":"https://doi.org/10.11834/jig.220836","url":null,"abstract":"在人工智能技术的支持下,无人机初步获得智能感知能力,在实际应用中展现出高效灵活的数据收集能力。无人机视角下的目标检测作为关键核心技术,在诸多领域中发挥着不可替代的作用,具有重要的研究意义。为了进一步展现无人机视角下的目标检测研究进展,本文对无人机视角下的目标检测算法进行了全面总结,并对已有算法进行了归类、分析和比较。1)介绍无人机视角下的目标检测概念,并总结无人机视角下目标检测所面临的目标尺度、空间分布、样本数量、类别语义以及优化目标等 5 大不均衡挑战。在介绍现有研究方法的基础上,特别整理并介绍了无人机视角下目标检测算法在交通监控、电力巡检、作物分析和灾害救援等实际场景中的应用。2)重点阐述从数据增强策略、多尺度特征融合、区域聚焦策略、多任务学习以及模型轻量化等方面提升无人机视角下目标检测性能的方法,总结这些方法的优缺点并分析了其与现存挑战之间的关联性。3)全面介绍基于无人机视角的目标检测数据集,并呈现已有算法在两个较常用公共数据集上的性能评估。4)对无人机视角下目标检测技术的未来发展方向进行了展望。;Given the support of artificial intelligence technology, drones have initially acquired intelligent sensing capabilities and have demonstrated efficient and flexible data collection in practical applications.Drone-view object detection, which aims to locate specific objects in aerial images, plays an irreplaceable role in many fields and has important research significance.For example, drones with highly mobile and flexible deployment have remarkable advantages in accident handling, order management, traffic guidance, and flow detection, making them irreplaceable in traffic monitoring.As for disaster emergency rescue, drones with aerial vision and high mobility can achieve efficient search and safe rescue in large areas, locate people quickly and accurately in distress, and help rescuers control the situation, thereby ensuring the safety of people in distress.This study provides a comprehensive summary of the challenges in object detection based on the unmanned aerial vehicle(UAV)perspective to portray further the development of drone-view object detection.The existing algorithms and related datasets are also introduced.First, this study briefly introduces the concept of object detection in drone view and summarizes the five imbalance challenges in object detection in 
drone view, such as scale imbalance, spatial imbalance, class imbalance, semantic imbalance, and objective imbalance.This study analyzes and summarizes the challenges of drone-view object detection based on the aforementioned imbalances by using quantitative data analysis and visual qualitative analysis.1)Object scale imbalance is the most focused challenge in current research.It comes from the unique aerial view of drones.The changes in the drone's height and angle bring drastic changes to the object scale in the acquired images.The distance of the lens from the photographed object under the drone view is often far.This scenario results in numerous small objects in the image and makes capturing useful features for object detection difficult for the existing detectors.2)Different regions of drone-view images have great differences, and most objects are concentrated in the minor area of images, i.e., the spatial distribution of objects is enormously uneven.On the one hand, the clustering of dense objects in small areas generates occlusion.The detection model needs to devote considerable attention to this occlusion to distinguish different objects effectively.On the other hand, treating equally different areas wastes many computationa","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135599823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
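The spatial-imbalance point above (objects clustered in a small area of the frame) motivates the region-focus strategies the survey covers: find the densest cell of a coarse grid and re-detect there at higher resolution. A minimal sketch, assuming object-center coordinates from a first detection pass are already available (`densest_cell` is a hypothetical helper, not from any cited detector):

```python
def densest_cell(centers, img_w, img_h, grid=2):
    # Count object centers per cell of a grid x grid partition of the image;
    # the densest cell is a candidate region to crop and re-detect at
    # higher resolution, instead of spending compute uniformly everywhere.
    counts = {}
    for (x, y) in centers:
        cell = (min(int(x * grid / img_w), grid - 1),
                min(int(y * grid / img_h), grid - 1))
        counts[cell] = counts.get(cell, 0) + 1
    return max(counts, key=counts.get)
```

This is the simplest possible instance of "treating different areas unequally"; published methods learn the focus regions rather than using a fixed grid.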
{"title":"Object detection techniques based on deep learning for aerial remote sensing images:a survey","authors":"Shi Zhenghao, Wu Chenwei, Li Chengjian, You Zhenzhen, Wang Quan, Ma Chengcheng","doi":"10.11834/jig.221085","DOIUrl":"https://doi.org/10.11834/jig.221085","url":null,"abstract":"航空遥感图像目标检测旨在定位和识别遥感图像中感兴趣的目标,是航空遥感图像智能解译的关键技术,在情报侦察、灾害救援和资源勘探等领域具有重要应用价值。然而由于航空遥感图像具有尺寸大、目标小且密集、目标呈任意角度分布、目标易被遮挡、目标类别不均衡以及背景复杂等诸多特点,航空遥感图像目标检测目前仍然是极具挑战的任务。基于深度卷积神经网络的航空遥感图像目标检测方法因具有精度高、处理速度快等优点,受到了越来越多的关注。为推进基于深度学习的航空遥感图像目标检测技术的发展,本文对当前主流遥感图像目标检测方法,特别是 2020-2022 年提出的检测方法,进行了系统梳理和总结。首先梳理了基于深度学习目标检测方法的研究发展演化过程,然后对基于卷积神经网络和基于 Transformer 目标检测方法中的代表性算法进行分析总结,再后针对不同遥感图象应用场景的改进方法思路进行归纳,分析了典型算法的思路和特点,介绍了现有的公开航空遥感图像目标检测数据集,给出了典型算法的实验比较结果,最后给出现阶段航空遥感图像目标检测研究中所存在的问题,并对未来研究及发展趋势进行了展望。;Given the successful development of aerospace technology, high-resolution remote-sensing images have been used in daily research.The earlier low-resolution images limit researchers'interpretation of image information.In comparison, today's high-resolution remote sensing images contain rich geographic and entity detail features.They are also rich in spatial structure and semantic information.Thus, they can greatly promote the development of research in this field.Aerial remote sensing image object detection aims to provide the category and location of the target of interest in aerial remote sensing images and present evidence for further information interpretation reasoning.This technology is crucial for aerial remote sensing image interpretation and has important applications in intelligence reconnaissance, target surveillance, and disaster rescue.The early remote sensing image object detection task mainly relies on manual interpretation.The interpretation results are greatly affected by subjective factors, such as the experience and energy of the interpreters.Moreover, the timeliness is low.Various remote sensing image object detection methods based on machine learning technology have been proposed with the 
progress and development of machine learning technology.Traditional machine learning-based object detection techniques generally use manually designed models to extract feature information, such as feature spectrum, gray value, texture, and shape of remote sensing images, after generating sliding windows.Then, they feed the extracted feature information into classifiers, such as support vector machine(SVM)and adaptive boosting(AdaBoost), to achieve object detection in remote sensing images.These methods design the corresponding feature extraction models for specific targets with strong interpretability but weak feature expression capability, poor generalization, time-consuming computation, and low accuracy.These features make meeting the needs of accurate and efficient object detection tasks challenging in complex and variable application scenarios.In recent years, the research on the application of deep learning in remote sensing image processing has received considerable attention and become a hotspot because of the wide application of deep learning techniques, such as deep convolutional neural networks and generative adversarial neural networks, in the fields of natural image object detection, classification, and recognition, and the excellent performance in the task of","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135600115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
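The traditional pipeline summarized above (sliding windows → hand-crafted feature → classifier) can be sketched as follows. `mean_intensity` stands in for a real descriptor such as HOG, and the threshold classifier stands in for a trained SVM or AdaBoost model; both are hypothetical simplifications.

```python
def sliding_windows(width, height, win, stride):
    # Enumerate top-left corners of fixed-size windows over the image.
    return [(x, y) for y in range(0, height - win + 1, stride)
                   for x in range(0, width - win + 1, stride)]

def mean_intensity(patch):
    # Stand-in for a hand-crafted descriptor (gray value / texture / HOG).
    n = len(patch) * len(patch[0])
    return sum(sum(row) for row in patch) / n

def detect(image, win, stride, extract_feature, classify):
    # Classic pipeline: per-window feature extraction -> classifier decision.
    hits = []
    for (x, y) in sliding_windows(len(image[0]), len(image), win, stride):
        patch = [row[x:x + win] for row in image[y:y + win]]
        if classify(extract_feature(patch)):
            hits.append((x, y))
    return hits
```

The weaknesses the abstract lists follow directly from this structure: every window is scored independently (time-consuming), and the fixed descriptor limits feature expression and generalization.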
{"title":"A review of disentangled representation learning for visual data processing and analysis","authors":"Yating Li, Xiao Jing, Liao Liang, Wang Zheng, Che Wenyi, Wang Mi","doi":"10.11834/jig.211261","DOIUrl":"https://doi.org/10.11834/jig.211261","url":null,"abstract":"Representation learning is essential to modern machine learning. Benefiting from the shift of input representations from hand-crafted features to learned representations of multimedia data, algorithm performance has improved steadily. However, the representations of visual data are often highly entangled: because all information components are encoded into the same feature space, they are difficult to interpret. Disentangled representation learning (DRL) aims to learn a low-dimensional, interpretable abstract representation that separates the multiple factors of variation in high-dimensional observations. In a disentangled representation, the information of a single factor of variation can be captured and manipulated through its corresponding latent subspace, which makes it more","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86018652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
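The latent-subspace manipulation this record describes (varying a single factor of variation while freezing the others) is usually demonstrated as a latent traversal. A minimal sketch under the simplifying assumption that each factor maps to one latent coordinate (`traverse` is a hypothetical helper; a real DRL model would pair it with a decoder that renders each modified code into an image):

```python
def traverse(latent, dim, values):
    # Sweep one latent coordinate over `values` while freezing the rest.
    # In a well-disentangled code, decoding these variants should change
    # exactly one factor of variation (e.g., pose) in the output.
    out = []
    for v in values:
        z = list(latent)  # copy so the original code is untouched
        z[dim] = v
        out.append(z)
    return out
```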