
Latest articles from 中国图象图形学报 (Journal of Image and Graphics)

Infrared target tracking algorithm based on attention mechanism enhancement and target model update
Q3 Computer Science Pub Date : 2023-01-01 DOI: 10.11834/jig.220459
Ji Qingbo, Chen Kuicheng, Hou Changbo, Li Ziqi, Qi Yufei
Objective: Most target tracking algorithms are designed for visible-light scenes. However, infrared target tracking has advantages that visible light does not: infrared equipment images the radiation of the object itself, requires no additional illumination, can display targets in weak-light or dark scenes, and has a certain penetration ability. Infrared images nevertheless suffer from unclear boundaries between target and background, blur, and cluttered backgrounds, and some infrared dataset images are rough, which to some extent hampers the training of data-driven deep learning algorithms. Infrared tracking algorithms divide into traditional methods, generally built around correlation filtering, and deep learning methods, which either feed network-extracted target features to correlation filters or compute image-region similarity within a Siamese network framework. Traditional methods extract infrared target features far less effectively than deep learning methods, and filters trained online cannot adapt to fast-moving or blurred targets, giving poor accuracy in scenes with complex backgrounds. At present, most deep-learning-based infrared trackers still make little use of target detail information in low-contrast, noisy infrared scenes, and most cannot update the tracked target effectively when similar targets appear against a cluttered background, which degrades robustness in long-term tracking. To solve these problems, an infrared target tracking algorithm based on attention and adaptive target-model updating is proposed.
Method: The Siamese tracking framework takes the target in the first frame as the template and computes similarity over the search area of subsequent frames, locating the target at the maximum response; it is structurally simple and efficient in tracking. Building on an anchor-free algorithm, a fast attention enhancement module designed for infrared tracking scenes processes the infrared image in parallel, increasing the contrast between target and background and enhancing target detail without losing the original information; the extracted features are then fused into the middle layers of the backbone. Finally, an adaptive target-model update network learns the trend of the infrared target's feature changes and dynamically updates its mid- and high-level features.
Result: The method is compared with other state-of-the-art algorithms on four infrared target tracking benchmarks. On LSOTB-TIR (large-scale thermal infrared object tracking benchmark) it reaches 79.0% precision, 71.5% normalized precision, and a 66.2% success rate, exceeding the second-best method by 4.0% and 4.6% in precision and success rate. On PTB-TIR (thermal infrared pedestrian tracking benchmark) it reaches 85.1% precision and a 66.9% success rate, 1.3% and 3.6% above the second-best. On VOT-TIR2015 and VOT-TIR2017 (thermal infrared visual object tracking), the expected average overlap and accuracy are 0.344/0.73 and 0.276/0.71 respectively, and the method ranks first on the first three datasets. Ablation experiments on LSOTB-TIR show a clear gain over the baseline tracker, and qualitative analysis on LSOTB-TIR shows strong robustness under background clutter, fast motion, intensity change, scale change, occlusion, out-of-view, deformation, low resolution, and motion blur; both the fast attention enhancement module and the adaptive target update network contribute positively to the improved success rate.
Conclusion: The algorithm strengthens the backbone's ability to capture infrared target features and adaptively adjusts the target's feature state from its history of changes, alleviating the susceptibility of infrared target tracking to interference in complex environments and improving the precision and success rate of long-term infrared tracking.
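The Siamese matching step described in the Method section (first-frame template, similarity map over the search area, target at the maximum response) can be illustrated with a toy, pure-NumPy sketch. This is only a plain sliding-window correlation on raw pixels, not the paper's learned deep features or attention modules; all names and values here are illustrative.

```python
import numpy as np

def siamese_response(template, search):
    """Slide the template over the search region and return the raw
    cross-correlation response map (the Siamese matching step).
    Note: no normalization -- a deliberate simplification."""
    th, tw = template.shape
    resp = np.zeros((search.shape[0] - th + 1, search.shape[1] - tw + 1))
    for y in range(resp.shape[0]):
        for x in range(resp.shape[1]):
            resp[y, x] = np.sum(template * search[y:y + th, x:x + tw])
    return resp

# Toy scene: a bright 2x2 "target" embedded in an 8x8 search region
search = np.zeros((8, 8))
search[5:7, 3:5] = 1.0            # ground-truth target at top-left (5, 3)
template = np.ones((2, 2))        # template cropped from the "first frame"

resp = siamese_response(template, search)
y, x = np.unravel_index(np.argmax(resp), resp.shape)
print(y, x)  # top-left corner of the maximum-response location
```

In the real tracker the same correlation is performed between deep feature maps rather than pixels, which is what makes the attention-enhanced features matter.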
Citations: 0
Interferometric phase denoising combining global context and fused attention
Q3 Computer Science Pub Date : 2023-01-01 DOI: 10.11834/jig.220562
Zeng Qingwang, Dong Zhangyu, Yang Xuezhi, Chong Fating
Objective: Interferometric phase denoising is a key step in interferometric synthetic aperture radar (InSAR), and its quality strongly affects measurement accuracy. Phase noise is introduced by three types of inherent factors: 1) system noise, such as thermal noise and SAR speckle noise; 2) decoherence problems, including baseline, temporal, and spatial decoherence; and 3) signal-processing errors, such as misregistration. Noise makes phase unwrapping harder or even causes it to fail, seriously degrading the final interferometric result. Existing denoising algorithms still have several defects. First, they capture global context insufficiently: some ignore it entirely or use only local context derived from a few pixels, which manifests as unstable detail preservation in the denoised results. Second, many researchers consider only the spatial dimension or only the channel dimension of the image when improving denoising networks, never both in combination. Third, the high-level features from the deep layers of a convolutional network carry rich semantics but ambiguous spatial detail, while the low-level features from shallow layers carry considerable pixel-level noise information; these features remain isolated from one another and cannot be fully exploited.
Method: To address these problems while balancing denoising against structure preservation, a phase denoising network combining global context and fused attention, GCFA-PDNet (global context and fused attention phase denoising network), is proposed. The interferometric phase is separated into real and imaginary parts, which are fed to the network in turn. Shallow features are first extracted from the noisy phase, then mapped through a feature enhancement module composed of a global context extraction module and a fused attention module, and the denoised image is finally generated by global residual learning. The global context extraction module captures global context information, giving the advantages of non-local methods; the fused attention module both emphasizes key features and efficiently extracts noise hidden in complex backgrounds.
Result: Compared with the best-performing of the contrasted methods, the proposed method improves the average peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) on simulated data by 5.72% and 2.94%, and improves the average percentage of residual point reduction (PRR) and phase standard deviation (PSD) on real data by 2.01% and 3.57%. Combining qualitative and quantitative analysis, it outperforms five other types of phase denoising methods.
Conclusion: The proposed network has stronger feature extraction ability than the alternatives, and by attending to global context and emphasizing key features it enhances denoising while preserving the original phase details.
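The first step of the method above, separating the wrapped phase into real and imaginary channels before filtering, is what lets any smoothing operator work across the ±π wrap. A minimal NumPy sketch, with a 3×3 mean filter standing in for the network (an assumption purely for illustration, not GCFA-PDNet's architecture):

```python
import numpy as np

def denoise_phase(phase, k=3):
    """Split the wrapped phase into cos/sin (real/imaginary) channels,
    smooth each with a k x k mean filter, and recombine with arctan2.
    Filtering the complex components avoids the bias a direct average
    of wrapped values would suffer near the +/-pi boundary."""
    re, im = np.cos(phase), np.sin(phase)

    def mean_filter(a):
        p = k // 2
        padded = np.pad(a, p, mode="edge")
        out = np.empty_like(a)
        for y in range(a.shape[0]):
            for x in range(a.shape[1]):
                out[y, x] = padded[y:y + k, x:x + k].mean()
        return out

    return np.arctan2(mean_filter(im), mean_filter(re))

def wrap_err(a, ref):
    """Mean absolute phase error, measured on the wrapped difference."""
    return np.abs(np.angle(np.exp(1j * (a - ref)))).mean()

rng = np.random.default_rng(0)
clean = np.full((8, 8), 3.0)                       # phase near +pi: wraps matter
noisy = np.angle(np.exp(1j * (clean + 0.5 * rng.normal(size=(8, 8)))))
den = denoise_phase(noisy)
print(wrap_err(noisy, clean), wrap_err(den, clean))  # error should drop
```

The clean phase is deliberately placed near +π so that noise pushes samples across the wrap; averaging the raw wrapped values there would pull estimates toward 0, while the complex-domain average stays near 3.0.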
Citations: 0
The development, application, and future of LLMs similar to ChatGPT
Q3 Computer Science Pub Date : 2023-01-01 DOI: 10.11834/jig.230536
Yan Hao, Liu Yuliang, Jin Lianwen, Bai Xiang
Generative artificial intelligence (AI) technology has kept breaking through bottlenecks since the release of ChatGPT, attracting large-scale capital investment, driving change across many fields, and drawing close government attention: the US government keeps a relatively relaxed stance to stay ahead in the global technology race, European countries are more conservative and concerned about data privacy in large language models (LLMs), and the Chinese government attaches great importance to AI and LLMs while also emphasizing regulation. This paper first analyzes the development, applications, and prospects of large models. Current large-model products already show strong comprehension and creative ability, with broad application prospects in domains such as education, medicine, finance, law, programming, and paper writing; such models are usually fine-tuned from general LLMs to endow them with additional domain-specific knowledge and adaptability, and LLMs such as GPT-4 have improved rapidly in recent months in professional knowledge, reasoning, coding, credibility, security, transferability, and multimodality. The paper then briefly introduces large-model technology from three aspects: 1) construction techniques, covering the construction pipeline (training of base models, which store large amounts of linguistic knowledge, and assistant models, which gain stronger comprehension and generation ability through a series of fine-tuning), the state of research, including a series of public LLMs built on LLaMA such as Alpaca, Vicuna, Koala, and Baize, and optimization techniques for building LLMs with low memory and computation requirements, such as low-rank adaptation, Self-Instruct, and automatic prompt engineering; 2) three types of current mainstream image-text multimodal techniques for large models, such as training additional adaptation layers; and 3) three types of large-model evaluation benchmarks, divided by evaluation approach. Parameter optimization and dataset construction are the core issues for the popularization and technical iteration of large-model products; multimodal capability is one of their important development directions; and establishing evaluation benchmarks is the key method for comparing and constraining large models. The paper also discusses the challenges facing existing techniques and possible future directions: current large models still suffer from difficult training and deployment, insufficient professional knowledge, and safety risks, so improving parameter optimization, high-quality dataset construction, and multimodal techniques, and establishing unified, comprehensive, and convenient evaluation benchmarks, will be key to breaking through their present limitations.
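Low-rank adaptation (LoRA), mentioned above as a key technique for memory-efficient LLM fine-tuning, can be sketched in a few lines of NumPy. The dimensions and initialization here are illustrative, not any particular model's configuration: the frozen weight W is adapted as W + BA, where A and B are small rank-r matrices, so only r·(d_in + d_out) values are trained instead of d_in·d_out.

```python
import numpy as np

d_in, d_out, r = 64, 64, 4
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, zero-init

def forward(x, scale=1.0):
    # Because B is zero-initialized, B @ A starts at zero: fine-tuning
    # begins exactly at the pretrained model, and only the low-rank
    # update (A, B) receives gradients during training.
    return W @ x + scale * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(forward(x), W @ x)    # identical before any training
print("trainable:", A.size + B.size, "vs full:", W.size)
```

With d_in = d_out = 64 and r = 4, the adapter trains 512 parameters against 4 096 for the full matrix, which is why LoRA-style methods make fine-tuning feasible on modest hardware.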
Citations: 1
Key sub-region feature fusion network for fine-grained ship detection and recognition in remote sensing images
Q3 Computer Science Pub Date : 2023-01-01 DOI: 10.11834/jig.220671
Zhang Lei, Chen Wen, Wang Yuehuan
Objective: Fine-grained detection and recognition of ship targets in remote sensing images has high practical value in applications such as harbor surveillance and intelligence gathering: beyond coarse categories such as warship and merchant ship, specific types such as Arleigh Burke-class destroyer, Nimitz-class aircraft carrier, container ship, and car carrier must be identified. However, different types of ship targets share similar overall color, shape, and texture; their structures differ while their uses are similar, and the coating of military ships is monotonous, so whole-target features lack discriminative power and fine-grained recognition is difficult. Existing ship detectors are designed to focus on localization, with relatively simple classification branches that use only whole-target features, so their performance drops significantly on fine-grained labeled datasets. Existing ship classification methods mainly classify targets on pre-cropped image patches, separate from the detection process, which is unsatisfactory in practice for two reasons: 1) the entire backbone must be executed on every proposal to extract features, and remote sensing images of harbors usually contain several ships, so the computation cost rises sharply; 2) the detection and classification networks are optimized separately, and the whole process cannot reach a jointly optimal solution because the proposals produced by the detector differ from the pre-cropped patches. To solve this, an end-to-end fine-grained ship detection and recognition method based on key sub-region features is proposed, which exploits prior knowledge of ships in a key sub-region feature fusion network (KSFFN).
Method: To obtain features better suited to fine-grained recognition, a multi-level feature fusion recognition network extracts features from the candidate target regions produced by the detection network at two levels: the whole target and local sub-regions. The discriminative saliency of each sub-region is then computed from the information of all sub-regions of the candidate, mining the key sub-regions that contain discriminative components. Finally, the sub-region features are adaptively fused with the whole-target feature according to discriminative saliency, forming a more expressive representation for fine-grained recognition of ship targets. Detection and recognition are designed end to end, and feature extraction for all candidates requires only one pass through the backbone, improving computational efficiency.
Result: On the L3 task of the public fine-grained HRSC2016 (high resolution ship collection) dataset, the method reaches 77.3% mean accuracy, 6.3% higher than without the multi-level feature fusion recognition network; on the self-built FGSAID (fine-grained ships in aerial images dataset) with 45 ship classes, it reaches 71.5% mean accuracy.
Conclusion: The method effectively mines and fuses the features of sub-regions containing discriminative components, resolving the difficulty in fine-grained recognition caused by the limited discriminative power of whole-target features, and clearly improves accuracy over existing ship detection and recognition algorithms for remote sensing images.
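The saliency-weighted fusion step described above can be sketched with a toy NumPy example. The "saliency" here is a hand-made stand-in (distance of each sub-region feature from the mean of all sub-regions, softmax-normalized), not the paper's learned discriminative saliency; feature sizes are arbitrary.

```python
import numpy as np

def fuse_features(global_feat, sub_feats):
    """Weight each sub-region feature by a softmax 'saliency' score and
    concatenate the weighted sum with the whole-target feature."""
    # Saliency proxy: sub-regions that deviate most from the average of
    # all sub-regions are treated as the discriminative ones.
    mean = sub_feats.mean(axis=0)
    scores = np.linalg.norm(sub_feats - mean, axis=1)
    w = np.exp(scores) / np.exp(scores).sum()        # softmax weights
    fused_sub = (w[:, None] * sub_feats).sum(axis=0) # weighted fusion
    return np.concatenate([global_feat, fused_sub])

g = np.ones(8)                                       # whole-target feature
subs = np.stack([np.zeros(8),                        # two bland sub-regions
                 np.zeros(8),
                 np.full(8, 5.0)])                   # one distinctive sub-region
f = fuse_features(g, subs)
print(f.shape)  # concatenated representation
```

Because the softmax concentrates weight on the distinctive sub-region, the fused half of the output stays close to that sub-region's feature instead of being diluted by the bland ones, which mirrors the motivation for mining key sub-regions.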
目的 遥感图像中的舰船目标细粒度检测与识别在港口海域监视以及情报搜集等应用中有很高的实际应用价值,但遥感图像中不同种类的舰船目标整体颜色、形状与纹理特征相近,分辨力不足,导致舰船细粒度识别困难。针对该问题,提出了一种端到端的基于关键子区域特征的舰船细粒度检测与识别方法。方法 为了获得更适于目标细粒度识别的特征,提出多层次特征融合识别网络,按照整体、局部子区域两个层次从检测网络得到的候选目标区域中提取特征。然后结合候选目标中所有子区域的信息计算每个子区域的判别性显著度,对含有判别性组件的关键子区域进行挖掘。最后基于判别性显著度将子区域特征与整体特征进行自适应融合,形成表征能力更强的特征,对舰船目标进行细粒度识别。整个检测与识别网络采用端到端一体化设计,所有候选目标特征提取过程只需要经过一次骨干网络的计算,提高了计算效率。结果 在公开的带有细粒度类别标签的 HRSC2016(high resolu-tion ship collection)数据集 L3 任务上,本文方法平均准确率为 77.3%,相较于不采用多层次特征融合识别网络提升了 6.3%;在自建的包含 45 类舰船目标的 FGSAID(fine-grained ships in aerial images dataset)数据集上,本文方法平均准确率为 71.5%。结论 本文方法有效挖掘并融合了含有判别性组件的子区域的特征,解决了目标整体特征分辨力不足导致的细粒度目标识别困难问题,相较于现有的遥感图像舰船目标检测与识别算法准确性有明显提升。;Objective The ocean has great economic and military value.The development of human society increases the impact of ocean activities on the development of a country.The sea is an important carrier of marine activities.Thus, the recognition and monitoring of ship targets in key sea areas through remote sensing images are crucial to the national defense and development of the economy.Fine-grained ship detection and recognition in high-resolution remote sensing images refer to the identification of specific types of ships based on ship detection.A precise and detailed classification is valuable in practical application fields, such as sea surveillance and intelligence gathering.Instead of coarse-grained classification categories, such as warcraft and merchant ships, specific ship types, such as Arleigh Burke-class destroyer, Nimitz-class aircraft carrier, container, and car carrier, are necessary.However, the overall color, shape, and texture of different types of ship targets are similar.The structures of ships belong to different types, but their uses are similar.Moreover, the coating color of military ships is monotonous.These characteristics complicate the classification of these targets.The existing ship detectors are designed to focus on locating targets.The design of the classification branch of these detectors is 
relatively simple. They only use the features of whole targets for classification, which significantly degrades performance on fine-grained labeled datasets. The existing ship classification methods, which mainly classify targets on pre-cropped image patches, are separated from the detection process. This approach is unsatisfactory for practical applications for two reasons: 1) the whole backbone of these neural-network-based methods must be executed on every proposal to extract features, and remote sensing images of harbors usually contain several ships, so the computation cost increases sharply; 2) the detection and classification networks are optimized separately, so even when the parameters of both networks are individually tuned to their best, the whole pipeline cannot reach the optimal solution, because the locations of proposals obtained by the detection method differ from the pre-cropped image patches. We utilize prior knowledge of ships and propose the key sub-region feature fusion network (KSFFN).
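The adaptive fusion step described in this abstract — scoring each sub-region's discriminative saliency from the information of all sub-regions, then merging the weighted sub-region features with the whole-target feature — could be sketched roughly as follows. This is a simplified stand-in, not the paper's KSFFN: the saliency score, the softmax weighting, and the `fuse_key_subregion_features` name are all illustrative assumptions.

```python
import numpy as np

def fuse_key_subregion_features(global_feat, subregion_feats):
    """Hypothetical sketch of saliency-weighted sub-region fusion.

    Each sub-region is scored by how far it deviates from the mean of all
    sub-regions (a crude discriminative-saliency proxy); the scores are
    softmax-normalized and used both to pool the sub-region features and to
    balance them against the whole-target feature.
    """
    subregion_feats = np.asarray(subregion_feats)        # shape (n, d)
    context = subregion_feats.mean(axis=0)               # info from all sub-regions
    # Discriminative saliency: deviation of each sub-region from the context.
    saliency = np.linalg.norm(subregion_feats - context, axis=1)
    weights = np.exp(saliency) / np.exp(saliency).sum()  # softmax over sub-regions
    pooled = (weights[:, None] * subregion_feats).sum(axis=0)
    alpha = weights.max()                                # trust in the key sub-region
    return alpha * pooled + (1.0 - alpha) * np.asarray(global_feat)
```

Note that in the paper the fusion is learned end-to-end inside the detection network; this sketch only mirrors the data flow, not the training.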
Citations: 0
Short-term memory and CenterTrack based vehicle-related multi-target tracking method 基于短时记忆和CenterTrack的车辆相关多目标跟踪方法
Q3 Computer Science Pub Date : 2023-01-01 DOI: 10.11834/jig.220026
Zhang Yao, Lu Huanzhang, Wang Jue, Zhang Luping, Hu Moufa
目的 车辆多目标跟踪是智能交通领域关键技术,其性能对车辆轨迹分析和异常行为鉴别有显著影响。然而,车辆多目标跟踪常受外部光照、道路环境因素影响,车辆远近尺度变化以及相互遮挡等干扰,导致远处车辆漏检或车辆身份切换(ID switch,IDs)问题。本文提出短时记忆与CenterTrack的车辆多目标跟踪,提升车辆多目标跟踪准确度(multiple object tracking accuracy,MOTA),改善算法的适应性。方法 利用小样本扩增增加远处小目标车辆训练样本数;通过增加的样本重新训练CenterTrack确定车辆位置及车辆在相邻帧之间的中心位移量;当待关联轨迹与检测目标匹配失败时通过轨迹运动信息预测将来的位置;利用短时记忆将待关联轨迹按丢失时间长短分级与待匹配检测关联以减少跟踪车辆IDs。结果 在交通监控车辆多目标跟踪数据集UA-DETRAC (University at Albany detection and tracking)构建的5个测试序列数据中,本文方法在维持CenterTrack优势的同时,对其表现不佳的场景获得近30%的提升,与YOLOv4-DeepSort(you only look once—simple online and realtime tracking with deep association metric)相比,4种场景均获得近10%的提升,效果显著。Sherbrooke数据集的测试结果,本文方法同样获得了性能提升。结论 本文扩增了远处小目标车辆训练样本,缓解了远处小目标与近处大目标存在的样本不均衡,提高了算法对远处小目标车辆的检测能力,同时短时记忆维持关联失败的轨迹运动信息并分级匹配检测目标,降低了算法对跟踪车辆的IDs,综合提高了MOTA。;Objective The task of multi-object tracking is often focused on estimating the number,location or other related properties of objects in the scene. Specifically,it is required to be estimated accurately and consistently over a period of time. Vehicle-related multi-target tracking can be as a key technique for such domain like intelligent transportation,and its performance has a significant impact on vehicle trajectory analysis and abnormal behavior identification to some extent. Vehicle-related multi-target tracking is also recognized as a key branch of multi-target tracking and a potential technique for autonomous driving and intelligent traffic surveillance systems. For vehicle-related multi-target tracking,temporal-based motion status of vehicles in traffic scenes can be automatically obtained,which is beneficial to analyze traffic conditions and implement decisions-making quickly for transportation administrations,as well as the automatic driving system. However,to resolve missed detection of distant vehicles or vehicle ID switch(IDs) problems,such factors are often to be dealt with in relevance to external illumination,road environment factors,changes in the scale of the vehicle near and far,and mutual occlusion. 
We develop an integrated short-term memory and CenterTrack method to improve multiple object tracking accuracy (MOTA) for vehicles and to further optimize the adaptability of the algorithm. Method: Analysis of a large amount of traffic monitoring video data reveals the reasons for the imbalance in the training samples. On the one hand, because captured vehicles move quickly, an identified distant small target vehicle is only preserved briefly and lacks consistent frames. On the other hand, a small target vehicle itself carries little apparent feature information, and the feature information extracted by the neural network often vanishes quickly. The relative number of distant small targets in the field of view is small, and after downsampling for training their features fade rapidly, resulting in an extensive reduction in the number of effective samples.
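The short-term memory association described above — predicting a lost track's position from its motion information and matching lost tracks to detections graded by how recently they were lost — might look roughly like this greedy sketch. The dictionary track format, the `max_dist` gate, and the linear motion model are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def associate(lost_tracks, detections, max_dist=50.0):
    """Greedy graded association of lost tracks to detections (sketch).

    Tracks are processed in order of how recently they were lost, so
    recently lost tracks get first pick of the detections. Each track
    predicts its current center by linear extrapolation of its last
    known velocity over the frames it has been lost.
    """
    # Grade by frames since loss (ascending = most recently lost first).
    order = sorted(range(len(lost_tracks)), key=lambda i: lost_tracks[i]["lost_for"])
    free = set(range(len(detections)))
    matches = {}
    for i in order:
        t = lost_tracks[i]
        pred = t["pos"] + t["vel"] * t["lost_for"]   # predicted center
        best, best_d = None, max_dist
        for j in free:
            d = float(np.linalg.norm(pred - detections[j]))
            if d < best_d:
                best, best_d = j, d
        if best is not None:                          # gate: within max_dist
            matches[i] = best
            free.discard(best)
    return matches
```

A production tracker would use an optimal assignment (e.g. Hungarian matching) rather than this greedy loop; the sketch only shows the grading-by-lost-time idea.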
Citations: 0
Binocular rivalry-based stereoscopic images quality assessment relevant to its asymmetric and distorted contexts 基于双目竞争的立体图像质量评价及其不对称和扭曲背景
Q3 Computer Science Pub Date : 2023-01-01 DOI: 10.11834/jig.220309
Tang Yiling, Jiang Shunliang, Xu Shaoping, Xiao Jian, Chen Xiaojun
目的 现有方法存在特征提取时间过长、非对称失真图像预测准确性不高的问题,同时少有工作对非对称失真与对称失真立体图像的分类进行研究,为此提出了基于双目竞争的非对称失真立体图像质量评价方法。方法 依据双目竞争的视觉现象,利用非对称失真立体图像两个视点的图像质量衰减程度的不同,生成单目图像特征的融合系数,融合从左右视点图像中提取的灰度空间特征与HSV (hue-saturation-value)彩色空间特征。同时,量化两个视点图像在结构、信息量和质量衰减程度等多方面的差异,获得双目差异特征。并且将双目融合特征与双目差异特征级联为一个描述能力更强的立体图像质量感知特征向量,训练基于支持向量回归的特征—质量映射模型。此外,还利用双目差异特征训练基于支持向量分类模型的对称失真与非对称失真立体图像分类模型。结果 本文提出的质量预测模型在4个数据库上的SROCC (Spearman rank order correlation coefficient)和PLCC (Pearson linear correlation coefficient)均达到0.95以上,在3个非对称失真数据库上的均方根误差(root of mean square error,RMSE)取值均优于对比算法。在LIVE-II(LIVE 3D image quality database phase II)、IVC-I(Waterloo-IVC 3D image qualityassessment database phase I)和IVC-II (Waterloo-IVC 3D image quality assessment database phase II)这3个非对称失真立体图像测试数据库上的失真类型分类测试中,对称失真立体图像的分类准确率分别为89.91%、94.76%和98.97%,非对称失真立体图像的分类准确率分别为95.46%,92.64%和96.22%。结论 本文方法依据双目竞争的视觉现象融合左右视点图像的质量感知特征用于立体图像质量预测,能够提升非对称失真立体图像的评价准确性和鲁棒性。所提取双目差异性特征还能够用于将对称失真与非对称失真立体图像进行有效分类,分类准确性高。;Objective Computer vision-related stereoscopic image quality assessment(SIQA) is focused on recently. It is essential for parameter setting and system optimizing for such domains of multiple stereoscopic image applications like image storage,compression,transmission,and display. Stereoscopic images can be segmented into two sorts of distorted images:symmetrically and asymmetrically distorted,in terms of the degree of degradation between the left and right views. For symmetric-based distorted stereoscopic images,the distortion type and degree occurred in the left and right views are basically in consistency. Early SIQA methods were effective in evaluating symmetrically distorted images by averaging scores or features derived from the two views. However,in practice,the stereoscopic images are often asymmetrically distorted,where the distortion type and level of the two views are different. 
Simply averaging the quality values of the two views cannot accurately simulate the binocular fusion process and the binocular rivalry phenomena of the human visual system. Consequently, the evaluation accuracy of these methods drops severely when the quality of asymmetrically distorted stereoscopic images is estimated. Previous studies have shown that when the left and right views of a stereoscopic image exhibit varying levels or types of distortion, binocular rivalry is primarily driven by one of the views. Specifically, in the process of evaluating the quality of a stereoscopic image, the visual quality of one view has a greater impact on the stereo-pair quality evaluation than the other view. To address this issue, some methods have simulated the binocular rivalry phenomenon of the human visual system and used a weighted average to fuse the visual information of the two views of stereo pairs. However, existing methods still suffer from low prediction accuracy on asymmetrically distorted images, and their feature extraction process is time-consuming. To assess the evaluation accuracy on asymmetric distortion, ten state-of-the-art SIQA metrics were compared. Three commonly used performance criteria were adopted: the Spearman rank-order correlation coefficient (SROCC), the Pearson linear correlation coefficient (PLCC), and the root mean square error (RMSE). Experimental results show that the SROCC and PLCC (higher is better) of the proposed method both exceed 0.95, and its RMSE (lower is better) reaches a lower level. In addition, the proposed classifier was tested on the LIVE-II, IVC-I, and IVC-II databases. On LIVE-II, 95.46% of asymmetrically distorted stereoscopic images are correctly classified. On IVC-I and IVC-II, the classification accuracy for symmetrically distorted images reaches 94.76% and 98.97%, respectively, and that for asymmetrically distorted images reaches 92.64% and 96.22%. Conclusion: The degree of degradation of asymmetrically distorted stereoscopic images can be quantified. Fusing monocular features with image quality degradation coefficients helps build a more descriptive binocular perceptual feature vector and improves the prediction accuracy and robustness for asymmetrically distorted stereoscopic images. The proposed classifier can classify both symmetrically and asymmetrically distorted stereoscopic images.
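The binocular-rivalry fusion idea in this abstract — letting the less degraded view dominate when monocular features are merged, then cascading a binocular difference feature onto the fused one — can be illustrated with a minimal sketch. The scalar degradation coefficients and the `binocular_fuse` helper are hypothetical; the paper derives its fusion coefficients from the measured quality attenuation of each view.

```python
import numpy as np

def binocular_fuse(feat_left, feat_right, deg_left, deg_right, eps=1e-8):
    """Fuse monocular features with binocular-rivalry weights (sketch).

    deg_left / deg_right are assumed scalar degradation coefficients in
    [0, 1] (0 = pristine view); the less degraded view gets the larger
    fusion weight, mimicking binocular rivalry. The fused feature is
    cascaded with a binocular difference feature.
    """
    feat_left = np.asarray(feat_left)
    feat_right = np.asarray(feat_right)
    w_left = (1.0 - deg_left) / ((1.0 - deg_left) + (1.0 - deg_right) + eps)
    w_right = 1.0 - w_left
    fused = w_left * feat_left + w_right * feat_right
    diff = feat_left - feat_right          # binocular difference feature
    return np.concatenate([fused, diff])   # cascaded perceptual feature vector
```

The cascaded vector would then be mapped to a quality score by a regressor such as support vector regression, as the abstract describes.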
Citations: 0
Cross-modal representation learning and generation 跨模态表示学习与生成
Q3 Computer Science Pub Date : 2023-01-01 DOI: 10.11834/jig.230035
Huafeng Liu, Jingjing Chen, Lin Liang, Bingkun Bao, Zechao Li, Jiaying Liu, Liqiang Nie
Nowadays, with the booming of multimedia data, the multi-source and multi-modal character of data has become a challenging problem in multimedia research. Its representation and generation can be seen as two key factors in cross-modal learning research. Cross-modal representation studies feature learning and information integration methods using
Citations: 0
An encoder-decoder based generation model for online handwritten mathematical expressions 基于编码器-解码器的在线手写数学表达式生成模型
Q3 Computer Science Pub Date : 2023-01-01 DOI: 10.11834/jig.220894
Yang Chen, Du Jun, Xue Mobai, Jianshu Zhang
Citations: 0
Multi-agent path planning based on improved double DQN 基于改进双DQN的多智能体路径规划
Q3 Computer Science Pub Date : 2023-01-01 DOI: 10.11834/jig.211239
Zhang Chen, Jiang Wenying, Chen Siyuan, Zhou Wen, Yan Fengting
Citations: 0
Multi-label classification of chest X-ray images with pre-trained vision Transformer model 基于预训练视觉Transformer模型的胸部x线图像多标签分类
Q3 Computer Science Pub Date : 2023-01-01 DOI: 10.11834/jig.220284
Xing Suxia, Ju Zihan, Liu Zijiao, Yu Wang, Fan Fuqiang
Citations: 0