
Latest publications in IET Computer Vision

The following article for this Special Issue was published in a different issue
IF 1.7 | Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-06-17 | DOI: 10.1049/cvi2.12211

Fan Liu, Feifan Li, Sai Yang. Few-shot classification using Gaussianisation prototypical classifier.

IET Computer Vision 2023 February; 17(1); 62–75. https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cvi2.12129

Citations: 0
A monocular image depth estimation method based on weighted fusion and point-wise convolution
IF 1.7 | Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-06-14 | DOI: 10.1049/cvi2.12212
Chen Lei, Liang Zhengyou, Sun Yu

Existing deep-learning-based monocular depth estimation methods struggle to estimate depth near object edges when the depth between neighbouring objects changes abruptly, and their accuracy declines on noisy images. Furthermore, they consume considerable hardware resources because of their huge network parameters. To solve these problems, this paper proposes a depth estimation method based on weighted fusion and point-wise convolution. The authors design a maximum-average adaptive pooling weighted fusion module (MAWF) that fuses global and local features, and a continuous point-wise convolution module that processes the fused features from the MAWF module. The two modules are applied jointly three times to perform weighted fusion and point-wise convolution of multi-scale features from the encoder output, which better decodes the depth information of a scene. Experimental results show that the method achieves state-of-the-art performance on the KITTI dataset, with δ1 up to 0.996 and the root mean square error metric down to 8%, and demonstrates strong generalisation and robustness.
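The abstract describes the architecture only at a high level; below is a minimal PyTorch sketch of what a maximum-average adaptive pooling weighted fusion step followed by continuous point-wise (1×1) convolution could look like. The gating design, channel sizes, and all names are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MAWF(nn.Module):
    """Hypothetical maximum-average adaptive pooling weighted fusion:
    adaptive max/avg pooling summarise global context, and the summary
    gates a per-channel weighted sum of global and local feature maps."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, global_feat, local_feat):
        mx = F.adaptive_max_pool2d(global_feat, 1)     # (B, C, 1, 1)
        av = F.adaptive_avg_pool2d(global_feat, 1)     # (B, C, 1, 1)
        w = self.gate(torch.cat([mx, av], dim=1))      # weights in (0, 1)
        return w * global_feat + (1 - w) * local_feat

class PointwiseBlock(nn.Module):
    """Continuous point-wise (1x1) convolutions over the fused features."""
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, x):
        return self.net(x)

# The paper applies fusion + point-wise convolution three times over
# multi-scale encoder outputs; a single scale is shown here.
fuse, pw = MAWF(64), PointwiseBlock(64)
g, l = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
print(pw(fuse(g, l)).shape)  # torch.Size([1, 64, 32, 32])
```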

Citations: 0
Generalizable and efficient cross-domain person re-identification model using deep metric learning
IF 1.7 | Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-06-13 | DOI: 10.1049/cvi2.12214
Saba Sadat Faghih Imani, Kazim Fouladi-Ghaleh, Hossein Aghababa

Most successful person re-ID models are trained with supervision and need large amounts of training data, and they fail to generalise well to unseen, unlabelled testing sets. The authors aim to learn a generalisable person re-identification model: it uses one labelled source dataset and one unlabelled target dataset during training and generalises well on the target testing set. To this end, after feature extraction by a ResNext-50 network, the authors optimise the model with three loss functions. (a) One loss function learns the features of the target domain by tuning the distances between target images, so the trained model is more robust to intra-domain variations in the target domain and generalises well on the target testing set. (b) A triplet loss considers both source and target domains, making the model learn the inter-domain variations between them as well as the variations within the target domain. (c) A third loss function performs supervised learning on the labelled source domain. Extensive experiments on Market1501 and DukeMTMC re-ID show that the model achieves very competitive performance compared with state-of-the-art models while requiring an acceptable amount of GPU RAM compared to other successful models.
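The three-part objective lends itself to a compact sketch. The PyTorch code below shows one plausible combination of (a) an unsupervised target-domain distance loss, (b) a cross-domain triplet loss, and (c) supervised cross-entropy on the source domain. The exact distance-tuning rule, the triplet sampling policy, and the weighting coefficients are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn.functional as F

def source_ce_loss(source_logits, source_labels):
    """(c) Supervised cross-entropy on the labelled source domain."""
    return F.cross_entropy(source_logits, source_labels)

def triplet_loss(anchor, positive, negative, margin=0.3):
    """(b) Triplet loss over embeddings; anchors, positives, and negatives
    may be drawn from both source and target domains (sampling assumed)."""
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    return F.relu(d_ap - d_an + margin).mean()

def target_distance_loss(target_emb):
    """(a) A hypothetical target-domain loss that tunes pairwise distances
    between unlabelled target images, here pulling each embedding towards
    its nearest neighbour to absorb intra-domain variation."""
    dist = torch.cdist(target_emb, target_emb)
    dist.fill_diagonal_(float('inf'))  # ignore self-distances
    return dist.min(dim=1).values.mean()

def total_loss(src_logits, src_labels, a, p, n, tgt_emb,
               w1=1.0, w2=1.0, w3=0.1):  # weights are assumptions
    return (w1 * source_ce_loss(src_logits, src_labels)
            + w2 * triplet_loss(a, p, n)
            + w3 * target_distance_loss(tgt_emb))
```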

Citations: 0
Erratum: Integration graph attention network and multi-centre constrained loss for cross-modality person re-identification
IF 1.7 | Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-06-01 | DOI: 10.1049/cvi2.12210

The authors wish to bring to the readers' attention the following errors in the article by He, D., et al.: Integration graph attention network and multi-centre constrained loss for cross-modality person re-identification [1].

In the Funding Information section, the grant number for the National Natural Science Foundation of China is incorrectly given as 2022KYCX032Z. It should be 62171321.

Citations: 0
Sketch face recognition based on light semantic Transformer network
IF 1.7 | Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-05-30 | DOI: 10.1049/cvi2.12209
Lin Cao, Jianqiang Yin, Yanan Guo, Kangning Du, Fan Zhang

Sketch face recognition has a wide range of applications in criminal investigation, but it remains challenging because of small-scale samples and the semantic deficiencies caused by cross-modality differences. The authors propose a light semantic Transformer network to extract and model the semantic information of cross-modality images. First, they employ a meta-learning training strategy to obtain task-related training samples, addressing the small-sample problem. Then, to resolve the contradiction between the Transformer's high complexity and the small-sample nature of sketch face recognition, they build the light semantic Transformer network by proposing a hierarchical group linear transformation and introducing parameter sharing, which extracts highly discriminative semantic features on small-scale datasets. Finally, they propose a domain-adaptive focal loss that reduces the cross-modality differences between sketches and photos and improves the training of the light semantic Transformer network. Extensive experiments show that the features extracted by the proposed method are highly discriminative: the method improves the recognition rate by 7.6% on the UoM-SGFSv2 dataset and reaches a recognition rate of 92.59% on the CUFSF dataset.
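As a rough illustration of how group linear transformations plus parameter sharing shrink a Transformer, here is a PyTorch sketch. The grouping sizes, the reuse of a single block across depths, and all module names are assumptions; the paper's hierarchical scheme may differ.

```python
import torch
import torch.nn as nn

class GroupLinear(nn.Module):
    """Hypothetical group linear transformation: the feature vector is split
    into groups, each with its own small linear map. With dim=128 and 8
    groups this needs 8 * 16 * 16 = 2048 weights versus 16384 for a dense
    128x128 layer, which is the parameter saving the paper targets."""
    def __init__(self, dim, groups):
        super().__init__()
        assert dim % groups == 0
        self.groups = groups
        self.weight = nn.Parameter(
            torch.randn(groups, dim // groups, dim // groups) * 0.02)

    def forward(self, x):                       # x: (..., dim)
        shape = x.shape
        x = x.view(*shape[:-1], self.groups, shape[-1] // self.groups)
        x = torch.einsum('...gi,gio->...go', x, self.weight)
        return x.reshape(shape)

class LightBlock(nn.Module):
    """One attention block whose feed-forward path stacks group linears
    hierarchically (coarse groups after fine groups)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(GroupLinear(dim, 8), nn.GELU(),
                                 GroupLinear(dim, 4))
        self.n1, self.n2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):
        h = self.n1(x)
        a, _ = self.attn(h, h, h)
        x = x + a
        return x + self.ffn(self.n2(x))

# Parameter sharing: reuse one block at every depth instead of stacking copies.
block = LightBlock(dim=128)
tokens = torch.randn(2, 49, 128)                # e.g. 7x7 patch tokens
for _ in range(4):                              # 4 "layers", one set of weights
    tokens = block(tokens)
print(tokens.shape)  # torch.Size([2, 49, 128])
```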

Citations: 0
Dynamic deformable transformer for end-to-end face alignment
IF 1.7 | Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-05-30 | DOI: 10.1049/cvi2.12208
Liming Han, Chi Yang, Qing Li, Bin Yao, Zixian Jiao, Qianyang Xie

Heatmap-based regression (HBR) methods have long dominated the face alignment field, but they need complex design and post-processing. In this study, the authors propose a simple, end-to-end coordinate-based regression (CBR) method for face alignment called Dynamic Deformable Transformer (DDT). Unlike general pre-defined landmark queries, DDT uses Dynamic Landmark Queries (DLQs) to query landmarks' classes and coordinates together. Moreover, DDT adopts a deformable attention mechanism rather than regular attention, giving it faster convergence and lower computational complexity. Experimental results on three mainstream datasets, 300W, WFLW, and COFW, demonstrate that DDT exceeds state-of-the-art CBR methods by a large margin and is comparable to the current state-of-the-art HBR methods at much lower computational complexity.
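Deformable attention replaces full attention over every feature position with sampling at a few predicted points per query, which is where the complexity saving comes from. The sketch below is a heavily simplified, single-level approximation built on grid_sample, with landmark queries that jointly output class logits and coordinates; all dimensions, heads, and names are assumptions, and the real DDT decoder is certainly more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleDeformableHead(nn.Module):
    """Pared-down deformable-attention-style landmark decoding: each dynamic
    landmark query predicts a reference point plus offsets, samples the
    feature map only at those points, then regresses class logits and
    normalised (x, y) coordinates."""
    def __init__(self, dim, num_landmarks, num_points=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_landmarks, dim))
        self.ref = nn.Linear(dim, 2)                   # reference point in [-1, 1]^2
        self.offsets = nn.Linear(dim, num_points * 2)  # sampling offsets around it
        self.coord = nn.Linear(dim, 2)                 # landmark coordinate head
        self.cls = nn.Linear(dim, num_landmarks)       # landmark class head
        self.num_points = num_points

    def forward(self, feat):                           # feat: (B, C, H, W)
        b = feat.shape[0]
        q = self.queries.unsqueeze(0).expand(b, -1, -1)          # (B, L, C)
        ref = torch.tanh(self.ref(q)).unsqueeze(2)               # (B, L, 1, 2)
        off = self.offsets(q).view(b, q.shape[1], self.num_points, 2)
        loc = ref + 0.1 * torch.tanh(off)                        # sampling locations
        # Bilinearly sample features only at the predicted locations.
        sampled = F.grid_sample(feat, loc, align_corners=False)  # (B, C, L, P)
        q = q + sampled.mean(dim=-1).permute(0, 2, 1)            # aggregate points
        return self.cls(q), torch.sigmoid(self.coord(q))         # classes, coords

head = SimpleDeformableHead(dim=64, num_landmarks=68)
logits, coords = head(torch.randn(2, 64, 32, 32))
print(coords.shape)  # torch.Size([2, 68, 2])
```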

Citations: 0
Image feature learning combined with attention-based spectral representation for spatio-temporal photovoltaic power prediction
IF 1.7 | Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-05-22 | DOI: 10.1049/cvi2.12199
Xingchen Guo, Jing Lai, Zhou Zheng, Chenxiang Lin, Yuxing Dai, Xuexin Xu, Haisheng San, Rong Jia, Zhihong Zhang

Clean energy is a major trend, and photovoltaic power generation is growing in importance. Photovoltaic output is mainly affected by the weather and is therefore full of uncertainty. Previous work has relied chiefly on historical photovoltaic data for time-series forecasts, but unforeseen weather conditions can skew such forecasts. Consequently, a spatial-temporal-meteorological long short-term memory prediction model (STM-LSTM) is proposed to compensate for the inability of existing photovoltaic prediction models to handle these uncertainties. The model processes satellite image data, historical meteorological data, and historical power generation data simultaneously, extracting historical patterns and meteorological change information to improve the accuracy of photovoltaic prediction. STM-LSTM processes raw satellite data to obtain cloud images and extracts cloud motion information using the dense optical flow method. First, the cloud images are processed to extract cloud position information, and adaptive attentive learning over images in different spectral bands yields a better representation for subsequent tasks. Second, historical meteorological data are processed to learn meteorological change patterns. Finally, the historical photovoltaic power generation sequences are combined to obtain the final photovoltaic prediction. A series of experiments validates that the proposed STM-LSTM model clearly improves on the baseline model.
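Two of the described ingredients are easy to sketch: dense optical flow for cloud motion (here via OpenCV's Farneback method, one common reading of "dense optical flow method") and an LSTM that fuses band-attended image features with meteorological features over the historical window. The fusion layout, dimensions, and names are assumptions, not the paper's architecture.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

def cloud_motion(prev_gray: np.ndarray, next_gray: np.ndarray) -> np.ndarray:
    """Per-pixel (dx, dy) cloud motion between two grayscale satellite frames."""
    return cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

class STMFusion(nn.Module):
    """Hypothetical fusion head: learned attention over spectral bands weights
    the per-band image features, which are concatenated with meteorological
    features and fed through an LSTM over the historical window."""
    def __init__(self, band_dim: int, met_dim: int, hidden: int = 64):
        super().__init__()
        self.band_attn = nn.Linear(band_dim, 1)
        self.lstm = nn.LSTM(band_dim + met_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)        # next-step power output

    def forward(self, band_feats, met_feats):
        # band_feats: (B, T, num_bands, band_dim); met_feats: (B, T, met_dim)
        w = torch.softmax(self.band_attn(band_feats), dim=2)
        img = (w * band_feats).sum(dim=2)       # attention-weighted band mix
        h, _ = self.lstm(torch.cat([img, met_feats], dim=-1))
        return self.head(h[:, -1])              # predict from the last step

model = STMFusion(band_dim=32, met_dim=8)
y = model(torch.randn(4, 24, 5, 32), torch.randn(4, 24, 8))  # 24-step window
print(y.shape)  # torch.Size([4, 1])
```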

Citations: 0
Copy-paste with self-adaptation: A self-adaptive adjustment method based on copy-paste augmentation
IF 1.7 | Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-05-18 | DOI: 10.1049/cvi2.12207
Xiaoyu Yu, Fuchao Li, Pengfei Bai, Yan Liu, Yinglu Chen

Data augmentation diversifies the information in a dataset. For class imbalance, copy-paste augmentation generates new class information to alleviate the problem. However, such methods rely excessively on human intuition: adding an inappropriate amount of class information can cause over-fitting or under-fitting. The authors propose a self-adaptive data augmentation, the copy-paste with self-adaptation (CPA) algorithm, which mitigates over-fitting and under-fitting. CPA takes a model's evaluation results as an important basis for adjustment: the evaluation results are combined with class-imbalance information to generate a set of class weights, and different amounts of class information are replenished according to these weights. Finally, the generated images are inserted into the training dataset and the model begins formal training. Experimental results show that CPA alleviates class imbalance. On the TT100K dataset, YOLOv3 trained with the optimised dataset gains 2% AP; on VOC2007, RetinaNet reaches an mAP of 78.46 on the optimised dataset, 1.2% higher than on the original dataset; on COCO2017, SSD300 trained with the optimised dataset gains 1.3% AP.
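The weighting step can be sketched in a few lines. The abstract only states that evaluation results and class-imbalance information are combined into class weights, so the specific formula below (inverse frequency times per-class difficulty) is an assumption for illustration.

```python
import numpy as np

def adaptive_paste_counts(per_class_ap, class_counts, budget=1000):
    """Sketch of the CPA weighting idea: classes that are both rare and
    poorly evaluated get larger copy-paste budgets. The weighting formula
    is assumed; the paper defines its own combination rule."""
    ap = np.asarray(per_class_ap, dtype=float)
    counts = np.asarray(class_counts, dtype=float)
    rarity = counts.sum() / counts            # inverse class frequency
    difficulty = 1.0 - ap                     # low AP -> high weight
    w = rarity * difficulty
    w = w / w.sum()                           # normalise to a distribution
    return np.round(w * budget).astype(int)   # pasted instances per class

# Example: class 2 is rare and weak, so it receives most of the pastes.
print(adaptive_paste_counts([0.8, 0.7, 0.3], [5000, 3000, 200]))
```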

Citations: 1