A two-stage ensemble of diverse models for recognition of abnormal data in raw wind data

K. Hou, D. Xia, Qun Li, Xingwei Xu, Han Yue, Kefei Wang, Lei Chen, Le Zheng
Published in: 2016 IEEE PES Asia-Pacific Power and Energy Engineering Conference (APPEEC), 2016-10-01
DOI: 10.1109/APPEEC.2016.7779621 (https://doi.org/10.1109/APPEEC.2016.7779621)

Abstract

Wind energy integration research generally relies on complex sensors located at remote sites. Any procedure that derives high-level synthetic information from databases holding large amounts of low-level data must therefore account for possible sensor failures and imperfect input data. Data-mining methods are widely used to identify the relationship between wind farm power output and wind speed, which is important for wind power prediction, and incorrect or unnatural data strongly affects their results. To address this problem, the paper presents an empirical methodology that efficiently preprocesses and filters raw wind data using a two-stage ensemble of diverse models. First, abnormal features are extracted from the raw wind data, and the dataset is labeled according to the wind farm's operation state records and the characteristics of typical abnormal data. Next, a two-stage classification model is built from Random Forest (RF) and Gradient Boosting Decision Tree (GBDT) classifiers. In the first stage, an RF classifier is trained with the labeled dataset as input. In the second stage, a GBDT classifier is trained with the labeled dataset and the RF classification result as input. Finally, each of the two trained models predicts the testing set, and the average of the RF and GBDT predicted values is taken as the final result. The methodology was tested successfully on data collected from a large wind farm in northeast China.
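
The two-stage scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. The synthetic data generator, the three features (wind speed, normalized power, deviation from an idealized power curve), the hyperparameters, and the 0.5 decision threshold are all assumptions made for the example. The abstract also does not say whether the "RF classification result" is a class label or a probability, nor how it is produced for the training set; here it is taken to be the abnormal-class probability, computed out-of-fold on the training data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split


def make_synthetic_wind_data(n_samples=2000, abnormal_rate=0.15, seed=0):
    """Toy (features, labels) data; hypothetical, not the paper's dataset."""
    rng = np.random.default_rng(seed)
    wind_speed = rng.uniform(0.0, 25.0, n_samples)  # m/s
    # Idealized normalized power curve: cubic ramp from 3 m/s to 12 m/s.
    expected = np.clip((wind_speed - 3.0) / 9.0, 0.0, 1.0) ** 3
    power = np.clip(expected + rng.normal(0.0, 0.02, n_samples), 0.0, 1.0)
    # Mark some records abnormal, only where output should be noticeable.
    labels = ((expected > 0.1) & (rng.random(n_samples) < abnormal_rate)).astype(int)
    # Abnormal records: power is far below what the wind speed implies.
    power[labels == 1] *= rng.uniform(0.0, 0.5, int(labels.sum()))
    features = np.column_stack([wind_speed, power, power - expected])
    return features, labels


def fit_predict_two_stage(X_train, y_train, X_test, seed=0):
    """Return the averaged RF/GBDT abnormal-class probability for X_test."""
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    gbdt = GradientBoostingClassifier(random_state=seed)

    # Stage 1: RF on the labeled data. Out-of-fold probabilities (an added
    # assumption) stand in for the "RF classification result" on the
    # training set, so stage 2 does not see overfit in-sample outputs.
    rf_train = cross_val_predict(
        rf, X_train, y_train, cv=5, method="predict_proba"
    )[:, 1]
    rf.fit(X_train, y_train)

    # Stage 2: GBDT on the original features plus the RF result.
    gbdt.fit(np.column_stack([X_train, rf_train]), y_train)

    # Final result: average of the two models' predicted values.
    rf_test = rf.predict_proba(X_test)[:, 1]
    gbdt_test = gbdt.predict_proba(np.column_stack([X_test, rf_test]))[:, 1]
    return (rf_test + gbdt_test) / 2.0


X, y = make_synthetic_wind_data()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
probs = fit_predict_two_stage(X_train, y_train, X_test)
predicted = (probs >= 0.5).astype(int)  # 1 = flagged as abnormal
accuracy = float((predicted == y_test).mean())
print(f"Held-out accuracy on synthetic data: {accuracy:.3f}")
```

Records flagged as abnormal would then be filtered out before fitting the power-versus-wind-speed relationship. On real data the features and labels would come from the operation state records mentioned in the abstract rather than from a generator like the one above.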