Comparison of linear regression, k-nearest neighbour and random forest methods in airborne laser-scanning-based prediction of growing stock

IF 3 | Zone 2, Agricultural and Forestry Sciences | Q1 FORESTRY | Forestry | Pub Date: 2020-10-03 | DOI: 10.1093/forestry/cpaa034
D. N. Cosenza, L. Korhonen, M. Maltamo, P. Packalen, Jacob L. Strunk, E. Næsset, T. Gobakken, P. Soares, M. Tomé
Citations: 22

Abstract

In this study, we examined how model type and variable selection approach affect the performance of forest yield models in the area-based approach (ABA) at five sites around the world. We compared ordinary least squares regression (OLS), k-nearest neighbours (kNN) and random forest (RF). Our objective was to test whether there are systematic differences in accuracy between OLS, kNN and RF in ABA predictions of growing stock volume. The analyses are based on 5-fold cross-validation at five study sites: a eucalyptus plantation, a temperate forest and three different boreal forests. Completely independent validation datasets were also available for two of the boreal sites. For kNN, we evaluated multiple distance measures, including Euclidean, Mahalanobis, most similar neighbour (MSN) and an RF-based distance metric. The variable selection approaches we examined were a heuristic approach (for OLS, kNN and RF), an exhaustive search among all combinations (OLS only) and the use of all variables together (RF only). Performance varied by model type and variable selection approach across sites. OLS and RF had similar accuracies and were more efficient than any of the kNN variants. Variable selection did not affect RF performance. Heuristic and exhaustive variable selection performed similarly for OLS. kNN performed worst among the model types, and kNN with RF distance was prone to overfitting when compared with a validation dataset. Additional caution is therefore required when building kNN models for volume prediction through ABA; it is preferable to opt instead for models based on OLS with some variable selection, or RF with all variables together.
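The cross-validated comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the plot data are synthetic, the single predictor (a mean laser height) and all function names (`make_plots`, `fit_ols`, `fit_knn`, `kfold_rmse`) are assumptions made for illustration, and RF is left out to keep the example short and dependency-free. Because the synthetic relationship is linear by construction, the resulting numbers say nothing about the study's findings.

```python
import math
import random


def make_plots(n, seed=0):
    """Synthetic field plots as (mean_height_m, volume_m3_per_ha) pairs.

    Purely illustrative data; the linear trend and noise level are assumptions.
    """
    rng = random.Random(seed)
    plots = []
    for _ in range(n):
        height = rng.uniform(5.0, 30.0)
        volume = 12.0 * height + rng.gauss(0.0, 25.0)
        plots.append((height, volume))
    return plots


def fit_ols(train):
    """One-predictor ordinary least squares; returns a prediction function."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_knn(train, k=5):
    """kNN regression: mean volume of the k nearest training plots.

    With a single predictor, Euclidean distance is the absolute difference.
    """
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def kfold_rmse(plots, fit, n_folds=5, seed=1):
    """RMSE pooled over the out-of-fold predictions of a k-fold split."""
    idx = list(range(len(plots)))
    random.Random(seed).shuffle(idx)
    squared_error = 0.0
    for fold in range(n_folds):
        test_idx = set(idx[fold::n_folds])  # every n_folds-th shuffled plot
        train = [plots[i] for i in idx if i not in test_idx]
        predict = fit(train)
        for i in test_idx:
            x, y = plots[i]
            squared_error += (predict(x) - y) ** 2
    return math.sqrt(squared_error / len(plots))


if __name__ == "__main__":
    plots = make_plots(200)
    mean_volume = sum(v for _, v in plots) / len(plots)
    for name, fit in [("OLS", fit_ols), ("kNN (k=5)", fit_knn)]:
        rmse = kfold_rmse(plots, fit)
        print(f"{name}: RMSE = {rmse:.1f} m3/ha "
              f"({100 * rmse / mean_volume:.1f}% of mean volume)")
```

Both models are scored on the same folds because `kfold_rmse` reuses one shuffling seed, so differences in RMSE reflect the model rather than the split. An independent validation dataset, as used for two of the boreal sites, would be scored by fitting on all training plots and predicting the separate plots once.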
Source journal
Forestry (Agricultural and Forestry Sciences: Forestry)
CiteScore: 6.70
Self-citation rate: 7.10%
Articles published: 47
Review time: 12-24 weeks
About the journal: The journal is inclusive of all subjects, geographical zones and study locations, including trees in urban environments, plantations and natural forests. We welcome papers that consider economic, environmental and social factors and, in particular, studies that take an integrated approach to sustainable management. In considering suitability for publication, attention is given to the originality of contributions and their likely impact on policy and practice, as well as their contribution to the development of knowledge. Special Issues: each year one edition of Forestry will be a Special Issue and will focus on one subject in detail; this will usually be by publication of the proceedings of an international meeting.