Exploring the effects of training samples on the accuracy of crop mapping with machine learning algorithm

Science of Remote Sensing (IF 5.7, Q1, Environmental Sciences) | Pub Date: 2023-06-01 | DOI: 10.1016/j.srs.2023.100081
Yangyang Fu, Ruoque Shen, Chaoqing Song, Jie Dong, Wei Han, Tao Ye, Wenping Yuan
Published in Science of Remote Sensing, Volume 7, Article 100081. Source: https://www.sciencedirect.com/science/article/pii/S2666017223000068
Citations: 1

Abstract

Machine learning algorithms are frequently used for crop classification and have been applied to map the distribution of various crops at regional and national scales. Previous studies have emphasized that the number of training samples strongly influences the classification accuracy of machine learning algorithms, which has led to extensive sample collection efforts. Taking winter wheat as an example, this study challenges that principle by selecting training samples with the time-weighted dynamic time warping (TWDTW) method, and finds that the classification accuracy of machine learning algorithms relies heavily on the representativeness and proportion of training samples rather than on their quantity. As the representativeness of the training samples increases, i.e., as they more comprehensively reflect the characteristics of winter wheat, classification accuracy improves continually. The best classification accuracy is achieved when the training samples of winter wheat and non-winter wheat are selected according to the ratio of their statistical areas. In contrast, only slight differences were found in overall accuracy (91.26% and 90.74%), producer’s accuracy (86.33% and 86.65%) and user’s accuracy (97.37% and 96.01%) when using 1,000 and 10,000 training samples. Overall, this study demonstrates that the characteristics of training samples have a great impact on the classification accuracy of machine learning algorithms, and that the training samples generated by the TWDTW method are reliable for crop mapping.
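
The abstract refers to two ideas that are easy to make concrete: splitting a fixed training budget between winter wheat and non-winter wheat according to their area ratio, and scoring a map with overall, producer's and user's accuracy. The following is a minimal sketch of both, not the authors' code. The function names, the toy labels and the area figures are hypothetical, and the accuracy measures use the standard confusion-matrix definitions (producer's accuracy is the share of reference winter wheat that was mapped as winter wheat; user's accuracy is the share of mapped winter wheat that is correct).

```python
import numpy as np


def allocate_samples(total, area_by_class):
    """Split `total` training samples across classes in proportion to area.

    `area_by_class` maps a class name to its (hypothetical) statistical area.
    """
    names = list(area_by_class.keys())
    areas = np.array([area_by_class[n] for n in names], dtype=float)
    shares = areas / areas.sum()
    counts = np.floor(shares * total).astype(int)
    # Samples lost to rounding down go to the class with the largest share.
    counts[np.argmax(shares)] += total - counts.sum()
    return dict(zip(names, counts.tolist()))


def accuracy_metrics(reference, predicted, positive=1):
    """Return (overall, producer's, user's) accuracy for the `positive` class."""
    reference = np.asarray(reference)
    predicted = np.asarray(predicted)
    true_pos = np.sum((reference == positive) & (predicted == positive))
    overall = np.mean(reference == predicted)
    producers = true_pos / np.sum(reference == positive)  # omission side
    users = true_pos / np.sum(predicted == positive)      # commission side
    return float(overall), float(producers), float(users)


# Toy usage with made-up numbers: areas in a 1:3 ratio, 1,000 samples in total.
plan = allocate_samples(1000, {"winter_wheat": 1.0, "other": 3.0})
print(plan)  # {'winter_wheat': 250, 'other': 750}

# Toy validation labels (1 = winter wheat, 0 = other).
ref = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(accuracy_metrics(ref, pred))
```

Keeping allocation separate from evaluation mirrors the abstract's distinction between how samples are chosen (proportion) and how many are used (quantity): the same `total` can be re-split under different ratios and scored with the same metrics.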
