Sunil G C, Cengiz Koparan, Arjun Upadhyay, Mohammed Raju Ahmed, Yu Zhang, Kirk Howatt, Xin Sun
Title: A novel automated cloud-based image datasets for high throughput phenotyping in weed classification
Journal: Data in Brief, volume 57, Article 111097 (JCR Q3, Multidisciplinary Sciences; impact factor 1.0)
Publication date: 2024-11-01
Publication type: Journal Article
DOI: 10.1016/j.dib.2024.111097
URL: https://www.sciencedirect.com/science/article/pii/S235234092401059X
Citation count: 0
Abstract
Deep learning-based weed detection data management involves data acquisition, data labeling, model development, and model evaluation phases. Of these phases, data acquisition and data labeling are labor-intensive and time-consuming steps in building robust models. In addition, low temporal variation of crops and weeds in datasets is one of the limiting factors for developing effective weed detection models. This article describes a cloud-based automatic data acquisition system (CADAS) that captures weed and crop images at fixed time intervals so that plant growth stages are taken into account in weed identification. CADAS was developed by integrating fifteen visible-spectrum digital cameras with gphoto2 libraries, external storage, cloud storage, and a computer running a Linux operating system. The dataset from the CADAS system contains six weed species and eight crop species for weed and crop detection. A dataset of 2000 images per weed and crop species was publicly released. Raw RGB images were cropped according to bounding box annotations to generate individual JPG images of crop and weed instances. In addition to the cropped images, 200 raw images with label files were publicly released. This dataset holds potential for investigating challenges in deep learning-based weed and crop detection in agricultural settings. Additionally, researchers could use this data together with field data to boost model performance by reducing the data imbalance problem.
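The cropping step mentioned in the abstract (raw images cut into per-instance images using bounding box annotations) can be illustrated with a minimal sketch. The abstract does not state the label file format, so the sketch assumes YOLO-style lines (`class cx cy w h`, all normalized to the image size); the function names `yolo_to_pixel_box` and `crop_boxes` are hypothetical and are not taken from the dataset or its tooling.

```python
from typing import List, Tuple

# (left, upper, right, lower) in pixels, the box order used by common image libraries
Box = Tuple[int, int, int, int]


def yolo_to_pixel_box(line: str, img_w: int, img_h: int) -> Tuple[int, Box]:
    """Convert one label line to a class id and a pixel crop box.

    ASSUMPTION: the label line is YOLO-style, 'class cx cy w h', with
    center and size normalized to [0, 1]. The real label format of the
    released dataset may differ.
    """
    parts = line.split()
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:5])

    # Convert normalized center/size to pixel corner coordinates.
    left = round((cx - w / 2) * img_w)
    upper = round((cy - h / 2) * img_h)
    right = round((cx + w / 2) * img_w)
    lower = round((cy + h / 2) * img_h)

    # Clamp to the image so boxes touching the border stay valid.
    left, right = max(0, left), min(img_w, right)
    upper, lower = max(0, upper), min(img_h, lower)
    return class_id, (left, upper, right, lower)


def crop_boxes(label_text: str, img_w: int, img_h: int) -> List[Tuple[int, Box]]:
    """Parse a whole label file (one instance per line), skipping blank lines."""
    return [
        yolo_to_pixel_box(line, img_w, img_h)
        for line in label_text.splitlines()
        if line.strip()
    ]
```

Each returned box could then be passed to an image library's crop routine (for example Pillow's `Image.crop`, which takes a `(left, upper, right, lower)` tuple) and saved as an individual JPG, one file per crop or weed instance.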
About the journal:
Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.
Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.