Pub Date: 2025-02-01 | DOI: 10.1016/j.dib.2025.111285
Eliott Lumet, Thomas Jaravel, Mélanie C. Rochoux
By 2050, two-thirds of the world's population will live in urban areas under climate change, exacerbating the environmental and public health risks associated with poor air quality and urban heat island effects. Assessing these risks requires microscale meteorological models that quickly and accurately predict wind velocity and pollutant concentration at high resolution, because the heterogeneity of urban environments leads to complex wind patterns and strong pollutant concentration gradients. Computational Fluid Dynamics (CFD) has emerged as a powerful tool to address this challenge by providing obstacle-resolved flow and dispersion predictions. However, CFD models are computationally expensive, which can hinder their systematic use in practical engineering applications. They are also subject to significant uncertainties, particularly those arising from the mesoscale meteorological forcing and the internal variability of the atmospheric boundary layer, some of which are aleatory and therefore irreducible. Given these issues, constructing CFD datasets that account for uncertainty is a worthwhile avenue of research for microscale atmospheric science.
In this context, we present the PPMLES (Perturbed-Parameter ensemble of MUST Large-Eddy Simulations) dataset, which consists of 200 large-eddy simulations (LES) characterizing the complex interactions between the turbulent airflow, the tracer dispersion, and an idealized urban environment. These simulations reproduce the canonical MUST dispersion field campaign while perturbing the model's mesoscale meteorological forcing parameters. PPMLES includes time series at human height within the built environment to track wind velocity and pollutant release and dispersion over time. PPMLES also includes complete 3-D fields of first- and second-order temporal statistics of the wind velocity and pollutant concentration, at sub-meter resolution. The uncertainty of the fields induced by the internal variability of the atmospheric boundary layer is also provided. Computing PPMLES consumed 6 million CPU core hours, equivalent to the emission of approximately 10 tCO2eq of greenhouse gases. This significant computational effort and the associated carbon footprint motivate sharing the generated data.
The added value of the PPMLES dataset is twofold. First, the perturbed-parameter ensemble of LES makes it possible to quantify and understand the effects of the mesoscale meteorological forcing and the internal variability of the atmospheric boundary layer, which has been identified as a major challenge in predicting atmospheric flow and pollutant dispersion in urban environments. Second, PPMLES reference data can be used to benchmark models of different levels of complexity, and to extract key information about the physical processes involved to inform more operational modeling approaches,
Title: Dataset of microscale atmospheric flow and pollutant concentration large-eddy simulations for varying mesoscale meteorological forcing in an idealized urban environment
Journal: Data in Brief, Volume 58, Article 111285
Pub Date: 2025-02-01 | DOI: 10.1016/j.dib.2025.111271
Virginia Maß, Pendar Alirezazadeh, Johannes Seidl-Schulz, Matthias Leipnitz, Eric Fritzsche, Rasheed Ali Adam Ibraheem, Martin Geyer, Michael Pflanz, Stefanie Reim
Evaluating fruit genetic resources for resistance to pathogens is an essential basis for subsequent selection in fruit breeding. Both genetic analysis and phenotyping of defined traits are important tools that provide decision data in the evaluation process. However, plant phenotyping is often carried out 'by hand' and remains the bottleneck in fruit breeding and fruit growing. A digital, UAV (unmanned aerial vehicle)-based phenotyping method for assessing genotype-specific susceptibility or resistance to diseases in orchards would significantly increase the efficiency of plant breeding. In this framework, a workflow for drone-based monitoring of pathogens in orchards was developed using European pear rust (Gymnosporangium sabinae) as a model pathogen. Pear rust is widespread in orchards and causes conspicuous yellow to orange-colored disease symptoms.
In this paper, we provide a dataset of expert-annotated high-resolution RGB images with pear rust symptoms. For data collection, ten UAV flight campaigns were conducted between 2021 and 2023 under various weather conditions and with different flight parameters in the experimental orchard of the Julius Kühn-Institute for Breeding Research on Fruit Crops in Dresden-Pillnitz (Germany). A total of 1394 images were captured of different pear genotypes, including varieties, wild species and progeny from breeding. The dataset contains manually labelled images with a size of 768 × 768 pixels of leaves infected with pear rust at different stages of development, labelled as class GYMNSA, as well as background images without symptoms. Each leaf with pear rust symptoms was annotated with a bounding box drawn by two points using the Computer Vision Annotation Tool (CVAT, v1.1.0) [1], and the annotations are provided in YOLO 1.1 file format (.txt files). A total of 584 annotated images and 162 background images, organized into a training and validation set, are included in the GYMNSA dataset. This GYMNSA dataset can be used as a resource for researchers and developers working on drone-based plant disease monitoring systems.
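To make the annotation format concrete, the following is a minimal sketch of how a YOLO-style label file could be read back into pixel coordinates. It assumes the usual YOLO text convention (one box per line: class index, then normalized box center x, center y, width, and height) and the 768 × 768 pixel image size stated above. The function name `parse_yolo_labels` and the class index `0` for GYMNSA are illustrative assumptions, not taken from the dataset.

```python
from typing import List, Tuple

IMAGE_SIZE = 768  # image edge length in pixels, as stated in the abstract

Box = Tuple[int, float, float, float, float]


def parse_yolo_labels(text: str, size: int = IMAGE_SIZE) -> List[Box]:
    """Convert YOLO label lines to (class_id, x_min, y_min, x_max, y_max) in pixels."""
    boxes: List[Box] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        class_id = int(parts[0])
        # Scale the normalized center/size values to pixels.
        xc, yc, w, h = (float(v) * size for v in parts[1:])
        boxes.append((class_id, xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2))
    return boxes


# Hypothetical label line: class 0 (assumed to stand for GYMNSA), box centered in the image.
example = "0 0.5 0.5 0.25 0.125\n"
print(parse_yolo_labels(example))
```

Under the same convention, a background image without symptoms would typically have an empty label file, for which the function returns an empty list.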
Title: Annotated image dataset with different stages of European pear rust for UAV-based automated symptom detection in orchards
Journal: Data in Brief, Volume 58, Article 111271
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11783052/pdf/
An open-source geodatabase and its associated WebGIS platform (CONNECTOSED) were developed to collect and utilize data for the Sediment Flow Connectivity Index (SfCI) for the Apulia region of southern Italy. Maps depicting sediment mobility and connectivity across the hydrographic basins of the Apulia region were generated and stored in the geodatabase. This geodatabase is organized into folders containing data in TIFF, shapefile, JPEG, and PDF formats, including input variables (digital elevation model, land cover map, rainfall map, and soil units dataset for each hydrographic basin), classification graphs (ranking of variable values), dimensionless index maps (slope, ruggedness, rainfall, land cover, and soil stability) and key products (maps of sediment mobility, SfCI, and applied SfCI). The geodatabase maintains the mapping methodology underlying the SfCI algorithm by integrating various Earth datasets from multiple sources through ArcMap™, QGIS® and Matlab® software. This approach aligns surface characteristics with driving forces to describe the spatial variability of sediment pathways and identify hotspot areas. The availability of both input and processed data enables the computation and continuous updating of this applied geomorphological indicator, which is useful for assessing susceptibility to rapid Earth surface changes related to multi-hazard exposure. The geodatabase and the CONNECTOSED platform are essential tools for researchers, policymakers, and stakeholders involved in land monitoring and environmental management. These tools provide open access to extensive datasets and detailed descriptions of surface dynamics, establishing connections between the causes and effects of extreme phenomena such as floods, landslides, fires, and soil pollution. This integration allows users to combine various forms of environmental data, a capability that is vital for enhancing scientific knowledge, supporting the development of insights, and fostering more informed, evidence-based decision-making in land use planning, conservation efforts, and sustainability initiatives.
Title: Sediment flow connectivity index data for the Apulia region (Italy): An open-source geodatabase and the innovative CONNECTOSED WebGIS platform
Authors: Alok Kushabaha, Domenico Capolongo, Giovanni Scicchitano, Floriana Rizzo, Marina Zingaro
Pub Date: 2025-02-01 | DOI: 10.1016/j.dib.2024.111210
Journal: Data in Brief, Volume 58, Article 111210
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11730577/pdf/
Pub Date: 2025-02-01 | DOI: 10.1016/j.dib.2024.111196
José E. Barradas-Hernández, Sergio Márquez-Domínguez, Franco Antonio Carpio-Santamaria, Alejandro Vargas-Colorado, Abigail Zamora-Hernández, Roberto Rivera-Baizabal
The data presented here are the result of microtremor measurements at 44 points in three different soil types, classified according to their fundamental vibration frequencies, in the metropolitan area of Veracruz-Boca del Río, Mexico. The raw data were obtained using a GÜRALP 6TD broadband orthogonal triaxial seismometer with an integrated 24-bit digitizer, with a minimum recording time of 30 min and a recording rate of 100 samples per second (sps). The microtremor records were used to construct the H/V spectral ratios using the method of Nakamura. These H/V spectral ratios are a good approximation of the transfer function between the vibration waves in the sediment and the rigid stratum. Therefore, they can be used to construct seismic microzonation maps, seismic intensity maps and spectra for designing seismic-resistant structures. One-dimensional stratigraphic soil models were obtained by processing the H/V spectral ratios. The relevant data from these models are layer thickness, primary wave velocities (Vp), secondary wave velocities (Vs) and density. These models represent a mathematical approximation of the soil structure that can be used to dynamically classify it according to Mexican technical codes.
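As an illustration of the H/V idea, the sketch below computes a horizontal-to-vertical amplitude ratio per frequency bin for a three-component record. It is only a toy example under stated assumptions: the two horizontal spectra are combined by their quadratic mean (one common convention, not confirmed as the authors' choice), a naive discrete Fourier transform is used to stay within the standard library, and the windowing, spectral smoothing, and window averaging used in real processing are omitted. The function names and the synthetic signal are hypothetical.

```python
import cmath
import math


def amplitude_spectrum(samples):
    """Naive DFT amplitude spectrum (O(n^2)); adequate for a short illustration."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def hv_ratio(north, east, vertical):
    """H/V ratio per frequency bin, using the quadratic mean of the two horizontals."""
    ns, ew, v = (amplitude_spectrum(c) for c in (north, east, vertical))
    return [
        math.sqrt((a * a + b * b) / 2.0) / c if c > 0 else float("nan")
        for a, b, c in zip(ns, ew, v)
    ]


fs = 100  # samples per second, as stated in the abstract
n = 100   # one second of synthetic data (the real records last at least 30 min)
wave = [math.sin(2 * math.pi * 5 * t / fs) for t in range(n)]  # 5 Hz test tone
north = [3.0 * s for s in wave]  # horizontals three times stronger than the vertical
east = [3.0 * s for s in wave]
vertical = wave

ratios = hv_ratio(north, east, vertical)
peak_bin = 5  # bin k corresponds to k * fs / n Hz, so bin 5 is 5 Hz here
print(round(ratios[peak_bin], 6))
```

Only the bin that carries the test tone is meaningful in this synthetic case; the other bins contain nothing but floating-point noise in both numerator and denominator.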
Title: Using microtremor data to obtain dynamic properties of soils in the Veracruz-Boca del Rio metropolitan area
Journal: Data in Brief, Volume 58, Article 111196
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11698973/pdf/
Pub Date: 2025-02-01 | DOI: 10.1016/j.dib.2024.111202
Weslley Lima, Victor Silva, Jasson Silva, Ricardo Lira, Anselmo Paiva
Digital transformation has significantly impacted public procurement, improving operational efficiency, transparency, and competition. This transformation has allowed the automation of data analysis and oversight in public administration. Public procurement involves various stages and generates a multitude of documents. However, experts analyze these unstructured textual documents manually, which is time-consuming and inefficient. To address this issue, we introduce BidCorpus, a novel and comprehensive dataset consisting of thousands of documents related to public procurement, specifically bidding notices from Brazilian public websites. The dataset was labeled using weak supervision techniques, manual labeling, and BERT-based language models. Models trained with these annotated data showed promising results, with metrics greater than 80 % in various experiments. The models could also tolerate intentional changes made to bidding notices to evade fraud detection. All the resources from this work are publicly available, including the documents, pre-processing scripts, and training and evaluation of the models. We expect the dataset and its labels to be of great value to researchers working on public procurement problems.
Title: BidCorpus: A multifaceted learning dataset for public procurement
Journal: Data in Brief, Volume 58, Article 111202
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11715116/pdf/
Pub Date: 2025-02-01 | DOI: 10.1016/j.dib.2024.111217
Miguel Cobos
This dataset contains evaluation results from video game-based assessments administered to first-level university students across six different academic programs at Universidad Indoamérica from October 2022 to August 2024. The data were collected using an adapted version of Pacman through the ClassTools.net platform, where traditional quiz questions were integrated into gameplay mechanics. The dataset comprises 1418 assessment attempts from students in Law, Medicine, Psychology, Clinical Psychology, Architecture, and Nursing programs, documenting their performance in digital culture and computing courses. Each record includes attempt number, timestamp, student identifier, gender, academic period, section, career program, and score achieved. The dataset enables analysis of student performance patterns, learning progression through multiple attempts, and comparative studies across different academic programs and periods. This information can support research in educational gamification, assessment design, and digital learning strategies in higher education.
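One way to look at learning progression through multiple attempts is to compare each student's first and last recorded score. The sketch below does this on a few made-up records; the field names (`student_id`, `attempt`, `program`, `score`) and all values are hypothetical stand-ins for the record fields listed above, not the dataset's actual column headers or data.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative assumptions only.
records = [
    {"student_id": "s1", "attempt": 1, "program": "Law", "score": 6.0},
    {"student_id": "s1", "attempt": 2, "program": "Law", "score": 8.5},
    {"student_id": "s2", "attempt": 1, "program": "Nursing", "score": 7.0},
]


def score_gain_by_student(rows):
    """Return {student_id: last_attempt_score - first_attempt_score}."""
    by_student = defaultdict(list)
    for row in rows:
        by_student[row["student_id"]].append((row["attempt"], row["score"]))
    gains = {}
    for student, attempts in by_student.items():
        attempts.sort()  # order by attempt number
        gains[student] = attempts[-1][1] - attempts[0][1]
    return gains


print(score_gain_by_student(records))
```

Students with a single attempt get a gain of zero under this definition, so a real analysis might filter them out or group the gains by program or academic period first.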
Title: Dataset of video game-based assessments in digital culture courses at Indoamerica University
Journal: Data in Brief, Volume 58, Article 111217
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11719281/pdf/
Pub Date: 2025-02-01 | DOI: 10.1016/j.dib.2024.111205
Zaid A. El Shair, Samir A. Rawashdeh
In this data article, we introduce the Multi-Modal Event-based Vehicle Detection and Tracking (MEVDT) dataset. This dataset provides a synchronized stream of event data and grayscale images of traffic scenes, captured using the Dynamic and Active-Pixel Vision Sensor (DAVIS) 240c hybrid event-based camera. MEVDT comprises 63 multi-modal sequences with approximately 13k images, 5M events, 10k object labels, and 85 unique object tracking trajectories. Additionally, MEVDT includes manually annotated ground truth labels — consisting of object classifications, pixel-precise bounding boxes, and unique object IDs — which are provided at a labeling frequency of 24 Hz. Designed to advance the research in the domain of event-based vision, MEVDT aims to address the critical need for high-quality, real-world annotated datasets that enable the development and evaluation of object detection and tracking algorithms in automotive environments.
Title: MEVDT: Multi-modal event-based vehicle detection and tracking dataset
Journal: Data in Brief, Volume 58, Article 111205
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11720431/pdf/
Pub Date: 2025-02-01 | DOI: 10.1016/j.dib.2024.111244
A K M Fazlul Kobir Siam, Prayma Bishshash, Md. Asraful Sharker Nirob, Sajib Bin Mamun, Md Assaduzzaman, Sheak Rashed Haider Noori
A comprehensive dataset on lemon leaf disease can substantially benefit agricultural research and the improvement of disease management strategies. This dataset was developed from 1354 raw images taken under the guidance of professional agricultural specialists from July to September 2024 in Charpolisha, Jamalpur, and was further expanded through augmentation, adding 9000 images. The augmentation process involves a set of techniques (flipping, rotation, zooming, shifting, adding noise, shearing, and brightening) to increase the variety of lemon leaf condition representations. Each image was standardized to a resolution of 800 × 800 pixels to keep the dataset consistent. All images were labelled with one of nine predefined categories: anthracnose, bacterial blight, citrus canker, curl virus, deficiency leaf, dry leaf, healthy leaf, sooty mould, and spider mites. In the present study, a DenseNet-121 architecture was used, with 20 % of the dataset kept for validation and the remaining 80 % used for training. The model was trained for 30 epochs with a batch size of 32, achieving an accuracy of 98.56 % with augmentation and 96.19 % without it. The dataset will not only act as a benchmark for developing accurate machine learning models for early disease detection, but will also contribute to sustainable lemon cultivation practices by facilitating timely and effective disease management interventions.
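The 80 % / 20 % training and validation split described above can be sketched as a seeded random partition of the image list. This is a generic illustration, not the authors' procedure: the abstract does not say whether the split was stratified by class or how augmented images were handled, and the file names, the function name `train_val_split`, and the seed are assumptions. The count of 1354 is the number of raw images stated in the abstract.

```python
import random


def train_val_split(items, val_fraction=0.2, seed=0):
    """Shuffle a copy of items with a fixed seed and return (train, validation)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names standing in for the 1354 raw images.
images = [f"img_{i:04d}.jpg" for i in range(1354)]
train, val = train_val_split(images)
print(len(train), len(val))
```

Fixing the seed keeps the partition reproducible across runs, which matters when accuracy figures from different experiments are compared.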
{"title":"A comprehensive image dataset for the identification of lemon leaf diseases and computer vision applications","authors":"A K M Fazlul Kobir Siam, Prayma Bishshash, Md. Asraful Sharker Nirob, Sajib Bin Mamun, Md Assaduzzaman, Sheak Rashed Haider Noori","doi":"10.1016/j.dib.2024.111244","DOIUrl":"10.1016/j.dib.2024.111244","url":null,"abstract":"<div><div>A comprehensive dataset on lemon leaf disease can surely bring a lot of potentials into the development of agricultural research and the improvement of disease management strategies. This dataset was developed from 1354 raw images taken with professional agricultural specialist guidance from July to September 2024 in Charpolisha, Jamalpur, and further enhanced with augmented techniques, adding 9000 images. The augmentation process involves a set of techniques-flipping, rotation, zooming, shifting, adding noise, shearing, and brightening-to increase variety for different lemon leaf condition representations. Each of these images was standardized to 800 × 800 pixels resolution, so that consistency may be maintained among the dataset. All images were labelled in the nine prefixed categories: anthracnose, bacterial blight, citrus canker, curl virus, deficiency leaf, dry leaf, healthy leaf, sooty mould, and spider mites. In the present study, a DenseNet-121 architecture was used, where 20 % of the dataset was kept for validation and the remaining 80 % for training. A trained model with a batch size of 32 was trained for 30 epochs, achieving an accuracy of 98.56 % with augmentation, and 96.19 % without it. 
The dataset will not only act as a benchmark in developing accurate machine learning models for early disease detection, but it will also contribute to the cause of sustainable lemon cultivation practices by facilitating timely and effective disease management interventions<em>.</em></div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"58 ","pages":"Article 111244"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11732584/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142982937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-01 DOI: 10.1016/j.dib.2024.111230
Nigar Alishzade, Jamaladdin Hasanov
Advancements in sign language processing technology hinge on the availability of extensive, reliable datasets, comprehensive instructions, and adherence to ethical guidelines. To facilitate progress in gesture recognition and translation systems and to support the Azerbaijani sign language community, we present the Azerbaijani Sign Language Dataset (AzSLD). This dataset was collected from a diverse group of sign language users and covers a range of linguistic parameters. Developed within the framework of a vision-based Azerbaijani Sign Language translation project, AzSLD includes recordings of the fingerspelling alphabet, individual words, and sentences. The data acquisition process involved recording signers across various age groups, genders, and proficiency levels to ensure broad representation. Sign language sentences were captured using two cameras from different angles, providing fuller visual coverage of each gesture and supporting robust training and evaluation of gesture recognition algorithms. The dataset comprises 30,000 annotated videos, each labeled with gesture identifiers and the corresponding linguistic translations. To facilitate efficient use of the dataset, we provide technical instructions and source code for a data loader. AzSLD offers researchers and developers working on sign language recognition, translation, and synthesis systems a rich repository of labeled data for training and evaluation.
{"title":"AzSLD: Azerbaijani sign language dataset for fingerspelling, word, and sentence translation with baseline software","authors":"Nigar Alishzade , Jamaladdin Hasanov","doi":"10.1016/j.dib.2024.111230","DOIUrl":"10.1016/j.dib.2024.111230","url":null,"abstract":"<div><div>Advancements in sign language processing technology hinge on the availability of extensive, reliable datasets, comprehensive instructions, and adherence to ethical guidelines. To facilitate progress in gesture recognition and translation systems and to support the Azerbaijani sign language community we present the Azerbaijani Sign Language Dataset (AzSLD). This comprehensive dataset was collected from a diverse group of sign language users, encompassing a range of linguistic parameters. Developed within the framework of a vision-based Azerbaijani Sign Language translation project, AzSLD includes recordings of the fingerspelling alphabet, individual words, and sentences. The data acquisition process involved recording signers across various age groups, genders, and proficiency levels to ensure broad representation. Sign language sentences were captured using two cameras from different angles, providing comprehensive visual coverage of each gesture. This approach enables robust training and evaluation of gesture recognition algorithms. The dataset comprises 30,000 meticulously annotated videos, each labeled with precise gesture identifiers and corresponding linguistic translations. To facilitate efficient usage of the dataset, we provide technical instructions and source code for a data loader. 
Researchers and developers working on sign language recognition, translation, and synthesis systems will find AzSLD invaluable, as it offers a rich repository of labeled data for training and evaluation purposes.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"58 ","pages":"Article 111230"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11730573/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142982985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-01 DOI: 10.1016/j.dib.2024.111236
Juan Herrero, María Tierra, Carmen Castañeda
The dataset [1] hosts pedological information and images of the lands, locally known as chesas, of the outcropping gypsiferous core of the Barbastro-Balaguer anticline (Fig. 1). This core stands out in the landscape for its linear reliefs, which result from outcrops of dipping strata with differential resistance to erosion, and also for its whitish color (Fig. 2) and gypsophilous vegetation. The gypsum outcrop was named a gypseous belt in the 19th century [2] and has been further studied by other geologists [3,4] and by civil engineers, e.g., Hué and Llamas [5]. Traditionally, chesas were rangeland, with sparse almond and olive trees and rainfed winter cereals confined to the flat, often terraced valley bottoms, known as vales in NE Spain. The chesas have attracted the attention of botanists [6–8], foresters [9,10], and researchers of soil hydrophysical properties [11]. Moreover, public interest is increasing as the administrations establish rules for nature protection in the gypseous lands; e.g., a demarcation of 137 km² within the chesas was declared the Special Conservation Area “ES2410074 Yesos de Barbastro” and then protected under the Habitats Directive of the European Union. Plant physiologists are also focusing on the adaptations of plants to gypsum, as reviewed by Escudero et al. [12]. No soil map is available, but according to [13,14] the Gypsic Haploxerepts [15] are dominant. In the absence of a soil map, our dataset can support decisions by the authorities, such as water allocation to irrigated estates, both operating and planned, or authorizations for the spreading of pig slurry.
The soil data presented here were collected with the classical techniques of pedological prospection. The dataset [1] contains scans in .TIFF format of 150 whole thin sections of the soils, under both plane polarized light (PPL) and cross polarized light (XPL). The dataset also points to a freely downloadable book [16] with the corresponding pedological descriptions, chemical and physical analyses, hydrophysical data, and scanning electron microscope images of the soils, plus micrographs of relevant pedofeatures of thin sections seen under the petrographic microscope. In addition, the dataset [1] provides an .xlsx file with an English translation of all figure captions of [16], including those of the micrographs, and two more .xlsx files with the analytical data. All of the data can be directly reused by naturalists, engineers, technicians, and civil servants in charge of drafting and enforcing environmental law, as well as by people involved in citizen science activities. The thin sections are stored at EEAD and can be inspected at our premises on request.
{"title":"Soil data from the Barbastro-Balaguer gypsum belt, NE Spain","authors":"Juan Herrero, María Tierra, Carmen Castañeda","doi":"10.1016/j.dib.2024.111236","DOIUrl":"10.1016/j.dib.2024.111236","url":null,"abstract":"<div><div>The dataset [<span><span>1</span></span>] hosts pedological info and images of the lands —locally known as <em>chesas</em>— of the outcropping gypsiferous core of the Barbastro-Balaguer anticline (<span><span>Fig. 1</span></span>). It stands out in the landscape for the linear reliefs due to outcrops of dipping strata with differential resistance to erosion, and also because of its whitish color (<span><span>Fig. 2</span></span>) and gypsophilous vegetation. This gypsum outcrop was named in the 19<sup>th</sup> Century [<span><span>2</span></span>] as a gypseous belt, and has been further studied by other geologists like [<span><span>3</span></span>,<span><span>4</span></span>] and by civil engineers e.g. Hué and Llamas [<span><span>5</span></span>]. Traditionally chesas were rangeland, with sparse almond and olive trees and rainfed winter cereals confined at the flat —and often terraced— valley bottoms, or <em>vales</em> as known in NE Spain. The chesas have attracted the attention of botanists [<span><span>[6]</span></span>, <span><span>[7]</span></span>, <span><span>[8]</span></span>], foresters [<span><span>9</span></span>,<span><span>10</span></span>], and soil hydrophysical properties researchers [<span><span>11</span></span>]. Moreover, public interest is increasing as the administrations are establishing rules for nature protection in the gypseous lands, e.g., a demarcation of 137 km<sup>2</sup> set within the chesas was declared a Special Conservation Area “ES2410074 Yesos de Barbastro”, and then protected by the Habitats Directive of European Union. Also, plant physiologists are focusing on the adaptations of plants to gypsum as reviewed by Escudero et al. [<span><span>12</span></span>]. 
No soil map is available, but according to [<span><span>13</span></span>,<span><span>14</span></span>] the Gypsic Haploxerepts [<span><span>15</span></span>] are dominant. In the absence of a soil map, our dataset can help in the decisions to be made by the authorities, as is the case for water allocation to irrigated estates both in operation and planned, or for authorizations for the spreading of pig slurry.</div><div>The herein presented soil data were collected with the classical techniques of pedological prospection. The dataset [<span><span>1</span></span>] contains the scans in .TIFF format of 150 whole thin sections of the soils, under both plane polarized light (PPL) and cross polarized light (XPL). Moreover, this dataset directs to a freely downloadable book [<span><span>16</span></span>] with the corresponding pedological descriptions, chemical and physical analyses, hydrophysical data, and scanning electron microscope images of the soils, plus micrographs of relevant pedofeatures of thin sections seen under petrographic microscope. The dataset [<span><span>1</span></span>] also presents a .xlsx file with an English translation of all figure captions of [<span><span>16</span></span>], including those of micrographs, and two more .xls","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"58 ","pages":"Article 111236"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11731882/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142982991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}