Pub Date : 2025-02-01 DOI: 10.1016/j.dib.2025.111305
Ioannis Sifnaios, Simon Furbo, Adam R. Jensen
The pit thermal energy storage (PTES) in Høje Taastrup, Denmark, was the first large-scale PTES to be operated as a short-term storage (storage cycle of 1–2 weeks). The storage was connected to the Copenhagen district heating grid and started operating in February 2023. In addition to its unique use case, the storage represents a state-of-the-art PTES system, featuring an innovative lid construction and a custom-developed polymer liner. Monitoring data of the storage operation are provided for 2024, including measurements of the storage water temperature; charged and discharged energy; diffuser flow rates and temperatures; lid heat flux, humidity, and temperatures; soil temperature; and ambient conditions. The dataset can be used to assess the storage performance and, more importantly, to validate simulation models, which has not been done for short-term PTES systems. The data are freely available on GitHub.
Title: Monitoring data of the Høje Taastrup water pit thermal energy storage. Journal: Data in Brief, vol. 58, Article 111305.
Pub Date : 2025-02-01 DOI: 10.1016/j.dib.2024.111190
Rose Nakasi, Joyce Nakatumba Nabende, Jeremy Francis Tusubira, Aloyzius Lubowa Bamundaga, Alfred Andama
Malaria is a major public health challenge in sub-Saharan Africa. Timely and accurate diagnosis is vital to reduce the caseload and mortality rates associated with the disease. Microscopy is the gold-standard method for malaria screening recommended by the World Health Organization (WHO). In Uganda, the use of microscopy is hampered by insufficient expertise to interpret the images accurately, which affects the efficiency, effectiveness and accuracy of malaria detection and diagnosis. We present a benchmark dataset of thick and thin blood smear images for automatic malaria screening in Uganda. Mobile microscopy data were collected from Mulago Hospital, Department of Internal Medicine, Makerere University and Kiruddu National Referral Hospital in Uganda. The labelled image data can be used to build computational models implemented with convolutional neural networks. The dataset has 3000 labelled thick blood smear images and 1000 labelled thin blood smear images. The datasets will support robust and accurate deep learning models for malaria diagnosis using thick and thin blood smear images with reasonable detection accuracies.
Title: A dataset of blood slide images for AI-based diagnosis of malaria. Journal: Data in Brief, vol. 58, Article 111190. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11719325/pdf/
Pub Date : 2025-02-01 DOI: 10.1016/j.dib.2024.111164
M. Torres-Miralles, P. Jeanneret, M. Lamminen, F. Joly, B. Dumont, H. Tuomisto, I. Herzon
High Nature Value (HNV) farming systems occur in areas where the major land use is agriculture and are characterized by their significance in promoting biodiversity and ecosystem services due to their extensive land use. Despite their importance for the ecological and socio-economic resilience of rural regions, these systems are often overlooked in Life Cycle Assessment (LCA) studies because of challenges in data compilation, especially from small local farms, and because of the diversity of production. To address this gap, we established an international collaborative network across Europe, involving professionals directly engaged with farmers, farmer associations, and researchers, to collect data on HNV farms using a questionnaire developed to examine inputs and outputs, farm structures, and herd characteristics. Our dataset includes 41 farms and covers five European countries (Spain, France, Greece, Estonia, and Finland) spanning three bioregions: Mediterranean, Atlantic, and Boreal. The data, anonymised and integrated into a matrix, focus on environmental impact indicators such as greenhouse gas emissions (GHGs), biodiversity, land and water use, and fossil resource scarcity. We applied LCA using analytical tools such as the European Carbon Calculator (Joint Research Centre of the European Commission), OpenLCA 10.4, and the SALCA-BD expert system. Additionally, we used the LCA inventory database Agribalyse 3.0 to estimate the environmental footprint of four pivotal HNV products: goat cheese, cow milk, lamb, and beef. The main outcome is a unique and novel dataset for HNV farming systems, addressing critical gaps in available information. Our primary objective is to facilitate further investigations, empowering other researchers to expand and enhance their understanding of the environmental impact associated with HNV farming systems, and to draw attention to the potential role of HNV farming systems in the transition towards more sustainable food production and consumption.
Title: High nature value farming systems in Europe: A dataset encompassing the environmental impact assessment of farms and extensive ruminant food products. Journal: Data in Brief, vol. 58, Article 111164. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11719277/pdf/
To determine the nature of the original chemical precipitate and of the ocean during the emplacement of BIF in the northern part of the Congo Craton, data from the Njweng prospect anomaly were used. During field mapping and ground-truthing of this anomaly, samples were collected, and the geochemical data of 26 representative samples constitute the core of this contribution. The samples selected for geochemical study represent the various facies: oxide facies BIF composed of magnetite, hematite/martite, and quartz; and silicate facies BIF composed of quartz, magnetite, hematite/martite, and goethite. The ore (≥75 wt% Fe) is a hematite-martite-goethite assemblage. The BIF show characteristic banding and no association with volcanic activity. Chemical analysis was carried out using Inductively Coupled Plasma Emission Spectrometry (ICP-ES) for the major elements and Inductively Coupled Plasma Mass Spectrometry (ICP-MS) for the trace elements. The BIF have Fe2O3(T) contents ranging from 47.2 to 88.5 wt% and SiO2 contents from 1.87 to 49.28 wt%, with the low silica contents occurring in the ore. The BIF show average contents of 41.20 wt% SiO2 and 53.9 wt% Fe2O3. Both the oxide and silicate facies BIF samples show modern seawater characteristics (depletion in LREEY relative to HREEY, positive Eu (Eu/Eu*) and Y anomalies, and superchondritic Y/Ho ratios). The BIF show low Al2O3, TiO2, Na2O, K2O, HFSE, and Nd concentrations. The average concentrations of V, Ni, and Cu in the BIF are low.
Title: Data on compositional diversity of textural varieties of lake superior-type Banded Iron Formation (BIF) of the Njweng prospect, Mbalam iron ore district, southern Cameroon. Authors: Dieudonne Charles Isidore Ilouga, Cheo Emmanuel Suh, Akumbom Vishiti, Elisha Muntum Shemang. Pub Date: 2025-02-01. DOI: 10.1016/j.dib.2024.111253. Journal: Data in Brief, vol. 58, Article 111253. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11783050/pdf/
Despite significant research on the Bangla language in Natural Language Processing (NLP), there remains a notable resource deficit for its diverse regional dialects, such as those spoken in Chittagong, Sylhet, and Barisal. These dialects, often considered unintelligible to speakers of Standard Bengali, pose challenges due to their unique grammatical structures and phonetic variations. Some linguists categorize them as distinct languages. To address this, we present ONUBAD, a large and freely available dataset for the automatic translation of the Chittagong, Sylhet, and Barisal dialects into Standard Bangla using a Neural Machine Translation (NMT) system. ONUBAD provides a parallel corpus of 1540 words, 130 clauses, and 980 sentences per regional dialect, together with their standard counterparts and English translations. The dataset includes metadata on phonetic variations and grammatical features, aiming to bridge the gap between standard and non-standard forms of Bangla. It serves as a valuable resource for researchers in NLP, dialect studies, and linguistic preservation, helping to develop more accurate and contextually relevant translation models. The dataset was collected between July and September 2024 from diverse sources such as books, websites, and regional people, with the help of regional dialect specialists. It is hosted by the Department of Computer Science and Engineering, Jahangirnagar University, and is freely accessible at https://data.mendeley.com/datasets/6ft99kf89b/2
Title: ONUBAD: A comprehensive dataset for automated conversion of Bangla regional dialects into standard Bengali dialect. Authors: Nusrat Sultana, Rumana Yasmin, Bijon Mallik, Mohammad Shorif Uddin. Pub Date: 2025-02-01. DOI: 10.1016/j.dib.2025.111276. Journal: Data in Brief, vol. 58, Article 111276. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11787450/pdf/
Pub Date : 2025-02-01 DOI: 10.1016/j.dib.2024.111234
Grzegorz Orłowski, Lucyna Hałupka, Przemysław Pokorny, Bartosz Borczyk, Tomasz Skawiński, Wojciech Dobicki
The dataset presented in this data paper supports “The prenatal assimilation of minerals and metals in the nestlings of a small passerine bird” (Orłowski et al. 2024) [1]. The article includes raw data on dead nestlings of a small passerine bird, the Eurasian Reed Warbler Acrocephalus scirpaceus, breeding in an extensive reedbed (dominated by the Common Reed Phragmites australis) located in an intensively fertilized fishpond habitat, the Stawy Milickie [Milicz Ponds] Nature Reserve (SW Poland). The data describe the concentrations of Cu, Ni, Cd, Pb, Zn, Ag, Mg, Fe, Co and Ca measured in the isolated, emptied gastrointestinal tract, the whole body, and the carcass of each of 26 individual nestlings of different ages (1–9 days old) and hence at different stages of post-natal development. The dataset also includes some additional information on the breeding biology of the focal species.
Title: Supporting dataset on the content of Cu, Ni Cd, Pb, Zn, Ag, Mg, Fe, Co and Ca in the carcass, gastrointestinal tract tissues and the whole body of nestlings of a small passerine bird, the Eurasian Reed Warbler Acrocephalus scirpaceus from an intensively fertilized fishpond habitat. Journal: Data in Brief, vol. 58, Article 111234. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11731974/pdf/
Pub Date : 2025-02-01 DOI: 10.1016/j.dib.2024.111223
Luigi Santopietro, Filomena Pietrapertosa, Angela Pilogallo, Monica Salvia
This data article provides a comprehensive description of climate change mitigation and adaptation policies implemented by 21 Italian regions (NUTS2 level) as of January 2024. It was developed as part of a wider research work published by the authors [2].
The dataset collects information on the efforts the regions are making to tackle the climate crisis. In particular, it contains a collection of regional climate plans (RCPs) and a catalogue of their contents analysed with regard to objectives, planned actions and monitoring and evaluation indicators.
To complete the regional framework, the dataset also provides an overview of the socio-economic data for the Italian regions, derived from EUROSTAT (as of 2023), and the climate indicators from the Italian CIRO (Climate Indicators for Italian Regions) database, updated to 2021. Regional sustainable development strategies were also examined for consistency with climate planning and climate emergency declarations.
Moreover, specific data are presented on the regions' participation in transnational networks and initiatives, as well as references to the climate legislation currently in force (January 2024).
Title: Current efforts in regional climate planning: A dataset from Italian NUTS2 regions. Journal: Data in Brief, vol. 58, Article 111223. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11731768/pdf/
Pub Date : 2025-02-01 DOI: 10.1016/j.dib.2024.111243
Gentil A. Collazos-Escobar, Andrés F. Bahamón-Monje, Nelson Gutiérrez-Guzmán
This paper presents a comprehensive dataset of mid-infrared spectra for dried and roasted cocoa beans (Theobroma cacao L.), along with their corresponding theobromine and caffeine content. Infrared data were acquired using Attenuated Total Reflectance-Fourier Transform Infrared (ATR-FTIR) spectroscopy, while High-Performance Liquid Chromatography (HPLC) was employed to accurately quantify theobromine and caffeine in the dried cocoa beans. The theobromine/caffeine relationship served as a robust chemical marker for distinguishing between different cocoa varieties. This dataset provides a basis for further research, enabling the integration of mid-infrared spectral data with HPLC (as a standard) to fine-tune machine learning and deep learning models that could be used to simultaneously predict the theobromine and caffeine content, as well as cocoa variety in both dried and roasted cocoa samples using a non-destructive approach based on spectral data. The tools developed from this dataset could significantly advance automated processes in the cocoa industry and support decision-making on an industrial scale, facilitating real-time quality control of cocoa-based products, improving cocoa variety classification, and optimizing bean selection, blending strategies, and product formulation, while reducing the need for labor-intensive and costly quantification methods. The dataset is organized into Excel sheets and structured according to experimental conditions and replicates, providing a valuable framework for further analysis, model development, and calibration of multivariate statistical models.
Title: Mid-infrared spectra of dried and roasted cocoa (Theobroma cacao L.): A dataset for machine learning-based classification of cocoa varieties and prediction of theobromine and caffeine content. Journal: Data in Brief, vol. 58, Article 111243. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11748727/pdf/
Radishes, which are common root vegetables, are rich in vitamins and minerals and low in calories. This vegetable is known for its rapid growth. Nevertheless, a variety of bacterial and fungal leaf diseases can hinder the healthy growth of radish. Furthermore, there is a high risk of inaccurate identification if farmers rely on traditional methods to recognize these diseases. To support precise identification of radish leaf diseases, a total of 2801 images of radish leaves were collected from a vegetable field in Bangladesh. The dataset includes images of healthy leaves as well as four classes of affected leaves: Black Leaf Spot, Downy Mildew, Flea Beetle, and Mosaic. Using this dataset, deep learning models can be trained to identify these leaf diseases, helping to detect them early and reduce the harm to radish cultivation. By enabling accurate identification of diseases on radish leaves and supporting healthy radish production, this dataset contributes to broader sustainability in the agricultural sector.
{"title":"Smartphone image dataset for radish plant leaf disease classification from Bangladesh","authors":"Mahamudul Hasan, Raiyan Gani, Mohammad Rifat Ahmmad Rashid, Maherun Nessa Isty, Raka Kamara, Taslima Khan Tarin","doi":"10.1016/j.dib.2024.111263","DOIUrl":"10.1016/j.dib.2024.111263","url":null,"abstract":"<div><div>Radishes, which are common root vegetables, are rich in vitamins and minerals, and contain low calories. This vegetable is known for its rapid growth. Nevertheless, the variety of leaf diseases where leaves get affected by various bacterial and fungal diseases can hinder the healthy growth of radish. Furthermore, there is a high risk of inaccurate identification of diseases if the farmers try to use traditional methods in recognizing these diseases. With the purpose of precise identification of radish leaf diseases for the finest growth of this vegetable, total of 2801 images of the radish leaves are collected from vegetable field in Bangladesh. The collected dataset includes comprehensive images of healthy leaves as well as four types of leaf affected by various diseases such as Black Leaf Spot, Downey Mildew, Flea Beetle and Mosaic. Utilizing this robust dataset, deep learning models can be trained to identify the leaf diseases which helps to detect the diseases in order to reduce the harm of the cultivation of radish. 
By identifying the diseases on radish leaves accurately and maintaining healthy production of radish, this dataset contributes to the broader sustainability in the agricultural sector.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"58 ","pages":"Article 111263"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11754492/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143028017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-01 DOI: 10.1016/j.dib.2025.111275
Nele Graubner, Johannes Schmidt
This data set includes a spatial model of the thickness and distribution of fine-grained floodplain deposits in the Leipzig floodplain area. The data set originates from borehole records provided by the Saxon State Office for Environment, Agriculture, and Geology [1]. Data processing involved categorizing the stratigraphic descriptions of the borehole logs into 6 broad classes (sand, gravel, clay, anthropogenic sediments, fine-grained/organic sediments, and others) with 33 sub-categories. Subsequently, the stratigraphic layers were analysed to determine the depth and thickness of the fine-grained floodplain deposits, as well as the distribution of anthropogenic material. The data set was filtered so that each retained borehole log has at least one clayey layer and a gravel layer at least 0.7 m thick, and was then interpolated to produce a complete spatial model for the research area. The final data set is based on 3,414 data points within the Leipzig floodplain (data collection covers the period 1852 to 2018). It offers a significant resource for future interdisciplinary research into the natural and anthropogenic history of Leipzig's floodplains, providing valuable information for more detailed analyses and more precise modelling of fine-grained floodplain deposit distribution in the area.
{"title":"Spatial distribution of fine-grained floodplain deposits and anthropogenic materials based on official borehole data in the floodplain of Leipzig, Germany","authors":"Nele Graubner , Johannes Schmidt","doi":"10.1016/j.dib.2025.111275","DOIUrl":"10.1016/j.dib.2025.111275","url":null,"abstract":"<div><div>This data set includes the spatial model of the thickness and distribution of fine-grained floodplain deposits in the Leipzig floodplain area. The data set originates from borehole records provided by the Saxon State Office for Environment, Agriculture, and Geology [1]. The data processing involved the categorization of the stratigraphic descriptions of the borehole logs. For that, a methodology was implemented to categorize those into 6 broader classifications (sand, gravel, clay, anthropogenic sediments, fine-grained/organic sediments and others) with 33 sub-categories. Subsequently, the stratigraphic layers were analysed to determine the depth and thickness of the fine-grained floodplain deposits, as well as the distribution of anthropogenic material. The data set was filtered, with the condition that each borehole log has at least one clayey layer and a gravel layer of at least 0.7 m thickness and, later, interpolated to present a complete spatial model for the research area. 
The final data set is based on 3,414 data points (data collection covers the period: 1852 to 2018) within the Leipzig floodplain and offers significant resource for future interdisciplinary research into the natural and anthropogenic history of the Leipzig's floodplains, offering valuable information for more detailed analyses and more precise modelling of fine-grained floodplain deposit distribution in the Leipzig floodplain area.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"58 ","pages":"Article 111275"},"PeriodicalIF":1.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11772144/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143058373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
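The borehole filtering condition described in the Leipzig floodplain abstract (at least one clayey layer and a gravel layer of at least 0.7 m thickness) can be illustrated with a minimal Python sketch. The `Layer` class, the category strings, the `keep_borehole` helper, and the sample logs are all hypothetical and are not taken from the published data set or the authors' processing code. In particular, the sketch assumes that a "clayey layer" corresponds to the broad class "clay".

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """One stratigraphic layer of a borehole log (hypothetical structure)."""
    category: str    # broad class, e.g. "sand", "gravel", "clay"
    top_m: float     # depth of the layer top below surface, in metres
    bottom_m: float  # depth of the layer base below surface, in metres

    @property
    def thickness_m(self) -> float:
        return self.bottom_m - self.top_m


# Threshold stated in the abstract.
MIN_GRAVEL_THICKNESS_M = 0.7


def keep_borehole(layers: list[Layer]) -> bool:
    """Return True if the log has a clay layer and a sufficiently thick gravel layer."""
    # Assumption: "clayey layer" is represented by the broad class "clay".
    has_clay = any(layer.category == "clay" for layer in layers)
    has_thick_gravel = any(
        layer.category == "gravel" and layer.thickness_m >= MIN_GRAVEL_THICKNESS_M
        for layer in layers
    )
    return has_clay and has_thick_gravel


# Invented sample logs, for illustration only.
logs = {
    "A": [Layer("clay", 0.0, 1.0), Layer("gravel", 1.0, 3.0)],  # kept
    "B": [Layer("clay", 0.0, 1.0), Layer("gravel", 1.0, 1.5)],  # gravel too thin
    "C": [Layer("sand", 0.0, 2.0), Layer("gravel", 2.0, 4.0)],  # no clay layer
}

kept = [name for name, layers in logs.items() if keep_borehole(layers)]
print(kept)
```

Only the logs that pass this check would then feed into the interpolation step mentioned in the abstract, which is not sketched here.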