Pub Date: 2026-01-08 | DOI: 10.1016/j.dib.2026.112456
Marie-Liesse Vermeire , Pathé Basse , Samuel Legros , Falilou Diallo , Anne Desnues , Frédéric Feder
Recycling the growing stock of organic waste products (OWP) from cities, factories, and farms is a key challenge for sustainable agriculture. However, it must be done with awareness of both performance and potential long-term environmental and health risks. In this context, the SOERE PRO observatory was established; it includes the trial in Sangalkam, in the Dakar region of Senegal, where these data are collected. SOERE PRO stands for "Systèmes d'Observation et d'Expérimentation pour la Recherche en Environnement - Produits Résiduaires Organiques", which translates to "Long-term Observation and Experimentation Systems for Environmental Research - Organic Waste Products"; it is a label granted by the French National Research Alliance for the Environment (AllEnvi) to recognize high-quality research infrastructures. Since 2016, four fertilizer types - one mineral (synthetic) and three organic - have been applied annually to three successive vegetable crops (tomato, lettuce, carrot). The dataset currently covers the period 2016-2025; data collection is ongoing and new data will be added in the future. Manual weeding and hoeing are carried out regularly for each crop, and no pesticides are used for crop protection in the trial. A comprehensive, multi-variable dataset is consistently documented, including soil physico-chemical parameters measured annually at three depths, organic waste product characterization, crop yield and quality parameters, and detailed management activities, making it particularly suitable for process-based modelling and long-term impact assessment. The originality of this dataset lies in its long duration, the diversity of organic and mineral fertilization strategies, the inclusion of multiple vegetable crops per year, and its location under Sub-Sahelian conditions, a context for which long-term agronomic datasets remain scarce. All soil, OWP, and vegetable samples are stored in a sample bank in Dakar and are available for additional analyses.
The objective of this dataset is to provide long-term, integrated information on crop productivity, crop quality, and soil responses to repeated organic and mineral fertilization in a Sub-Sahelian market-gardening system. The dataset is publicly available through a Dataverse repository for free (re)use in meta-analyses, process-based modelling, and environmental studies, notably to improve understanding of nutrient cycling, contaminant dynamics, soil biodiversity, and long-term soil functioning in Sub-Sahelian agroecosystems, and to support sustainable land management and food security in Southern countries under future climate change.
Title: Soil and crop data from a long-term organic fertilization trial in Sub-Sahelian market gardening. Journal: Data in Brief, volume 65, Article 112456.
Pub Date: 2026-01-08 | DOI: 10.1016/j.dib.2026.112450
Xinchao Song , Mingjun Li , Sean Banerjee , Natasha Kholgade Banerjee
We present the HILO dataset, consisting of high-resolution 3D scanned models of 253 common-use objects and 32,256 multi-viewpoint RGB-D images, typically of low resolution, of 144 tabletop scenes. Each scene is a collection of 10 objects drawn at random from the set of 253 objects. The dataset provides the six-degree-of-freedom (6DOF) pose of every object found in each of the 32,256 RGB-D images, obtained by precise 3D alignment of the 3D models to the RGB-D images. The dataset also contains per-object metadata (mass, a short text descriptor, everyday-use class, and aspect-ratio and function categories), intrinsic parameters for the RGB-D sensors used in capture, and transformations between camera poses. Object 3D models were acquired with a tabletop 3D scanner, then manually inspected, cleaned, repaired, and exported both as original ultra-high-resolution meshes (∼1M vertices) and as simplified high-resolution meshes (∼10k vertices). To capture the multi-view RGB-D images, we established an in-house testbed consisting of a turntable, which covers azimuth angles, and two robotic manipulators, which cover elevation angles, so that the viewpoints span a hemisphere. Images were captured using two Microsoft Azure Kinect sensors mounted at the wrists of the robots, one per robot. We captured images at two distances, forming two hemispherical shells. We used in-house software written in Python to control the turntable movement, robot motion, and image capture, as well as to perform camera calibration, processing to generate registered images and foreground masks, manual precise alignment of object models to images, and post-capture correction of misalignments in camera transformation parameters.
The dataset provides value in enabling training and evaluation of algorithms for several tasks in computer vision, artificial intelligence (AI), and robotics such as object completion, recognition, segmentation, high-resolution structure generation, robotic grasp planning, and recognition of human-preferred grasp locations for human-robot collaboration.
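As a rough illustration of how a per-image 6DOF object pose can be combined with a transformation between camera poses, the sketch below chains two rigid transforms written as 4x4 homogeneous matrices. The matrix convention, the function names, and the numeric values are assumptions made for this example; they are not taken from the HILO dataset, whose file format is not described here.

```python
import math

def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rot_z(deg, tx=0.0, ty=0.0, tz=0.0):
    """Rigid transform: rotation about the z axis by `deg` degrees plus a translation."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def invert_rigid(t):
    """Invert a rigid transform using R^T and -R^T * translation."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

# Hypothetical values: object pose seen from camera 0, and the transform
# that maps camera-0 coordinates into camera-1 coordinates.
T_cam0_obj = rot_z(30, 0.1, 0.0, 0.8)
T_cam1_cam0 = rot_z(90)

# Pose of the same object expressed in camera 1.
T_cam1_obj = matmul4(T_cam1_cam0, T_cam0_obj)
```

Under this convention, a pose annotated in one view can be carried to any other view of the same scene, and `invert_rigid` gives the reverse mapping.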
Title: Dataset of RGB-D images of object collections from multiple viewpoints with aligned high-resolution 3D models of objects. Journal: Data in Brief, volume 65, Article 112450.
Pub Date: 2026-01-07 | eCollection Date: 2026-02-01 | DOI: 10.1016/j.dib.2026.112451
Brittany Antonczak, Meg Fay, Aviral Chawla, Gregory Rowangould
The Highway Performance Monitoring System, managed by the Federal Highway Administration, provides data on average annual daily traffic volume across roadways in the United States, but it has limited representation of medium- and heavy-duty vehicle traffic on lower-volume roadways that are not part of the national highway system. This gap limits research and policy analysis on the community impacts of truck traffic, especially concerning air quality and public health. To address this, we use Random Forest Regression to estimate medium- and heavy-duty vehicle traffic volumes on network links where these data are missing. The result is a comprehensive vehicle traffic dataset that covers 85.2% of public roadways in the United States. From these data, we also calculate traffic density values for each census block and vehicle class that can serve as a high-resolution surrogate for traffic-related air pollution exposure in public health studies and policy analysis. Our high-resolution spatial data products are rigorously validated and provide a more complete representation of truck traffic than any existing publicly available dataset. These datasets are valuable for transportation planning, public health research, and policy decisions aimed at understanding and mitigating the effects of truck traffic on communities that are disproportionately exposed to air pollution from vehicle traffic.
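The abstract does not state how traffic density is computed per census block and vehicle class. The sketch below assumes one common definition, daily vehicle-kilometres travelled within a block divided by the block area, purely for illustration. The tuple layout, block identifiers, and numbers are hypothetical.

```python
from collections import defaultdict

def traffic_density(segments, block_area_km2):
    """Return {(block_id, vehicle_class): vehicle-km per day per km^2}.

    `segments` holds (block_id, vehicle_class, aadt, length_km) tuples, where
    length_km is the part of the road link that lies inside the block.
    This definition is an assumption, not the authors' published method.
    """
    vkt = defaultdict(float)
    for block_id, vehicle_class, aadt, length_km in segments:
        vkt[(block_id, vehicle_class)] += aadt * length_km
    return {key: total / block_area_km2[key[0]] for key, total in vkt.items()}

# Hypothetical block with two truck links and one passenger-vehicle link.
segments = [
    ("B1", "truck", 1200, 0.5),
    ("B1", "truck", 300, 1.0),
    ("B1", "passenger", 8000, 0.5),
]
density = traffic_density(segments, {"B1": 2.0})
```

Keeping vehicle class in the key preserves the per-class breakdown that the abstract describes.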
Title: Comprehensive and spatially detailed passenger vehicle and truck traffic volume data for the United States estimated by machine learning. Journal: Data in Brief, volume 64, Article 112451. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12855594/pdf/
Pub Date: 2026-01-07 | DOI: 10.1016/j.dib.2025.112432
Nicole Nawrot , Jacek Kluska
Cultivating the energy crop Miscanthus × giganteus (M×g) on marginal soil supports phytoattenuation and provides high-energy biomass for biofuel production. Improving nutrient-poor soil with low-cost recovered organic amendments, such as spent coffee grounds (SCG) and SCG-derived biochar (BC), offers sustainable benefits. This data article presents the findings of a medium-term greenhouse experiment at the Gdansk University of Technology assessing M×g cultivation on marginal soil amended with SCG and BC. In a pot-scale experiment, the medium-term effects on M×g biomass growth, photosynthesis parameters, root tissue development, and final elemental composition were examined. Soil pH and elemental composition were also determined. As global coffee consumption increases, large quantities of SCG are generated and often landfilled. Their beneficial reuse aligns with circular economy principles and Sustainable Development Goals (SDGs 7 and 13), providing both a short-term nutrient source and a means of improving soil quality and resilience. The article compiles five datasets detailing: (1) M×g growth parameters, tissue development, and photosynthetic indices; (2) nutrient and caffeine leaching behaviour; and (3) elemental composition of plants and soils following exposure. These datasets, available in the Bridge of Knowledge Gdansk University of Technology repository, provide a resource for environmental researchers, soil and plant scientists, biochar specialists, and decision-makers working to restore marginal soil usability. This study promotes sustainable land management by demonstrating how organic wastes and biochar can be combined to improve crop performance, sequester carbon, and reduce nutrient losses while minimizing external fertilizer inputs.
Title: Effects of raw and thermally processed spent coffee grounds on Miscanthus × giganteus plantation: Data description. Journal: Data in Brief, volume 64, Article 112432.
Pub Date: 2026-01-07 | DOI: 10.1016/j.dib.2025.112434
Camille Marchal , Damien Ballan , Sarra Azib , Morgane Innocent , Bertrand Urien , Annick Tamaro , Marine Le Gall-Ely , Emmanuel Coton , Adeline Picot , Jérôme Mounier , Louis Coroller , Patrick Gabriel
Fresh fruits and vegetables (FFV) represent the largest part of food waste at the consumer level. This waste results directly from the physiological and microbiological spoilage of FFV, which is itself closely linked to behavioural factors such as consumer practices (purchase, storage and hygiene) and consumers’ perceptions of spoilage. Using a dual approach combining microbiological and behavioural sciences, we examined the links among three elements for 49 volunteer French households: the FFV waste they produced, measured using connected bins; the microbial ecology of their storage compartments, characterized using culture-dependent and -independent approaches; and their consumer behaviour and cleaning and storage practices, documented through in-depth interviews and a dedicated survey. An exploratory qualitative survey of 17 individuals, followed by two quantitative data collections from representative samples of 1048 and 815 French consumers, enabled us to identify anti-FFV waste practices and to cluster consumers according to their anti-FFV waste behaviours. The spoilage dynamics of commonly consumed FFV were also assessed under controlled conditions as a function of storage temperature, microbial contamination level and the presence or absence of surface wounds. This citizen-science-based dataset covers a wide array of microbiological and behavioural factors related to domestic FFV waste, as well as real measurements of waste volumes made possible by the innovative use of connected bins. Altogether, these data could inform more effective and accessible guidelines for reducing FFV waste at the consumer level, and could thus contribute to reducing global food waste and its related costs.
Title: Participatory and multi-disciplinary science dataset and surveys for the assessment of the microbiological and behavioural factors influencing fresh fruits and vegetables' waste at home. Journal: Data in Brief, volume 65, Article 112434.
Pub Date: 2026-01-07 | DOI: 10.1016/j.dib.2025.112444
Utshob Sutradhar , Priyankar Biswas , Sumon Hossain , A.T.M. Saiful Islam , Shuvo Dev
This paper presents a dataset on electrical power collected from a university campus in Bangladesh. It is intended to support research on energy forecasting in university settings. The dataset contains hourly measurements of system voltage, three-phase currents (R, Y, B), and power factor (pf), recorded at the campus substation. Data were collected under different operational conditions, including academic periods and vacations. This provides insights into load behaviour, changes in power factor, and phase imbalance patterns in an educational setting. The dataset supports the creation and assessment of models for load forecasting, anomaly detection, and power efficiency improvement. It was also combined with weather data to aid research on weather-aware load forecasting. The weather parameters include temperature, humidity, precipitation, wind speed, and solar radiation. Weather values are time-aligned with the energy values and were gathered at hourly and daily resolution. This dataset is especially useful for researchers studying how artificial intelligence and machine learning can be applied to electrical energy management. The dataset also includes contextual notes, such as reduced load during national holidays, which makes it more useful for event-aware forecasting studies. Released as open access, the dataset helps fill the gap in publicly available electrical load data from educational institutions in developing countries. This supports reproducible research and sustainable energy management on campus.
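To illustrate how the recorded quantities relate to load and phase imbalance, the sketch below derives real power and a current-unbalance percentage from one hourly record. It assumes the recorded voltage is line-to-line and that a single power factor applies to all three phases; the abstract confirms neither, and the field names and values are hypothetical.

```python
import math

def real_power_kw(v_line, i_r, i_y, i_b, pf):
    """Three-phase real power in kW: sqrt(3) * V_LL * mean phase current * pf.

    Assumes v_line is the line-to-line voltage in volts and currents are in amperes.
    """
    i_mean = (i_r + i_y + i_b) / 3.0
    return math.sqrt(3) * v_line * i_mean * pf / 1000.0

def current_unbalance_pct(i_r, i_y, i_b):
    """Largest deviation from the mean phase current, as a percentage of the mean."""
    i_mean = (i_r + i_y + i_b) / 3.0
    return max(abs(i - i_mean) for i in (i_r, i_y, i_b)) / i_mean * 100.0

# Hypothetical hourly record.
record = {"voltage": 400.0, "i_r": 100.0, "i_y": 110.0, "i_b": 90.0, "pf": 0.9}
power = real_power_kw(record["voltage"], record["i_r"], record["i_y"],
                      record["i_b"], record["pf"])
unbalance = current_unbalance_pct(record["i_r"], record["i_y"], record["i_b"])
```

A derived power series of this kind is a typical target for load forecasting, and the unbalance percentage is one simple way to track phase imbalance over time.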
Title: UniEload: Electrical load dataset for energy forecasting applications at public universities in Bangladesh. Journal: Data in Brief, volume 65, Article 112444.
Pub Date: 2026-01-07 | DOI: 10.1016/j.dib.2025.112443
Claudia Moricca , Erasmo Di Fonso , Rachele Nicolini , Laura Sadori
Here we present a coherent, georeferenced and chronologically qualified corpus of fossil plant remains compiled from published archaeobotanical records from archaeological sites in Central Italy, focused on Olea europaea (olive) and Vitis vinifera (grape). The dataset is entirely based on secondary data and does not include newly generated primary archaeobotanical analyses. It integrates sites, contexts and all relevant archaeobotanical occurrences within a coherent relational and spatial model. The corpus was initiated through a structured bibliographic survey aided by the BRAIN database. Only published literature was consulted, which allowed us to model archaeological sites and link them to excavation contexts and individual archaeobotanical occurrences (defined as the combination of a taxon and the specific plant part recovered, e.g., fruit, seed, rachis). The geodatabase was implemented in QGIS with a local GeoPackage backend, then migrated to PostgreSQL/PostGIS to support complex spatial and relational queries and future online outputs. All entities have a defined spatial placement accompanied by explicit quality-control parameters documenting positional uncertainty, source type and authority, as derived from the original published sources, ensuring transparent assessment of locational reliability. To enrich taxonomic information, an automated open thesaurus was built from CC BY/CC BY-SA resources (Floritaly, Acta Plantarum, and Wikimedia projects). The workflow employs REST-style access (or form-equivalent submissions), conservative rate-limiting, randomized waits, retries, and checkpoints; provenance and attribution (including noted transformations) are preserved. A standardized chronological table harmonizes relative cultural phases using ICCD nomenclature, with controlled fallbacks to Perio.do or peer-reviewed literature; a self-referential hierarchy (parent_id) ensures inheritance from sub-phase to broader period.
Crucially, the use of open licenses, stable identifiers and cross-references makes the dataset interoperable and interlinked with the source ecosystems from which the secondary archaeobotanical data were extracted: records can resolve back to Floritaly and Acta Plantarum, and our forthcoming web portal can expose these connections for bidirectional navigation, automated updating and external reuse. The result is an interoperable, verifiable resource suitable for spatial and temporal analyses of plant remains based on aggregated and standardized published archaeobotanical data, while remaining legally reusable under the original licenses.
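The self-referential chronological hierarchy mentioned in this abstract can be pictured with a small sketch: each phase row points to its parent through `parent_id`, so a record dated to a sub-phase also belongs to every broader period above it. Only the `parent_id` column name comes from the abstract. The phase names, identifiers, and the `lineage` helper are hypothetical and do not reproduce the authors' PostgreSQL/PostGIS schema.

```python
def lineage(phase_id, phases):
    """Return phase names from the given sub-phase up to the broadest period."""
    chain, seen = [], set()
    current = phase_id
    while current is not None:
        if current in seen:
            # Guard against a malformed table in which parent_id links form a loop.
            raise ValueError(f"cycle detected at phase {current}")
        seen.add(current)
        chain.append(phases[current]["name"])
        current = phases[current]["parent_id"]
    return chain

# Hypothetical rows of a chronological table keyed by id.
phases = {
    1: {"name": "Roman period", "parent_id": None},
    2: {"name": "Imperial period", "parent_id": 1},
    3: {"name": "Early Imperial period", "parent_id": 2},
}

# An occurrence dated to phase 3 inherits membership in phases 2 and 1.
inherited = lineage(3, phases)
```

In a relational database the same walk is usually written as a recursive query; the loop above only shows the inheritance logic.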
Title: A georeferenced dataset of archaeobotanical findings of Olea europaea and Vitis vinifera compiled from published records from Central Italy. Journal: Data in Brief, volume 64, Article 112443.
Pub Date : 2026-01-07 DOI: 10.1016/j.dib.2025.112445
Trang Thu Tran , Huyen Minh Thi Ta , Duc Hoang Le , Duong Huy Nguyen , Nam Trung Nguyen
This dataset presents RNA sequencing (RNA-seq) data from RAW264.7 murine macrophages pretreated with 9-methoxycanthin-6-one, a canthin-6-one–type alkaloid isolated from Eurycoma longifolia Jack, and subsequently stimulated with polyinosinic:polycytidylic acid [poly(I:C)], a synthetic double-stranded RNA analog that activates TLR3-mediated antiviral signaling. RAW264.7 cells were pretreated with 9-methoxycanthin-6-one (30 µM) for 30 min and then exposed to poly(I:C) (20 µg/mL) for 6 h. Total RNA was extracted, quality-checked, and sequenced on the Illumina platform to generate paired-end reads. Differential expression analysis and functional annotation were performed to profile genes responsive to 9-methoxycanthin-6-one treatment under poly(I:C) stimulation. The dataset includes normalized expression matrices, lists of upregulated and downregulated genes, and pathway enrichment outputs in standard formats. These data provide a reference resource for understanding the transcriptomic responses of macrophages to natural alkaloid treatment during viral-mimetic immune activation. The dataset can be reused to compare host antiviral transcriptional responses across TLR3-related pathways, to evaluate macrophage activation markers, or to be integrated with data on other bioactive compounds from E. longifolia.
Title: Transcriptomic dataset of RAW264.7 murine macrophages pretreated with 9-methoxycanthin-6-one under poly(I:C)-TLR3 stimulation. Journal: Data in Brief, vol. 65, Article 112445.
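As a rough illustration of how the up- and downregulated gene lists mentioned in the abstract relate to a normalized expression matrix, the sketch below computes a log2 fold change per gene and splits genes by a threshold. This is not the authors' pipeline: the abstract does not name the tool or cut-offs used, so the pseudocount, the threshold of 1.0, the function names, and the example values are all assumptions made for illustration.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Log2 ratio of treated to control expression.

    The pseudocount (an assumption) avoids division by zero for
    genes with no detected expression.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


def split_regulated(expr, threshold=1.0):
    """Split genes into up- and downregulated lists.

    expr maps gene name -> (control_mean, treated_mean) of normalized
    expression. threshold is a hypothetical |log2FC| cut-off.
    """
    up, down = [], []
    for gene, (control, treated) in expr.items():
        lfc = log2_fold_change(treated, control)
        if lfc >= threshold:
            up.append(gene)
        elif lfc <= -threshold:
            down.append(gene)
    return sorted(up), sorted(down)


# Made-up values purely for illustration; not taken from the dataset.
example = {
    "GeneA": (10.0, 50.0),
    "GeneB": (40.0, 8.0),
    "GeneC": (20.0, 22.0),
}
up_genes, down_genes = split_regulated(example)
```

A real analysis would also need a statistical test with multiple-testing correction across replicates; a fold-change filter alone only ranks effect size.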
Pub Date : 2026-01-07 DOI: 10.1016/j.dib.2025.112436
Luis Ávila Calderón , Sina Schriever , Ying Han , Jürgen Olbricht , Pedro Dolabella Portella , Birgit Skrotzki
The article presents creep data for the single-crystal, [001]-oriented nickel-based superalloy CMSX-6, tested at a temperature of 980 °C under initial stresses ranging from 140 MPa to 230 MPa. The constant-load creep experiments were performed in accordance with the DIN EN ISO 204:2019–4 standard in an ISO 17025 accredited laboratory. A total of 12 datasets are included, each giving the percentage creep extension as a function of time. The data series and associated metadata were systematically documented using a data schema specifically developed for creep data of single-crystal Ni-based superalloys. This dataset serves multiple purposes: it can be used for comparison with one's own creep test results on similar materials, to verify testing setups (e.g., by replicating tests on the same or comparable materials), to calibrate and validate creep models, and to support alloy development efforts.
Title: Creep reference data of single-crystal Ni-based superalloy CMSX-6. Journal: Data in Brief, vol. 65, Article 112436.
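One common use of creep curves like those described for CMSX-6 (percentage creep extension as a function of time) is estimating the minimum creep rate for model calibration. The sketch below does this with simple finite differences between consecutive points. It is only an illustration under assumptions: the time unit (hours), the function name, and the example values are hypothetical and not taken from the dataset, and real curves would normally be smoothed before differentiation.

```python
def minimum_creep_rate(times_h, extension_pct):
    """Estimate the minimum creep rate from a creep curve.

    times_h: strictly increasing times (assumed here to be hours).
    extension_pct: percentage creep extension at each time.
    Returns (rate in % per hour, interval midpoint time) for the
    interval with the lowest finite-difference slope.
    """
    if len(times_h) != len(extension_pct) or len(times_h) < 2:
        raise ValueError("need two equally long series with >= 2 points")
    best = None
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        rate = (extension_pct[i] - extension_pct[i - 1]) / dt
        midpoint = 0.5 * (times_h[i] + times_h[i - 1])
        # Keep the first interval with the lowest slope.
        if best is None or rate < best[0]:
            best = (rate, midpoint)
    return best


# Made-up curve with primary, secondary, and tertiary stages.
times = [0.0, 10.0, 20.0, 30.0, 40.0]
extension = [0.0, 0.5, 0.7, 1.0, 1.6]
rate, at_time = minimum_creep_rate(times, extension)
```

Repeating this for each of the stress levels would give the rate-versus-stress pairs typically used to fit a creep law.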
Pub Date : 2026-01-07 DOI: 10.1016/j.dib.2025.112446
Mahyar Gohari , Paolo Bestagini , Sergio Benini , Nicola Adami
This study introduces a novel multilingual dataset designed to distinguish auto-tuned musical compositions from authentic recordings, addressing a significant gap in existing resources. The dataset encompasses songs in English, Mandarin, and Japanese, ensuring a diverse representation of linguistic contexts. The data collection process began with aggregating diverse datasets from the Music Information Retrieval domain, incorporating tracks from the three specified languages to capture a wide range of musical styles and recording qualities. Each audio file was subsequently split into 10-second segments at a sample rate of 16 kHz to keep analysis manageable. For the creation of auto-tuned samples, pitch correction was implemented using the probabilistic YIN (PYIN) algorithm for accurate pitch detection, followed by transposition via the pitch-synchronized overlap and add (PSOLA) technique. To emulate realistic auto-tuning scenarios, pitch correction was randomly applied to portions of each 10-second segment, ensuring variability and realism in the dataset, which makes it suitable for training robust detection models. Additionally, time-domain labels indicating the exact locations of pitch correction within each segment were generated, providing precise annotations crucial for developing accurate detection algorithms. The resulting multilingual dataset comprises a comprehensive collection of both auto-tuned and authentic musical segments in English, Mandarin, and Japanese, each annotated with detailed information about where pitch correction was applied. This rich annotation allows for nuanced analysis and supports various research applications, while the dataset's structure and thorough documentation of its creation process make it a valuable resource for researchers in music analysis, machine learning, and audio signal processing.
Title: ATDD: Multi-lingual dataset for auto-tune detection in music recordings. Journal: Data in Brief, vol. 65, Article 112446.
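The segmentation and labelling steps described for the auto-tune dataset can be sketched roughly as follows. The 10-second length and 16 kHz rate come from the abstract; everything else is an assumption for illustration: the abstract does not say how trailing audio is handled, what target pitch the correction uses (snapping to the nearest equal-tempered semitone relative to A4 = 440 Hz is assumed here), or how the corrected portions are sampled. Pitch detection (PYIN) and resynthesis (PSOLA) themselves are not implemented.

```python
import math
import random

SAMPLE_RATE = 16_000   # Hz, from the abstract
SEGMENT_SECONDS = 10   # from the abstract
SEGMENT_SAMPLES = SAMPLE_RATE * SEGMENT_SECONDS


def split_into_segments(num_samples):
    """Return (start, end) sample indices of full 10-second segments.

    A trailing partial segment is dropped (an assumption).
    """
    last_start = num_samples - SEGMENT_SAMPLES
    return [(s, s + SEGMENT_SAMPLES)
            for s in range(0, last_start + 1, SEGMENT_SAMPLES)]


def nearest_semitone_hz(f0_hz, a4_hz=440.0):
    """Snap a detected pitch to the nearest equal-tempered semitone.

    This target rule is hypothetical; the abstract does not state it.
    """
    semitones = round(12 * math.log2(f0_hz / a4_hz))
    return a4_hz * 2 ** (semitones / 12)


def random_correction_label(rng, min_seconds=1.0, max_seconds=5.0):
    """Pick one corrected region inside a segment.

    Returns (start_s, end_s) as a time-domain label. The duration
    bounds are hypothetical.
    """
    duration = rng.uniform(min_seconds, max_seconds)
    start = rng.uniform(0.0, SEGMENT_SECONDS - duration)
    return start, start + duration


segments = split_into_segments(400_000)   # 25 s of audio at 16 kHz
target = nearest_semitone_hz(455.0)       # slightly flat A#4
label = random_correction_label(random.Random(0))
```

A detector trained on such data would predict, per segment, which time span falls inside the labelled region.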