Type-2 diabetes is a major public health concern in Bangladesh, and this dataset provides 1065 curated patient records with demographic, anthropometric, and clinical variables relevant to its assessment. The data were collected during routine clinical visits and recorded by trained staff, with checks to ensure accuracy and completeness. It includes basic details like age, pregnancy count, body mass index, and skin-fold thickness; vital signs such as blood pressure; lab results related to blood sugar (fasting glucose and insulin); the Diabetes Pedigree Function; and a simple yes/no label for Type-2 diabetes. A few values are missing for diastolic blood pressure and skin-fold thickness, so users should handle these carefully. Since the data are cross-sectional and come from patients seeking care, there are more diabetic cases (840) than non-diabetic cases (225). The dataset is intended for reuse in method development (for example, machine-learning classifier training, feature-selection benchmarking, and oversampling/imputation research), for context-specific epidemiologic description and model validation in South Asian clinical settings, and as a teaching resource for reproducible biomedical-data workflows.
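The class imbalance reported above (840 diabetic versus 225 non-diabetic records) is usually compensated for when training a classifier. The following is a minimal sketch, not the authors' method: it takes only the two counts from the abstract and computes the common "balanced" weighting heuristic, n_total / (n_classes × n_k). The label names `diabetic` and `non_diabetic` and the function name are hypothetical and chosen for illustration.

```python
# Class counts as reported in the abstract; the label names are hypothetical.
counts = {"diabetic": 840, "non_diabetic": 225}


def balanced_class_weights(class_counts):
    """Return the weight n_total / (n_classes * n_k) for each class k."""
    n_total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_total / (n_classes * n) for label, n in class_counts.items()}


n_total = sum(counts.values())
diabetic_share = counts["diabetic"] / n_total
weights = balanced_class_weights(counts)

print(f"records: {n_total}")
print(f"diabetic share: {diabetic_share:.3f}")
for label, weight in weights.items():
    print(f"{label}: weight {weight:.3f}")
```

With this heuristic the minority (non-diabetic) class receives the larger weight, so errors on it count more during training. Whether weighting, oversampling, or neither is appropriate depends on the intended use of the model.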
{"title":"A clinical dataset on type-2 diabetes including demographic, anthropometric, and biochemical parameters from Bangladesh","authors":"Md. Younus Bhuiyan, Shahriar Siddique Ayon, Md. Ebrahim Hossain, Md. Saef Ullah Miah, Afjal H. Sarower, Fateha khanam Bappee","doi":"10.1016/j.dib.2026.112457","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"65","pages":"Article 112457"},"PeriodicalIF":1.4,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145976669"}
Pub Date: 2026-01-10 DOI: 10.1016/j.dib.2026.112459
Luiza N. Loges, Ricardo DeMoya, Valentina Laverde, Saulius Sumanas
Fli1b is an ETS transcription factor that has previously been implicated in zebrafish vascular and hematopoietic development. Here we present single-cell RNA sequencing data from wild-type and maternal-zygotic fli1b mutant zebrafish embryos at 24 h post-fertilization. Single-cell suspensions were obtained from approximately 40 whole maternal-zygotic (MZ) fli1b mutant and sibling parent wild-type embryos and subjected to RNA sequencing using the 10X Genomics Chromium platform. Following bioinformatic analysis, 34 distinct cell clusters were identified in the integrated wild-type and fli1b mutant dataset. The clusters were subsequently annotated based on the expression of marker genes. These data will be valuable for further studies of the molecular mechanisms involved in vascular and hematopoietic development. In addition, the transcriptomes obtained for multiple cell types will be useful for investigating other developmental mechanisms in zebrafish and other models.
{"title":"Single-cell RNA-seq data of wild type and fli1b mutant zebrafish embryos","authors":"Luiza N. Loges, Ricardo DeMoya, Valentina Laverde, Saulius Sumanas","doi":"10.1016/j.dib.2026.112459","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"65","pages":"Article 112459"},"PeriodicalIF":1.4,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"146036158"}
The Indonesian Pharmaceutical Dataset for Self-medication consists of two structured datasets containing key public health information: a drug dataset and a disease dataset. Both were extracted from the websites of Indonesian-registered and regulated telemedicine providers. The drug dataset contains general data on drugs, indications, dosages, side effects, contraindications, and warnings, whereas the disease dataset contains definitions, descriptions, symptoms, and causes of diseases. Both datasets are provided in CSV file format and are available exclusively in Bahasa Indonesia to maintain consistency with the source content and cater to local users’ needs. The datasets are intended to support research, application development, and Indonesian health information systems by providing locally contextualized, accessible health data for the Indonesian population. Potential applications include powering health chatbots, supporting medical search tools, guiding health literacy programs, and facilitating the integration of standardized local information into HealthTech platforms.
{"title":"Indonesian pharmaceutical dataset for self-medication","authors":"Richard Wiputra, Carrie Florista Benjaminsz, Andrian Loria, Rafaell Widjaya, Rudy, Andry Chowanda","doi":"10.1016/j.dib.2026.112460","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"65","pages":"Article 112460"},"PeriodicalIF":1.4,"publicationDate":"2026-01-09","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"146036499"}
Industrial hemp cultivation is expanding and requires reliable monitoring for legal compliance and agricultural management. This paper presents a standardized UAV-based multisensor framework designed for Cannabis sativa L. It integrates RGB, multispectral, and thermal imaging as core modules, with hyperspectral and LiDAR as optional extensions. The framework sets protocols for sensor integration, flight planning, field measurements, and annotation, ensuring datasets that meet EU altitude limits (≤120 m AGL). Multi-altitude and multi-time-of-day acquisitions are proposed to capture spatial and diurnal variability. These data improve model robustness for phenotyping, stress detection, and THC compliance verification. Potential applications include precision agriculture, breeding, regulatory monitoring, environmental assessment, and illicit crop detection. Open-access datasets generated through this framework will support reproducibility, machine learning development, and collaboration among researchers, farmers, and regulators.
{"title":"A UAV-based multisensor framework for legal industrial Cannabis monitoring and open-access dataset development","authors":"Genta Rexha, Ina Papadhopulli, Aleksandër Biberaj, Elson Agastra, Enida Sheme, Elinda Meçe","doi":"10.1016/j.dib.2026.112463","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"65","pages":"Article 112463"},"PeriodicalIF":1.4,"publicationDate":"2026-01-09","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145976673"}
This dataset provides a comprehensive genomic and pathogenicity profiling of Staphylococcus aureus strain IHS3A, a methicillin-resistant (MRSA) clinical isolate obtained from a healthcare worker in a teaching hospital in Jordan, Middle East. Whole genome sequencing was performed using the Illumina NextSeq 2000 platform, followed by high-quality de novo assembly using SPAdes. The genome spans 2,821,373 bp across 90 contigs, with a GC content of 32.78%, and demonstrates high-quality metrics, including 99.67% completeness and minimal contamination (0.08%). The genome analysis identified 2611 predicted protein-coding sequences. Multilocus sequence typing (MLST) assigned the isolate to ST10647, SCCmec typing revealed type IVc (2B), and spa typing identified t131. The dataset includes comprehensive annotations of key antimicrobial resistance genes, such as mecA (methicillin resistance), blaZ (penicillin resistance), and lmrS (macrolide efflux), as well as virulence factors related to adherence (e.g., atl, clfA), immune evasion (e.g., scn, adsA), secretion systems (e.g., esaA, esaB), and toxins (e.g., hla, lukF-PV, tsst). Secondary metabolite biosynthetic gene clusters, such as staphyloferrin B and staphylopine, were identified. The genome also encodes a diverse carbohydrate-active enzyme (CAZyme) profile. These genomic data are valuable for further research on MRSA evolution, resistance mechanisms, and virulence factors in Jordan and the Middle East. The genome data have been deposited in the NCBI database under the accession number JBPPGA000000000, with a direct URL to data: https://www.ncbi.nlm.nih.gov/nuccore/JBPPGA000000000.1. BioProject: PRJNA1283614, BioSample: SAMN49700843.
{"title":"Draft genome data analysis and pathogenicity profiling of Staphylococcus aureus strain IHS3A with antibiotic resistance genes isolated from a hospital in Jordan","authors":"Saqr Abushattal, Sulaiman M. Alnaimat, Nidal Odat, Mahmoud Abushattal","doi":"10.1016/j.dib.2026.112453","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"64","pages":"Article 112453"},"PeriodicalIF":1.4,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145973220"}
Pub Date: 2026-01-08 DOI: 10.1016/j.dib.2026.112448
Simeon Okechukwu Ajakwe, Vivian Ukamaka Ihekoronye, Golam Mohtasin, Rubina Akter, Jae Min Lee, Dong Seong Kim
The rapid proliferation of unmanned aerial vehicles (UAVs) for logistics, surveillance, and civilian applications continues to pose significant challenges to airspace security, particularly through unauthorized or malicious deployments. Existing UAV datasets are limited in scope, often focusing on single-drone scenarios, synthetic imagery, or restricted environmental conditions, thereby constraining the development of robust counter-UAV systems. To bridge these gaps, we present VisioDECT, a comprehensive, scenario-rich, vision-based dataset for multi-drone detection, identification, and neutralization. The dataset comprises 20,924 annotated images and labels from six UAV models (Anafi-Extended, DJI FPV, DJI Phantom, EFT-E410S, Mavic Air 2, and Mavic 2 Enterprise), captured across three distinct scenarios (sunny, cloudy, and evening) at varying altitudes (30–100 m) and distances. Importantly, all UAVs included in this dataset are rotary-wing (multirotor) platforms, which dominate low-altitude airspace and are the most commonly encountered in real-world surveillance and counter-UAV scenarios. Data were collected over 20 months from more than 12 locations in South Korea, ensuring diversity in illumination, weather, and background complexity. Each sample is provided in three standard formats (.txt, .xml, .csv), with detailed metadata and quality-verified annotations for detection and classification tasks. Illustrative benchmark evaluations using state-of-the-art detection models (e.g., DRONET, YOLO variants) are included solely to validate the quality and practical usability of the dataset for real-time drone defense research. VisioDECT provides a standardized, reproducible, and scalable resource that enables benchmarking, model training, and evaluation for airspace surveillance, UAV traffic management, and national security applications.
{"title":"VisioDECT: A robust dataset for aerial and scenario based multi-drone detection, identification, and neutralization","authors":"Simeon Okechukwu Ajakwe, Vivian Ukamaka Ihekoronye, Golam Mohtasin, Rubina Akter, Jae Min Lee, Dong Seong Kim","doi":"10.1016/j.dib.2026.112448","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"65","pages":"Article 112448"},"PeriodicalIF":1.4,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"146075114"}
Pub Date: 2026-01-08 DOI: 10.1016/j.dib.2026.112454
Bidyut R. Mohapatra, Linel S. Moralez, Kiya E. James
This study reports the whole-genome sequence data and functional annotations of a novel Stutzerimonas marianensis strain LB-0542 isolated from the decomposing pelagic Sargassum biomass stranded on Long Beach, Barbados. The genomic DNA was sequenced with the Illumina NextSeq2000 platform. The genome assembly was performed with the SPAdes Genome Assembler (ver 3.15.5). The assembled genome has a size of 4,520,813 bp, a coverage of 110×, a GC content of 63.2 %, an L50 of 2 and an N50 of 1,079,143 bp. The genome consists of 12 contigs, 0 CRISPR, 3 rRNA, 56 tRNA and 4166 CDSs (coding sequences) with a coding ratio of 89.4 %. The genome annotation results for the COG (cluster of orthologous genes) and subsystem features indicate that "metabolism" and "amino acids and derivatives" are the most dominant categories, respectively. The analysis of the genome for the existence of Carbohydrate-Active Enzymes (CAZymes) identified 230 genes encoding four functional classes of CAZymes [glycoside hydrolases (75 genes), glycosyltransferases (95 genes), carbohydrate esterases (9 genes) and carbohydrate-binding modules (51 genes)]. The functional annotation of the genome for plastic degradation revealed the presence of 34 genes, which could catalyse the degradation of 14 types of plastics: polyethylene glycol [PEG (29 %)], polylactic acid [PLA (11 %)], poly(3-hydroxybutyrate-co-3-hydroxyvalerate) [PHBV (9 %)], polyhydroxyalkanoates [PHA (9 %)], polyethylene [PE (6 %)], polycaprolactone [PCL (6 %)], polyethersulfone [PES (6 %)], polyethylene terephthalate [PET (6 %)], poly(butylene adipate-co-terephthalate) [PBAT (3 %)], polystyrene [PS (3 %)], polybutylene succinate [PBSA (3 %)], poly(3-hydroxyvalerate) [P3HV (3 %)], polyvinyl alcohol [PVA (3 %)] and natural rubber [NR (3 %)]. 
The genome mining for plant growth-promoting traits identified 3175 genes that are associated with the colonizing plant system (26 %), competitive exclusion (21 %), stress control (21 %), biofertilization (14 %), phytohormone and plant signal production (10 %), bioremediation (7 %) and plant immune response stimulation (1 %). These genome mining results are an indication of the biotechnological and ecological significance of the novel strain LB-0542 for sustainable biocatalytic processing of Sargassum and plastic-containing waste. The genome sequence data is available in DDBJ/EMBL/GenBank with the accession number BAAIAE000000000.
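The assembly statistics quoted above (an N50 of 1,079,143 bp and an L50 of 2) follow the standard definitions: with contigs sorted from longest to shortest, N50 is the length of the contig at which the running total first reaches half of the assembly size, and L50 is the number of contigs needed to get there. A minimal sketch of that calculation follows. The function name and the contig lengths in the usage line are hypothetical, since the abstract does not list the individual contig lengths.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in base pairs."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the N50; 'rank' contigs were needed, which is the L50.
            return length, rank


# Hypothetical toy assembly, not the LB-0542 contigs.
n50, l50 = n50_l50([10, 20, 30, 40, 50])
print(n50, l50)
```

An L50 of 2 for a 12-contig assembly, as reported for strain LB-0542, means that the two longest contigs alone cover at least half of the genome.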
{"title":"Genome data mining of a novel Stutzerimonas marianensis strain LB-0542 isolated from pelagic Sargassum seaweed waste for plastic-degrading and plant growth-promoting traits","authors":"Bidyut R. Mohapatra, Linel S. Moralez, Kiya E. James","doi":"10.1016/j.dib.2026.112454","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"65","pages":"Article 112454"},"PeriodicalIF":1.4,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"146036167"}
Pub Date: 2026-01-08 DOI: 10.1016/j.dib.2026.112455
Chao LI, Chen Zhang, Wenbo Zhang, Chengzhen LV, Yaqiang Li, Yufen Wang
This study employed an HY-6010-S hyperspectral imaging system, covering a spectral range of 400–1000 nm, combined with an RGB industrial camera to acquire multimodal data. The dataset simulates phenotypic analysis scenarios of maize seeds under controlled laboratory conditions, with the ambient temperature maintained at 20–25°C. Twelve maize varieties were imaged. Approximately 200 seed samples were collected per variety, giving a total of about 2400 samples, each subjected to hyperspectral and RGB image acquisition. Preprocessing steps included noise reduction, background removal, band selection, and modality alignment. HHIT software and Python were used for data processing to ensure the accuracy and reliability of the experimental data. The dataset supports seed variety classification, phenotypic analysis, precision agriculture, and machine learning applications.
{"title":"Corn seed dataset based on hyperspectral and RGB images","authors":"Chao LI, Chen Zhang, Wenbo Zhang, Chengzhen LV, Yaqiang Li, Yufen Wang","doi":"10.1016/j.dib.2026.112455","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"65","pages":"Article 112455"},"PeriodicalIF":1.4,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145976664"}
Pub Date: 2026-01-08 DOI: 10.1016/j.dib.2026.112452
Oguz Akbilgic, Ibrahim Karabayir, Luke Patterson, Stephanie B. Dixon, Daniel A. Mulrooney, Kirsten K. Ness, Melissa M. Hudson
Childhood cancer survivors (CCS), exposed to prior cardiotoxic treatments such as anthracyclines and chest radiation, are at lifelong risk of cardiovascular complications. Current guidelines recommend periodic echocardiographic surveillance, but adherence rates are as low as 41%. This dataset provides paired same-day 12-lead clinical electrocardiograms (ECGs) and single-lead wearable ECG recordings from the Apple Watch, collected from adult CCS participating in the St. Jude Lifetime Cohort Study (SJLIFE). The availability of paired wearable and clinical ECGs enables the development and validation of remote AI-based cardiac screening tools, potentially leading to more precise long-term cardiovascular surveillance in this population. Using this dataset, researchers can assess whether the performance of an AI model developed on clinical ECGs can be replicated on ECGs recorded with an Apple Watch.
{"title":"Paired clinical 12 lead and apple watch electrocardiogram data repository from childhood cancer survivors authors","authors":"Oguz Akbilgic, Ibrahim Karabayir, Luke Patterson, Stephanie B. Dixon, Daniel A. Mulrooney, Kirsten K. Ness, Melissa M. Hudson","doi":"10.1016/j.dib.2026.112452","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"65","pages":"Article 112452"},"PeriodicalIF":1.4,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145976667"}
Quantitative metallography of ferrite-pearlite steels is essential for establishing structure-property correlations, yet manual characterization is labour-intensive and prone to bias. This article presents a large-scale synthetic dataset designed to train and benchmark deep learning models for the automated segmentation of pearlite colonies and ferrite grains. The dataset was generated using a computational pipeline that superimposes experimentally obtained ferrite and pearlite morphological textures onto polycrystalline templates simulated via nucleation and growth. The primary parameter investigated was the geometric lamellar orientation of pearlite colonies, measured relative to the image frame and categorized into 10 distinct classes (nine 20° angular bins plus a background ferrite class). The resulting dataset comprises 10,499 synthetic micrographs (512 × 512 pixels) paired with pixel-perfect ground truth segmentation masks. These data provide a robust resource for developing computer vision algorithms capable of discerning pearlite colonies based on the geometric orientation of their lamellae, thereby facilitating high-throughput quantitative analysis in materials science.
{"title":"Benchmarking geometric lamellar orientation: A large-scale synthetic dataset for quantification of ferrite-pearlite steels","authors":"Nikhil Chaurasia, Sandeep Sangal, Shikhar Krishn Jha","doi":"10.1016/j.dib.2025.112439","DOIUrl":"10.1016/j.dib.2025.112439","url":null,"abstract":"<div><div>Quantitative metallography of ferrite-pearlite steels is essential for establishing structure-property correlations, yet manual characterization is labour-intensive and prone to bias. This article presents a large-scale synthetic dataset designed to train and benchmark deep learning models for the automated segmentation of pearlite colonies and ferrite grains. The dataset was generated using a computational pipeline that superimposes experimentally obtained ferrite and pearlite morphological textures onto simulated polycrystalline templates generated via nucleation and growth phenomena. The primary parameter investigated was the geometric lamellar orientation of pearlite colonies, which was categorized into 10 distinct classes (20° angular bins and a background ferrite class) relative to the image frame. The resulting dataset comprises 10,499 synthetic micrographs (512 × 512 pixels) paired with pixel-perfect ground truth segmentation masks. 
This data provides a robust resource for developing computer vision algorithms capable of discerning pearlite colonies based on the geometric orientation of their lamellae, thereby facilitating high-throughput quantitative analysis in materials science.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"65 ","pages":"Article 112439"},"PeriodicalIF":1.4,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146185071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
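The orientation classes described in the ferrite-pearlite abstract above can be illustrated with a minimal sketch. It assumes that lamellar orientations are folded into [0°, 180°), since lamellae at θ and θ + 180° look identical, and that the ferrite background takes label 0 with the nine 20° bins numbered 1 to 9. The label numbering and the name `orientation_class` are hypothetical; the published masks may encode classes differently.

```python
from typing import Optional

BIN_WIDTH_DEG = 20                          # from the abstract: 20° angular bins
N_ORIENTATION_BINS = 180 // BIN_WIDTH_DEG   # 9 bins, assuming angles folded into [0°, 180°)
BACKGROUND_CLASS = 0                        # assumption: ferrite background is label 0


def orientation_class(angle_deg: Optional[float]) -> int:
    """Map a lamellar angle (degrees, relative to the image frame) to a class label.

    None stands for a ferrite (background) pixel; hypothetical labelling scheme.
    """
    if angle_deg is None:
        return BACKGROUND_CLASS
    folded = angle_deg % 180.0              # fold axial orientation into [0, 180]
    bin_index = int(folded // BIN_WIDTH_DEG)
    # Guard against floating-point edge cases where the modulo returns exactly 180.0.
    bin_index = min(bin_index, N_ORIENTATION_BINS - 1)
    return 1 + bin_index                    # labels 1..9 for the orientation bins


if __name__ == "__main__":
    for angle in (None, 0, 25, 179.9, 180, -10):
        print(angle, "->", orientation_class(angle))
```

Under these assumptions the function yields exactly 10 labels (0 to 9), matching the class count stated in the abstract.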