Data in Brief: Latest Articles, Page 9

A clinical dataset on type-2 diabetes including demographic, anthropometric, and biochemical parameters from Bangladesh

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-10 DOI: 10.1016/j.dib.2026.112457

Md. Younus Bhuiyan , Shahriar Siddique Ayon , Md. Ebrahim Hossain , Md. Saef Ullah Miah , Afjal H. Sarower , Fateha khanam Bappee

Type-2 diabetes is a major public health concern in Bangladesh, and this dataset provides 1065 curated patient records with demographic, anthropometric, and clinical variables relevant to its assessment. The data were collected during routine clinical visits and recorded by trained staff, with checks to ensure accuracy and completeness. It includes basic details like age, pregnancy count, body mass index, and skin-fold thickness; vital signs such as blood pressure; lab results related to blood sugar (fasting glucose and insulin); the Diabetes Pedigree Function; and a simple yes/no label for Type-2 diabetes. A few values are missing for diastolic blood pressure and skin-fold thickness, so users should handle these carefully. Since the data are cross-sectional and come from patients seeking care, there are more diabetic cases (840) than non-diabetic cases (225). The dataset is intended for reuse in method development (for example, machine-learning classifier training, feature-selection benchmarking, and oversampling/imputation research), for context-specific epidemiologic description and model validation in South Asian clinical settings, and as a teaching resource for reproducible biomedical-data workflows.
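
The class imbalance described above (840 diabetic versus 225 non-diabetic records) can be quantified before any classifier is trained. The following is a minimal sketch, not part of the published dataset: it takes the two counts stated in the abstract and derives the diabetic share plus inverse-frequency ("balanced") class weights. The function name, the label names, and the choice of weighting scheme are illustrative assumptions.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: n_total / (n_classes * n_class)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Class counts as stated in the abstract; the label names are hypothetical.
counts = {"diabetic": 840, "non_diabetic": 225}

total = sum(counts.values())
prevalence = counts["diabetic"] / total
weights = balanced_class_weights(counts)

print(f"records: {total}")
print(f"diabetic share: {prevalence:.3f}")
for label, weight in weights.items():
    print(f"{label}: weight {weight:.3f}")
```

Weights of this kind are one common way to offset a majority class during training; oversampling, which the abstract also mentions, is an alternative.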

Citations: 0

Single-cell RNA-seq data of wild type and fli1b mutant zebrafish embryos

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-10 DOI: 10.1016/j.dib.2026.112459

Luiza N. Loges , Ricardo DeMoya , Valentina Laverde, Saulius Sumanas

Fli1b is an ETS transcription factor that has previously been implicated in zebrafish vascular and hematopoietic development. Here we present single-cell RNA sequencing data from wild-type and maternal-zygotic fli1b mutant zebrafish embryos at 24 h post-fertilization. Single-cell suspensions were obtained from approximately 40 whole maternal-zygotic (MZ) fli1b mutant and sibling parent wild-type embryos and subjected to RNA sequencing using the 10X Genomics Chromium platform. Following bioinformatic analysis, 34 distinct cell clusters were identified in the integrated wild-type and fli1b mutant dataset. The clusters were subsequently annotated based on the expression of marker genes. These data will be valuable for further studies of the molecular mechanisms involved in vascular and hematopoietic development. In addition, the transcriptomes obtained for multiple cell types will be useful for investigating other developmental mechanisms in zebrafish and other models.

Citations: 0

Indonesian pharmaceutical dataset for self-medication

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-09 DOI: 10.1016/j.dib.2026.112460

Richard Wiputra , Carrie Florista Benjaminsz , Andrian Loria , Rafaell Widjaya , Rudy , Andry Chowanda

The Indonesian Pharmaceutical Dataset for Self-medication consists of two structured datasets containing some of the most important public health information: a drug dataset and a disease dataset. Both were extracted from the websites of Indonesian-registered and regulated telemedicine providers. The drug dataset contains general data on drugs, indications, dosages, side effects, contraindications, and warnings, whereas the disease dataset contains definitions, descriptions, symptoms, and causes of diseases. Both datasets are provided in CSV file format and are available exclusively in Bahasa Indonesia to maintain consistency with the source content and cater to local users’ needs. These datasets are intended to support research, application development, and Indonesian health information systems by providing locally contextualized, accessible health data for the Indonesian population. Potential applications include powering health chatbots, supporting medical search tools, guiding health literacy programs, and facilitating the integration of standardized local information into HealthTech platforms.

Citations: 0

A UAV-based multisensor framework for legal industrial Cannabis monitoring and open-access dataset development

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-09 DOI: 10.1016/j.dib.2026.112463

Genta Rexha , Ina Papadhopulli , Aleksandër Biberaj , Elson Agastra , Enida Sheme , Elinda Meçe

Industrial hemp cultivation is expanding and requires reliable monitoring for legal compliance and agricultural management. This paper presents a standardized UAV-based multisensor framework designed for Cannabis sativa L. It integrates RGB, multispectral, and thermal imaging as core modules, with hyperspectral and LiDAR as optional extensions. The framework sets protocols for sensor integration, flight planning, field measurements, and annotation, ensuring datasets that meet EU altitude limits (≤120 m AGL). Multi-altitude and multi-time-of-day acquisitions are proposed to capture spatial and diurnal variability. These data improve model robustness for phenotyping, stress detection, and THC compliance verification. Potential applications include precision agriculture, breeding, regulatory monitoring, environmental assessment, and illicit crop detection. Open-access datasets generated through this framework will support reproducibility, machine learning development, and collaboration among researchers, farmers, and regulators.

Citations: 0

Draft genome data analysis and pathogenicity profiling of Staphylococcus aureus strain IHS3A with antibiotic resistance genes isolated from a hospital in Jordan

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-08 DOI: 10.1016/j.dib.2026.112453

Saqr Abushattal , Sulaiman M. Alnaimat , Nidal Odat , Mahmoud Abushattal

This dataset provides a comprehensive genomic and pathogenicity profiling of Staphylococcus aureus strain IHS3A, a methicillin-resistant (MRSA) clinical isolate obtained from a healthcare worker in a teaching hospital in Jordan, Middle East. Whole genome sequencing was performed using the Illumina NextSeq 2000 platform, followed by high-quality de novo assembly using SPAdes. The genome spans 2,821,373 bp across 90 contigs, with a GC content of 32.78%, and demonstrates high-quality metrics, including 99.67% completeness and minimal contamination (0.08%). The genome analysis identified 2611 predicted protein-coding sequences. Multilocus sequence typing (MLST) assigned the isolate to ST10647, SCCmec typing revealed type IVc (2B), and spa typing identified t131. The dataset includes comprehensive annotations of key antimicrobial resistance genes, such as mecA (methicillin resistance), blaZ (penicillin resistance), and lmrS (macrolide efflux), as well as virulence factors related to adherence (e.g., atl, clfA), immune evasion (e.g., scn, adsA), secretion systems (e.g., esaA, esaB), and toxins (e.g., hla, lukF-PV, tsst). Secondary metabolite biosynthetic gene clusters, such as staphyloferrin B and staphylopine, were identified. The genome also encodes a diverse carbohydrate-active enzyme (CAZyme) profile. These genomic data are valuable for further research on MRSA evolution, resistance mechanisms, and virulence factors in Jordan and the Middle East. The genome data have been deposited in the NCBI database under the accession number JBPPGA000000000, with a direct URL to data: https://www.ncbi.nlm.nih.gov/nuccore/JBPPGA000000000.1. BioProject: PRJNA1283614, BioSample: SAMN49700843.

Citations: 0

VisioDECT: A robust dataset for aerial and scenario based multi-drone detection, identification, and neutralization

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-08 DOI: 10.1016/j.dib.2026.112448

Simeon Okechukwu Ajakwe , Vivian Ukamaka Ihekoronye , Golam Mohtasin , Rubina Akter , Jae Min Lee , Dong Seong Kim

The rapid proliferation of unmanned aerial vehicles (UAVs) for logistics, surveillance, and civilian applications continues to pose significant challenges to airspace security, particularly through unauthorized or malicious deployments. Existing UAV datasets are limited in scope, often focusing on single-drone scenarios, synthetic imagery, or restricted environmental conditions, thereby constraining the development of robust counter-UAV systems. To bridge these gaps, we present a vision-based drone detection dataset named VisioDECT, a comprehensive and scenario-rich dataset for multi-drone detection, identification, and neutralization. The dataset comprises 20,924 annotated images and labels from six UAV models (Anafi-Extended, DJI FPV, DJI Phantom, EFT-E410S, Mavic Air 2, and Mavic 2 Enterprise), captured across three distinct scenarios (sunny, cloudy, and evening) at varying altitudes (30–100 m) and distances. Importantly, all UAVs included in this dataset are rotary-wing (multirotor) platforms, which dominate low-altitude airspace and are the most commonly encountered in real-world surveillance and counter-UAV scenarios. Data were collected over 20 months from more than 12 locations in South Korea, ensuring diversity in illumination, weather, and background complexity. Each sample is provided in three standard formats (.txt, .xml, .csv), with detailed metadata and quality-verified annotations for detection and classification tasks. Illustrative benchmark evaluations using state-of-the-art detection models (e.g., DRONET, YOLO variants) are included solely to validate the quality and practical usability of the dataset for real-time drone defense research. VisioDECT provides a standardized, reproducible, and scalable resource that enables benchmarking, model training, and evaluation for airspace surveillance, UAV traffic management, and national security applications.

Citations: 0

Genome data mining of a novel Stutzerimonas marianensis strain LB-0542 isolated from pelagic Sargassum seaweed waste for plastic-degrading and plant growth-promoting traits

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-08 DOI: 10.1016/j.dib.2026.112454

Bidyut R. Mohapatra, Linel S. Moralez, Kiya E. James

This study reports the whole-genome sequence data and functional annotations of a novel Stutzerimonas marianensis strain LB-0542 isolated from the decomposing pelagic Sargassum biomass stranded on Long Beach, Barbados. The genomic DNA was sequenced with the Illumina NextSeq 2000 platform. The genome assembly was performed with the SPAdes Genome Assembler (ver 3.15.5). The assembled genome has a size of 4,520,813 bp, a coverage of 110X, a GC content of 63.2 %, an L₅₀ of 2 and an N₅₀ of 1,079,143 bp. The genome consists of 12 contigs, 0 CRISPR, 3 rRNA, 56 tRNA and 4166 CDSs (coding sequences) with a coding ratio of 89.4 %. The genome annotation results for the COG (cluster of orthologous genes) and subsystem features indicate that metabolism and amino acids and derivatives are the dominant categories, respectively. The analysis of the genome for Carbohydrate-Active Enzymes (CAZymes) identified 230 genes encoding four functional classes of CAZymes [glycoside hydrolases (75 genes), glycosyltransferases (95 genes), carbohydrate esterases (9 genes) and carbohydrate-binding modules (51 genes)]. The functional annotation of the genome for plastic degradation revealed the presence of 34 genes, which could catalyse the degradation of 14 types of plastics: polyethylene glycol [PEG (29 %)], polylactic acid [PLA (11 %)], poly(3-hydroxybutyrate-co-3-hydroxyvalerate) [PHBV (9 %)], polyhydroxyalkanoates [PHA (9 %)], polyethylene [PE (6 %)], polycaprolactone [PCL (6 %)], polyethersulfone [PES (6 %)], polyethylene terephthalate [PET (6 %)], poly(butylene adipate-co-terephthalate) [PBAT (3 %)], polystyrene [PS (3 %)], polybutylene succinate [PBSA (3 %)], poly(3-hydroxyvalerate) [P3HV (3 %)], polyvinyl alcohol [PVA (3 %)] and natural rubber [NR (3 %)]. The genome mining for plant growth-promoting traits identified 3175 genes that are associated with the colonizing plant system (26 %), competitive exclusion (21 %), stress control (21 %), biofertilization (14 %), phytohormone and plant signal production (10 %), bioremediation (7 %) and plant immune response stimulation (1 %). These genome mining results indicate the biotechnological and ecological significance of the novel strain LB-0542 for sustainable biocatalytic processing of Sargassum and plastic-containing waste. The genome sequence data are available in DDBJ/EMBL/GenBank with the accession number BAAIAE000000000.
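
The N₅₀ and L₅₀ assembly statistics quoted above follow a standard definition that is easy to reproduce. The sketch below is illustrative only: the contig lengths are hypothetical values chosen for the example, not the contigs of strain LB-0542, and the function name is an assumption. It sorts contigs from longest to shortest and accumulates their lengths until at least half of the assembly is covered; the length of the contig that crosses that threshold is N₅₀, and the number of contigs needed is L₅₀.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in base pairs."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count


# Hypothetical contig lengths, used only to illustrate the calculation.
example_lengths = [100, 80, 60, 40, 20]
n50, l50 = n50_l50(example_lengths)
print(f"N50 = {n50} bp, L50 = {l50}")
```

Read this way, the reported L₅₀ of 2 means that the two longest contigs together cover at least half of the assembled genome.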

Citations: 0

Corn seed dataset based on hyperspectral and RGB images

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-08 DOI: 10.1016/j.dib.2026.112455

Chao LI , Chen Zhang , Wenbo Zhang , Chengzhen LV , Yaqiang Li , Yufen Wang

This study employed an HY-6010-S hyperspectral imaging system, covering a spectral range of 400–1000 nm, combined with an RGB industrial camera to acquire multimodal data. The dataset simulates phenotypic analysis scenarios of maize seeds under controlled laboratory conditions, with the ambient temperature maintained at 20–25°C. Comprehensive testing was conducted using 12 different maize varieties. Approximately 200 seed samples were collected per variety, resulting in a total sample size of about 2400, each subjected to hyperspectral and RGB image acquisition. Preprocessing steps included noise reduction, background removal, band selection, and modality alignment. To ensure the accuracy and reliability of the experimental data, HHIT software and Python were utilized for data processing. This dataset plays a significant role in seed variety classification, phenotypic analysis, precision agriculture, and machine learning applications.

Citations: 0

Paired clinical 12-lead and Apple Watch electrocardiogram data repository from childhood cancer survivors

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-08 DOI: 10.1016/j.dib.2026.112452

Oguz Akbilgic , Ibrahim Karabayir , Luke Patterson , Stephanie B. Dixon , Daniel A. Mulrooney , Kirsten K. Ness , Melissa M. Hudson

Childhood cancer survivors (CCS), exposed to prior cardiotoxic treatments such as anthracyclines and chest radiation, are at lifelong risk of cardiovascular complications. Current guidelines recommend periodic echocardiographic surveillance, but adherence rates are as low as 41%. This dataset provides paired same-day 12-lead clinical electrocardiograms (ECG) and single-lead wearable ECG recordings from the Apple Watch, collected from adult CCS participating in the St. Jude Lifetime Cohort Study (SJLIFE). The availability of paired wearable and clinical ECGs enables the development and validation of remote AI-based cardiac screening tools, potentially leading to more precise long-term cardiovascular surveillance in this population. Using this dataset, researchers can assess whether the results of an AI model developed on clinical ECGs can be reproduced with ECGs from an Apple Watch.

Citations: 0

Benchmarking geometric lamellar orientation: A large-scale synthetic dataset for quantification of ferrite-pearlite steels

IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES

Data in Brief

Pub Date : 2026-01-08 DOI: 10.1016/j.dib.2025.112439

Nikhil Chaurasia, Sandeep Sangal, Shikhar Krishn Jha

Quantitative metallography of ferrite-pearlite steels is essential for establishing structure-property correlations, yet manual characterization is labour-intensive and prone to bias. This article presents a large-scale synthetic dataset designed to train and benchmark deep learning models for the automated segmentation of pearlite colonies and ferrite grains. The dataset was generated using a computational pipeline that superimposes experimentally obtained ferrite and pearlite morphological textures onto simulated polycrystalline templates produced by modelling nucleation and growth. The primary parameter investigated was the geometric lamellar orientation of pearlite colonies, which was categorized relative to the image frame into 10 distinct classes (20° angular bins plus a background ferrite class). The resulting dataset comprises 10,499 synthetic micrographs (512 × 512 pixels) paired with pixel-perfect ground truth segmentation masks. The dataset provides a resource for developing computer vision algorithms that distinguish pearlite colonies by the geometric orientation of their lamellae, thereby facilitating high-throughput quantitative analysis in materials science.
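
The class count can be checked with a short Python sketch. It assumes that lamellar orientation is undirected, so angles are folded into [0°, 180°), which gives 180 / 20 = 9 angular bins and, with the ferrite background, 10 classes. The label numbering (0 for background, 1–9 for bins) and the function name are hypothetical and may not match the encoding used in the released masks.

```python
BIN_WIDTH_DEG = 20                           # from the abstract: 20 degree bins
N_ORIENTATION_BINS = 180 // BIN_WIDTH_DEG    # 9 bins, assuming a 0-180 degree span
BACKGROUND_CLASS = 0                         # assumed label for ferrite background
N_CLASSES = N_ORIENTATION_BINS + 1           # 10 classes in total


def orientation_class(angle_deg):
    """Map a lamellar angle (degrees, relative to the image frame) to a label 1..9.

    Lamellae have no direction, so theta and theta + 180 are treated
    as the same orientation (an assumption of this sketch).
    """
    folded = angle_deg % 180.0
    # Guard against floating-point results that round up to exactly 180.0.
    bin_index = min(int(folded // BIN_WIDTH_DEG), N_ORIENTATION_BINS - 1)
    return 1 + bin_index


for angle in (0, 19.9, 20, 95, 179.9, 200, -10):
    print(angle, orientation_class(angle))
```

Under these assumptions, a colony whose lamellae lie at 95° falls into bin 5 (80°–100°), while ferrite pixels would carry the background label.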

Citations: 0