
Data in Brief: Latest Articles

BanglaMUSE: A multimodal Bangla sentiment dataset of text–audio pairs for speech and sentiment analysis
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-01-12 DOI: 10.1016/j.dib.2026.112458
Md. Darun Nayeem, Zarin Rafa, Tasnuva Tasnim Nova, Yasin Rahman, Abdul Mumeet Pathan, Md. Masudul Islam
This article describes a publicly available multimodal Bangla sentiment dataset designed to support research in speech processing, sentiment analysis, and low-resource language modeling. The dataset comprises two synchronized modalities: sentiment-annotated Bangla text and corresponding speech recordings. It contains 1,000 manually curated Bangla sentences evenly distributed across positive and negative sentiment classes, alongside 4,000 aligned audio recordings produced by four native speakers. Each sentence is recorded independently by all speakers to ensure speaker diversity while maintaining consistent textual content. The text component reflects natural, everyday Bangla language usage and is structured to facilitate sentiment classification and linguistic analysis. The audio recordings were collected under controlled yet realistic acoustic conditions using multiple recording devices, introducing natural variability relevant for real-world speech applications. All samples underwent manual quality verification to ensure accurate text–audio alignment and to remove noisy or duplicated recordings. The dataset is suitable for a wide range of applications, including multimodal sentiment classification, sentiment-aware speech recognition, audio–text alignment, and benchmarking of multimodal learning approaches for low-resource languages. Its modular structure allows straightforward extension with additional speakers, dialects, or sentiment categories. By providing aligned textual and speech data for Bangla, this dataset contributes a valuable resource to the research community and supports broader efforts toward linguistic diversity in artificial intelligence.
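The counts in the abstract imply a simple pairing structure: 1,000 sentences, each recorded by four speakers, giving 4,000 text–audio pairs. A minimal Python sketch of such a manifest follows. The integer identifiers, the 500/500 label split (one reading of "evenly distributed"), and the file-naming pattern are illustrative assumptions, not the dataset's actual layout.

```python
from itertools import product

N_SENTENCES = 1000  # stated in the abstract
N_SPEAKERS = 4      # stated in the abstract


def build_manifest(n_sentences: int, n_speakers: int) -> list:
    """Pair every sentence with every speaker (IDs and paths are hypothetical)."""
    manifest = []
    for sent_id, spk_id in product(range(n_sentences), range(n_speakers)):
        # Assumption: first half of the sentences positive, second half negative.
        label = "positive" if sent_id < n_sentences // 2 else "negative"
        manifest.append({
            "sentence_id": sent_id,
            "speaker_id": spk_id,
            "label": label,
            # Hypothetical naming pattern, used only for illustration.
            "audio_path": f"audio/spk{spk_id}/sent{sent_id:04d}.wav",
        })
    return manifest


manifest = build_manifest(N_SENTENCES, N_SPEAKERS)
print(len(manifest))
```

Because every sentence appears once per speaker, splitting train and test sets by sentence ID rather than by recording keeps the same text from appearing on both sides of the split.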
Data in Brief, Volume 65, Article 112458
Citations: 0
Deep-sea image dataset for organism detection
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-01-10 DOI: 10.1016/j.dib.2026.112462
Takaki Nishio, Yuki Kawae
The conservation of marine resources and the mitigation of marine pollution require strengthened knowledge of marine biodiversity, particularly in the deep sea. Videos and images are valuable for documenting the distribution of deep-sea organisms, but manual processing is labor-intensive and variable, emphasizing the need for automated methods. To address this, the J-EDI Organism Detection Dataset (JODD) is introduced. This dataset comprises 8151 images and 15,621 bounding boxes annotated in the Common Objects in Context (COCO) format. The images were captured during deep-sea surveys conducted by the Japan Agency for Marine-Earth Science and Technology (JAMSTEC) between 1984 and 2021, using remotely operated vehicles (ROVs) and human-occupied vehicles (HOVs). All images were derived from publicly available videos in JAMSTEC’s E-library of Deep-sea Images (J-EDI). The dataset includes 20 object categories—19 biological groups and one machine category—providing a reusable resource for developing and benchmarking machine learning models for the automatic detection of deep-sea organisms.
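Since the annotations are described as COCO-format, a first look at such a file typically means tallying bounding boxes per category. The sketch below uses only the standard COCO keys (`images`, `categories`, `annotations`, with `bbox` given as `[x, y, width, height]`). The toy category names and file name are hypothetical and are not JODD's actual labels.

```python
from collections import Counter


def boxes_per_category(coco: dict) -> dict:
    """Count bounding boxes per category name in a COCO-style annotation dict."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    return dict(counts)


# Toy annotation dict; category names and file name are hypothetical.
toy = {
    "images": [
        {"id": 1, "file_name": "frame_0001.jpg", "width": 640, "height": 480},
    ],
    "categories": [{"id": 1, "name": "fish"}, {"id": 2, "name": "machine"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [12.0, 30.0, 100.0, 50.0]},
        {"id": 11, "image_id": 1, "category_id": 1, "bbox": [200.0, 80.0, 40.0, 40.0]},
        {"id": 12, "image_id": 1, "category_id": 2, "bbox": [300.0, 10.0, 90.0, 120.0]},
    ],
}

print(boxes_per_category(toy))
```

For a real file, the same function could be applied to the result of `json.load` on the annotation JSON.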
Data in Brief, Volume 65, Article 112462
Citations: 0
A clinical dataset on type-2 diabetes including demographic, anthropometric, and biochemical parameters from Bangladesh
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-01-10 DOI: 10.1016/j.dib.2026.112457
Md. Younus Bhuiyan, Shahriar Siddique Ayon, Md. Ebrahim Hossain, Md. Saef Ullah Miah, Afjal H. Sarower, Fateha khanam Bappee
Type-2 diabetes is a major public health concern in Bangladesh, and this dataset provides 1065 curated patient records with demographic, anthropometric, and clinical variables relevant to its assessment. The data were collected during routine clinical visits and recorded by trained staff, with checks to ensure accuracy and completeness. It includes basic details like age, pregnancy count, body mass index, and skin-fold thickness; vital signs such as blood pressure; lab results related to blood sugar (fasting glucose and insulin); the Diabetes Pedigree Function; and a simple yes/no label for Type-2 diabetes. A few values are missing for diastolic blood pressure and skin-fold thickness, so users should handle these carefully. Since the data are cross-sectional and come from patients seeking care, there are more diabetic cases (840) than non-diabetic cases (225). The dataset is intended for reuse in method development (for example, machine-learning classifier training, feature-selection benchmarking, and oversampling/imputation research), for context-specific epidemiologic description and model validation in South Asian clinical settings, and as a teaching resource for reproducible biomedical-data workflows.
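The reported class counts (840 diabetic, 225 non-diabetic, 1065 in total) make the imbalance easy to quantify. The sketch below computes the majority-class baseline accuracy and inverse-frequency class weights using the formula total / (number of classes × class count), which is the same heuristic scikit-learn applies for `class_weight="balanced"`. The label names are placeholders, and this is one common way to handle imbalance, not a procedure taken from the article.

```python
def balanced_class_weights(counts: dict) -> dict:
    """Inverse-frequency weights: total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Counts from the abstract; label names are placeholders.
counts = {"diabetic": 840, "non_diabetic": 225}

total = sum(counts.values())
majority_baseline = max(counts.values()) / total  # accuracy of always predicting the majority
weights = balanced_class_weights(counts)

print(total, round(majority_baseline, 4), weights)
```

A classifier therefore has to exceed roughly 79% accuracy to beat the trivial majority-class predictor, which is why metrics such as per-class recall or balanced accuracy tend to be more informative on data like this.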
Data in Brief, Volume 65, Article 112457
Citations: 0
Single-cell RNA-seq data of wild type and fli1b mutant zebrafish embryos
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-01-10 DOI: 10.1016/j.dib.2026.112459
Luiza N. Loges, Ricardo DeMoya, Valentina Laverde, Saulius Sumanas
Fli1b is an ETS transcription factor, which has been previously implicated in zebrafish vascular and hematopoietic development. Here we present single cell RNA sequencing data from wild-type and maternal zygotic fli1b mutant zebrafish embryos at 24 h post fertilization. Single-cell suspensions were obtained from approximately 40 whole maternal-zygotic (MZ) fli1b mutant and sibling parent wild-type embryos and subjected to RNA sequencing using the 10X Genomics Chromium platform. Following bioinformatic analysis, 34 distinct cell clusters were identified in the integrated wild-type and fli1b mutant dataset. The clusters were subsequently annotated based on expression of marker genes. These data will be valuable for further studies of the molecular mechanisms involved in vascular and hematopoietic development. In addition, the obtained transcriptomes of multiple cell types will be useful to investigate other developmental mechanisms in zebrafish and other models.
Data in Brief, Volume 65, Article 112459
Citations: 0
Indonesian pharmaceutical dataset for self-medication
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-01-09 DOI: 10.1016/j.dib.2026.112460
Richard Wiputra, Carrie Florista Benjaminsz, Andrian Loria, Rafaell Widjaya, Rudy, Andry Chowanda
The Indonesian Pharmaceutical Dataset for Self-medication consists of two structured datasets containing some of the most important public health information: a drug dataset and a disease dataset. Both were extracted from the websites of Indonesian-registered and regulated telemedicine providers. The drug dataset contains general data on drugs, indications, dosages, side effects, contraindications, and warnings, whereas the disease dataset contains definitions, descriptions, symptoms, and causes of diseases. Both datasets are provided in CSV file format and are available exclusively in Bahasa Indonesia to maintain consistency with the source content and cater to local users’ needs. These datasets are intended to facilitate research, application development, and Indonesian health information systems by providing locally contextualized and accessible health data for the Indonesian population. Some potential applications include powering health chatbots, supporting medical search tools, guiding health literacy programs, and facilitating the integration of standardized local information into HealthTech platforms.
Data in Brief, Volume 65, Article 112460
Citations: 0
A UAV-based multisensor framework for legal industrial Cannabis monitoring and open-access dataset development
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-01-09 DOI: 10.1016/j.dib.2026.112463
Genta Rexha, Ina Papadhopulli, Aleksandër Biberaj, Elson Agastra, Enida Sheme, Elinda Meçe
Industrial hemp cultivation is expanding and requires reliable monitoring for legal compliance and agricultural management. This paper presents a standardized UAV-based multisensor framework designed for Cannabis sativa L. It integrates RGB, multispectral, and thermal imaging as core modules, with hyperspectral and LiDAR as optional extensions. The framework sets protocols for sensor integration, flight planning, field measurements, and annotation, ensuring datasets that meet EU altitude limits (≤120 m AGL). Multi-altitude and multi-time-of-day acquisitions are proposed to capture spatial and diurnal variability. These data improve model robustness for phenotyping, stress detection, and THC compliance verification. Potential applications include precision agriculture, breeding, regulatory monitoring, environmental assessment, and illicit crop detection. Open-access datasets generated through this framework will support reproducibility, machine learning development, and collaboration among researchers, farmers, and regulators.
Data in Brief, Volume 65, Article 112463
Citations: 0
Draft genome data analysis and pathogenicity profiling of Staphylococcus aureus strain IHS3A with antibiotic resistance genes isolated from a hospital in Jordan
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-01-08 DOI: 10.1016/j.dib.2026.112453
Saqr Abushattal, Sulaiman M. Alnaimat, Nidal Odat, Mahmoud Abushattal
This dataset provides a comprehensive genomic and pathogenicity profiling of Staphylococcus aureus strain IHS3A, a methicillin-resistant (MRSA) clinical isolate obtained from a healthcare worker in a teaching hospital in Jordan, Middle East. Whole genome sequencing was performed using the Illumina NextSeq 2000 platform, followed by high-quality de novo assembly using SPAdes. The genome spans 2,821,373 bp across 90 contigs, with a GC content of 32.78%, and demonstrates high-quality metrics, including 99.67% completeness and minimal contamination (0.08%). The genome analysis identified 2611 predicted protein-coding sequences. Multilocus sequence typing (MLST) assigned the isolate to ST10647, SCCmec typing revealed type IVc (2B), and spa typing identified t131. The dataset includes comprehensive annotations of key antimicrobial resistance genes, such as mecA (methicillin resistance), blaZ (penicillin resistance), and lmrS (macrolide efflux), as well as virulence factors related to adherence (e.g., atl, clfA), immune evasion (e.g., scn, adsA), secretion systems (e.g., esaA, esaB), and toxins (e.g., hla, lukF-PV, tsst). Secondary metabolite biosynthetic gene clusters, such as staphyloferrin B and staphylopine, were identified. The genome also encodes a diverse carbohydrate-active enzyme (CAZyme) profile. These genomic data are valuable for further research on MRSA evolution, resistance mechanisms, and virulence factors in Jordan and the Middle East. The genome data have been deposited in the NCBI database under the accession number JBPPGA000000000, with a direct URL to data: https://www.ncbi.nlm.nih.gov/nuccore/JBPPGA000000000.1. BioProject: PRJNA1283614, BioSample: SAMN49700843.
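The GC content quoted for the assembly is the fraction of G and C bases pooled over all contigs. A minimal sketch of that calculation follows. The toy contigs are invented for illustration, and the choice to exclude ambiguous bases such as N from the denominator is an assumption, since the abstract does not say how the reported figure treats them.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C among unambiguous A/C/G/T bases (case-insensitive)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / total


def assembly_gc(contigs: list) -> float:
    """GC fraction pooled over all contigs, not the mean of per-contig values."""
    return gc_content("".join(contigs))


# Invented toy contigs; real input would be the sequences of the FASTA assembly.
toy_contigs = ["ATGCATAT", "TTAAGC", "NNATAT"]
print(round(assembly_gc(toy_contigs), 4))
```

Pooling bases before dividing weights each contig by its length, which matters for a draft assembly made up of many contigs of unequal size.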
Data in Brief, Volume 64, Article 112453
Citations: 0
VisioDECT: A robust dataset for aerial and scenario based multi-drone detection, identification, and neutralization
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-01-08 DOI: 10.1016/j.dib.2026.112448
Simeon Okechukwu Ajakwe, Vivian Ukamaka Ihekoronye, Golam Mohtasin, Rubina Akter, Jae Min Lee, Dong Seong Kim
The rapid proliferation of unmanned aerial vehicles (UAVs) for logistics, surveillance, and civilian applications continues to pose significant challenges to airspace security, particularly through unauthorized or malicious deployments. Existing UAV datasets are limited in scope, often focusing on single-drone scenarios, synthetic imagery, or restricted environmental conditions, thereby constraining the development of robust counter-UAV systems. To bridge these gaps, we present a vision-based drone detection dataset named VisioDECT, a comprehensive and scenario-rich dataset for multi-drone detection, identification, and neutralization. The dataset comprises 20,924 annotated images and labels from six UAV models (Anafi-Extended, DJI FPV, DJI Phantom, EFT-E410S, Mavic Air 2, and Mavic 2 Enterprise), captured across three distinct scenarios (sunny, cloudy, and evening) at varying altitudes (30–100 m) and distances. Importantly, all UAVs included in this dataset are rotary-wing (multirotor) platforms, which dominate low-altitude airspace and are the most commonly encountered in real-world surveillance and counter-UAV scenarios. Data were collected over 20 months from more than 12 locations in South Korea, ensuring diversity in illumination, weather, and background complexity. Each sample is provided in three standard formats (.txt, .xml, .csv), with detailed metadata and quality-verified annotations for detection and classification tasks. Illustrative benchmark evaluations using state-of-the-art detection models (e.g., DRONET, YOLO variants) are included solely to validate the quality and practical usability of the dataset for real-time drone defense research. VisioDECT provides a standardized, reproducible, and scalable resource that enables benchmarking, model training, and evaluation for airspace surveillance, UAV traffic management, and national security applications.
Citations: 0
Genome data mining of a novel Stutzerimonas marianensis strain LB-0542 isolated from pelagic Sargassum seaweed waste for plastic-degrading and plant growth-promoting traits
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-01-08 DOI: 10.1016/j.dib.2026.112454
Bidyut R. Mohapatra, Linel S. Moralez, Kiya E. James
This study reports the whole-genome sequence data and functional annotations of a novel Stutzerimonas marianensis strain, LB-0542, isolated from decomposing pelagic Sargassum biomass stranded on Long Beach, Barbados. The genomic DNA was sequenced on the Illumina NextSeq2000 platform, and the genome was assembled with the SPAdes Genome Assembler (ver 3.15.5). The assembled genome has a size of 4,520,813 bp, a coverage of 110X, a GC content of 63.2 %, an L50 of 2 and an N50 of 1,079,143 bp. The genome consists of 12 contigs, 0 CRISPR, 3 rRNA, 56 tRNA and 4166 CDSs (coding sequences), with a coding ratio of 89.4 %. The genome annotation results for the COG (cluster of orthologous genes) and subsystem features indicate that metabolism and amino acids and derivatives, respectively, are the dominant categories. Analysis of the genome for Carbohydrate-Active Enzymes (CAZymes) identified 230 genes encoding four functional classes of CAZymes [glycoside hydrolases (75 genes), glycosyltransferases (95 genes), carbohydrate esterases (9 genes) and carbohydrate-binding modules (51 genes)]. Functional annotation of the genome for plastic degradation revealed 34 genes that could catalyse the degradation of 14 types of plastics: polyethylene glycol [PEG (29 %)], polylactic acid [PLA (11 %)], poly(3-hydroxybutyrate-co-3-hydroxyvalerate) [PHBV (9 %)], polyhydroxyalkanoates [PHA (9 %)], polyethylene [PE (6 %)], polycaprolactone [PCL (6 %)], polyethersulfone [PES (6 %)], polyethylene terephthalate [PET (6 %)], poly(butylene adipate-co-terephthalate) [PBAT (3 %)], polystyrene [PS (3 %)], polybutylene succinate [PBSA (3 %)], poly(3-hydroxyvalerate) [P3HV (3 %)], polyvinyl alcohol [PVA (3 %)] and natural rubber [NR (3 %)]. Genome mining for plant growth-promoting traits identified 3175 genes associated with colonizing the plant system (26 %), competitive exclusion (21 %), stress control (21 %), biofertilization (14 %), phytohormone and plant signal production (10 %), bioremediation (7 %) and plant immune response stimulation (1 %). These genome mining results indicate the biotechnological and ecological significance of the novel strain LB-0542 for sustainable biocatalytic processing of Sargassum and plastic-containing waste. The genome sequence data are available in DDBJ/EMBL/GenBank under the accession number BAAIAE000000000.
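The abstract above reports an N50 and an L50 for the assembly. As a reading aid, the sketch below shows how these two metrics are conventionally computed from a list of contig lengths: N50 is the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size, and L50 is the number of contigs needed to get there. The function name `n50_l50` and the contig lengths in the example are hypothetical and chosen only for illustration; they are not taken from the LB-0542 assembly.

```python
from typing import Sequence, Tuple


def n50_l50(contig_lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths in bp."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")

    # Walk the contigs from longest to shortest.
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2

    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first contig that brings the running total to at least
        # half of the assembly size defines both metrics.
        if running >= half_total:
            return length, count

    # Not reachable for non-empty input with positive lengths.
    raise AssertionError("running total never reached half of the total")


# Hypothetical contig lengths (NOT the real assembly).
example_lengths = [500, 400, 300, 200, 100]
n50, l50 = n50_l50(example_lengths)
print(f"N50 = {n50} bp, L50 = {l50}")
```

Read this way, the reported L50 of 2 means that the two longest of the 12 contigs together cover at least half of the assembled genome.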
Citations: 0
Corn seed dataset based on hyperspectral and RGB images
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-01-08 DOI: 10.1016/j.dib.2026.112455
Chao LI , Chen Zhang , Wenbo Zhang , Chengzhen LV , Yaqiang Li , Yufen Wang
This study employed an HY-6010-S hyperspectral imaging system, covering a spectral range of 400–1000 nm, combined with an RGB industrial camera to acquire multimodal data. The dataset simulates phenotypic analysis scenarios of maize seeds under controlled laboratory conditions, with the ambient temperature maintained at 20–25 °C. Comprehensive testing was conducted using 12 different maize varieties. Approximately 200 seed samples were collected per variety, resulting in a total sample size of about 2400, each subjected to hyperspectral and RGB image acquisition. Preprocessing steps included noise reduction, background removal, band selection, and modality alignment. To ensure the accuracy and reliability of the experimental data, HHIT software and Python were used for data processing. The dataset is intended to support seed variety classification, phenotypic analysis, precision agriculture, and machine learning applications.
Citations: 0