
Data in Brief: Latest Articles

OpenStreetMap-derived multimodal dataset across 23 cities: Paired urban morphology tiles with bioclimatic variables
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-04-01 Epub Date: 2026-01-27 DOI: 10.1016/j.dib.2026.112518
Tao He, Wei Lu
We present an OpenStreetMap-derived multimodal dataset spanning 23 cities and 11,711 tile-level samples. For each 768 × 768 m tile, we provide an aligned image pair: (i) a stylized ecological baseline that generalizes green and water features together with major roads and railways, and (ii) a target urban morphology map color-coded by functional building classes, transport infrastructure, green space, and water. Each sample includes latitude/longitude; the eight WorldClim v2.1 bioclimatic variables can be reconstructed locally with the provided script. The dataset is organized by city and indexed with JSONL records linking image paths and attributes, enabling direct integration into machine learning pipelines. Cross-city and cross-climate coverage supports training and evaluation of generative models for urban design, comparative analyses of morphology across climate regimes, and imputation of functional footprints in data-scarce regions. The ecological baseline represents a constructed pre-urban template rather than a historical map.
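Because the index is JSONL (one JSON record per line, linking image paths and attributes), it can be loaded without special tooling. The following is a minimal Python sketch, not the authors' loader: the key names `city`, `baseline_path`, `target_path`, `lat`, and `lon` are hypothetical placeholders, so check the released records for the actual field names.

```python
import json
from collections import defaultdict
from pathlib import Path


def parse_jsonl_lines(lines):
    """Parse an iterable of JSONL lines into a list of dicts, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def load_index(jsonl_path):
    """Read a JSONL index file from disk."""
    with open(jsonl_path, encoding="utf-8") as fh:
        return parse_jsonl_lines(fh)


def group_by_city(records, city_key="city"):
    """Group records by city; 'city' is a hypothetical key name."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[city_key]].append(rec)
    return dict(groups)


def image_pair(rec, root):
    """Resolve (ecological baseline, target morphology) image paths.

    'baseline_path' and 'target_path' are hypothetical key names.
    """
    root = Path(root)
    return root / rec["baseline_path"], root / rec["target_path"]
```

Grouping by city makes it straightforward to hold out whole cities for cross-city evaluation.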
Citations: 0
Dataset on Spanish medium-sized family firms: Linking socioemotional wealth, HRM practices, and financial indicators
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-04-01 Epub Date: 2026-01-19 DOI: 10.1016/j.dib.2026.112488
J. Samuel Baixauli-Soler , María Belda-Ruiz , Gabriel Lozano-Reina , Juan David Peláez-León , Gregorio Sánchez-Marín
This article presents a dataset on 508 medium-sized Spanish family firms, collected between March and June 2016 through structured telephone interviews with CEOs or HR directors. The questionnaire covered four dimensions: family involvement and socioemotional wealth (SEW), human resource management (HRM) practices, financial strategies, and managerial demographics. To complement survey data, financial indicators were extracted from the SABI (Sistema de Análisis de Balances Ibéricos) database. The dataset integrates subjective managerial assessments with objective firm-level information, offering a unique resource for research on family business management, HRM, and financial policies. Variables include firm ownership and management, generational structures, SEW priorities, human capital and HRM practices, financial goals and capital access, as well as managers’ demographic characteristics. The database is released in cleaned, anonymized, and fully documented form (together with the questionnaire and a detailed codebook), enabling replication, comparative studies, and meta-analyses on family firms and related organizational topics.
Citations: 0
A simulation-based dataset for anomaly detection in hydrogen blend transport networks
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-04-01 Epub Date: 2026-01-28 DOI: 10.1016/j.dib.2026.112520
Andrea Senese , Saverio De Vito , Elena Esposito , Michele Villari , Giovanni Acampora , Girolamo Di Francia , Antonia Longobardi , Giulia Monteleone
Hydrogen transport involves the safe movement of gaseous hydrogen through industrial pipeline networks, typically between production plants, storage facilities, and distribution centers, and is a key component of the transition toward more sustainable energy sources [1]. Monitoring these networks is essential: hydrogen is highly flammable, and leaks, compressor failures, or delayed component responses can lead to serious accidents, environmental damage, and operational interruptions. Despite growing interest in this sector, publicly available datasets containing multivariate data on hydrogen transport networks are extremely limited, hindering the development and evaluation of data-driven monitoring methods [2,3,4]. To address this gap, we present a synthetic dataset generated with a MATLAB Simscape model of a pipeline segment representative of an industrial network [5,6,7,14]. The dataset includes time-series data from distributed virtual sensors, covering both normal operating conditions and anomalous scenarios such as leaks, compressor failures, and delayed component responses [8,9]. The simulation reproduces the transient and steady-state dynamics typical of industrial networks, providing data suitable for developing and evaluating algorithms for digital twins [10], monitoring, and anomaly detection in hydrogen transport infrastructures [10,11].
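As a hedged illustration of the kind of anomaly-detection baseline such data can be used to evaluate (this is not the authors' method), a rolling z-score detector flags a sample on a single virtual-sensor channel when it deviates strongly from its recent history. The window length and threshold below are arbitrary assumptions, and the channel is treated as a plain 1-D array because the dataset's column names are not given here.

```python
import numpy as np


def rolling_zscore_flags(signal, window=50, threshold=4.0):
    """Flag samples that deviate strongly from the preceding window.

    For each index t >= window, the mean and standard deviation of
    signal[t-window:t] (past samples only, so the test is causal) are
    used to compute a z-score for signal[t]. Samples with |z| > threshold
    are flagged. The first `window` samples are never flagged.
    """
    x = np.asarray(signal, dtype=float)
    flags = np.zeros(x.shape, dtype=bool)
    for t in range(window, len(x)):
        past = x[t - window:t]
        mu = past.mean()
        sigma = past.std()
        if sigma == 0.0:
            # Perfectly flat history: any change at all is treated as anomalous.
            flags[t] = x[t] != mu
            continue
        flags[t] = abs(x[t] - mu) / sigma > threshold
    return flags
```

A sudden pressure drop (a crude stand-in for a leak) is flagged at its onset; because the window then absorbs the new level, later samples stop being flagged, which is one reason such baselines are usually compared against model-based detectors.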
Citations: 0
MedQA-MA: A Moroccan Arabic medical question-answering dataset for virtual healthcare assistants and large language models
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-04-01 Epub Date: 2026-02-02 DOI: 10.1016/j.dib.2026.112537
Soufiyan Ouali, Said El Garouani
The healthcare domain constitutes a fundamental pillar of national development, as maintaining population health not only enhances citizens' quality of life but also generates substantial economic benefits through increased productivity, innovation, and workforce participation. However, the healthcare industry faces numerous challenges and barriers that impede universal access to medical services. In low- and middle-income countries, significant portions of the population forgo medical consultations because of socioeconomic constraints, including prohibitive consultation fees, scheduling difficulties, and long waiting times. Consequently, there is an urgent need for innovative approaches to optimize healthcare delivery. Recent advances in artificial intelligence have shown promise for developing intelligent systems that address healthcare accessibility gaps. These innovations include medical chatbots, appointment booking systems, disease-prediction models, and psychiatric virtual assistants. However, such advances have predominantly focused on high-resource languages, while research on low-resource languages, particularly Arabic, remains at a preliminary stage. This disparity is especially pronounced for Arabic dialects, which differ substantially from Modern Standard Arabic in vocabulary, syntax, and semantic structure. To address this critical gap, we present the first comprehensive dataset for the Moroccan Arabic dialect in the healthcare domain. The MedQA-MA dataset comprises 108,943 question-answer pairs in text format, each categorized by medical specialty. Covering 23 distinct medical specialties, the dataset supports multiple applications, including sentiment analysis, specialty classification, question-answering systems, and the development of human-like medical chatbots.
The dataset has been meticulously curated, annotated, and validated by qualified medical professionals, ensuring its reliability and clinical relevance for developing realistic healthcare systems grounded in authentic medical interactions.
The MedQA-MA dataset is publicly available and freely accessible at https://data.mendeley.com/datasets/v6gs7nsy9z/1, representing a significant contribution to Arabic Natural Language Processing research in healthcare applications and facilitating the development of culturally and linguistically appropriate medical AI systems for Arabic-speaking populations.
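Since every question-answer pair carries a specialty label, specialty-classification experiments usually need a train/test split that preserves the label distribution. The following is a minimal stratified split in plain Python, assuming (hypothetically) that the pairs have already been loaded as dictionaries with a `specialty` key; the actual file layout on Mendeley Data may differ.

```python
import random
from collections import defaultdict


def stratified_split(pairs, test_fraction=0.2, seed=42, label_key="specialty"):
    """Split QA pairs so each specialty keeps roughly the same train/test ratio.

    `pairs` is a list of dicts; the key name 'specialty' is an assumption.
    Returns (train, test) lists.
    """
    by_label = defaultdict(list)
    for p in pairs:
        by_label[p[label_key]].append(p)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label in sorted(by_label):  # sorted so iteration order is deterministic
        items = by_label[label][:]
        rng.shuffle(items)
        n_test = int(round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```

Splitting per label rather than over the whole pool keeps rare specialties represented in the test set.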
Citations: 0
Dataset of RGB-D images of object collections from multiple viewpoints with aligned high-resolution 3D models of objects
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-04-01 Epub Date: 2026-01-08 DOI: 10.1016/j.dib.2026.112450
Xinchao Song , Mingjun Li , Sean Banerjee , Natasha Kholgade Banerjee
We present the HILO dataset, consisting of high-resolution 3D scanned models of 253 common-use objects and 32,256 multi-viewpoint RGB-D images, typically of low resolution, of 144 tabletop scenes, each containing a random set of 10 objects drawn from the 253. The dataset provides the six-degree-of-freedom (6DOF) pose of every object in each of the 32,256 RGB-D images, obtained by precisely aligning the 3D models to the RGB-D images. The dataset also contains metadata on object mass, short text descriptors, binning into everyday-use classes, and aspect-ratio and function categories; intrinsic parameters of the RGB-D sensors used in capture; and transformations between camera poses. The object 3D models were acquired with a tabletop 3D scanner, manually inspected, cleaned, and repaired, and exported both as original ultra-high-resolution meshes (∼1M vertices) and as simplified high-resolution meshes (∼10k vertices). To capture the multi-view RGB-D images, we built an in-house testbed consisting of a turntable and two robotic manipulators, which respectively cover azimuth and elevation angles, together spanning a hemisphere. Images were captured using two Microsoft Azure Kinect sensors, one mounted at the wrist of each robot. We captured images at two distances, forming two hemispherical shells. We used in-house software written in Python to control turntable movement, robot motion, and image capture, and to perform camera calibration, processing to generate registered images and foreground masks, manual precise alignment of object models to images, and post-capture correction of misaligned camera transformation parameters.
The dataset supports the training and evaluation of algorithms for several tasks in computer vision, artificial intelligence (AI), and robotics, such as object completion, recognition, segmentation, high-resolution structure generation, robotic grasp planning, and recognition of human-preferred grasp locations for human-robot collaboration.
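The per-image 6DOF poses and sensor intrinsics make it possible to overlay each 3D model on its RGB-D image. The following is a minimal NumPy sketch under stated assumptions: the pose is taken to be a 4 × 4 homogeneous transform from the object frame to the camera frame, the intrinsics follow the standard pinhole matrix, and lens distortion is ignored. HILO's actual storage format and frame conventions are not specified here.

```python
import numpy as np


def project_vertices(vertices, T_cam_obj, K):
    """Project object-frame vertices into pixel coordinates.

    vertices:  (N, 3) model vertices in the object frame.
    T_cam_obj: (4, 4) homogeneous object-to-camera transform
               (assumed convention for the 6DOF pose).
    K:         (3, 3) pinhole intrinsic matrix; lens distortion is ignored.
    Returns (N, 2) pixel coordinates and (N,) camera-frame depths.
    """
    v = np.asarray(vertices, dtype=float)
    homo = np.hstack([v, np.ones((v.shape[0], 1))])  # (N, 4) homogeneous points
    cam = (T_cam_obj @ homo.T).T[:, :3]              # (N, 3) camera-frame points
    depth = cam[:, 2]
    uvw = (K @ cam.T).T                              # (N, 3) unnormalized pixels
    uv = uvw[:, :2] / uvw[:, 2:3]                    # perspective divide
    return uv, depth
```

Comparing the returned depths with the sensor depth map at the projected pixels is one simple way to check pose quality or reason about occlusion.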
Citations: 0
Soil and crop data from a long-term organic fertilization trial in Sub-Sahelian market gardening
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-04-01 Epub Date: 2026-01-08 DOI: 10.1016/j.dib.2026.112456
Marie-Liesse Vermeire , Pathé Basse , Samuel Legros , Falilou Diallo , Anne Desnues , Frédéric Feder
Recycling the growing stock of organic waste products (OWP) from cities, factories, and farms is a key challenge for sustainable agriculture. However, it must be done with awareness not only of their performance but also of their potential long-term environmental and health risks. In this context, the SOERE PRO observatory was established ("Systèmes d'Observation et d'Expérimentation pour la Recherche en Environnement - Produits Résiduaires Organiques", a label granted by the French National Research Alliance for the Environment (AllEnvi) to recognize high-quality research infrastructures, which translates to "Long-term Observation and Experimentation Systems for Environmental Research - Organic Waste Products"). It includes the trial in Sangalkam, in the Dakar region of Senegal, where these data are collected. Since 2016, four fertilizer types, one mineral (synthetic) and three organic, have been applied annually to three successive vegetable crops (tomato, lettuce, carrot). The dataset currently covers the period 2016 to 2025; data collection is ongoing, and new data will be added in the future. Manual weeding and hoeing are carried out regularly for each crop, and no pesticides are used for crop protection in the trial. A comprehensive, multi-variable dataset is consistently documented, including soil physico-chemical parameters measured annually at three depths, organic waste product characterization, crop yield and quality parameters, and detailed management activities, making it particularly suitable for process-based modelling and long-term impact assessment. The originality of this dataset lies in its long duration, the diversity of organic and mineral fertilization strategies, the inclusion of multiple vegetable crops per year, and its location under Sub-Sahelian conditions, a context for which long-term agronomic datasets remain scarce. All soil, OWP, and vegetable samples are stored in a sample bank in Dakar and are available for additional analyses.
The objective of this dataset is to provide long-term, integrated information on crop productivity, crop quality, and soil responses to repeated organic and mineral fertilization in a Sub-Sahelian market-gardening system. The dataset is publicly available through a Dataverse repository for free (re)use in meta-analyses, process-based modelling, and environmental studies, notably to improve understanding of nutrient cycling, contaminant dynamics, soil biodiversity, and long-term soil functioning in Sub-Sahelian agroecosystems, and to support sustainable land management and food security in Southern countries under future climate change.
Citations: 0
Genome data of Propionibacterium freudenreichii J117, a functional strain from raw-milk cheese
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-04-01 Epub Date: 2026-01-24 DOI: 10.1016/j.dib.2026.112498
Paulina Deptula, Jenni Sihvola, Pekka Varmanen
This dataset reports the complete genome sequence of Propionibacterium freudenreichii strain J117, a food-grade bacterium isolated from Austrian Vorarlberger Bergkäs cheese. The strain was selected for use in a co-fermentation platform aimed at increasing vitamin B12 content in plant-based fermented foods. Genomic DNA was extracted from anaerobic cultures grown in yeast extract lactate (YEL) broth and sequenced on a PacBio Sequel II system using an SMRT Cell 8M. The resulting high-fidelity (HiFi) circular consensus sequence (CCS) reads were assembled with the Improved Phased Assembler (IPA v2).
Genome annotation was performed with Bakta v1.10.4. Antibiotic resistance genes were screened with the Resistance Gene Identifier (RGI v6.0.3) of the Comprehensive Antibiotic Resistance Database (CARD), run via the PROKSEE platform. No plasmid-encoded resistance determinants were identified. The genome comprises two circular replicons, and the annotation covers coding sequences, RNAs, a CRISPR array, and pseudogenes.
The raw sequencing data, genome assembly files, and annotation outputs are included in the associated data repository, organized in subfolders for raw reads, assemblies, and analysis results. This dataset supports the related research article: Zhang, R., Chen, L., Zhang, D., Sihvola, J., Chamlagain, B., Olin, M., Piironen, V., & Varmanen, P. Innovative co-fermentation of Propionibacterium freudenreichii and Rhizopus oryzae enhances vitamin B12, riboflavin, and flavor profile components in sweet fermented glutinous rice. Food Chemistry, 503 (2026).
The genome provides a reference for comparative genomic analysis, functional pathway prediction, and strain development. It also facilitates the safety assessment of food-related strains, for example by confirming the absence of mobile antibiotic resistance genes, and thereby supports the transparent use of J117 in fermented food applications.
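As a rough illustration of how the annotation outputs can be inspected, the minimal Python sketch below tallies feature types in a GFF3 file, one of the formats Bakta can write. The embedded excerpt is made up for illustration (hypothetical contig name, coordinates, and IDs) and is not taken from the J117 files; the feature type names actually used in the repository may differ.

```python
from collections import Counter

# Hypothetical GFF3 excerpt for illustration only; not the J117 annotation.
SAMPLE_GFF3 = """##gff-version 3
contig_1\tBakta\tregion\t1\t5000\t.\t+\t.\tID=contig_1
contig_1\tBakta\tCDS\t100\t1200\t.\t+\t0\tID=cds_1
contig_1\tBakta\tCDS\t1300\t2100\t.\t-\t0\tID=cds_2
contig_1\tBakta\ttRNA\t2200\t2275\t.\t+\t.\tID=trna_1
contig_1\tBakta\tpseudogene\t3000\t3900\t.\t+\t.\tID=pseudo_1
##FASTA
>contig_1
ACGT
"""


def count_feature_types(gff_text: str) -> Counter:
    """Tally the feature type (column 3) of each GFF3 record."""
    counts = Counter()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # GFF3 may append raw sequences after this directive
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.split("\t")
        if len(fields) == 9:  # well-formed GFF3 records have nine columns
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    for ftype, n in sorted(count_feature_types(SAMPLE_GFF3).items()):
        print(ftype, n)
```

To use it on real data, read the repository's GFF3 file into a string (the file name is not given here) and pass it to `count_feature_types`.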
{"title":"Genome data of Propionibacterium freudenreichii J117, a functional strain from raw-milk cheese","authors":"Paulina Deptula ,&nbsp;Jenni Sihvola ,&nbsp;Pekka Varmanen","doi":"10.1016/j.dib.2026.112498","DOIUrl":"10.1016/j.dib.2026.112498","url":null,"abstract":"<div><div>This dataset reports the complete genome sequence of <em>Propionibacterium freudenreichii</em> strain J117, a food-grade bacterium isolated from Austrian Vorarlberger Bergkäs cheese. The strain was selected for its application in a co-fermentation platform aimed at enhancing vitamin B12 content in plant-based fermented foods. Genomic DNA was extracted from anaerobic cultures grown in yeast extract lactate (YEL) broth and sequenced using PacBio Sequel II long-read technology with SMRT Cell 8 M. High-fidelity (HiFi) reads were generated, and circular consensus sequences (CCS) were assembled using the Improved Phased Assembler (IPA v2).</div><div>Genome annotation was performed with Bakta v1.10.4. Antibiotic resistance screening was carried out using the Resistance Gene Identifier (RGI v6.0.3) from the Comprehensive Antibiotic Resistance Database (CARD) via the PROKSEE platform. No plasmid-encoded resistance determinants were identified. The genome comprises two circular replicons and includes full annotation of coding sequences, RNAs, CRISPR array, and pseudogenes.</div><div>The raw sequencing data, genome assembly files, and annotation outputs are included in the associated data repository, organized in subfolders for raw reads, assemblies, and analysis results. This dataset supports the related research article: Zhang, R., Chen, L., Zhang, D., Sihvola, J., Chamlagain, B., Olin, M., Piironen, V., &amp; Varmanen, P. Innovative co-fermentation of <em>Propionibacterium freudenreichii</em> and <em>Rhizopus oryzae</em> enhances vitamin B12, riboflavin, and flavor profile components in sweet fermented glutinous rice. 
<em>Food Chemistry</em>, 503 (2026).</div><div>The availability of this genome provides a reference for comparative genomic analysis, functional pathway prediction, and strain development. It also facilitates safety assessment of food-related strains, such as the absence of mobile antibiotic resistance genes, thereby supporting the transparent use of J117 in fermented food applications.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"65 ","pages":"Article 112498"},"PeriodicalIF":1.4,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146178306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Anonymous transactional dataset in a local food and beverages (F&B) micro-small-medium enterprise (MSME) for recommender systems
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-04-01 Epub Date: 2026-01-27 DOI: 10.1016/j.dib.2026.112519
Mychael Maoeretz Engel, Ford Lumban Gaol, Aditya Kurniawan, Widodo Budiharto
The dataset presented in this article comprises anonymous transactional records and associated product metadata collected from a local Food and Beverages (F&B) Micro-Small-Medium Enterprise (MSME) operating in a city in Indonesia. Researchers, data scientists, and industry professionals can use the data with a range of recommender-system and machine-learning techniques. Sales events were logged passively through the business's internal Point-of-Sale (POS) system from January 2025 to September 2025. The raw data, initially a comprehensive transaction log and a detailed product catalog, underwent a cleaning and structuring protocol: seven unneeded features, including blank customer-detail fields and duplicate monetary entries, were removed, and product description formatting was standardized. The dataset contains no Customer IDs, so individual customers cannot be identified. An RFM (Recency, Frequency, Monetary) analysis of 1000 transactions, grouped by session context, produced 14 pseudo-profiles that remained stable as behavioral indicators for unidentifiable users. The final data package consists of two relational tables: a Transactions Table with 11 core features (including Outlet, Date, Time, and Total Amount) and a Products Metadata Table with definitions for 96 individual products. The dataset supports research in retail analytics and recommender systems; specifically, it supports the development and benchmarking of session-based recommendation algorithms and the creation of user segmentation models in the anonymous, data-sparse environments typical of the MSME retail sector.
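The RFM step mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the session-context grouping key, the sample transactions, and all score cutoffs are hypothetical, and the abstract does not specify how the 14 pseudo-profiles were derived.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Txn:
    group: str            # hypothetical session-context key; the dataset has no Customer ID
    day: date             # transaction date
    total_amount: float   # analogous to the Total Amount column


def rfm(transactions, as_of):
    """Return {group: (recency_days, frequency, monetary)} relative to as_of."""
    stats = {}
    for t in transactions:
        last, freq, money = stats.get(t.group, (None, 0, 0.0))
        if last is None or t.day > last:
            last = t.day  # keep the most recent transaction date
        stats[t.group] = (last, freq + 1, money + t.total_amount)
    return {g: ((as_of - last).days, f, m) for g, (last, f, m) in stats.items()}


def score(value, cutoffs, higher_is_better=True):
    """Map a raw value to 1..len(cutoffs)+1 using ascending cutoffs (higher score = better)."""
    s = 1 + sum(value > c for c in cutoffs)
    return s if higher_is_better else len(cutoffs) + 2 - s


def profile_label(r_days, f, m, r_cuts=(7, 30), f_cuts=(1, 3), m_cuts=(50_000, 150_000)):
    """Concatenate R, F, M scores into a segment label such as 'R3F2M2'.
    All cutoffs are illustrative assumptions, not values from the dataset."""
    return (f"R{score(r_days, r_cuts, higher_is_better=False)}"
            f"F{score(f, f_cuts)}M{score(m, m_cuts)}")


# Hypothetical sample transactions, not records from the dataset.
SAMPLE_TXNS = [
    Txn("outlet1-morning", date(2025, 9, 28), 45_000.0),
    Txn("outlet1-morning", date(2025, 9, 20), 30_000.0),
    Txn("outlet2-evening", date(2025, 7, 1), 120_000.0),
]

if __name__ == "__main__":
    for group, (r, f, m) in rfm(SAMPLE_TXNS, date(2025, 9, 30)).items():
        print(group, r, f, m, profile_label(r, f, m))
```

Recency is scored inversely (fewer days since the last transaction gives a higher score); in practice, cutoffs are often set from quantiles of the observed values rather than fixed numbers.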
{"title":"Anonymous transactional dataset in a local food and beverages (F&B) micro-small-medium enterprise (MSME) for recommender systems","authors":"Mychael Maoeretz Engel ,&nbsp;Ford Lumban Gaol ,&nbsp;Aditya Kurniawan ,&nbsp;Widodo Budiharto","doi":"10.1016/j.dib.2026.112519","DOIUrl":"10.1016/j.dib.2026.112519","url":null,"abstract":"<div><div>The dataset presented in this article comprises anonymous transactional records and associated product metadata collected from a local Food and Beverages (F&amp;B) Micro-Small-Medium Enterprise (MSME) operating in a local city in Indonesia. This data can be used by researchers, data scientists, and industry professionals using various techniques in recommender system and machine learning. The data acquisition process involved the passive logging of sales events through the business's internal Point-of-Sale (POS) system from January 2025 to September 2025. The raw data, initially containing a comprehensive transaction log and a detailed product catalog, underwent a cleaning and structuring protocol. The process needed the elimination of seven unneeded features which included blank customer details and duplicate monetary entries and uniform product description formatting. The dataset lacks any distinctive customer identification numbers because it does not contain Customer IDs. The RFM analysis of 1000 transactions through session context grouping produced 14 pseudo-profiles which showed stability as behavioral indicators for unidentifiable users. The final data package consists of two relational tables which include the Transactions Table with 11 core features (Outlet, Date, Time and Total Amount) and the Products Metadata Table with definitions for 96 individual products. The available data in the dataset allows researchers to perform studies about retail analytics and recommender systems. 
Specifically, it supports the development and benchmarking of algorithms designed for session-based recommendation and the creation of user segmentation models in anonymous, data-sparse environments typical of the MSME retail sector.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"65 ","pages":"Article 112519"},"PeriodicalIF":1.4,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146184681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Single-cell RNA-seq data of wild type and fli1b mutant zebrafish embryos
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-04-01 Epub Date: 2026-01-10 DOI: 10.1016/j.dib.2026.112459
Luiza N. Loges, Ricardo DeMoya, Valentina Laverde, Saulius Sumanas
Fli1b is an ETS transcription factor that has previously been implicated in zebrafish vascular and hematopoietic development. Here we present single-cell RNA sequencing data from wild-type and maternal-zygotic (MZ) fli1b mutant zebrafish embryos at 24 h post-fertilization. Single-cell suspensions were prepared from approximately 40 whole MZ fli1b mutant embryos and wild-type embryos from sibling parents, and were sequenced on the 10X Genomics Chromium platform. Following bioinformatic analysis, 34 distinct cell clusters were identified in the integrated wild-type and fli1b mutant dataset and subsequently annotated based on marker gene expression. These data will be valuable for further studies of the molecular mechanisms involved in vascular and hematopoietic development. In addition, the transcriptomes obtained for multiple cell types will be useful for investigating other developmental mechanisms in zebrafish and other models.
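The cluster-annotation step described above (labelling clusters by marker-gene expression) can be sketched in simplified form. This is a toy illustration, not the authors' pipeline, which the abstract does not name: the marker lists are an assumed selection of zebrafish genes, the per-cluster expression values are invented, and real workflows typically compute clusters and marker genes with dedicated single-cell toolkits.

```python
from statistics import mean

# Illustrative marker lists (an assumed selection, not the authors' annotation scheme).
MARKERS = {
    "endothelial": ["kdrl", "cdh5"],
    "erythroid": ["gata1a", "hbbe1.1"],
    "muscle": ["myod1", "myog"],
}


def annotate_clusters(cluster_means, markers=MARKERS, min_score=0.0):
    """Label each cluster with the cell type whose markers have the highest mean expression.

    cluster_means: {cluster_id: {gene: mean normalized expression in that cluster}}.
    Genes absent from a cluster count as 0; clusters scoring no higher than
    min_score for every cell type are labelled 'unassigned'.
    """
    labels = {}
    for cid, expr in cluster_means.items():
        scores = {ctype: mean(expr.get(g, 0.0) for g in genes)
                  for ctype, genes in markers.items()}
        best = max(scores, key=scores.get)
        labels[cid] = best if scores[best] > min_score else "unassigned"
    return labels


# Invented per-cluster mean expression values for demonstration.
SAMPLE_CLUSTERS = {
    0: {"kdrl": 3.1, "cdh5": 2.4, "gata1a": 0.1},
    1: {"gata1a": 4.0, "hbbe1.1": 5.2},
    2: {"actb1": 2.0},
}

if __name__ == "__main__":
    print(annotate_clusters(SAMPLE_CLUSTERS))
```

Averaging over several markers per cell type makes the label less sensitive to any single gene; ties resolve to the first cell type listed in `MARKERS`.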
{"title":"Single-cell RNA-seq data of wild type and fli1b mutant zebrafish embryos","authors":"Luiza N. Loges ,&nbsp;Ricardo DeMoya ,&nbsp;Valentina Laverde,&nbsp;Saulius Sumanas","doi":"10.1016/j.dib.2026.112459","DOIUrl":"10.1016/j.dib.2026.112459","url":null,"abstract":"<div><div>Fli1b is an ETS transcription factor, which has been previously implicated in zebrafish vascular and hematopoietic development. Here we present single cell RNA sequencing data from wild-type and maternal zygotic <em>fli1b</em> mutant zebrafish embryos at 24 h post fertilization. Single-cell suspensions were obtained from approximately 40 whole maternal-zygotic (MZ) <em>fli1b</em> mutant and sibling parent wild-type embryos and subjected to RNA sequencing using the 10X Genomics Chromium platform. Following bioinformatic analysis, 34 distinct cell clusters were identified in the integrated wild-type and <em>fli1b</em> mutant dataset. The clusters were subsequently annotated based on expression of marker genes. These data will be valuable for further studies of the molecular mechanisms involved in vascular and hematopoietic development. In addition, the obtained transcriptomes of multiple cell types will be useful to investigate other developmental mechanisms in zebrafish and other models.</div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"65 ","pages":"Article 112459"},"PeriodicalIF":1.4,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146036158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Sentinel-1 SAR imagery dataset for airstrips detection and segmentation in the Brazilian Amazon Rainforest
IF 1.4 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2026-04-01 Epub Date: 2026-01-14 DOI: 10.1016/j.dib.2026.112472
Leandro da Silva Gomes, Gustavo Henrique de Queiroz Stabile, Tahisa Neitzel Kuck, Felipe Augusto Pereira de Figueiredo, Elcio Hideiti Shiguemori, Dimas Irion Alves
The Brazilian Amazon Rainforest is of great ecological and economic importance and is considered one of the most biodiverse regions on the planet. The region faces numerous challenges from illegal human activities that threaten its sustainability and well-being and are often supported by the construction of unauthorized airstrips. Because persistent cloud cover often hinders monitoring with optical satellites, Synthetic Aperture Radar (SAR) imagery, which can be acquired through clouds, provides a crucial alternative for surveillance of the region. This dataset was therefore developed to support the training and evaluation of machine learning techniques, including deep learning models, for detecting and segmenting airstrips in the Brazilian Amazon Rainforest from SAR imagery. It comprises Sentinel-1 images acquired primarily between 2021 and 2024, covering 1040 known airstrip locations sourced from the MapBiomas project (published in 2023, based on 2021 reference data). For the change detection task, historical “before” images were selected from the period between 2014 and 2021 to capture the pre-construction state. The data are structured to support three distinct machine learning tasks: object detection (e.g., YOLOv8), semantic segmentation (e.g., U-Net), and change detection, with task-specific images and annotations provided for each. Geospatial files (Shapefile, GeoPackage) are also included to facilitate integration and visualization of the dataset in a GIS environment. The data are valuable for researchers in remote sensing, computer vision, environmental monitoring, and security and defense, enabling the development of automated systems to monitor irregular activities in remote forest regions. The dataset is available at a Mendeley Data repository: https://data.mendeley.com/datasets/x7rn78ymtn/1
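For the change detection task, a classic non-learned baseline on SAR image pairs is the log-ratio operator, sketched below in plain Python. This is not the dataset authors' method; it assumes the "before" and "after" images are co-registered and expressed in linear backscatter units, and the 3 dB threshold and pixel values are hypothetical.

```python
import math


def log_ratio_db(before, after, eps=1e-6):
    """Pixel-wise 10*log10(after/before) for two co-registered images given as
    lists of rows in linear backscatter units; eps avoids division by zero."""
    return [[10.0 * math.log10((a + eps) / (b + eps)) for b, a in zip(row_b, row_a)]
            for row_b, row_a in zip(before, after)]


def change_mask(before, after, threshold_db=3.0):
    """True where backscatter changed by more than threshold_db in either direction."""
    return [[abs(v) > threshold_db for v in row] for row in log_ratio_db(before, after)]


# Hypothetical 2x2 backscatter patches (linear units), not taken from the dataset.
BEFORE = [[0.10, 0.10],
          [0.10, 0.10]]
AFTER = [[0.10, 0.02],
         [0.04, 0.11]]

if __name__ == "__main__":
    for row in change_mask(BEFORE, AFTER):
        print(row)
```

The ratio is used rather than a difference because SAR speckle is approximately multiplicative, so a log-ratio treats changes in bright and dark areas more evenly; learned models trained on the provided annotations would replace the fixed threshold.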
{"title":"A Sentinel-1 SAR imagery dataset for airstrips detection and segmentation in the Brazilian Amazon Rainforest","authors":"Leandro da Silva Gomes ,&nbsp;Gustavo Henrique de Queiroz Stabile ,&nbsp;Tahisa Neitzel Kuck ,&nbsp;Felipe Augusto Pereira de Figueiredo ,&nbsp;Elcio Hideiti Shiguemori ,&nbsp;Dimas Irion Alves","doi":"10.1016/j.dib.2026.112472","DOIUrl":"10.1016/j.dib.2026.112472","url":null,"abstract":"<div><div>The Brazilian Amazon Rainforest holds a large ecological and economic importance and is considered one of the most biodiverse regions on the planet. The region faces numerous challenges from illegal human activities that threaten its sustainability and well-being, which are often supported by the construction of unauthorized airstrips. Additionally, due to its persistent cloud cover, which often hinders monitoring with optical satellites, Synthetic Aperture Radar (SAR) imagery provides a crucial alternative for the region surveillance. Thus, this dataset was developed to support the training and evaluation of machine learning techniques, including deep learning models for detecting and segmenting airstrips in the Brazilian Amazon Rainforest using SAR imagery. The dataset comprises images from the Sentinel-1 satellite, acquired primarily between 2021 and 2024, covering 1040 locations of known airstrips sourced from the MapBiomas project (published in 2023, based on 2021 reference data). For the change detection task, historical “before” images were selected from the period between 2014 and 2021 to capture the pre-construction state. The data is structured to support three distinct machine learning tasks: object detection (e.g., YOLOv8), semantic segmentation (e.g., U-Net), and change detection. For each task, specific images and annotations are provided. Additionally, geospatial files (Shapefile, GeoPackage) are included to facilitate the integration and visualization of the dataset in a GIS environment. 
The data is valuable for researchers in remote sensing, computer vision, environmental monitoring, security and defense, enabling the development of automated systems to monitor irregular activities in remote forest regions. The dataset is available at a Mendeley Data repository: <span><span>https://data.mendeley.com/datasets/x7rn78ymtn/1</span><svg><path></path></svg></span></div></div>","PeriodicalId":10973,"journal":{"name":"Data in Brief","volume":"65 ","pages":"Article 112472"},"PeriodicalIF":1.4,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146036171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Journal: Data in Brief
Copyright © 2023 Book学术 All rights reserved.