Scientific Data: Latest Articles

A full-length mtDNA dataset for studying genetic variations across generations and complex family structures.
IF 6.9 Zone 2 Multidisciplinary Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-02-13 DOI: 10.1038/s41597-026-06824-0
Yanan Liu, Qi Yang, Yujia Xuan, Jinyuan Zhao, Anqi Chen, Suhua Zhang

Mitochondrial DNA (mtDNA) mutations are critical to disease research, evolutionary studies, and lineage tracing but are challenging to analyze due to interference from nuclear mitochondrial sequences (NUMTs). Current high-throughput sequencing techniques rely on multiple primers or probes to amplify short mtDNA fragments, followed by alignment to a reference genome. However, this approach fails to mitigate NUMT interference, leading to ambiguous results. In this study, we present a nanopore-based third-generation sequencing (TGS) method using a single primer pair to amplify full-length mtDNA, effectively circumventing NUMT artifacts. Sequencing was carried out on the QITAN TECH QNome-3841hex platform, generating complete mtDNA coverage for 106 samples from eight distinct family pedigrees, including complex familial structures such as half-siblings and multi-generational households. The sequencing achieved 100% genome coverage with an average mapping rate of 99.96%, supporting comprehensive genome characterization. The resulting dataset offers valuable insights into mtDNA mutation detection, mitochondrial genetics, population genetics, ancestry tracing, and forensic identification, and may advance mtDNA sequencing technologies and intergenerational studies.

Citations: 0
Light sheet microscopy imaging dataset of CAR-T-cell-mediated cytotoxicity.
IF 6.9 Zone 2 Multidisciplinary Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-02-12 DOI: 10.1038/s41597-026-06829-9
Jie Wang, Jin Jin, Yuekun Fang, Liting Chen, Peng Fei

Research on CAR-T cell states is crucial for understanding the mechanisms of immunotherapy. Previous studies in live cells have been primarily limited by phototoxicity, resolution, and throughput, making it difficult to conduct further research and observations on cell states. To enable more detailed studies of cell states, we developed a microscopy imaging system with subcellular resolution, low phototoxicity, high imaging throughput, and automated data reconstruction. Using this system, we have generated and shared over 400 image sets that capture the cytotoxic effects of CAR-T cells on tumor cells. The data provide an isotropic spatial resolution of 320 nm, a temporal resolution of up to 2.5 seconds per volume, and long-term observations spanning up to 5 hours. This study reports an imaging system that fills an essential gap in the field, offers valuable insights into the cytotoxic processes of CAR-T cells, and significantly advances research in this area.

Citations: 0
Multi-TPC: A Multimodal Dataset for Three-Party Conversations with Speech, Motion, and Gaze.
IF 6.9 Zone 2 Multidisciplinary Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-02-12 DOI: 10.1038/s41597-026-06819-x
Meng-Chen Lee, Zhigang Deng

Analysis and generation of conversational gestures, especially in multi-party settings, remains an open challenge in many fields, due to the lack of publicly available datasets, models, and standardized evaluation metrics. To address this gap, we introduce Multi-TPC, a multimodal dataset of three-party conversations featuring synchronized speech, motion, and gaze. Multi-TPC captures rich conversational dynamics, enabling the study of interactions between multiple participants. Our statistical analysis reveals correlations between gestures and various modalities, including audio, text, and speaker identity. Our dataset and model provide a foundation for advancing research in discourse analysis, human communication dynamics, and multimodal interaction.

Citations: 0
TURB-Smoke. A database of Lagrangian pollutants emitted from point sources in turbulent flows with a mean wind.
IF 6.9 Zone 2 Multidisciplinary Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-02-12 DOI: 10.1038/s41597-026-06774-7
Luca Biferale, Fabio Bonaccorso, Niccolò Cocciaglia, Robin A Heinonen, Lorenzo Piro

Identifying the location and characteristics of pollution sources in turbulent flows is challenging, especially for environmental monitoring and emergency response, due to sparse, stochastic, and infrequent cue detection. Even in idealized settings, accurately modeling these phenomena remains highly complex, with realistic representations typically achievable only through experimental or simulation-based data. We introduce TURB-Smoke, a cutting-edge numerical dataset designed for investigating odor and contaminant dispersion in turbulent environments with and without mean wind. Generated via direct numerical simulations of the fully resolved three-dimensional Navier-Stokes equations, TURB-Smoke tracks hundreds of millions of Lagrangian particles released from five distinct point sources in fully developed turbulence, thus providing a reliable ground-truth framework for developing and evaluating source-tracking strategies using stationary sensors or mobile agents in realistic flows. Each particle's trajectory is continuously tracked on many characteristic turbulence timescales, recording both the position and the local flow velocity. Additionally, we provide coarse-grained concentration fields in 3D and in quasi-2D slabs containing the source, ideal for quickly testing and optimizing search algorithms under varying flow conditions.
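
The coarse-grained concentration fields mentioned in the abstract can be pictured as particle counts binned onto a regular grid. The following is only a minimal sketch of that idea, not the authors' pipeline: the function names, the cubic box `[0, box_size)^3`, and the toy particle positions are all assumptions made for illustration.

```python
from collections import Counter


def coarse_grain(positions, box_size, n_cells):
    """Count particles per cubic cell of a box [0, box_size)^3.

    Assumes every coordinate already lies inside the box (hypothetical setup).
    Returns a Counter mapping (i, j, k) cell indices to particle counts.
    """
    cell = box_size / n_cells
    counts = Counter()
    for point in positions:
        # Clamp so a particle sitting exactly on the upper wall stays in range.
        idx = tuple(min(int(c / cell), n_cells - 1) for c in point)
        counts[idx] += 1
    return counts


def concentration(counts, box_size, n_cells):
    """Convert per-cell counts to number density (particles per unit volume)."""
    cell_volume = (box_size / n_cells) ** 3
    return {idx: n / cell_volume for idx, n in counts.items()}


# Toy example: three particles in a unit box split into 2 x 2 x 2 cells.
particles = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.1), (0.9, 0.9, 0.9)]
counts = coarse_grain(particles, box_size=1.0, n_cells=2)
density = concentration(counts, box_size=1.0, n_cells=2)
```

A quasi-2D slab could be obtained in the same spirit by first keeping only the particles whose coordinate normal to the slab falls within a chosen thickness around the source, then binning the remaining two coordinates.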

Citations: 0
WearGait-PD: An Open-Access Wearables Dataset for Gait in Parkinson's Disease and Age-Matched Controls.
IF 6.9 Zone 2 Multidisciplinary Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-02-12 DOI: 10.1038/s41597-026-06806-2
Anthony J Anderson, David Eguren, Michael A Gonzalez, Michael Caiola, Naima Khan, Sophia Watkinson, Isabella Zuccaroli, Siegfried S Hirczy, Cyrus P Zabetian, Kelly Mills, Emile Moukheiber, Laureano Moro-Velazquez, Najim Dehak, Chelsie Motley, Brittney C Muir, Ankur Butala, Kimberly Kontson

Wearable movement sensors are powerful tools for objectively characterizing and quantifying movement. They enhance the precise characterization of gait, balance, and motor symptoms in Parkinson's disease (PD) and related disorders, facilitating in-clinic and remote assessments, disease management, and therapeutic intervention development. Access to high-quality data from these sensors can accelerate discoveries in this clinical population. The WearGait-PD open-access dataset contains raw inertial measurement unit (IMU) and sensorized insole data from 100 individuals with PD and 85 age-matched controls, synchronized to a gait walkway reference system. IMU data include 3-degree-of-freedom (DOF) acceleration, rotational velocity, magnetic field strength, and orientation for each of 13 sensors on the participant's body. Sensor insole data include absolute pressure from 16 sensors in each insole and 3-DOF acceleration and rotational velocity. Walkway data include 2D position and relative pressure for each active sensor during every footfall. Frame-by-frame annotation of participant actions during gait and balance tasks was incorporated using synchronized video cameras. All data were associated with demographic information and clinical evaluations (e.g., medications, DBS status, MDS-UPDRS scores).

Citations: 0
Full-elevational gradient dataset on moth diversity and abundance in a temperate mountain range.
IF 6.9 Zone 2 Multidisciplinary Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-02-12 DOI: 10.1038/s41597-026-06837-9
Oldřich Čížek, Pavel Marhoul, Tomáš Kadlec, Oto Kaláb, Tomáš Jor, Antonín Hlaváček

Climate change is reshaping ecosystems worldwide, yet our ability to quantify its long-term impact across taxa is limited by a lack of reliable and comparable data. Here, we present a systematically collected long-term dataset spanning nearly a decade (2012-2021), documenting the diversity, abundance, and distribution of 439 moth species (Lepidoptera: Heterocera) from the Czech part of the Giant Mountains, a region entirely protected as Krkonoše National Park. Using standardised light traps, we sampled 982 localities across an area of 550 km², yielding a total of 64,776 specimens. Localities are accompanied by in-situ assessments of vegetation characteristics and management regimes, complemented by topographical derivatives and ecosystem information retrieved post-hoc from open spatial data. The dataset provides a valuable resource for investigating spatial and temporal patterns in moth diversity and abundance, as well as for evaluating the effects of different management practices, supporting both basic and applied research.

Citations: 0
A Multimodal Dataset for Neurophysiological and AI Applications.
IF 6.9 Zone 2 Multidisciplinary Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-02-12 DOI: 10.1038/s41597-026-06758-7
Juan Trujillo, Rosario Ferrer-Cascales, Miguel A Teruel, Nicolás Ruiz-Robledillo, Javier Sanchis, Sandra García-Ponsoda, Alejandro Panagiotidis-Arrizabalaga, Natalia Albaladejo-Blázquez, Ángela Martínez-Nicolás, Jorge García-Carrasco, Alejandro Reina, Ana Lavalle, Alejandro Maté, Borja Costa-López

Attention Deficit Hyperactivity Disorder (ADHD) is a prevalent neurodevelopmental disorder characterized by inattention, hyperactivity, and impulsivity. Current diagnostic methods rely primarily on subjective clinical evaluations, which are prone to bias. Neurophysiological techniques such as electroencephalography (EEG), eye tracking, and electrodermal activity (EDA) offer promising objective alternatives; however, their adoption is limited by the scarcity of large, public, multimodal datasets. To address this gap, we introduce the BALLADEER ADHD Dataset, a comprehensive multimodal resource that integrates simultaneous EEG, eye-tracking, and physiological signals from children and adolescents with ADHD and neurotypical controls. Data were collected through carefully designed cognitive tasks aimed at eliciting neurophysiological responses related to attentional control, response inhibition, and cognitive flexibility, which are key domains affected in ADHD. The dataset facilitates the development of machine learning models for ADHD classification and biomarker discovery through cross-modal analyses of EEG, eye movements, and autonomic nervous system activity. By publicly releasing this dataset, we aim to enhance transparency, reproducibility, and innovation in computational neuroscience and ADHD research.

Citations: 0
Global daily 9 km remotely sensed soil moisture (2015-2025) with microwave radiative transfer-guided learning.
IF 6.9 Zone 2 Multidisciplinary Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-02-12 DOI: 10.1038/s41597-026-06721-6
Sijia Feng, Aoyang Li, Rui Zhou, Klaus Butterbach-Bahl, Kaiyu Guan, Zhenong Jin, Majken C Looms, Sherrie Wang, Christian Igel, Claire Treat, Jørgen Eivind Olesen, Sheng Wang

Accurate estimation of surface soil moisture (SM) in terrestrial ecosystems is essential for understanding hydroclimate dynamics. The L-band Soil Moisture Active Passive (SMAP) mission provides 9-km global daily surface SM by using a microwave radiative transfer model (RTM)-based algorithm. However, the accuracy of SMAP SM is limited in regions with dense vegetation cover and complex surface conditions, due to the empirical parameterization and oversimplified radiative transfer processes. To overcome these limitations, we developed a Process-Guided Machine Learning (PGML) framework to integrate RTM theories and deep learning to predict global daily surface 9-km SM from April 2015 to June 2025. Informed by domain knowledge, we developed the PGML model structure using RTM and hydrological theories, designed a Kling-Gupta efficiency-based cost function, pretrained it with RTM simulations, and fine-tuned it with in-situ measurements. The independent validation shows that PGML SM has strong agreement with in-situ measurements (R = 0.868 and unbiased RMSE = 0.054 m³/m³). This study highlights the potential of PGML to enhance the accuracy of satellite SM, thereby supporting improved water resources and ecosystem management.
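
For readers unfamiliar with the two metrics named in the abstract, the sketch below computes the standard Kling-Gupta efficiency (KGE) and the unbiased RMSE in plain Python. It uses the textbook KGE definition, with correlation `r`, variability ratio `alpha`, and bias ratio `beta`; the abstract does not state the exact form of the authors' cost function, so `kge_loss = 1 - KGE` is an assumption, as are the function names and the toy values.

```python
import math
from statistics import fmean, pstdev


def pearson_r(sim, obs):
    """Pearson correlation between simulated and observed series."""
    ms, mo = fmean(sim), fmean(obs)
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs))
    var_s = sum((s - ms) ** 2 for s in sim)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)


def kge(sim, obs):
    """Standard Kling-Gupta efficiency; 1.0 is a perfect match.

    Undefined when the observations are constant or have zero mean.
    """
    r = pearson_r(sim, obs)
    alpha = pstdev(sim) / pstdev(obs)  # variability ratio
    beta = fmean(sim) / fmean(obs)     # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def kge_loss(sim, obs):
    """Assumed cost form: smaller is better, 0 at a perfect match."""
    return 1.0 - kge(sim, obs)


def unbiased_rmse(sim, obs):
    """RMSE after removing each series' own mean (insensitive to constant bias)."""
    ms, mo = fmean(sim), fmean(obs)
    return math.sqrt(fmean([((s - ms) - (o - mo)) ** 2 for s, o in zip(sim, obs)]))


# Toy soil moisture values in m³/m³ (made up for illustration).
obs = [0.10, 0.20, 0.30, 0.40]
biased = [o + 0.05 for o in obs]
```

With a constant offset such as `biased`, the unbiased RMSE is essentially zero while the KGE drops through its bias term, which is one reason the two metrics are often reported together.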

Citations: 0
A 32-year species-specific live fuel moisture content dataset for southern California chaparral.
IF 6.9 Zone 2 Multidisciplinary Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-02-12 DOI: 10.1038/s41597-026-06794-3
Kevin Varga, Charles Jones

Live fuel moisture content (LFMC) strongly affects the behavior of wildland fire, which is why it is incorporated into wildfire spread models and danger ratings. In this study, over ten thousand LFMC observations are combined with predictor variables from Landsat imagery and the Weather Research and Forecasting model to train species-specific random forest models that predict the LFMC of four fuel types: chamise, old growth chamise, black sage, and bigpod ceanothus. These models are then used to create a historical, 32-year-long LFMC dataset for southern California chaparral. Additionally, the high spatial and temporal sampling frequency of chamise allowed quantile mapping bias correction to be applied. The final chamise output, which is the most robust, has a mean absolute error of 9.68% and an R² value of 0.76. The LFMC dataset successfully captures the variability in the annual cycle, the spatial heterogeneity, and the interspecies differences, which makes it useful for better understanding varying fire season characteristics and landscape-level flammability.
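
The abstract does not describe how the quantile mapping bias correction or the mean absolute error were computed. As a rough illustration only, and not the authors' implementation, the sketch below shows one common empirical form of quantile mapping in plain Python: each modeled value is converted to its quantile in a reference set of modeled values, then replaced by the observed value at that same quantile. All function names and the sample LFMC percentages are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf(sorted_sample, x):
    """Fraction of values in a sorted sample that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def quantile(sorted_sample, q):
    """Linearly interpolated quantile of a sorted sample, with q in [0, 1]."""
    pos = q * (len(sorted_sample) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_sample) - 1)
    frac = pos - lo
    return sorted_sample[lo] * (1 - frac) + sorted_sample[hi] * frac


def quantile_map(modeled_ref, observed_ref, modeled_new):
    """Map new modeled values onto the observed distribution.

    modeled_ref and observed_ref are reference samples used to estimate
    the two distributions (assumed here to cover the same period).
    """
    mod_sorted = sorted(modeled_ref)
    obs_sorted = sorted(observed_ref)
    return [
        quantile(obs_sorted, empirical_cdf(mod_sorted, x))
        for x in modeled_new
    ]


def mean_absolute_error(predicted, observed):
    """Average absolute difference between paired predictions and observations."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical LFMC percentages, invented for illustration only.
modeled_ref = [80, 90, 100, 110, 120]
observed_ref = [60, 75, 95, 115, 140]
corrected = quantile_map(modeled_ref, observed_ref, [70, 100, 120])
```

In this toy example the modeled values span a narrower range than the observations, so the mapping stretches the extremes outward. The step-function CDF used here is a simplification; production bias-correction code usually handles ties, extrapolation beyond the reference range, and seasonal stratification more carefully.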

Citations: 0
Large-Scale Histological Image Dataset with Metadata for Colorectal Cancer Microenvironment.
IF 6.9 Zone 2 Comprehensive Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-02-12 DOI: 10.1038/s41597-026-06675-9
Hao Wang, Huiying Li, Jingmin Xue, Yang Jiang, Keru Ma, Fenqi Du, Genshen Mo, Hao Li, Yuze Huang, Haonan Xie, Hongxue Meng, Peng Han, Shenghan Lou

The pronounced heterogeneity of the tumor microenvironment (TME) in colorectal cancer (CRC) presents major obstacles in accurately predicting patient outcomes and tailoring treatment responses. Deciphering this intricate microenvironment based on histological images and classifying it into well-defined tissue components is critical for optimizing clinical interventions. Although deep learning (DL) has advanced substantially in medical imaging analysis, its application in CRC remains limited due to a shortage of comprehensively annotated datasets and large-scale, high-quality histological images. To address this gap, we present HMU-CRC-Hist550K, a curated dataset comprising 550,000 annotated image tiles derived from 500 whole-slide images, fully labeled into eight distinct TME tissue classes. The dataset represents a broad collection of publicly available CRC histology samples. Additionally, we demonstrate the utility of this resource by benchmarking three DL models on tissue segmentation tasks. HMU-CRC-Hist550K offers a valuable foundation for TME profiling, AI-assisted diagnosis, molecular subtype inference, and individualized therapy planning, while also enabling new research directions in modeling the spatial-temporal evolution of the TME.

Citations: 0