Scientific Data: Latest Articles

Sign4all: a Spanish Sign Language dataset.
IF 6.9 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-23 | DOI: 10.1038/s41597-026-06872-6
Francisco Morillas-Espejo, Ester Martinez-Martin

Sign Language Recognition (SLR) is a critical component of human-machine interaction, enabling more inclusive technologies for the deaf and hard-of-hearing community. However, current datasets often suffer from data sparsity and a bias toward right-handed signs. To address these limitations, we present Sign4all, a dataset for Spanish Sign Language (LSE) designed specifically for Isolated Sign Language Recognition (ISLR). The dataset comprises 7,756 high-resolution RGB video recordings and their corresponding skeletal keypoints, covering 24 signs related to daily activities, with a vocabulary centered on the catering field. Unlike sparse lexicons, Sign4all adopts a high-density approach, providing an average of 323 samples per sign to support data-intensive deep learning models. Moreover, the dataset is balanced for handedness, with equal representation of left- and right-handed productions of every sign to support handedness invariance. Each sample was manually segmented and then temporally and spatially normalized to ensure consistency and compatibility with different deep learning pipelines. Technical validation using Transformer and skeletal models demonstrates the dataset's integrity and the need for pre-computed augmentation splits. All data are provided in widely supported file formats (AVI for video, HDF5 for keypoints), enabling direct use in machine learning frameworks such as TensorFlow or PyTorch.
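
The abstract states that every sample was temporally normalized but does not describe the procedure. As a rough illustration only, the sketch below assumes temporal normalization means resampling a variable-length keypoint sequence to a fixed number of frames by linear interpolation. The function name `resample_sequence` and the flattened per-frame layout are hypothetical and are not the dataset's documented HDF5 schema.

```python
from typing import List, Sequence

# Hypothetical layout: one frame = flattened keypoint coordinates [x0, y0, x1, y1, ...].
Frame = Sequence[float]


def resample_sequence(frames: Sequence[Frame], target_len: int) -> List[List[float]]:
    """Resample a keypoint sequence to exactly `target_len` frames by linear interpolation."""
    if not frames:
        raise ValueError("frames must not be empty")
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    n = len(frames)
    if n == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(frames[0]) for _ in range(target_len)]
    out: List[List[float]] = []
    for i in range(target_len):
        # Map output index i onto the continuous input time axis [0, n - 1].
        t = i * (n - 1) / (target_len - 1)
        lo = int(t)
        hi = min(lo + 1, n - 1)
        w = t - lo  # weight given to the later frame
        out.append([(1.0 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


# Usage: stretch a 2-frame clip of one coordinate to 3 frames.
print(resample_sequence([[0.0], [10.0]], 3))  # -> [[0.0], [5.0], [10.0]]
```

A fixed frame count lets clips of different durations be stacked into one tensor. Whether Sign4all uses this particular scheme should be checked against the dataset documentation.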

Citations: 0
A multi-level visual representation dataset for large-scale non-financial information disclosure.
IF 6.9 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-23 | DOI: 10.1038/s41597-026-06848-6
Bingjie Li, Binglong Xia, Ze Cheng, Yitong Xu, Zhao Duan

Although corporate sustainability reports increasingly employ visual rhetoric to influence stakeholder perceptions, quantitative tools for objectively measuring these strategies remain limited. Here we present the Non-Financial Information Disclosure Visual Representations Index (NFIVI) dataset, a dynamic resource covering Chinese listed companies. While the current release (2006-2024) encompasses a comprehensive collection of these reports, the dataset is updated annually, with data volume steadily increasing as new reports are processed. Utilizing a pipeline integrating layout analysis and computer vision, we decompose reports into three fundamental elements: text, image, and color. This dataset introduces two indices to objectively quantify visual composition and structure: the Feature-Correlation Index (NFIVI_FC), measuring stylistic consistency through multidimensional feature coherence, and the Information Entropy Index (NFIVI_EI), assessing visual complexity based on color diversity. Alongside 18 granular indicators spanning the text, image, and color dimensions at both page and document levels, these indices operationalize abstract design concepts into computable metrics. This resource enables large-scale quantitative research into corporate impression management and supports the development of automated auditing tools for non-financial disclosures.
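
The abstract describes NFIVI_EI only as assessing visual complexity from color diversity. One common way to quantify color diversity is the Shannon entropy of a quantized color histogram, H = -Σ p_i log2 p_i. The sketch below computes that quantity for one page, but the uniform per-channel quantization, the bin count, and the names `quantize` and `color_entropy` are assumptions, not the published index definition.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def quantize(pixel: RGB, bins_per_channel: int = 8) -> RGB:
    """Map an 8-bit RGB pixel to a coarse color bin (assumed uniform bins per channel)."""
    step = 256 // bins_per_channel
    r, g, b = (min(c // step, bins_per_channel - 1) for c in pixel)
    return (r, g, b)


def color_entropy(pixels: Iterable[RGB], bins_per_channel: int = 8) -> float:
    """Shannon entropy (bits) of the quantized color distribution of one page image."""
    if not 1 <= bins_per_channel <= 256:
        raise ValueError("bins_per_channel must be in [1, 256]")
    counts = Counter(quantize(p, bins_per_channel) for p in pixels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p) over occupied bins; empty bins contribute nothing.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Usage: a page split evenly between pure red and pure blue carries 1 bit of color entropy.
page = [(255, 0, 0)] * 50 + [(0, 0, 255)] * 50
print(color_entropy(page))  # -> 1.0
```

A single-color page scores 0, and the score grows as pixels spread evenly over more color bins, which matches the intuition of "color diversity" in the abstract.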

Citations: 0
Maternal-Fetal Ultrasound Video Dataset for End-to-end Intrapartum Biometry and Multi-task Learning.
IF 6.9 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-23 | DOI: 10.1038/s41597-026-06900-5
Ming Niu, Jieyun Bai, Yunbo Gao, Yitong Tang, Yaosheng Lu, Zhenyan Han, Hongying Hou, Yuxin Huang

Intrapartum biometry is vital for monitoring labor progress. However, AI-based end-to-end intrapartum biometry and labor progress assessment require intrapartum ultrasound video datasets with multi-category annotations, and no public video dataset currently supports multi-category fine-grained classification. While several image datasets exist for related tasks (e.g., JNU-IFM, PSFHS, IUGC), a dedicated benchmark in the video domain is still lacking. To bridge this gap, we publicly release, for the first time, a multi-center, multi-device, multi-category labeled intrapartum ultrasound dataset. The dataset comprises 774 videos / 68,106 images, together with standard plane classification labels, multi-class segmentation labels of the pubic symphysis and fetal head, and two ultrasound parameter labels that characterize labor progress. It can facilitate research on multi-task learning and the development of end-to-end automated approaches, particularly for automating obstetric workflows and supporting clinical decision-making.

Citations: 0
A high-quality Chromosome-level genome assembly of Gynostemma guangxiense (Cucurbitaceae).
IF 6.9 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-23 | DOI: 10.1038/s41597-026-06889-x
Xiao Zhang, Hao Zhang, Chen Chen, Yuemei Zhao

Gynostemma guangxiense X. X. Chen & D. H. Qin (Cucurbitaceae) is a perennial creeping herbaceous plant endemic to China with potential medicinal and health value. Here, we report a high-quality chromosome-level genome of G. guangxiense obtained by integrating Illumina short-read, PacBio high-fidelity (HiFi) long-read, Hi-C, and RNA-Seq technologies. The assembly is anchored to 11 pseudochromosomes with a total size of 565.18 Mb and a scaffold N50 of 52.63 Mb, and it reaches a BUSCO completeness of 98.00%. Furthermore, we identified 27,527 protein-coding genes, of which 97.75% were functionally annotated. This genome provides an important molecular foundation for studying adaptive evolution, genetic conservation, and the effective development of valuable medicinal plant resources within the genus Gynostemma.
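
Scaffold N50 is the length L such that scaffolds of length L or longer together cover at least half of the total assembly; for this genome the abstract reports 52.63 Mb. A minimal sketch of that standard definition follows. The toy lengths in the usage line are illustrative and are not taken from the G. guangxiense assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths (e.g. scaffold sizes in bp)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first length where the cumulative sum reaches half the assembly.
        if 2 * running >= total:
            return length
    return ordered[-1]  # not reached for non-negative lengths


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # -> 8
```

In the toy example the total is 54, and the three longest sequences (10 + 9 + 8 = 27) are the first to reach half of it, so N50 = 8.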

Citations: 0
Single-nucleus RNA sequencing dataset of diverse tissues from wild-type monkey and Tau-P301L transgenic monkey.
IF 6.9 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-21 | DOI: 10.1038/s41597-026-06882-4
Bofeng Han, Yan Chen, Weijie Ouyang, Danyi Chen, Jiawei Li, Weien Liang, Xudong Zhang, Chengxi Wei, Ling Liu, Sen Yan, Zhuchi Tu

Using non-human primates to study the role of human Tau and its related pathologies is well motivated, because their brain structure and function more closely resemble those of humans. In our earlier research, we generated a transgenic cynomolgus monkey model expressing Tau (P301L) through lentiviral infection of monkey embryos. These monkeys exhibited age-dependent neurodegeneration and motor dysfunction. Single-nucleus RNA sequencing (snRNA-seq) is a powerful technique for elucidating cellular complexity and pathology across tissues. However, single-cell data from non-human primate models of Tau pathology are currently lacking. In this study, we performed snRNA-seq on the hippocampus, striatum, and spinal cord of a Tau (P301L) monkey, providing the first snRNA-seq atlas of multiple tissue regions in a non-human primate model of human tauopathies. This atlas offers key reference data for cross-species, single-cell-level studies of Tau and its related pathologies.

Citations: 0
The GaMMA corpus of Danish polyadic conversations with gaze speech and motion data in quiet and noise.
IF 6.9 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-21 | DOI: 10.1038/s41597-026-06851-x
Mark Dourado, Henrik Gert Hassager, Jesper Udesen, Stefania Serafin

The GaMMA (Gaze, Motion, and Multi-talker Audio) corpus captures polyadic conversations among native Danish speakers under both normal and cocktail-party conditions. Eleven groups of four normal-hearing participants were recorded while engaged in natural, spontaneous interaction, with no conversational task assigned. Each group was intentionally composed of participants with pre-existing relationships within the group. Gaze and motion data were collected using an optical tracking system and eye-tracking glasses, while speech was recorded via omnidirectional head-worn microphones and low-occlusion binaural hearing aid microphones. Calibrations were performed before the trials, and compensation filters were created to account for differences in microphone placement. Processed versions of the audio signals, with background noise attenuated and crosstalk removed, were used to compute speech activity for every participant. The corpus, including raw and processed gaze and audio data as well as the filters, calibration signals, and speech activity output, is publicly available.
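
The abstract does not say how speech activity was derived from the processed audio. As a generic illustration only, the sketch below implements a simple frame-energy detector that marks a frame as active when its RMS level exceeds a fixed dBFS threshold. The frame length, the -40 dB threshold, and the function name `speech_activity` are assumptions, and the corpus's actual speech-activity method may be considerably more elaborate.

```python
import math
from typing import List, Sequence


def speech_activity(signal: Sequence[float], frame_len: int, threshold_db: float = -40.0) -> List[bool]:
    """Flag each non-overlapping frame whose RMS level (dB re full scale 1.0) exceeds threshold_db.

    Hypothetical energy-threshold detector, not the GaMMA authors' method.
    """
    if frame_len < 1:
        raise ValueError("frame_len must be positive")
    flags: List[bool] = []
    # Trailing samples that do not fill a whole frame are ignored.
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        level_db = 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")
        flags.append(level_db > threshold_db)
    return flags


# Usage: 10 ms of silence followed by 10 ms of a constant 0.5 signal at 16 kHz.
fs = 16_000
frame_len = fs // 100  # 10 ms frames
audio = [0.0] * frame_len + [0.5] * frame_len
print(speech_activity(audio, frame_len))  # -> [False, True]
```

Running such a detector on crosstalk-removed channels, rather than on raw microphone signals, is what keeps one talker's speech from being attributed to a neighbour.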

Citations: 0
A dense longitudinal multimodal single-subject rs-fMRI dataset acquired by self-administered scanning.
IF 6.9 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-21 | DOI: 10.1038/s41597-026-06879-z
Evgeny D Petrovskiy

Dense longitudinal neuroimaging usually requires substantial institutional resources, yet can also be achieved by an individual using standard clinical MRI infrastructure. This work presents a multimodal single-subject dataset comprising 85 hours of resting-state fMRI acquired over 11 months, including 51.6 hours under a standardized protocol (paired eyes-open/-closed runs, 128 sessions over 7.5 months). Additional data include 195 T1-weighted structural scans, 54 diffusion MRI sessions, physiological recordings, pre-session behavioral assessments, and detailed medication and lifestyle logs. Scans were collected primarily via self-administered acquisition on a clinical 3 T system, with sub-3 mm between-session positioning reproducibility observed in later sessions. Quality control identified 58 hours of low-motion data (mean framewise displacement <0.2 mm), with higher-motion runs occurring predominantly during sleep. The acquisition period included antidepressant dose changes and seasonal variation, forming a single-subject naturalistic context with collinear factors that preclude causal inference. The dataset follows the BIDS standard and is intended for methodological development, reliability analyses, preprocessing benchmarking, and educational use.
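
The low-motion criterion quoted above relies on mean framewise displacement (FD). The abstract does not specify which FD variant was used, so the sketch below assumes the widely used Power-style definition: the sum of absolute volume-to-volume changes in the three translations (mm) plus the three rotations (radians) converted to arc length on a 50 mm sphere, FD_t = |Δd_x| + |Δd_y| + |Δd_z| + r(|Δα| + |Δβ| + |Δγ|) with r = 50 mm. The motion-parameter ordering and the function names are assumptions.

```python
from typing import List, Sequence

# Assumed ordering per volume: (dx, dy, dz) in mm, then (rx, ry, rz) in radians.
MotionParams = Sequence[float]


def framewise_displacement(params: Sequence[MotionParams], radius_mm: float = 50.0) -> List[float]:
    """Power-style FD for each volume after the first (length = n_volumes - 1)."""
    fd: List[float] = []
    for prev, cur in zip(params[:-1], params[1:]):
        trans = sum(abs(c - p) for p, c in zip(prev[:3], cur[:3]))
        rot = sum(abs(c - p) for p, c in zip(prev[3:6], cur[3:6]))
        fd.append(trans + radius_mm * rot)  # rotations become arc length in mm
    return fd


def mean_fd(params: Sequence[MotionParams], radius_mm: float = 50.0) -> float:
    """Mean FD over all volume-to-volume transitions of a run."""
    fd = framewise_displacement(params, radius_mm)
    if not fd:
        raise ValueError("need at least two volumes")
    return sum(fd) / len(fd)


def is_low_motion(params: Sequence[MotionParams], threshold_mm: float = 0.2) -> bool:
    """Classify a run as low-motion when its mean FD is below threshold_mm."""
    return mean_fd(params) < threshold_mm


run = [
    (0.00, 0.0, 0.0, 0.0, 0.0, 0.000),
    (0.10, 0.0, 0.0, 0.0, 0.0, 0.001),  # 0.10 mm shift plus 0.001 rad rotation
    (0.10, 0.0, 0.0, 0.0, 0.0, 0.001),  # no further motion
]
print(mean_fd(run), is_low_motion(run))
```

Here the first transition contributes about 0.10 + 50 × 0.001 = 0.15 mm and the second contributes 0, so the run's mean FD is roughly 0.075 mm, well under the 0.2 mm cut-off.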

Citations: 0
De novo transcriptome assembly of the Moroccan fir, Abies marocana Trab.
IF 6.9 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-21 | DOI: 10.1038/s41597-026-06888-y
Belén Méndez-Cea, Isabel García-García, Manuel Pavesio-Toledano, Jose Luis Horreo, José Ignacio Seco, Francisco Javier Gallego, Juan Carlos Linares

The Moroccan fir (Abies marocana Trab.) is an endangered conifer endemic to the western Rif Mountains. Despite its ecological and economic significance, no transcriptomic data were previously available for the species. Here, we present the first de novo transcriptome assembly for A. marocana. To represent the species' transcriptome comprehensively, the assembly was generated from RNA-seq data of three organs (leaf, stem, and root) exposed to different conditions (drought, heat, cold, hormone treatment, and physical damage), sequenced with both short- and long-read technologies. The assembly reached a BUSCO completeness of 92.1%, with 279,439 final transcripts, approximately 45.2% of which were functionally annotated. This high-quality transcriptome provides a valuable resource for advancing genetic research and supporting conservation efforts for this vulnerable species.
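
BUSCO completeness (C) is the share of benchmark orthologs found complete, that is, complete single-copy (S) plus complete duplicated (D); together with fragmented (F) and missing (M) it accounts for 100% of the n benchmark genes. The sketch below parses BUSCO's one-line summary notation into those components. It assumes the classic `C:x%[S:x%,D:x%],F:x%,M:x%,n:N` layout, and the S, D, F, M, and n values in the usage line are hypothetical, since the abstract reports only C = 92.1%.

```python
import re
from typing import Dict

# Assumed layout of BUSCO's one-line summary: C:x%[S:x%,D:x%],F:x%,M:x%,n:N
_BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line: str) -> Dict[str, float]:
    """Extract C, S, D, F, M percentages and the benchmark size n from a BUSCO summary line."""
    match = _BUSCO_RE.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    return {key: float(value) for key, value in match.groupdict().items()}


# Hypothetical breakdown consistent with a 92.1% completeness score.
summary = parse_busco_summary("C:92.1%[S:70.0%,D:22.1%],F:3.0%,M:4.9%,n:1614")
print(summary["C"], round(summary["S"] + summary["D"], 1))
```

For transcriptome assemblies a sizeable D share is common because alternative isoforms of one gene can each match the same benchmark ortholog, so C alone can hide redundancy.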

Citations: 0
WHACS: An Improved Global Wave Hindcast for the Australian Climate Service.
IF 6.9 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-21 | DOI: 10.1038/s41597-026-06864-6
Grant Smith, Alberto Meucci, Claire Spillman, Ron Hoeke, Vanessa Hernaman, Claire Trenham, Stefan Zieger, Bryan Hally, Emilio Echevarria

WHACS (the Wave Hindcast for ACS) is a multi-decadal global wind-wave hindcast dataset spanning 1979 to near present, developed to offer insight into historical wave conditions, both directly and as boundary forcing for localised simulations. Applications include coastal management, climate research, and renewable energy projects, helping communities and industries make informed decisions that improve safety, efficiency, and resilience to wave conditions. The dataset uses a near-global spherical multi-cell (SMC) grid aligned with the Bureau's operational wave forecast model, and it has been calibrated to better represent extreme wave conditions by improving the representation of extreme winds. WHACS output consists of multiple hourly bulk and spectral-partition wave parameters on the native SMC grid, as well as bulk wave parameters regridded onto regular global and regional grids. For the Indo-Pacific, gridded full spectral output is available across exclusive economic zones.
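
Bulk wave parameters such as significant wave height are conventionally derived from moments of the wave spectrum, m_n = ∫ f^n E(f) df, with H_m0 = 4√m_0 and mean period T_m01 = m_0 / m_1. The abstract does not list WHACS's exact parameter definitions, so the sketch below only illustrates these standard formulas on a 1-D frequency spectrum using trapezoidal integration. The function names and the toy spectrum are assumptions.

```python
import math
from typing import Sequence


def spectral_moment(freqs: Sequence[float], energy: Sequence[float], order: int) -> float:
    """Trapezoidal estimate of m_order = integral of f**order * E(f) df."""
    if len(freqs) != len(energy) or len(freqs) < 2:
        raise ValueError("need matching frequency and energy arrays with at least 2 points")
    total = 0.0
    for i in range(len(freqs) - 1):
        df = freqs[i + 1] - freqs[i]
        left = freqs[i] ** order * energy[i]
        right = freqs[i + 1] ** order * energy[i + 1]
        total += 0.5 * (left + right) * df
    return total


def significant_wave_height(freqs: Sequence[float], energy: Sequence[float]) -> float:
    """Spectral significant wave height H_m0 = 4 * sqrt(m0), in m when E(f) is in m^2/Hz."""
    return 4.0 * math.sqrt(spectral_moment(freqs, energy, 0))


def mean_period_tm01(freqs: Sequence[float], energy: Sequence[float]) -> float:
    """Mean wave period T_m01 = m0 / m1, in s when f is in Hz."""
    return spectral_moment(freqs, energy, 0) / spectral_moment(freqs, energy, 1)


# Toy flat spectrum of 0.25 m^2/Hz between 0 and 1 Hz.
f = [0.0, 0.5, 1.0]
E = [0.25, 0.25, 0.25]
print(significant_wave_height(f, E), mean_period_tm01(f, E))  # -> 2.0 2.0
```

The same moments can be computed per spectral partition (wind sea, swell) to obtain the partitioned parameters the abstract mentions, provided each partition's energy is isolated first.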

开发了一个跨越1979年至近现在的多年代际全球风浪后发数据- whacs: acs的风浪后发数据,以提供对历史风浪条件的直接洞察,并作为局部模拟的边界强迫。WHACS的应用包括海岸管理、气候研究和可再生能源项目,最终帮助社区和行业做出明智的决策,以提高海浪条件下的安全性、效率和复原力。该数据集具有近全球球形多单元格(SMC)网格,该网格与局的业务海浪预报模型一致,并经过校准,通过改进极端风的表示,更好地代表极端海浪条件。从1979年到近现在,WHACS的可用输出包括本地SMC网格的多个小时体波和谱分区波参数,以及规则的全球和区域重新网格体波参数。对于印度-太平洋地区来说,可以通过专属经济区获得全光谱数据的网格化输出。
Citations: 0
A Unified Dataset for Antibody and Nanobody Design Including Sequence, Structure, and Binding Affinity Data.
IF 6.9 CAS Zone 2 (Multidisciplinary) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-02-21 DOI: 10.1038/s41597-026-06878-0
Yikai Wu, Xuejiao Liu, Karin Hrovatin, Dezhi Wu, Stephanie Linker, Mathias Winkel, Feng Tan

The design and optimization of antibodies and nanobodies using deep generative models hold transformative potential for therapeutic and diagnostic applications, but progress is hindered by the fragmented and inconsistent nature of existing datasets. To address these limitations, we introduce the Antibody and Nanobody Design Dataset (ANDD), a unified dataset that integrates sequence, structure, antigen, and affinity data from 15 diverse sources. ANDD comprises 48,683 antibody/nanobody sequences, with structural data for 24,941 entries and antigen sequences for 12,575 entries. We further augmented the affinity data with 2,271 values predicted by ANTIPASTI, a robust binding-affinity prediction model. In total, ANDD includes 9,557 affinity values, making it the largest dataset to date of antibody/nanobody-antigen pairs with affinity data. By addressing data fragmentation and inconsistency, ANDD provides a robust foundation for training deep generative models, enabling them to better model antibody/nanobody-antigen interactions and to design novel antibodies and nanobodies with improved specificity and efficacy, paving the way for the development of targeted therapeutics.

Citations: 0