Pub Date: 2026-01-16 | DOI: 10.1038/s41597-026-06595-8
Ze Yuan, Ting Ma
High-resolution datasets of anthropogenic water pollution discharges are essential for characterizing pollution dynamics and informing water quality management. However, China's pollution source data remain limited to provincial scales and decadal censuses, constraining spatiotemporal analyses and policy evaluation. We present a High-resolution Sectoral Water Pollution Discharge Dataset for mainland China (2007-2022), providing annual data at 30 arc-second (approximately 1 km at the equator) resolution. By integrating pollution source statistics with geospatial data through a top-down downscaling framework, we allocated provincial discharges to grid cells. The dataset quantifies gridded anthropogenic discharges of chemical oxygen demand (COD) and ammonium nitrogen (NH₃-N) from five sectors: urban residential, rural residential, industrial, crop farming, and livestock farming. Validation compared city-level aggregated estimates against official census records from 73 cities and showed strong agreement (R² > 0.6) for both pollutants across all sectors. This dataset enables identification of fine-scale pollution hotspots within river basins that were previously obscured by provincial-scale data, thereby supporting the implementation of targeted pollution control strategies.
Title: High-resolution gridded dataset of sectoral water pollution discharges in China from 2007 to 2022 (Scientific Data)
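The top-down downscaling and city-level validation described above can be illustrated with a minimal proportional-allocation sketch. It assumes that a provincial total for one sector is split across grid cells in proportion to a single non-negative proxy weight per cell (for example, population for urban residential discharge). The proxies and any multi-factor weighting the authors actually used are not given in the abstract, and the helper names `allocate`, `aggregate_by_city`, and `r_squared` are hypothetical. R² is computed here as 1 − SS_res/SS_tot, one common definition.

```python
from typing import Sequence


def allocate(provincial_total: float, weights: Sequence[float]) -> list[float]:
    """Split a provincial discharge total across grid cells in proportion to proxy weights.

    Hypothetical helper: assumes one non-negative proxy value per cell.
    The cell values always sum back to the provincial total (mass conservation).
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("proxy weights must sum to a positive value")
    return [provincial_total * w / total_weight for w in weights]


def aggregate_by_city(values: Sequence[float], city_of_cell: Sequence[str]) -> dict[str, float]:
    """Sum gridded values per city so they can be compared with city-level records."""
    totals: dict[str, float] = {}
    for value, city in zip(values, city_of_cell):
        totals[city] = totals.get(city, 0.0) + value
    return totals


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative numbers only, not values from the dataset.
cells = allocate(1000.0, [2.0, 3.0, 5.0])
print(cells)  # [200.0, 300.0, 500.0]
print(aggregate_by_city(cells, ["A", "A", "B"]))  # {'A': 500.0, 'B': 500.0}
```

Proportional allocation is the simplest top-down scheme because it preserves the provincial total exactly; more elaborate frameworks typically differ only in how the per-cell weights are built.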
Pub Date: 2026-01-16 | DOI: 10.1038/s41597-026-06606-8
Abdurrahman Keskin, Hani J Shayya, Achchhe Patel, Dario Sirabella, Barbara Corneo, Marko Jovanovic
Human embryonic stem cells (hESCs) provide a powerful in vitro model to study lineage specification and the regulatory programs underlying early human development. Here, we present a high-resolution, temporal multi-omics dataset tracking mRNA, translation, and protein expression dynamics during hESC differentiation into definitive endoderm and subsequent polyhormonal (PH) cells, a key pancreatic lineage. RNA-seq, ribosome profiling, and quantitative mass spectrometry-based proteomics were performed on matched samples collected at ten time points in biological duplicates, allowing detailed characterization of transcriptional, translational, and protein abundance changes over the differentiation timeline. The dataset exhibits high technical quality, with strong reproducibility between replicates and rigorous quality control metrics across all omics platforms. This extensive dataset provides critical insights into the complex regulatory mechanisms driving polyhormonal cell differentiation and serves as a valuable resource for the research community, enabling deeper exploration of mammalian development, endodermal lineage specification, and gene regulation.
Title: Temporal multiomics gene expression data across human embryonic stem cell-derived polyhormonal cell differentiation (Scientific Data)
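Because the dataset pairs RNA-seq with ribosome profiling on matched samples, one common downstream quantity is translational efficiency (TE): the ratio of ribosome-footprint abundance to mRNA abundance for a gene. The sketch below assumes simple counts-per-million (CPM) normalization and a pseudocount. It only illustrates that ratio and a replicate correlation check; it is not the authors' processing pipeline, and the function names, gene names, and counts are hypothetical.

```python
import math


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts-per-million normalization of raw read counts per gene."""
    library_size = sum(counts.values())
    return {gene: c * 1e6 / library_size for gene, c in counts.items()}


def log2_te(ribo_counts: dict[str, int], rna_counts: dict[str, int],
            pseudocount: float = 1.0) -> dict[str, float]:
    """log2 translational efficiency = log2((Ribo CPM + pc) / (RNA CPM + pc))."""
    ribo, rna = cpm(ribo_counts), cpm(rna_counts)
    return {
        gene: math.log2((ribo[gene] + pseudocount) / (rna[gene] + pseudocount))
        for gene in ribo.keys() & rna.keys()  # genes measured in both libraries
    }


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation, e.g. between two biological replicates."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical counts: GENE_A is enriched in footprints relative to mRNA, GENE_B is depleted.
te = log2_te({"GENE_A": 300, "GENE_B": 100}, {"GENE_A": 100, "GENE_B": 300})
```

A positive log2 TE marks a gene whose footprints are over-represented relative to its mRNA at a given time point; tracking this across the ten time points is one way to separate translational from transcriptional regulation.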
Pub Date: 2026-01-16 | DOI: 10.1038/s41597-026-06563-2
William W M Verbiest, Pauline Hicter, Hans Beeckman, Daniel Wallenus, Bhély Angoboy Ilondea, Jean-François Bastin, Marijn Bauters, Jérôme Chave, Ruben De Blaere, Thalès de Hauleville, Tom De Mil, Maaike de Ridder, Cécile De Troyer, Corneille E N Ewango, Adeline Fayolle, Anais Gorel, Fabian Jörg Fischer, Begüm Kaçamak, Christien Kimbuluma, Nestor K Luambua, Félix Laurent, Kévin Liévens, Jean-Remy Makana, François Malaisse, Mbusa Wasukundi, Michael Monnoye, Alfred Ngomanda, Franck Rodrigue Olouo Ambounda, Benjamin Toirambe, Cédric Otepa, Joris Van Acker, Bes Van den Abbeele, Jan Van den Bulcke, Blanca Van Houtte Alonso, Thierry Wankana, Brice Yannick Djiofack, Wannes Hubau
Wood density is a key plant property, indispensable for estimating forest biomass. Yet, despite tropical regions' substantial contributions to global tree diversity and carbon cycling, they remain underrepresented in wood density datasets such as the CIRAD dataset and the Global Wood Density Database (GWDD). To address this gap, we present the 'Tervuren xylarium Wood Density Database' (TWDD), containing 13,332 samples from 2,994 species, 1,022 genera, and 156 plant families across six continents (72% from Africa). TWDD offers direct measurements of oven-dry (oven-dry mass/oven-dry volume, all samples), air-dry (air-dry mass/air-dry volume, 6,408 samples), green (green mass/green volume, 1,657 samples), and basic wood density (oven-dry mass/green volume, 1,686 samples). Basic density was estimated for the remaining 11,646 samples via conversion from oven-dry density. TWDD closes a substantial wood density data gap, especially in Africa, adding 1,164 new species, 160 new genera, and 8 new plant families not included in GWDD or CIRAD datasets. The TWDD provides a critical resource for advancing research on forest community dynamics, ecosystem functioning, carbon cycling, and trait-based ecology worldwide.
Title: The Tervuren xylarium Wood Density Database (TWDD) (Scientific Data)
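The four density definitions listed above differ only in which mass and volume states are combined. The following minimal sketch makes the ratios explicit. It assumes mass in grams and volume in cubic centimetres for a single sample; the record and field names are hypothetical, and the oven-dry-to-basic conversion used for the 11,646 remaining samples is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class WoodSample:
    """Hypothetical record of one xylarium sample (mass in g, volume in cm^3)."""
    oven_dry_mass: float
    oven_dry_volume: float
    green_mass: float
    green_volume: float
    air_dry_mass: float
    air_dry_volume: float


def densities(s: WoodSample) -> dict[str, float]:
    """Return the four wood density definitions in g/cm^3."""
    return {
        "oven_dry": s.oven_dry_mass / s.oven_dry_volume,
        "air_dry": s.air_dry_mass / s.air_dry_volume,
        "green": s.green_mass / s.green_volume,
        # Basic density mixes states: oven-dry mass over green (swollen) volume,
        # so it is lower than oven-dry density whenever the wood shrinks on drying.
        "basic": s.oven_dry_mass / s.green_volume,
    }


# Illustrative values only.
example = WoodSample(oven_dry_mass=6.0, oven_dry_volume=10.0,
                     green_mass=12.0, green_volume=12.0,
                     air_dry_mass=6.6, air_dry_volume=10.5)
print(densities(example)["basic"])  # 0.5
```

The sample counts in the abstract are internally consistent: 13,332 samples minus the 1,686 with directly measured basic density leaves the 11,646 samples whose basic density was converted.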
Understanding how processing, structure, properties, and performance interact is essential for guiding materials design and discovery. Yet, causal mechanisms linking these elements are typically scattered across text, figures, and references in the literature, and efforts to systematically mine and organize such knowledge remain limited. In this work, we leverage an LLM-based mechanism deduction framework to construct a dataset of 207,200 fine-grained mechanisms supported by 1,113,940 multimodal evidence items from 61,766 materials science research articles. Each mechanism is linked to a specific causal relation among the four elements of the materials tetrahedron (processing, structure, properties, and performance) and is supported by evidence from experimental information, characterization results, and external knowledge, with its accuracy verified by materials science researchers. This dataset provides a large-scale, cross-validated collection of multimodal mechanism knowledge in materials science, serving as a resource for data-driven research and intelligent analysis.
Title: A multimodal dataset of causal mechanisms in materials science literature (Scientific Data)
Pub Date: 2026-01-16 | DOI: 10.1038/s41597-026-06598-5
Yinpeng Liu, Congrui Wang, Jiawei Liu, Xiang Shi, Yong Huang, Qikai Cheng, Wei Lu
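One way to picture a record in such a dataset is as a causal link between two elements of the processing-structure-properties-performance tetrahedron, with attached evidence items of the three kinds named in the abstract. The schema below is a hypothetical sketch for illustration only; the dataset's actual field names and file format are not described in the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum


class Element(Enum):
    """The four elements of the materials tetrahedron."""
    PROCESSING = "processing"
    STRUCTURE = "structure"
    PROPERTY = "property"
    PERFORMANCE = "performance"


class EvidenceKind(Enum):
    """The three evidence sources named in the abstract."""
    EXPERIMENT = "experimental information"
    CHARACTERIZATION = "characterization result"
    EXTERNAL_KNOWLEDGE = "external knowledge"


@dataclass
class Evidence:
    kind: EvidenceKind
    modality: str  # hypothetical field, e.g. "text" or "figure"
    content: str


@dataclass
class Mechanism:
    """Hypothetical record: one fine-grained causal mechanism and its evidence."""
    cause: Element
    effect: Element
    description: str
    evidence: list[Evidence] = field(default_factory=list)

    def evidence_kinds(self) -> set[EvidenceKind]:
        """Which of the three evidence sources support this mechanism."""
        return {e.kind for e in self.evidence}
```

As a scale check, 1,113,940 evidence items over 207,200 mechanisms averages roughly 5.4 evidence items per mechanism.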
Pub Date: 2026-01-16 | DOI: 10.1038/s41597-026-06612-w
Emily St John, Anna-Louise Reysenbach
Actively venting, high-temperature deep-sea hydrothermal vent deposits along tectonic spreading centers and in back-arc basins harbor a rich diversity of thermophilic Bacteria and Archaea, many of which have neither cultivated representatives nor any genomic representation in databases. Here, to produce a global-scale time-series metagenomic resource for studying microbial functional and genomic diversity in these high-temperature ecosystems, we obtained 70 metagenomes from collections across spatial and temporal gradients at 21 different vent fields spanning 16 years (1993-2009). The dataset (Deep-Sea Hydrothermal Vent dataset, DSV70) includes 3.56 Tbp of raw DNA sequence reads, which were assembled to produce 7,422 medium- to high-quality (based on CheckM2) metagenome-assembled genomes (MAGs) of Bacteria (6,063 MAGs) and Archaea (1,359 MAGs). Together with 40 previously published metagenomes from more recent deep-sea collections (2004 to 2018), the DSV70 dataset represents a valuable resource for exploring the functional and phylogenomic diversity of deep-sea hydrothermal microbiomes and provides many reference genomes for studies of the taxonomy and systematics of poorly studied microbial lineages. Further, given the interest in mining mineral resources at deep-sea vents, DSV70 provides a genomic legacy for monitoring impacts on the microbial communities in these systems.
Title: Global deep-sea hydrothermal deposit metagenomes and metagenome-assembled genomes over time and space (Scientific Data)
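The "medium- to high-quality" label is typically assigned from CheckM2 completeness and contamination estimates. The sketch below assumes MIMAG-style thresholds (high: more than 90% complete and less than 5% contamination; medium: at least 50% complete and less than 10% contamination) and ignores the rRNA and tRNA requirements that MIMAG adds for high-quality drafts. The authors' exact criteria may differ, and `mag_quality` is a hypothetical helper.

```python
def mag_quality(completeness: float, contamination: float) -> str:
    """Classify a MAG from CheckM2 completeness and contamination (both in percent).

    Assumed MIMAG-style thresholds; rRNA/tRNA presence checks are omitted.
    """
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"


# Illustrative bins only.
for comp, cont in [(95.0, 2.0), (70.0, 8.0), (40.0, 1.0)]:
    print(comp, cont, mag_quality(comp, cont))
```

The counts reported above are consistent: 6,063 bacterial plus 1,359 archaeal MAGs gives the 7,422 total, and 1993 to 2009 spans the stated 16 years.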
We present our new Brain/MINDS 3D digital marmoset brain atlas version 2.0 (BMA2.0), a population-based 3D digital brain atlas of the common marmoset (Callithrix jacchus), designed to overcome the limitations of previous single-subject atlases that are prone to structural biases arising from individual variation. Here, manually delineated cortical regions from 10 myelin-stained brains were used to create a generalized cortical parcellation. Newly refined subcortical regions from a previous atlas and a completely new cerebellum parcellation were also incorporated, resulting in a comprehensive whole-brain parcellation for both hemispheres. To facilitate multimodal data analysis, the atlas package includes co-registered average templates for myelin and Nissl staining from the same individuals, ex vivo T2 MRI (91 individuals), and in vivo T2 MRI (446 individuals). Cortical flat maps and pial, cortical mid-thickness, and white matter surfaces are also provided. BMA2.0 provides a central brain space for multimodal data integration, spatial analysis, and comparative neuroscience. Standard formats and transformations are provided for easy integration into existing workflows and interoperability with existing atlases.
Title: Brain/MINDS Marmoset Brain Atlas 2.0: Population Cortical Parcellation With Multi-Modal Templates (Scientific Data)
Pub Date: 2026-01-16 | DOI: 10.1038/s41597-026-06601-z
Rui Gong, Noritaka Ichinohe, Hiroshi Abe, Toshiki Tani, Mengkuan Lin, Takuto Okuno, Ken Nakae, Junichi Hata, Shin Ishii, Patrice Delmas, Shahrokh Heidari, Jiaxuan Wang, Tetsuo Yamamori, Hideyuki Okano, Alexander Woodward
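Integrating data into a shared atlas space ultimately comes down to applying coordinate transformations. As a minimal illustration, assuming a NIfTI-style 4×4 affine that maps voxel indices to physical coordinates in millimetres, a point can be mapped as below. The matrix is made up for the example and is not a BMA2.0 transform; nonlinear registrations shipped with an atlas would need a different tool.

```python
Matrix4 = list[list[float]]


def apply_affine(affine: Matrix4, point: tuple[float, float, float]) -> tuple[float, float, float]:
    """Map a 3D point with a 4x4 homogeneous affine matrix (last row assumed 0 0 0 1)."""
    x, y, z = point
    homogeneous = (x, y, z, 1.0)
    out = [sum(affine[r][c] * homogeneous[c] for c in range(4)) for r in range(3)]
    return (out[0], out[1], out[2])


# Hypothetical affine: 0.25 mm isotropic voxels with an origin offset.
example_affine: Matrix4 = [
    [0.25, 0.0, 0.0, -20.0],
    [0.0, 0.25, 0.0, -30.0],
    [0.0, 0.0, 0.25, -10.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(apply_affine(example_affine, (80, 120, 40)))  # (0.0, 0.0, 0.0)
```

Working in homogeneous coordinates lets scaling, rotation, and translation be combined into one matrix, so chaining transforms between spaces reduces to matrix multiplication.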
Pub Date: 2026-01-16 | DOI: 10.1038/s41597-025-06519-y
Abdul Malik, Sateesh Masabathini, Mohsin Ahmed Shaikh, Qinqin Kong, Muhammad Usman, Dasari Hari Prasad, Ibrahim Hoteit
Heatwaves are becoming more intense and frequent as global temperatures rise, affecting vulnerable populations, particularly in low-income communities. Addressing the impacts of heatwaves requires high-resolution data to assess their influence on labour productivity, public health, and climate risk. We introduce the Comprehensive Heat Indices (CHI) dataset, a high-resolution (0.1° × 0.1°) hourly dataset from 1950 to 2024, derived from the ERA5 and ERA5-Land reanalyses. The CHI dataset encompasses thirteen heat stress indices, including wet-bulb temperature, universal thermal climate index, mean radiant temperature, wind chill, and lethal heat stress index (Ls). Thresholds for Ls are empirically linked to mortality, enabling the identification of life-threatening heat events. Ls is sensitive to soil moisture variability, improving assessments in agricultural regions. The CHI dataset supports both indoor and outdoor applications, and its indices account for humidity, radiation, and wind. Covering the global land area from 60°S to 75°N and 180°W to 180°E, it provides a unique, long-term perspective on spatial and temporal trends in heat stress, which are critical for climate impact research and adaptation planning.
Title: A Global High-Resolution Comprehensive Heat Indices Dataset from 1950 to 2024 (Scientific Data)
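Wet-bulb temperature, one of the indices listed above, can be estimated from air temperature and relative humidity. The sketch below uses Stull's (2011) empirical approximation, which is intended for relative humidity of roughly 5% to 99% and temperatures of roughly -20 °C to 50 °C near sea-level pressure. It is shown only to make the index concrete and is not necessarily the formulation used to derive CHI from ERA5.

```python
import math


def wet_bulb_stull(temp_c: float, rh_percent: float) -> float:
    """Approximate wet-bulb temperature (deg C) from air temperature (deg C)
    and relative humidity (%) using Stull's (2011) empirical fit."""
    t, rh = temp_c, rh_percent
    return (
        t * math.atan(0.151977 * math.sqrt(rh + 8.313659))
        + math.atan(t + rh)
        - math.atan(rh - 1.676331)
        + 0.00391838 * rh ** 1.5 * math.atan(0.023101 * rh)
        - 4.686035
    )


print(round(wet_bulb_stull(20.0, 50.0), 1))  # 13.7
```

A closed-form fit like this is convenient for quick checks, but gridded products usually derive wet-bulb temperature from specific humidity and surface pressure, which matters at high elevations.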
Pub Date: 2026-01-16 | DOI: 10.1038/s41597-025-06526-z
Pascal Hansen, Ji Woong Brian Kim, Antony Goldenberg, Juo Tung Chen, Yuanzhe Amos Li, Anton Deguet, Brandon White, De Ru Tsai, Richard Cha, Jeffrey Jopling, Paul Maria Scheikl, Axel Krieger
The growing global shortage of skilled surgeons underscores the need for intelligent, assistive technologies in the operating room. To address this challenge, we introduce ImitateCholec, a publicly available dataset specifically designed to advance autonomous robotic systems during the critical clipping and cutting phase of laparoscopic cholecystectomy. The dataset comprises over 18,000 demonstrations from 34 ex vivo porcine cholecystectomies, totaling approximately 20 hours of data. Each clipping and cutting phase recorded in the dataset is segmented into 17 distinct surgical tasks. ImitateCholec uniquely integrates endoscopic videos captured from multiple camera perspectives with comprehensive kinematic data acquired through the da Vinci Research Kit. Both optimal demonstration executions and recovery maneuvers were systematically recorded, enabling the training of imitation learning models capable of robustly addressing real-world surgical variability. ImitateCholec primarily facilitates imitation learning for long-horizon surgical workflow execution, significantly advancing the development of autonomous robotic systems toward phase-level autonomy and, ultimately, full procedural autonomy. Additional supported applications include surgical workflow modeling, error recognition, and surgical tool pose estimation.
Title: ImitateCholec: A Multimodal Dataset for Long-Horizon Imitation Learning in Robotic Cholecystectomy (Scientific Data)
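Segmenting each recorded phase into task-level demonstrations amounts to cutting a time-stamped kinematic stream at annotated boundaries. The sketch below is hypothetical: it assumes each annotation carries a task identifier, start and end times in seconds, and a flag separating recovery maneuvers from optimal executions. The dataset's real file layout and da Vinci Research Kit field names are not described in the abstract.

```python
from dataclasses import dataclass


@dataclass
class TaskSegment:
    """Hypothetical annotation: one task occurrence within a recorded phase."""
    task_id: int       # one of the 17 task types (numbering is hypothetical)
    start_s: float
    end_s: float
    is_recovery: bool  # recovery maneuver vs. optimal execution


def split_kinematics(timestamps: list[float], samples: list, segments: list[TaskSegment]):
    """Group time-stamped kinematic samples into per-segment demonstrations.

    A sample at time t belongs to a segment when start_s <= t < end_s.
    Returns (task_id, is_recovery, samples) tuples in segment order.
    """
    demos = []
    for seg in segments:
        demo = [s for t, s in zip(timestamps, samples) if seg.start_s <= t < seg.end_s]
        demos.append((seg.task_id, seg.is_recovery, demo))
    return demos
```

For scale, approximately 20 hours (72,000 s) spread over more than 18,000 demonstrations implies an average of roughly 4 seconds or less per demonstration, which is why fine-grained boundaries matter for imitation learning.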
Pub Date: 2026-01-16 | DOI: 10.1038/s41597-026-06561-4
Matthew D Robbins, Sunchung Park, B Shaun Bushman, Scott E Warnke, Jinyoung Y Barnaby
Creeping bentgrass (Agrostis stolonifera) is a widely used cool-season turfgrass valued for its fine texture and ability to form dense, uniform turf. However, its complex allotetraploid genome and high repeat content have posed challenges for genomic research and molecular breeding. Here, we report a haplotype-resolved chromosome-level genome assembly generated using PacBio HiFi and Oxford Nanopore sequencing with Omni-C scaffolding. The final assembly spans 5.4 Gb, with a scaffold N50 of 187.9 Mb, and comprises 28 pseudochromosomes representing fully phased haplotypes (2n = 4x = 28). BUSCO analysis showed 98.8% completeness, reflecting the high quality of the assembly. We annotated 146,216 protein-coding genes and found that transposable elements account for 79.8% of the genome, dominated by LTR-Gypsy elements. Subgenome-specific LTR clustering and comparative genomic alignments supported an allopolyploid origin involving two diverged progenitors. This high-quality genome provides a foundational resource for functional genomics and breeding efforts to improve disease resistance, abiotic stress tolerance, and turf quality.
Title: Haplotype-resolved chromosome-level genome assembly of creeping bentgrass, Agrostis stolonifera (Scientific Data)
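The scaffold N50 reported above is the length L such that scaffolds of length L or longer together cover at least half of the assembly. (The karyotype notation 2n = 4x = 28 likewise implies a base chromosome number x = 7, so the 28 pseudochromosomes correspond to 4 × 7.) A minimal sketch of the standard N50 calculation, with made-up lengths rather than the bentgrass scaffolds:

```python
def n50(lengths: list[int]) -> int:
    """Smallest length L such that sequences of length >= L
    cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("empty input")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding at exactly half
            return length
    raise AssertionError("unreachable: the loop always reaches half the total")


# Illustrative lengths only.
print(n50([100, 80, 60, 40, 20]))  # 80
```

Because N50 depends only on the sorted length distribution, it summarizes contiguity but says nothing about correctness; that is why it is usually reported alongside completeness measures such as BUSCO.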
Pub Date: 2026-01-16 | DOI: 10.1038/s41597-026-06578-9
Xiao Xiang Zhu, Qingyu Li, Yilei Shi, Yuanyuan Wang, Adam J Stewart, Jonathan Prexl, Fahong Zhang
Understanding how buildings are distributed globally is crucial to revealing the human footprint on our home planet. The built environment affects local climate, land surface albedo, resource distribution, and many other key factors that influence well-being and human health. Despite this, quantitative and comprehensive data on the distribution and properties of buildings worldwide are lacking. Using a big data analytics approach and nearly 800,000 satellite images, we generated the highest-resolution and highest-accuracy building map created to date: the GlobalBuildingMap (GBM). A joint analysis of building maps and solar potentials indicates that rooftop solar energy could meet global energy consumption at a reasonable cost. Specifically, if solar panels were installed on the roofs of all buildings, they could supply 1.1 to 3.3 times the global energy consumption of 2020, depending on the efficiency of the solar panels.
Title: GlobalBuildingMap - Unveiling the mystery of global buildings (Scientific Data)
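The supply multiple quoted above depends on panel efficiency. The back-of-the-envelope sketch below shows that dependence. It assumes annual output = roof area × annual irradiation × module efficiency × performance ratio, a generic photovoltaic yield estimate that is not necessarily the GBM authors' model. All inputs are placeholders, not GBM results, and `rooftop_supply_ratio` is a hypothetical helper.

```python
def rooftop_supply_ratio(roof_area_m2: float,
                         irradiation_kwh_per_m2_yr: float,
                         efficiency: float,
                         demand_kwh_per_yr: float,
                         performance_ratio: float = 0.8) -> float:
    """Ratio of estimated annual rooftop PV output to annual energy demand.

    Assumed linear yield model; performance_ratio lumps system losses together.
    """
    output_kwh = roof_area_m2 * irradiation_kwh_per_m2_yr * efficiency * performance_ratio
    return output_kwh / demand_kwh_per_yr


# Placeholder inputs only.
low = rooftop_supply_ratio(1000.0, 1500.0, 0.2, 100000.0)
high = rooftop_supply_ratio(1000.0, 1500.0, 0.4, 100000.0)
print(low, high)  # doubling efficiency doubles the ratio
```

Under this linear model, and if nothing but efficiency changed, a 1.1 to 3.3 range would correspond to a threefold spread in effective conversion efficiency. The published range may also reflect other assumptions.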