Pub Date : 2026-02-20 DOI: 10.1038/s41597-026-06849-5
Li-Na Du, Zhuo-Cong Wang, Zhuo-Ni Chen, Zhi-Xian Qin, Chen-Hong Li
Traccatichthys pulcher is an ornamental loach species recognized for its vibrant body coloration, characteristic black dorsal fin margin, and iridescent green lateral stripes. To advance genomic research on this species, a high-quality, near telomere-to-telomere (T2T) genome assembly was generated using PacBio HiFi, ONT ultra-long, and Hi-C sequencing technologies. The resulting haplotype-resolved assembly spanned approximately 623.68 Mb, with a contig N50 of 22.9 Mb, and was anchored onto 24 chromosomes. Telomeric sequences were detected at both ends of eight chromosomes and at one end of 13 chromosomes. Twenty-three chromosomes were entirely gapless, while a single gap was identified in the remaining chromosome. The assembly contained 119.1 Mb of repetitive elements, and 23 967 protein-coding genes were annotated. BUSCO analysis indicated high completeness, with 98.6% of conserved genes recovered. This high-quality, near T2T genome assembly offers a valuable and robust genetic resource for investigating molecular mechanisms, evolutionary processes, conservation biology, and selective breeding of T. pulcher.
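For readers unfamiliar with the contig N50 statistic quoted above, the following is a minimal sketch of how such a value is conventionally computed from a list of contig lengths. The function name `n50` and the toy lengths are illustrative assumptions, not the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths, not values from the published assembly.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

A high N50 relative to chromosome size, as reported here, indicates that most of the assembly sits in a small number of long contigs.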
Title : Near telomere-to-telomere genome assembly of the stone loach (Traccatichthys pulcher). Journal : Scientific Data
Pub Date : 2026-02-20 DOI: 10.1038/s41597-026-06799-y
Daria Blinova, Gayathri Emuru, Rakesh Emuru, Benjamin E Bagozzi
The Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) was adopted in 1975 in an effort to manage the international biodiversity trade. Meetings regulating the implementation of CITES have since been held every 2-3 years with the involvement of diverse stakeholders representing country Party-signatories, non-Party states, international organizations, private sector interests, and NGOs. These meetings and their outcomes are of interest to environmental science scholars, social scientists, journalists, and advocacy organizations. Yet, no usable data on meeting attendees and their details exists. This limits researchers' and advocates' abilities to study or track CITES meeting attendance patterns, and their associated causes and effects. Applying NLP techniques to PDF attendance rosters, we build the first CITES attendee-level dataset, covering 20,987 attendee records for all meetings to date. The dataset contains rich information on attendee geo-locations, names, affiliations and genders, and variables associated with attendee delegations, among others. Summaries and validations underscore the promise of our data and suggest new avenues for research on international wildlife conservation.
Title : Geo-located attendance data for CITES Conferences of the Parties. Journal : Scientific Data
Pub Date : 2026-02-19 DOI: 10.1038/s41597-026-06884-2
Jordan Holmes, Scott L England
We present a comprehensive dataset of dayside auroral emissions observed by the Global-scale Observations of the Limb and Disk (GOLD) mission from October 2018 to June 2025. The dataset contains over 47,000 unique scans of the northern aurora in three far-ultraviolet spectral channels (OI 135.6 nm, NI 149.3 nm, and N₂ LBH), estimates of the background dayglow, binary masks of auroral locations, and other corresponding spatial and temporal metadata. The OI 135.6 nm, NI 149.3 nm, and N₂ LBH emissions are far-ultraviolet signatures of electron-impact excitation in the upper atmosphere and therefore serve as tracers of auroral electron precipitation. From this dataset, auroral pixels are directly available with no dayglow contamination of the emissions. Auroral signals are extracted through a multi-stage processing pipeline inspired by computer vision and machine learning techniques. This dataset provides a consistent view of the dayside aurora over the North American and Atlantic sectors, enabling studies of auroral dynamics with GOLD observations.
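As a rough illustration of how a scan, its background dayglow estimate, and its binary auroral mask might be combined, here is a minimal sketch. The nested-list layout, the function name `extract_aurora`, and the toy values are assumptions made for illustration; the abstract does not describe the actual file format or the authors' processing code.

```python
def extract_aurora(radiance, dayglow, mask):
    """Return background-subtracted radiance at masked pixels, None elsewhere.

    radiance, dayglow and mask are equally shaped 2-D nested lists
    (hypothetical layout); mask entries are truthy at auroral pixels.
    """
    result = []
    for rad_row, bg_row, mask_row in zip(radiance, dayglow, mask):
        result.append([
            (rad - bg) if flagged else None
            for rad, bg, flagged in zip(rad_row, bg_row, mask_row)
        ])
    return result


# Hypothetical 2x2 scan in arbitrary brightness units.
scan = [[10.0, 12.0], [30.0, 11.0]]
background = [[9.0, 11.0], [10.0, 10.0]]
aurora_mask = [[0, 0], [1, 0]]
print(extract_aurora(scan, background, aurora_mask))
```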
Title : A dayside aurora dataset from the Global-scale Observations of the Limb and Disk mission. Journal : Scientific Data
Pub Date : 2026-02-19 DOI: 10.1038/s41597-026-06825-z
William G Resh, Keunyoung Eli Lee, Yi Ming, Xinyao Andy Xia, Nicole Dias, Kecheng Anderson Liu, Darren Cao, William Huh
The Integrated Network Solutions in Government Hiring Trends (INSIGHT+) database supports research on the U.S. federal civil service labor market. As of September 2024, the federal workforce included over 2.4 million civilian employees spanning more than 400 occupations, with over 85% located outside the Washington, D.C. metropolitan area. INSIGHT+ integrates micro-, meso-, and macro-level statistics from governmental and academic sources. Our database covers workforce dynamics from 2018 to 2023. It allows granular multivariate analyses and accommodates agency-, location-, and institution-specific tables that can be matched in the future with further detailed data on agency outputs such as discretionary grants, contracts, contingent liabilities, and various economic impacts. This article outlines the current capabilities and significance of INSIGHT+ for civil service labor market research, emphasizing ongoing enhancements to enrich analyses of civil service institutions.
Title : Integrated Network Solutions in Government Hiring Trends (INSIGHT+). Journal : Scientific Data
Pub Date : 2026-02-19 DOI: 10.1038/s41597-026-06868-2
Xudong Yang, Pengbo Yan, Jing Jin, Xinyi Liu, Jun Yang
Diverse tree communities can bolster urban ecosystem resilience and provide vital ecosystem services. However, existing urban tree species datasets have limited geographic coverage and contain inadequate attributes. To address those gaps, we developed the Global Urban Tree Species (GUTS) dataset by integrating data from literature, biodiversity databases, and other open sources. The new dataset encompasses 159,845 occurrence records of 10,094 tree species in 8,349 cities and 139 countries. Among them, 109,879 records were confirmed from urban areas, representing 11.18% of global tree species diversity. The dataset has been validated using multiple methods. GUTS fills critical data gaps and provides a foundation for future research and management of global urban biodiversity.
Title : Global Urban Tree Species (GUTS): Revealing tree species diversity across the world's urban areas. Journal : Scientific Data
Pub Date : 2026-02-19 DOI: 10.1038/s41597-026-06810-6
Yang Jia, Yihan Guo, Yetang Chen, Xinmeng Zhang, Gang Wang, Qixing Zhang
Because no multimodal dataset was previously available for fire detection research, we developed the MmodalFire multimodal fire detection dataset for training and evaluation of indoor fire detection algorithms. This publicly available dataset includes video and physical sensing data for fire detection use. The dataset comprises 65 videos that simultaneously captured six physical sensing data types, including smoke density, temperature, and infrared and ultraviolet radiation at 5 μm, 4.4 μm, and 3.8 μm. All data were acquired using monitoring cameras and fire sensors deployed as part of a fire detection system that was carefully designed to cover all possible variations, including different wind velocities, illumination conditions, common interference types, and occlusions. All videos and corresponding physical sensing data sequences are labeled as either fire or non-fire sequences. Using the MmodalFire dataset, we evaluated four baseline fusion models and the proposed dynamic fusion models to provide a reference for multimodal fire detection research under controlled laboratory settings.
Title : MmodalFire: A Continuous Multimodal Dataset Comprising Video and Physical Sensing Data for Detecting Indoor Fires. Journal : Scientific Data
Pub Date : 2026-02-19 DOI: 10.1038/s41597-026-06863-7
Kenji Fujisaki, Fabien Ferchaud, Hugues Clivot, Elisa Bruni, Bertrand Guenet, Christian Pichot, Antoine Versini, François Baudin, Antonio Bispo, Philippe Peylin, Manuel P Martin, Johannes L Jensen, Jørgen Eriksen, Claire Chenu, Andrew S Gregory, Margaret J Glendining, Ines Merbach, Nicolas Beaudoin, Bruno Mary, Alain Mollier, Gilles Tison, Christophe Montagnier, Abad Chabbi, Françoise Vertès, Alice Cadéro, Anne-Isabelle Graux, Sylvain Pellerin, Florent Levavasseur, Manon Gilles, Thierry Morvan, Camille Resseguier, Luis Milesi, Alicia Irizar, Adriàn Andriulo, Marie-Noël Mistou, Arnaud Butier, Michel Bertrand, Bénédicte Autret, Marie-Hélène Jeuffroy, Gilles Grandeau, Thierry Doré, Vincent Cellier, Alain Berthier, Sébastien Darras, Guillaume Audebert, Ludovic Pasquier, Fabien Ecalle, Antoine Savoie, Marcus Schiedung, Christopher Poeplau, Nadia I Maaroufi, Thomas Kätterer, Martin A Bolinder, Jonathan Sanderman, Pierre Barré
Soil organic carbon (SOC) models need independent evaluation against field measurements, but the latter are rarely publicly available and harmonized. In this study, we collected and shared data from 167 agronomic treatments in 34 agronomic long-term experiments (LTEs) located in temperate croplands, allowing the evaluation of several SOC models such as RothC, Century, AMG, MIMICS, ICBM, Millennial, and CTOOL. The dataset includes climate data, soil properties, C inputs from crops (n = 4588 records) and organic amendments, irrigation data, monthly soil cover, as well as SOC stock measurements in the topsoil layer (n = 1328 records). Climate, soil moisture, and soil temperature data were extracted from daily climate databases. Carbon inputs from crops were calculated from observed yields and harvest index, with some harvest index values estimated, combined with crop allometric coefficients from the literature. Descriptions of the LTEs, agronomic treatments, methodological metadata, and part of the code accompany the dataset. The dataset can be reused to evaluate single SOC models or an ensemble of models.
Title : Data from long-term experiments in temperate croplands to evaluate soil organic carbon models. Journal : Scientific Data
Pub Date : 2026-02-19 DOI: 10.1038/s41597-026-06859-3
Yongjing Mao, Kilian Vos, Laura Cagigal, Valentine Bodin, Mitchell D Harley, Kristen D Splinter
Coastal erosion at wave-dominated beaches, primarily driven by nearshore wave dynamics, poses a substantial challenge for coastal management. While existing datasets from individual beaches have improved our understanding of site-specific coastal morphodynamics, there is a growing demand for regional-scale datasets to understand and predict regional shoreline responses to climate variability. To address this, we present a combined shoreline and nearshore wave dataset for the wave-dominated coast of southeast Australia, comprising over 8,000 cross-shore transects at 100 m spacing for over 300 beaches. For each transect, satellite-derived shoreline positions (1984-2024) and beach-face slopes are provided, alongside hourly nearshore wave parameters (1979-2024) extracted at the 10 m depth contour. Shoreline data have been validated using available field surveys, and wave data have been assessed against offshore and nearshore buoy observations. This dataset provides a valuable resource for developing regional-scale understanding of shoreline variability along wave-dominated and embayed coastlines.
Title : A Forty-year regional-scale dataset of shoreline change and nearshore wave conditions in Southeast Australia. Journal : Scientific Data
Pub Date : 2026-02-19 DOI: 10.1038/s41597-026-06691-9
Ying Yu, Xuewei Wang, Diego Manya, Angel Hsu
Reliable, comparable greenhouse gas (GHG) emissions data at the subnational level remain scarce, despite growing expectations for cities and regions to lead on climate action. Inconsistent reporting, methodological variation, and limited coverage of self-reported inventories hinder efforts to track progress and guide mitigation opportunities. To address these challenges, we develop a machine learning (ML) framework to estimate annual Scope 1 and 2 CO2-equivalent emissions for subnational jurisdictions in G20 countries from 2000 to 2020. Our approach integrates publicly available geospatial, socioeconomic, and environmental data with self-reported inventories where available, and aligns predictions with subnational administrative boundaries. Compared to traditional downscaling or proxy-based approaches, our model improves spatial relevance and predictive performance while capturing locally specific emission drivers. This globally consistent, administratively-aligned dataset can serve as a baseline for assessing climate progress, especially in data-poor or inconsistent reporting contexts, and supports more targeted, data-informed policy decisions for urban and regional decarbonization.
Title : Machine learning estimates for G20 subnational urban GHG emissions from 2000-2020. Journal : Scientific Data
Pub Date : 2026-02-19 DOI: 10.1038/s41597-026-06881-5
Aline M A Martins, Diogo G Biagi, Blake L Tsu, Juliana de Saldanha da Gama Fischer, Luisa Bulcao Vieira Coelho, Paulo Costa Carvalho, Alysson R Muotri
This dataset contains mass spectrometry-based proteomic profiles of human brain organoids cultured on Earth for 30 days, then maintained aboard the International Space Station (ISS) for an additional 30 days, with matched ground controls that remained on Earth for the equivalent duration. Brain organoids were derived from two induced pluripotent stem cell (iPSC) lines: Q83X, carrying a nonsense mutation in MECP2 from a male patient with Rett syndrome, and WT83, derived from the patient's unaffected familial control. Rett syndrome is a severe X-linked neurodevelopmental disorder caused by loss-of-function mutations in MECP2, which encodes Methyl-CpG-binding protein 2, a critical epigenetic regulator. The spaceflight experiment was conducted using cryovials with automated control maintenance. Deep proteome coverage with approximately 6,000 protein groups was inferred from 56,639 peptides. This dataset provides unique insights into how the space environment affects human neural tissue and MECP2-related pathologies, serving as a resource for understanding spaceflight-induced neurological changes and as a stepping stone for future space missions.
Title : Proteomic dataset of MECP2-deficient and wild-type human brain organoids under spaceflight and ground conditions. Journal : Scientific Data