Scientific Data: Latest Articles

The Minimum Semantic Content (MSC) Dataset: A Large, Balanced Resource for Computational Aesthetics Research.
IF 6.9 | Zone 2 (Multidisciplinary Journals) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-17 | DOI: 10.1038/s41597-026-06816-0
Olivier Penacchio, Arslan Javed, Bogdan Raducanu, Xavier Otazu, C Alejandro Parraga

Image databases are central to empirical aesthetics, enabling tests of how image statistics relate to observers' appreciation. However, many existing databases have two key limitations: (1) they conflate low-level visual features with high-level semantic content, making it difficult to separate visual from cognitive influences on aesthetic judgments; and (2) they are imbalanced, overrepresenting highly appreciated images. To address these issues, we present the Minimum Semantic Content (MSC) database, a large, systematically curated resource for computational aesthetics. It comprises 10,426 natural scenes with reduced, homogenized semantic content, minimizing cognitive and emotional confounds. Each received 100 individual aesthetic ratings from naïve observers, drawn from a pool of approximately 10,000 participants, via crowdsourcing. The database includes both "beautified" and "uglified" versions, generated with a manipulation technique that promotes uniform coverage across the aesthetic spectrum. This broader distribution mitigates bias and overfitting in models. Validation also shows improved robustness in computational models overall. This database enables researchers to study how perceptual features shape aesthetic judgments, using stimuli with very limited semantic and contextual confounds.

Citations: 0
A dataset of radiocarbon dates from Holarctic mammal collagen purified with high-quality chemistry.
IF 6.9 | Zone 2 (Multidisciplinary Journals) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-17 | DOI: 10.1038/s41597-026-06562-3
Salvador Herrando-Pérez, Kieren J Mitchell, John R Southon, Chris S M Turney, Thomas W Stafford

Radiocarbon dates from megafaunal remains provide insights into climatic and anthropogenic factors shaping past ecosystems. Chronologies have advanced through rigorous chemical purification (pretreatment) of fossil vertebrate collagen for accelerator mass spectrometry (AMS) radiocarbon dating. We present MEGA14C, a comprehensive dataset of late Quaternary AMS radiocarbon dates for Holarctic large-bodied mammals, based on collagen purified by ultrafiltration (92% of records), XAD-2 purification (7%) and hydroxyproline isolation (1%). MEGA14C includes 11,715 dates spanning 8 orders, 23 families, 78 genera, 133 species and 18 subspecies, 27% from extinct taxa, and dominated by Equus, Bos, Mammuthus, Rangifer, Bison, Ursus, Cervus, Canis, Coelodonta and Sus. Where available, geolocation, genetic and isotopic data are provided. Pretreatment is critical for accurate and reproducible radiocarbon measurements, yet 44% of published dates lack this information. We addressed this gap through over 10,000 personal communications (out of >100,000 emails) with researchers and AMS laboratories among the parties involved in fossil dating. This unique dataset supports (pre)historical research and provides a foundation for future expansion and/or integration into a global radiocarbon repository.

Citations: 0
A Collectivism Index for Investigating Cultural Variation in China across Regions and Time.
IF 6.9 | Zone 2 (Multidisciplinary Journals) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-17 | DOI: 10.1038/s41597-026-06661-1
Liuqing Wei, Thomas Talhelm, Jiong Zhu, Alexander Scott English, An Huang

We created a collectivism index to measure regional differences in China. The index uses eight Census indicators that reflect family living arrangements, marriage stability, innovation, and independence. The Census provides large, nationally representative data, which ensures high-fidelity measurement and fine-grained geographic resolution from provinces down to prefectures (N = 356). Because the data stretch from 1982 to 2020, researchers can also track change over time. This dataset is useful for exploring causes of societal differences, outcomes of collectivism, and cultural shifts in longitudinal data.

Citations: 0
50-Years Inland Waterway Freight Data in the Rhine-Alpine Corridor.
IF 6.9 | Zone 2 (Multidisciplinary Journals) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-17 | DOI: 10.1038/s41597-026-06875-3
Bas Turpijn, Fedor Baart, Lóránt Tavasszy, Mark van Koningsveld

To support a modal shift toward sustainable freight solutions, such as inland waterway transport (IWT), researchers and practitioners require long-term historical data on IWT freight flows. However, such comprehensive time series have been unavailable until now. This study addresses this gap by presenting a harmonized dataset encompassing 50 years (1970-2023) of IWT freight data across Europe, with a focus on the Rhine-Alpine Corridor. The dataset includes transport volumes (in tonnes) and transport performance (in ton-kilometers), classified according to NST-R, NST2007, and CCR nomenclatures. To ensure data continuity and completeness, processing techniques, including imputation and optical character recognition, were applied. The dataset offers valuable insights for researchers, policymakers, and transport planners aiming to comprehend and enhance the role of IWT in Europe's freight transport landscape.

Citations: 0
Global 0.05° Grid-Based Dataset of Keyhole Imagery with Spatio-Temporal Indicators (1960-1984).
IF 6.9 | Zone 2 (Multidisciplinary Journals) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-17 | DOI: 10.1038/s41597-026-06866-4
Tao Wang, Xinle Zhang, Mulin Shan, Mingyuan Deng, Jiaheng Wang, Huanjun Liu, Hao Li, Jinyu Sun

Imagery from the American Keyhole satellite reconnaissance program serves as a significant data source for geoscience research because of its high resolution and early temporal coverage. However, the lack of a spatial and temporal description of its uneven distribution can hinder researchers from selecting and accessing appropriate Keyhole images. Here we introduce a global grid-based dataset that organizes declassified U.S. Keyhole imagery (1960-1984) for direct reuse, built on a global equal-area sinusoidal grid. This dataset standardizes scene metadata and provides indicators designed to inform study design and data integration: coverage count (how often a place was imaged), unique acquisition dates (temporal sampling richness), first/last observation year (temporal bounds), observation span (duration), peak observation year and a three-year window (temporal concentration), resolution class (C1-C3), temporal-coverage class across five five-year intervals, and resolution-coverage class (A-G) for multi-scale availability. This dataset enables users to quickly locate usable scenes, assess temporal suitability, combine historical images with modern satellite data, and determine which non-free images to purchase if free images are unsuitable for their research.
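
The per-cell temporal indicators named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the input layout (pairs of a hypothetical `cell_id` and an acquisition date), the function name `cell_indicators`, the definition of span as last year minus first year, and the tie-breaking rule for the peak year are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import date


def cell_indicators(scenes):
    """Summarize scenes per grid cell.

    scenes: iterable of (cell_id, acquisition_date) pairs, one pair for
    every scene that covers the cell (hypothetical input layout).
    """
    by_cell = defaultdict(list)
    for cell_id, acquired in scenes:
        by_cell[cell_id].append(acquired)

    indicators = {}
    for cell_id, dates in by_cell.items():
        years = [d.year for d in dates]
        per_year = Counter(years)
        # Peak year = year with the most scenes; ties go to the earliest
        # year (an assumed rule, not taken from the paper).
        peak_year = min(per_year, key=lambda y: (-per_year[y], y))
        indicators[cell_id] = {
            "coverage_count": len(dates),        # how often the cell was imaged
            "unique_dates": len(set(dates)),     # temporal sampling richness
            "first_year": min(years),            # temporal bounds
            "last_year": max(years),
            "span_years": max(years) - min(years),  # assumed definition of span
            "peak_year": peak_year,
        }
    return indicators


example = cell_indicators([
    ("cell_A", date(1965, 3, 1)),
    ("cell_A", date(1965, 3, 1)),   # second scene acquired on the same day
    ("cell_A", date(1972, 7, 9)),
    ("cell_B", date(1980, 1, 5)),
])
```

In this toy input, `cell_A` is covered by three scenes on two distinct dates, so coverage count and unique acquisition dates differ, which is the distinction the two indicators are meant to capture.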

Citations: 0
Genome sequencing and assembly of Neolissochilus pnar, the largest cavefish species of Mahseer.
IF 6.9 | Zone 2 (Multidisciplinary Journals) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-17 | DOI: 10.1038/s41597-026-06842-y
Vindhya Mohindra, Labrechai Mog Chowdhury, Dran Khlur Baiaineh Mukhim, Kangkan Sarma, Deisakee Pyrbot Warbah, Dandadhar Sarma, Joykrushna Jena

Neolissochilus pnar, identified as the world's largest cavefish, belongs to the family Cyprinidae and is endemic to the limestone caves of Meghalaya, India, one of the country's biodiversity hotspots. The species differs notably from its close relative, Neolissochilus hexastichus, primarily in its lack of pigmentation and the absence or reduction of eyes. While juvenile N. pnar may have small or reduced eyes, adults exhibit an absence of external ocular features. Genome sequence resources for this species would therefore be an effective tool for bioprospecting and for mining novel genes responsible for these important traits. In this study, the genome was sequenced with long-read technology (PacBio), and a high-quality draft assembly of 1.56 Gb in 1,423 contigs with an N50 of 18.990 Mb was generated, showing 99% genome completeness (BUSCO). The assembly contains 44.30% repetitive elements, 1,416,376 SSRs, and 37,559 functionally annotated genes. Single-copy ortholog (SOG) analysis placed N. pnar in the same cluster as the other cave-dwelling cyprinids used in the study. The extensive genomic information generated in the present study will be a useful resource for understanding the evolutionary significance of, and the genes governing, traits including body colour and eye development in Mahseer species.

Citations: 0
Carbon footprint dataset of concrete based on field surveys at commercial mixing plants in Shandong, China.
IF 6.9 | Zone 2 (Multidisciplinary Journals) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-17 | DOI: 10.1038/s41597-026-06789-0
Ditao Niu, Juan Zhou, Bingbing Guo

Carbon dioxide (CO2) emissions from concrete have grown rapidly, ranking second after the power sector. Current emission factors often overlook regional heterogeneity. To bridge this knowledge gap, this study takes Shandong Province, a typical region in China, as a case study. Considering the differences in geography, history, culture, and economic development, Shandong is divided into five subregions: Eastern, Western, Southern, Northern, and Central Shandong. This study developed a fundamental carbon footprint dataset of concrete by collecting 993 mix proportions across strength grades C25-C60 from field surveys over the past five years. Statistical analysis showed that raw material dosages followed normal distributions (Kolmogorov-Smirnov test, p > 0.05), while transportation distances and electricity consumption followed lognormal distributions. Based on these statistical characteristics, a Monte Carlo simulation with 10,000 iterations was conducted to establish a stochastic model for carbon emissions accounting. Model performance was validated against survey data, achieving a mean absolute percentage error (MAPE) of 1.89% and a coefficient of determination (R²) of 0.9904. Sensitivity analysis identified cement dosage as the key driver of emissions.
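
The kind of stochastic accounting described here (normal dosages, lognormal distances and electricity use, 10,000 Monte Carlo draws, then MAPE and R² for validation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every distribution parameter and emission factor below is a made-up placeholder, the single-material emission formula is a simplification, and none of it is taken from the dataset or the authors' model.

```python
import random
import statistics


def simulate_emissions(n_iter=10_000, seed=0):
    """Draw illustrative CO2 totals (kg per cubic metre of concrete)."""
    rng = random.Random(seed)
    # Hypothetical emission factors, for illustration only.
    cement_ef = 0.8         # kg CO2 per kg cement
    transport_ef = 0.0001   # kg CO2 per kg of cement per km
    grid_ef = 0.6           # kg CO2 per kWh
    totals = []
    for _ in range(n_iter):
        # Raw material dosage: normal distribution (placeholder mean/sd).
        cement_kg = max(0.0, rng.gauss(350.0, 30.0))
        # Transport distance and electricity use: lognormal distributions.
        distance_km = rng.lognormvariate(3.5, 0.5)
        electricity_kwh = rng.lognormvariate(0.7, 0.3)
        totals.append(
            cement_kg * cement_ef
            + cement_kg * distance_km * transport_ef
            + electricity_kwh * grid_ef
        )
    return totals


def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * statistics.fmean(
        abs((o - p) / o) for o, p in zip(observed, predicted)
    )


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


draws = simulate_emissions()
mean_emission = statistics.fmean(draws)
```

Fixing the seed makes the draws reproducible, which helps when comparing simulated summaries against survey values with `mape` and `r_squared`.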

Citations: 0
Comprehensive re-assembly and annotation dataset for the argan tree (Argania spinosa L., Sapotaceae) genome.
IF 6.9 | Zone 2 (Multidisciplinary Journals) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-02-17 | DOI: 10.1038/s41597-026-06596-7 | Pages: 267
Abdellah Idrissi Azami, Stacy Pirro, Nihal Habib, Turgay Unver, M Gonzalo Claros, Juan de Dios Alché, Sofia Sehli, Zainab El Ouafi, Douae El Ghoubali, Dalila Bousta, Najib Al Idrissi, Fatima Gaboun, Abderrazak Rfaki, Abdelkhalek Legsyer, Chakib Nejjari, Saaid Amzazi, Lahcen Belyamani, Abdelhamid El Mousadik, Hassan Ghazal

We present a comprehensive annotation dataset for the scaffold-level nuclear genome assembly of the argan tree (Argania spinosa (L.) Skeels, Sapotaceae). Using Illumina whole-genome shotgun reads that we previously generated for the "Argan Amghar" individual and deposited under BioProject PRJNA294096, together with the corresponding GenBank assembly GCA_003260245.2, we re-assembled and curated a 690 Mbp draft with scaffold N50 of 25 Mbp and L50 of 11 large macro-scaffolds. Ab initio gene prediction with AUGUSTUS and GeneMark-ES, integrated by EVidenceModeler, produced 51,078 protein-coding genes and 2,081 non-coding RNA genes, while repeat annotation covers 53.0% of the assembly. Functional annotation combined eggNOG-mapper, InterProScan and BLASTp searches against UniProtKB/Swiss-Prot to assign curated functions, domains and Gene Ontology terms to 32,785 genes and to support 25,484 proteins with UniProt evidence. BUSCO analyses indicate high completeness of the assembly gene space and completeness of the predicted proteome (74.6%). All primary data products, including a unified GFF3 file and the predicted proteome FASTA, are openly available via NCBI and Zenodo ( https://doi.org/10.5281/zenodo.17901083 ).
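
For readers unfamiliar with the contiguity statistics quoted above, the following minimal Python sketch shows how N50 and L50 are conventionally computed from a list of scaffold lengths. The function name `n50_l50` and the toy lengths are illustrative and are not taken from the argan assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold or contig lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the assembly;
    L50 is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy example: total length 32, so half is 16; the two longest
# sequences (8 + 8) already reach it.
toy_n50, toy_l50 = n50_l50([2, 2, 2, 3, 3, 4, 8, 8])
```

Read this way, a scaffold N50 of 25 Mbp with an L50 of 11 means that the 11 largest scaffolds together hold at least half of the assembled sequence, and the smallest of those 11 is 25 Mbp long.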

Citations: 0
Mountain glacier extents at the Last Glacial Maximum.
IF 6.9 Zone 2 Comprehensive Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-02-17 DOI: 10.1038/s41597-026-06841-z
Augusto C Lima, Helen E Dulfer, Anna L C Hughes, Martin Margold, Iestyn Barr, Benjamin J C Laabs, Suzette G A Flantua

Mountain regions experienced repeated glacial expansions and retreats during the Quaternary, shaping landscapes, ecosystems, and regional climates. While numerous reconstructions exist for individual mountain glaciers, global geodatabases remain scarce and rarely updated to reflect the latest observations. Here, we present GLACIMONTIS, a global geodatabase of maximum recorded areal extents of mountain glaciers at local Last Glacial Maximum, spanning 57-14 kyr BP. Our synthesis integrates reconstructions from 209 studies across 271 mountain ranges worldwide, compiling 15,014 individual glacier reconstructions, including 8,809 reconstructions compiled for the first time in a global geodatabase. Our work updates knowledge in 135 mountain ranges and highlights research gaps in 71 others. GLACIMONTIS represents the most comprehensive and up-to-date synthesis of mountain glacier areal extent at the global and local Last Glacial Maximum, providing spatial boundaries for refining climate-glacier modeling and delineating paleoecological reconstructions, and a framework for identifying regional research gaps. GLACIMONTIS advances Quaternary science by enhancing access to paleoglacier reconstructions and fostering interdisciplinary research in and across mountains worldwide.

Citations: 0
Atmospheric and oceanic data from a triangle-shaped moored array in the northern South China Sea during 2016.
IF 6.9 Zone 2 Comprehensive Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2026-02-17 DOI: 10.1038/s41597-026-06801-7
Han Zhang, Qi Li, Dake Chen, Xiaodong Shang, Di Tian, Tongya Liu, Min He, Jiacheng Hong, Guofei Wei, Jian Liu

This work presents a triangle-shaped moored array dataset comprising three buoys and two moorings with synchronous atmospheric and oceanic data in the northern South China Sea during 2016. The moored array was deployed from late July to early August and recovered in October. The atmospheric data were observed by meteorological sensors and automatic meteorological stations ~2.5 m above sea surface at the buoys. The oceanic data consist of temperature and salinity measurements using conductivity, temperature, and depth (CTD) recorders or temperature sensors. It also includes currents observed by acoustic Doppler current profilers (ADCPs) and current meters. The data reveal air-sea interactions and oceanic processes in the upper and deep ocean. Multiscale processes were recorded, such as air-sea fluxes, tides, internal waves, and low-frequency flows and variations. The data are valuable and may have a lot of potential applications, including analyzing the phenomena and mechanisms of air-sea interactions and ocean dynamics as well as validating and improving numerical model simulations, data reanalysis, and data assimilation.

Citations: 0