Scientific Data: Latest Articles

An integrated dataset of spatiotemporal and event data in elite soccer.
IF 5.8 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-02-01 | DOI: 10.1038/s41597-025-04505-y
Manuel Bassek, Robert Rein, Hendrik Weber, Daniel Memmert

Data-driven match analysis in soccer is a growing discipline in both research and practice. However, public data are scarce, which raises the barrier for entering this field and decreases the reproducibility of methods and results. To bridge this gap, this paper presents a dataset of official match information, event, and position data from seven matches of the German Bundesliga's first and second division. The match information contains metadata about the matches and their participants. The event data contain timestamps along with descriptions of discrete events, like passes, shots, or fouls. The position data contain the x/y-coordinates of every player and the ball. By integrating multiple data modalities - i.e., event logs with timestamps, and x/y-coordinates of player and ball positions - the dataset offers a multidimensional view of match dynamics. This dataset supports the validation of existing analytical techniques and facilitates the development of new methodologies in sports analytics. With availability under CC-BY 4.0, it promotes transparency, reproducibility, and the idea of open science in match analysis research.
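
Combining event and position data usually starts with mapping each event timestamp to a position frame. The sketch below shows one simple way to do that with a binary search. It assumes both streams share a clock measured in seconds and that frame timestamps are sorted; the function name and the toy values are illustrative and are not taken from the dataset.

```python
from bisect import bisect_left


def nearest_frame(frame_times, event_time):
    """Return the index of the frame whose timestamp is closest to event_time.

    frame_times must be sorted in ascending order (assumed, not checked).
    Ties go to the earlier frame.
    """
    if not frame_times:
        raise ValueError("frame_times must not be empty")
    i = bisect_left(frame_times, event_time)
    if i == 0:
        return 0
    if i == len(frame_times):
        return len(frame_times) - 1
    before, after = frame_times[i - 1], frame_times[i]
    return i if after - event_time < event_time - before else i - 1


# Toy timestamps in seconds (hypothetical values).
frames = [0.0, 0.5, 1.0, 1.5]
print(nearest_frame(frames, 0.625))  # index of the frame at 0.5 s
```

A nearest-timestamp lookup is only a starting point; real event logs may need an offset correction before the two streams line up.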

Scientific Data 12(1), 195 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11787359/pdf/
Citations: 0
Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets.
IF 5.8 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-02-01 | DOI: 10.1038/s41597-025-04382-5
Kumar Abhishek, Aditi Jain, Ghassan Hamarneh

The remarkable progress of deep learning in dermatological tasks has brought us closer to achieving diagnostic accuracies comparable to those of human experts. However, while large datasets play a crucial role in the development of reliable deep neural network models, the quality of the data therein and their correct usage are of paramount importance. Several factors can impact data quality, such as the presence of duplicates, data leakage across train-test partitions, mislabeled images, and the absence of a well-defined test partition. In this paper, we conduct meticulous analyses of three popular dermatological image datasets (DermaMNIST, its source HAM10000, and Fitzpatrick17k), uncover these data quality issues, measure the effects of these problems on the benchmark results, and propose corrections to the datasets. By making our analysis pipeline and the accompanying code publicly available, we ensure the reproducibility of our analysis and aim to encourage similar explorations and to facilitate the identification and correction of potential data quality issues in other large datasets.
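
Data leakage across train-test partitions can be illustrated with a minimal check for byte-identical files. This is only a sketch under stated assumptions: it catches exact duplicates via SHA-256 hashes, not near-duplicates, and it is not the authors' pipeline. The function names and the in-memory toy "images" are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash raw file bytes so identical files map to the same key."""
    return hashlib.sha256(data).hexdigest()


def find_leaks(train, test):
    """Return sorted (test_name, train_name) pairs with identical content.

    train and test are dicts mapping a file name to its raw bytes.
    """
    seen = {}
    for name, data in train.items():
        # Keep the first training file seen for each distinct content hash.
        seen.setdefault(content_hash(data), name)
    leaks = []
    for name, data in test.items():
        match = seen.get(content_hash(data))
        if match is not None:
            leaks.append((name, match))
    return sorted(leaks)


# Toy stand-ins for image files (made-up bytes).
train_split = {"a.png": b"AAA", "b.png": b"BBB"}
test_split = {"c.png": b"BBB", "d.png": b"CCC"}
print(find_leaks(train_split, test_split))  # [('c.png', 'b.png')]
```

Resized or re-encoded copies would slip past this check; finding those requires a similarity measure on image content rather than a hash of the bytes.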

Scientific Data 12(1), 196 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11787307/pdf/
Citations: 0
Semi-automatic construction of heterogeneous data schema based on structure and context-aware recommendation.
IF 5.8 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-02-01 | DOI: 10.1038/s41597-024-04196-x
Nan Yin, Junheng Liang, Xi Guo, Xue Jiang, Jie He, Xiaotong Zhang

Customizing the structure and format of scientific data facilitates the publication of diverse and heterogeneous data. Many data publishing platforms empower users to create self-designed schemas, leading to schema proliferation and more intricate creation processes. To address these challenges, we present a semi-automatic method and system for constructing heterogeneous material data schemas based on structure and context-aware recommendation. We propose a schema fragment tree structure to represent data schemas with hierarchical relationships, transforming the recommendation into subtree matching. Fragment index and semantic search techniques are introduced to identify candidate fragments, and a tree edit distance algorithm calculates similarity scores. Evaluated on the Data Schema Construction System, the algorithm outperforms the baselines (TF-IDF and BM25 for schema matching) in precision, recall, and F1-score. The baseline for reduced workload refers to the effort required to create schemas without recommendation. Our recommendation improves schema creation efficiency by 50.5% and reduces schema proliferation by 16.5%.
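
For readers unfamiliar with the evaluation metrics named above, the following sketch computes precision, recall, and F1-score for one set of recommendations against a set of relevant items. It uses the standard set-based definitions; how the paper aggregates scores across queries is not stated in the abstract, and the fragment names below are made up.

```python
def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall, and F1 for a single recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical schema fragments: two of four recommendations are relevant.
p, r, f1 = precision_recall_f1(
    ["density", "hardness", "author", "grain_size"],
    ["density", "hardness", "yield_strength"],
)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.5 0.667 0.571
```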

Scientific Data 12(1), 190 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11787372/pdf/
Citations: 0
A high-quality chromosome-level genome assembly of the mulberry looper, Phthonandria atrilineata.
IF 5.8 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-01-31 | DOI: 10.1038/s41597-025-04509-8
De-Long Guan, Ying-Can Qin, Ya-Zhen Chen, Shi-Hao Zhang, Ji-Ping Liu, Hui-Yu Yi, Xiao-Dong Li

The mulberry looper (Phthonandria atrilineata), a geometrid moth, plays a pivotal role in the destruction of mulberry trees (Morus spp.). In China, P. atrilineata is the most significant insect pest to sericulture, as it feeds on mulberry leaves and spreads diseases. The outbreak trend of P. atrilineata has been expanding yearly, causing substantial economic losses. Despite its ecological and economic importance, knowledge about the genomic background of P. atrilineata remains limited. Here, we report a chromosome-level reference genome of P. atrilineata, with a total size of 336.55 Mb, containing 15,026 protein-coding genes and 39.72% repeat sequences. These findings have the potential to shed light on the genetic basis of the destructive nature and environmental adaptation of P. atrilineata, offering valuable genomic resources for understanding genome evolution and pest management within this Lepidopteran pest.

Scientific Data 12(1), 186 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11785749/pdf/
Citations: 0
A MALDI-ToF mass spectrometry database for identification and classification of highly pathogenic bacteria.
IF 5.8 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-01-31 | DOI: 10.1038/s41597-025-04504-z
Peter Lasch, Wolfgang Beyer, Alejandra Bosch, Rainer Borriss, Michal Drevinek, Susann Dupke, Monika Ehling-Schulz, Xuewen Gao, Roland Grunow, Daniela Jacob, Silke R Klee, Armand Paauw, Jörg Rau, Andy Schneider, Holger C Scholz, Maren Stämmler, Le Thi Thanh Tam, Herbert Tomaso, Guido Werner, Joerg Doellinger

Today, MALDI-ToF MS is an established technique to characterize and identify pathogenic bacteria. The technique is increasingly applied by clinical microbiological laboratories that use commercially available complete solutions, including spectra databases covering clinically relevant bacteria. Such databases are validated for clinical or research applications, but are often less comprehensive concerning highly pathogenic bacteria (HPB). To improve MALDI-ToF MS diagnostics of HPB, many years ago we initiated a program to develop protocols for reliable and MALDI-compatible microbial inactivation and to acquire mass spectra of these bacteria. As a result of this project, databases covering HPB, closely related bacteria, and bacteria of clinical relevance have been made publicly available on platforms such as ZENODO. This publication describes in detail the most recent version of this database. The dataset contains a total of 11,055 spectra from 1,601 microbial strains and 264 species and is primarily intended to improve the diagnosis of HPB. We hope that our MALDI-ToF MS data may also be a valuable resource for developing machine learning-based bacterial identification and classification methods.

Scientific Data 12(1), 187 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11785946/pdf/
Citations: 0
Simultaneous Dataset of Brain, Eye and Hand during Visuomotor Tasks.
IF 5.8 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-01-31 | DOI: 10.1038/s41597-024-04227-7
Hao Zhang, Yiqing Hu, Yang Li, Shuangyu Zhang, XiaoLi Li, Chenguang Zhao

Visuomotor integration is a complex skill set encompassing many fundamental abilities, such as visual search, attention monitoring, and motor control. To explore the dynamic interplay between visual inputs and motor outputs, it is necessary to simultaneously record multiple brain activities with high temporal and spatial resolution, as well as to record implicit and explicit behaviors. However, there is a lack of public datasets that provide multiple simultaneously recorded modalities during a visual-motor task. Recording brain activity simultaneously with functional near-infrared spectroscopy (fNIRS) and electroencephalography (EEG) facilitates more precise capture of the complex brain mechanisms underlying visuomotor integration. Additionally, by combining eye-movement and manual responses, it is possible to fully evaluate the effects of visuomotor outputs from implicit and explicit dimensions. We recorded whole-brain EEG (34 electrodes) and fNIRS (44 channels) covering the frontal and parietal cortex along with eye movements, behavior sampling, and operant behavior. The dataset underwent rigorous synchronization and quality control to highlight the effectiveness of our experiments and to demonstrate the high quality of our multimodal data framework.

Scientific Data 12(1), 189 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11785794/pdf/
Citations: 0
Measuring China's Policy Stringency on Climate Change for 1954-2022.
IF 5.8 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-01-31 | DOI: 10.1038/s41597-025-04476-0
Bo Li, Enxian Fu, Shuhao Yang, Jiaying Lin, Wei Zhang, Jian Zhang, Yaling Lu, Jiantong Wang, Hongqiang Jiang

Efforts on climate change have demonstrated tangible impacts through various actions and policies. However, a significant knowledge gap remains: comparing the stringency of climate change policies over time or across jurisdictions is challenging due to ambiguous definitions, the lack of a unified assessment framework, complex causal effects, and the difficulty in achieving effective measurement. Furthermore, China's climate governance is expected to address multiple objectives by integrating main effects and side effects, to achieve synergies that encompass environmental, economic, and social impacts. This paper employs an integrated framework comprising a lexicon, text analysis, machine learning, and a large language model applied to multi-source data to quantify China's policy stringency on climate change (PSCC) from 1954 to 2022. To achieve effective, robust, and explainable measurement, Chain-of-Thought and SHAP analysis are integrated into the framework. By framing the PSCC on varied sub-dimensions covering mitigation, adaptation, implementation, and spatial difference, this dataset maps the government's varied stringency on climate change and can be used as a robust variable to support a series of downstream causal analyses.

Scientific Data 12(1), 188 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11785789/pdf/
Citations: 0
Downscaled gridded global dataset for gross domestic product (GDP) per capita PPP over 1990-2022.
IF 5.8 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-01-30 | DOI: 10.1038/s41597-025-04487-x
Matti Kummu, Maria Kosonen, Sina Masoumzadeh Sayyar

We present a comprehensive gridded GDP per capita dataset downscaled to the admin 2 level (43,501 units) covering 1990-2022. It updates existing outdated datasets, which use reported subnational data only up to 2010. Our dataset, which is based on reported subnational GDP per capita data from 89 countries and 2,708 administrative units, employs various novel methods for extrapolation and downscaling. Downscaling with machine learning algorithms showed high performance (R² = 0.79 for cross-validation, R² = 0.80 for the test dataset) and accuracy against reported datasets (Pearson R = 0.88). The dataset includes reported and downscaled annual data (1990-2022) for three administrative levels: 0 (national; reported data for 237 administrative units), 1 (provincial; reported data for 2,708 administrative units for 89 countries), and 2 (municipality; downscaled data for 43,501 administrative units). The dataset has a higher spatial resolution and wider temporal range than the existing data do and will thus contribute to global or regional spatial analyses such as socioenvironmental modelling and economic resilience evaluation. The data are available at https://doi.org/10.5281/zenodo.10976733.
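
The two accuracy measures quoted above can be sketched in plain Python. The block assumes that R² means the usual coefficient of determination (1 minus residual over total sum of squares); the abstract does not spell out the exact definition, and the numbers below are toy values, not data from the paper.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy values (hypothetical).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(observed, predicted), 2))
print(round(pearson_r(observed, predicted), 2))
```

The two measures answer different questions: Pearson R rewards any linear relationship, while R² also penalizes systematic offsets between predicted and observed values.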

Scientific Data 12(1), 178 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11782586/pdf/
Citations: 0
High-fidelity annotated triploid genome of the quarantine root-knot nematode, Meloidogyne enterolobii.
IF 5.8 | Zone 2, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-01-30 | DOI: 10.1038/s41597-025-04434-w
Marine Poullet, Hemanth Konigopal, Corinne Rancurel, Marine Sallaberry, Celine Lopez-Roques, Ana Paula Zotta Mota, Joanna Lledo, Sebastian Kiewnick, Etienne G J Danchin

Root-knot nematodes (RKN) of the genus Meloidogyne are obligatory plant endoparasites that cause substantial economic losses to agricultural production and impact the global food supply. These plant parasitic nematodes belong to the most widespread and devastating genus worldwide, yet few measures of control are available. The most efficient way to control RKN is deployment of resistance genes in plants. However, current resistance genes that control other Meloidogyne species are mostly inefficient on Meloidogyne enterolobii. Consequently, M. enterolobii was listed as a European Union quarantine pest requiring regulation. To gain insight into the molecular characteristics underlying its parasitic success, exploring the genome of M. enterolobii is essential. Here, we report a high-quality genome assembly of M. enterolobii using the high-fidelity long-read sequencing technology developed by Pacific Biosciences, combined with a gap-aware sequence transformer, DeepConsensus. The resulting triploid genome assembly spans 285.4 Mb with 556 contigs, a GC% of 30 ± 0.042 and an N50 value of 2.11 Mb, constituting a useful platform for comparative, population and functional genomics.
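
The N50 statistic reported for this assembly has a simple definition: sort contigs from longest to shortest and return the length of the contig at which the running total first reaches half of the assembly size. A minimal sketch follows, using made-up contig lengths rather than the actual assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the sum to half the total.
        if 2 * running >= total:
            return length


# Toy contig lengths (hypothetical, total = 300).
print(n50([80, 70, 50, 40, 30, 20, 10]))  # 70
```

Here the two longest contigs (80 + 70 = 150) already cover half of the 300 total, so the N50 is 70.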

Scientific Data 12(1), 184 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11782629/pdf/
Citations: 0
Retraction Note: Genome assembly and annotation of Meloidogyne enterolobii, an emerging parthenogenetic root-knot nematode.
IF 5.8 Zone 2 Multidisciplinary Journals Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-01-30 DOI: 10.1038/s41597-025-04446-6
Georgios D Koutsovoulos, Marine Poullet, Abdelnaser Elashry, Djampa K L Kozlowski, Erika Sallet, Martine Da Rocha, Laetitia Perfus-Barbeoch, Cristina Martin-Jimenez, Juerg Ernst Frey, Christian H Ahrens, Sebastian Kiewnick, Etienne G J Danchin
Citations: 0