Fungi from the Pyricularia genus cause blast disease in many economically important crops and grasses, such as wheat, rice, and the Cenchrus grass JUJUNCAO. Structural variation associated with the gain and loss of effectors contributes substantially to the adaptive evolution of this fungus towards diverse host plants. A telomere-to-telomere genome assembly would facilitate the identification of genome-wide structural variations through comparative genomics. Here, we report a telomere-to-telomere, near-complete genome assembly of Pyricularia penniseti isolate JC-1, which infects JUJUNCAO. The assembly, spanning 42.1 Mb, consists of eight core chromosomes and two supernumerary chromosomes, named mini1 and mini2. We annotated 12,156 protein-coding genes and identified 4.54% of the genome as repetitive sequences. The two supernumerary chromosomes contain fewer genes and more repetitive sequences than the core chromosomes. Our genome and results provide valuable resources for future studies of genome evolution, structural variation and host adaptation in Pyricularia fungi.
Title: Near complete assembly of Pyricularia penniseti infecting Cenchrus grass identified its eight core chromosomes. Authors: Yuyong Li, Xianjun Wang, Jianqiang Huang, Zhenyu Fang, Xiwen Lian, Guodong Lu, Guifang Lin, Zonghua Wang, Baohua Wang, Xiuxiu Li, Huakun Zheng. Scientific Data 11(1): 1186, published 2024-10-31. DOI: 10.1038/s41597-024-04035-z. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11528102/pdf/
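The abstract calls the assembly telomere-to-telomere, which means each chromosome should end in tandem telomeric repeats at both ends. The Python sketch below shows one way such an end check could work. It assumes the canonical fungal repeat TTAGGG, which reads CCCTAA on the opposite strand. The function name `has_telomeres`, the 1 kb window and the five-copy threshold are illustrative assumptions, not the authors' criteria.

```python
# Hypothetical end check for telomeric repeats. The repeat unit, window
# size and copy threshold are assumptions, not the study's pipeline.
TELO_FWD = "TTAGGG"  # expected at the right (3') end of a chromosome
TELO_REV = "CCCTAA"  # reverse complement, expected at the left (5') end


def has_telomeres(seq: str, window: int = 1000, min_copies: int = 5) -> tuple[bool, bool]:
    """Return (left_end_ok, right_end_ok) for one chromosome sequence."""
    s = seq.upper()
    left = s[:window]    # first `window` bases
    right = s[-window:]  # last `window` bases
    # str.count counts non-overlapping copies of the repeat unit.
    left_ok = left.count(TELO_REV) >= min_copies
    right_ok = right.count(TELO_FWD) >= min_copies
    return left_ok, right_ok


# A chromosome would count as telomere-to-telomere only if both ends pass.
toy = "CCCTAA" * 10 + "ACGT" * 500 + "TTAGGG" * 10
print(has_telomeres(toy))
```

Real assemblies would also need tolerance for degenerate repeat copies, which this sketch ignores.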
Pub Date: 2024-10-31, DOI: 10.1038/s41597-024-03954-1
Gijs de Boer, Radiance Calmer, Gina Jozef, John J Cassano, Jonathan Hamilton, Dale Lawrence, Steven Borenstein, Abhiram Doddi, Christopher Cox, Julia Schmale, Andreas Preußer, Brian Argrow
Title: Publisher Correction: Observing the Central Arctic Atmosphere and Surface with University of Colorado uncrewed aircraft systems. Scientific Data 11(1): 1188, published 2024-10-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11528026/pdf/
Pub Date: 2024-10-31, DOI: 10.1038/s41597-024-04038-w
Gabrielle Burke, Pat Wongpan, Delphine Lannuzel, Hakase Hayashida
Dimethylsulfide (DMS) is a climatically active volatile sulfur compound found in Earth's oceans and atmosphere that plays an important role in cloud formation. DMS originates from its precursor dimethylsulfoniopropionate (DMSP), which is produced by several classes of phytoplankton. Concentrations of DMS and DMSP in Antarctic sea ice, snow and underlying seawater are poorly documented, and no dataset currently compiles the existing measurements. The purpose of this project was to compile historical measurements into a publicly available dataset. A total of 220 samples collected since 1992 were compiled using the Antarctic Sea ice Processes and Climate program template, consistent with the existing datasets for chlorophyll-a, macronutrients, and dissolved iron. Analyses of the completed DMS dataset showed that its spatial and temporal coverage is limited: there are almost no measurements from autumn or winter, or from the Amundsen and Ross seas. These findings provide a basis for future sampling efforts in the Antarctic region.
Title: Data collation for climate-cooling gas dimethylsulfide in Antarctic snow, sea ice and underlying seawater. Scientific Data 11(1): 1185, published 2024-10-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11528020/pdf/
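The abstract reports that autumn and winter are barely sampled. A coverage tally of that kind can be sketched as below. This is a hypothetical Python illustration, not the authors' analysis. It assumes meteorological seasons for the Southern Hemisphere (December to February is summer), and the sample dates are made up. The function name `season_counts` is also an assumption.

```python
from collections import Counter
from datetime import date

# Assumed meteorological seasons for the Southern Hemisphere; the
# dataset itself may define seasons differently.
AUSTRAL_SEASON = {
    12: "summer", 1: "summer", 2: "summer",
    3: "autumn", 4: "autumn", 5: "autumn",
    6: "winter", 7: "winter", 8: "winter",
    9: "spring", 10: "spring", 11: "spring",
}


def season_counts(sample_dates: list[date]) -> Counter:
    """Tally samples per austral season to expose temporal gaps."""
    counts = Counter(AUSTRAL_SEASON[d.month] for d in sample_dates)
    # Insert empty seasons explicitly so gaps show up in the output.
    for season in ("summer", "autumn", "winter", "spring"):
        counts.setdefault(season, 0)
    return counts


# Made-up sampling dates, for illustration only.
print(season_counts([date(2004, 12, 5), date(2005, 1, 10), date(2012, 10, 3)]))
```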
Mangrove forests, which consist of trees, climbers and herbs found exclusively in intertidal environments, are among the most extreme and vulnerable ecosystems on our planet and have long been of great interest to biologists and ecologists. Here, we assembled, for the first time, the chromosome-scale genome of a climber mangrove plant, Dalbergia candenatensis. The assembled genome is approximately 474.55 Mb, with a scaffold N50 of 48.1 Mb, a complete BUSCO score of 98.4%, and a high LTR Assembly Index value of 21. The genome contains 283.46 Mb (59.74%) of repetitive sequences, and 29,554 protein-coding genes were predicted, 87.54% of which were functionally annotated in five databases. The high-quality genome assembly and annotation presented herein provide a valuable genomic resource that will expedite genomic and evolutionary studies of mangrove plants and help elucidate the molecular mechanisms underlying their salt and waterlogging tolerance.
Title: Chromosome-scale genome assembly of the mangrove climber species Dalbergia candenatensis. Authors: Miaomiao Shi, Yu Zhang, Huiwen Huang, Shiran Gu, Xiangping Wang, Shijin Li, Zhongtao Zhao, Tieyao Tu. Scientific Data 11(1): 1187, published 2024-10-31. DOI: 10.1038/s41597-024-04032-2. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11528007/pdf/
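The Dalbergia candenatensis assembly is summarized partly by its scaffold N50 of 48.1 Mb. The short Python function below is a minimal sketch of the standard N50 definition, to make that figure concrete. The input lengths in the tests are toy values, not the assembly's scaffolds.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # same as running >= total / 2, without floats
            return length
    # Not reached for non-empty input: the last step makes running == total.
    raise RuntimeError("unreachable")
```

Sorting in descending order is the key design choice. N50 rewards assemblies in which a few long scaffolds carry most of the sequence.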
Pub Date: 2024-10-30, DOI: 10.1038/s41597-024-04004-6
Konrad Pomianowski, Ewa Kulczykowska, Artur Burzyński
Although the European flounder is frequently used in research and has economic importance, comprehensive transcriptome data for this species are still lacking. Here we present RNA-Seq data from ten selected organs of a female P. flesus inhabiting the brackish waters of the Gulf of Gdańsk (southern Baltic Sea). High-throughput sequencing on the NovaSeq 6000 platform generated 500 million reads. The reads were mapped to the European flounder reference genome, and the mapped reads were assembled into 61k reliable contigs. Gene Ontology (GO) terms were assigned to most annotated contigs/unigenes based on searches of the PFAM, PANTHER, UniProt and InterPro protein databases. BUSCO statistics for the eukaryota, metazoa, vertebrata and actinopterygii datasets showed that the reported transcriptome is highly complete. The dataset can support experimental design in various research fields, including biology, aquaculture and toxicology.
Title: Genome guided, organ-specific transcriptome assembly of the European flounder (P. flesus) from the Baltic Sea. Scientific Data 11(1): 1184, published 2024-10-30. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525550/pdf/
Pub Date: 2024-10-30, DOI: 10.1038/s41597-024-04002-8
Jose A Miranda Calero, Laura Gutiérrez-Martín, Esther Rituerto-González, Elena Romero-Perales, Jose M Lanza-Gutiérrez, Carmen Peláez-Moreno, Celia López-Ongil
WEMAC is a unique open multi-modal dataset comprising physiological, speech, and self-reported emotional data from 100 women, aimed at gender-based violence detection. Emotions were elicited by having participants watch a validated set of videos through an immersive virtual reality headset. The physiological signals captured during the experiment include blood volume pulse, galvanic skin response, and skin temperature. Speech was recorded immediately after each stimulus to capture the final traces of the perceived emotion. Participants selected among 12 categorical emotions, rated several emotional dimensions with a modified version of the Self-Assessment Manikin, and provided liking and familiarity labels. The technical validation shows a strong, statistically significant positive correlation between every targeted categorical emotion and the corresponding reported emotion. This indicates that the videos elicit the intended emotions in most cases. In particular, a negative correlation appears when comparing fear and non-fear emotions, indicating that fear is a well-portrayed emotional dimension. Detecting fear is a specific, though not exclusive, purpose of WEMAC for gender-based violence detection.
Title: WEMAC: Women and Emotion Multi-modal Affective Computing dataset. Scientific Data 11(1): 1182, published 2024-10-30. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525988/pdf/
Pub Date: 2024-10-30, DOI: 10.1038/s41597-024-03993-8
Thomas De Kerf, Seppe Sels, Svetlana Samsonova, Steve Vanlanduit
The high incidence of oil spills in port areas poses a serious threat to the environment, prompting the need for efficient detection mechanisms. Using automated drones for this purpose can significantly improve the speed and accuracy of oil spill detection. Such advances not only expedite cleanup operations, reducing environmental harm, but also enhance polluter accountability, potentially deterring future incidents. Currently, datasets employing RGB images for oil spill detection in maritime settings are scarce. This paper presents a unique annotated dataset that addresses this gap, together with a neural network for analysis on both desktop and edge computing platforms. The drone-captured dataset comprises 1268 images annotated with three classes: oil, water, and other. A convolutional neural network with a U-Net architecture, trained on this dataset, achieves an F1 score of 0.71 for oil detection. This underscores the dataset's practicality for real-world applications, offering crucial resources for environmental conservation in port environments.
Title: A dataset of drone-captured, segmented images for oil spill detection in port environments. Scientific Data 11(1): 1180, published 2024-10-30. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525993/pdf/
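The oil spill entry reports an F1 score of 0.71 for oil detection. The abstract does not say whether F1 is computed per pixel, so the Python sketch below assumes a pixel-wise score over segmentation masks, F1 = 2TP / (2TP + FP + FN). The integer class ids (0 water, 1 oil, 2 other) and the function name `f1_for_class` are hypothetical.

```python
def f1_for_class(pred: list[list[int]], truth: list[list[int]], cls: int) -> float:
    """Pixel-wise F1 for one class, comparing a predicted mask to ground truth.

    Assumes both masks have the same shape and store integer class ids
    (hypothetical encoding: 0 = water, 1 = oil, 2 = other).
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == cls and t == cls:
                tp += 1  # correctly labeled pixel of this class
            elif p == cls:
                fp += 1  # predicted this class, truth says otherwise
            elif t == cls:
                fn += 1  # missed pixel of this class
    denom = 2 * tp + fp + fn
    # Class absent from both masks: report 0.0 rather than divide by zero.
    return 2 * tp / denom if denom else 0.0
```

F1 balances precision and recall. This suits oil detection, where oil pixels are usually a small minority of each image.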
Pub Date: 2024-10-30, DOI: 10.1038/s41597-024-03897-7
Morteza Akbari, Hamid-Reza Pourreza, Elias Khalili Pour, Afsar Dastjani Farahani, Fatemeh Bazvand, Nazanin Ebrahimiadib, Marjan Imani Fooladi, Fereshteh Ramazani K
Retinopathy of Prematurity (ROP) is a critical eye disorder affecting premature infants, characterized by abnormal development of blood vessels in the retina. Plus Disease, which indicates severe ROP progression, plays a pivotal role in diagnosis. Recent Artificial Intelligence (AI) systems have matched or surpassed human experts in ROP detection, especially of Plus Disease. However, the success of AI systems depends on high-quality datasets, which underscores the need for collaboration and data sharing among researchers. To address this challenge, the paper introduces a new public dataset, FARFUM-RoP (Farabi and Ferdowsi University of Mashhad's ROP dataset), comprising 1533 ROP fundus images from 68 patients, annotated independently by five experienced pediatric ophthalmologists as "Normal," "Pre-Plus," or "Plus." Ethical principles were meticulously followed and consent was obtained during data collection. The paper presents the dataset structure, patient details, and expert labels.
Title: FARFUM-RoP, A dataset for computer-aided detection of Retinopathy of Prematurity. Scientific Data 11(1): 1176, published 2024-10-30. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525552/pdf/
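FARFUM-RoP provides five independent expert labels per image. Users who need a single training label must choose a consensus rule, and the abstract does not prescribe one. The Python sketch below takes the majority vote and breaks ties toward the more severe label. That tie-break rule and the function name `consensus` are hypothetical choices, not the dataset's method.

```python
from collections import Counter

# Ordinal severity, used here only to break ties conservatively (assumption).
LABELS = ("Normal", "Pre-Plus", "Plus")
SEVERITY = {label: rank for rank, label in enumerate(LABELS)}


def consensus(votes: list[str]) -> str:
    """Majority label among expert votes; ties go to the more severe label."""
    counts = Counter(votes)
    top = max(counts.values())
    tied = [label for label, n in counts.items() if n == top]
    # Unknown labels raise KeyError, which flags annotation typos early.
    return max(tied, key=SEVERITY.__getitem__)
```

Breaking ties toward severity trades some specificity for fewer missed Plus cases. A different rule, such as discarding tied images, may suit other studies better.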
The ancient Yi script has been in use for over 8000 years. It ranks alongside the Oracle Bone, Sumerian, Egyptian, Mayan and Harappan scripts as one of the six ancient scripts of the world. In this article, we collected handwritten samples of 2922 commonly used ancient Yi characters. Each character was written by 310 people, and the collection contains a total of 427,939 valid characters. We also collected continuous handwritten text from 250 writers, five texts per person, covering topics such as Yi astronomy, geography, rituals, and agriculture. During data collection, we proposed an automatic sampling method for the ancient Yi script and automatically segmented and labeled the handwritten samples. Furthermore, we evaluated the recognition performance of several deep learning network models on the curated dataset. The results show that the ancient Yi script has diverse glyph structures and rich writing styles. The dataset can serve as a benchmark in related fields such as handwritten text recognition and handwritten text generation.
Title: Ancient Yi Script Handwriting Sample Repository. Authors: Xiaojuan Liu, Xu Han, Shanxiong Chen, Weijia Dai, Qiuyue Ruan. Scientific Data 11(1): 1183, published 2024-10-30. DOI: 10.1038/s41597-024-03918-5. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526026/pdf/
Pub Date: 2024-10-30, DOI: 10.1038/s41597-024-04030-4
Thu Ha Nguyen, Fiona H M Tang, Giulia Conchedda, Leon Casse, Griffiths Obli-Laryea, Francesco N Tubiello, Federico Maggi
We introduce NPKGRIDS, a new geospatial dataset that provides, for the first time, application rates for all three main plant nutrients across 173 crops as of 2020. These nutrients are nitrogen (N), phosphorus (P, expressed as phosphorus pentoxide, P2O5) and potassium (K, expressed as potassium oxide, K2O). The dataset has a spatial resolution of 0.05° (approximately 5.6 km at the equator). NPKGRIDS was developed with a data fusion approach that integrates crop mask information with eight published datasets of fertilizer application rates, compiled from either georeferenced data or national and subnational statistics. Furthermore, the total applied masses of N, P2O5, and K2O were benchmarked against country-level information from FAO and the International Fertilizer Association (IFA) and validated against data available from National Statistical Offices (NSOs). NPKGRIDS can support global modelling as well as decision- and policy-making aimed at maximizing crop yields while reducing environmental impacts.
Title: NPKGRIDS: a global georeferenced dataset of N, P2O5, and K2O fertilizer application rates for 173 crops. Scientific Data 11(1): 1179, published 2024-10-30. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526156/pdf/
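The stated 0.05° resolution, "approximately 5.6 km at the equator," follows from the length of one degree of arc. The Python sketch below reproduces that arithmetic under a spherical approximation using the WGS84 equatorial radius (6378.137 km). The authors' grid geometry may differ. It also shows how east-west cell width shrinks with the cosine of latitude and how many cells a global grid at this resolution has. The function names are hypothetical.

```python
import math

# WGS84 equatorial radius in km; a sphere of this radius is assumed.
EQUATORIAL_RADIUS_KM = 6378.137


def cell_size_km(resolution_deg: float, lat_deg: float = 0.0) -> tuple[float, float]:
    """Approximate (east-west, north-south) extent of one lat/lon grid cell."""
    km_per_deg = 2 * math.pi * EQUATORIAL_RADIUS_KM / 360  # about 111.32 km
    north_south = resolution_deg * km_per_deg
    # Meridians converge toward the poles, so east-west width scales by cos(lat).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return east_west, north_south


def grid_shape(resolution_deg: float) -> tuple[int, int]:
    """(rows, columns) of a global grid; round() absorbs float error in 180/0.05."""
    return round(180 / resolution_deg), round(360 / resolution_deg)
```

At 0.05°, the north-south extent is about 5.57 km, which rounds to the quoted 5.6 km. A global grid at this resolution has 3600 rows by 7200 columns.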