The ancient Yi script has been in use for over 8000 years; it ranks alongside the Oracle bone, Sumerian, Egyptian, Mayan, and Harappan scripts as one of the six ancient scripts of the world. In this article, we collected 2922 handwritten single-word samples of commonly used ancient Yi characters. Each character was written by 310 people, for a total of 427,939 valid characters. We also completed continuous handwritten text sampling: 250 people each wrote 5 texts covering topics such as Yi astronomy, geography, rituals, and agriculture. During data collection, we proposed an automatic sampling method for ancient Yi script and completed the automatic segmentation and labeling of the handwritten samples. Furthermore, we tested the recognition performance of the curated dataset with different deep learning network models. The results show that ancient Yi script has diverse shape structures and rich writing styles, so the dataset can serve as a benchmark in related fields such as handwritten text recognition and handwritten text generation.
Xiaojuan Liu, Xu Han, Shanxiong Chen, Weijia Dai, Qiuyue Ruan. Ancient Yi Script Handwriting Sample Repository. Scientific Data 11(1), 1183. Published 2024-10-30. DOI: 10.1038/s41597-024-03918-5. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526026/pdf/
Pub Date: 2024-10-30. DOI: 10.1038/s41597-024-04030-4
Thu Ha Nguyen, Fiona H M Tang, Giulia Conchedda, Leon Casse, Griffiths Obli-Laryea, Francesco N Tubiello, Federico Maggi
We introduce NPKGRIDS, a new geospatial dataset that provides, for the first time, data on application rates for all three main plant nutrients, nitrogen (N), phosphorus (P, in terms of phosphorus pentoxide, P2O5) and potassium (K, in terms of potassium oxide, K2O), across 173 crops as of 2020, with a geospatial resolution of 0.05° (approximately 5.6 km at the equator). NPKGRIDS was developed with a data fusion approach that integrates crop mask information with eight published datasets of fertilizer application rates, compiled from either georeferenced data or national and subnational statistics. Furthermore, the total applied masses of N, P2O5, and K2O were benchmarked against country-level information from FAO and the International Fertilizer Association (IFA) and validated against data available from National Statistical Offices (NSOs). NPKGRIDS can be used in global modelling and in decision and policy making to help maximize crop yields while reducing environmental impacts.
Thu Ha Nguyen, Fiona H M Tang, Giulia Conchedda, Leon Casse, Griffiths Obli-Laryea, Francesco N Tubiello, Federico Maggi. NPKGRIDS: a global georeferenced dataset of N, P2O5, and K2O fertilizer application rates for 173 crops. Scientific Data 11(1), 1179. Published 2024-10-30. DOI: 10.1038/s41597-024-04030-4. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526156/pdf/
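The NPKGRIDS abstract equates a 0.05° grid with roughly 5.6 km at the equator. The short sketch below reproduces that arithmetic and shows how the east-west cell width shrinks with latitude. It assumes a spherical Earth with a mean radius of 6371.0088 km; the function name `cell_size_km` and that radius are illustrative choices, not taken from the paper.

```python
import math

# Assumption: spherical Earth with this mean radius (not stated in the abstract).
EARTH_RADIUS_KM = 6371.0088


def cell_size_km(resolution_deg: float, latitude_deg: float = 0.0) -> tuple[float, float]:
    """Return the (east-west, north-south) extent in km of one grid cell."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360.0
    north_south = resolution_deg * km_per_degree
    # Meridians converge toward the poles, so the east-west extent scales with cos(latitude).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south


ew, ns = cell_size_km(0.05)
print(f"{ew:.2f} km x {ns:.2f} km at the equator")
ew60, ns60 = cell_size_km(0.05, latitude_deg=60.0)
print(f"{ew60:.2f} km x {ns60:.2f} km at 60 degrees latitude")
```

Under these assumptions a cell is about 5.56 km on a side at the equator, consistent with the rounded figure in the abstract, and about half as wide at 60° latitude.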
Cold stress during overwintering is considered the bottleneck of the red tilapia industry. In this study, the water temperature (WT) was reduced by 2 °C per day from 20 °C to 8 °C in the cold (C) group. Transcriptome sequencing of brain (B), gill (G), liver (L), and skin (S) tissues and proteome analysis of G, L, and S tissues were then performed in the C and normal (N; WT: 20 °C) groups. In total, 24 transcriptomes were completed, yielding 168.8 Gb of data, with more than 5.89 Gb of clean data per sample. A total of 30,499 annotation results were obtained, with 3199, 4697, 4393, and 3382 differentially expressed mRNAs in NB_vs_CB, NG_vs_CG, NL_vs_CL, and NS_vs_CS, respectively. Eighteen DIA proteomes were generated, and 6341 proteins were identified, with 178, 500, and 166 differentially expressed proteins in NG_vs_CG, NL_vs_CL, and NS_vs_CS, respectively. Our datasets can be reused for identifying key genes and proteins, for joint omics analysis, and for analyzing the regulatory mechanisms of low-temperature or cold stress in fish, which will help in understanding those mechanisms and facilitate the molecular selective breeding of cold-resistant fish varieties.
Lanmei Wang, Haoran Yang, Herbert Brightmore Munyaradzia, Wenbin Zhu, Zai-Jie Dong. The mRNA and protein datasets after cold stress of red tilapia. Scientific Data 11(1), 1177. Published 2024-10-30. DOI: 10.1038/s41597-024-04025-1. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525570/pdf/
Pub Date: 2024-10-30. DOI: 10.1038/s41597-024-03843-7
Jianbo Jian, Feichao Du, Binhu Wang, Xiaodong Fang, Thomas Ostenfeld Larsen, Yuhang Li, Eva C Sonnenschein
The diatom Paralia guyana is a tychoplanktonic microalgal species that represents one of the early diverging diatoms. P. guyana can thrive in both planktonic and benthic habitats and contributes significantly to the occurrence of red tide events. Although a dozen diatom genomes have been sequenced, the identity of the early diverging diatoms remains elusive. Understanding of the evolutionary clades and the mechanisms of ecological adaptation in P. guyana is limited by the absence of a high-quality genome assembly. In this study, the first high-quality genome assembly for the early diverging diatom P. guyana was established using PacBio single-molecule sequencing. The assembled genome has a size of 558.85 Mb, making it the largest diatom genome on record, with a contig N50 size of 26.06 Mb. A total of 27,121 protein-coding genes were predicted in the P. guyana genome, of which 22,904 (84.45%) were functionally annotated. These data and analyses provide new genomic resources for tychoplanktonic microalgal species and shed light on the evolutionary origins of diatoms.
Jianbo Jian, Feichao Du, Binhu Wang, Xiaodong Fang, Thomas Ostenfeld Larsen, Yuhang Li, Eva C Sonnenschein. A high-quality genome of the early diverging tychoplanktonic diatom Paralia guyana. Scientific Data 11(1), 1175. Published 2024-10-30. DOI: 10.1038/s41597-024-03843-7. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525933/pdf/
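Several of the genome abstracts in this listing, including the P. guyana one above, report a contig or scaffold N50. As a reminder of what that statistic measures, the sketch below computes N50 under its usual definition: the length L such that sequences of length at least L together cover at least half of the assembly. The function name `n50` and the toy lengths are made up for illustration and are not drawn from any of the assemblies described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in bp (illustrative only).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70
```

Here the total length is 300 bp, and the two longest contigs (80 + 70) already reach the 150 bp halfway mark, so the N50 is 70. A larger N50 for the same total length indicates a more contiguous assembly.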
Lanternfish not only have the most abundant biomass among marine fish species but also play a vital role in marine ecosystems. As one of the lanternfish species with the highest global catch, the skinnycheek lanternfish (Benthosema pterotum) is widely distributed in the Indo-Pacific region and plays a pivotal role in the marine biological pump. This study constructed the first chromosome-level genome of B. pterotum using a combination of short-read, PacBio, and Hi-C sequencing technologies. The genome size of B. pterotum is 1,272.53 Mb, with a contig N50 of 810 kb and a scaffold N50 of 54.49 Mb. More than 99.65% of contigs were successfully anchored onto 24 pseudochromosomes, and 95.7% of BUSCO genes were identified within the genome, demonstrating the high completeness of the genome assembly. A total of 24,934 protein-coding genes were predicted, of which 99.02% were functionally annotated. The successful assembly of a high-quality genome for B. pterotum provides valuable genetic resources for better understanding its biological characteristics and potentially those of all lanternfish species.
Qiaohong Liu, Xiaoying Cao, Lisheng Wu, Huan Wang, Hai Li, Longshan Lin, Shufang Liu, Shaoxiong Ding. Chromosome-level genome assembly and annotation of the skinnycheek lanternfish Benthosema pterotum. Scientific Data 11(1), 1178. Published 2024-10-30. DOI: 10.1038/s41597-024-04039-9. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526109/pdf/
The giant freshwater prawn (Macrobrachium rosenbergii) has many advantages in aquaculture, such as a fast growth rate, a short breeding cycle, and good nutritional value, which make it a freshwater shrimp of high economic value. Here, high-quality chromosome-level genomes of both female and male prawns were obtained by combining Illumina paired-end sequencing, PacBio single-molecule sequencing, and high-throughput chromosome conformation capture (Hi-C) technologies. For the ZZ male prawn, a final contig assembly of 3118.58 Mb with an N50 length of 956,237 bp was obtained. For the WW female prawn, a final contig assembly of 3333.31 Mb with an N50 length of 1,143,555 bp was obtained. The assembled genome sequences were anchored to 59 chromosomes. Moreover, the sex chromosomes, W and Z, were assembled with lengths of 36.23 Mb and 27.33 Mb, respectively. The sequence similarity between the Z and W chromosomes reached 74.90%. This high-quality genome resource will be useful for further molecular breeding and functional genomic research on giant freshwater prawns.
Shiyan Liu, Meihui Li, Chong Han, Shuisheng Li, Jin Zhang, Cheng Peng, Yong Zhang. Chromosome level genome assembly of giant freshwater prawn (Macrobrachium rosenbergii). Scientific Data 11(1), 1181. Published 2024-10-30. DOI: 10.1038/s41597-024-04016-2. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525972/pdf/
Pub Date: 2024-10-29. DOI: 10.1038/s41597-024-04018-0
Jianmin Wang, Dennis H Choi, Elizabeth LaRue, Jeff W Atkins, Jane R Foster, Jaclyn H Matthes, Robert T Fahey, Songlin Fei, Brady S Hardiman
Structural diversity (SD) characterizes the volume and physical arrangement of biotic components in an ecosystem, which control critical ecosystem functions and processes. LiDAR data provide detailed 3-D spatial position information for these components and have been widely used to calculate SD. However, the intensive computation of SD metrics from extensive LiDAR datasets is time-consuming and challenging for researchers who lack access to high-performance computing resources. Moreover, a lack of understanding of LiDAR data and algorithms could lead to inconsistent SD metrics. Here, we developed an SD product using the Discrete-Return LiDAR Point Cloud from the NEON Aerial Observation Platform. This product provides SD metrics detailing height, density, openness, and complexity at a spatial resolution of 30 m, aligned to the Landsat grids, for 211 site-years across 45 terrestrial NEON sites from 2013 to 2022. To accommodate ecosystems with different understory heights, it includes three cut-off heights (0.5 m, 2 m, and 5 m). This structural diversity product can enable applications such as ecosystem productivity estimation and disturbance monitoring.
Jianmin Wang, Dennis H Choi, Elizabeth LaRue, Jeff W Atkins, Jane R Foster, Jaclyn H Matthes, Robert T Fahey, Songlin Fei, Brady S Hardiman. NEON-SD: A 30-m Structural Diversity Product Derived from the NEON Discrete-Return LiDAR Point Cloud. Scientific Data 11(1), 1174. Published 2024-10-29. DOI: 10.1038/s41597-024-04018-0. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11522374/pdf/
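To make the role of the three cut-off heights in the NEON-SD abstract more concrete, the sketch below summarizes LiDAR return heights for a single grid cell at each cut-off. It is only a rough illustration: the function `summarize_cell`, the metric names, and the toy heights are hypothetical stand-ins, and the abstract does not specify how the actual NEON-SD metrics are defined or computed.

```python
from statistics import mean, pstdev

CUTOFFS_M = (0.5, 2.0, 5.0)  # cut-off heights listed in the abstract


def summarize_cell(heights_m, cutoff_m):
    """Summarize return heights (m above ground) for one grid cell.

    The metrics below are illustrative stand-ins, not the NEON-SD definitions.
    """
    # Only returns above the cut-off are treated as vegetation of interest.
    canopy = [h for h in heights_m if h > cutoff_m]
    if not canopy:
        return {"mean_height": 0.0, "max_height": 0.0, "height_sd": 0.0, "cover_fraction": 0.0}
    return {
        "mean_height": mean(canopy),
        "max_height": max(canopy),
        "height_sd": pstdev(canopy),
        "cover_fraction": len(canopy) / len(heights_m),
    }


# Made-up return heights for one 30 m cell.
toy_heights = [0.1, 0.3, 1.0, 3.0, 6.0, 12.0, 18.0, 0.2]
for cutoff in CUTOFFS_M:
    print(cutoff, summarize_cell(toy_heights, cutoff))
```

Raising the cut-off excludes more low returns, so the same cell yields a higher mean height and a lower cover fraction, which is one reason a product might offer several cut-offs for ecosystems with different understory heights.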
Pub Date: 2024-10-29. DOI: 10.1038/s41597-024-04028-y
Dan Zhang, Jianfeng Jin, Zeqing Niu, Michael C Orr, Feng Zhang, Rafael R Ferrari, Qingtao Wu, Qingsong Zhou, Wa Da, Arong Luo, Chaodong Zhu
Megachile is one of the largest bee genera, including nearly 1,500 species, but very few chromosome-level assemblies exist for this group or the family Megachilidae. Here, we report the chromosome-level genome assembly of Megachile lagopoda collected from Xizang, China. Using PacBio CLR long reads and Hi-C data, we assembled a genome of 256.83 Mb with 96.08% of the assembly located on 16 chromosomes. Our assembly contains 266 scaffolds, with a scaffold N50 length of 15.6 Mb, and BUSCO completeness of 99.20%. We masked 27.10% (69.61 Mb) of the assembly as repetitive elements, identified 459 non-coding RNAs, and predicted 11,157 protein-coding genes. This high-quality genome of M. lagopoda represents an important step forward for our knowledge of megachilid genomics and bee evolution overall.
Dan Zhang, Jianfeng Jin, Zeqing Niu, Michael C Orr, Feng Zhang, Rafael R Ferrari, Qingtao Wu, Qingsong Zhou, Wa Da, Arong Luo, Chaodong Zhu. Chromosome-level genome assembly of Megachile lagopoda (Linnaeus, 1761) (Hymenoptera: Megachilidae). Scientific Data 11(1), 1171. Published 2024-10-29. DOI: 10.1038/s41597-024-04028-y. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11522480/pdf/
Pub Date: 2024-10-29. DOI: 10.1038/s41597-024-04024-2
Brian Amos, Steven Gerontakis, Michael McDonald
We describe the creation and verification of databases of all precinct boundaries used in the United States November general elections of 2016, 2018, and 2020, enhanced with election results for all partisan statewide offices. United States election officials report election results for the smallest geographic reporting unit, known as the precinct. Scholars and practitioners find these election results valuable for numerous use cases. However, these data cannot be augmented with other geographically bound data, such as U.S. Census data, without precinct boundaries. Here we describe the collection of precinct boundary data from state and local election officials, variously provided in GIS formats, as images, as text descriptions, and, in rare cases, verbally. We describe how we verify boundaries against other election data, such as geocoded voter registration files. Our open-source data have appeared in redistricting litigation argued before the United States Supreme Court and have been used by state and local redistricting authorities, media organizations, advocacy groups, scholars, and a vibrant community of mapping enthusiasts.
Brian Amos, Steven Gerontakis, Michael McDonald. United States Precinct Boundaries and Statewide Partisan Election Results. Scientific Data 11(1), 1173. Published 2024-10-29. DOI: 10.1038/s41597-024-04024-2. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11522301/pdf/
Pub Date: 2024-10-29. DOI: 10.1038/s41597-024-04026-0
Kyleisha J Foote, James W A Grant, Pascale M Biron
Salmonid fishes are arguably one of the most studied fish taxa on Earth, but little is known about their biomass range in many parts of the world. We created a dataset of estimated salmonid biomass using published material of over 1000 rivers, covering 27 countries and 11 species. The dataset, spanning 84 years of data, is the largest known compilation of published studies on salmonid biomass in streams, allowing detailed analyses of differences in biomass by species, region, period, and sampling techniques. Production is also recorded for 194 rivers, allowing further analyses and relationships between biomass and production to be explored. There is scope to expand the list of variables in the dataset, which would be useful to the scientific community as it would enable models to be developed to predict salmonid biomass and production, among many other analyses.
Kyleisha J Foote, James W A Grant, Pascale M Biron. A global dataset of salmonid biomass in streams. Scientific Data 11(1), 1172. Published 2024-10-29. DOI: 10.1038/s41597-024-04026-0. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11522555/pdf/