BMC genomic data: latest articles, page 9

Draft genome assembly for the purple-hinged rock scallop (Crassadoma gigantea).

IF 1.9 Q3 GENETICS & HEREDITY

BMC genomic data

Pub Date : 2025-05-28 DOI: 10.1186/s12863-025-01330-5

Hayley Goss, Paige Miller, Susan F Zaleski, Robert J Miller, Donna M Schroeder, Henry M Page

Citations: 0

The genetic structure and diversity of smallholder dairy cattle in Rwanda.

IF 1.9 Q3 GENETICS & HEREDITY

BMC genomic data

Pub Date : 2025-05-27 DOI: 10.1186/s12863-025-01323-4

Oluyinka Opoola, Felicien Shumbusho, Innocent Rwamuhizi, Isidore Houaga, David Harvey, David Hambrook, Kellie Watson, Mizeck G G Chagunda, Raphael Mrode, Appolinaire Djikeng

Previous genomic characterisation of Rwandan dairy cattle focused predominantly on the One Cow per Poor Family (locally called "Girinka") programme. However, smallholder farmers in Rwanda have also benefited from other livestock initiatives and development programmes. Capturing and documenting this genetic diversity is critical, in part as a key contribution to the genomic resources required to support dairy development in Rwanda. A total of 2,229 crossbred animals located in all dairy-producing regions of Rwanda were sampled. For each animal, a hair sample was collected and genotyped using the Geneseek Genomic Profiler (GGP, Neogen Geneseek®) Bovine 50 K (n = 1,917) and GGP Bovine 100 K (n = 312) arrays. The combined dataset underwent quality control and data curation for use in population genetics and genomic analyses. To assess the genetic structure and diversity of the current population, key analyses were applied: principal component analysis (PCA), population structure and diversity, admixture analysis, measures of heterozygosity, runs of homozygosity (ROH) and minor allele frequency (MAF). A dataset of global dairy populations of European taurine, African Bos indicus and African Bos taurus cattle (n = 250) was used as reference. Results showed that the Rwandan cattle population is a highly admixed population of diverse pure and crossbred animals, with an average MAF of 33% (standard error, se = 0.001) and ancestry from foreign high-yielding (taurine) dairy breeds of 18% Jersey Island, 12% non-Island Jersey and 42% Holstein-Friesian. Two African Bos taurus and five Bos indicus breeds contributed the remaining 28% of their genetics. Genetic distances were highest between Gir and N'dama (0.29) and between Nelore and N'dama (0.29). There were 1,331 ROH regions, and average heterozygosity was high for Rwandan cattle (0.41, se = 0.001). Aside from well-established genes in cattle, we found evidence that a variety of novel and less-known genes under selection are associated with fertility, milk production, innate immunity and environmental adaptation. This observed diversity offers an opportunity to decipher the presence and/or absence of genetic variation and to initiate short- and long-term breed improvement programmes for adaptation traits, disease resistance, heat tolerance, productivity and profitability of smallholder dairy systems in Rwanda.
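The MAF and heterozygosity summaries reported above can be illustrated with a minimal Python sketch. It assumes genotypes are coded as 0/1/2 copies of one allele per animal and SNP, with NaN for missing calls; the function name `maf_and_het` and the toy matrix are hypothetical, and this is not the authors' pipeline, which the abstract does not describe in detail.

```python
import numpy as np

def maf_and_het(genotypes):
    """Return per-SNP minor allele frequency and overall observed heterozygosity.

    genotypes: animals x SNPs array coded 0/1/2 (copies of one allele),
    with np.nan marking missing calls (an assumed coding).
    SNPs with no calls at all would yield NaN and should be removed during QC.
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    n_called = called.sum(axis=0)
    # Allele frequency per SNP from called genotypes only (2 alleles per animal)
    freq = np.nansum(g, axis=0) / (2 * n_called)
    maf = np.minimum(freq, 1 - freq)
    # Observed heterozygosity: fraction of all called genotypes that are heterozygous
    het = np.sum(g == 1) / called.sum()
    return maf, float(het)

# Hypothetical toy genotypes: 4 animals x 3 SNPs, one missing call
example = np.array([
    [0, 1, 2],
    [1, 1, 0],
    [2, 1, np.nan],
    [0, 0, 1],
])
maf, het = maf_and_het(example)
mean_maf = maf.mean()
se_maf = maf.std(ddof=1) / np.sqrt(maf.size)  # standard error of the mean MAF
print(maf, het, mean_maf, se_maf)
```

Missing calls are excluded from both the allele-frequency denominator and the heterozygosity denominator, so animals with incomplete genotypes do not bias either statistic downward.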


Volume: 26 (1) Article: 38 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12107919/pdf/

Citations: 0

Whole-genome sequencing of global forest pathogen Diplodia sapinea causing pine shoot blight.

IF 1.9 Q3 GENETICS & HEREDITY

BMC genomic data

Pub Date : 2025-05-27 DOI: 10.1186/s12863-025-01328-z

QuanChao Wang, FeiFei Liu, HuaChao Xu, XuDong Zhou

Objective: The pathogenic fungus Diplodia sapinea is of significant importance due to its primary role in inducing tip dieback on various Pinus species, which are widely distributed throughout the world. This study aims to provide further comprehensive and specific resources for the genome assembly and sequence annotation of this important forest pathogen from China, thereby establishing a robust foundation for future studies of its systematics, population genetics, genomics and global movement.

Data description: A high-quality genome of D. sapinea strain ZXD319 was sequenced using the Nanopore PromethION and BGI DNBSEQ-T7 platforms. The assembled genome spans a total length of 36.81 Mb in 14 contigs, with a GC content of 56.80% and an N50 of 2,972,533 bp. It contains 11,200 protein-coding genes and 252 noncoding RNAs. The predicted genes were annotated against multiple public databases, and 1,611 potential virulence genes were identified using the Pathogen Host Interactions (PHI) database. Furthermore, comparative genome analysis of D. sapinea and related species revealed 11,568 gene clusters and 3,436 single-copy clusters. Phylogenetic analysis indicated a close evolutionary relationship of D. sapinea with D. corticola and D. seriata. The genomic data presented here serve as a valuable resource for future studies of this globally important pathogen.
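N50 and GC content, two of the assembly statistics quoted above, follow simple definitions. The sketch below is an assumed, minimal Python implementation (function names hypothetical); it ignores non-ACGT characters such as N when computing GC content, which dedicated assembly-QC tools may handle differently. The contig lengths of the ZXD319 assembly are not given in the abstract, so only toy inputs are used.

```python
def n50(lengths):
    """Smallest contig length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the total assembly length
            return length
    raise ValueError("empty assembly")

def gc_content(seq):
    """Percentage of G+C among unambiguous bases (A, C, G, T); other characters are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt

print(n50([5, 4, 3, 2, 1]))   # contigs 5 + 4 already cover 9 of 15 bases
print(gc_content("gcNNat"))
```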


Volume: 26 (1) Article: 37 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12107918/pdf/

Citations: 0

The first complete genome of Fructilactobacillus vespulae: strain Mu01, isolated from nectar of Musa paradisiaca L.

IF 1.9 Q3 GENETICS & HEREDITY

BMC genomic data

Pub Date : 2025-05-22 DOI: 10.1186/s12863-025-01329-y

Manuel Zúñiga, Cristina Alcántara, Ángela Peirotén, Luis Andrés Ramón-Nuñez, Vicente Monedero, José María Landete

Objectives: Lactobacillales, commonly known as lactic acid bacteria (LAB), is an order of Gram-positive, facultatively anaerobic or microaerophilic bacteria characterized by their ability to ferment carbohydrates and produce lactic acid as a major metabolic byproduct. Many species within this group have significant roles in food fermentation, human health, and industrial applications. Here, we report the complete genome sequence of Fructilactobacillus vespulae Mu01, the first sequenced genome of this species. The complete genome sequence of F. vespulae Mu01 is expected to provide valuable insights into the genetics and metabolism of this little-characterized species.

Data description: A novel strain of Fructilactobacillus vespulae was isolated from the nectar of Musa paradisiaca L. during a survey of LAB associated with wild and cultivated plants in the metropolitan area of Valencia, Spain. A complete genome was obtained by sequencing with Nanopore long-read technology. The genome consists of a chromosome of 1,506,092 bp and a plasmid of 42,437 bp, with GC contents of 36% and 31%, respectively. The genome includes 1,541 genes: 1,450 CDSs, 7 pseudogenes, 18 rRNA-encoding genes, 63 tRNAs and 3 ncRNAs.
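The gene-category counts reported above sum exactly to the stated total (1,450 + 7 + 18 + 63 + 3 = 1,541). A small Python sketch can make that check explicit and tally feature types from a GFF3 annotation; `count_feature_types` is a hypothetical helper, not part of the authors' workflow. Note that GFF3 files usually list a gene and its CDS as separate rows, so counts by feature type are not directly a gene total.

```python
from collections import Counter

# Gene-category counts as stated in the abstract above
REPORTED = {"CDS": 1450, "pseudogene": 7, "rRNA": 18, "tRNA": 63, "ncRNA": 3}
REPORTED_TOTAL = 1541

def count_feature_types(gff3_text):
    """Hypothetical helper: count GFF3 rows by feature type (column 3)."""
    counts = Counter()
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) == 9:  # GFF3 data rows have exactly nine tab-separated columns
            counts[cols[2]] += 1
    return counts

# The reported categories account for every gene in the stated total
assert sum(REPORTED.values()) == REPORTED_TOTAL
print(sum(REPORTED.values()))
```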


Volume: 26 (1) Article: 36 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12101011/pdf/

Citations: 0

Computational prediction of deleterious nonsynonymous SNPs in the CTNS gene: implications for cystinosis.

IF 1.9 Q3 GENETICS & HEREDITY

BMC genomic data

Pub Date : 2025-05-15 DOI: 10.1186/s12863-025-01325-2

Leila Adda Neggaz, Amira Chahinez Dahmani, Ibtissem Derriche, Nawel Adda Neggaz, Abdallah Boudjema

Background: Cystinosis is a rare autosomal recessive lysosomal storage disorder caused by mutations in the CTNS gene, which encodes cystinosin, a lysosomal cystine transporter. These mutations disrupt cystine efflux, leading to cystine accumulation in lysosomes and subsequent cellular damage. While more than 140 mutations have been identified, the functional and structural impacts of many nonsynonymous single nucleotide polymorphisms (nsSNPs) remain poorly understood. Nonsynonymous SNPs are of particular interest because they can directly alter protein structure and function, potentially leading to disease. Clinically, cystinosis most often presents with renal Fanconi syndrome, photophobia and vision loss due to corneal cystine crystals, and progressive neuromuscular complications such as distal myopathy and swallowing difficulties. This study aimed to identify deleterious nsSNPs in the CTNS gene and to evaluate their effects on cystinosin stability, structure, and function using computational tools and molecular dynamics simulations.

Results: From a dataset of 12,028 SNPs, 327 nsSNPs were identified, of which 19 were consistently classified as deleterious across multiple predictive tools, including SIFT, PolyPhen, and molecular dynamics simulations. Stability predictions revealed that most of these mutations destabilize cystinosin, with G308R and G308V, located in the sixth transmembrane domain essential for transporter function, having the most severe effects. Molecular dynamics simulations revealed that these mutations significantly increase local flexibility, alter hydrogen-bonding patterns, and enhance solvent accessibility, resulting in structural perturbations. Notably, D305G and F142S disrupted the transmembrane domains essential for cystinosin function, whereas G309V increased stability compared with the wild-type protein. Conservation analysis revealed that 16 of the 19 mutations affected highly conserved residues, indicating their crucial roles in cystinosin function. Additionally, protein interaction analyses suggested that the mutations could affect associations with lysosomal and membrane transport proteins.
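The consensus step described above, keeping only nsSNPs that several predictors agree are deleterious, can be sketched in Python as follows. The thresholds are assumptions for illustration: SIFT scores below 0.05 are conventionally read as deleterious, while the 0.5 cutoff on the PolyPhen-2 probability is an arbitrary choice here. The variant labels and scores are hypothetical, not values from the study.

```python
from dataclasses import dataclass

SIFT_MAX = 0.05      # SIFT: scores below 0.05 are conventionally called deleterious
POLYPHEN_MIN = 0.5   # assumed, illustrative cutoff on the PolyPhen-2 probability

@dataclass
class Prediction:
    variant: str      # hypothetical variant label
    sift: float       # SIFT score (lower = more damaging)
    polyphen: float   # PolyPhen-2 probability (higher = more damaging)

def is_deleterious_by_consensus(p: Prediction) -> bool:
    # A variant passes only if every tool independently flags it
    return p.sift < SIFT_MAX and p.polyphen > POLYPHEN_MIN

def consensus_filter(predictions):
    return [p.variant for p in predictions if is_deleterious_by_consensus(p)]

preds = [
    Prediction("variant_A", sift=0.01, polyphen=0.98),  # both tools flag it
    Prediction("variant_B", sift=0.01, polyphen=0.20),  # PolyPhen-2 disagrees
    Prediction("variant_C", sift=0.30, polyphen=0.95),  # SIFT disagrees
]
print(consensus_filter(preds))
```

Requiring agreement across tools trades sensitivity for specificity: fewer variants pass, but those that do are less likely to be artefacts of a single predictor.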

Conclusions: This study identified 19 deleterious nsSNPs in the CTNS gene that impair cystinosin stability and function. These findings highlight the structural and functional importance of key residues, such as G308, D305, and F142, which play critical roles in maintaining the active conformation and transport capacity of cystinosin. These insights provide a foundation for future experimental validation and the development of targeted therapeutic strategies to mitigate the effects of pathogenic mutations in cystinosis.


Volume: 26 (1) Article: 35 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12079974/pdf/

Citations: 0

Molecular characterization and phylogenetic analyses of the mitogenome of Wan-Xi white goose, a native goose breed in China.

IF 1.9 Q3 GENETICS & HEREDITY

BMC genomic data

Pub Date : 2025-05-13 DOI: 10.1186/s12863-025-01326-1

Lunbin Xia, Shaoshuai Bi, Yafei Zhang, Cunwu Chen, Naidong Chen

Background: The Wan-Xi white goose (WXG), an indigenous Chinese waterfowl (Anserini: Anserinae), is crucial for goose germplasm conservation. This study aimed to sequence and analyze the complete mitochondrial DNA (mtDNA) of WXG using the BGISEQ-500 platform. The mtDNA's structure and function were investigated to gain insights into its genetic diversity and population structure.

Results: The mtDNA was 16,743 bp long and comprised 22 transfer RNA (tRNA) genes, 2 ribosomal RNA genes, 13 protein-coding genes (PCGs), and a single noncoding control region known as the D-loop. Notably, all tRNA genes except trnS1, which lacked the dihydrouridine stem, were predicted to adopt the typical cloverleaf structure. Given the genetic variability across the mtDNA of Anser spp. and the intergenic gaps identified by codon analysis, codon usage patterns were examined comprehensively by comparing the mtDNAs of WXG and 24 other Anser spp. The relative synonymous codon usage (RSCU) values of the 13 mitochondrial PCGs of WXG were consistent with those of the 24 other Anser spp. Neutrality plots (GC12 vs. GC3), effective number of codons (ENC) vs. GC3 plots, and parity rule 2 (PR2) bias plots further revealed that natural selection was the primary factor influencing codon bias in Anser. High nucleotide diversity (Pi > 0.02) was observed in several regions, including the D-loop, ATP6, 12S rRNA, ND1, 16S rRNA_ND1, COX2, and ND5. Furthermore, nonsynonymous (Ka)/synonymous (Ks) analysis of the 13 mitochondrial PCGs of the 25 Anser species revealed that the genes were subject to strong purifying selection. Phylogenetic analysis further revealed that WXG and 10 other members of Anser cygnoides clustered into a single branch, forming a monophyletic group.
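Two of the statistics named above have compact definitions. RSCU for a codon is its observed count divided by the mean count across all synonymous codons of the same amino acid (so 1.0 means no bias), and nucleotide diversity (Pi) is the mean proportion of differing sites over all pairs of aligned sequences. The Python sketch below is an assumed illustration with hypothetical function names: codon families must be supplied for the genetic code in use (vertebrate mitochondria use NCBI translation table 2), gaps and ambiguous bases are not handled, and a hotspot scan would apply the Pi calculation in windows along a whole-mitogenome alignment.

```python
from collections import Counter
from itertools import combinations

def codon_counts(cds):
    """Count in-frame codons, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def rscu(counts, families):
    """RSCU = observed codon count / mean count across its synonymous family.

    families maps each amino acid to its synonymous codons under the
    genetic code in use (supplied by the caller).
    """
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values

def nucleotide_diversity(aligned):
    """Mean pairwise proportion of differing sites (Pi) for equal-length sequences."""
    length = len(aligned[0])
    pairs = list(combinations(aligned, 2))
    per_pair = [sum(a != b for a, b in zip(s1, s2)) / length for s1, s2 in pairs]
    return sum(per_pair) / len(pairs)

# Toy inputs: Phe is encoded by TTT/TTC; CTA falls outside the supplied family
counts = codon_counts("TTTTTTTTCCTA")
print(rscu(counts, {"F": ["TTT", "TTC"]}))
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```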

Conclusion: This research provides valuable insights into the mtDNA of WXG, highlighting its genetic diversity and population structure. The identified mutation hotspots and purifying selection on mitochondrial PCGs suggest potential areas for future research on Anser cygnoides. The findings contribute to our understanding of this rare species and its conservation efforts.


Volume: 26 (1) Article: 34 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12070641/pdf/

Citations: 0

Correlation of METTL4 genetic variants and severe pneumonia pediatric patients in Southern China.

IF 1.9 Q3 GENETICS & HEREDITY

BMC genomic data

Pub Date : 2025-05-01 DOI: 10.1186/s12863-025-01306-5

Liuheyi Ma, Xiaoyu Zuo, Bingtai Lu, Yuxia Zhang

Background: Pneumonia is a major cause of mortality and health burden in children under five, yet its genetic etiology remains poorly understood. Methyltransferase 4, N6-adenosine (METTL4), is a methyltransferase enzyme responsible for RNA and DNA methylation and is known to be activated under hypoxic conditions. However, its potential link to susceptibility to pneumonia has not been evaluated. This study aimed to explore candidate regulatory single nucleotide polymorphisms (SNPs) within the METTL4 gene and their association with the development of severe pneumonia.

Results: We recruited a cohort of 1,034 children with severe pneumonia and 8,426 healthy controls and investigated the associations of candidate regulatory single nucleotide polymorphisms (SNPs) within METTL4 with severe pneumonia. The C allele of rs9989554 (P = 0.00023, OR = 1.21, 95% CI: 1.09-1.34) and the G allele of rs16943442 (P = 0.0026, OR = 1.22, 95% CI: 1.07-1.38) were significantly associated with an increased risk of severe pneumonia. The regulatory potential of these two SNPs in the lung was investigated using resources such as expression quantitative trait locus (eQTL) data, RegulomeDB, and FORGEdb.
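Allelic odds ratios with 95% confidence intervals, as reported above, are commonly derived from a 2 x 2 table of allele counts (each child contributes two alleles). The Python sketch below assumes a simple allelic test: the odds ratio with a Woolf (log-scale) confidence interval and a 1-degree-of-freedom Pearson chi-square P value. The function name and the counts in the usage line are hypothetical, and the study may have used a different model (for example, logistic regression with covariates), which the abstract does not specify.

```python
import math

def allelic_association(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Allelic odds ratio, Woolf 95% CI and 1-df chi-square P from allele counts."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (consider a continuity correction)")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's method
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    # Pearson chi-square for a 2x2 table (no continuity correction)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, lower, upper, p_value

# Hypothetical allele counts, not data from the study
print(allelic_association(30, 70, 20, 80))
```

The interval is symmetric on the log scale rather than around the odds ratio itself, which is why published intervals such as 1.09-1.34 around OR = 1.21 look slightly lopsided.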

Conclusions: This study represents the first investigation elucidating the role of genetic variations in the METTL4 gene and their influence on susceptibility to severe pneumonia in pediatric populations. METTL4 is identified as a novel predisposing gene for severe pneumonia and a potential therapeutic target. Further research is warranted to validate this correlation and to comprehensively elucidate the biological role of the METTL4 gene in severe pneumonia.


BMC genomic data 26(1):33. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12044828/pdf/

Citations: 0

Comprehensive transcriptome of muscle development in Sichuan white rabbit.

IF 1.9 Q3 GENETICS & HEREDITY

BMC genomic data

Pub Date : 2025-04-23 DOI: 10.1186/s12863-025-01322-5

Xiangyu Zhang, Kai Zhang, Dengping Huang, Shangjun Yang, Min Zhang, Qin Yin

Background: The Sichuan white rabbit is a distinctive domestic breed known for its high meat production. Muscle development is a complex biological process whose regulatory mechanisms have not been elucidated. Here, we generated comprehensive transcriptome datasets (mRNAs, miRNAs and lncRNAs) at three developmental stages of Sichuan white rabbits, aiming to systematically explore the regulatory network underlying myogenesis.

Results: The transcriptome datasets (mRNAs, miRNAs and lncRNAs) revealed the myogenic regulatory network at different time points. Differential expression analysis identified 2,995 differentially expressed (DE) genes, 1,211 DE-lncRNAs, and 305 DE-miRNAs with distinct expression patterns across developmental stages. Functional enrichment analysis of the DE mRNAs and miRNAs indicated their involvement in muscle growth, development, and regeneration, with enriched terms spanning general biological processes and muscle-specific functions. Interaction analysis between DE-lncRNAs and mRNAs uncovered a complex regulatory network, especially between 21 and 27 days of development. These findings improve our understanding of transcriptomic changes during muscle development and have implications for breeding improvement in Sichuan white rabbits.

Conclusions: Our study provides a comprehensive overview of the transcriptomic changes during muscle development in Sichuan white rabbits. The identification and functional annotation of DE genes, miRNAs, and lncRNAs provide valuable insights into the molecular mechanisms underlying this process. These findings pave the way for targeted investigations into the role of non-coding RNAs in muscle biology.
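The abstract does not say which software or cut-offs produced the 2,995 DE genes. As a minimal sketch of the usual final step of such an analysis, the code below applies the Benjamini-Hochberg false-discovery-rate adjustment to per-feature p-values and then filters on log2 fold change. The thresholds (FDR < 0.05, |log2FC| >= 1), the pseudocount and the function names `bh_adjust` and `call_de` are assumptions for illustration; the p-values themselves would come from an upstream test (for example a negative-binomial model) that is not shown.

```python
import math
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_de(mean_a: Sequence[float], mean_b: Sequence[float],
            pvalues: Sequence[float], fdr: float = 0.05,
            min_abs_log2fc: float = 1.0, pseudocount: float = 1.0) -> List[bool]:
    """Flag features as DE between stages A and B (hypothetical thresholds)."""
    padj = bh_adjust(pvalues)
    calls = []
    for a, b, q in zip(mean_a, mean_b, padj):
        # Pseudocount avoids log(0) for features unexpressed in one stage.
        log2fc = math.log2((b + pseudocount) / (a + pseudocount))
        calls.append(q < fdr and abs(log2fc) >= min_abs_log2fc)
    return calls


# Toy example: three features, mean normalized expression at two stages.
print(call_de([10, 10, 10], [40, 12, 40], [0.001, 0.001, 0.9]))
# → [True, False, False]
```

The second feature is significant after adjustment but fails the fold-change filter, which shows why DE counts depend on both thresholds.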


BMC genomic data 26(1):32. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12016129/pdf/

Citations: 0

Faroese sheep expand overall global ovine genetic diversity.

IF 1.9 Q3 GENETICS & HEREDITY

BMC genomic data

Pub Date : 2025-04-18 DOI: 10.1186/s12863-025-01319-0

Eva Kjæld Hansen, Jens Ivan Í Gerðinum, Dag Inge Våge, Svein-Ole Mikalsen

Background: The history of Faroese sheep is unclear. Although the Vikings are assumed to have brought sheep to the Faroes, traces of pre-Viking-age sheep have also been found. Historical sources describe disasters around the year 1600 that essentially eradicated the sheep population, followed by imports from Iceland to the northern Faroes and from Shetland and Orkney to the southern Faroes. Here we investigated the genetic relationship between northern Faroese sheep and other breeds.

Results: A total of 359 sheep from four flocks on three Faroese islands (Streymoy, Eysturoy, Kalsoy) were genotyped using the GeneSeek Genomic Profiler Ovine 50K chip. The samples clearly stratified into three groups corresponding to island of origin, likely because few animals were moved between the islands over long periods. The Faroese samples were compared with data from the Sheep HapMap database, which represents breeds from different parts of the world, and with Norwegian White Sheep. The Northern European short-tailed breeds clearly stood out from the remaining global breeds, and Faroese sheep occupied a peripheral position among the other North Atlantic short-tailed breeds, with Icelandic sheep and Norwegian spael as their closest neighbors. This peripheral position suggests that the link to the surrounding breeds may be more distant than expected.

Conclusions: Despite known imports of sheep from neighboring countries after 1600, these imports are poorly reflected in the genotyping data. One possible explanation is that present-day Faroese sheep retain an unbroken genetic link to the pre-1300 Faroese sheep (possibly a mix of Old Norse and old British/Irish animals), notwithstanding the presumed post-1600 influence of other breeds in the North Atlantic region.
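Island-level stratification like that reported in the Results is commonly visualized with principal component analysis (PCA) of SNP genotypes, although the abstract does not state which method the authors used. The sketch below, assuming a complete 0/1/2 genotype matrix with no missing calls, centres each SNP, scales it by its expected binomial standard deviation and takes a singular value decomposition. The function name `genotype_pca` and the simulated three-group data are hypothetical.

```python
import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Return PC scores for each individual from a 0/1/2 genotype matrix.

    genotypes has shape (n_individuals, n_snps) and, as a simplifying
    assumption, contains no missing calls.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0               # alternate-allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)           # monomorphic SNPs carry no signal
    g, p = g[:, keep], p[keep]
    # Centre each SNP and scale by its binomial standard deviation.
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Hypothetical data: three "islands", each with its own allele frequencies.
rng = np.random.default_rng(0)
n_per_group, n_snps = 30, 500
freqs = rng.uniform(0.05, 0.95, size=(3, n_snps))
geno = np.vstack([rng.binomial(2, f, size=(n_per_group, n_snps)) for f in freqs])
labels = np.repeat([0, 1, 2], n_per_group)
scores = genotype_pca(geno)
for k in range(3):
    print(k, scores[labels == k].mean(axis=0).round(2))
```

With three well-separated groups, the first two PCs are enough to place each group in its own cluster; real data with gene flow would show more overlap.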


BMC genomic data 26(1):31. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12008876/pdf/

Citations: 0

Construction of a comprehensive library of repeated sequences for the annotation of Citrus genomes.

IF 1.9 Q3 GENETICS & HEREDITY

BMC genomic data

Pub Date : 2025-04-18 DOI: 10.1186/s12863-025-01321-6

Delphine Giraud, Nathalie Choisne, Marilyne Summo, Stéphanie Sidibe-Bocs, Héléna Vassilieff, Gilles Costantino, Gaetan Droc, Pierre-Yves Teycheney, Florian Maumus, Patrick Ollitrault, François Luro

Background: Comprehensive annotation of repeated sequences in genomes is an essential prerequisite for studying how these sequences change over time and how they contribute to gene regulation. Currently, the diversity of repeated sequences in Citrus genomes is only partially characterized, because annotations have been performed with heterogeneous bioinformatics tools, each with its own specificity and each dedicated solely to annotating transposable elements.

Results: We combined complementary repeat-finding programs, including REPET, CAULIFINDER, and TAREAN, to identify all types of repetitive sequences found in plant genomes, including transposable elements, endogenous caulimovirids, and satellite DNAs. A fine-grained annotation method was first developed to build a consensus sequence library of the repeated sequences identified in the genome assemblies of C. medica, C. micrantha, C. reticulata, and C. maxima, the four ancestral parental species of economically valuable cultivated Citrus varieties. A second, faster annotation method was then developed to enrich the dataset with new repeated sequences retrieved from genome assemblies of other Citrus species and of closely related species in the subfamily Aurantioideae. The final reference library contains 3,091 consensus sequences, 94.5% of which are transposable elements. The diversity of endogenous caulimovirids was characterized for the first time in the genus Citrus, contributing 160 consensus sequences to the final reference library. Finally, 10 satellite DNAs were also identified.

Conclusion: Combining multiple repeat-detection methods enables comprehensive annotation of all repeated sequences in Citrus genomes. The reference library reported in this work will improve our understanding of repeat dynamics during Citrus speciation, particularly following the genome duplication and hybridization events that gave rise to modern cultivars. Exploring repeat insertion positions along chromosomes with RepeatLoc Citrus, the web interface developed here, will also make it possible to further investigate the role of transposable elements and endogenous caulimovirids in genome structure and gene regulation in Citrus species.
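The reference library described above is a set of consensus sequences, each summarizing many copies of a repeat family. Pipelines such as REPET derive their consensus sequences through their own, more elaborate clustering and alignment steps, which are not reproduced here. Purely as a simplified illustration, the sketch below computes a column-wise majority-rule consensus from copies that are assumed to be already aligned. The function name `majority_consensus`, its gap rule and its `min_fraction` threshold are hypothetical choices.

```python
from collections import Counter
from typing import List


def majority_consensus(aligned: List[str], min_fraction: float = 0.5,
                       gap: str = "-") -> str:
    """Column-wise majority-rule consensus of pre-aligned repeat copies.

    Columns where gaps outnumber bases are dropped. Otherwise the most common
    base is kept if it makes up more than `min_fraction` of the non-gap
    characters in that column, and 'N' is emitted if not.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("expected one or more sequences of equal length")
    consensus = []
    for column in zip(*aligned):
        bases = [c.upper() for c in column if c != gap]
        if 2 * len(bases) < len(column):   # mostly gaps: skip this column
            continue
        base, count = Counter(bases).most_common(1)[0]
        consensus.append(base if count / len(bases) > min_fraction else "N")
    return "".join(consensus)


# Hypothetical aligned copies of one repeat family.
copies = ["ACGT-A", "ACGTTA", "ACCT-A", "AGGT-A"]
print(majority_consensus(copies))  # → ACGTA
```

The insertion carried by a single copy (the `T` in column 5) is dropped, while isolated point differences are outvoted, which is the basic reason a consensus can stand in for a whole repeat family during annotation.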


BMC genomic data 26(1):30. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12007355/pdf/

Citations: 0