Genomics & Informatics: latest articles

Compression rates of microbial genomes are associated with genome size and base composition.
Pub Date : 2024-10-10 DOI: 10.1186/s44342-024-00018-z
Jon Bohlin, John H-O Pettersson

Background: The degree to which a string of symbols can be compressed reveals important details about its complexity. For instance, strings that are not compressible are random and carry a low information potential, while the opposite is true for highly compressible strings. We explore to what extent microbial genomes are amenable to compression, as they vary considerably with respect to both size and base composition. For instance, microbial genome sizes vary from less than 100,000 base pairs in symbionts to more than 10 million in soil-dwellers. Genomic base composition, often summarized as genomic AT or GC content due to the similar frequencies of adenine and thymine on one hand and cytosine and guanine on the other, also varies substantially; the most extreme microbes can have genomes with AT content below 25% or above 85%. Base composition determines the frequency of DNA words (oligonucleotides, that is, short runs of multiple nucleotides) and may therefore also influence compressibility. Using 4,713 RefSeq genomes, we used generalized additive models to examine the association between compressibility, measured with both a DNA-based (MBGC) and a general-purpose (ZPAQ) compression algorithm, and genome size, AT content, and genomic oligonucleotide usage variance (OUV).

Results: We find that genome size (p < 0.001) and OUV (p < 0.001) are both strongly associated with genome redundancy for both types of file compressor. The DNA-based MBGC compressor improved compression by approximately 3% on average relative to ZPAQ. Moreover, MBGC detected a significant (p < 0.001) difference in compression ratio between AT-poor and AT-rich genomes that was not detected with ZPAQ.

Conclusion: As lack of compressibility is equivalent to randomness, our findings suggest that smaller and AT rich genomes may have accumulated more random mutations on average than larger and AT poor genomes which, in turn, were significantly more redundant. Moreover, we find that OUV is a strong proxy for genome compressibility in microbial genomes. The ZPAQ compressor was found to agree with the MBGC compressor, albeit with a poorer performance, except for the compressibility of AT-rich and AT-poor/GC-rich genomes.
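
As a rough illustration of the quantities discussed in this abstract, the sketch below computes AT content and a compression ratio for a DNA string. It uses Python's standard-library `lzma` module as a stand-in general-purpose compressor; it is not ZPAQ or MBGC, and the function names and toy sequences are assumptions made purely for illustration.

```python
import lzma
import random


def at_content(seq: str) -> float:
    """Fraction of A and T bases in a non-empty sequence."""
    seq = seq.upper()
    return sum(seq.count(base) for base in "AT") / len(seq)


def compression_ratio(seq: str) -> float:
    """Compressed size divided by raw size (lower means more redundant)."""
    raw = seq.encode("ascii")
    return len(lzma.compress(raw)) / len(raw)


# Two toy "genomes" of equal length: one random, one highly repetitive.
rng = random.Random(0)
random_genome = "".join(rng.choice("ACGT") for _ in range(50_000))
repetitive_genome = "ACGTTGCA" * 6_250

print("AT content (random):", at_content(random_genome))
print("ratio (random):     ", compression_ratio(random_genome))
print("ratio (repetitive): ", compression_ratio(repetitive_genome))
```

The repetitive sequence should compress far better than the random one, which mirrors the abstract's point that poor compressibility goes hand in hand with randomness.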

Genomics & Informatics 22(1), 16. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11468749/pdf/
Citations: 0
A prediction of mutations in infectious viruses using artificial intelligence.
Pub Date : 2024-10-08 DOI: 10.1186/s44342-024-00019-y
Won Jong Choi, Jongkeun Park, Do Young Seong, Dae Sun Chung, Dongwan Hong

Many subtypes of SARS-CoV-2 have emerged since its early stages, with mutations showing regional and racial differences. These mutations significantly affected the infectivity and severity of the virus. This study aimed to predict the mutations that occur during the evolution of SARS-CoV-2 and identify the key characteristics for making these predictions. We collected and organized data on the lineage, date, clade, and mutations of SARS-CoV-2 from publicly available databases and processed them to predict the mutations. In addition, we utilized various artificial intelligence models to predict newly emerging mutations and created various training sets based on clade information. Using only mutation information resulted in low performance of the learning models, whereas incorporating clade differentiation resulted in high performance in machine learning models, including XGBoost (accuracy: 0.999). However, mutations fixed in the receptor-binding motif (RBM) region of Omicron resulted in decreased predictive performance. Using these models, we predicted potential mutation positions for 24C, following the recently emerged 24A and 24B clades. We identified a mutation at position Q493 in the RBM region. Our study developed effective artificial intelligence models and characteristics for predicting new mutations in continuously evolving infectious viruses.

Genomics & Informatics 22(1), 15. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11463117/pdf/
Citations: 0
Review of the technology used for structural characterization of the GMO genome using NGS data.
Pub Date : 2024-10-02 DOI: 10.1186/s44342-024-00016-1
Kahee Moon, Prakash Basnet, Taeyoung Um, Ik-Young Choi

The molecular characterization of genetically modified organisms (GMOs) is essential for ensuring safety and gaining regulatory approval for commercialization. According to CODEX standards, this characterization involves evaluating the presence of introduced genes, insertion sites, copy number, and nucleotide sequence structure. Advances in technology have led to the increased use of next-generation sequencing (NGS) over traditional methods such as Southern blotting. While both methods provide high reproducibility and accuracy, Southern blotting is labor-intensive and time-consuming due to the need for repetitive probe design and analyses for each target, resulting in low throughput. Conversely, NGS facilitates rapid and comprehensive analysis by mapping whole-genome sequencing (WGS) data to plasmid sequences, accurately identifying T-DNA insertion sites and flanking regions. This advantage allows for efficient detection of T-DNA presence, copy number, and unintended gene insertions without additional probe work. This paper reviews the current status of GMO genome characterization using NGS and proposes more efficient strategies for this purpose.

Genomics & Informatics 22(1), 14. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11445869/pdf/
Citations: 0
Antibiotic resistance challenge: evaluating anthraquinones as rifampicin monooxygenase inhibitors through integrated bioinformatics analysis.
Pub Date : 2024-09-04 DOI: 10.1186/s44342-024-00015-2
Mohammad Reza Arabestani, Masoumeh Saadat, Amir Taherkhani

Objective: Antibiotic resistance poses a pressing and crucial global public health challenge, leading to significant clinical and health-related consequences. Substantial evidence highlights the pivotal involvement of rifampicin monooxygenase (RIFMO) in the context of antibiotic resistance. Hence, inhibiting RIFMO could offer potential in the treatment of various infections. Anthraquinones, a group of organic compounds, have shown promise in addressing tuberculosis. This study employed integrated bioinformatics approaches to evaluate the potential inhibitory effects of a selection of anthraquinones on RIFMO. The findings were subsequently compared with those of rifampicin (RIF), serving as a positive control inhibitor.

Methods: The AutoDock 4.0 tool assessed the binding free energy between 21 anthraquinones and the RIFMO catalytic cleft. The ligands were ranked based on the most favorable scores derived from ΔGbinding. The docking analyses for the highest-ranked anthraquinone and RIF underwent a cross-validation process. This validation procedure utilized the SwissDock server and the Schrödinger Maestro docking software. Molecular dynamics simulations were conducted to scrutinize the stability of the backbone atoms in free RIFMO, RIFMO-RIF, and RIFMO complexed with the top-ranked anthraquinone throughout a 100-ns computer simulation. The Discovery Studio Visualizer tool visualized interactions between RIFMO residues and ligands. An evaluation of the pharmacokinetics and toxicity profiles of the tested compounds was also conducted.

Results: Five anthraquinones had ΔGbinding scores below -10 kcal/mol. Hypericin emerged as the most potent RIFMO inhibitor, with a ΔGbinding score of -12.11 kcal/mol and an inhibition constant of 798.99 pM. The agreement across AutoDock 4.0, SwissDock, and Schrödinger Maestro results highlighted hypericin's notable binding affinity to the RIFMO catalytic cleft. The RIFMO-hypericin complex achieved stability after 70 ns of simulation, exhibiting a root-mean-square deviation of 0.55 nm. Oral bioavailability analysis revealed that all anthraquinones except hypericin, sennidin A, and sennidin B may be suitable for oral administration. Furthermore, the carcinogenicity prediction analysis indicated a favorable safety profile for all examined anthraquinones.

Conclusion: Inhibiting RIFMO, particularly with anthraquinones such as hypericin, holds promise as a potential therapeutic strategy for infectious diseases.
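
For readers unfamiliar with how a docking score relates to an inhibition constant, here is a minimal sketch of the standard thermodynamic relation Ki = exp(ΔG / (R·T)). The temperature of 298.15 K and the function name are assumptions, not details taken from the abstract, so the output is only an order-of-magnitude estimate and need not reproduce the reported 798.99 pM.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def inhibition_constant(delta_g_kcal: float, temp_k: float = 298.15) -> float:
    """Estimate Ki in mol/L from a binding free energy in kcal/mol.

    Uses Ki = exp(dG / (R * T)); temp_k is an assumed temperature.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))


# Binding free energy reported for hypericin in the abstract.
ki = inhibition_constant(-12.11)
print(f"Estimated Ki: {ki:.3e} M")
```

Because Ki depends exponentially on ΔG, a more negative binding free energy of even 1 kcal/mol lowers the estimated Ki several-fold.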

Genomics & Informatics 22(1), 13. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11375879/pdf/
Citations: 0
Shared alleles and genetic structures in different Thai domestic cat breeds: the possible influence of common racial origins.
Pub Date : 2024-07-31 DOI: 10.1186/s44342-024-00013-4
Wattanawan Jaito, Worapong Singchat, Chananya Patta, Chadaphon Thatukan, Nichakorn Kumnan, Piangjai Chalermwong, Trifan Budi, Thitipong Panthum, Wongsathit Wongloet, Pish Wattanadilokchatkun, Thanyapat Thong, Narongrit Muangmai, Kyudong Han, Prateep Duengkae, Rattanin Phatcharakullawarawat, Kornsorn Srikulnath

Over hundreds of years, cats have been domesticated and selectively bred, resulting in numerous pedigreed breeds, a process expedited by recent cat shows and breeding associations. Concerns have been raised about the limited breeding options and the genetic implications of inbreeding, indicating challenges in maintaining genetic diversity and accurate identification in purebred cats. In this study, genetic variability and structure were examined in 5 Thai domestic cat breeds using 15 microsatellite markers and mitochondrial DNA (mtDNA) D-loop sequencing. In total, 184 samples representing the Wichien Maat (WCM), Suphalak (SL), Khao-Manee (KM), Korat (KR), and Konja (KJ) breeds were analyzed. High genetic diversity (Ho and He > 0.5) was observed in all breeds, and mtDNA analysis revealed two primary haplogroups (A and B) that were shared among all domestic cat breeds in Thailand and globally. However, minor differences were observed between Thai domestic cat breeds based on clustering analyses, in which a distinct genetic structure was observed in the WCM breed. This suggests that allele fixation for distinctive morphological traits has occurred in Thai domestic cat breeds that emerged in isolated regions with shared racial origins. Analysis of relationships among individuals within the breed revealed high identification efficiency in Thai domestic cat breeds (P(ID)sibs < 10^-4). Additionally, effective individual identification can be ensured with an optimized panel of only nine loci. This comprehensive genetic characterization provides valuable insights into conservation strategies and breeding practices for Thai domestic cat breeds.
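
The diversity and identification statistics named in this abstract can be sketched as follows. This is an illustrative calculation, not the authors' pipeline: it computes observed heterozygosity (Ho), the simple (not sample-size-corrected) expected heterozygosity (He), and the sibling probability of identity using the commonly cited formula P(ID)sibs = 0.25 + 0.5·Σp² + 0.5·(Σp²)² − 0.25·Σp⁴, multiplied across loci. The genotypes and function names are hypothetical.

```python
from collections import Counter
from math import prod


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes such as [(120, 124), ...]."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Share of individuals carrying two different alleles (Ho)."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def pid_sibs(freqs):
    """Probability that two siblings share a genotype at one locus."""
    s2 = sum(p**2 for p in freqs.values())
    s4 = sum(p**4 for p in freqs.values())
    return 0.25 + 0.5 * s2 + 0.5 * s2**2 - 0.25 * s4


def combined_pid_sibs(per_locus_freqs):
    """Multi-locus P(ID)sibs, assuming independent loci."""
    return prod(pid_sibs(freqs) for freqs in per_locus_freqs)


# Hypothetical genotypes at one microsatellite locus (allele sizes in bp).
locus = [(120, 124), (120, 120), (124, 124), (120, 124)]
freqs = allele_frequencies(locus)
print(observed_heterozygosity(locus), expected_heterozygosity(freqs))
print(pid_sibs(freqs), combined_pid_sibs([freqs] * 15))
```

Multiplying per-locus values is why adding informative loci drives the combined P(ID)sibs down quickly, and why a well-chosen subset of markers can still give a very small value.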

Genomics & Informatics 22(1), 12. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11292921/pdf/
Citations: 0
Genetic diversity and natural selection analysis of VAR2CSA and vir genes: implication for vaccine development.
Pub Date : 2024-07-15 DOI: 10.1186/s44342-024-00009-0
Joseph Hawadak, Aditi Arya, Shewta Chaudhry, Vineeta Singh

Variable surface antigens (VSAs) encoded by var and vir genes in Plasmodium falciparum and Plasmodium vivax, respectively, are known to be involved in malaria pathogenesis and host immune escape through antigenic variation. Knowledge of the genetic diversity of these antigens is essential for malaria control and effective vaccine development. In this study, we analysed the genetic diversity and evolutionary patterns of two fragments (DBL2X and DBL3X) of the VAR2CSA gene and four vir genes (vir 4, vir 12, vir 21 and vir 27) from different endemic regions, including Southeast Asia and sub-Saharan Africa. High levels of segregating sites (S) and haplotype diversity (Hd) were observed in both var and vir genes. Among vir genes, vir 12 (S = 131, Hd = 0.996) and vir 21 (S = 171, Hd = 0.892) were found to be more diverse than vir 4 (S = 11, Hd = 0.748) and vir 27 (S = 23, Hd = 0.814). The DBL2X (S = 99, Hd = 0.996) and DBL3X (S = 307, Hd = 0.999) fragments showed higher genetic diversity. Our analysis indicates that var and vir genes are highly diverse and follow a similar evolutionary pattern globally. Some codons showed signatures of positive or negative selection pressure, but vir and var genes are likely to be under balancing selection. This study highlights the high variability of var and vir genes and underlines the need for functional experimental studies to determine the most relevant allelic forms for effective progress towards vaccine formulation and testing.
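
To make the two diversity statistics in this abstract concrete, the sketch below counts segregating sites (S) and computes haplotype diversity with Nei's estimator, Hd = n/(n−1) · (1 − Σxᵢ²), on a toy alignment. The sequences and function names are hypothetical, and the code assumes equal-length, gap-free sequences; it is not the software used in the study.

```python
from collections import Counter


def segregating_sites(alignment):
    """Number of alignment columns containing more than one distinct base."""
    return sum(len(set(column)) > 1 for column in zip(*alignment))


def haplotype_diversity(alignment):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(alignment)
    counts = Counter(alignment)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Toy alignment of four sequences forming three distinct haplotypes.
toy = ["ACGT", "ACGA", "ACGT", "TCGT"]
print("S  =", segregating_sites(toy))
print("Hd =", round(haplotype_diversity(toy), 4))
```

Hd computed this way always lies between 0 (all sequences identical) and 1 (every sequence a distinct haplotype).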

众所周知,恶性疟原虫和间日疟原虫的变异表面抗原(VSAs)分别由 var 和 vir 基因编码,它们通过抗原变异参与疟疾发病机制和宿主免疫逃逸。了解这些抗原的遗传多样性对于疟疾控制和有效疫苗开发至关重要。在这项研究中,我们分析了来自东南亚和撒哈拉以南非洲等不同疟疾流行地区的 VAR2CSA 基因的两个片段(DBL2X 和 DBL3X)和四个 vir 基因(vir 4、vir 12、vir 21 和 vir 27)的遗传多样性和进化模式。在var和vir基因中都观察到了高水平的分离位点(S)和单体型多样性(Hd)。在 vir 基因中,vir 12(S = 131,Hd = 0.996)和 vir 21(S = 171,Hd = 892)的多样性高于 vir 4(S = 11,Hd = 0.748)和 vir 27(S = 23,Hd = 0.814)。DBL2X(S = 99,Hd = 0.996)和 DBL3X(S = 307,Hd = 0.999)片段显示出更高的遗传多样性。我们的分析表明,var 和 vir 基因具有高度的多样性,并在全球范围内遵循相似的进化模式。一些密码子显示出正向或负向选择压力的特征,但 vir 和 var 基因很可能处于平衡选择下。这项研究凸显了var和vir基因的高度变异性,并强调有必要进行功能实验研究,以确定最相关的等位基因形式,从而有效推进疫苗的研发和测试。
{"title":"Genetic diversity and natural selection analysis of VAR2CSA and vir genes: implication for vaccine development.","authors":"Joseph Hawadak, Aditi Arya, Shewta Chaudhry, Vineeta Singh","doi":"10.1186/s44342-024-00009-0","DOIUrl":"10.1186/s44342-024-00009-0","url":null,"abstract":"<p><p>Variable surface antigens (VSAs) encoded by var and vir genes in Plasmodium falciparum and Plasmodium vivax, respectively, are known to be involved in malaria pathogenesis and host immune escape through antigenic variations. Knowledge of the genetic diversity of these antigens is essential for malaria control and effective vaccine development. In this study, we analysed the genetic diversity and evolutionary patterns of two fragments (DBL2X and DBL3X) of VAR2CSA gene and four vir genes (vir 4, vir 12, vir 21 and vir 27) from different endemic regions, including Southeast Asia and sub-Saharan Africa. High levels of segregating sites (S) and haplotype diversity (Hd) were observed in both var and vir genes. Among vir genes, vir 12 (S = 131, Hd = 0.996) and vir 21 (S = 171, Hd = 892) were found to be more diverse as compared to vir 4 (S = 11, Hd = 0.748) and vir 27 (S = 23, Hd = 0.814). DBL2X (S = 99, Hd = 0.996) and DBL3X (S = 307, Hd = 0.999) fragments showed higher genetic diversity. Our analysis indicates that var and vir genes are highly diverse and follow the similar evolutionary pattern globally. Some codons showed signatures of positive or negative selection pressure, but vir and var genes are likely to be under balancing selection. 
This study highlights the high variability of var and vir genes and underlines the need of functional experimental studies to determine the most relevant allelic forms for effective progress towards vaccine formulation and testing.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":"22 1","pages":"11"},"PeriodicalIF":0.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247734/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141622092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Identification of common genetic factors and immune-related pathways associating more than two autoimmune disorders: implications on risk, diagnosis, and treatment.
Pub Date : 2024-07-02 DOI: 10.1186/s44342-024-00004-5
Aruna Rajalingam, Anjali Ganjiwale

Autoimmune disorders (ADs) are chronic conditions caused by a failure or breakdown of immunological tolerance, which leads the host immune system to attack its own cells or tissues. Recent studies report shared effects, mechanisms, and evolutionary origins among ADs; however, the factors that may connect them are unknown. This study attempts to identify gene signatures shared between different autoimmune disorders and to elucidate the molecular pathways linking the pathogenesis of these ADs using an integrated gene expression approach. We performed differential gene expression analysis across 19 datasets of whole blood/peripheral blood cell samples covering five autoimmune disorders (rheumatoid arthritis, multiple sclerosis, systemic lupus erythematosus, Crohn's disease, and type 1 diabetes) and identified nine key genes (EGR1, RUNX3, SMAD7, NAMPT, S100A9, S100A8, CYBB, GATA2, and MCEMP1) that were primarily involved in cell and leukocyte activation, leukocyte-mediated immunity, IL-17, AGE-RAGE signaling in diabetic complications, prion disease, and NOD-like receptor signaling, confirming their role in immune-related pathways. Combined with biological interpretations such as gene ontology (GO), pathway enrichment, and protein-protein interaction (PPI) network analysis, our study sheds light on directions for in-depth research on the early detection, diagnosis, and prognosis of different ADs.

Citations: 0
Exploring molecular targets: herbal isolates in cervical cancer therapy.
Pub Date : 2024-06-26 DOI: 10.1186/s44342-024-00008-1
Maryam Ahmadi, Razieh Abdollahi, Marzieh Otogara, Amir Taherkhani

Objective: Cervical cancer (CxCa) is a significant global health challenge, ranking fourth in cancer-related mortality among women. While chemotherapy regimens have made incremental progress in extending overall survival, the outlook for patients with recurrent CxCa remains poor. Innovative therapeutic avenues are urgently needed, and molecular targeted therapy is a promising candidate. Previous investigations have shed light on the therapeutic effectiveness of five herbal compounds (epicatechin, curcumin, myricetin, jatrorrhizine, and arborinine) in CxCa.

Methods: A systems biology approach was employed to discern differentially expressed genes (DEGs) in CxCa tissues relative to healthy cervical epithelial tissues. A protein-protein interaction network (PPIN) was constructed, anchored in the genes related to CxCa. The central genes were discerned within the PPIN, and Kaplan-Meier survival curves explored their prognostic significance. An assessment of the binding affinity of the selected herbal compounds to the master regulator of prognostic markers in CxCa was conducted.

Results: A significant correlation between the overexpression of MYC, IL6, JUN, RRM2, and VEGFA and an adverse prognosis in CxCa was indicated. The regulation of these markers is notably influenced by the transcription factor CEBPD. Molecular docking analysis indicated that the binding affinity between myricetin and the CEBPD DNA binding site was robust.

Conclusion: The findings presented herein have unveiled pivotal genes and pathways that play a central role in the malignant transformation of CxCa. CEBPD has emerged as a potential target for harnessing the therapeutic potential of myricetin in this context.

Citations: 0
Investigation of missense mutation-related type 1 diabetes mellitus through integrating genomic databases and bioinformatic approach.
Pub Date : 2024-06-26 DOI: 10.1186/s44342-024-00005-4
Dyonisa Nasirochmi Pakha, Ratih Dewi Yudhani, Lalu Muhammad Irham

Although genes responsible for type 1 diabetes mellitus (T1DM) are already known, the missense mutations of these disease genes have yet to be uncovered. The present study integrates a genomic database and a bioinformatics-based approach to address this issue. Initially, nine variants associated with T1DM were retrieved from the GWAS catalogue. Different genomic algorithms such as PolyPhen2.0, SNPs and GTEx analyser programs were used to study the structural and functional effects of these mutations. Subsequently, SNPnexus was also employed to understand the effect of these mutations on the function of the expressed protein. Nine missense variants of T1DM were identified using the GWAS catalogue database. Among these nine SNPs, three were predicted to be related to the progression of T1DM by affecting the protein level. The TYK2 variant rs34536443 was predicted to be probably damaging. Meanwhile, the COL4A3 and IFIH1 variants rs55703767 and rs35667974, respectively, were predicted to be possibly damaging and might therefore alter protein function. Among the variants of the three genes, the TYK2 variant rs34536443 had the strongest predicted contribution to the development of T1DM, with a score of 0.999. We hope that these results will help in understanding the genetic basis of T1DM.

Citations: 0
Survey on large language model annotation of cellular senescence from figures in review articles.
Pub Date : 2024-06-17 DOI: 10.1186/s44342-024-00011-6
Yuki Yamagata, Ryota Yamada

This study evaluated large language models (LLMs), particularly GPT-4 with vision (GPT-4V) and GPT-4 Turbo, for annotating biomedical figures, focusing on cellular senescence. We assessed the ability of LLMs to categorize and annotate complex biomedical images, with a view to improving annotation accuracy and efficiency. Our experiments employed prompt engineering with figures from review articles, achieving more than 70% accuracy for label extraction and approximately 80% accuracy for node-type classification. Correctly annotating the relationship between directionality and inhibitory processes proved challenging, and the difficulty grew as the number of nodes increased. Using figure legends identified sources and targets more precisely than using captions, but the legends sometimes lacked pathway details. This study underscores the potential of LLMs in decoding biological mechanisms from text and outlines avenues for improving inhibitory relationship representations in biomedical informatics.

Citations: 0