Genomics & informatics: latest articles, page 5

Customizing GPT for natural language dialogue interface in database access.

Genomics & informatics

Pub Date : 2024-11-01 DOI: 10.1186/s44342-024-00020-5

Jin-Dong Kim, Kousaku Okubo

The paper presents Anatomy3DExplorer, a customized ChatGPT designed as a natural language dialogue interface for exploring 3D models of anatomical structures. It illustrates the significant potential of large language models (LLMs) as user-friendly interfaces for database access. Furthermore, it showcases the seamless integration of LLMs and database APIs within the GPTs framework, offering a promising and straightforward approach.

Citations: 0

Towards automated phenotype definition extraction using large language models.

Genomics & informatics

Pub Date : 2024-10-31 DOI: 10.1186/s44342-024-00023-2

Ramya Tekumalla, Juan M Banda

Electronic phenotyping involves a detailed analysis of both structured and unstructured data, employing rule-based methods, machine learning, natural language processing, and hybrid approaches. Currently, the development of accurate phenotype definitions demands extensive literature reviews and clinical experts, rendering the process time-consuming and inherently unscalable. Large language models offer a promising avenue for automating phenotype definition extraction but come with significant drawbacks, including reliability issues, the tendency to generate non-factual data ("hallucinations"), misleading results, and potential harm. To address these challenges, our study embarked on two key objectives: (1) defining a standard evaluation set to ensure that large language model outputs are both useful and reliable and (2) evaluating various prompting approaches to extract phenotype definitions from large language models, assessing them with our established evaluation task. Our findings reveal promising results that still require human evaluation and validation for this task. However, enhanced phenotype extraction is possible, reducing the amount of time spent in literature review and evaluation.

Volume 22, issue 1, article 21. Full-text PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529293/pdf/

Citations: 0

Bioregulatory event extraction using large language models: a case study of rice literature.

Genomics & informatics

Pub Date : 2024-10-31 DOI: 10.1186/s44342-024-00022-3

Xinzhi Yao, Zhihan He, Jingbo Xia

The extraction of biological regulation events has been a key focus in the field of biomedical natural language processing (BioNLP). However, existing methods often encounter challenges such as cascading errors in text mining pipelines and limitations in topic coverage from the selected corpus. Fortunately, the emergence of large language models (LLMs) presents a potential solution due to their robust semantic understanding and extensive knowledge base. To explore this potential, our project at the Biomedical Linked Annotation Hackathon 8 (BLAH 8) investigates the feasibility of using LLMs to extract biological regulation events. Our findings, based on the analysis of rice literature, demonstrate the promising performance of LLMs in this task, while also highlighting several concerns that must be addressed in future LLM-based applications to low-resource topics.

Citations: 0

Fast and accurate short-read alignment with hybrid hash-tree data structure.

Genomics & informatics

Pub Date : 2024-10-29 DOI: 10.1186/s44342-024-00012-5

Junichiro Makino, Toshikazu Ebisuzaki, Ryutaro Himeno, Yoshihide Hayashizaki

The rapidly increasing amount of short-read data generated by NGSs (new-generation sequencers) calls for the development of fast and accurate read alignment programs. Programs based on the hash table (BLAST) and the Burrows-Wheeler transform (bwa-mem) are used, and the latter is known to give superior performance. We here present a new algorithm, a hybrid of hash table and suffix tree, which we designed to speed up the alignment of short reads against large reference sequences such as the human genome. The total turnaround time for processing one human genome sample (read depth of 30) is just 31 min with our system, whereas it was more than 25 h with bwa-mem/gatk. The time for the aligner alone is 28 min for our system but around 2 h for bwa-mem. Our new algorithm is 4.4 times faster than bwa-mem while achieving similar accuracy. Variant calling and other downstream analyses after the alignment can be done with open-source tools such as SAMtools and Genome Analysis Toolkit (gatk) packages, as well as our own fast variant caller, which is well parallelized and much faster than gatk.
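
To make the hash-table half of such an aligner concrete, here is a minimal seed-and-verify sketch in Python. It is not the authors' hybrid hash-tree algorithm: it only indexes k-mers of a reference in a dictionary, looks up seeds taken from the read, and checks each candidate placement by counting mismatches (no gaps, no suffix tree). The function names, the seed length `k`, and the `max_mismatches` threshold are all illustrative assumptions.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read: str, reference: str, index: dict, k: int,
               max_mismatches: int = 2):
    """Return (start, mismatches) of the best ungapped placement, or None.

    Every k-mer of the read is used as a seed; each seed hit proposes one
    candidate start position, which is verified by a full comparison.
    """
    best = None
    checked = set()  # candidate starts already verified
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for hit in index.get(seed, ()):
            start = hit - offset
            if start < 0 or start + len(read) > len(reference):
                continue  # read would hang over the reference ends
            if start in checked:
                continue
            checked.add(start)
            mismatches = hamming(read, reference[start:start + len(read)])
            if mismatches <= max_mismatches and (
                    best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best


# Toy data (hypothetical): a 32-base reference and a 12-base read copied
# from position 10 with one substituted base.
reference = "ACGTTGCATGCCGATAGCTAGGCTTACGATCG"
read = "CCGATAACTAGG"
index = build_kmer_index(reference, k=4)
print(align_read(read, reference, index, k=4))  # (10, 1)
```

The dictionary lookup is what makes seeding fast; the cost moves to verifying candidates, which is one reason production aligners combine hashing with other index structures.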

Volume 22, issue 1, article 19. Full-text PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11520436/pdf/

Citations: 0

Lightweight technology stacks for assistive linked annotations.

Genomics & informatics

Pub Date : 2024-10-10 DOI: 10.1186/s44342-024-00021-4

Nishad Thalhath

This report presents the findings of a project from the 8th Biomedical Linked Annotation Hackathon (BLAH) to explore lightweight technology stacks to enhance assistive linked annotations. Using modern JavaScript frameworks and edge functions, in-browser Named Entity Recognition (NER), serverless embedding and vector search within web interfaces, and efficient serverless full-text search were implemented. Through this experimental approach, a proof of concept demonstrating the feasibility and performance of these technologies was developed. The results show that lightweight stacks can significantly improve the efficiency and cost-effectiveness of annotation tools and provide a local-first, privacy-oriented, and secure alternative to traditional server-based solutions in various use cases. This work emphasizes the potential of developing annotation interfaces that are more responsive, scalable, and user-friendly, which would benefit bioinformatics researchers, practitioners, and software developers.
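
The vector search mentioned above can be illustrated with a brute-force cosine-similarity ranking. This is only a sketch of the general technique: the project itself used JavaScript, whereas Python is used here for brevity, and the three-dimensional "embeddings", the labels, and the function names are hypothetical stand-ins for vectors that a real embedding model would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=3):
    """Return the k (score, label) pairs most similar to the query."""
    scored = [(cosine_similarity(query, vec), label)
              for label, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hand-made toy vectors (hypothetical), not output of any real model.
corpus = {
    "BRCA1": [0.9, 0.1, 0.0],
    "TP53": [0.8, 0.2, 0.1],
    "aspirin": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
for score, label in top_k(query, corpus, k=2):
    print(f"{label}: {score:.3f}")
```

A linear scan like this is adequate for the small vocabularies that fit in a browser or an edge function; larger collections usually call for an approximate nearest-neighbour index.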

Citations: 0

Molecular diagnostic approach to rare neurological diseases from a clinician viewpoint.

Genomics & informatics

Pub Date : 2024-10-10 DOI: 10.1186/s44342-024-00025-0

Jin Sook Lee

Advancements in sequencing technology have significantly enhanced diagnostic capabilities for rare neurological diseases. This progress in molecular diagnostics can greatly impact clinical management and facilitate the development of personalized treatments for patients with rare neurological diseases. Neurologists with expertise should raise clinical awareness, as phenotyping remains crucial for making a clinical diagnosis, even in the genomics era. They should prioritize different types of genomic tests, considering both the benefits and the limitations inherent to each test. Notably, long-read sequencing is being utilized in cases suspected to involve repeat expansion disorders or complex structural variants. Repeat expansion disorders are highly prevalent in neurological diseases, particularly within the ataxia group. Significant efforts, including periodic reanalysis, data sharing, or integration of genomics with multi-omics studies, should be directed toward cases that remain undiagnosed after standard next-generation sequencing.

Citations: 0

Compression rates of microbial genomes are associated with genome size and base composition.

Genomics & informatics

Pub Date : 2024-10-10 DOI: 10.1186/s44342-024-00018-z

Jon Bohlin, John H-O Pettersson

Background: To what degree a string of symbols can be compressed reveals important details about its complexity. For instance, strings that are not compressible are random and carry a low information potential while the opposite is true for highly compressible strings. We explore to what extent microbial genomes are amenable to compression as they vary considerably both with respect to size and base composition. For instance, microbial genome sizes vary from less than 100,000 base pairs in symbionts to more than 10 million in soil-dwellers. Genomic base composition, often summarized as genomic AT or GC content due to the similar frequencies of adenine and thymine on one hand and cytosine and guanine on the other, also varies substantially; the most extreme microbes can have genomes with AT content below 25% or above 85%. Base composition determines the frequency of DNA words, consisting of multiple nucleotides or oligonucleotides, and may therefore also influence compressibility. Using 4,713 RefSeq genomes, we examined the association between compressibility, using both a DNA-based (MBGC) and a general-purpose (ZPAQ) compression algorithm, and genome size, AT content as well as genomic oligonucleotide usage variance (OUV) using generalized additive models.

Results: We find that genome size (p < 0.001) and OUV (p < 0.001) are both strongly associated with genome redundancy for both types of file compressor. The DNA-based MBGC compressor managed to improve compression by approximately 3% on average with respect to ZPAQ. Moreover, MBGC detected a significant (p < 0.001) compression ratio difference between AT-poor and AT-rich genomes which was not detected with ZPAQ.

Conclusion: As lack of compressibility is equivalent to randomness, our findings suggest that smaller and AT-rich genomes may have accumulated more random mutations on average than larger and AT-poor genomes which, in turn, were significantly more redundant. Moreover, we find that OUV is a strong proxy for genome compressibility in microbial genomes. The ZPAQ compressor was found to agree with the MBGC compressor, albeit with a poorer performance, except for the compressibility of AT-rich and AT-poor/GC-rich genomes.
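
The link between redundancy and compressibility described above can be tried out with Python's standard library. In this sketch, `zlib` is only a convenient stand-in for the MBGC and ZPAQ compressors used in the study, the sequences are synthetic rather than RefSeq genomes, and the helper names are assumptions made for illustration.

```python
import random
import zlib


def at_content(seq: str) -> float:
    """Fraction of bases that are A or T."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("A") + seq.count("T")) / len(seq)


def compression_ratio(seq: str, level: int = 9) -> float:
    """Compressed size divided by original size (smaller = more redundant)."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, level)) / len(raw)


def random_sequence(length: int, seed: int) -> str:
    """Uniformly random DNA string; seeded so the run is repeatable."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=length))


# A highly repetitive sequence versus a random one of the same length.
repetitive = "ACGTTGCA" * 2500
shuffled = random_sequence(len(repetitive), seed=1)

print(f"AT content (repetitive): {at_content(repetitive):.2f}")
print(f"ratio (repetitive):      {compression_ratio(repetitive):.4f}")
print(f"ratio (random):          {compression_ratio(shuffled):.4f}")
```

The repetitive string shrinks to a tiny fraction of its size, while the random string cannot go much below the two bits per base that four equally likely symbols require, which is the sense in which incompressibility and randomness coincide.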

Volume 22, issue 1, article 16. Full-text PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11468749/pdf/

Citations: 0

A prediction of mutations in infectious viruses using artificial intelligence.

Genomics & informatics

Pub Date : 2024-10-08 DOI: 10.1186/s44342-024-00019-y

Won Jong Choi, Jongkeun Park, Do Young Seong, Dae Sun Chung, Dongwan Hong

Many subtypes of SARS-CoV-2 have emerged since its early stages, with mutations showing regional and racial differences. These mutations significantly affected the infectivity and severity of the virus. This study aimed to predict the mutations that occur during the evolution of SARS-CoV-2 and identify the key characteristics for making these predictions. We collected and organized data on the lineage, date, clade, and mutations of SARS-CoV-2 from publicly available databases and processed them to predict the mutations. In addition, we utilized various artificial intelligence models to predict newly emerging mutations and created various training sets based on clade information. Using only mutation information resulted in low performance of the learning models, whereas incorporating clade differentiation resulted in high performance in machine learning models, including XGBoost (accuracy: 0.999). However, mutations fixed in the receptor-binding motif (RBM) region of Omicron resulted in decreased predictive performance. Using these models, we predicted potential mutation positions for 24C, following the recently emerged 24A and 24B clades. We identified a mutation at position Q493 in the RBM region. Our study developed effective artificial intelligence models and characteristics for predicting new mutations in continuously evolving infectious viruses.

Volume 22, issue 1, article 15. Full-text PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11463117/pdf/

Citations: 0

Review of the technology used for structural characterization of the GMO genome using NGS data.

Genomics & informatics

Pub Date : 2024-10-02 DOI: 10.1186/s44342-024-00016-1

Kahee Moon, Prakash Basnet, Taeyoung Um, Ik-Young Choi

The molecular characterization of genetically modified organisms (GMOs) is essential for ensuring safety and gaining regulatory approval for commercialization. According to CODEX standards, this characterization involves evaluating the presence of introduced genes, insertion sites, copy number, and nucleotide sequence structure. Advances in technology have led to the increased use of next-generation sequencing (NGS) over traditional methods such as Southern blotting. While both methods provide high reproducibility and accuracy, Southern blotting is labor-intensive and time-consuming due to the need for repetitive probe design and analyses for each target, resulting in low throughput. Conversely, NGS facilitates rapid and comprehensive analysis by mapping whole-genome sequencing (WGS) data to plasmid sequences, accurately identifying T-DNA insertion sites and flanking regions. This advantage allows for efficient detection of T-DNA presence, copy number, and unintended gene insertions without additional probe work. This paper reviews the current status of GMO genome characterization using NGS and proposes more efficient strategies for this purpose.

Volume 22, issue 1, article 14. Full-text PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11445869/pdf/

Citations: 0

Antibiotic resistance challenge: evaluating anthraquinones as rifampicin monooxygenase inhibitors through integrated bioinformatics analysis.

Genomics & informatics

Pub Date : 2024-09-04 DOI: 10.1186/s44342-024-00015-2

Mohammad Reza Arabestani, Masoumeh Saadat, Amir Taherkhani

Objective: Antibiotic resistance poses a pressing and crucial global public health challenge, leading to significant clinical and health-related consequences. Substantial evidence highlights the pivotal involvement of rifampicin monooxygenase (RIFMO) in the context of antibiotic resistance. Hence, inhibiting RIFMO could offer potential in the treatment of various infections. Anthraquinones, a group of organic compounds, have shown promise in addressing tuberculosis. This study employed integrated bioinformatics approaches to evaluate the potential inhibitory effects of a selection of anthraquinones on RIFMO. The findings were subsequently compared with those of rifampicin (RIF), serving as a positive control inhibitor.

Methods: The AutoDock 4.0 tool assessed the binding free energy between 21 anthraquinones and the RIFMO catalytic cleft. The ligands were ranked based on the most favorable scores derived from ΔG_binding. The docking analyses for the highest-ranked anthraquinone and RIF underwent a cross-validation process. This validation procedure utilized the SwissDock server and the Schrödinger Maestro docking software. Molecular dynamics simulations were conducted to scrutinize the stability of the backbone atoms in free RIFMO, RIFMO-RIF, and RIFMO complexed with the top-ranked anthraquinone throughout a 100-ns computer simulation. The Discovery Studio Visualizer tool visualized interactions between RIFMO residues and ligands. An evaluation of the pharmacokinetics and toxicity profiles of the tested compounds was also conducted.

Results: Five anthraquinones were indicated with ΔG_binding scores less than -10 kcal/mol. Hypericin emerged as the most potent RIFMO inhibitor, boasting a ΔG_binding score and inhibition constant value of -12.11 kcal/mol and 798.99 pM, respectively. The agreement across AutoDock 4.0, SwissDock, and Schrödinger Maestro results highlighted hypericin's notable binding affinity to the RIFMO catalytic cleft. The RIFMO-hypericin complex achieved stability after a 70-ns computer simulation, exhibiting a root-mean-square deviation of 0.55 nm. Oral bioavailability analysis revealed that all anthraquinones except hypericin, sennidin A, and sennidin B may be suitable for oral administration. Furthermore, the carcinogenicity prediction analysis indicated a favorable safety profile for all examined anthraquinones.

Conclusion: Inhibiting RIFMO, particularly with anthraquinones such as hypericin, holds promise as a potential therapeutic strategy for infectious diseases.
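
Binding free energies and inhibition constants, as reported above, are related through the standard thermodynamic expression Ki = exp(ΔG / RT). The sketch below applies that textbook relation to the -10 kcal/mol threshold mentioned in the results; the temperature of 298.15 K and the function names are assumptions, and docking programs may apply their own conventions, so the output should be read as an order-of-magnitude guide rather than a reproduction of the study's values.

```python
import math

# Gas constant in kcal / (mol * K).
GAS_CONSTANT_KCAL = 1.98720425864083e-3


def inhibition_constant(delta_g_kcal_per_mol: float,
                        temperature_k: float = 298.15) -> float:
    """Inhibition constant Ki in mol/L from Ki = exp(dG / (R * T))."""
    return math.exp(delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))


def binding_free_energy(ki_molar: float,
                        temperature_k: float = 298.15) -> float:
    """Inverse relation: dG in kcal/mol from dG = R * T * ln(Ki)."""
    if ki_molar <= 0:
        raise ValueError("Ki must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(ki_molar)


# A score of -10 kcal/mol corresponds to a Ki of roughly 47 nM at 298.15 K.
ki = inhibition_constant(-10.0)
print(f"Ki at -10 kcal/mol: {ki * 1e9:.1f} nM")
```

Because the relation is exponential, each additional -1.36 kcal/mol or so at room temperature lowers Ki by about a factor of ten, which is why small differences in docking score translate into large differences in predicted potency.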

Volume 22, issue 1, article 13. Full-text PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11375879/pdf/

Citations: 0