Pub Date: 2024-10-31 DOI: 10.1186/s44342-024-00023-2
Ramya Tekumalla, Juan M Banda
Electronic phenotyping involves a detailed analysis of both structured and unstructured data, employing rule-based methods, machine learning, natural language processing, and hybrid approaches. Currently, developing accurate phenotype definitions demands extensive literature reviews and input from clinical experts, which makes the process time-consuming and inherently unscalable. Large language models offer a promising avenue for automating phenotype definition extraction but come with significant drawbacks, including reliability issues, the tendency to generate non-factual data ("hallucinations"), misleading results, and potential harm. To address these challenges, our study pursued two key objectives: (1) defining a standard evaluation set to ensure that large language model outputs are both useful and reliable and (2) evaluating various prompting approaches for extracting phenotype definitions from large language models, assessing them with our established evaluation task. Our findings reveal promising results that still require human evaluation and validation for this task. However, enhanced phenotype extraction is possible, reducing the amount of time spent on literature review and evaluation.
Title: Towards automated phenotype definition extraction using large language models. Genomics & informatics 22(1):21. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529293/pdf/
Pub Date: 2024-10-31 DOI: 10.1186/s44342-024-00022-3
Xinzhi Yao, Zhihan He, Jingbo Xia
The extraction of biological regulation events has been a key focus in the field of biomedical natural language processing (BioNLP). However, existing methods often encounter challenges such as cascading errors in text mining pipelines and limited topic coverage in the selected corpus. The emergence of large language models (LLMs) presents a potential solution owing to their robust semantic understanding and extensive knowledge base. To explore this potential, our project at the Biomedical Linked Annotation Hackathon 8 (BLAH 8) investigated the feasibility of using LLMs to extract biological regulation events. Our findings, based on an analysis of rice literature, demonstrate the promising performance of LLMs in this task, while also highlighting several concerns that must be addressed in future LLM-based applications to low-resource topics.
Title: Bioregulatory event extraction using large language models: a case study of rice literature. Genomics & informatics 22(1):20. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529424/pdf/
The rapidly increasing amount of short-read data generated by new-generation sequencers (NGSs) calls for the development of fast and accurate read alignment programs. Programs based on hash tables (BLAST) and on the Burrows-Wheeler transform (bwa-mem) are in use, and the latter is known to give superior performance. We present a new algorithm, a hybrid of a hash table and a suffix tree, designed to speed up the alignment of short reads against large reference sequences such as the human genome. The total turnaround time for processing one human genome sample (read depth of 30) is just 31 min with our system, compared with more than 25 h with bwa-mem/gatk. The aligner alone takes 28 min with our system but around 2 h with bwa-mem. Our new algorithm is 4.4 times faster than bwa-mem while achieving similar accuracy. Variant calling and other downstream analyses after the alignment can be done with open-source tools such as SAMtools and the Genome Analysis Toolkit (gatk), as well as with our own fast variant caller, which is well parallelized and much faster than gatk.
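The hash-table half of such an aligner can be illustrated with a generic seed-and-verify lookup. This is a minimal sketch and not the authors' hybrid algorithm: the suffix-tree component is omitted, and the function names, the seed length `k`, the mismatch threshold, and the toy sequences are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def align_read(read: str, reference: str, index: dict[str, list[int]],
               k: int, max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Return (position, mismatches) for each accepted placement of the read.

    The first k bases of the read act as the seed; every seed hit is
    verified by comparing the full read against the reference window.
    """
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


# Hypothetical toy data: the read differs from the reference at one base.
reference = "ACGTACGTTAGCCGATAGGCTTACGATCG"
index = build_kmer_index(reference, k=5)
print(align_read("TAGCCGATTGG", reference, index, k=5))  # [(8, 1)]
```

A hash lookup finds seed positions in constant time per seed, which is why hash-based designs are attractive for large references; the cost moves to index construction and memory.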
Title: Fast and accurate short-read alignment with hybrid hash-tree data structure. Authors: Junichiro Makino, Toshikazu Ebisuzaki, Ryutaro Himeno, Yoshihide Hayashizaki. Pub Date: 2024-10-29 DOI: 10.1186/s44342-024-00012-5. Genomics & informatics 22(1):19. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11520436/pdf/
Pub Date: 2024-10-10 DOI: 10.1186/s44342-024-00021-4
Nishad Thalhath
This report presents the findings of a project from the 8th Biomedical Linked Annotation Hackathon (BLAH) that explored lightweight technology stacks for enhancing assistive linked annotations. Using modern JavaScript frameworks and edge functions, the project implemented in-browser Named Entity Recognition (NER), serverless embedding and vector search within web interfaces, and efficient serverless full-text search. Through this experimental approach, a proof of concept was built to demonstrate the feasibility and performance of these technologies. The results show that lightweight stacks can significantly improve the efficiency and cost-effectiveness of annotation tools and provide a local-first, privacy-oriented, and secure alternative to traditional server-based solutions in various use cases. This work emphasizes the potential of developing annotation interfaces that are more responsive, scalable, and user-friendly, which would benefit bioinformatics researchers, practitioners, and software developers.
Title: Lightweight technology stacks for assistive linked annotations. Genomics & informatics 22(1):17. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11468380/pdf/
Pub Date: 2024-10-10 DOI: 10.1186/s44342-024-00025-0
Jin Sook Lee
Advancements in sequencing technology have significantly enhanced diagnostic capabilities for rare neurological diseases. This progress in molecular diagnostics can greatly impact clinical management and facilitate the development of personalized treatments for patients with rare neurological diseases. Neurologists with expertise should raise clinical awareness, as phenotyping remains crucial for making a clinical diagnosis, even in the genomics era. They should prioritize different types of genomic tests, considering both the benefits and the limitations inherent to each test. Notably, long-read sequencing is being utilized in cases suspected to involve repeat expansion disorders or complex structural variants. Repeat expansion disorders are highly prevalent in neurological diseases, particularly within the ataxia group. Significant efforts, including periodic reanalysis, data sharing, or integration of genomics with multi-omics studies, should be directed toward cases that remain undiagnosed after standard next-generation sequencing.
Title: Molecular diagnostic approach to rare neurological diseases from a clinician viewpoint. Genomics & informatics 22(1):18. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11468364/pdf/
Pub Date: 2024-10-10 DOI: 10.1186/s44342-024-00018-z
Jon Bohlin, John H-O Pettersson
Background: The degree to which a string of symbols can be compressed reveals important details about its complexity. For instance, strings that are not compressible are random and carry a low information potential, while the opposite is true for highly compressible strings. We explore to what extent microbial genomes are amenable to compression, as they vary considerably with respect to both size and base composition. For instance, microbial genome sizes vary from less than 100,000 base pairs in symbionts to more than 10 million in soil-dwellers. Genomic base composition, often summarized as genomic AT or GC content due to the similar frequencies of adenine and thymine on one hand and cytosine and guanine on the other, also varies substantially; the most extreme microbes can have genomes with AT content below 25% or above 85%. Base composition determines the frequency of DNA words, consisting of multiple nucleotides or oligonucleotides, and may therefore also influence compressibility. Using 4,713 RefSeq genomes and generalized additive models, we examined the association between compressibility, measured with both a DNA-based (MBGC) and a general-purpose (ZPAQ) compression algorithm, and genome size, AT content, and genomic oligonucleotide usage variance (OUV).
Results: We find that genome size (p < 0.001) and OUV (p < 0.001) are both strongly associated with genome redundancy for both types of file compressor. The DNA-based MBGC compressor improved compression by approximately 3% on average with respect to ZPAQ. Moreover, MBGC detected a significant (p < 0.001) compression ratio difference between AT-poor and AT-rich genomes, which was not detected with ZPAQ.
Conclusion: As lack of compressibility is equivalent to randomness, our findings suggest that smaller and AT-rich genomes may have accumulated more random mutations on average than larger and AT-poor genomes, which, in turn, were significantly more redundant. Moreover, we find that OUV is a strong proxy for genome compressibility in microbial genomes. The ZPAQ compressor was found to agree with the MBGC compressor, albeit with poorer performance, except for the compressibility of AT-rich and AT-poor/GC-rich genomes.
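The two quantities compared in this abstract, AT content and compressibility, can be computed in a few lines. The sketch below is only illustrative: it uses Python's built-in `zlib` as a stand-in for the MBGC and ZPAQ compressors used in the study, and the sequences are synthetic, so the numbers it produces are not comparable to the reported results.

```python
import random
import zlib


def at_content(seq: str) -> float:
    """Fraction of bases that are adenine or thymine."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def compression_ratio(seq: str) -> float:
    """Compressed size divided by original size (lower means more redundant)."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, level=9)) / len(raw)


# Synthetic examples (assumptions, not real genomes): one random sequence
# and one built from a short repeated motif.
random.seed(0)
random_seq = "".join(random.choice("ACGT") for _ in range(10_000))
repetitive_seq = "ACGTTGCA" * 1_250

for name, seq in [("random", random_seq), ("repetitive", repetitive_seq)]:
    print(name, round(at_content(seq), 3), round(compression_ratio(seq), 3))
```

The repetitive sequence compresses far better than the random one, which mirrors the abstract's premise that low compressibility corresponds to randomness.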
Title: Compression rates of microbial genomes are associated with genome size and base composition. Genomics & informatics 22(1):16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11468749/pdf/
Pub Date: 2024-10-08 DOI: 10.1186/s44342-024-00019-y
Won Jong Choi, Jongkeun Park, Do Young Seong, Dae Sun Chung, Dongwan Hong
Many subtypes of SARS-CoV-2 have emerged since its early stages, with mutations showing regional and racial differences. These mutations significantly affected the infectivity and severity of the virus. This study aimed to predict the mutations that occur during the evolution of SARS-CoV-2 and identify the key characteristics for making these predictions. We collected and organized data on the lineage, date, clade, and mutations of SARS-CoV-2 from publicly available databases and processed them to predict the mutations. In addition, we utilized various artificial intelligence models to predict newly emerging mutations and created various training sets based on clade information. Using only mutation information resulted in low performance of the learning models, whereas incorporating clade differentiation resulted in high performance in machine learning models, including XGBoost (accuracy: 0.999). However, mutations fixed in the receptor-binding motif (RBM) region of Omicron resulted in decreased predictive performance. Using these models, we predicted potential mutation positions for 24C, following the recently emerged 24A and 24B clades. We identified a mutation at position Q493 in the RBM region. Our study developed effective artificial intelligence models and characteristics for predicting new mutations in continuously evolving infectious viruses.
Title: A prediction of mutations in infectious viruses using artificial intelligence. Genomics & informatics 22(1):15. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11463117/pdf/
The molecular characterization of genetically modified organisms (GMOs) is essential for ensuring safety and gaining regulatory approval for commercialization. According to CODEX standards, this characterization involves evaluating the presence of introduced genes, insertion sites, copy number, and nucleotide sequence structure. Advances in technology have led to the increased use of next-generation sequencing (NGS) over traditional methods such as Southern blotting. While both methods provide high reproducibility and accuracy, Southern blotting is labor-intensive and time-consuming due to the need for repetitive probe design and analyses for each target, resulting in low throughput. Conversely, NGS facilitates rapid and comprehensive analysis by mapping whole-genome sequencing (WGS) data to plasmid sequences, accurately identifying T-DNA insertion sites and flanking regions. This advantage allows for efficient detection of T-DNA presence, copy number, and unintended gene insertions without additional probe work. This paper reviews the current status of GMO genome characterization using NGS and proposes more efficient strategies for this purpose.
Title: Review of the technology used for structural characterization of the GMO genome using NGS data. Authors: Kahee Moon, Prakash Basnet, Taeyoung Um, Ik-Young Choi. Pub Date: 2024-10-02 DOI: 10.1186/s44342-024-00016-1. Genomics & informatics 22(1):14. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11445869/pdf/
Pub Date: 2024-09-04 DOI: 10.1186/s44342-024-00015-2
Mohammad Reza Arabestani, Masoumeh Saadat, Amir Taherkhani
Objective: Antibiotic resistance poses a pressing and crucial global public health challenge, leading to significant clinical and health-related consequences. Substantial evidence highlights the pivotal involvement of rifampicin monooxygenase (RIFMO) in the context of antibiotic resistance. Hence, inhibiting RIFMO could offer potential in the treatment of various infections. Anthraquinones, a group of organic compounds, have shown promise in addressing tuberculosis. This study employed integrated bioinformatics approaches to evaluate the potential inhibitory effects of a selection of anthraquinones on RIFMO. The findings were subsequently compared with those of rifampicin (RIF), serving as a positive control inhibitor.
Methods: The AutoDock 4.0 tool assessed the binding free energy between 21 anthraquinones and the RIFMO catalytic cleft. The ligands were ranked based on the most favorable scores derived from ΔG_binding. The docking analyses for the highest-ranked anthraquinone and RIF underwent a cross-validation process. This validation procedure utilized the SwissDock server and the Schrödinger Maestro docking software. Molecular dynamics simulations were conducted to scrutinize the stability of the backbone atoms in free RIFMO, RIFMO-RIF, and RIFMO complexed with the top-ranked anthraquinone throughout a 100-ns computer simulation. The Discovery Studio Visualizer tool visualized interactions between RIFMO residues and ligands. An evaluation of the pharmacokinetics and toxicity profiles of the tested compounds was also conducted.
Results: Five anthraquinones were indicated with ΔG_binding scores less than -10 kcal/mol. Hypericin emerged as the most potent RIFMO inhibitor, with a ΔG_binding score of -12.11 kcal/mol and an inhibition constant of 798.99 pM. The agreement across AutoDock 4.0, SwissDock, and Schrödinger Maestro results highlighted hypericin's notable binding affinity to the RIFMO catalytic cleft. The RIFMO-hypericin complex achieved stability after a 70-ns computer simulation, exhibiting a root-mean-square deviation of 0.55 nm. Oral bioavailability analysis revealed that all anthraquinones except hypericin, sennidin A, and sennidin B may be suitable for oral administration. Furthermore, the carcinogenicity prediction analysis indicated a favorable safety profile for all examined anthraquinones.
Conclusion: Inhibiting RIFMO, particularly with anthraquinones such as hypericin, holds promise as a potential therapeutic strategy for infectious diseases.
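Binding free energies and inhibition constants are commonly linked through the thermodynamic relation ΔG = RT ln(Ki). The sketch below shows that conversion under two assumptions the abstract does not state: that this relation was the one used, and that the temperature is 298.15 K. The function names are hypothetical, and values from this sketch need not reproduce the figures reported above.

```python
import math

# Gas constant in kcal/(mol*K), matching the kcal/mol units of the scores.
GAS_CONSTANT_KCAL = 1.98720425864083e-3


def inhibition_constant(delta_g: float, temperature_k: float = 298.15) -> float:
    """Ki in mol/L from a binding free energy in kcal/mol: Ki = exp(dG / RT)."""
    return math.exp(delta_g / (GAS_CONSTANT_KCAL * temperature_k))


def binding_free_energy(ki_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy in kcal/mol from Ki in mol/L: dG = RT ln(Ki)."""
    return GAS_CONSTANT_KCAL * temperature_k * math.log(ki_molar)


# A more negative free energy corresponds to a smaller inhibition constant,
# that is, tighter binding.
for dg in (-8.0, -10.0, -12.0):
    print(f"dG = {dg:6.2f} kcal/mol -> Ki = {inhibition_constant(dg):.3e} M")
```

Because the relation is exponential, each additional 1.36 kcal/mol or so of binding energy at room temperature lowers Ki by roughly a factor of ten, which is why small score differences translate into large differences in predicted potency.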
Title: Antibiotic resistance challenge: evaluating anthraquinones as rifampicin monooxygenase inhibitors through integrated bioinformatics analysis. Genomics & informatics 22(1):13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11375879/pdf/
Over hundreds of years, cats have been domesticated and selectively bred, resulting in numerous pedigreed breeds, a process accelerated by recent cat shows and breeding associations. Concerns have been raised about the limited breeding options and the genetic implications of inbreeding, indicating challenges in maintaining genetic diversity and accurate identification in purebred cats. In this study, genetic variability and structure were examined in five Thai domestic cat breeds using 15 microsatellite markers and mitochondrial DNA (mtDNA) D-loop sequencing. In total, 184 samples representing the Wichien Maat (WCM), Suphalak (SL), Khao-Manee (KM), Korat (KR), and Konja (KJ) breeds were analyzed. High genetic diversity (Ho and He > 0.5) was observed in all breeds, and mtDNA analysis revealed two primary haplogroups (A and B) that were shared among all domestic cat breeds in Thailand and globally. However, clustering analyses showed only minor differences between Thai domestic cat breeds, with a distinct genetic structure observed in the WCM breed. This suggests that allele fixation for distinctive morphological traits has occurred in Thai domestic cat breeds that emerged in isolated regions with shared racial origins. Analysis of relationships among individuals within each breed revealed high identification efficiency in Thai domestic cat breeds (P(ID)sibs < 10⁻⁴). Additionally, effective individual identification can be achieved with only nine loci by optimizing marker efficiency. This comprehensive genetic characterization provides valuable insights into conservation strategies and breeding practices for Thai domestic cat breeds.
Shared alleles and genetic structures in different Thai domestic cat breeds: the possible influence of common racial origins.
Wattanawan Jaito, Worapong Singchat, Chananya Patta, Chadaphon Thatukan, Nichakorn Kumnan, Piangjai Chalermwong, Trifan Budi, Thitipong Panthum, Wongsathit Wongloet, Pish Wattanadilokchatkun, Thanyapat Thong, Narongrit Muangmai, Kyudong Han, Prateep Duengkae, Rattanin Phatcharakullawarawat, Kornsorn Srikulnath
Genomics & informatics 22(1):12. Pub Date: 2024-07-31. DOI: 10.1186/s44342-024-00013-4. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11292921/pdf/