
Genomics & informatics: latest articles

Dissecting non-B DNA structural motifs in untranslated regions of eukaryotic genomes.
Pub Date : 2024-11-27 DOI: 10.1186/s44342-024-00028-x
Aruna Sesha Chandrika Gummadi, Divya Kumari Muppa, Venakata Rajesh Yella

The untranslated regions (UTRs) of genes significantly impact various biological processes, including transcription, posttranscriptional control, mRNA stability, localization, and translation efficiency. In functional areas of genomes, non-B DNA structures such as cruciform, curved, triplex, G-quadruplex, and Z-DNA structures are common and affect cellular physiology. Although the role of these structures in cis-regulatory regions such as promoters is well established in eukaryotic genomes, their prevalence within UTRs across different eukaryotic classes has not been extensively documented. Our study investigated the prevalence of various non-B DNA motifs within the 5' and 3' UTRs of 360 species representing diverse eukaryotic lineages, including Arthropoda (Diptera, Hemiptera, and Hymenoptera), Chordata (Artiodactyla, Carnivora, Galliformes, Passeriformes, Primates, Rodentia, Squamata, Testudines), Magnoliophyta (Brassicales), Fabales (Poales), and Nematoda (Rhabditida), on the basis of datasets derived from the UTRdb. We observed that species belonging to taxonomic orders such as Rhabditida, Diptera, Brassicales, and Hemiptera show a prevalence of curved DNA motifs in their UTRs, whereas orders such as Testudines, Galliformes, and Rodentia show a preponderance of G-quadruplexes in both UTRs. The distribution of motifs is conserved across different taxonomic classes, although species-specific variations in motif preferences were also observed. These results document the prevalence of non-B DNA motifs in UTRs, point to their potential functional implications, and offer insights into the evolutionary and biological significance of these structures.
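To make the idea of motif scanning concrete, here is a minimal sketch of how putative G-quadruplex sites could be counted in a UTR sequence. It assumes the commonly used heuristic pattern of four runs of at least three guanines separated by loops of 1 to 7 nucleotides; the abstract does not state which motif definitions or tools the authors used, and the function names and the toy sequence are illustrative only.

```python
import re

# Heuristic G-quadruplex pattern (an assumption, not the study's definition):
# four G-runs of length >= 3, separated by loops of 1-7 nucleotides.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_g4_motifs(sequence: str) -> list[tuple[int, int, str]]:
    """Return (start, end, matched_text) for each non-overlapping G4 candidate."""
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


def g4_density_per_kb(sequence: str) -> float:
    """Number of G4 candidates per 1000 nucleotides of sequence."""
    if not sequence:
        return 0.0
    return 1000.0 * len(find_g4_motifs(sequence)) / len(sequence)


# Toy sequence (made up for illustration): a telomere-like G4 inside flanking bases.
toy_utr = "ATCGGGTTAGGGTTAGGGTTAGGGCATCAT"
hits = find_g4_motifs(toy_utr)
print(hits)
print(round(g4_density_per_kb(toy_utr), 2))
```

This scans one strand only; a fuller analysis would also scan the reverse complement (C-rich runs) and use separate detectors for curved, cruciform, triplex, and Z-DNA motifs.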

Genomics & informatics 22(1), 25. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11603647/pdf/
Citations: 0
Genomic characteristics of Vibrio vulnificus strains isolated from clinical and environmental sources.
Pub Date : 2024-11-27 DOI: 10.1186/s44342-024-00029-w
Jinkyeong Lee, Jeong-Ih Shin, Woo Young Cho, Kun Taek Park, Yeun-Jun Chung, Seung-Hyun Jung

Vibrio vulnificus, a gram-negative pathogenic bacterium transmitted via undercooked seafood or contaminated seawater, causes septicemia and wound infections. In this study, we analyzed 15 clinical and 11 environmental isolates. In total, 20 sequence types (STs), including eight novel STs, were identified. Antibiotic resistance gene analysis commonly detected the cyclic AMP receptor protein (CRP) in both the clinical and environmental isolates. Interestingly, clinical and environmental isolates were non-susceptible to third-generation cephalosporins, such as ceftazidime and cefotaxime, complicating the treatment of V. vulnificus infection. The multiple antibiotic resistance (MAR) index ranged from 0.1 to 0.5, with clinical isolates showing a higher mean MAR index than the environmental isolates, indicating their broader spectrum of resistance. Notably, no quantitative (124.3 vs. 126.5) or qualitative (adherence, antiphagocytosis, and chemotaxis/motility) differences in virulence factors were observed between the environmental and clinical strains. The molecular characteristics identified in this study provide insights into the virulence of V. vulnificus strains in South Korea, highlighting the need for continuous surveillance of antibiotic resistance in emerging V. vulnificus strains.
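The MAR index mentioned above is conventionally computed as the number of antibiotics an isolate is resistant to divided by the number of antibiotics tested. The sketch below assumes that standard definition; the isolate counts are invented placeholders, not data from the study, and the function names are illustrative.

```python
from statistics import mean


def mar_index(n_resistant: int, n_tested: int) -> float:
    """MAR index = antibiotics the isolate resists / antibiotics tested."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    if not 0 <= n_resistant <= n_tested:
        raise ValueError("n_resistant must lie between 0 and n_tested")
    return n_resistant / n_tested


# Hypothetical (resistant, tested) counts per isolate; NOT the study's data.
ISOLATES = {
    "clinical": [(3, 10), (5, 10), (2, 10)],
    "environmental": [(1, 10), (2, 10), (1, 10)],
}


def mean_mar_by_source(isolates: dict[str, list[tuple[int, int]]]) -> dict[str, float]:
    """Average MAR index for each isolate source."""
    return {
        source: mean(mar_index(r, t) for r, t in profiles)
        for source, profiles in isolates.items()
    }


group_means = mean_mar_by_source(ISOLATES)
print(group_means)
```

With ten antibiotics tested, the reported range of 0.1 to 0.5 would correspond to resistance against one to five of them; the actual panel size is not given in the abstract.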

Genomics & informatics 22(1), 26. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11603906/pdf/
Citations: 0
Neuromuscular diseases: genomics-driven advances.
Pub Date : 2024-11-26 DOI: 10.1186/s44342-024-00027-y
Anna Cho

Neuromuscular diseases (NMDs) are a group of rare disorders characterized by significant genetic and clinical complexity. Advances in genomics have revolutionized both the diagnosis and treatment of NMDs. While fewer than 30 NMDs had known genetic causes before the 1990s, more than 600 have now been identified, largely due to the adoption of next-generation sequencing (NGS) technologies such as whole-exome sequencing (WES) and whole-genome sequencing (WGS). These technologies have enabled more precise and earlier diagnoses, although the genetic complexity of many NMDs continues to pose challenges. Gene therapy has been a transformative breakthrough in the treatment of NMDs. In spinal muscular atrophy (SMA), therapies like nusinersen, onasemnogene abeparvovec, and risdiplam have dramatically improved patient outcomes. Similarly, Duchenne muscular dystrophy (DMD) has seen significant progress, most notably with the FDA approval of delandistrogene moxeparvovec, the first micro-dystrophin gene therapy. Despite these advancements, challenges remain, including the rarity of many NMDs, genetic heterogeneity, and the high costs associated with genomic technologies and therapies. Continued progress in gene therapy, RNA-based therapeutics, and personalized medicine holds promise for further breakthroughs in the management of these debilitating diseases.

Genomics & informatics 22(1), 24. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11600827/pdf/
Citations: 0
Examining HPO by organ and system to facilitate practical use by clinicians.
Pub Date : 2024-11-12 DOI: 10.1186/s44342-024-00024-1
Eisuke Dohi, Terue Takatsuki, Yuka Tateisi, Toyofumi Fujiwara, Yasunori Yamamoto

The Human Phenotype Ontology (HPO) is widely used for annotating clinical text data, and sufficient annotation is crucial for the effective utilization of clinical texts. It is known that LLMs can successfully extract symptoms and findings but cannot annotate them with the HPO. We hypothesized that one potential cause is the lack of appropriate terms in the HPO. Therefore, during the Biomedical Linked Annotation Hackathon 8 (BLAH8), we attempted the following two tasks in order to grasp the overall picture of the HPO: (1) extract all HPO terms for each of the 23 HPO subclasses (defined as categories) directly under the HPO term "Phenotypic abnormality" and (2) search for major attributes in each of the 23 categories. We employed LLMs for these two tasks and found that they did not work well, without additional ingenuity, on tasks that lacked sentences and context. A manual search for terms within each category revealed that the HPO contains a mix of terms with four major attributes: (1) Disease Name, (2) Condition, (3) Test Data, and (4) Symptoms and Findings. Manual curation showed that the ratio of symptoms and findings varied from 0 to 93.1% across categories. For clinicians, who are end users of medical terminologies including the HPO, ontologies are difficult to understand. However, a good-quality ontology is important for good-quality data, and clinicians' help is essential to building one. It is also important to make the overall picture and limitations of ontologies easy to understand in order to bring out the explanatory power of LLMs and artificial intelligence.
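Task (1) above amounts to collecting, for each direct child of a root term, every term reachable below it. A minimal sketch of that traversal follows, using a made-up toy ontology: the `TOY:` identifiers and the parent map are hypothetical stand-ins, not real HPO terms, and the abstract does not describe how the authors actually performed the extraction.

```python
from collections import defaultdict

# Toy child -> parents map (hypothetical identifiers, not real HPO terms).
# Like the HPO, the graph is a DAG: a term may have more than one parent.
ROOT = "TOY:root"
PARENTS = {
    "TOY:nervous": [ROOT],
    "TOY:cardio": [ROOT],
    "TOY:seizure": ["TOY:nervous"],
    "TOY:focal_seizure": ["TOY:seizure"],
    "TOY:arrhythmia": ["TOY:cardio"],
    "TOY:syncope": ["TOY:nervous", "TOY:cardio"],
}


def build_children(parents: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert the child -> parents map into parent -> children."""
    children: dict[str, set[str]] = defaultdict(set)
    for term, term_parents in parents.items():
        for parent in term_parents:
            children[parent].add(term)
    return children


def descendants(term: str, children: dict[str, set[str]]) -> set[str]:
    """All terms reachable below `term` (iterative depth-first search)."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def terms_by_category(root: str, parents: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each direct child of `root` (a category) to all of its descendants."""
    children = build_children(parents)
    return {cat: descendants(cat, children) for cat in sorted(children[root])}


categories = terms_by_category(ROOT, PARENTS)
for category, terms in categories.items():
    print(category, sorted(terms))
```

Because a term can sit under several parents, the same term may appear in more than one category, which is worth keeping in mind when computing per-category ratios.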

Genomics & informatics 22(1), 23. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11559069/pdf/
Citations: 0
Customizing GPT for natural language dialogue interface in database access.
Pub Date : 2024-11-01 DOI: 10.1186/s44342-024-00020-5
Jin-Dong Kim, Kousaku Okubo

The paper presents Anatomy3DExplorer, a customized ChatGPT designed as a natural language dialogue interface for exploring 3D models of anatomical structures. It illustrates the significant potential of large language models (LLMs) as user-friendly interfaces for database access. Furthermore, it showcases the seamless integration of LLMs and database APIs within the GPTs framework, offering a promising and straightforward approach.

Genomics & informatics 22(1), 22. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11531191/pdf/
Citations: 0
Towards automated phenotype definition extraction using large language models.
Pub Date : 2024-10-31 DOI: 10.1186/s44342-024-00023-2
Ramya Tekumalla, Juan M Banda

Electronic phenotyping involves a detailed analysis of both structured and unstructured data, employing rule-based methods, machine learning, natural language processing, and hybrid approaches. Currently, the development of accurate phenotype definitions demands extensive literature reviews and clinical experts, rendering the process time-consuming and inherently unscalable. Large language models offer a promising avenue for automating phenotype definition extraction but come with significant drawbacks, including reliability issues, the tendency to generate non-factual data ("hallucinations"), misleading results, and potential harm. To address these challenges, our study pursued two key objectives: (1) defining a standard evaluation set to ensure that large language model outputs are both useful and reliable and (2) evaluating various prompting approaches to extract phenotype definitions from large language models, assessing them with our established evaluation task. Our findings reveal promising results that still require human evaluation and validation for this task. However, enhanced phenotype extraction is possible, reducing the amount of time spent in literature review and evaluation.

Genomics & informatics 22(1), 21. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529293/pdf/
Citations: 0
Bioregulatory event extraction using large language models: a case study of rice literature.
Pub Date : 2024-10-31 DOI: 10.1186/s44342-024-00022-3
Xinzhi Yao, Zhihan He, Jingbo Xia

The extraction of biological regulation events has been a key focus in the field of biomedical natural language processing (BioNLP). However, existing methods often encounter challenges such as cascading errors in text mining pipelines and limitations in topic coverage from the selected corpus. Fortunately, the emergence of large language models (LLMs) presents a potential solution due to their robust semantic understanding and extensive knowledge base. To explore this potential, our project at the Biomedical Linked Annotation Hackathon 8 (BLAH 8) investigates the feasibility of using LLMs to extract biological regulation events. Our findings, based on the analysis of rice literature, demonstrate the promising performance of LLMs in this task, while also highlighting several concerns that must be addressed in future LLM-based applications in low-resource topics.

Genomics & informatics 22(1), 20. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529424/pdf/
Citations: 0
Fast and accurate short-read alignment with hybrid hash-tree data structure.
Pub Date : 2024-10-29 DOI: 10.1186/s44342-024-00012-5
Junichiro Makino, Toshikazu Ebisuzaki, Ryutaro Himeno, Yoshihide Hayashizaki

The rapidly increasing amount of short-read data generated by NGSs (new-generation sequencers) calls for the development of fast and accurate read alignment programs. Programs based on the hash table (BLAST) and the Burrows-Wheeler transform (bwa-mem) are in use, and the latter is known to give superior performance. Here we present a new algorithm, a hybrid of hash table and suffix tree, which we designed to speed up the alignment of short reads against large reference sequences such as the human genome. The total turnaround time for processing one human genome sample (read depth of 30) is just 31 min with our system, compared with more than 25 h with bwa-mem/gatk. The time for the aligner alone is 28 min for our system but around 2 h for bwa-mem. Our new algorithm is 4.4 times faster than bwa-mem while achieving similar accuracy. Variant calling and other downstream analyses after the alignment can be done with open-source tools such as SAMtools and Genome Analysis Toolkit (gatk) packages, as well as our own fast variant caller, which is well parallelized and much faster than gatk.
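As background for the hash-table half of such an aligner, the sketch below shows a generic seed-and-vote scheme: index every k-mer of the reference in a hash table, look up each k-mer of the read, and let the hits vote for a candidate start position. This is a textbook illustration under assumed names and a toy reference, not the authors' hybrid hash-tree algorithm, and it omits the suffix-tree component, indels, and reverse-complement reads.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Hash table mapping each k-mer to the reference positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def candidate_positions(read: str, index: dict[str, list[int]], k: int) -> Counter:
    """Each k-mer hit votes for the reference start implied by its read offset."""
    votes: Counter = Counter()
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], ()):
            start = ref_pos - offset
            if start >= 0:
                votes[start] += 1
    return votes


def best_hit(read: str, reference: str, k: int):
    """Return (start, mismatches) for the top-voted position, or None if no seed hits."""
    index = build_kmer_index(reference, k)
    votes = candidate_positions(read, index, k)
    if not votes:
        return None
    start, _ = votes.most_common(1)[0]
    window = reference[start:start + len(read)]
    # Count substitutions; read bases hanging past the reference end count as mismatches.
    mismatches = sum(a != b for a, b in zip(read, window)) + (len(read) - len(window))
    return start, mismatches


# Toy data (made up): the read is reference[10:22] with one substituted base.
reference = "ACGTTGCATGCCGATAGCTAGGCTTAACG"
read = "CCGATCGCTAGG"
print(best_hit(read, reference, k=4))
```

In practice the index would be built once and reused across millions of reads; rebuilding it inside `best_hit` keeps the sketch short but would be far too slow at genome scale.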

Genomics & informatics 22(1), 19. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11520436/pdf/
Citations: 0
Lightweight technology stacks for assistive linked annotations.
Pub Date : 2024-10-10 DOI: 10.1186/s44342-024-00021-4
Nishad Thalhath

This report presents the findings of a project from the 8th Biomedical Linked Annotation Hackathon (BLAH) that explored lightweight technology stacks to enhance assistive linked annotations. Using modern JavaScript frameworks and edge functions, the project implemented in-browser Named Entity Recognition (NER), serverless embedding and vector search within web interfaces, and efficient serverless full-text search. Through this experimental approach, a proof of concept was built to demonstrate the feasibility and performance of these technologies. The results show that lightweight stacks can significantly improve the efficiency and cost-effectiveness of annotation tools and provide a local-first, privacy-oriented, and secure alternative to traditional server-based solutions in various use cases. This work emphasizes the potential of developing annotation interfaces that are more responsive, scalable, and user-friendly, which would benefit bioinformatics researchers, practitioners, and software developers.

Citations: 0
Molecular diagnostic approach to rare neurological diseases from a clinician viewpoint.
Pub Date : 2024-10-10 DOI: 10.1186/s44342-024-00025-0
Jin Sook Lee

Advancements in sequencing technology have significantly enhanced diagnostic capabilities for rare neurological diseases. This progress in molecular diagnostics can greatly impact clinical management and facilitate the development of personalized treatments for patients with rare neurological diseases. Neurologists with expertise should raise clinical awareness, as phenotyping remains crucial for making a clinical diagnosis, even in the genomics era. They should prioritize different types of genomic tests, considering both the benefits and the limitations inherent to each test. Notably, long-read sequencing is being utilized in cases suspected to involve repeat expansion disorders or complex structural variants. Repeat expansion disorders are highly prevalent in neurological diseases, particularly within the ataxia group. Significant efforts, including periodic reanalysis, data sharing, or integration of genomics with multi-omics studies, should be directed toward cases that remain undiagnosed after standard next-generation sequencing.

Citations: 0