Lomics: Generation of Pathways and Gene Sets using Large Language Models for Transcriptomic Analysis

Chun-Ka Wong, Ali Choo, Eugene C. C. Cheng, Wing-Chun San, Kelvin Chak-Kong Cheng, Yee-Man Lau, Minqing Lin, Fei Li, Wei-Hao Liang, Song-Yan Liao, Kwong-Man Ng, Ivan Fan-Ngai Hung, Hung-Fat Tse, Jason Wing-Hon Wong
arXiv - QuanBio - Molecular Networks. Published 2024-07-12. DOI: arxiv-2407.09089 (https://doi.org/arxiv-2407.09089)

Abstract

Interrogation of biological pathways is an integral part of omics data analysis. Large language models (LLMs) enable the generation of custom pathways and gene sets tailored to specific scientific questions. These targeted sets are substantially smaller than traditional pathway enrichment analysis libraries, which reduces the multiple hypothesis testing burden and potentially enhances statistical power. Lomics (Large Language Models for Omics Studies) v1.0 is a Python-based bioinformatics toolkit that streamlines the generation of pathways and gene sets for transcriptomic analysis. It operates in three steps: 1) deriving relevant pathways based on the researcher's scientific question, 2) generating valid gene sets for each pathway, and 3) outputting the results as .GMX files. Lomics also provides explanations for pathway selections. Consistency and accuracy are ensured through iterative processes, JSON format validation, and HUGO Gene Nomenclature Committee (HGNC) gene symbol verification. Lomics serves as a foundation for integrating LLMs into omics research, potentially improving the specificity and efficiency of pathway analysis.
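The validation and output steps named in the abstract could be sketched roughly as follows. This is an illustrative sketch, not the Lomics implementation: the function names (`parse_gene_sets`, `filter_symbols`, `to_gmx`) are hypothetical, the LLM call is omitted and stood in for by a hard-coded reply string, and the `approved` set is a tiny placeholder for the full HGNC approved-symbol list. The GMX layout assumed here is the usual GSEA convention: tab-delimited, one column per gene set, with names in the first row and descriptions in the second.

```python
from __future__ import annotations

import json
from itertools import zip_longest


def parse_gene_sets(llm_reply: str) -> dict[str, list[str]]:
    """Check that a reply is a JSON object mapping pathway name -> list of gene symbols."""
    data = json.loads(llm_reply)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of pathway -> gene list")
    for name, genes in data.items():
        if not isinstance(genes, list) or not all(isinstance(g, str) for g in genes):
            raise ValueError(f"pathway {name!r} must map to a list of strings")
    return data


def filter_symbols(gene_sets: dict[str, list[str]], approved: set[str]) -> dict[str, list[str]]:
    """Keep only symbols found in the approved set; drop duplicates, preserve order."""
    cleaned: dict[str, list[str]] = {}
    for name, genes in gene_sets.items():
        kept: list[str] = []
        for gene in genes:
            symbol = gene.strip()
            if symbol in approved and symbol not in kept:
                kept.append(symbol)
        cleaned[name] = kept
    return cleaned


def to_gmx(gene_sets: dict[str, list[str]], description: str = "na") -> str:
    """Serialise gene sets column-wise: row 1 names, row 2 descriptions, then genes."""
    names = list(gene_sets)
    rows = ["\t".join(names), "\t".join(description for _ in names)]
    # Shorter gene sets are padded with empty cells so every row has one cell per set.
    for row in zip_longest(*(gene_sets[n] for n in names), fillvalue=""):
        rows.append("\t".join(row))
    return "\n".join(rows) + "\n"


# Hard-coded stand-in for an LLM reply (assumption: the model was asked for JSON).
reply = (
    '{"Glycolysis": ["HK1", "PFKM", "NOTAGENE"], '
    '"Hypoxia response": ["HIF1A", "VEGFA", "HIF1A", "EPAS1"]}'
)
approved = {"HK1", "PFKM", "HIF1A", "VEGFA", "EPAS1"}  # placeholder for the HGNC list

gene_sets = filter_symbols(parse_gene_sets(reply), approved)
gmx_text = to_gmx(gene_sets)
print(gmx_text)
```

In this sketch, a reply that fails JSON parsing raises an error, which a caller could catch in order to re-prompt the model; that retry loop is one plausible reading of the "iterative processes" mentioned above, not a documented detail.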