Chun-Ka Wong, Ali Choo, Eugene C. C. Cheng, Wing-Chun San, Kelvin Chak-Kong Cheng, Yee-Man Lau, Minqing Lin, Fei Li, Wei-Hao Liang, Song-Yan Liao, Kwong-Man Ng, Ivan Fan-Ngai Hung, Hung-Fat Tse, Jason Wing-Hon Wong
arXiv - QuanBio - Molecular Networks, 2024-07-12. DOI: arxiv-2407.09089 (https://doi.org/arxiv-2407.09089)
Lomics: Generation of Pathways and Gene Sets using Large Language Models for Transcriptomic Analysis
Interrogation of biological pathways is an integral part of omics data
analysis. Large language models (LLMs) enable the generation of custom pathways
and gene sets tailored to specific scientific questions. These targeted sets
are significantly smaller than traditional pathway enrichment analysis
libraries, reducing the multiple hypothesis testing burden and potentially enhancing
statistical power. Lomics (Large Language Models for Omics Studies) v1.0 is a
Python-based bioinformatics toolkit that streamlines the generation of pathways
and gene sets for transcriptomic analysis. It operates in three steps: 1)
deriving relevant pathways based on the researcher's scientific question, 2)
generating valid gene sets for each pathway, and 3) outputting the results as
.GMX files. Lomics also provides explanations for pathway selections.
Consistency and accuracy are ensured through iterative processes, JSON format
validation, and HUGO Gene Nomenclature Committee (HGNC) gene symbol
verification. Lomics serves as a foundation for integrating LLMs into omics
research, potentially improving the specificity and efficiency of pathway
analysis.
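
The validation and output steps described above can be illustrated with a minimal Python sketch. It is not the Lomics implementation: the function names `parse_gene_sets` and `to_gmx`, the shape of the LLM reply (a JSON object mapping pathway names to gene lists), and the tiny hard-coded set of approved symbols are all assumptions made for illustration. In practice the approved set would presumably be loaded from an HGNC symbol export. The output follows the tab-delimited GMX layout used by GSEA, with one column per gene set: names in the first row, descriptions in the second, and genes below.

```python
import json
from itertools import zip_longest


def parse_gene_sets(raw_json, approved_symbols):
    """Validate a JSON reply and keep only approved, de-duplicated gene symbols.

    Hypothetical helper: assumes the reply looks like {"pathway": ["GENE", ...]}.
    """
    data = json.loads(raw_json)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping pathways to gene lists")
    gene_sets = {}
    for pathway, genes in data.items():
        if not isinstance(genes, list):
            raise ValueError(f"gene list for {pathway!r} is not a JSON array")
        kept, seen = [], set()
        for gene in genes:
            symbol = str(gene).strip()
            # Drop symbols that are not approved and repeats of earlier ones.
            if symbol in approved_symbols and symbol not in seen:
                seen.add(symbol)
                kept.append(symbol)
        gene_sets[pathway] = kept
    return gene_sets


def to_gmx(gene_sets, descriptions=None):
    """Render gene sets as GMX text (column-oriented, tab-delimited)."""
    descriptions = descriptions or {}
    names = list(gene_sets)
    lines = [
        "\t".join(names),
        "\t".join(descriptions.get(name, "na") for name in names),
    ]
    # Shorter gene sets are padded with empty cells so columns stay aligned.
    for row in zip_longest(*(gene_sets[name] for name in names), fillvalue=""):
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


# Placeholder data: a stand-in for an LLM reply and for the HGNC-approved symbols.
example_reply = (
    '{"Glycolysis": ["HK1", "PFKM", "NOTAGENE", "HK1"],'
    ' "Hypoxia response": ["HIF1A", "VEGFA", "EPAS1"]}'
)
example_approved = {"HK1", "PFKM", "HIF1A", "VEGFA", "EPAS1"}

example_sets = parse_gene_sets(example_reply, example_approved)
example_gmx = to_gmx(example_sets)
print(example_gmx)
```

Here the unapproved symbol `NOTAGENE` and the repeated `HK1` are dropped before writing. A reply that fails JSON parsing raises an exception, which a caller could use as the signal to query the model again, in the spirit of the iterative process the abstract mentions.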