Lomics: Generation of Pathways and Gene Sets using Large Language Models for Transcriptomic Analysis
Chun-Ka Wong, Ali Choo, Eugene C. C. Cheng, Wing-Chun San, Kelvin Chak-Kong Cheng, Yee-Man Lau, Minqing Lin, Fei Li, Wei-Hao Liang, Song-Yan Liao, Kwong-Man Ng, Ivan Fan-Ngai Hung, Hung-Fat Tse, Jason Wing-Hon Wong
arXiv - QuanBio - Molecular Networks, published 2024-07-12. DOI: arxiv-2407.09089
Abstract
Interrogation of biological pathways is an integral part of omics data
analysis. Large language models (LLMs) enable the generation of custom pathways
and gene sets tailored to specific scientific questions. These targeted sets
are significantly smaller than traditional pathway enrichment analysis
libraries, reducing multiple hypothesis testing and potentially enhancing
statistical power. Lomics (Large Language Models for Omics Studies) v1.0 is a
Python-based bioinformatics toolkit that streamlines the generation of pathways
and gene sets for transcriptomic analysis. It operates in three steps: 1)
deriving relevant pathways based on the researcher's scientific question, 2)
generating valid gene sets for each pathway, and 3) outputting the results as
.GMX files. Lomics also provides explanations for pathway selections.
Consistency and accuracy are ensured through iterative processes, JSON format
validation, and HUGO Gene Nomenclature Committee (HGNC) gene symbol
verification. Lomics serves as a foundation for integrating LLMs into omics
research, potentially improving the specificity and efficiency of pathway
analysis.
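
The validation and output steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the Lomics API: the function names (parse_gene_sets, keep_approved_symbols, to_gmx) are hypothetical, the LLM reply is a hard-coded string, and the approved-symbol set is a tiny stand-in for the full HGNC symbol list. The GMX layout assumed here is the tab-delimited GSEA convention of one gene set per column, with the set name in row 1, a description in row 2, and genes in the rows below.

```python
import json
from itertools import zip_longest


def parse_gene_sets(raw_json):
    """Check that an LLM reply is a JSON object of pathway name -> list of strings."""
    data = json.loads(raw_json)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for name, genes in data.items():
        if not isinstance(genes, list) or not all(isinstance(g, str) for g in genes):
            raise ValueError(f"pathway {name!r} must map to a list of strings")
    return data


def keep_approved_symbols(gene_sets, approved):
    """Drop symbols that are not in the approved set; remove duplicates, keep order."""
    cleaned = {}
    for name, genes in gene_sets.items():
        kept = []
        for gene in genes:
            symbol = gene.strip()
            if symbol in approved and symbol not in kept:
                kept.append(symbol)
        cleaned[name] = kept
    return cleaned


def to_gmx(gene_sets, description="na"):
    """Serialise gene sets column-wise: names, descriptions, then genes."""
    names = list(gene_sets)
    lines = ["\t".join(names), "\t".join([description] * len(names))]
    # Shorter gene sets are padded with empty cells so every row has one cell per set.
    for row in zip_longest(*(gene_sets[n] for n in names), fillvalue=""):
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


# Hypothetical LLM reply; "NOTAGENE" stands in for a hallucinated symbol.
raw_reply = (
    '{"Cardiac contraction": ["MYH7", "TNNT2", "NOTAGENE"],'
    ' "Calcium handling": ["RYR2", "ATP2A2"]}'
)
# Assumption: in practice this set would be loaded from the HGNC symbol list.
approved_symbols = {"MYH7", "TNNT2", "RYR2", "ATP2A2"}

gene_sets = keep_approved_symbols(parse_gene_sets(raw_reply), approved_symbols)
gmx_text = to_gmx(gene_sets)
print(gmx_text)
```

Rejecting malformed replies before symbol filtering lets a caller re-prompt the model instead of writing a partial file, which is one plausible reading of the "iterative processes" the abstract mentions.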