SeqMate: A Novel Large Language Model Pipeline for Automating RNA Sequencing
Devam Mondal, Atharva Inamdar
arXiv - QuanBio - Genomics, 2024-07-02
DOI: arxiv-2407.03381
Citation count: 0
Abstract
RNA sequencing techniques, such as bulk RNA-seq and single-cell (sc) RNA-seq,
are critical tools for the biologist looking to analyze the genetic
activity/transcriptome of a tissue or cell during an experimental procedure.
Platforms like Illumina's next-generation sequencing (NGS) are used to produce
the raw data for this experimental procedure. This raw FASTQ data must then be
prepared via a complex series of data manipulations by bioinformaticians. This
process currently takes place in an unwieldy text-based interface, such as a
terminal/command line, that requires the user to install and import multiple
software packages, preventing the untrained biologist from initiating data
analysis. Open-source platforms like Galaxy have produced a more user-friendly
pipeline, yet the visual interface remains cluttered and highly technical,
and thus uninviting for the natural scientist. To address this, we present SeqMate, a
user-friendly tool that allows for one-click analytics by utilizing the power
of a large language model (LLM) to automate both data preparation and analysis
(differential expression, trajectory analysis, etc.). Furthermore, by utilizing
the power of generative AI, SeqMate is also capable of analyzing such findings
and producing written reports of upregulated/downregulated/user-prompted genes
with sources cited from known repositories like PubMed, PDB, and UniProt.