Bioinformatics (Oxford, England): Latest Articles


PgRC2: engineering the compression of sequencing reads.

Bioinformatics (Oxford, England)

Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf101

Tomasz M Kowalski, Szymon Grabowski

Summary: The FASTQ format remains at the heart of high-throughput sequencing. Despite advances in specialized FASTQ compressors, they are still imperfect in terms of practical performance tradeoffs. We present a multi-threaded version of Pseudogenome-based Read Compressor (PgRC), an in-memory algorithm for compressing the DNA stream, based on the idea of approximating the shortest common superstring over high-quality reads. Redundancy in the obtained string is efficiently removed by using a compact temporary representation. The current version, v2.0, preserves the compression ratio of the previous one, reducing the compression (resp. decompression) time by a factor of 8-9 (resp. 2-2.5) on a 14-core/28-thread machine.
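
PgRC's central idea is to approximate the shortest common superstring (SCS) of the high-quality reads. As a rough illustration only, the sketch below uses the textbook greedy heuristic: repeatedly merge the pair of strings with the largest suffix-prefix overlap. This is not PgRC's multi-threaded algorithm. It ignores reverse complements, sequencing errors and quality filtering, and its quadratic pair search per merge limits it to toy inputs.

```python
from itertools import permutations

def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_superstring(reads: list[str]) -> str:
    """Approximate the shortest common superstring by greedy overlap merging."""
    unique = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Reads contained in another read are already covered by it.
    strings = [r for r in unique if not any(r != o and r in o for o in unique)]
    while len(strings) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, j in permutations(range(len(strings)), 2):
            k = overlap(strings[i], strings[j])
            if k > best_k:
                best_k, best_i, best_j = k, i, j
        # Append only the non-overlapping tail of the second string.
        merged = strings[best_i] + strings[best_j][best_k:]
        strings = [s for idx, s in enumerate(strings) if idx not in (best_i, best_j)]
        strings.append(merged)
    return strings[0] if strings else ""
```

For example, `greedy_superstring(["ACGTAC", "GTACGG", "ACGGTT"])` returns `"ACGTACGGTT"`. The merged string plays the role of a (toy) pseudogenome. Each read can then be stored as a position in it rather than as raw bases.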

Availability and implementation: PgRC 2.0 can be downloaded from https://github.com/kowallus/PgRC and https://zenodo.org/records/14882486 (10.5281/zenodo.14882486).

Citations: 0

SpectroPipeR: a streamlining post-Spectronaut® DIA-MS data analysis R package.

Bioinformatics (Oxford, England)

Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf086

Stephan Michalik, Elke Hammer, Leif Steil, Manuela Gesell Salazar, Christian Hentschker, Kristin Surmann, Larissa M Busch, Thomas Sura, Uwe Völker

Summary: Proteome studies frequently encounter challenges in downstream data analysis due to limited bioinformatics resources, rapid data generation, and variations in analytical methods. To address these issues, we developed SpectroPipeR, an R package designed to streamline data analysis tasks and provide a comprehensive, standardized pipeline for Spectronaut® DIA-MS data. The package automates various analytical processes, including XIC plots, ID rate summaries, normalization, batch and covariate adjustment, relative protein quantification, multivariate analysis, and statistical analysis, while generating interactive HTML reports suitable for, e.g., electronic laboratory notebook (ELN) systems.
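
Normalization is one of the automated steps listed above. As an illustration only, the sketch below implements median normalization of log2 intensities, a common choice in DIA-MS workflows: each sample's log2 values are shifted so that all samples share the same median. SpectroPipeR itself is written in R, and its exact normalization options are not described here. This Python version is a hypothetical stand-in that omits missing-value handling.

```python
import math
from statistics import median

def median_normalize(intensities: dict[str, list[float]]) -> dict[str, list[float]]:
    """Median-normalize log2 intensities so every sample has the same median.

    intensities: {sample: [raw intensity per protein]}; all values must be > 0
    (missing-value handling is omitted in this sketch).
    """
    logged = {s: [math.log2(v) for v in vals] for s, vals in intensities.items()}
    sample_medians = {s: median(vals) for s, vals in logged.items()}
    target = median(sample_medians.values())  # shared reference level
    return {s: [v - sample_medians[s] + target for v in vals]
            for s, vals in logged.items()}
```

For `{"A": [2, 4, 8], "B": [4, 8, 16]}`, both samples become `[1.5, 2.5, 3.5]` on the log2 scale. The constant twofold offset between them is removed.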

Availability and implementation: The SpectroPipeR package (manual: https://stemicha.github.io/SpectroPipeR/) was written in R and is freely available on GitHub (https://github.com/stemicha/SpectroPipeR).

Citations: 0

FlowPacker: protein side-chain packing with torsional flow matching.

Bioinformatics (Oxford, England)

Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf010

Jin Sub Lee, Philip M Kim

Motivation: Accurate prediction of protein side-chain conformations is necessary to understand protein folding, protein-protein interactions and facilitate de novo protein design.

Results: Here, we apply torsional flow matching and equivariant graph attention to develop FlowPacker, a fast and performant model that predicts protein side-chain conformations conditioned on the protein sequence and backbone. We show that FlowPacker outperforms previous state-of-the-art baselines across most metrics with improved runtime. We further show that FlowPacker can inpaint missing side-chain coordinates and can be applied to multimeric targets. It also performs strongly on a test set of antibody-antigen complexes.
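
Side-chain conformations are usually parameterized by χ torsion angles, so each residue's state lives on a torus. The sketch below shows one standard way to do flow matching on a flat torus; it is an assumption, not FlowPacker's exact formulation. A noise sample x0 is interpolated to the data x1 along the shortest wrapped path, and the model's velocity is regressed onto the constant target wrap(x1 - x0). `predict` is a hypothetical placeholder for the equivariant graph-attention network. Residue-specific details such as symmetric side chains are ignored.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def wrap(delta: float) -> float:
    """Map an angle difference into [-pi, pi), i.e. the shortest signed arc."""
    return (delta + math.pi) % TWO_PI - math.pi

def interpolate(x0, x1, t):
    """Point at time t on the shortest torus path from x0 to x1, reduced to [0, 2*pi)."""
    return [(a + t * wrap(b - a)) % TWO_PI for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Constant velocity along that path: the conditional flow-matching target."""
    return [wrap(b - a) for a, b in zip(x0, x1)]

def fm_loss(predict, x1, rng=random):
    """One Monte Carlo sample of the conditional flow-matching loss for angles x1.

    predict(x_t, t) is a hypothetical stand-in for the learned network.
    """
    x0 = [rng.uniform(0.0, TWO_PI) for _ in x1]  # uniform prior on the torus
    t = rng.random()
    xt = interpolate(x0, x1, t)
    u = target_velocity(x0, x1)
    v = predict(xt, t)
    return sum((vi - ui) ** 2 for vi, ui in zip(v, u)) / len(u)
```

Wrapping matters at the boundary. Going from 0.1 rad to 2π - 0.1 rad, the shortest path moves by -0.2 rad, not by almost a full turn. At sampling time the learned velocity would be integrated from t = 0 to 1, wrapping angles after each step.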

Availability and implementation: Code is available at https://gitlab.com/mjslee0921/flowpacker.

Supplementary information: Supplementary data are available at Bioinformatics online.

Citations: 0

MUSET: set of utilities for constructing abundance unitig matrices from sequencing data.

Bioinformatics (Oxford, England)

Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf054

Riccardo Vicedomini, Francesco Andreace, Yoann Dufresne, Rayan Chikhi, Camila Duitama González

Summary: MUSET is a novel set of utilities designed to efficiently construct abundance unitig matrices from sequencing data. Unitig matrices extend the concept of k-mer matrices by merging overlapping k-mers that unambiguously belong to the same sequence. MUSET addresses the limitations of current software by integrating k-mer counting and unitig extraction to generate unitig matrices containing abundance values, rather than only the presence/absence information produced by previous tools. These matrices preserve variations between samples while reducing disk space and the number of rows compared to k-mer matrices. We evaluated MUSET's performance using datasets derived from a 618-GB collection of ancient oral sequencing samples, producing a filtered unitig matrix that records abundances in under 10 h using 20 GB of memory.
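
A unitig is a maximal non-branching path in the de Bruijn graph of the k-mers. The sketch below compacts a k-mer set into unitigs and then fills an abundance matrix with one row per unitig and one column per sample. Several assumptions here are not taken from MUSET. Only the forward strand is used (no canonical k-mers), isolated cycles are dropped, and a unitig's abundance in a sample is taken to be the mean count of its k-mers. MUSET's own C++ implementation and abundance definition may differ.

```python
def out_neighbors(kmer: str, kmers: set[str]) -> list[str]:
    """k-mers overlapping `kmer` by k-1 characters on its right."""
    return [kmer[1:] + c for c in "ACGT" if kmer[1:] + c in kmers]

def in_neighbors(kmer: str, kmers: set[str]) -> list[str]:
    """k-mers overlapping `kmer` by k-1 characters on its left."""
    return [c + kmer[:-1] for c in "ACGT" if c + kmer[:-1] in kmers]

def unique_successor(kmer: str, kmers: set[str]):
    """Successor if the edge kmer -> successor is non-branching on both ends, else None."""
    outs = out_neighbors(kmer, kmers)
    if len(outs) == 1 and len(in_neighbors(outs[0], kmers)) == 1:
        return outs[0]
    return None

def build_unitigs(kmers: set[str]) -> list[list[str]]:
    unitigs = []
    for kmer in sorted(kmers):
        preds = in_neighbors(kmer, kmers)
        if len(preds) == 1 and unique_successor(preds[0], kmers) == kmer:
            continue  # interior k-mer: it is reached by walking from a unitig start
        path = [kmer]
        nxt = unique_successor(kmer, kmers)
        while nxt is not None and nxt != kmer:
            path.append(nxt)
            nxt = unique_successor(nxt, kmers)
        unitigs.append(path)
    return unitigs

def spell(path: list[str]) -> str:
    """Sequence of a unitig: first k-mer plus the last base of each following k-mer."""
    return path[0] + "".join(k[-1] for k in path[1:])

def unitig_matrix(counts_per_sample: dict[str, dict[str, int]]):
    """{unitig sequence: {sample: mean k-mer count}} (mean is an assumption)."""
    all_kmers: set[str] = set()
    for counts in counts_per_sample.values():
        all_kmers.update(counts)
    return {spell(path): {sample: sum(counts.get(k, 0) for k in path) / len(path)
                          for sample, counts in counts_per_sample.items()}
            for path in build_unitigs(all_kmers)}
```

With k = 3, the k-mers ACG, CGT and GTT collapse into the single row `ACGTT`. A k-mer matrix would need three rows for the same information, which is the row reduction described above.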

Availability and implementation: MUSET is open source and publicly available under the AGPL-3.0 licence in GitHub at https://github.com/CamilaDuitama/muset. Source code is implemented in C++ and provided with kmat_tools, a collection of tools for processing k-mer matrices. Version v0.5.1 is available on Zenodo with DOI 10.5281/zenodo.14164801.


Citations: 0

Generating multiple alignments on a pangenomic scale.

Bioinformatics (Oxford, England)

Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf104

Jannik Olbrich, Thomas Büchler, Enno Ohlebusch

Motivation: Since novel long-read sequencing technologies allow de novo assembly of many individuals of a species, high-quality assemblies are becoming widely available. For example, the recently published draft human pangenome reference was based on assemblies composed of contigs. There is an urgent need for a software tool that can generate a multiple alignment of genomes of the same species, because current multiple sequence alignment programs cannot handle such volumes of data.

Results: We show that combining a well-known anchor-based method with the technique of prefix-free parsing yields an approach that can generate multiple alignments on a pangenomic scale, provided that large-scale structural variants are rare. Furthermore, experiments with real-world data show that our software tool PANgenomic Anchor-based Multiple Alignment (PANAMA) significantly outperforms current state-of-the-art programs.
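
Prefix-free parsing (PFP) splits a text at "trigger" windows: w-character substrings whose Karp-Rabin hash is 0 modulo p. Identical regions in different genomes are therefore cut into identical phrases. The text becomes a dictionary of distinct phrases plus a parse (the sequence of phrase ranks), and both stay small for highly similar genomes. The sketch below shows only a simplified PFP on a single string. It assumes padding with w copies of `#` at each end and uses illustrative values for w and p. The anchor-based alignment step of the tool is not shown.

```python
BASE = 256
MOD = (1 << 61) - 1  # large prime modulus for the Karp-Rabin hash

def trigger_positions(s: str, w: int, p: int) -> list[int]:
    """Start positions i such that the Karp-Rabin hash of s[i:i+w] is 0 mod p."""
    if len(s) < w:
        return []
    high = pow(BASE, w - 1, MOD)  # weight of the character leaving the window
    h = 0
    for ch in s[:w]:
        h = (h * BASE + ord(ch)) % MOD
    hits = [0] if h % p == 0 else []
    for i in range(1, len(s) - w + 1):
        # Roll the window one character to the right in O(1).
        h = ((h - ord(s[i - 1]) * high) * BASE + ord(s[i + w - 1])) % MOD
        if h % p == 0:
            hits.append(i)
    return hits

def prefix_free_parse(text: str, w: int = 4, p: int = 7, pad: str = "#"):
    """Return (dictionary, parse): sorted distinct phrases and the phrase-rank sequence."""
    s = pad * w + text + pad * w
    # The padded ends always act as triggers, so the first and last phrases are delimited.
    triggers = sorted(set([0, len(s) - w] + trigger_positions(s, w, p)))
    # Each phrase runs from one trigger through the end of the next one,
    # so consecutive phrases overlap by exactly w characters.
    phrases = [s[a:b + w] for a, b in zip(triggers, triggers[1:])]
    dictionary = sorted(set(phrases))
    rank = {phrase: r for r, phrase in enumerate(dictionary)}
    return dictionary, [rank[phrase] for phrase in phrases]

def reconstruct(dictionary: list[str], parse: list[int], w: int) -> str:
    """Undo the parse: keep the first phrase, then drop the w-character overlap of the rest."""
    phrases = [dictionary[r] for r in parse]
    return phrases[0] + "".join(ph[w:] for ph in phrases[1:])
```

On a text made of three copies of the same segment, the repeats produce repeated phrase ranks in the parse. The dictionary grows with the amount of distinct sequence rather than with total length.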

Availability and implementation: Source code is available at: https://gitlab.com/qwerzuiop/panama, archived at swh:1:dir:e90c9f664995acca9063245cabdd97549cf39694.

Citations: 0

Embed-Search-Align: DNA sequence alignment using Transformer models.

Bioinformatics (Oxford, England)

Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf041

Pavan Holur, K C Enevoldsen, Shreyas Rajesh, Lajoyce Mboning, Thalia Georgiou, Louis-S Bouchard, Matteo Pellegrini, Vwani Roychowdhury

Motivation: DNA sequence alignment, an important genomic task, involves assigning short DNA reads to the most probable locations on an extensive reference genome. Conventional methods tackle this challenge in two steps: genome indexing followed by efficient search to locate likely positions for given reads. Building on the success of Large Language Models in encoding text into embeddings, where the distance metric captures semantic similarity, recent efforts have encoded DNA sequences into vectors using Transformers and have shown promising results in tasks involving classification of short DNA sequences. Performance at sequence classification tasks does not, however, guarantee sequence alignment, where it is necessary to conduct a genome-wide search to align every read successfully, a significantly longer-range task by comparison.

Results: We bridge this gap by developing an "Embed-Search-Align" (ESA) framework, in which a novel Reference-Free DNA Embedding (RDE) Transformer model generates vector embeddings of reads and of reference fragments in a shared vector space. The read-fragment distance metric then serves as a surrogate for sequence similarity. ESA introduces (i) a contrastive loss for self-supervised training of DNA sequence representations, yielding rich reference-free, sequence-level embeddings, and (ii) a DNA vector store that enables search across fragments on a global scale. RDE is 99% accurate when aligning 250-bp reads to a 3-gigabase (single-haploid) human reference genome, rivaling conventional algorithmic sequence alignment methods such as Bowtie and BWA-MEM. RDE far exceeds the performance of six recent DNA-Transformer model baselines, including Nucleotide Transformer and Hyena-DNA, and shows task transfer across chromosomes and species.
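
The embed-search-align loop can be illustrated with a much simpler embedding. The sketch below replaces the trained RDE Transformer with an L2-normalized 3-mer count vector, which is a swapped-in toy and not RDE. It stores embeddings of reference fragments in a brute-force "vector store" and aligns a read to the fragment with the highest cosine similarity. The fragment length and stride are hypothetical parameters. A genome-scale system would use an approximate nearest-neighbour index instead of a linear scan.

```python
import math
from collections import Counter
from itertools import product

K = 3  # toy k-mer size for the stand-in embedding
VOCAB = ["".join(p) for p in product("ACGT", repeat=K)]

def embed(seq: str) -> list[float]:
    """Toy stand-in for RDE: L2-normalized k-mer count vector (not a Transformer)."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    vec = [float(counts[kmer]) for kmer in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_store(reference: str, frag_len: int, stride: int):
    """Brute-force 'vector store': (start, embedding) for each reference fragment."""
    last = max(len(reference) - frag_len, 0)
    return [(start, embed(reference[start:start + frag_len]))
            for start in range(0, last + 1, stride)]

def align(read: str, store) -> int:
    """Start of the fragment whose embedding has the highest cosine similarity to the read."""
    query = embed(read)
    # Vectors are unit length, so the dot product equals cosine similarity.
    best_start, _ = max(store, key=lambda item: sum(a * b for a, b in zip(query, item[1])))
    return best_start
```

For the reference `"A" * 10 + "C" * 10 + "GT" * 5 + "ACGTTGCAAC"` with 10-base fragments and stride 10, the read `"GTGTGTGTGT"` aligns to start 20. A bag-of-k-mers embedding cannot tell repeats apart and ignores order. Contrastive training of a sequence model such as RDE aims to address these weaknesses.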

Availability and implementation: Please see https://anonymous.4open.science/r/dna2vec-7E4E/readme.md.


Citations: 0

ENACT: End-to-End Analysis of Visium High Definition (HD) Data.

Bioinformatics (Oxford, England)

Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf094

Mena Kamel, Yiwen Song, Ana Solbas, Sergio Villordo, Amrut Sarangi, Pavel Senin, Mathew Sunaal, Luis Cano Ayestas, Clement Levin, Seqian Wang, Marion Classe, Ziv Bar-Joseph, Albert Pla Planas

Motivation: Spatial transcriptomics (ST) enables the study of gene expression within its spatial context in histopathology samples. To date, a limiting factor has been the resolution of sequencing-based ST products. The introduction of Visium High Definition (HD) technology opens the door to cell-resolution ST studies. However, challenges remain in accurately mapping transcripts to cells and in assigning cell types based on the transcript data.

Results: We developed ENACT, a self-contained pipeline that integrates advanced cell segmentation with Visium HD transcriptomics data to infer cell types across whole tissue sections. Our pipeline incorporates novel bin-to-cell assignment methods that improve the accuracy of single-cell transcript estimates. Validated on diverse synthetic and real datasets, our approach is effective and scales to samples with hundreds of thousands of cells, offering a robust solution for spatially resolved transcriptomics analysis.
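
Bin-to-cell assignment can be illustrated with a simple area-weighted scheme: each bin's counts are split among the segmented cells it overlaps, in proportion to overlap area. The sketch below represents the segmentation as a pixel label mask and each bin as a square block of pixels. Both the representation and the weighting are assumptions for illustration, not ENACT's specific methods. Bins with no cell pixels are discarded. Counts in partially covered bins are divided only among the cells present.

```python
from collections import defaultdict

def assign_bins(mask: list[list[int]], bin_size: int,
                bin_counts: dict[tuple[int, int], dict[str, int]]) -> dict[int, dict[str, float]]:
    """Split each bin's gene counts across cells in proportion to overlap area.

    mask: per-pixel cell label from segmentation (0 = background).
    bin_counts: {(bin_row, bin_col): {gene: count}}; bin (r, c) covers
    pixel rows r*bin_size..(r+1)*bin_size-1 and the matching columns.
    """
    cells: dict[int, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for (br, bc), genes in bin_counts.items():
        area: dict[int, int] = defaultdict(int)
        for r in range(br * bin_size, (br + 1) * bin_size):
            for c in range(bc * bin_size, (bc + 1) * bin_size):
                label = mask[r][c]
                if label:
                    area[label] += 1
        total = sum(area.values())
        if total == 0:
            continue  # bin lies entirely on background
        for label, a in area.items():
            for gene, n in genes.items():
                cells[label][gene] += n * a / total
    return {label: dict(g) for label, g in cells.items()}
```

In a 2×2 bin where three pixels belong to cell 1 and one to cell 2, a count of 4 is split as 3 and 1. A simpler "winner takes all" rule would give all 4 to cell 1.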

Availability and implementation: ENACT source code is available at https://github.com/Sanofi-Public/enact-pipeline. Experimental data are available at https://zenodo.org/records/14748859.


Citations: 0

MetAssimulo 2.0: a web app for simulating realistic 1D and 2D metabolomic 1H NMR spectra.

Bioinformatics (Oxford, England)

Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf045

Yan Yan, Beatriz Jiménez, Michael T Judge, Toby Athersuch, Maria De Iorio, Timothy M D Ebbels

Motivation: Metabolomics extensively utilizes nuclear magnetic resonance (NMR) spectroscopy due to its excellent reproducibility and high throughput. Both 1D and 2D NMR spectra provide crucial information for metabolite annotation and quantification, yet present complex overlapping patterns which may require sophisticated machine learning algorithms to decipher. Unfortunately, the limited availability of labeled spectra can hamper application of machine learning, especially deep learning algorithms which require large amounts of labeled data. In this context, simulation of spectral data becomes a tractable solution for algorithm development.

Results: Here, we introduce MetAssimulo 2.0, a comprehensive upgrade of the MetAssimulo 1.0 metabolomic 1H NMR simulation tool, reimplemented as a Python-based web application. Whereas MetAssimulo 1.0 only simulated 1D 1H spectra of human urine, MetAssimulo 2.0 extends functionality to urine, blood, and cerebrospinal fluid. It also improves the realism of blood spectra by incorporating a broad protein background. This enhancement yields a closer approximation to real blood spectra, achieving a Pearson correlation of approximately 0.82. Moreover, the tool now simulates 2D J-resolved (J-Res) and correlation spectroscopy (COSY) spectra, significantly broadening its utility in complex mixture analysis. MetAssimulo 2.0 simulates both single spectra and groups of spectra with discrete outcomes (case-control, e.g. heart transplant versus healthy) and continuous outcomes (e.g. body mass index), and it includes inter-metabolite correlations. It thus supports a range of experimental designs and can demonstrate associations between metabolite profiles and biomedical responses. By enhancing NMR spectral simulations, MetAssimulo 2.0 is well positioned to support research at the intersection of deep learning and metabolomics.
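
A 1D 1H spectrum can be pictured as a sum of Lorentzian lines at each metabolite's chemical shifts, scaled by concentration. A broad background hump can be added to stand in for the protein signal mentioned above, along with Gaussian noise. Agreement with a reference spectrum can then be scored by Pearson correlation. The sketch below treats every resonance as a singlet (no J-coupling multiplets) and uses hypothetical metabolites, shifts and line widths. It is not MetAssimulo's simulation model.

```python
import math
import random

def lorentzian(x: float, center: float, hwhm: float) -> float:
    """Lorentzian line shape with unit height at `center` (hwhm = half width at half maximum)."""
    return hwhm ** 2 / ((x - center) ** 2 + hwhm ** 2)

def simulate_spectrum(ppm, peaks, concentrations, hwhm=0.002,
                      background=None, noise_sd=0.0, rng=None):
    """Intensity at each ppm value.

    peaks: {metabolite: [(shift_ppm, relative_intensity), ...]} (hypothetical values)
    concentrations: {metabolite: concentration}
    background: optional (center_ppm, hwhm, height) broad hump, e.g. a protein signal
    """
    rng = rng or random.Random(0)
    spectrum = []
    for x in ppm:
        y = sum(conc * rel * lorentzian(x, shift, hwhm)
                for met, conc in concentrations.items()
                for shift, rel in peaks.get(met, []))
        if background is not None:
            center, width, height = background
            y += height * lorentzian(x, center, width)
        if noise_sd > 0:
            y += rng.gauss(0.0, noise_sd)
        spectrum.append(y)
    return spectrum

def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation between two spectra sampled on the same ppm grid."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)
```

A realistic simulator would replace the singlets with multiplets derived from coupling constants. It would also draw concentrations from group-specific distributions to produce case-control or continuous-outcome datasets.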

Availability and implementation: The code and detailed instructions/tutorial for MetAssimulo 2.0 are available at https://github.com/yanyan5420/MetAssimulo_2.git. The relevant NMR spectra of metabolites are deposited in MetaboLights under accession number MTBLS12081.


Citations: 0

AskBeacon: performing genomic data exchange and analytics with natural language.

Bioinformatics (Oxford, England)

Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf079

Anuradha Wickramarachchi, Shakila Tonni, Sonali Majumdar, Sarvnaz Karimi, Sulev Kõks, Brendan Hosking, Jordi Rambla, Natalie A Twine, Yatish Jain, Denis C Bauer

Motivation: Enabling clinicians and researchers to directly interact with global genomic data resources by removing technological barriers is vital for medical genomics. AskBeacon enables large language models (LLMs) to be applied to securely shared cohorts via the Global Alliance for Genomics and Health Beacon protocol. By simply "asking" Beacon, actionable insights can be gained, analyzed, and made publication-ready.

Results: In the Parkinson's Progression Markers Initiative (PPMI) cohort, we ask in natural language whether the sex differences observed in Parkinson's disease are due to X-linked or autosomal markers. AskBeacon returns a publication-ready visualization showing that, in PPMI, the autosomal marker occurred 1.4 times more often in males with Parkinson's disease than in females. The X-linked marker showed no difference. We evaluate commercial and open-weight LLMs, as well as different architectures, to identify the best strategy for translating research questions into Beacon queries. AskBeacon implements extensive safety guardrails so that genomic data are not exposed to the LLM directly. The code generated for data extraction, analysis and visualization is sanitized and hallucination-resistant, so that data cannot be leaked or falsified.
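
One ingredient of such guardrails is static checking of LLM-generated code before it runs. The sketch below is not AskBeacon's implementation. It parses generated Python with the standard `ast` module and rejects three things: imports outside a hypothetical allowlist, calls to a few dangerous builtins, and dunder attribute access. Static checks of this kind reduce risk but are not a sandbox on their own. They would normally be combined with a restricted execution environment.

```python
import ast

ALLOWED_MODULES = {"pandas", "matplotlib", "matplotlib.pyplot", "math"}  # hypothetical allowlist
BLOCKED_CALLS = {"eval", "exec", "open", "compile", "__import__", "input"}

def check_generated_code(source: str) -> list[str]:
    """Return policy violations found in generated code (an empty list means it passes)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error: {err.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]  # relative imports have module=None
        else:
            modules = []
        for name in modules:
            if name not in ALLOWED_MODULES:
                problems.append(f"import of {name!r} not allowed")
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in BLOCKED_CALLS):
            problems.append(f"call to {node.func.id}() not allowed")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            problems.append(f"dunder attribute {node.attr!r} not allowed")
    return problems
```

For instance, `check_generated_code("import os\nos.system('ls')")` reports `["import of 'os' not allowed"]`. A plotting snippet that only imports `pandas` passes.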

Availability and implementation: AskBeacon is available at https://github.com/aehrc/AskBeacon.


Citations: 0

ImmunoTar: integrative prioritization of cell surface targets for cancer immunotherapy.

Bioinformatics (Oxford, England)

Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf060

Rawan Shraim, Brian Mooney, Karina L Conkrite, Amber K Hamilton, Gregg B Morin, Poul H Sorensen, John M Maris, Sharon J Diskin, Ahmet Sacan

Motivation: Cancer remains a leading cause of mortality globally. Recent improvements in survival have been facilitated by the development of targeted and less toxic immunotherapies, such as chimeric antigen receptor (CAR)-T cells and antibody-drug conjugates (ADCs). These therapies, effective in treating both pediatric and adult patients with solid and hematological malignancies, rely on the identification of cancer-specific surface protein targets. While technologies like RNA sequencing and proteomics exist to survey these targets, identifying optimal targets for immunotherapies remains a challenge in the field.

Results: To address this challenge, we developed ImmunoTar, a novel computational tool designed to systematically prioritize candidate immunotherapeutic targets. ImmunoTar integrates user-provided RNA-sequencing or proteomics data with quantitative features from multiple public databases, selected based on predefined criteria, to generate a score representing each gene's suitability as an immunotherapeutic target. We validated ImmunoTar using three distinct cancer datasets, demonstrating its effectiveness in identifying both known and novel targets across various cancer phenotypes. By compiling diverse data into a unified platform, ImmunoTar enables comprehensive evaluation of surface proteins. This streamlines target identification and helps researchers allocate resources efficiently, accelerating the development of effective cancer immunotherapies.
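
One common way to combine several quantitative features into a single suitability score has three steps. Min-max scale each feature across genes, invert features where lower values are better (e.g. expression in normal tissue), and take a weighted average. The sketch below follows that approach under stated assumptions. The feature names, weights and scaling are hypothetical. ImmunoTar's actual features, predefined criteria and scoring function are not reproduced here.

```python
def score_targets(features: dict[str, dict[str, float]],
                  weights: dict[str, float],
                  lower_is_better=frozenset()) -> dict[str, float]:
    """Weighted average of min-max scaled features; in [0, 1] for non-negative weights.

    features: {gene: {feature_name: value}} (hypothetical feature names)
    weights: {feature_name: weight}
    lower_is_better: features where a smaller raw value is preferable
    """
    genes = list(features)
    scores = {gene: 0.0 for gene in genes}
    for feat, weight in weights.items():
        values = [features[gene][feat] for gene in genes]
        lo, hi = min(values), max(values)
        if hi == lo:
            continue  # a constant feature cannot distinguish genes
        for gene, value in zip(genes, values):
            scaled = (value - lo) / (hi - lo)
            if feat in lower_is_better:
                scaled = 1.0 - scaled
            scores[gene] += weight * scaled
    total = sum(weights.values())
    return {gene: s / total for gene, s in scores.items()}

def rank_targets(scores: dict[str, float]) -> list[str]:
    """Genes ordered from most to least suitable."""
    return sorted(scores, key=scores.get, reverse=True)
```

A gene with high tumor expression and low normal-tissue expression scores near 1. The reverse pattern scores near 0. Adjusting the weights shifts the priority between such criteria.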

Availability and implementation: Code and data to run and test ImmunoTar are available at https://github.com/sacanlab/immunotar.


Citations: 0
