Pub Date: 2024-11-15 DOI: 10.1093/bioinformatics/btae687
Robert C Edgar
Motivation: Recent breakthroughs in protein fold prediction from amino acid sequences have unleashed a deluge of new structures, presenting new opportunities and challenges to bioinformatics.
Results: Reseek is a novel protein structure alignment algorithm based on sequence alignment where each residue in the protein backbone is represented by a letter in a "mega-alphabet" of 85,899,345,920 (∼10^11) distinct states. Reseek achieves substantially improved sensitivity to remote homologs compared to state-of-the-art methods including DALI, TMalign and Foldseek, with comparable speed to Foldseek, the fastest previous method. Scaling to large databases of AI-predicted folds is analyzed. Foldseek E-values are shown to be under-estimated by several orders of magnitude, while Reseek E-values are in good agreement with measured error rates.
Availability: https://github.com/rcedgar/reseek.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: Protein structure alignment by reseek improves sensitivity to remote homologs.
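The alphabet size quoted in the Reseek abstract can be checked with a few lines of arithmetic. The sketch below only confirms the order of magnitude and the information content per letter; the factorization it prints is a numerical observation, not a statement about how Reseek actually builds its alphabet.

```python
import math

# Figure quoted in the abstract.
ALPHABET_SIZE = 85_899_345_920

# Order of magnitude: about 10.93, i.e. roughly 10^11 as the abstract states.
log10_size = math.log10(ALPHABET_SIZE)

# Information per letter, compared with the 20-letter amino acid alphabet.
bits_per_letter = math.log2(ALPHABET_SIZE)
bits_per_amino_acid = math.log2(20)

# Arithmetic observation only (assumption-free, but not taken from the paper):
# the number equals 10 * 2^33.
is_ten_times_power_of_two = ALPHABET_SIZE == 10 * 2**33

print(f"log10(size)      = {log10_size:.2f}")
print(f"bits per letter  = {bits_per_letter:.2f}")
print(f"bits per residue in a 20-letter alphabet = {bits_per_amino_acid:.2f}")
print(f"size == 10 * 2**33: {is_ten_times_power_of_two}")
```

Each mega-alphabet letter therefore carries about 36 bits, against roughly 4.3 bits for a plain amino acid letter.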
Motivation: Spatial transcriptomics allows for the measurement of high-throughput gene expression data while preserving the spatial structure of tissues and histological images. Integrating gene expression, spatial information, and image data to learn discriminative low-dimensional representations is critical for dissecting tissue heterogeneity and analyzing biological functions. However, most existing methods have limitations in effectively utilizing spatial information and high-resolution histological images. We propose a signal-diffusion-based unsupervised contrastive learning method (SDUCL) for learning low-dimensional latent embeddings of cells/spots.
Results: SDUCL integrates image features, spatial relationships and gene expression information. We designed a signal diffusion microenvironment discovery algorithm, which effectively captures and integrates interaction information within the cellular microenvironment by simulating the biological signal diffusion process. By maximizing the mutual information between the local representation and the microenvironment representation of cells/spots, SDUCL learns more discriminative representations. SDUCL was employed to analyze spatial transcriptomics datasets from multiple species, encompassing both normal and tumor tissues. SDUCL performed well in downstream tasks such as clustering, visualization, trajectory inference, and differential gene analysis, thereby enhancing our understanding of tissue structure and tumor microenvironments.
Availability: https://github.com/WeiMin-Li-visual/SDUCL.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: A signal-diffusion-based unsupervised contrastive representation learning for spatial transcriptomics analysis.
Authors: Nan Chen, Xiao Yu, Weimin Li, Fangfang Liu, Yin Luo, Zhongkun Zuo
Pub Date: 2024-11-15 DOI: 10.1093/bioinformatics/btae663
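The SDUCL abstract describes discovering a spot's microenvironment by simulating signal diffusion. The toy sketch below illustrates the general idea only: a unit signal spreads from one spot to its spatial neighbours and is attenuated at every hop, and spots that still receive enough signal are grouped with the source. The neighbour rule (`radius`), the attenuation factor (`decay`), the number of `steps` and the threshold are all hypothetical choices made for this example; the authors' actual algorithm is in their repository and may differ substantially.

```python
import math


def diffuse(coords, source, radius=1.5, decay=0.5, steps=3):
    """Spread a unit signal from `source`, attenuating by `decay` per hop.

    All parameters are illustrative assumptions, not values from the paper.
    """
    n = len(coords)
    # Hypothetical neighbour rule: spots within `radius` of each other.
    neighbours = [
        [j for j in range(n)
         if j != i and math.dist(coords[i], coords[j]) <= radius]
        for i in range(n)
    ]
    signal = [0.0] * n
    signal[source] = 1.0
    frontier = [source]
    for _ in range(steps):
        next_frontier = []
        for i in frontier:
            received = signal[i] * decay
            for j in neighbours[i]:
                # Keep the strongest signal that reaches each spot.
                if received > signal[j]:
                    signal[j] = received
                    next_frontier.append(j)
        frontier = next_frontier
    return signal


# Four spots on a line plus one distant, unconnected spot.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 10)]
signal = diffuse(coords, source=0)

# Spots receiving at least 20% of the source signal (arbitrary threshold).
microenvironment = [i for i, s in enumerate(signal) if s >= 0.2]
print(signal)
print(microenvironment)
```

In this toy layout the signal halves at each hop along the line and never reaches the distant spot, so the microenvironment of spot 0 consists of spots 0, 1 and 2.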
Pub Date: 2024-11-14 DOI: 10.1093/bioinformatics/btae670
Thomas C Smits, Sehi L'Yi, Andrew P Mar, Nils Gehlenborg
Motivation: Biomedical visualizations are key to accessing biomedical knowledge and detecting new patterns in large datasets. Interactive visualizations are essential for biomedical data scientists and are omnipresent in data analysis software and data portals. Without appropriate descriptions, these visualizations are not accessible to all people with blindness and low vision, who often rely on screen reader accessibility technologies to access visual information on digital devices. Screen readers require descriptions to convey image content. However, many images lack informative descriptions due to unawareness and difficulty writing such descriptions. Describing complex and interactive visualizations, like genomics data visualizations, is even more challenging. Automatic generation of descriptions could be beneficial, yet current alt text generating models are limited to basic visualizations and cannot be used for genomics.
Results: We present AltGosling, an automated description generation tool focused on interactive data visualizations of genome-mapped data, created with the grammar-based genomics toolkit Gosling. The logic-based algorithm of AltGosling creates various descriptions including a tree-structured navigable panel. We co-designed AltGosling with a blind screen reader user (co-author). We show that AltGosling outperforms state-of-the-art large language models and common image-based neural networks for alt text generation of genomics data visualizations. As a first of its kind in genomic research, we lay the groundwork to increase accessibility in the field.
Availability and implementation: The source code, examples, and interactive demo are accessible under the MIT License at https://github.com/gosling-lang/altgosling. The package is available at https://www.npmjs.com/package/altgosling.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: AltGosling: Automatic Generation of Text Descriptions for Accessible Genomics Data Visualization.
Motivation: Assigning accurate property labels to proteins, like functional terms and catalytic activity, is challenging, especially for proteins without homologs and "tail labels" with few known examples. Previous methods mainly focused on protein sequence features, overlooking the semantic meaning of protein labels.
Results: We introduce FAPM, a contrastive multi-modal model that links natural language with protein sequence language. This model combines a pretrained protein sequence model with a pretrained large language model to generate labels, such as Gene Ontology (GO) functional terms and catalytic activity predictions, in natural language. Our results show that FAPM excels in understanding protein properties, outperforming models based solely on protein sequences or structures. It achieves state-of-the-art performance on public benchmarks and in-house experimentally annotated phage proteins, which often have few known homologs. Additionally, FAPM's flexibility allows it to incorporate extra text prompts, like taxonomy information, enhancing both its predictive performance and explainability. This novel approach offers a promising alternative to current methods that rely on multiple sequence alignment for protein annotation. The online demo is at: https://huggingface.co/spaces/wenkai/FAPM_demo.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: FAPM: Functional Annotation of Proteins using Multi-Modal Models Beyond Structural Modeling.
Authors: Wenkai Xiang, Zhaoping Xiong, Huan Chen, Jiacheng Xiong, Wei Zhang, Zunyun Fu, Mingyue Zheng, Bing Liu, Qian Shi
Pub Date: 2024-11-14 DOI: 10.1093/bioinformatics/btae680
Pub Date: 2024-11-14 DOI: 10.1093/bioinformatics/btae677
Jaime Moreno, Henrik Nielsen, Ole Winther, Felix Teufel
Motivation: Protein subcellular location prediction is a widely explored task in bioinformatics because of its importance in proteomics research. We propose DeepLocPro, an extension to the popular method DeepLoc, tailored specifically to archaeal and bacterial organisms.
Results: DeepLocPro is a multiclass subcellular location prediction tool for prokaryotic proteins, trained on experimentally verified data curated from UniProt and PSORTdb. DeepLocPro compares favorably to the PSORTb 3.0 ensemble method, surpassing its performance across multiple metrics in our benchmark experiment.
Availability: The DeepLocPro prediction tool is available online at https://ku.biolib.com/deeplocpro and https://services.healthtech.dtu.dk/services/DeepLocPro-1.0/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: Predicting the subcellular location of prokaryotic proteins with DeepLocPro.
Pub Date: 2024-11-14 DOI: 10.1093/bioinformatics/btae678
Zhijian Huang, Yucheng Wang, Song Chen, Yaw Sing Tan, Lei Deng, Min Wu
Motivation: RNA is implicated in numerous aberrant cellular functions and disease progressions, highlighting the crucial importance of RNA-targeted drugs. To accelerate the discovery of such drugs, it is essential to develop an effective computational method for predicting RNA-small molecule affinity (RSMA). Recently, deep learning based computational methods have been promising due to their powerful nonlinear modeling ability. However, the leveraging of advanced deep learning methods to mine the diverse information of RNAs, small molecules and their interaction still remains a great challenge.
Results: In this study, we present DeepRSMA, an innovative cross-attention-based deep learning method for RSMA prediction. To effectively capture fine-grained features from RNA and small molecules, we developed nucleotide-level and atomic-level feature extraction modules for RNA and small molecules, respectively. Additionally, we incorporated both sequence and graph views into these modules to capture features from multiple perspectives. Moreover, a Transformer-based cross-fusion module is introduced to learn the general patterns of interactions between RNAs and small molecules. To achieve effective RSMA prediction, we integrated the RNA and small molecule representations from the feature extraction and cross-fusion modules. Our results show that DeepRSMA outperforms baseline methods in multiple test settings. The interpretability analysis and the case study on spinal muscular atrophy (SMA) demonstrate that DeepRSMA has the potential to guide RNA-targeted drug design.
Availability: The codes and data are publicly available at https://github.com/Hhhzj-7/DeepRSMA.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: DeepRSMA: a cross-fusion based deep learning method for RNA-small molecule binding affinity prediction.
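The DeepRSMA abstract refers to a cross-attention-based fusion of RNA and small-molecule features. The sketch below shows plain scaled dot-product cross-attention in its simplest form, where each nucleotide (query) attends over the atoms of a molecule (keys and values). It omits learned projections, multiple heads and everything else a real Transformer layer has, and the toy feature vectors are invented for illustration; it is not the authors' module.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of value vectors, one output vector per query.
        outputs.append([
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ])
    return outputs, weights


# Toy features (hypothetical): 3 nucleotides and 2 atoms, dimension 2.
rna = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mol = [[1.0, 0.0], [0.0, 1.0]]

out, w = cross_attention(rna, mol, mol)
for row in w:
    print([round(x, 3) for x in row])
```

Each row of `w` is a probability distribution over atoms for one nucleotide, which is also what makes attention maps a common starting point for interpretability analyses.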
Pub Date: 2024-11-09 DOI: 10.1093/bioinformatics/btae672
Shubo Tian, Qingyu Chen, Donald C Comeau, W John Wilbur, Zhiyong Lu
Summary: Over 55% of author names in PubMed are ambiguous: the same name is shared by different individual researchers. This poses significant challenges for precise literature retrieval by author name, a common type of query in biomedical literature search. In response, we present a comprehensive dataset of disambiguated authors. Specifically, we complement the automatic PubMed Computed Authors algorithm with the latest ORCID data for improved accuracy. As a result, the enhanced algorithm achieves high performance in author name disambiguation, and our dataset contains more than 21 million disambiguated authors for over 35 million PubMed articles and is updated incrementally every week. More importantly, we make the dataset publicly available to the community so that it can be used in a wide variety of applications beyond assisting PubMed's author name queries. Finally, we propose a set of best-practice guidelines for authors on the use of their names.
Availability and implementation: The PubMed Computed Authors dataset is publicly available for bulk download at: https://ftp.ncbi.nlm.nih.gov/pub/lu/ComputedAuthors/. Additionally, it is available for query through web API at: https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/authors/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: PubMed Computed Authors in 2024: an open resource of disambiguated author names in biomedical literature.
Pub Date: 2024-11-09 DOI: 10.1093/bioinformatics/btae671
Brandon H Bergsneider, Orieta Celiku
Summary: Network analysis (NA) has recently emerged as a new paradigm by which to model the symptom patterns of patients with complex illnesses such as cancer. NA uses graph theory-based methods to capture the interplay between symptoms and identify which symptoms may be most impactful to patient quality of life and are therefore most critical to treat/prevent. Despite NA's increasing popularity in research settings, its clinical applicability is hindered by the lack of a unified platform that consolidates all the software tools needed to perform NA, and by the lack of methods for capturing heterogeneity across patient cohorts. Addressing these limitations, we present PRONA, an R-package for Patient Reported Outcomes Network Analysis. PRONA not only consolidates previous NA tools into a unified, easy-to-use analysis pipeline, but also augments the traditional approach with functionality for performing unsupervised discovery of patient subgroups with distinct symptom patterns.
Availability and implementation: PRONA is implemented in R. Source code, installation, and use instructions are available on GitHub at https://github.com/bbergsneider/PRONA.
Supplementary information: Supplementary information is available at Bioinformatics online.
Title: PRONA: An R-package for Patient Reported Outcomes Network Analysis.
Pub Date: 2024-11-07 DOI: 10.1093/bioinformatics/btae664
Marcio Soares Ferreira, Sebastian Stricker, Tomas Fitzgerald, Jack Monahan, Fanny Defranoux, Philip Watson, Bettina Welz, Omar Hammouda, Joachim Wittbrodt, Ewan Birney
High-resolution imaging of model organisms allows the quantification of important physiological measurements. In fish with transparent embryos, videos can visualise key physiological processes such as the heartbeat. High-throughput systems can provide enough measurements for the robust investigation of developmental processes as well as the impact of system perturbations on physiological state. However, few analytical schemes have been designed to handle thousands of high-resolution videos without some level of human intervention. We developed a software package, named FEHAT, to provide a fully automated solution for analysing large numbers of heart rate imaging datasets obtained from developing medaka fish embryos in 96-well plate format imaged on an Acquifer machine. FEHAT uses image segmentation to define regions of the embryo showing changes in pixel intensity over time, followed by classification of the most likely position of the heart and Fourier transformations to estimate the heart rate. Here we describe some important features of the FEHAT software, showcase its performance across a large set of medaka fish embryos and compare it to established, less automated solutions. FEHAT provides reliable heart rate estimates across a range of temperature-based perturbations and can be applied to tens of thousands of embryos without the need for any human intervention.
Availability: Data used in this manuscript will be made available on request.
Supplementary information: Supplementary data are available at Bioinformatics online.
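The FEHAT abstract outlines a pipeline that finds regions whose pixel intensity changes over time and then uses a Fourier transform to estimate heart rate. The following is a minimal sketch of that general idea, not FEHAT's implementation: the function names, the `top_fraction` heuristic (a crude stand-in for segmentation and heart classification), and the 30 to 300 bpm search band are all assumptions made for illustration.

```python
import numpy as np


def pulsatile_trace(frames, top_fraction=0.01):
    """Average intensity over the pixels that vary most over time.

    frames: array of shape (T, H, W), one grayscale image per time point.
    top_fraction is a hypothetical heuristic, not a FEHAT parameter.
    """
    stack = np.asarray(frames, dtype=float)
    variability = stack.std(axis=0)  # per-pixel temporal standard deviation
    k = max(1, int(round(top_fraction * variability.size)))
    threshold = np.partition(variability.ravel(), -k)[-k]  # k-th largest value
    mask = variability >= threshold
    return stack[:, mask].mean(axis=1)  # 1-D trace of length T


def estimate_heart_rate_bpm(trace, fps, min_bpm=30.0, max_bpm=300.0):
    """Return the dominant frequency of a 1-D trace, in beats per minute."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()                 # remove the constant (DC) component
    window = np.hanning(x.size)      # taper to reduce spectral leakage
    power = np.abs(np.fft.rfft(x * window)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)  # frequencies in Hz
    band = (freqs >= min_bpm / 60.0) & (freqs <= max_bpm / 60.0)
    if not band.any():
        raise ValueError("no frequency bins fall inside the requested band")
    peak_freq = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_freq


# Synthetic check: a 2x2 patch pulsing at 2 Hz on a static background.
fps, seconds = 30, 10
t = np.arange(fps * seconds) / fps
frames = np.full((t.size, 8, 8), 10.0)
frames[:, 3:5, 3:5] += 5.0 * np.sin(2 * np.pi * 2.0 * t)[:, None, None]
bpm = estimate_heart_rate_bpm(pulsatile_trace(frames, top_fraction=0.05), fps)
```

The frequency resolution of this approach is `fps / T` Hz, so longer recordings give finer heart rate estimates. Real videos would also need motion correction and noise handling, which this sketch omits.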
Title: FEHAT: Efficient, Large scale and Automated Heartbeat Detection in Medaka Fish Embryos. Journal: Bioinformatics (Oxford, England).
Pub Date : 2024-11-05 DOI: 10.1093/bioinformatics/btae637
Sean J McIlwain, Anna Hoefges, Amy K Erbe, Paul M Sondel, Irene M Ong
Introduction: Ultradense peptide binding arrays that can probe millions of linear peptides comprising the entire proteomes of human or mouse, or hundreds of thousands of microbes, are powerful tools for studying the antibody repertoire in serum samples to understand adaptive immune responses.
Motivation: There are few tools for exploring high-dimensional, significant and reproducible antibody targets for ultradense peptide binding arrays at the linear peptide, epitope (grouping of adjacent peptides), and protein level across multiple samples/subjects (i.e. epitope spread or immunogenic regions of proteins) for understanding the heterogeneity of immune responses.
Results: We developed HERON (Hierarchical antibody binding Epitopes and pROteins from liNear peptides), an R package that identifies immunogenic epitopes using meta-analyses and spatial clustering techniques. It explores antibody targets at various resolution and confidence levels that are found consistently across a specified number of samples throughout the entire proteome, in order to study antibody responses for diagnostics or treatment. Our approach estimates significance values at the linear peptide (probe), epitope, and protein levels to identify top candidates for validation. We test the performance of predictions at all three levels using the correlation between technical replicates and a comparison of epitope calls on two datasets, which shows HERON's competitiveness in estimating false discovery rates and in finding general and sample-level regions of interest for antibody binding.
Availability: The HERON R package is available at Bioconductor https://bioconductor.org/packages/release/bioc/html/HERON.html.
Supplementary information: Supplementary data are available at Bioinformatics online.
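The HERON abstract describes moving from probe-level significance to epitope-level significance by grouping adjacent peptides and applying meta-analysis. The sketch below illustrates one generic way to do this, and is not HERON's algorithm or API: it groups runs of consecutive significant probes and combines their p-values with Fisher's method. The function names and the `alpha` cutoff are hypothetical. Fisher's method assumes independent tests, which overlapping tiled peptides do not strictly satisfy, so the combined values are illustrative only.

```python
import math


def fisher_combine(pvalues):
    """Combine p-values with Fisher's method (assumes independent tests).

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose survival function has a closed form.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i  # (X/2)^i / i!
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def call_epitopes(probe_pvalues, alpha=0.05):
    """Group consecutive probes with p <= alpha into candidate epitopes.

    probe_pvalues must be ordered by tiling position along one protein.
    Returns a list of (first_index, last_index, combined_p) tuples.
    """
    epitopes, run_start = [], None
    for i, p in enumerate(probe_pvalues):
        if p <= alpha and run_start is None:
            run_start = i  # a new run of significant probes begins
        elif p > alpha and run_start is not None:
            run = probe_pvalues[run_start:i]
            epitopes.append((run_start, i - 1, fisher_combine(run)))
            run_start = None
    if run_start is not None:  # close a run that reaches the last probe
        run = probe_pvalues[run_start:]
        epitopes.append((run_start, len(probe_pvalues) - 1, fisher_combine(run)))
    return epitopes


# Hypothetical probe-level p-values along one protein.
example = [0.5, 0.01, 0.02, 0.6, 0.04]
calls = call_epitopes(example)
```

A protein-level value could be obtained the same way by combining the epitope-level values for each protein, followed by a multiple-testing correction across all proteins.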
Title: Ranking Antibody Binding Epitopes and Proteins Across Samples from Whole Proteome Tiled Linear Peptides. Journal: Bioinformatics (Oxford, England).