Bioinformatics: Latest Articles

Hi-GeoMVP: a hierarchical geometry-enhanced deep learning model for drug response prediction.
IF 5.8 | Zone 3 (Biology) | Q1 Mathematics | Pub Date: 2024-04-13 | DOI: 10.1093/bioinformatics/btae204
Yurui Chen, Louxin Zhang
MOTIVATION: Personalized cancer treatments require accurate drug response predictions. Existing deep learning methods show promise, but higher accuracy is needed to serve the purpose of precision medicine. Prediction accuracy can be improved by using not only topological but also geometrical information about drugs.
RESULTS: A novel deep learning methodology for drug response prediction, named Hi-GeoMVP, is presented. It synthesizes hierarchical drug representation with multi-omics data, leveraging graph neural networks and variational autoencoders for detailed drug and cell line representations. Multi-task learning is employed to make better predictions, while both 2D and 3D molecular representations capture comprehensive drug information. Testing on the GDSC dataset confirms Hi-GeoMVP's enhanced performance: it surpasses prior state-of-the-art methods, improving the Pearson correlation coefficient from 0.934 to 0.941 and decreasing the root mean square error from 0.969 to 0.931. In blind tests, Hi-GeoMVP demonstrated robustness, outperforming the best previous models with a superior Pearson correlation coefficient in the drug-blind test. These results underscore Hi-GeoMVP's capabilities in drug response prediction, implying its potential for precision medicine.
AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/matcyr/Hi-GeoMVP.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
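The two scores quoted in this abstract, the Pearson correlation coefficient and the root mean square error, are standard regression metrics. As an illustration only (this is not code from the Hi-GeoMVP repository; the function names and toy values are assumptions), they could be computed in plain Python roughly as follows:

```python
import math


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Toy values, purely illustrative (not GDSC data).
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(pearson(observed, predicted), rmse(observed, predicted))
```

A higher Pearson coefficient and a lower RMSE both indicate better agreement between predicted and measured responses, which is the direction of improvement the abstract reports.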
Citations: 0
NeoAgDT: Optimization of personal neoantigen vaccine composition by digital twin simulation of a cancer cell population.
IF 5.8 | Zone 3 (Biology) | Q1 Mathematics | Pub Date: 2024-04-13 | DOI: 10.1093/bioinformatics/btae205
Anja Mösch, Filippo Grazioli, Pierre Machart, Brandon Malone
MOTIVATION: Neoantigen vaccines make use of tumor-specific mutations to enable the patient's immune system to recognize and eliminate cancer. Selecting vaccine elements, however, is a complex task that needs to take into account not only the underlying antigen presentation pathway but also tumor heterogeneity.
RESULTS: Here, we present NeoAgDT, a two-step approach consisting of: (1) simulating individual cancer cells to create a digital twin of the patient's tumor cell population and (2) optimizing the vaccine composition by integer linear programming based on this digital twin. NeoAgDT shows improved selection of experimentally validated neoantigens over ranking-based approaches in a study of seven patients.
AVAILABILITY: The NeoAgDT code is published on GitHub: https://github.com/nec-research/neoagdt.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
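To make the second step more concrete, here is a toy sketch of a 0/1 selection problem over a simulated cell population. Everything in it is an assumption for illustration: the objective (maximize the number of simulated cells presenting at least one selected neoantigen), the fixed budget, and the names are hypothetical, and the exhaustive search stands in for an integer linear programming solver. It is not the NeoAgDT formulation or code.

```python
from itertools import combinations


def best_vaccine(cells, candidates, budget):
    """Pick `budget` candidates that cover the most simulated cells.

    cells: list of sets, each holding the neoantigens one simulated cell presents.
    candidates: set of neoantigen identifiers that may go into the vaccine.
    Exhaustive search over all subsets; only feasible for tiny toy inputs.
    """
    k = min(budget, len(candidates))
    best_set, best_cov = (), -1
    for combo in combinations(sorted(candidates), k):
        chosen = set(combo)
        covered = sum(1 for presented in cells if presented & chosen)
        if covered > best_cov:
            best_set, best_cov = combo, covered
    return best_set, best_cov


# Hypothetical heterogeneous population: three cells present A and B, two present only C.
cells = [{"A", "B"}, {"A", "B"}, {"A", "B"}, {"C"}, {"C"}]
print(best_vaccine(cells, {"A", "B", "C"}, budget=2))
```

In this toy population, ranking candidates by how many cells present them would pick A and B (three cells each) and miss the C-only subclone, whereas the set-level optimization covers all five cells. That is the kind of effect of tumor heterogeneity the abstract alludes to, shown here only on invented data.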
Citations: 0
A data-adaptive methods in detecting exogenous methyltransferase accessible chromatin in human genome using nanopore sequencing.
IF 5.8 | Zone 3 (Biology) | Q1 Mathematics | Pub Date: 2024-04-13 | DOI: 10.1093/bioinformatics/btae206
Kailing Tu, Xuemei Li, Qilin Zhang, Wei Huang, Dan Xie
MOTIVATION: Identifying chromatin accessibility is one of the key steps in studying the regulation of eukaryotic genomes. The combination of exogenous methyltransferase and nanopore sequencing provides a strategy to identify open chromatin over long genomic ranges at the single-molecule scale. However, endogenous methylation, non-open-chromatin-specific exogenous methylation and base-calling errors limit its accuracy and hinder its application to complex genomes.
RESULTS: We systematically evaluated the impact of these three factors and developed a model-based computational method, methyltransferase accessible genome region finder (MAGNIFIER), to address these issues. By incorporating control data, MAGNIFIER attenuates the three factors with a data-adaptive comparison strategy. We demonstrate that MAGNIFIER not only identifies open chromatin sensitively and with much improved accuracy, but also detects the chromatin accessibility of repetitive regions that are missed by NGS-based methods. By incorporating long-read RNA-seq data, we revealed the association between accessible Alu elements and non-classic gene isoforms.
AVAILABILITY: Freely available on the web at https://github.com/Goatofmountain/MAGNIFIER.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Citations: 0
Literature mining discerns latent disease-gene relationships.
IF 5.8 | Zone 3 (Biology) | Q1 Mathematics | Pub Date: 2024-04-12 | DOI: 10.1093/bioinformatics/btae185
Priyadarshini Rai, Atishay Jain, Shivani Kumar, Divya Sharma, Neha Jha, Smriti Chawla, Abhijith S. Raj, Apoorva Gupta, Sarita Poonia, A. Majumdar, Tanmoy Chakraborty, Gaurav Ahuja, Debarka Sengupta
MOTIVATION: Dysregulation of a gene's function, either due to mutations or impairments in regulatory networks, often triggers pathological states in the affected tissue. Comprehensive mapping of these apparent gene-pathology relationships is an ever-daunting task, primarily due to genetic pleiotropy and lack of suitable computational approaches. With the advent of high-throughput genomics platforms and community-scale initiatives such as the Human Cell Landscape (HCL) project (Han et al., 2020), researchers have been able to create gene expression portraits of healthy tissues resolved at the level of single cells. However, a similar wealth of knowledge is currently not at our fingertips when it comes to diseases. This is because the genetic manifestation of a disease is often quite diverse and is confounded by several clinical and demographic covariates.
RESULTS: To circumvent this, we mined ∼18 million PubMed abstracts published up to May 2019 and automatically selected ∼4.5 million of them that describe roles of particular genes in disease pathogenesis. Further, we fine-tuned the pretrained Bidirectional Encoder Representations from Transformers (BERT) for language modeling from the domain of Natural Language Processing (NLP) to learn vector representations of entities such as genes, diseases, tissues, cell types, etc., in such a way that their relationships are preserved in a vector space. The repurposed BERT predicted disease-gene associations that are not cited in the training data, thereby highlighting the feasibility of in silico synthesis of hypotheses linking different biological entities such as genes and conditions.
AVAILABILITY AND IMPLEMENTATION: PathoBERT pretrained model: https://github.com/Priyadarshini-Rai/Pathomap-Model. BioSentVec based abstract classification model: https://github.com/Priyadarshini-Rai/Pathomap-Model. Pathomap R package: https://github.com/Priyadarshini-Rai/Pathomap.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
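One common way to use entity vectors of this kind is to rank candidates by cosine similarity to a query vector. The sketch below shows that idea on invented two-dimensional vectors; the entity names, the vectors, and the choice of cosine similarity are all assumptions for illustration and are not taken from the PathoBERT or Pathomap code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, candidates):
    """Return candidate names sorted from most to least similar to `query`."""
    return sorted(candidates, key=lambda name: cosine(query, candidates[name]), reverse=True)


# Hypothetical embeddings; real models produce vectors with hundreds of dimensions.
disease_vec = [1.0, 0.0]
gene_vecs = {
    "gene_a": [1.0, 0.1],
    "gene_b": [0.0, 1.0],
    "gene_c": [0.5, 0.5],
}
print(rank_by_similarity(disease_vec, gene_vecs))
```

Genes whose vectors point in nearly the same direction as the disease vector rank first; whether such a ranking reflects a real biological association depends entirely on how the vectors were learned.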
Citations: 0
scDAC: deep adaptive clustering of single-cell transcriptomic data with coupled autoencoder and dirichlet process mixture model.
IF 5.8 | Zone 3 (Biology) | Q1 Mathematics | Pub Date: 2024-04-11 | DOI: 10.1093/bioinformatics/btae198
Sijing An, Jinhui Shi, Runyan Liu, Yaowen Chen, Jing Wang, Shuofeng Hu, Xinyu Xia, Guohua Dong, Xiaochen Bo, Zhen He, Xiaomin Ying
MOTIVATION: Clustering analysis for single-cell RNA sequencing (scRNA-seq) data is an important step in revealing cellular heterogeneity. Many clustering methods have been proposed to discover heterogeneous cell types from scRNA-seq data. However, adaptive clustering of large-scale scRNA-seq data, with an accurate cluster number that reflects the intrinsic biology, remains quite challenging.
RESULTS: Here we propose a single-cell Deep Adaptive Clustering (scDAC) model by coupling the Autoencoder (AE) and the Dirichlet Process Mixture Model (DPMM). By jointly optimizing the model parameters of AE and DPMM, scDAC achieves adaptive clustering with accurate cluster numbers on scRNA-seq data. We verify the performance of scDAC on five subsampled datasets with different numbers of cell types and compare it with fifteen widely used clustering methods across nine scRNA-seq datasets. Our results demonstrate that scDAC can adaptively find accurate numbers of cell types or subtypes and outperforms other methods. Moreover, the performance of scDAC is robust to hyperparameter changes.
AVAILABILITY: scDAC is implemented in Python. The source code is available at https://github.com/labomics/scDAC.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
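The reason a Dirichlet process mixture can adapt the number of clusters is that its prior does not fix that number in advance. A minimal sketch of this property, using the Chinese restaurant process view of the Dirichlet process, is given below. It samples cluster assignments from the prior only; it does not use any data, autoencoder, or the joint optimization described in the abstract, and the function name and parameters are assumptions rather than part of scDAC.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels from a Chinese restaurant process prior.

    Each new item joins an existing cluster with probability proportional to
    that cluster's size, or opens a new cluster with probability proportional
    to `alpha` (which must be > 0).
    """
    rng = random.Random(seed)
    counts = []   # counts[k] = number of items currently in cluster k
    labels = []   # labels[i] = cluster index assigned to item i
    for _ in range(n_items):
        weights = counts + [alpha]  # last slot stands for "open a new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


labels, counts = crp_assignments(n_items=200, alpha=1.0)
print(len(counts), counts)
```

Larger values of `alpha` tend to produce more clusters. In a full mixture model the data likelihood also enters the assignment probabilities, so the number of occupied clusters is driven by the data rather than by the prior alone.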
Citations: 0
VirusPredictor: XGBoost-based software to predict virus-related sequences in human data.
IF 5.8 | Zone 3 (Biology) | Q1 Mathematics | Pub Date: 2024-04-10 | DOI: 10.1093/bioinformatics/btae192
Guangchen Liu, Xun Chen, Yihui Luan, Dawei Li
MOTIVATION: Discovering disease-causing pathogens, particularly viruses without reference genomes, poses a technical challenge as they are often unidentifiable through sequence alignment. Machine learning prediction of patient high-throughput sequences unmappable to human and pathogen genomes may reveal sequences originating from uncharacterized viruses. Currently, there is a lack of software specifically designed for accurately predicting such viral sequences in human data.
RESULTS: We developed a fast XGBoost method and software, VirusPredictor, leveraging an in-house viral genome database. Our two-step XGBoost models first classify each query sequence into one of three groups: infectious virus, endogenous retrovirus (ERV) or non-ERV human. The prediction accuracies increased as the sequences became longer, i.e., 0.76, 0.93, and 0.98 for 150-350 bp (Illumina short reads), 850-950 bp (Sanger sequencing data), and 2,000-5,000 bp sequences, respectively. Then, sequences predicted to be from infectious viruses are further classified into one of six virus taxonomic subgroups, and the accuracies increased from 0.92 to > 0.98 when query sequences increased from 150-350 to > 850 bp. The results suggest that Illumina short reads should be de novo assembled into contigs (e.g., ∼1,000 bp or longer) before prediction whenever possible. We applied VirusPredictor to multiple real genomic and metagenomic datasets and obtained high accuracies. VirusPredictor, a user-friendly open-source Python software, is useful for predicting the origins of patients' unmappable sequences. This study is the first to classify ERVs in infectious viral sequence prediction. This is also the first study combining virus sub-group predictions.
AVAILABILITY: www.dllab.org/software/VirusPredictor.html.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Citations: 0
rnaCrosslinkOO: An Object-Oriented R Package for the Analysis of RNA Structural Data Generated by RNA Crosslinking Experiments.
IF 5.8 | Zone 3 (Biology) | Q1 Mathematics | Pub Date: 2024-04-10 | DOI: 10.1093/bioinformatics/btae193
Jonathan L. Price, Omer Ziv, M. Pinckert, Andrew Lim, Eric A. Miska
SUMMARY: RNA (Ribonucleic Acid) molecules have secondary and tertiary structures in vivo which play a crucial role in cellular processes such as the regulation of gene expression, RNA processing and localisation. The ability to investigate these structures will enhance our understanding of their function and contribute to the diagnosis and treatment of diseases caused by RNA dysregulation. However, there are no mature pipelines or packages for processing and analysing complex in vivo RNA structural data. Here, we present rnaCrosslinkOO (RNA Crosslink Object-Oriented), a novel software package for the comprehensive analysis of data derived from the COMRADES (Crosslinking of Matched RNA and Deep Sequencing) method. rnaCrosslinkOO offers a comprehensive pipeline from raw sequencing reads to the identification and comparison of RNA structural features. It includes read processing and alignment, clustering of duplexes, data exploration, folding and comparisons of RNA structures. rnaCrosslinkOO also enables comparisons between conditions, the identification of inter-RNA interactions, and the incorporation of reactivity data to improve structure prediction.
AVAILABILITY AND IMPLEMENTATION: rnaCrosslinkOO is freely available to non-commercial users and implemented in R, with the source code and documentation accessible at https://CRAN.R-project.org/package=rnaCrosslinkOO. The software is supported on Linux, macOS, and Windows platforms.
引用次数: 0
shinyseg: a web application for flexible cosegregation and sensitivity analysis.
IF 5.8 Zone 3 (Biology) Q1 Mathematics Pub Date: 2024-04-10 DOI: 10.1093/bioinformatics/btae201
Christian Carrizosa, Dag E Undlien, Magnus D Vigeland
MOTIVATION: Cosegregation analysis is a powerful tool for identifying pathogenic genetic variants, but its implementation remains challenging. Existing software is either limited in scope or too demanding for many end users. Moreover, current solutions lack methods for assessing the robustness of cosegregation evidence, which is important due to its reliance on uncertain estimates.
RESULTS: We present shinyseg, a comprehensive web application for clinical cosegregation analysis. Our app streamlines penetrance specification based on either liability classes or epidemiological data such as risks, hazard ratios, and age of onset distribution. In addition, it incorporates sensitivity analyses to assess the robustness of cosegregation evidence, and offers support in clinical interpretation.
AVAILABILITY AND IMPLEMENTATION: The shinyseg app is freely available at https://chrcarrizosa.shinyapps.io/shinyseg, with documentation and complete R source code on https://chrcarrizosa.github.io/shinyseg and https://github.com/chrcarrizosa/shinyseg.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
A web-based platform for extracting and modeling knowledge from biomedical literature as a labeled graph.
IF 5.8 Zone 3 (Biology) Q1 Mathematics Pub Date: 2024-04-10 DOI: 10.1093/bioinformatics/btae194
Antonio Di Maria, Lorenzo Bellomo, Fabrizio Billeci, Alfio Cardillo, S. Alaimo, Paolo Ferragina, Alfredo Ferro, A. Pulvirenti
MOTIVATION: The rapid increase of biomedical literature makes it harder and harder for scientists to keep pace with the discoveries on which they build their studies. Therefore, computational tools have become more widespread, among which network analysis plays a crucial role in several life-science contexts. Nevertheless, building correct and complete networks about user-defined biomedical topics on top of the available literature is still challenging.
RESULTS: We introduce NetMe 2.0, a web-based platform that automatically extracts relevant biomedical entities and their relations from a set of input texts (i.e., full texts or abstracts of PubMed Central papers, free texts, or PDFs uploaded by users) and models them as a BioMedical Knowledge Graph (BKG). NetMe 2.0 also implements an innovative Retrieval Augmented Generation module (Graph-RAG) that works on top of the relationships modeled by the BKG and allows the distilling of well-formed sentences that explain their content. The experimental results show that NetMe 2.0 can infer comprehensive and reliable biological networks with significant Precision-Recall metrics when compared to state-of-the-art approaches.
AVAILABILITY: https://netme.click/
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics.
VarChat: the generative AI assistant for the interpretation of human genomic variations.
IF 5.8 Zone 3 (Biology) Q1 Mathematics Pub Date: 2024-04-05 DOI: 10.1093/bioinformatics/btae183
F. De Paoli, Silvia Berardelli, I. Limongelli, E. Rizzo, S. Zucca
MOTIVATION: In the modern era of genomic research, the scientific community is witnessing an explosive growth in the volume of published findings. While this abundance of data offers invaluable insights, it also places a pressing responsibility on genetic professionals and researchers to stay informed about the latest findings and their clinical significance. Genomic variant interpretation currently faces the challenge of identifying the most up-to-date and relevant scientific papers, while also extracting meaningful information to accelerate the process from clinical assessment to reporting. Computer-aided literature search and summarization can play a pivotal role in this context. By synthesizing complex genomic findings into concise, interpretable summaries, this approach facilitates the translation of extensive genomic datasets into clinically relevant insights.
RESULTS: To bridge this gap, we present VarChat (varchat.engenome.com), an innovative tool based on generative AI, developed to find the fragmented scientific literature associated with genomic variants and summarize it into brief yet informative texts. VarChat provides users with a concise description of specific genetic variants, detailing their impact on related proteins and possible effects on human health. Additionally, VarChat offers direct links to related, trustworthy scientific sources, and encourages deeper research.
AVAILABILITY: varchat.engenome.com