Journal of Bioinformatics and Computational Biology: Latest Articles

PCA-constrained multi-core matrix fusion network: A novel approach for cancer subtype identification.
IF 0.9 | CAS Zone 4 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-08-01 | Epub Date: 2024-08-24 | DOI: 10.1142/S0219720024500148
Min Li, Zhifang Qi, Liang Liu, Mingzhu Lou, Shaobo Deng

Cancer subtyping is the categorization of a particular cancer type into distinct subtypes or subgroups based on molecular characteristics, clinical manifestations, histological features, and other relevant factors. Identifying cancer subtypes can substantially improve precision in clinical practice and support personalized diagnosis and treatment. Numerous network fusion methods for identifying cancer subtypes have emerged in recent years. However, most of these fusion algorithms rely solely on a fusion network built from a single core matrix and therefore fail to capture similarity comprehensively. To address this issue, we propose a novel cancer subtype recognition method, the PCA-constrained multi-core matrix fusion network (PCA-MM-FN). PCA-MM-FN first uses three distinct methods to obtain three core matrices. It then projects these core matrices into a shared subspace using principal component analysis (PCA) and performs a weighted network fusion. Finally, spectral clustering is applied to the fused network. Experiments on the mRNA expression, DNA methylation, and miRNA expression data of five TCGA datasets and three multi-omics benchmark datasets show that PCA-MM-FN identifies cancer subtypes more accurately than existing methods.
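The abstract does not specify which three methods produce the core matrices, how the PCA constraint is applied, or how the fusion weights are chosen. The following is a minimal sketch of the general fuse-then-cluster idea only. It assumes three generic similarity kernels (RBF, cosine, and Pearson correlation), fixed user-supplied weights, and no PCA projection at all. The function names are hypothetical, and this is not the authors' PCA-MM-FN implementation. Spectral clustering here uses the symmetric normalized Laplacian followed by a small deterministic k-means.

```python
import numpy as np

# Hypothetical kernels standing in for the paper's unspecified "three distinct methods".
def rbf_kernel(X: np.ndarray) -> np.ndarray:
    """Gaussian (RBF) similarity, bandwidth set by the median squared-distance heuristic."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    gamma = 1.0 / np.median(d2[d2 > 0])
    return np.exp(-gamma * d2)

def cosine_kernel(X: np.ndarray) -> np.ndarray:
    """Cosine similarity between samples (rows), with negative values clipped to zero."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return np.clip(Xn @ Xn.T, 0.0, None)

def correlation_kernel(X: np.ndarray) -> np.ndarray:
    """Pearson correlation between samples (rows), clipped to be non-negative."""
    return np.clip(np.corrcoef(X), 0.0, None)

def fuse(kernels, weights) -> np.ndarray:
    """Weighted average of similarity matrices (weights are normalized to sum to 1)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

def spectral_clustering(W: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Cluster samples using the k smallest eigenvectors of the normalized Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    U = vecs[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize the embedding
    # Deterministic farthest-point initialization, then Lloyd's k-means iterations.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    C = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new_C = np.array([U[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels
```

With two well-separated synthetic groups of samples, the bottom two Laplacian eigenvectors of the fused network separate the groups, so each group receives a single label. Averaging several kernels is meant to let each one compensate for similarity structure the others miss, which is the motivation the abstract gives for using multiple core matrices.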

Article number: 2450014
Citations: 0
Integration of autoencoder and graph convolutional network for predicting breast cancer drug response.
IF 0.9 | CAS Zone 4 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-06-01 | DOI: 10.1142/S0219720024500136
V Abinas, U Abhinav, E M Haneem, A Vishnusankar, K A Abdul Nazeer

Background and objectives: Breast cancer is the most prevalent type of cancer among women. The effectiveness of anticancer drug therapy can be adversely affected by tumor heterogeneity, including genetic and transcriptomic features, which leads to clinical variability in how patients respond to therapeutic drugs. Both anticancer drug design and the understanding of cancer require precise identification of cancer drug responses. Drug response prediction models can be improved by integrating multi-omics data with drug structure data.

Methods: In this paper, we propose AGCN, which combines an Autoencoder (AE) and a Graph Convolutional Network (GCN) for drug response prediction by integrating multi-omics data and drug structure data. Specifically, we first use a separate AE for each omics data set to reduce its high-dimensional representation to a lower-dimensional one. These features are then combined with drug structure features obtained from a GCN and passed to a Convolutional Neural Network, which predicts IC50 values for every combination of cell line and drug. Next, a threshold IC50 value is obtained for each drug by K-means clustering of its known IC50 values. Finally, using this threshold, each cell line is classified as either sensitive or resistant to each drug.

Results: AGCN achieves an accuracy of 0.82 and outperforms many existing methods. In an external validation on data from The Cancer Genome Atlas (TCGA) clinical database, AGCN achieved an accuracy of 0.91.

Conclusion: Concatenating multi-omics data with drug structure data using AGCN substantially improves the accuracy of drug response prediction.
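The thresholding step can be illustrated in isolation. The abstract says only that each drug's threshold comes from K-means clustering of its known IC50 values; the sketch below assumes two clusters, clustering on log10(IC50), and a threshold at the midpoint between the two centroids. All three choices, and the function names, are illustrative assumptions rather than AGCN's documented procedure.

```python
import math
from typing import List, Sequence, Tuple

def two_means_1d(xs: Sequence[float], n_iter: int = 100) -> Tuple[float, float]:
    """Lloyd's k-means with k = 2 on 1-D data; returns (low_centroid, high_centroid)."""
    c_low, c_high = min(xs), max(xs)  # initialize at the extremes
    for _ in range(n_iter):
        low = [x for x in xs if abs(x - c_low) <= abs(x - c_high)]
        high = [x for x in xs if abs(x - c_low) > abs(x - c_high)]
        new_low = sum(low) / len(low) if low else c_low
        new_high = sum(high) / len(high) if high else c_high
        if (new_low, new_high) == (c_low, c_high):
            break  # assignments have stabilized
        c_low, c_high = new_low, new_high
    return c_low, c_high

def ic50_threshold(known_ic50: Sequence[float]) -> float:
    """Assumed rule: midpoint of the two log10(IC50) centroids, mapped back to IC50 units."""
    logs = [math.log10(v) for v in known_ic50]
    c_low, c_high = two_means_1d(logs)
    return 10 ** ((c_low + c_high) / 2)

def label_cell_lines(predicted_ic50: Sequence[float], threshold: float) -> List[str]:
    """Lower IC50 means less drug is needed for 50% inhibition, i.e. more sensitive."""
    return ["sensitive" if v < threshold else "resistant" for v in predicted_ic50]
```

For a drug whose known IC50 values form a low group (about 0.01 to 0.02) and a high group (about 5 to 8), the threshold lands between them, near 0.3, so a predicted IC50 of 0.05 is labeled sensitive and 3.0 resistant. Clustering in log space is assumed because IC50 values typically span several orders of magnitude.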

Vol. 22, No. 3, article number: 2450013
Citations: 0
Gtie-Rt: A comprehensive graph learning model for predicting drugs targeting metabolic pathways in human.
IF 0.9 | CAS Zone 4 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-06-01 | Epub Date: 2024-07-20 | DOI: 10.1142/S0219720024500100
Hayat Ali Shah, Juan Liu, Zhihui Yang

Drugs often target specific metabolic pathways to produce a therapeutic effect. However, these pathways are complex and interconnected, which makes it challenging to predict a drug's potential effects on an organism's overall metabolism. Mapping drugs to the metabolic pathways they target can provide a more complete understanding of a drug's metabolic effects and help identify potential drug-drug interactions. In this study, we propose a hybrid machine learning model, the Graph Transformer Integrated Encoder (GTIE-RT), for mapping drugs to their target metabolic pathways in humans. The model combines a Graph Convolutional Network (GCN) for graph embedding with a transformer encoder that provides an attention mechanism. The output of the transformer encoder is then fed into an Extremely Randomized Trees classifier to predict the target metabolic pathways. On a drug dataset, GTIE-RT achieves high accuracy (>95%), recall (>92%), precision (>93%), and F1-score (>92%). Compared with other variants and machine learning methods, GTIE-RT consistently gives more reliable results.
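GTIE-RT's first stage embeds each drug's molecular graph with a GCN. As an illustration of that step only, the sketch below implements the standard GCN propagation rule of Kipf and Welling, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), where D is the degree matrix of A + I, followed by mean pooling into a single graph-level vector. The toy three-node graph, the one-hot node features, the random (untrained) weights, and the mean pooling are all assumptions. The transformer encoder and the Extremely Randomized Trees classifier (scikit-learn provides one as `ExtraTreesClassifier`) are omitted.

```python
import numpy as np

def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, with D the degree matrix of A + I."""
    A_hat = A + np.eye(A.shape[0])  # add self-loops so each node keeps its own features
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def gcn_layer(A_norm: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One propagation step: aggregate neighbor features, project with W, apply ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)

def graph_embedding(A: np.ndarray, X: np.ndarray, weights) -> np.ndarray:
    """Stack GCN layers over node features X, then mean-pool the nodes into one vector."""
    A_norm = normalized_adjacency(A)
    H = X
    for W in weights:
        H = gcn_layer(A_norm, H, W)
    return H.mean(axis=0)

# Toy 3-node chain graph (0-1-2) with hypothetical one-hot node features.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.eye(3)
rng = np.random.default_rng(0)
embedding = graph_embedding(A, X, [rng.standard_normal((3, 4)), rng.standard_normal((4, 2))])
```

In a full pipeline the weights would be learned and the resulting per-drug vectors would feed the downstream encoder and classifier. Here the output is simply a non-negative two-dimensional vector per graph.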

Article number: 2450010
Citations: 0
Construction of transcript regulation mechanism prediction models based on binding motif environment of transcription factor AoXlnR in Aspergillus oryzae.
IF 0.9 | CAS Zone 4 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-06-01 | DOI: 10.1142/S0219720024500173
Hiroya Oka, Takaaki Kojima, Ryuji Kato, Kunio Ihara, Hideo Nakano

DNA-binding transcription factors (TFs) play a central role in transcriptional regulation, mainly by binding specifically to target sites on the genome and regulating the expression of downstream genes. A comprehensive analysis of TF function can therefore help elucidate a wide range of biological mechanisms. However, TF functions in vivo are diverse and complicated, and binding sites identified on the genome are not necessarily involved in regulating downstream gene expression. In this study, we investigated whether DNA structural information around TF binding sites can predict whether a binding site is involved in regulating the expression of genes located downstream of it. Specifically, we calculated structural parameters from the DNA shape around the DNA-binding motif located upstream of genes whose expression is directly regulated by a single TF, AoXlnR, from Aspergillus oryzae. We showed that machine learning models incorporating these parameters can predict the presence or absence of expression regulation from sequence information with high accuracy ([Formula: see text]-1.0).

Vol. 22, No. 3, article number: 2450017
Citations: 0
NDMNN: A novel deep residual network based MNN method to remove batch effects from scRNA-seq data.
IF 0.9 | CAS Zone 4 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-06-01 | Epub Date: 2024-07-20 | DOI: 10.1142/S021972002450015X
Yupeng Ma, Yongzhen Pei

The rapid development of single-cell RNA sequencing (scRNA-seq) technology has generated vast amounts of data. However, these data often exhibit batch effects arising from factors such as different time points, experimental personnel, and instruments, which can obscure genuine biological differences in the data. Based on the characteristics of scRNA-seq data, we designed a dense deep residual network model, referred to as NDnetwork. We then combined NDnetwork with the mutual nearest neighbors (MNN) method to correct batch effects in scRNA-seq data, and named the result the NDMNN method. Comprehensive experimental results demonstrate that NDMNN outperforms commonly used existing methods for correcting batch effects in scRNA-seq data. As single-cell sequencing continues to grow in scale, we believe that NDMNN will be a valuable tool for researchers who need to correct batch effects in their studies. The source code and experimental results of NDMNN are available at https://github.com/mustang-hub/NDMNN.
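NDMNN builds on the mutual nearest neighbors (MNN) idea. The sketch below shows only that underlying idea, not NDMNN or its NDnetwork: a cell in batch A and a cell in batch B are paired when each is among the other's k nearest neighbors, and batch B is then shifted by the average difference across pairs. The original MNN method computes a Gaussian-smoothed correction vector for each cell instead. The single global shift, the brute-force Euclidean search, and the function names here are simplifying assumptions.

```python
import math
from typing import List, Sequence, Set, Tuple

Cell = Sequence[float]  # one cell's (already normalized) expression vector

def knn_indices(query: Cell, reference: Sequence[Cell], k: int) -> List[int]:
    """Indices of the k cells in `reference` closest to `query` (Euclidean, brute force)."""
    order = sorted(range(len(reference)), key=lambda j: math.dist(query, reference[j]))
    return order[:k]

def mnn_pairs(batch_a: Sequence[Cell], batch_b: Sequence[Cell], k: int) -> Set[Tuple[int, int]]:
    """Pairs (i, j) where b_j is among a_i's k nearest neighbors and vice versa."""
    a_to_b = [set(knn_indices(x, batch_b, k)) for x in batch_a]
    b_to_a = [set(knn_indices(y, batch_a, k)) for y in batch_b]
    return {(i, j) for i in range(len(batch_a)) for j in a_to_b[i] if i in b_to_a[j]}

def correct_batch_b(batch_a: Sequence[Cell], batch_b: Sequence[Cell],
                    pairs: Set[Tuple[int, int]]) -> List[List[float]]:
    """Shift every cell of batch B by the mean (b_j - a_i) difference over all MNN pairs."""
    if not pairs:
        raise ValueError("no mutual nearest neighbors found")
    dim = len(batch_a[0])
    shift = [sum(batch_b[j][d] - batch_a[i][d] for i, j in pairs) / len(pairs)
             for d in range(dim)]
    return [[y[d] - shift[d] for d in range(dim)] for y in batch_b]
```

For example, if batch B is batch A offset by (1, 1), with the cells far apart relative to that offset, each cell pairs with its counterpart at k = 1, and the correction removes the offset exactly. Restricting pairs to mutual neighbors is what makes the approach tolerant of cell populations that occur in only one batch, since such cells tend not to find a mutual partner.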

Article number: 2450015
Citations: 0
How much can ChatGPT really help computational biologists in programming?
IF 1 | CAS Zone 4 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-04-01 | Epub Date: 2024-05-22 | DOI: 10.1142/S021972002471001X
Chowdhury Rafeed Rahman, Limsoon Wong

ChatGPT, a recently developed product by OpenAI, has made its mark as a multi-purpose, natural-language-based chatbot. In this paper, we analyze its potential in the field of computational biology. Much of the work computational biologists do today involves coding bioinformatics algorithms, analyzing data, creating pipeline scripts, and even machine learning modeling and feature extraction. This paper focuses on the potential influence (both positive and negative) of ChatGPT on these activities, with illustrative examples from different perspectives. Compared with other fields of computer science, computational biology has (1) fewer coding resources, (2) more sensitivity and bias issues (as it deals with medical data), and (3) a greater need for coding assistance (as people from diverse backgrounds enter the field). With these issues in mind, we cover use cases such as code writing, reviewing, debugging, converting, refactoring, and pipelining with ChatGPT from the perspective of computational biologists.

Article number: 2471001
Citations: 0
Learning Long- and Short-term Dependencies for Improving Drug-Target Binding Affinity Prediction using Transformer and Edge Contraction Pooling
IF 1 | CAS Zone 4 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-12-15 | DOI: 10.1142/s0219720023500300
Min Gao, Shaohua Jiang, Weibin Ding, Ting Xu, Zhijian Lyu
Citations: 0
Predictive Recognition of DNA-binding proteins based on Pre-trained Language Model BERT
IF 1 | CAS Zone 4 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-12-08 | DOI: 10.1142/s0219720023500282
Yue Ma, Yongzhen Pei, Changguo Li
Citations: 0
Imputation for single-cell RNA-seq data with non-negative matrix factorization and transfer learning
IF 1 | CAS Zone 4 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-12-08 | DOI: 10.1142/s0219720023500294
Jiadi Zhu, Youlong Yang
Citations: 0
Algorithms for the Uniqueness of the Longest Common Subsequence.
IF 1 | CAS Zone 4 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-12-01 | Epub Date: 2024-01-10 | DOI: 10.1142/S0219720023500270
Yue Wang

Given several number sequences, determining their longest common subsequence is a classical problem in computer science. This problem has applications in bioinformatics, especially in determining transposable genes. However, existing work only considers how to find one longest common subsequence. In this paper, we consider how to determine whether the longest common subsequence is unique. If there are multiple longest common subsequences, we also determine which numbers appear in all, some, or none of them. We focus on four scenarios: (1) linear sequences without duplicated numbers; (2) circular sequences without duplicated numbers; (3) linear sequences with duplicated numbers; and (4) circular sequences with duplicated numbers. We develop corresponding algorithms and apply them to gene sequencing data.
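To make scenario (1) concrete, here is a minimal sketch for two linear sequences without duplicated numbers. It is not the author's algorithm and does not handle more than two sequences, circular sequences, or duplicates. It counts longest common subsequences (LCSs) with a prefix dynamic program, using inclusion-exclusion to avoid double counting. It then repeats the DP on the reversed sequences to obtain suffix counts, and classifies each number as appearing in all, some, or none of the LCSs by comparing the number of LCSs that pass through its matched position with the total. Without duplicates, each common subsequence corresponds to exactly one set of matched positions, which is why counting position matchings counts distinct LCSs. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

def _lcs_tables(a: List[int], b: List[int]) -> Tuple[List[List[int]], List[List[int]]]:
    """length[i][j]: LCS length of a[:i], b[:j]; count[i][j]: number of LCS matchings."""
    n, m = len(a), len(b)
    length = [[0] * (m + 1) for _ in range(n + 1)]
    count = [[1] * (m + 1) for _ in range(n + 1)]  # empty prefix: one (empty) LCS
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = a[i - 1] == b[j - 1]
            best = length[i - 1][j - 1] + 1 if match else max(length[i - 1][j], length[i][j - 1])
            c = count[i - 1][j - 1] if match else 0  # LCSs that use the pair (i, j)
            if length[i - 1][j] == best:
                c += count[i - 1][j]
            if length[i][j - 1] == best:
                c += count[i][j - 1]
            if length[i - 1][j - 1] == best:
                c -= count[i - 1][j - 1]  # counted in both terms above
            length[i][j], count[i][j] = best, c
    return length, count

def lcs_uniqueness(a: List[int], b: List[int]) -> Tuple[bool, Dict[int, str]]:
    """Return (is_unique, {number in a: 'all' | 'some' | 'none'})."""
    if len(set(a)) != len(a) or len(set(b)) != len(b):
        raise ValueError("this sketch only handles sequences without duplicated numbers")
    n, m = len(a), len(b)
    fl, fc = _lcs_tables(a, b)               # prefix tables
    rl, rc = _lcs_tables(a[::-1], b[::-1])   # reversed prefixes = original suffixes
    total_len, total_cnt = fl[n][m], fc[n][m]
    pos_in_b = {x: q for q, x in enumerate(b)}
    status: Dict[int, str] = {}
    for p, x in enumerate(a):
        through = 0  # number of LCSs that match a[p] with its copy in b
        q = pos_in_b.get(x)
        if q is not None:
            suf_len, suf_cnt = rl[n - p - 1][m - q - 1], rc[n - p - 1][m - q - 1]
            if fl[p][q] + 1 + suf_len == total_len:
                through = fc[p][q] * suf_cnt
        status[x] = "all" if through == total_cnt else "some" if through > 0 else "none"
    return total_cnt == 1, status
```

For [1, 2, 3, 4] and [2, 1, 3, 4] there are two LCSs, [1, 3, 4] and [2, 3, 4], so the LCS is not unique; 3 and 4 appear in all of them, while 1 and 2 each appear in some.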

Article number: 2350027
Citations: 0